Below is an index of datasets, SDKs, APIs, and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
BitNet
Developed by Microsoft Research, BitNet b1.58 2B4T is the first open-source, native 1-bit large language model (LLM) at the 2-billion-parameter scale, in which every parameter is ternary (i.e., takes one of the values -1, 0, or +1). Trained on a…
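The entry does not say how full-precision weights become ternary values. As a hedged illustration, the sketch below assumes the absmean scheme described in the BitNet b1.58 paper: divide each weight by the mean absolute weight of the tensor, round, and clip to [-1, 1]. A ternary value carries log2(3) ≈ 1.58 bits, which is where the "b1.58" name comes from. The function names are hypothetical, and this is not the released BitNet code.

```python
from typing import List, Tuple


def absmean_ternary(weights: List[float], eps: float = 1e-5) -> Tuple[List[int], float]:
    """Quantize a tensor's weights to {-1, 0, +1} with one per-tensor scale.

    Assumed scheme: gamma is the mean absolute weight; each weight is divided
    by gamma (plus eps to avoid division by zero), rounded to the nearest
    integer, and clipped to the range [-1, 1].
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    ternary = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return ternary, gamma


def dequantize(ternary: List[int], gamma: float) -> List[float]:
    """Approximate the original weights as ternary value times the shared scale."""
    return [t * gamma for t in ternary]
```

For the weights [0.9, -0.05, -1.2, 0.4], gamma is 0.6375 and the ternary result is [1, 0, -1, 1]; only the single scale gamma needs to be stored alongside the ternary values.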
LLM2CLIP
LLM2CLIP is an approach that harnesses LLMs to unlock more of CLIP's potential. By fine-tuning the LLM in the caption space with contrastive learning, we extract its textual capabilities into the output embeddings,…
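Contrastive learning pulls matched pairs together in embedding space and pushes mismatched pairs apart. The sketch below shows a generic, CLIP-style symmetric InfoNCE loss in plain Python, where row i of one batch is the positive partner of row i of the other (for example, two captions of the same image, or an image and its caption). It is an assumption that this matches the exact loss LLM2CLIP uses at any given stage; the function names and the 0.07 temperature are illustrative choices.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def symmetric_info_nce(a: List[Vector], b: List[Vector], temperature: float = 0.07) -> float:
    """CLIP-style contrastive loss: a[i] and b[i] are positives, all other pairs negatives."""
    a_n = [normalize(v) for v in a]
    b_n = [normalize(v) for v in b]
    n = len(a_n)
    # logits[i][j]: cosine similarity of a[i] and b[j], sharpened by the temperature
    logits = [
        [sum(x * y for x, y in zip(a_n[i], b_n[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy(rows: List[List[float]]) -> float:
        # Mean over rows of -log softmax(row)[i], computed with the log-sum-exp trick
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    a_to_b = cross_entropy(logits)
    b_to_a = cross_entropy([list(col) for col in zip(*logits)])  # transpose: b as queries
    return (a_to_b + b_to_a) / 2
```

The loss is near zero when each row's closest partner is its own positive and grows when positives are mismatched, which is the pressure that shapes the embedding space.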
Aurora
Aurora is a machine learning model that can predict atmospheric variables, such as temperature. It is a foundation model, which means that it was first trained on a large amount of general data and then can…
vAttention
vAttention is a memory manager for KV-cache in LLM serving systems. It decouples the allocation of virtual memory and physical memory using the CUDA virtual memory APIs. This approach enables allocating physical memory on demand…
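A toy, pure-Python simulation of the idea described above: each request reserves a contiguous virtual range large enough for its maximum length, but physical pages are taken from a shared pool only when a token first lands on an unbacked page. All class and method names are hypothetical; the real system manages GPU memory through the CUDA driver's virtual memory calls (such as `cuMemAddressReserve` and `cuMemMap`), which this sketch does not use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhysicalPool:
    """Hypothetical pool of fixed-size physical pages (stand-in for GPU memory)."""
    num_pages: int
    free: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.free = list(range(self.num_pages))

    def alloc(self) -> int:
        if not self.free:
            raise MemoryError("out of physical pages")
        return self.free.pop()

    def release(self, page: int) -> None:
        self.free.append(page)


class VirtualKVCache:
    """Reserves a contiguous virtual range per request; backs pages lazily."""

    def __init__(self, pool: PhysicalPool, max_tokens: int, tokens_per_page: int):
        self.pool = pool
        self.tokens_per_page = tokens_per_page
        # Ceiling division: virtual pages reserved up front, none backed yet
        self.virtual_pages = -(-max_tokens // tokens_per_page)
        self.page_table: Dict[int, int] = {}  # virtual page index -> physical page id
        self.length = 0

    def append_token(self) -> None:
        vpage = self.length // self.tokens_per_page
        if vpage >= self.virtual_pages:
            raise IndexError("exceeded reserved virtual range")
        if vpage not in self.page_table:  # first touch: map a physical page on demand
            self.page_table[vpage] = self.pool.alloc()
        self.length += 1

    def release_all(self) -> None:
        """Unmap every backed page and return it to the shared pool."""
        for phys in self.page_table.values():
            self.pool.release(phys)
        self.page_table.clear()
        self.length = 0
```

With 16 tokens per page, appending 20 tokens to a request that reserved 1,024 tokens backs only 2 physical pages rather than 64. Because each request's cache stays contiguous in virtual memory, it can in principle be addressed like an ordinary contiguous buffer, while physical memory use tracks the actual sequence length.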
RepoClassBench
RepoClassBench (RCB) is a repository-level code-generation benchmark. Retrieve-RepoTools-Reflect (RRR) is a framework for code generation using large language models (LLMs) with static-analysis tools in an agent setup.
Trace
Trace is a new AutoDiff-like tool for training AI systems end-to-end with general feedback (like numerical rewards or losses, natural language text, compiler errors, etc.). Trace generalizes the back-propagation algorithm by capturing and propagating an…
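As a rough illustration of capturing an execution graph and propagating non-numeric feedback backward, the toy below records which operations produced each value and then hands a text message to every trainable input, together with the path it traveled. This is not Trace's API: `Node`, `apply`, and `backward` are hypothetical names, and the step where an optimizer turns that feedback into parameter updates is omitted.

```python
from typing import Any, Callable, List, Optional


class Node:
    """A value plus the operation and parent nodes that produced it (hypothetical toy)."""

    def __init__(self, value: Any, name: str, parents: Optional[List["Node"]] = None,
                 op: str = "input", trainable: bool = False):
        self.value = value
        self.name = name
        self.parents = parents or []
        self.op = op
        self.trainable = trainable
        self.feedback: List[str] = []


def apply(op_name: str, fn: Callable[..., Any], *args: Node) -> Node:
    """Run fn on the argument values and record the call as a new graph node."""
    out = fn(*(a.value for a in args))
    label = f"{op_name}({', '.join(a.name for a in args)})"
    return Node(out, label, list(args), op_name)


def backward(output: Node, feedback: str) -> None:
    """Walk from the output to its ancestors, attaching feedback to trainable nodes.

    Each message records the chain of operations between the output and the
    node; a node reachable along several paths receives one entry per path.
    """
    stack = [(output, [output.name])]
    while stack:
        node, path = stack.pop()
        if node.trainable:
            node.feedback.append(f"{feedback} | via {' <- '.join(path)}")
        for parent in node.parents:
            stack.append((parent, path + [parent.name]))
```

For example, marking a prompt `Node` as trainable, concatenating it with a fixed document, upper-casing the result, and calling `backward(out, "Output too long")` leaves one message on the prompt and none on the document.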
LongRoPE
LongRoPE is a method that extends the context window of pre-trained LLMs to 2048k tokens (roughly 2 million) by non-uniformly rescaling RoPE positional embeddings. LongRoPE has been integrated into Microsoft Phi-3.
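Rotary position embeddings (RoPE) rotate each pair of embedding dimensions by an angle proportional to the token position, with base frequency theta_i = base^(-2i/d) for pair i. Rescaling divides those angles so that distant positions fall back into the range seen during training. The sketch below applies one factor per frequency pair, so distinct factors give non-uniform rescaling. The factors are hypothetical (LongRoPE searches for them), the adjacent-pair layout is only one of the conventions used in practice, and the search procedure itself is not shown.

```python
import math
from typing import List, Sequence


def rope_angles(position: int, dim: int, rescale: Sequence[float],
                base: float = 10000.0) -> List[float]:
    """Rotation angle for each of the dim/2 frequency pairs at one position.

    rescale[i] divides the angle of pair i: all ones gives standard RoPE, one
    repeated value gives uniform interpolation, distinct values give
    non-uniform rescaling.
    """
    assert dim % 2 == 0 and len(rescale) == dim // 2
    return [position * base ** (-2 * i / dim) / rescale[i] for i in range(dim // 2)]


def apply_rope(x: Sequence[float], position: int, rescale: Sequence[float],
               base: float = 10000.0) -> List[float]:
    """Rotate each adjacent pair (x[2i], x[2i+1]) by its (rescaled) angle."""
    angles = rope_angles(position, len(x), rescale, base)
    out: List[float] = []
    for i, theta in enumerate(angles):
        a, b = x[2 * i], x[2 * i + 1]
        out += [a * math.cos(theta) - b * math.sin(theta),
                a * math.sin(theta) + b * math.cos(theta)]
    return out
```

With factors `[1.0, 2.0]` and `dim=4`, position 10 gives angles 10.0 and 0.05: the high-frequency pair is untouched while the lower-frequency pair rotates half as fast. Each rotation preserves vector length, so rescaling changes only the encoded position, not the embedding's magnitude.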
DNATagging
An implementation of data encoding and decoding using DNA Tags and paper tickets. The api directory contains implementations for REST API endpoints to enable a DNA Tagging application. The test directory contains configurations and tests…
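The entry does not describe the encoding scheme. The sketch below assumes the simplest possible mapping, two bits per nucleotide (A=00, C=01, G=10, T=11), to show what encoding and decoding data as DNA sequences means at the byte level. Practical DNA storage schemes typically add error correction and avoid long runs of the same base, and nothing here reflects the DNATagging repository's actual code; `encode` and `decode` are hypothetical names.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def decode(strand: str) -> bytes:
    """Invert encode(): every four nucleotides become one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    index = {base: i for i, base in enumerate(BASES)}
    out = bytearray()
    for start in range(0, len(strand), 4):
        byte = 0
        for base in strand[start:start + 4]:
            byte = (byte << 2) | index[base]  # append two bits per nucleotide
        out.append(byte)
    return bytes(out)
```

Under this mapping, `encode(b"Hi")` returns `"CAGACGGC"`, and `decode` recovers the original bytes.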