Models

Text Models

AI21 Labs

Jurassic-2 Large (7.5B) — `ai21/j2-large`

Jurassic-2 Large (7.5B parameters) (docs)

Jurassic-2 Grande (17B) — `ai21/j2-grande`

Jurassic-2 Grande (17B parameters) (docs)

Jurassic-2 Jumbo (178B) — `ai21/j2-jumbo`

Jurassic-2 Jumbo (178B parameters) (docs)

Jamba Instruct — `ai21/jamba-instruct`

Jamba Instruct is an instruction tuned version of Jamba, which uses a hybrid Transformer-Mamba mixture-of-experts (MoE) architecture that interleaves blocks of Transformer and Mamba layers. (blog)

Jamba 1.5 Mini — `ai21/jamba-1.5-mini`

Jamba 1.5 Mini is a long-context, hybrid SSM-Transformer instruction following foundation model that is optimized for function calling, structured output, and grounded generation. (blog)

Jamba 1.5 Large — `ai21/jamba-1.5-large`

Jamba 1.5 Large is a long-context, hybrid SSM-Transformer instruction following foundation model that is optimized for function calling, structured output, and grounded generation. (blog)

AI Singapore

SEA-LION 7B — `aisingapore/sea-lion-7b`

SEA-LION is a collection of language models which has been pretrained and instruct-tuned on languages from the Southeast Asia region. It utilizes the MPT architecture and a custom SEABPETokenizer for tokenization.

SEA-LION 7B Instruct — `aisingapore/sea-lion-7b-instruct`

SEA-LION is a collection of language models which has been pretrained and instruct-tuned on languages from the Southeast Asia region. It utilizes the MPT architecture and a custom SEABPETokenizer for tokenization.

Llama3 8B CPT SEA-LIONv2 — `aisingapore/llama3-8b-cpt-sea-lionv2-base`

Llama3 8B CPT SEA-LIONv2 is a multilingual model which was continued pre-trained on 48B additional tokens, including tokens in Southeast Asian languages.

Llama3 8B CPT SEA-LIONv2.1 Instruct — `aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct`

Llama3 8B CPT SEA-LIONv2.1 Instruct is a multilingual model which has been fine-tuned with around 100,000 English instruction-completion pairs alongside a smaller pool of around 50,000 instruction-completion pairs from other Southeast Asian languages, such as Indonesian, Thai and Vietnamese.

Gemma2 9B CPT SEA-LIONv3 — `aisingapore/gemma2-9b-cpt-sea-lionv3-base`

Gemma2 9B CPT SEA-LIONv3 Base is a multilingual model which has undergone continued pre-training on approximately 200B tokens across the 11 official Southeast Asian languages, such as English, Chinese, Vietnamese, Indonesian, Thai, Tamil, Filipino, Malay, Khmer, Lao, Burmese.

Gemma2 9B CPT SEA-LIONv3 Instruct — `aisingapore/gemma2-9b-cpt-sea-lionv3-instruct`

Gemma2 9B CPT SEA-LIONv3 Instruct is a multilingual model which has been fine-tuned with around 500,000 English instruction-completion pairs alongside a larger pool of around 1,000,000 instruction-completion pairs from other ASEAN languages, such as Indonesian, Thai and Vietnamese.

Llama3.1 8B CPT SEA-LIONv3 — `aisingapore/llama3.1-8b-cpt-sea-lionv3-base`

Llama3.1 8B CPT SEA-LIONv3 Base is a multilingual model which has undergone continued pre-training on approximately 200B tokens across 11 SEA languages, such as Burmese, Chinese, English, Filipino, Indonesia, Khmer, Lao, Malay, Tamil, Thai and Vietnamese.

Llama3.1 8B CPT SEA-LIONv3 Instruct — `aisingapore/llama3.1-8b-cpt-sea-lionv3-instruct`

Llama3.1 8B CPT SEA-LIONv3 Instruct is a multilingual model that has been fine-tuned in two stages on approximately 12.3M English instruction-completion pairs alongside a pool of 4.5M Southeast Asian instruction-completion pairs from SEA languages such as Indonesian, Javanese, Sundanese, Tamil, Thai and Vietnamese.

Llama3.1 70B CPT SEA-LIONv3 — `aisingapore/llama3.1-70b-cpt-sea-lionv3-base`

Llama3.1 70B CPT SEA-LIONv3 Base is a multilingual model which has undergone continued pre-training on approximately 200B tokens across 11 SEA languages, such as Burmese, Chinese, English, Filipino, Indonesia, Khmer, Lao, Malay, Tamil, Thai and Vietnamese.

Llama3.1 70B CPT SEA-LIONv3 Instruct — `aisingapore/llama3.1-70b-cpt-sea-lionv3-instruct`

Llama3.1 70B CPT SEA-LIONv3 Instruct is a multilingual model that has been fine-tuned in two stages on approximately 12.3M English instruction-completion pairs alongside a pool of 4.5M Southeast Asian instruction-completion pairs from SEA languages such as Indonesian, Javanese, Sundanese, Tamil, Thai, and Vietnamese.

Aleph Alpha

Luminous Base (13B) — `AlephAlpha/luminous-base`

Luminous Base (13B parameters) (docs)

Luminous Extended (30B) — `AlephAlpha/luminous-extended`

Luminous Extended (30B parameters) (docs)

Luminous Supreme (70B) — `AlephAlpha/luminous-supreme`

Luminous Supreme (70B parameters) (docs)

Amazon

Amazon Nova Premier — `amazon/nova-premier-v1:0`

Amazon Nova Premier is a capable multimodal foundation model and teacher for model distillation that processes text, images, and videos with a one-million token context window. (model card, blog)

Amazon Nova Pro — `amazon/nova-pro-v1:0`

Amazon Nova Pro is a highly capable multimodal model that balances of accuracy, speed, and cost for a wide range of tasks (model card)

Amazon Nova Lite — `amazon/nova-lite-v1:0`

Amazon Nova Lite is a low-cost multimodal model that is fast for processing images, video, documents and text. (model card)

Amazon Nova Micro — `amazon/nova-micro-v1:0`

Amazon Nova Micro is a text-only model that delivers low-latency responses at low cost. (model card)

Amazon Nova 2 Pro — `amazon/nova-2-pro-v1:0`

Amazon Nova 2 Pro is a highly advanced reasoning model for complex agentic tasks such as multi-document analysis, video reasoning, and software migrations with extended thinking capabilities. (blog)

Amazon Nova 2 Lite — `amazon/nova-2-lite-v1:0`

Amazon Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. It supports a one million-token context window, enabling expanded reasoning and richer in-context learning. (blog)

Amazon Titan Text Lite — `amazon/titan-text-lite-v1`

Amazon Titan Text Lite is a lightweight, efficient model perfect for fine-tuning English-language tasks like summarization and copywriting. It caters to customers seeking a smaller, cost-effective, and highly customizable model. It supports various formats, including text generation, code generation, rich text formatting, and orchestration (agents). Key model attributes encompass fine-tuning, text generation, code generation, and rich text formatting.

Amazon Titan Text Express — `amazon/titan-text-express-v1`

Amazon Titan Text Express, with a context length of up to 8,000 tokens, excels in advanced language tasks like open-ended text generation and conversational chat. It's also optimized for Retrieval Augmented Generation (RAG). Initially designed for English, the model offers preview multilingual support for over 100 additional languages.

Mistral

Mistral 7B Instruct on Amazon Bedrock — `mistralai/amazon-mistral-7b-instruct-v0:2`

A 7B dense Transformer, fast-deployed and easily customisable. Small, yet powerful for a variety of use cases. Supports English and code, and a 32k context window.

Mixtral 8x7B Instruct on Amazon Bedrock — `mistralai/amazon-mixtral-8x7b-instruct-v0:1`

A 7B sparse Mixture-of-Experts model with stronger capabilities than Mistral 7B. Uses 12B active parameters out of 45B total. Supports multiple languages, code and 32k context window.

Mistral Large(2402) on Amazon Bedrock — `mistralai/amazon-mistral-large-2402-v1:0`

The most advanced Mistral AI Large Language model capable of handling any language task including complex multilingual reasoning, text understanding, transformation, and code generation.

Mistral Small on Amazon Bedrock — `mistralai/amazon-mistral-small-2402-v1:0`

Mistral Small is perfectly suited for straightforward tasks that can be performed in bulk, such as classification, customer support, or text generation. It provides outstanding performance at a cost-effective price point.

Mistral Large(2407) on Amazon Bedrock — `mistralai/amazon-mistral-large-2407-v1:0`

Mistral Large 2407 is an advanced Large Language Model (LLM) that supports dozens of languages and is trained on 80+ coding languages. It has best-in-class agentic capabilities with native function calling JSON outputting and reasoning capabilities.

Anthropic

Claude v1.3 — `anthropic/claude-v1.3`

A 52B parameter language model, trained using reinforcement learning from human feedback paper.

Claude Instant V1 — `anthropic/claude-instant-v1`

A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs).

Claude Instant 1.2 — `anthropic/claude-instant-1.2`

A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs).

Claude 2.0 — `anthropic/claude-2.0`

Claude 2.0 is a general purpose large language model developed by Anthropic. It uses a transformer architecture and is trained via unsupervised learning, RLHF, and Constitutional AI (including both a supervised and Reinforcement Learning (RL) phase). (model card)

Claude 2.1 — `anthropic/claude-2.1`

Claude 2.1 is a general purpose large language model developed by Anthropic. It uses a transformer architecture and is trained via unsupervised learning, RLHF, and Constitutional AI (including both a supervised and Reinforcement Learning (RL) phase). (model card)

Claude 3 Haiku (20240307) — `anthropic/claude-3-haiku-20240307`

Claude 3 is a a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3 Sonnet (20240229) — `anthropic/claude-3-sonnet-20240229`

Claude 3 is a a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3 Opus (20240229) — `anthropic/claude-3-opus-20240229`

Claude 3 is a a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3.5 Haiku (20241022) — `anthropic/claude-3-5-haiku-20241022`

Claude 3.5 Haiku is a Claude 3 family model which matches the performance of Claude 3 Opus at a similar speed to the previous generation of Haiku (blog).

Claude 3.5 Sonnet (20240620) — `anthropic/claude-3-5-sonnet-20240620`

Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost. (blog)

Claude 3.5 Sonnet (20241022) — `anthropic/claude-3-5-sonnet-20241022`

Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost (blog). This is an upgraded snapshot released on 2024-10-22 (blog).

Claude 3.7 Sonnet (20250219) — `anthropic/claude-3-7-sonnet-20250219`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog).

Claude 3.7 Sonnet (20250219, extended thinking) — `anthropic/claude-3-7-sonnet-20250219-thinking-10k`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog). Extended thinking is enabled with 10k budget tokens.

Claude 4 Sonnet (20250514) — `anthropic/claude-sonnet-4-20250514`

Claude 4 Sonnet is a hybrid model offering two modes - near-instant responses and extended thinking for deeper reasoning (blog).

Claude 4 Sonnet (20250514, extended thinking) — `anthropic/claude-sonnet-4-20250514-thinking-10k`

Claude 4 Sonnet is a hybrid model offering two modes - near-instant responses and extended thinking for deeper reasoning (blog). Extended thinking is enabled with 10k budget tokens.

Claude 4 Opus (20250514) — `anthropic/claude-opus-4-20250514`

Claude 4 Opus is a hybrid model offering two modes - near-instant responses and extended thinking for deeper reasoning (blog).

Claude 4 Opus (20250514, extended thinking) — `anthropic/claude-opus-4-20250514-thinking-10k`

Claude 4 Opus is a hybrid model offering two modes - near-instant responses and extended thinking for deeper reasoning (blog). Extended thinking is enabled with 10k budget tokens.

Claude 4.5 Sonnet (20250929) — `anthropic/claude-sonnet-4-5-20250929`

Claude 4.5 Sonnet is a model from Anthropic that shows particular strengths in software coding, in agentic tasks where it runs in a loop and uses tools, and in using computers. (blog, system card)

Claude 4.5 Haiku (20251001) — `anthropic/claude-haiku-4-5-20251001`

Claude 4.5 Haiku is a hybrid model from Anthropic in their small, fast model class that is particularly effective at coding tasks and computer use. (blog, system card)

Claude 4.5 Opus (20251124) — `anthropic/claude-opus-4-5-20251124`

Claude 4.5 Opus is Anthropic's most intelligent model to date and sets a new standard across coding, agents, computer use, and enterprise workflows. (blog)

Claude 4.6 Sonnet — `anthropic/claude-sonnet-4-6`

Claude 4.6 Sonnet is a Sonnet model from Anthropic that upgrades Sonnet's skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. (blog, system card)

Claude 4.6 Opus — `anthropic/claude-opus-4-6`

Claude 4.6 Opus is a large language model from Anthropic with strong capabilities in software engineering, agentic tasks, and long context reasoning, as well as in knowledge work. (blog, system card)

Claude 4.7 Opus — `anthropic/claude-opus-4-7`

Claude 4.7 Opus is a large language model with particular skills in areas such as software engineering, knowledge work, agentic tool use, and computer use. (blog, system card)

Claude 3.7 Sonnet (20250219) (DSPy Zero-Shot Predict) — `anthropic/claude-3-7-sonnet-20250219-dspy-zs-predict`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog).

Claude 3.7 Sonnet (20250219) (DSPy Zero-Shot ChainOfThought) — `anthropic/claude-3-7-sonnet-20250219-dspy-zs-cot`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog).

Claude 3.7 Sonnet (20250219) (DSPy BootstrapFewShotWithRandomSearch) — `anthropic/claude-3-7-sonnet-20250219-dspy-fs-bfrs`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog).

Claude 3.7 Sonnet (20250219) (DSPy MIPROv2) — `anthropic/claude-3-7-sonnet-20250219-dspy-fs-miprov2`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog).

BigScience

BLOOM (176B) — `bigscience/bloom`

BLOOM (176B parameters) is an autoregressive model trained on 46 natural languages and 13 programming languages (paper).

T0pp (11B) — `bigscience/t0pp`

T0pp (11B parameters) is an encoder-decoder model trained on a large set of different tasks specified in natural language prompts (paper).

BioMistral

BioMistral (7B) — `biomistral/biomistral-7b`

BioMistral 7B is an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central.

Cohere

Command — `cohere/command`

Command is Cohere’s flagship text generation model. It is trained to follow user commands and to be instantly useful in practical business applications. docs and changelog

Command Light — `cohere/command-light`

Command is Cohere’s flagship text generation model. It is trained to follow user commands and to be instantly useful in practical business applications. docs and changelog

Command R — `cohere/command-r`

Command R is a multilingual 35B parameter model with a context length of 128K that has been trained with conversational tool use capabilities.

Command R Plus — `cohere/command-r-plus`

Command R+ is a multilingual 104B parameter model with a context length of 128K that has been trained with conversational tool use capabilities.

Cohere Labs Command A — `cohere/command-a-03-2025`

Cohere Labs Command A is an open weights research release of a 111 billion parameter model optimized for enterprise use-cases. (blog, paper)

Databricks

Dolly V2 (3B) — `databricks/dolly-v2-3b`

Dolly V2 (3B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-12b.

Dolly V2 (7B) — `databricks/dolly-v2-7b`

Dolly V2 (7B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-12b.

Dolly V2 (12B) — `databricks/dolly-v2-12b`

Dolly V2 (12B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-12b.

DBRX Instruct — `databricks/dbrx-instruct`

DBRX is a large language model with a fine-grained mixture-of-experts (MoE) architecture that uses 16 experts and chooses 4. It has 132B total parameters, of which 36B parameters are active on any input. (blog post)

DeepSeek

DeepSeek LLM Chat (67B) — `deepseek-ai/deepseek-llm-67b-chat`

DeepSeek LLM Chat is a open-source language model trained on 2 trillion tokens in both English and Chinese, and fine-tuned supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). (paper)

DeepSeek v3 — `deepseek-ai/deepseek-v3`

DeepSeek v3 a Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. It adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures. (paper)

DeepSeek v3.1 — `deepseek-ai/deepseek-v3.1`

DeepSeek v3.1 is a hybrid model that supports both thinking mode and non-thinking mode. (blog)

DeepSeek v4 Pro — `deepseek-ai/deepseek-v4-pro`

DeepSeek V4 Pro is a MoE model that incorporates several key upgrades in architecture and optimization, including Compressed Sparse Attention (CSA), Heavily Compressed Attention (HSA), Manifold-Constrained Hyper-Connections (mHC) and the Muon optimizer. (documentation, report)

DeepSeek v4 Pro — `deepseek-ai/deepseek-v4-pro-thinking-disabled`

DeepSeek V4 Pro is a MoE model that incorporates several key upgrades in architecture and optimization, including Compressed Sparse Attention (CSA), Heavily Compressed Attention (HSA), Manifold-Constrained Hyper-Connections (mHC) and the Muon optimizer. This model was run with thinking disabled. (documentation, report)

DeepSeek-R1-0528 — `deepseek-ai/deepseek-r1-0528`

DeepSeek-R1-0528 is a minor version upgrade from DeepSeek R1 that has improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. (paper)

DeepSeek-R1-Distill-Llama-8b — `deepseek-ai/DeepSeek-R1-Distill-Llama-8B`

DeepSeek-R1-Distill-Llama-8b is a model that is distilled from LLaMA 8B model for the DeepSeek-R1 task.

DeepSeek-R1-Distill-Llama-70B — `deepseek-ai/deepseek-r1-distill-llama-70b`

DeepSeek-R1-Distill-Llama-70B is a fine-tuned open-source models based on Llama-3.3-70B-Instruct using samples generated by DeepSeek-R1. (documentation)

DeepSeek-R1-Distill-Qwen-14B — `deepseek-ai/deepseek-r1-distill-qwen-14b`

DeepSeek-R1-Distill-Qwen-14B is a fine-tuned open-source models based on Qwen2.5-14B using samples generated by DeepSeek-R1.

DeepSeek-Coder-6.7b-Instruct — `deepseek-ai/deepseek-coder-6.7b-instruct`

DeepSeek-Coder-6.7b-Instruct is a model that is fine-tuned from the LLaMA 6.7B model for the DeepSeek-Coder task.

EleutherAI

GPT-J (6B) — `eleutherai/gpt-j-6b`

GPT-J (6B parameters) autoregressive language model trained on The Pile (details).

GPT-NeoX (20B) — `eleutherai/gpt-neox-20b`

GPT-NeoX (20B parameters) autoregressive language model trained on The Pile (paper).

Pythia (1B) — `eleutherai/pythia-1b-v0`

Pythia (1B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

Pythia (2.8B) — `eleutherai/pythia-2.8b-v0`

Pythia (2.8B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

Pythia (6.9B) — `eleutherai/pythia-6.9b`

Pythia (6.9B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

Pythia (12B) — `eleutherai/pythia-12b-v0`

Pythia (12B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

EPFL LLM

Meditron (7B) — `epfl-llm/meditron-7b`

Meditron-7B is a 7 billion parameter model adapted to the medical domain from Llama-2-7B through continued pretraining on a comprehensively curated medical corpus.

Google

T5 (11B) — `google/t5-11b`

T5 (11B parameters) is an encoder-decoder model trained on a multi-task mixture, where each task is converted into a text-to-text format (paper).

UL2 (20B) — `google/ul2`

UL2 (20B parameters) is an encoder-decoder model trained on the C4 corpus. It's similar to T5 but trained with a different objective and slightly different scaling knobs (paper).

Flan-T5 (11B) — `google/flan-t5-xxl`

Flan-T5 (11B parameters) is T5 fine-tuned on 1.8K tasks (paper).

Gemini Pro — `google/gemini-pro`

Gemini Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.0 Pro (001) — `google/gemini-1.0-pro-001`

Gemini 1.0 Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.0 Pro (002) — `google/gemini-1.0-pro-002`

Gemini 1.0 Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.5 Pro (001) — `google/gemini-1.5-pro-001`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001) — `google/gemini-1.5-flash-001`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (0409 preview) — `google/gemini-1.5-pro-preview-0409`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (0514 preview) — `google/gemini-1.5-pro-preview-0514`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (0514 preview) — `google/gemini-1.5-flash-preview-0514`

Gemini 1.5 Flash is a smaller Gemini model. It has a 1 million token context window and allows interleaving text, images, audio and video as inputs. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (blog)

Gemini 1.5 Pro (001, default safety) — `google/gemini-1.5-pro-001-safety-default`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)

Gemini 1.5 Pro (001, BLOCK_NONE safety) — `google/gemini-1.5-pro-001-safety-block-none`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001, default safety) — `google/gemini-1.5-flash-001-safety-default`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)

Gemini 1.5 Flash (001, BLOCK_NONE safety) — `google/gemini-1.5-flash-001-safety-block-none`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (002) — `google/gemini-1.5-pro-002`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (002) — `google/gemini-1.5-flash-002`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 2.0 Flash (Experimental) — `google/gemini-2.0-flash-exp`

Gemini 2.0 Flash (Experimental) is a Gemini model that supports multimodal inputs like images, video and audio, as well as multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. (blog)

Gemini 1.5 Flash 8B — `google/gemini-1.5-flash-8b-001`

Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. (documentation)

Gemini 2.0 Flash — `google/gemini-2.0-flash-001`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash Lite (02-05 preview) — `google/gemini-2.0-flash-lite-preview-02-05`

Gemini 2.0 Flash Lite (02-05 preview) (model card, documentation)

Gemini 2.0 Flash Lite — `google/gemini-2.0-flash-lite-001`

Gemini 2.0 Flash Lite is the fastest and most cost efficient Flash model in the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash Thinking (01-21 preview) — `google/gemini-2.0-flash-thinking-exp-01-21`

Gemini 2.0 Flash Thinking (01-21 preview) (documentation)

Gemini 2.0 Pro (02-05 preview) — `google/gemini-2.0-pro-exp-02-05`

Gemini 2.0 Pro (02-05 preview) (documentation)

Gemini 2.5 Flash-Lite (thinking disabled) — `google/gemini-2.5-flash-lite-thinking-disabled`

Gemini 2.5 Flash-Lite with thinking disabled (blog)

Gemini 2.5 Flash-Lite — `google/gemini-2.5-flash-lite`

Gemini 2.5 Flash-Lite (blog)

Gemini 2.5 Flash (thinking disabled) — `google/gemini-2.5-flash-thinking-disabled`

Gemini 2.5 Flash with thinking disabled (documentation)

Gemini 2.5 Flash — `google/gemini-2.5-flash`

Gemini 2.5 Flash (documentation)

Gemini 2.5 Pro — `google/gemini-2.5-pro`

Gemini 2.5 Pro (documentation)

Gemini 3 Pro (Preview) — `google/gemini-3-pro-preview`

Gemini 3.0 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. (blog, blog)

Gemini 3.1 Pro (Preview) — `google/gemini-3.1-pro-preview`

Gemini 3.1 Pro is the next iteration in the Gemini 3 series of models, a suite of highly capable, natively multimodal reasoning models. (blog, model card)

Gemini 3 Flash (Preview) — `google/gemini-3-flash-preview`

Gemini 3.0 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. (blog)

Gemini 3.1 Flash-Lite (Preview) — `google/gemini-3.1-flash-lite-preview`

Gemini 3.1 Flash-Lite (Preview) is the fastest and most cost-efficient Gemini 3 series model. (blog)

Gemini Robotics-ER 1.5 — `google/gemini-robotics-er-1.5-preview`

Gemini Robotics-ER 1.5 is a vision-language model (VLM) designed for advanced reasoning in the physical world, allowing robots to interpret complex visual data, perform spatial reasoning, and plan actions from natural language commands.

Gemma (2B) — `google/gemma-2b`

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma Instruct (2B) — `google/gemma-2b-it`

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma (7B) — `google/gemma-7b`

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma Instruct (7B) — `google/gemma-7b-it`

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma 2 (9B) — `google/gemma-2-9b`

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma 2 Instruct (9B) — `google/gemma-2-9b-it`

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma 2 (27B) — `google/gemma-2-27b`

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma 2 Instruct (27B) — `google/gemma-2-27b-it`

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma 3n E4B Instruct — `google/gemma-3n-e4b-it`

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. (model card, blog post)

Gemma 3n E4B Instruct — `google/gemma-3n-e4b-it-thinking-disabled`

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. This model was run with thinking disabled. (model card, blog post)

Gemma 4 31B Instruct — `google/gemma-4-31b-it`

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. (blog post)

Gemma 4 31B Instruct — `google/gemma-4-31b-it-thinking-disabled`

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This model was run with thinking disabled. (blog post)

MedGemma (4B) — `google/medgemma-4b-it`

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

PaLM-2 (Bison) — `google/text-bison@001`

The best value PaLM model. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

PaLM-2 (Bison) — `google/text-bison@002`

The best value PaLM model. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

PaLM-2 (Bison) — `google/text-bison-32k`

The best value PaLM model with a 32K context. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

PaLM-2 (Unicorn) — `google/text-unicorn@001`

The largest model in PaLM family. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

MedLM (Medium) — `google/medlm-medium`

MedLM is a family of foundation models fine-tuned for the healthcare industry based on Google Research's medically-tuned large language model, Med-PaLM 2. (documentation)

MedLM (Large) — `google/medlm-large`

MedLM is a family of foundation models fine-tuned for the healthcare industry based on Google Research's medically-tuned large language model, Med-PaLM 2. (documentation)

Gemini 2.0 Flash (DSPy Zero-Shot Predict) — `google/gemini-2.0-flash-001-dspy-zs-predict`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash (DSPy Zero-Shot ChainOfThought) — `google/gemini-2.0-flash-001-dspy-zs-cot`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash (DSPy BootstrapFewShotWithRandomSearch) — `google/gemini-2.0-flash-001-dspy-fs-bfrs`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash (DSPy MIPROv2) — `google/gemini-2.0-flash-001-dspy-fs-miprov2`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

HuggingFace

SmolLM2 (135M) — `huggingface/smollm2-135m`

SmolLM2 is a family of compact language models that are capable of solving a wide range of tasks while being lightweight enough to run on-device. (paper)

SmolLM2 (360M) — `huggingface/smollm2-360m`

SmolLM2 is a family of compact language models that are capable of solving a wide range of tasks while being lightweight enough to run on-device. (paper)

SmolLM2 (1.7B) — `huggingface/smollm2-1.7b`

SmolLM2 is a family of compact language models that are capable of solving a wide range of tasks while being lightweight enough to run on-device. (paper)

SmolLM2 Instruct (135M) — `huggingface/smollm2-135m-instruct`

SmolLM2 is a family of compact language models that are capable of solving a wide range of tasks while being lightweight enough to run on-device. (paper)

SmolLM2 Instruct (360M) — `huggingface/smollm2-360m-instruct`

SmolLM2 is a family of compact language models that are capable of solving a wide range of tasks while being lightweight enough to run on-device. (paper)

SmolLM2 Instruct (1.7B) — `huggingface/smollm2-1.7b-instruct`

SmolLM2 is a family of compact language models that are capable of solving a wide range of tasks while being lightweight enough to run on-device. (paper)

Lightning AI

Lit-GPT — `lightningai/lit-gpt`

Lit-GPT is an optimized collection of open-source LLMs for finetuning and inference. It supports – Falcon, Llama 2, Vicuna, LongChat, and other top-performing open-source large language models.

LMSYS

Vicuna v1.3 (7B) — `lmsys/vicuna-7b-v1.3`

Vicuna v1.3 (7B) is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

Vicuna v1.3 (13B) — `lmsys/vicuna-13b-v1.3`

Vicuna v1.3 (13B) is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

Marin Community

Marin 8B Instruct — `marin-community/marin-8b-instruct`

Marin 8B Instruct is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

Microsoft

Phi-2 — `microsoft/phi-2`

Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value)

Phi-3 (7B) — `microsoft/phi-3-small-8k-instruct`

Phi-3-Small-8K-Instruct is a lightweight model trained with synthetic data and filtered publicly available website data with a focus on high-quality and reasoning dense properties. (paper, blog)

Phi-3 (14B) — `microsoft/phi-3-medium-4k-instruct`

Phi-3-Medium-4K-Instruct is a lightweight model trained with synthetic data and filtered publicly available website data with a focus on high-quality and reasoning dense properties. (paper, blog)

Phi-3.5-mini-instruct (3.8B) — `microsoft/phi-3.5-mini-instruct`

Phi-3.5-mini is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available websites. (paper, blog)

Phi-3.5 MoE — `microsoft/phi-3.5-moe-instruct`

Phi-3.5 MoE is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available documents - with a focus on very high-quality, reasoning dense data. (paper, blog)

01.AI

Yi (6B) — `01-ai/yi-6b`

The Yi models are large language models trained from scratch by developers at 01.AI.

Yi (34B) — `01-ai/yi-34b`

The Yi models are large language models trained from scratch by developers at 01.AI.

Yi Chat (6B) — `01-ai/yi-6b-chat`

The Yi models are large language models trained from scratch by developers at 01.AI.

Yi Chat (34B) — `01-ai/yi-34b-chat`

The Yi models are large language models trained from scratch by developers at 01.AI.

Yi Large — `01-ai/yi-large`

The Yi models are large language models trained from scratch by developers at 01.AI. (tweet)

Yi Large (Preview) — `01-ai/yi-large-preview`

The Yi models are large language models trained from scratch by developers at 01.AI. (tweet)

Allen Institute for AI

OLMo (7B) — `allenai/olmo-7b`

OLMo is a series of Open Language Models trained on the Dolma dataset.

OLMo (7B Twin 2T) — `allenai/olmo-7b-twin-2t`

OLMo is a series of Open Language Models trained on the Dolma dataset.

OLMo (7B Instruct) — `allenai/olmo-7b-instruct`

OLMo is a series of Open Language Models trained on the Dolma dataset. The instruct versions was trained on the Tulu SFT mixture and a cleaned version of the UltraFeedback dataset.

OLMo 1.7 (7B) — `allenai/olmo-1.7-7b`

OLMo is a series of Open Language Models trained on the Dolma dataset. The instruct versions was trained on the Tulu SFT mixture and a cleaned version of the UltraFeedback dataset.

OLMo 2 7B Instruct November 2024 — `allenai/olmo-2-1124-7b-instruct`

OLMo 2 is a family of 7B and 13B models trained on up to 5T tokens. (blog)

OLMo 2 13B Instruct November 2024 — `allenai/olmo-2-1124-13b-instruct`

OLMo 2 is a family of 7B and 13B models trained on up to 5T tokens. (blog)

OLMo 2 32B Instruct March 2025 — `allenai/olmo-2-0325-32b-instruct`

OLMo 2 32B Instruct March 2025 is trained up to 6T tokens and post-trained using Tulu 3.1. (blog)

OLMoE 1B-7B Instruct January 2025 — `allenai/olmoe-1b-7b-0125-instruct`

OLMoE 1B-7B Instruct January 2025 is a fully open language model leveraging sparse Mixture-of-Experts (MoE). It has 7B parameters but uses only 1B per input token. It was pretrained on 5T tokens. (blog, paper)

Mistral AI

Mistral v0.1 (7B) — `mistralai/mistral-7b-v0.1`

Mistral 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA) and Sliding-Window Attention (SWA). (blog post)

Mistral Instruct v0.1 (7B) — `mistralai/mistral-7b-instruct-v0.1`

Mistral v0.1 Instruct 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA) and Sliding-Window Attention (SWA). The instruct version was fined-tuned using publicly available conversation datasets. (blog post)

Mistral Instruct v0.2 (7B) — `mistralai/mistral-7b-instruct-v0.2`

Mistral v0.2 Instruct 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA). Compared to v0.1, v0.2 has a 32k context window and no Sliding-Window Attention (SWA). (blog post)

Mistral Instruct v0.3 (7B) — `mistralai/mistral-7b-instruct-v0.3`

Mistral v0.3 Instruct 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA). Compared to v0.1, v0.2 has a 32k context window and no Sliding-Window Attention (SWA). (blog post)

Mistral Instruct v0.3 (7B) — `mistralai/mistral-7b-instruct-v0.3-hf`

Mistral v0.3 Instruct 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA). Compared to v0.1, v0.2 has a 32k context window and no Sliding-Window Attention (SWA). (blog post)

Mixtral (8x7B 32K seqlen) — `mistralai/mixtral-8x7b-32kseqlen`

Mixtral is a mixture-of-experts model that has 46.7B total parameters but only uses 12.9B parameters per token. (blog post, tweet).

Mixtral Instruct (8x7B) — `mistralai/mixtral-8x7b-instruct-v0.1`

Mixtral Instruct (8x7B) is a version of Mixtral (8x7B) that was optimized through supervised fine-tuning and direct preference optimisation (DPO) for careful instruction following. (blog post).

Mixtral (8x22B) — `mistralai/mixtral-8x22b`

Mistral AI's mixture-of-experts model that uses 39B active parameters out of 141B (blog post).

Mixtral Instruct (8x22B) — `mistralai/mixtral-8x22b-instruct-v0.1`

Mistral AI's mixture-of-experts model that uses 39B active parameters out of 141B (blog post).

Ministral 3B (2402) — `mistralai/ministral-3b-2410`

Ministral 3B (2402) is a model for on-device computing and at-the-edge use cases (blog).

Ministral 8B (2402) — `mistralai/ministral-8b-2410`

Ministral 8B (2402) is a model for on-device computing and at-the-edge use cases a special interleaved sliding-window attention pattern for faster and memory-efficient inference (blog).

Mistral Small (2402) — `mistralai/mistral-small-2402`

Mistral Small is a multilingual model with a 32K tokens context window and function-calling capabilities. (blog)

Mistral Small (2409) — `mistralai/mistral-small-2409`

Mistral Small is a multilingual model with a 32K tokens context window and function-calling capabilities. (blog)

Mistral Small 3 (2501) — `mistralai/mistral-small-2501`

Mistral Small 3 (2501) is a pre-trained and instructed model catered to the '80%' of generative AI tasks—those that require robust language and instruction following performance, with very low latency. (blog)

Mistral Small 3.1 (2503) — `mistralai/mistral-small-2503`

Mistral Small 3.1 (2503) is a model with improved text performance, multimodal understanding, and an expanded context window of up to 128k tokens. (blog)

Mistral Medium (2312) — `mistralai/mistral-medium-2312`

Mistral is a transformer model that uses Grouped-Query Attention (GQA) and Sliding-Window Attention (SWA).

Mistral Medium 3 (2505) — `mistralai/mistral-medium-2505`

Mistral Medium 3 (2505) is a language model that is intended to to deliver state-of-the-art performance at lower cost. (blog)

Mistral Medium 3.1 — `mistralai/mistral-medium-3.1`

Mistral Medium 3.1 is a language model that is intended to to deliver state-of-the-art performance at lower cost. (blog)

Mistral Large (2402) — `mistralai/mistral-large-2402`

Mistral Large is a multilingual model with a 32K tokens context window and function-calling capabilities. (blog)

Mistral Large 2 (2407) — `mistralai/mistral-large-2407`

Mistral Large 2 is a 123 billion parameter model that has a 128k context window and supports dozens of languages and 80+ coding languages. (blog)

Mistral Large (2411) — `mistralai/mistral-large-2411`

Mistral Large (2411) is a 123B parameter model that has a 128k context window. (blog)

Mistral Large 3 (2512) — `mistralai/mistral-large-2512`

Mistral Large 3 is a open-weight, general-purpose multimodal model with a granular Mixture-of-Experts architecture with 41B active parameters and 675B total parameters. (doc, blog)

Mistral Medium 3.1 (2508) — `mistralai/mistral-medium-2508`

Mistral Medium 3.1 is a Mistral multimodal model. (doc)

Mistral Small 3.2 (2506) — `mistralai/mistral-small-2506`

Mistral Small 3.2 is a Mistral multimodal model. (doc)

Mistral Small 4 (2603) — `mistralai/mistral-small-2603`

Mistral Small 4 is a model in the Mistral Small family that unifies reasoning, multimodal, and agentic capabilities. (blog, doc)

Ministral 3 (14B 2512) — `mistralai/ministral-14b-2512`

Ministral 3 is a family of parameter-efficient dense language models designed for compute and memory constrained applications. (doc, blog, paper)

Ministral 3 (8B 2512) — `mistralai/ministral-8b-2512`

Ministral 3 is a family of parameter-efficient dense language models designed for compute and memory constrained applications. (doc, blog, paper)

Ministral 3 (3B 2512) — `mistralai/ministral-3b-2512`

Ministral 3 is a family of parameter-efficient dense language models designed for compute and memory constrained applications. (doc, blog, paper)

Mistral NeMo (2402) — `mistralai/open-mistral-nemo-2407`

Mistral NeMo is a multilingual 12B model with a large context window of 128K tokens. (blog)

Mistral Pixtral (2409) — `mistralai/pixtral-12b-2409`

Mistral Pixtral 12B is the first multimodal Mistral model for image understanding. (blog)

Mistral Pixtral Large (2411) — `mistralai/pixtral-large-2411`

Mistral Pixtral Large is a 124B open-weights multimodal model built on top of Mistral Large 2 (2407). (blog)

Moonshot AI

Kimi K2 Instruct — `moonshotai/kimi-k2-instruct`

Kimi K2 Instruct is a mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters trained with the Muon optimizer on 15.5T tokens. (blog)

Kimi K2 Instruct 0905 — `moonshotai/kimi-k2-instruct-0905`

Kimi K2 Instruct 0905 is the latest, most capable version of Kimi K2. It is a state-of-the-art mixture-of-experts (MoE) language model, featuring 32 billion activated parameters and a total of 1 trillion parameters. (model card)

Kimi K2 Thinking — `moonshotai/kimi-k2-thinking`

Kimi K2 Thinking is an open-weights thinking model that uses native INT4 quantization and maintains coherent goal-directed behavior across up to 200–300 consecutive tool invocations. (blog)

MosaicML

MPT (7B) — `mosaicml/mpt-7b`

MPT (7B) is a Transformer trained from scratch on 1T tokens of text and code.

MPT-Instruct (7B) — `mosaicml/mpt-instruct-7b`

MPT-Instruct (7B) is a model for short-form instruction following. It is built by finetuning MPT (30B), a Transformer trained from scratch on 1T tokens of text and code.

MPT (30B) — `mosaicml/mpt-30b`

MPT (30B) is a Transformer trained from scratch on 1T tokens of text and code.

MPT-Instruct (30B) — `mosaicml/mpt-instruct-30b`

MPT-Instruct (30B) is a model for short-form instruction following. It is built by finetuning MPT (30B), a Transformer trained from scratch on 1T tokens of text and code.

nectec

Pathumma-llm-text-1.0.0 (7B) — `nectec/Pathumma-llm-text-1.0.0`

Pathumma-llm-text-1.0.0 (7B) is a instruction model from OpenThaiLLM-Prebuilt-7B (blog)

OpenThaiLLM-Prebuilt-7B (7B) — `nectec/OpenThaiLLM-Prebuilt-7B`

OpenThaiLLM-Prebuilt-7B (7B) is a pretrained Thai large language model with 7 billion parameters based on Qwen2.5-7B.

Neurips

Neurips Local — `neurips/local`

Neurips Local

NVIDIA

Megatron GPT2 — `nvidia/megatron-gpt2`

GPT-2 implemented in Megatron-LM (paper).

Nemotron-4 Instruct (340B) — `nvidia/nemotron-4-340b-instruct`

Nemotron-4 Instruct (340B) is an open weights model sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. 98% of the data used for model alignment was synthetically generated (paper).

Llama 3.1 Nemotron Instruct (70B) — `nvidia/llama-3.1-nemotron-70b-instruct`

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. It was trained using RLHF (specifically, REINFORCE), Llama-3.1-Nemotron-70B-Reward and HelpSteer2-Preference prompts on a Llama-3.1-70B-Instruct model. (paper)

OpenAI

GPT-2 (1.5B) — `openai/gpt2`

GPT-2 (1.5B parameters) is a transformer model trained on a large corpus of English text in a self-supervised fashion (paper).

davinci-002 — `openai/davinci-002`

Replacement for the GPT-3 curie and davinci base models.

babbage-002 — `openai/babbage-002`

Replacement for the GPT-3 ada and babbage base models.

GPT-3.5 Turbo Instruct — `openai/gpt-3.5-turbo-instruct`

Similar capabilities as GPT-3 era models. Compatible with legacy Completions endpoint and not Chat Completions.

GPT-3.5 Turbo (0301) — `openai/gpt-3.5-turbo-0301`

Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-03-01.

GPT-3.5 Turbo (0613) — `openai/gpt-3.5-turbo-0613`

Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-06-13.

GPT-3.5 Turbo (1106) — `openai/gpt-3.5-turbo-1106`

Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-11-06.

GPT-3.5 Turbo (0125) — `openai/gpt-3.5-turbo-0125`

Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2024-01-25.

gpt-3.5-turbo-16k-0613 — `openai/gpt-3.5-turbo-16k-0613`

Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-06-13 with a longer context length of 16,384 tokens.

GPT-4 Turbo (1106 preview) — `openai/gpt-4-1106-preview`

GPT-4 Turbo (preview) is a large multimodal model that is optimized for chat but works well for traditional completions tasks. The model is cheaper and faster than the original GPT-4 model. Preview snapshot from 2023-11-06.

GPT-4 (0314) — `openai/gpt-4-0314`

GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 from 2023-03-14.

gpt-4-32k-0314 — `openai/gpt-4-32k-0314`

GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 with a longer context length of 32,768 tokens from March 14th 2023.

GPT-4 (0613) — `openai/gpt-4-0613`

GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 from 2023-06-13.

gpt-4-32k-0613 — `openai/gpt-4-32k-0613`

GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 with a longer context length of 32,768 tokens from 2023-06-13.

GPT-4 Turbo (0125 preview) — `openai/gpt-4-0125-preview`

GPT-4 Turbo (preview) is a large multimodal model that is optimized for chat but works well for traditional completions tasks. The model is cheaper and faster than the original GPT-4 model. Preview snapshot from 2023-01-25. This snapshot is intended to reduce cases of “laziness” where the model doesn’t complete a task.

GPT-4 Turbo (2024-04-09) — `openai/gpt-4-turbo-2024-04-09`

GPT-4 Turbo (2024-04-09) is a large multimodal model that is optimized for chat but works well for traditional completions tasks. The model is cheaper and faster than the original GPT-4 model. Snapshot from 2024-04-09.

GPT-4o (2024-05-13) — `openai/gpt-4o-2024-05-13`

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o (2024-08-06) — `openai/gpt-4o-2024-08-06`

GPT-4o (2024-08-06) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o (2024-11-20) — `openai/gpt-4o-2024-11-20`

GPT-4o (2024-11-20) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o mini (2024-07-18) — `openai/gpt-4o-mini-2024-07-18`

GPT-4o mini (2024-07-18) is a multimodal model with a context window of 128K tokens and improved handling of non-English text. (blog)

GPT-4.1 (2025-04-14) — `openai/gpt-4.1-2025-04-14`

GPT-4.1 (2025-04-14) is a multimdodal model in the GPT-4.1 family, which outperforms the GPT-4o family, with major gains in coding and instruction following. They also have larger context windows of 1 million tokens and are able to better use that context with improved long-context comprehension. (blog)

GPT-4.1 mini (2025-04-14) — `openai/gpt-4.1-mini-2025-04-14`

GPT-4.1 mini (2025-04-14) is a multimdodal model in the GPT-4.1 family, which outperforms the GPT-4o family, with major gains in coding and instruction following. They also have larger context windows of 1 million tokens and are able to better use that context with improved long-context comprehension. (blog)

GPT-4.1 nano (2025-04-14) — `openai/gpt-4.1-nano-2025-04-14`

GPT-4.1 nano (2025-04-14) is a multimdodal model in the GPT-4.1 family, which outperforms the GPT-4o family, with major gains in coding and instruction following. They also have larger context windows of 1 million tokens and are able to better use that context with improved long-context comprehension. (blog)

GPT-5 (2025-08-07) — `openai/gpt-5-2025-08-07`

GPT-5 (2025-08-07) is a multimdodal model trained for real-world coding tasks and long-running agentic tasks. (blog, system card)

GPT-5 mini (2025-08-07) — `openai/gpt-5-mini-2025-08-07`

GPT-5 mini (2025-08-07) is a multimdodal model trained for real-world coding tasks and long-running agentic tasks. (blog, system card)

GPT-5 nano (2025-08-07) — `openai/gpt-5-nano-2025-08-07`

GPT-5 nano (2025-08-07) is a multimdodal model trained for real-world coding tasks and long-running agentic tasks. (blog, system card)

GPT-5.4 Pro (2026-03-05) — `openai/gpt-5.4-pro-2026-03-05`

GPT-5.4 Pro (2026-03-05) is a model in the GPT-5 model family that incorporates recent advances in reasoning, coding, and agentic workflows. (blog)

GPT-5.4 (2026-03-05) — `openai/gpt-5.4-2026-03-05`

GPT-5.4 (2026-03-05) is a model in the GPT-5 model family that incorporates recent advances in reasoning, coding, and agentic workflows. (blog)

GPT-5.4 mini (2026-03-17) — `openai/gpt-5.4-mini-2026-03-17`

GPT-5.4 mini (2026-03-17) is a more efficient model designed for high-volume workloads. (blog)

GPT-5.4 nano (2026-03-17) — `openai/gpt-5.4-nano-2026-03-17`

GPT-5.4 nano (2026-03-17) is a more efficient model designed for high-volume workloads. (blog)

GPT-5.2 (2025-12-11) — `openai/gpt-5.2-2025-12-11`

GPT-5.2 (2025-12-11) is a model in the GPT-5 model family that is intended for coding and agentic tasks across industries. (blog)

GPT-5.1 (2025-11-13) — `openai/gpt-5.1-2025-11-13`

GPT-5.1 (2025-11-13) is a model in the GPT-5 model family, and has similar training for code generation, bug fixing, refactoring, instruction following, long context and tool calling. (blog)

GPT-4.5 (2025-02-27 preview) — `openai/gpt-4.5-preview-2025-02-27`

GPT-4.5 (2025-02-27 preview) is a large multimodal model that is designed to be more general-purpose than OpenAI's STEM-focused reasoning models. It was trained using new supervision techniques combined with traditional methods like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). (blog, system card)

o1 pro (2025-03-19) — `openai/o1-pro-2025-03-19`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post)

o1 pro (2025-03-19, low reasoning effort) — `openai/o1-pro-2025-03-19-low-reasoning-effort`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post) The requests' reasoning effort parameter in is set to low.

o1 pro (2025-03-19, high reasoning effort) — `openai/o1-pro-2025-03-19-high-reasoning-effort`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post) The requests' reasoning effort parameter in is set to high.

o1 (2024-12-17) — `openai/o1-2024-12-17`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post)

o1 (2024-12-17, low reasoning effort) — `openai/o1-2024-12-17-low-reasoning-effort`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post) The requests' reasoning effort parameter in is set to low.

o1 (2024-12-17, high reasoning effort) — `openai/o1-2024-12-17-high-reasoning-effort`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post) The requests' reasoning effort parameter in is set to high.

o1-preview (2024-09-12) — `openai/o1-preview-2024-09-12`

o1-preview is a language model trained with reinforcement learning to perform complex reasoning that can produce a long internal chain of thought before responding to the user. (model card, blog post)

o1-mini (2024-09-12) — `openai/o1-mini-2024-09-12`

o1-mini is a cost-effective reasoning model for applications that require reasoning without broad world knowledge. (model card, blog post)

o3-mini (2025-01-31) — `openai/o3-mini-2025-01-31`

o3-mini is a small reasoning model form OpenAI that aims to deliver STEM capabilities while maintaining the low cost and reduced latency of OpenAI o1-mini. (blog post)

o3-mini (2025-01-31, low reasoning effort) — `openai/o3-mini-2025-01-31-low-reasoning-effort`

o3-mini is a small reasoning model form OpenAI that aims to deliver STEM capabilities while maintaining the low cost and reduced latency of OpenAI o1-mini. (blog post) The requests' reasoning effort parameter in is set to low.

o3-mini (2025-01-31, high reasoning effort) — `openai/o3-mini-2025-01-31-high-reasoning-effort`

o3-mini is a small reasoning model form OpenAI that aims to deliver STEM capabilities while maintaining the low cost and reduced latency of OpenAI o1-mini. (blog post) The requests' reasoning effort parameter in is set to high.

o3 (2025-04-16) — `openai/o3-2025-04-16`

o3 is a reasoning model for math, science, coding, and visual reasoning tasks. (blog post)

o3 (2025-04-16, low reasoning effort) — `openai/o3-2025-04-16-low-reasoning-effort`

o3 is a reasoning model for math, science, coding, and visual reasoning tasks. (blog post)

o3 (2025-04-16, high reasoning effort) — `openai/o3-2025-04-16-high-reasoning-effort`

o3 is a reasoning model for math, science, coding, and visual reasoning tasks. (blog post)

o4-mini (2025-04-16) — `openai/o4-mini-2025-04-16`

o4-mini is an o-series model optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. (blog post)

o4-mini (2025-04-16, low reasoning effort) — `openai/o4-mini-2025-04-16-low-reasoning-effort`

o4-mini is an o-series model optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. (blog post)

o4-mini (2025-04-16, high reasoning effort) — `openai/o4-mini-2025-04-16-high-reasoning-effort`

o4-mini is an o-series model optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. (blog post)

o3-pro (2025-06-10, high reasoning effort) — `openai/o3-pro-2025-06-10-high-reasoning-effort`

o3-pro is an o-series model designed to think longer and provide the most reliable responses. (blog post)

gpt-oss-20b — `openai/gpt-oss-20b`

gpt-oss-20b is an open-weight language model that was trained using a mix of reinforcement learning and other techniques informed by OpenAI's internal models. It uses a mixture-of-experts architecture and activates 3.6B parameters per token. (blog)

gpt-oss-120b — `openai/gpt-oss-120b`

gpt-oss-120b is an open-weight language model that was trained using a mix of reinforcement learning and other techniques informed by OpenAI's internal models. It uses a mixture-of-experts architecture and activates 5.1B parameters per token. (blog)

GPT-4o (2024-05-13) (DSPy Zero-Shot Predict) — `openai/gpt-4o-2024-05-13-dspy-zs-predict`

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

o3-mini (2025-01-31) (DSPy Zero-Shot Predict) — `openai/o3-mini-2025-01-31-dspy-zs-predict`

o3-mini is a small reasoning model form OpenAI that aims to deliver STEM capabilities while maintaining the low cost and reduced latency of OpenAI o1-mini. (blog post)

GPT-4o (2024-05-13) (DSPy Zero-Shot ChainOfThought) — `openai/gpt-4o-2024-05-13-dspy-zs-cot`

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

o3-mini (2025-01-31) (DSPy Zero-Shot ChainOfThought) — `openai/o3-mini-2025-01-31-dspy-zs-cot`

o3-mini is a small reasoning model form OpenAI that aims to deliver STEM capabilities while maintaining the low cost and reduced latency of OpenAI o1-mini. (blog post)

GPT-4o (2024-05-13) (DSPy BootstrapFewShotWithRandomSearch) — `openai/gpt-4o-2024-05-13-dspy-fs-bfrs`

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

o3-mini (2025-01-31) (DSPy BootstrapFewShotWithRandomSearch) — `openai/o3-mini-2025-01-31-dspy-fs-bfrs`

o3-mini is a small reasoning model form OpenAI that aims to deliver STEM capabilities while maintaining the low cost and reduced latency of OpenAI o1-mini. (blog post)

GPT-4o (2024-05-13) (DSPy MIPROv2) — `openai/gpt-4o-2024-05-13-dspy-fs-miprov2`

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

o3-mini (2025-01-31) (DSPy MIPROv2) — `openai/o3-mini-2025-01-31-dspy-fs-miprov2`

o3-mini is a small reasoning model form OpenAI that aims to deliver STEM capabilities while maintaining the low cost and reduced latency of OpenAI o1-mini. (blog post)

OpenThaiGPT

OpenThaiGPT v1.0.0 (7B) — `openthaigpt/openthaigpt-1.0.0-7b-chat`

OpenThaiGPT v1.0.0 (7B) is a Thai language chat model based on Llama 2 that has been specifically fine-tuned for Thai instructions and enhanced by incorporating over 10,000 of the most commonly used Thai words into the dictionary. (blog post)

OpenThaiGPT v1.0.0 (13B) — `openthaigpt/openthaigpt-1.0.0-13b-chat`

OpenThaiGPT v1.0.0 (13B) is a Thai language chat model based on Llama 2 that has been specifically fine-tuned for Thai instructions and enhanced by incorporating over 10,000 of the most commonly used Thai words into the dictionary. (blog post)

OpenThaiGPT v1.0.0 (70B) — `openthaigpt/openthaigpt-1.0.0-70b-chat`

OpenThaiGPT v1.0.0 (70B) is a Thai language chat model based on Llama 2 that has been specifically fine-tuned for Thai instructions and enhanced by incorporating over 10,000 of the most commonly used Thai words into the dictionary. (blog post)

Qwen

Qwen — `qwen/qwen-7b`

7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Aibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 (7B) — `qwen/qwen1.5-7b`

7B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Aibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 (14B) — `qwen/qwen1.5-14b`

14B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Aibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 (32B) — `qwen/qwen1.5-32b`

32B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Aibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. The 32B version also includes grouped query attention (GQA). (blog)

Qwen1.5 (72B) — `qwen/qwen1.5-72b`

72B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Aibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 Chat (7B) — `qwen/qwen1.5-7b-chat`

7B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Aibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 Chat (14B) — `qwen/qwen1.5-14b-chat`

14B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Aibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 Chat (32B) — `qwen/qwen1.5-32b-chat`

32B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Aibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. The 32B version also includes grouped query attention (GQA). (blog)

Qwen1.5 Chat (72B) — `qwen/qwen1.5-72b-chat`

72B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Aibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 Chat (110B) — `qwen/qwen1.5-110b-chat`

110B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Aibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. The 110B version also includes grouped query attention (GQA). (blog)

Qwen2 Instruct (72B) — `qwen/qwen2-72b-instruct`

72B-parameter chat version of the large language model series, Qwen2. Qwen2 uses Group Query Attention (GQA) and has extended context length support up to 128K tokens. (blog)

Qwen2.5 Instruct Turbo (7B) — `qwen/qwen2.5-7b-instruct-turbo`

Qwen2.5 Instruct Turbo (7B) was trained on 18 trillion tokens and supports 29 languages, and shows improvements over Qwen2 in knowledge, coding, mathematics, instruction following, generating long texts, and processing structure data. (blog) Turbo is Together's cost-efficient implementation, providing fast FP8 performance while maintaining quality, closely matching FP16 reference models. (blog)

Qwen2.5 Instruct (7B) — `qwen/qwen2.5-7b-instruct`

Qwen2.5 Instruct (7B) was trained on 18 trillion tokens and supports 29 languages, and shows improvements over Qwen2 in knowledge, coding, mathematics, instruction following, generating long texts, and processing structure data. (blog) Turbo is Together's cost-efficient implementation, providing fast FP8 performance while maintaining quality, closely matching FP16 reference models. (blog)

Qwen2.5 Instruct Turbo (72B) — `qwen/qwen2.5-72b-instruct-turbo`

Qwen2.5 Instruct Turbo (72B) was trained on 18 trillion tokens and supports 29 languages, and shows improvements over Qwen2 in knowledge, coding, mathematics, instruction following, generating long texts, and processing structure data. (blog) Turbo is Together's cost-efficient implementation, providing fast FP8 performance while maintaining quality, closely matching FP16 reference models. (blog)

Qwen3 235B A22B FP8 Throughput — `qwen/qwen3-235b-a22b-fp8-tput`

Qwen3 235B A22B FP8 Throughput is a hybrid instruct and reasoning mixture-of-experts model (blog).

Qwen3-Next 80B A3B Instruct — `qwen/qwen3-next-80b-a3b-instruct`

Qwen3-Next is a new model architecture for improving training and inference efficiency under long-context and large-parameter settings. Compared to the MoE structure of Qwen3, Qwen3-Next introduces a hybrid attention mechanism, a highly sparse Mixture-of-Experts (MoE) structure, training-stability-friendly optimizations, and a multi-token prediction mechanism for faster inference. (blog)

Qwen3-Next 80B A3B Thinking — `qwen/qwen3-next-80b-a3b-thinking`

Qwen3-Next is a new model architecture for improving training and inference efficiency under long-context and large-parameter settings. Compared to the MoE structure of Qwen3, Qwen3-Next introduces a hybrid attention mechanism, a highly sparse Mixture-of-Experts (MoE) structure, training-stability-friendly optimizations, and a multi-token prediction mechanism for faster inference. (blog)

Qwen3 235B A22B Instruct 2507 FP8 — `qwen/qwen3-235b-a22b-instruct-2507-fp8`

Qwen3 235B A22B Instruct 2507 FP8 is an updated version of the non-thinking mode of Qwen3 235B A22B FP8.

Qwen3 235B A22B Thinking 2507 — `qwen/qwen3-235b-a22b-thinking-2507`

Qwen3 235B A22B Thinking 2507 is an updated version of the thinking mode of Qwen3 235B A22B.

Qwen3.5 397B A17B — `qwen/qwen3.5-397b-a17b`

Qwen3.5 397B A17B uses a hybrid architecture that fuses linear attention (via Gated Delta Networks) with a sparse mixture-of-experts. (blog)

Qwen3.5 397B A17B — `qwen/qwen3.5-397b-a17b-thinking-disabled`

Qwen3.5 397B A17B uses a hybrid architecture that fuses linear attention (via Gated Delta Networks) with a sparse mixture-of-experts. This model was run with thinking disabled. (blog)

Qwen3.5 9B — `qwen/qwen3.5-9b`

Qwen3.5 9B A17B uses a hybrid architecture that fuses linear attention (via Gated Delta Networks) with a sparse mixture-of-experts. (blog)

Qwen3.5 9B — `qwen/qwen3.5-9b-thinking-disabled`

Qwen3.5 9B A17B uses a hybrid architecture that fuses linear attention (via Gated Delta Networks) with a sparse mixture-of-experts. This model was run with thinking disabled. (blog)

Alibaba Cloud

QwQ (32B Preview) — `qwen/qwq-32b-preview`

QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. (blog post).

SAIL

Sailor (7B) — `sail/sailor-7b`

Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)

Sailor Chat (7B) — `sail/sailor-7b-chat`

Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)

Sailor (14B) — `sail/sailor-14b`

Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)

Sailor Chat (14B) — `sail/sailor-14b-chat`

Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)

SambaLingo

SambaLingo-Thai-Base — `sambanova/sambalingo-thai-base`

SambaLingo-Thai-Base is a pretrained bi-lingual Thai and English model that adapts Llama 2 (7B) to Thai by training on 38 billion tokens from the Thai split of the Cultura-X dataset. (paper)

SambaLingo-Thai-Chat — `sambanova/sambalingo-thai-chat`

SambaLingo-Thai-Chat is a chat model trained using direct preference optimization on SambaLingo-Thai-Base. SambaLingo-Thai-Base adapts Llama 2 (7B) to Thai by training on 38 billion tokens from the Thai split of the Cultura-X dataset. (paper)

SambaLingo-Thai-Base-70B — `sambanova/sambalingo-thai-base-70b`

SambaLingo-Thai-Base-70B is a pretrained bi-lingual Thai and English model that adapts Llama 2 (70B) to Thai by training on 26 billion tokens from the Thai split of the Cultura-X dataset. (paper)

SambaLingo-Thai-Chat-70B — `sambanova/sambalingo-thai-chat-70b`

SambaLingo-Thai-Chat-70B is a chat model trained using direct preference optimization on SambaLingo-Thai-Base-70B. SambaLingo-Thai-Base-70B adapts Llama 2 (7B) to Thai by training on 26 billion tokens from the Thai split of the Cultura-X dataset. (paper)

SCB10X

Typhoon (7B) — `scb10x/typhoon-7b`

Typhoon (7B) is pretrained Thai large language model with 7 billion parameters based on Mistral 7B. (paper)

Typhoon v1.5 (8B) — `scb10x/typhoon-v1.5-8b`

Typhoon v1.5 (8B) is a pretrained Thai large language model with 8 billion parameters based on Llama 3 8B. (blog)

Typhoon v1.5 Instruct (8B) — `scb10x/typhoon-v1.5-8b-instruct`

Typhoon v1.5 Instruct (8B) is a pretrained Thai large language model with 8 billion parameters based on Llama 3 8B. (blog)

Typhoon v1.5 (72B) — `scb10x/typhoon-v1.5-72b`

Typhoon v1.5 (72B) is a pretrained Thai large language model with 72 billion parameters based on Qwen1.5-72B. (blog)

Typhoon v1.5 Instruct (72B) — `scb10x/typhoon-v1.5-72b-instruct`

Typhoon v1.5 Instruct (72B) is a pretrained Thai large language model with 72 billion parameters based on Qwen1.5-72B. (blog)

Typhoon 1.5X instruct (8B) — `scb10x/llama-3-typhoon-v1.5x-8b-instruct`

Llama-3-Typhoon-1.5X-8B-instruct is a 8 billion parameter instruct model designed for the Thai language based on Llama 3 Instruct. It utilizes the task-arithmetic model editing technique. (blog)

Typhoon 1.5X instruct (70B) — `scb10x/llama-3-typhoon-v1.5x-70b-instruct`

Llama-3-Typhoon-1.5X-70B-instruct is a 70 billion parameter instruct model designed for the Thai language based on Llama 3 Instruct. It utilizes the task-arithmetic model editing technique. (blog)

Alibaba DAMO Academy

SeaLLM v2 (7B) — `damo/seallm-7b-v2`

SeaLLM v2 is a multilingual LLM for Southeast Asian (SEA) languages trained from Mistral (7B). (website)

SeaLLM v2.5 (7B) — `damo/seallm-7b-v2.5`

SeaLLM is a multilingual LLM for Southeast Asian (SEA) languages trained from Gemma (7B). (website)

Snowflake

Arctic Instruct — `snowflake/snowflake-arctic-instruct`

Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP resulting in 480B total and 17B active parameters chosen using a top-2 gating.

Stability AI

StableLM-Base-Alpha (3B) — `stabilityai/stablelm-base-alpha-3b`

StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models pre-trained on a diverse collection of English datasets with a sequence length of 4096 to push beyond the context window limitations of existing open-source language models.

StableLM-Base-Alpha (7B) — `stabilityai/stablelm-base-alpha-7b`

StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models pre-trained on a diverse collection of English datasets with a sequence length of 4096 to push beyond the context window limitations of existing open-source language models.

Stanford

Alpaca (7B) — `stanford/alpaca-7b`

Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations

TII UAE

Falcon (7B) — `tiiuae/falcon-7b`

Falcon-7B is a 7B parameters causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora.

Falcon-Instruct (7B) — `tiiuae/falcon-7b-instruct`

Falcon-7B-Instruct is a 7B parameters causal decoder-only model built by TII based on Falcon-7B and finetuned on a mixture of chat/instruct datasets.

Falcon (40B) — `tiiuae/falcon-40b`

Falcon-40B is a 40B parameters causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora.

Falcon-Instruct (40B) — `tiiuae/falcon-40b-instruct`

Falcon-40B-Instruct is a 40B parameters causal decoder-only model built by TII based on Falcon-7B and finetuned on a mixture of chat/instruct datasets.

Falcon3-1B-Instruct — `tiiuae/falcon3-1b-instruct`

Falcon3-1B-Instruct is an open-weights foundation model that supports 4 languages (English, French, Spanish, Portuguese) that was trained on 14T tokens. (blog)

Falcon3-3B-Instruct — `tiiuae/falcon3-3b-instruct`

Falcon3-3B-Instruct is an open-weights foundation model that supports 4 languages (English, French, Spanish, Portuguese) that was trained on 14T tokens. (blog)

Falcon3-7B-Instruct — `tiiuae/falcon3-7b-instruct`

Falcon3-7B-Instruct is an open-weights foundation model that supports 4 languages (English, French, Spanish, Portuguese) that was trained on 14T tokens. (blog)

Falcon3-10B-Instruct — `tiiuae/falcon3-10b-instruct`

Falcon3-10B-Instruct is an open-weights foundation model that supports 4 languages (English, French, Spanish, Portuguese) that was trained on 14T tokens. (blog)

FreedomAI

AceGPT-v2-8B-Chat — `freedomintelligence/acegpt-v2-8b-chat`

AceGPT is a fully fine-tuned generative text model collection, particularly focused on the Arabic language domain. AceGPT-v2-8B-Chat is based on Meta-Llama-3-8B. (paper)

AceGPT-v2-32B-Chat — `freedomintelligence/acegpt-v2-32b-chat`

AceGPT is a fully fine-tuned generative text model collection, particularly focused on the Arabic language domain. AceGPT-v2-32B-Chat is based on Qwen1.5-32B. (paper)

AceGPT-v2-70B-Chat — `freedomintelligence/acegpt-v2-70b-chat`

AceGPT is a fully fine-tuned generative text model collection, particularly focused on the Arabic language domain. AceGPT-v2-70B-Chat is based on Meta-Llama-3-70B. (paper)

NCAI & SDAIA

ALLaM-7B-Instruct-preview — `allam-ai/allam-7b-instruct-preview`

ALLaM-7B-Instruct-preview is a model designed to advance Arabic language technology, which used a recipe of training on 4T English tokens followed by training on 1.2T mixed Arabic/English tokens. (paper)

SILMA AI

SILMA 9B — `silma-ai/silma-9b-instruct-v1.0`

SILMA 9B is a compact Arabic language model based on Google Gemma. (model card)

Inception

Jais-family-590m-chat — `inceptionai/jais-family-590m-chat`

The Jais family of models is a series of bilingual English-Arabic large language models (LLMs) that are trained from scratch and optimized to excel in Arabic while having strong English capabilities. (website, blog)

Jais-family-1p3b-chat — `inceptionai/jais-family-1p3b-chat`

The Jais family of models is a series of bilingual English-Arabic large language models (LLMs) that are trained from scratch and optimized to excel in Arabic while having strong English capabilities. (website, blog)

Jais-family-2p7b-chat — `inceptionai/jais-family-2p7b-chat`

The Jais family of models is a series of bilingual English-Arabic large language models (LLMs) that are trained from scratch and optimized to excel in Arabic while having strong English capabilities. (website, blog)

Jais-family-6p7b-chat — `inceptionai/jais-family-6p7b-chat`

The Jais family of models is a series of bilingual English-Arabic large language models (LLMs) that are trained from scratch and optimized to excel in Arabic while having strong English capabilities. (website, blog)

Jais-family-6p7b-chat — `inceptionai/jais-family-6p7b-chat`

The Jais family of models is a series of bilingual English-Arabic large language models (LLMs) that are trained from scratch and optimized to excel in Arabic while having strong English capabilities. (website, blog)

Jais-family-13b-chat — `inceptionai/jais-family-13b-chat`

The Jais family of models is a series of bilingual English-Arabic large language models (LLMs) that are trained from scratch and optimized to excel in Arabic while having strong English capabilities. (website, blog)

Jais-family-30b-8k-chat — `inceptionai/jais-family-30b-8k-chat`

The Jais family of models is a series of bilingual English-Arabic large language models (LLMs) that are trained from scratch and optimized to excel in Arabic while having strong English capabilities. (website, blog)

Jais-family-30b-16k-chat — `inceptionai/jais-family-30b-16k-chat`

The Jais family of models is a series of bilingual English-Arabic large language models (LLMs) that are trained from scratch and optimized to excel in Arabic while having strong English capabilities. (website, blog)

Jais-adapted-7b-chat — `inceptionai/jais-adapted-7b-chat`

The Jais adapted models are bilingual English-Arabic large language models (LLMs) that are trained adaptively from Llama-2 and optimized to excel in Arabic while having strong English capabilities. (website, blog)

Jais-adapted-13b-chat — `inceptionai/jais-adapted-13b-chat`

The Jais adapted models are bilingual English-Arabic large language models (LLMs) that are trained adaptively from Llama-2 and optimized to excel in Arabic while having strong English capabilities. (website, blog)

Jais-adapted-70b-chat — `inceptionai/jais-adapted-70b-chat`

The Jais adapted models are bilingual English-Arabic large language models (LLMs) that are trained adaptively from Llama-2 and optimized to excel in Arabic while having strong English capabilities. (website, blog)

Together

GPT-JT (6B) — `together/gpt-jt-6b-v1`

GPT-JT (6B parameters) is a fork of GPT-J (blog post).

GPT-NeoXT-Chat-Base (20B) — `together/gpt-neoxt-chat-base-20b`

GPT-NeoXT-Chat-Base (20B) is fine-tuned from GPT-NeoX, serving as a base model for developing open-source chatbots.

RedPajama-INCITE-Base-v1 (3B) — `together/redpajama-incite-base-3b-v1`

RedPajama-INCITE-Base-v1 (3B parameters) is a 3 billion base model that aims to replicate the LLaMA recipe as closely as possible.

RedPajama-INCITE-Instruct-v1 (3B) — `together/redpajama-incite-instruct-3b-v1`

RedPajama-INCITE-Instruct-v1 (3B parameters) is a model fine-tuned for few-shot applications on the data of GPT-JT. It is built from RedPajama-INCITE-Base-v1 (3B), a 3 billion base model that aims to replicate the LLaMA recipe as closely as possible.

RedPajama-INCITE-Base (7B) — `together/redpajama-incite-base-7b`

RedPajama-INCITE-Base (7B parameters) is a 7 billion base model that aims to replicate the LLaMA recipe as closely as possible.

RedPajama-INCITE-Instruct (7B) — `together/redpajama-incite-instruct-7b`

RedPajama-INCITE-Instruct (7B parameters) is a model fine-tuned for few-shot applications on the data of GPT-JT. It is built from RedPajama-INCITE-Base (7B), a 7 billion base model that aims to replicate the LLaMA recipe as closely as possible.

Upstage

Solar Pro Preview (22B) — `upstage/solar-pro-preview-instruct`

Solar Pro Preview (22B) is open-weights model for single GPU inference that is a preview of the upcoming Solar Pro model (blog).

Solar Pro — `upstage/solar-pro-241126`

Solar Pro is a LLM designed for instruction-following and processing structured formats like HTML and Markdown. It supports English, Korean, and Japanese and has domain expertise in Finance, Healthcare, and Legal. (blog).

Writer

Palmyra Base (5B) — `writer/palmyra-base`

Palmyra Base (5B)

Palmyra Large (20B) — `writer/palmyra-large`

Palmyra Large (20B)

Silk Road (35B) — `writer/silk-road`

Silk Road (35B)

Palmyra X (43B) — `writer/palmyra-x`

Palmyra-X (43B parameters) is trained to adhere to instructions using human feedback and utilizes a technique called multiquery attention. Furthermore, a new feature called 'self-instruct' has been introduced, which includes the implementation of an early stopping criteria specifically designed for minimal instruction tuning (paper).

Palmyra X V2 (33B) — `writer/palmyra-x-v2`

Palmyra-X V2 (33B parameters) is a Transformer-based model, which is trained on extremely large-scale pre-training data. The pre-training data more than 2 trillion tokens types are diverse and cover a wide range of areas, used FlashAttention-2.

Palmyra X V3 (72B) — `writer/palmyra-x-v3`

Palmyra-X V3 (72B parameters) is a Transformer-based model, which is trained on extremely large-scale pre-training data. It is trained via unsupervised learning and DPO and use multiquery attention.

Palmyra X-32K (33B) — `writer/palmyra-x-32k`

Palmyra-X-32K (33B parameters) is a Transformer-based model, which is trained on large-scale pre-training data. The pre-training data types are diverse and cover a wide range of areas. These data types are used in conjunction and the alignment mechanism to extend context window.

Palmyra-X-004 — `writer/palmyra-x-004`

Palmyra-X-004 language model with a large context window of up to 128,000 tokens that excels in processing and understanding complex tasks.

Palmyra X5 — `writer/palmyra-x5`

Palmyra X5 is a language model for enterprise that uses a Mixture of Experts (MoE) architecture and a hybrid attention mechanism that blends linear and softmax attention. (blog)

Palmyra X5 (Bedrock) — `writer/palmyra-x5-v1-bedrock`

Palmyra X5 is a language model for enterprise that uses a Mixture of Experts (MoE) architecture and a hybrid attention mechanism that blends linear and softmax attention. (blog) This is the model verison that is hosted on Bedrock. (blog)

Palmyra-Med 32K (70B) — `writer/palmyra-med-32k`

Palmyra-Med 32K (70B) is a model finetuned from Palmyra-X-003 intended for medical applications.

Palmyra Med — `writer/palmyra-med`

Palmyra Med is a model intended for medical applications.

Palmyra-Fin 32K (70B) — `writer/palmyra-fin-32k`

Palmyra-Fin 32K (70B) is a model finetuned from Palmyra-X-003 intended for financial applications.

Palmyra Fin — `writer/palmyra-fin`

Palmyra Fin is a financial LLM built using combining a well-curated set of financial training data with custom fine-tuning instruction data(blog).

xAI

Grok 3 Beta — `xai/grok-3-beta`

Grok 3 Beta is a model trained on xAI's Colossus supercluster with significant improvements in reasoning, mathematics, coding, world knowledge, and instruction-following tasks. (blog)

Grok 3 mini Beta — `xai/grok-3-mini-beta`

Grok 3 mini Beta is a model trained on xAI's Colossus supercluster with significant improvements in reasoning, mathematics, coding, world knowledge, and instruction-following tasks. (blog)

Grok 4 (0709) — `xai/grok-4-0709`

Grok 4 (0709) is a model that includes native tool use and real-time search integration. (blog)

Grok 4 Fast (Reasoning) — `xai/grok-4-fast-reasoning`

Grok 4 Fast (Reasoning) (blog)

Grok 4 Fast (Non-Reasoning) — `xai/grok-4-fast-non-reasoning`

Grok 4 Fast (Non-Reasoning) (blog)

Grok 4.1 Fast (Reasoning) — `xai/grok-4-1-fast-reasoning`

Grok 4.1 Fast (Reasoning) (blog)

Grok 4.1 Fast (Non-Reasoning) — `xai/grok-4-1-fast-non-reasoning`

Grok 4.1 Fast (Non-Reasoning) (blog)

Yandex

YaLM (100B) — `yandex/yalm`

YaLM (100B parameters) is an autoregressive language model trained on English and Russian text (GitHub).

MARITACA-AI

Sabia 7B — `maritaca-ai/sabia-7b`

Sabia 7B

Maritaca AI

Sabiazinho 3 — `maritaca-ai/sabiazinho-3`

Sabiazinho-3 is a decoder-only language model designed for Portuguese text generation and understanding tasks. It supports a long context window of up to 128,000 tokens and is offered via API with scalable rate limits. The model is trained on diverse Portuguese corpora with knowledge up to july 2023.

Sabía 3 — `maritaca-ai/sabia-3`

Sabiá-3 is a decoder-only language model designed for Portuguese text generation and understanding tasks. It supports a long context window of up to 128,000 tokens and is offered via API with scalable rate limits. The model is trained on diverse Portuguese corpora with knowledge up to july 2023.

Sabía 3.1 — `maritaca-ai/sabia-3.1-2025-05-08`

Sabiá-3.1 is a decoder-only language model designed for Portuguese text generation and understanding tasks. It supports a long context window of up to 128,000 tokens and is offered via API with scalable rate limits. The model is trained on diverse Portuguese corpora with knowledge up to August 2024.

Z.ai

GLM-4.5-Air-FP8 — `zai-org/glm-4.5-air-fp8`

GLM-4.5-Air-FP8 is a hybrid reasoning model designed to unify reasoning, coding, and agentic capabilities into a single model. It has 106 billion total parameters and 12 billion active parameters. The thinking mode is enabled by default. (blog)

IBM

Granite 3.0 base (2B) — `ibm-granite/granite-3.0-2b-base`

Granite-3.0-2B-Base is a decoder-only language model to support a variety of text-to-text generation tasks.

Granite 3.0 Instruct (2B) — `ibm-granite/granite-3.0-2b-instruct`

Granite-3.0-2B-Instruct is a 2B parameter model finetuned from Granite-3.0-2B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets.

Granite 3.0 instruct (8B) — `ibm-granite/granite-3.0-8b-instruct`

Granite-3.0-8B-Instruct is a 8B parameter model finetuned from Granite-3.0-8B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets.

Granite 3.0 base (8B) — `ibm-granite/granite-3.0-8b-base`

Granite-3.0-8B-Base is a decoder-only language model to support a variety of text-to-text generation tasks.

Granite 3.0 A800M instruct (3B) — `ibm-granite/granite-3.0-3b-a800m-instruct`

Granite-3.0-3B-A800M-Instruct is a 3B parameter model finetuned from Granite-3.0-3B-A800M-Base-4K using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets.

Granite 3.0 A800M base (3B) — `ibm-granite/granite-3.0-3b-a800m-base`

Granite-3.0-3B-A800M-Base is a decoder-only language model to support a variety of text-to-text generation tasks.

Granite 3.0 A400M instruct (1B) — `ibm-granite/granite-3.0-1b-a400m-instruct`

Granite-3.0-1B-A400M-Instruct is an 1B parameter model finetuned from Granite-3.0-1B-A400M-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets.

Granite 3.0 A400M base (1B) — `ibm-granite/granite-3.0-1b-a400m-base`

Granite-3.0-1B-A400M-Base is a decoder-only language model to support a variety of text-to-text generation tasks. It is trained from scratch following a two-stage training strategy.

Granite 3.1 - 8B - Instruct — `ibm-granite/granite-3.1-8b-instruct`

Granite-3.1-8B-Instruct is a 8B parameter long-context instruct model finetuned from Granite-3.1-8B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems.

Granite 3.1 - 2B - Instruct — `ibm-granite/granite-3.1-2b-instruct`

Granite-3.1-2B-Instruct is a 2B parameter long-context instruct model finetuned from Granite-3.1-2B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems.

Granite 13b instruct v2 — `ibm/granite-13b-instruct-v2`

Granite Base (13B) Instruct V2.0 is a large decoder-only transformer model.The following features were used in the design of the model Decoder-only model

Granite 20b code instruct (8K) — `ibm/granite-20b-code-instruct-8k`

Granite-20B-Code-Base-8K is a decoder-only code model designed for code generative tasks (e.g., code generation, code explanation, code fixing, etc.). It is trained from scratch with a two-phase training strategy. In phase 1, our model is trained on 3 trillion tokens sourced from 116 programming languages, ensuring a comprehensive understanding of programming languages and syntax. In phase 2, our model is trained on 500 billion tokens with a carefully designed mixture of high-quality data from code and natural language domains to improve the models’ ability to reason and follow instructions.

Granite 34b code instruct — `ibm/granite-34b-code-instruct`

Granite Base (34B) Code Instruct is a 34B parameter model fine tuned from Granite-34B-Code-Base on a combination of permissively licensed instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills.

Granite 3b code instruct — `ibm/granite-3b-code-instruct`

Granite-3B-Code-Instruct-128K is a 3B parameter long-context instruct model fine tuned from Granite-3B-Code-Base-128K on a combination of permissively licensed data used in training the original Granite code instruct models, in addition to synthetically generated code instruction datasets tailored for solving long context problems. By exposing the model to both short and long context data, we aim to enhance its long-context capability without sacrificing code generation performance at short input context.

Granite 8b code instruct — `ibm/granite-8b-code-instruct`

Granite-8B-Code-Instruct-128K is a 8B parameter long-context instruct model fine tuned from Granite-8B-Code-Base-128K on a combination of permissively licensed data used in training the original Granite code instruct models, in addition to synthetically generated code instruction datasets tailored for solving long context problems. By exposing the model to both short and long context data, we aim to enhance its long-context capability without sacrificing code generation performance at short input context.

Granite 3.1 - 8B - Instruct — `ibm/granite-3.1-8b-instruct`

Granite-3.1-8B-Instruct is a 8B parameter long-context instruct model finetuned from Granite-3.1-8B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems.

Granite 3.1 - 2B - Instruct — `ibm/granite-3.1-2b-instruct`

Granite-3.1-2B-Instruct is a 2B parameter long-context instruct model finetuned from Granite-3.1-2B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems.

IBM Granite 3.3 8B Instruct — `ibm/granite-3.3-8b-instruct`

IBM Granite 3.3 8B Instruct is an 8-billion parameter 128K context length language model fine-tuned for improved reasoning and instruction-following capabilities. (model card)

IBM Granite 4.0 Small — `ibm/granite-4.0-h-small`

IBM Granite 4.0 Small is a hybrid model with 32B total parameters and 9B active parameters that uses the Mixture of Experts (MoE) routing strategy with Mamba-2 and Transformer-based self-attention components.

IBM Granite 4.0 Micro — `ibm/granite-4.0-micro`

IBM Granite 4.0 Micro is a dense Transformer model with 3B total parameters that provides an alternative option for users when Mamba2 support is not yet optimized.

IBM-GRANITE

Granite 3.1 - 8B - Base — `ibm-granite/granite-3.1-8b-base`

Granite-3.1-8B-Base extends the context length of Granite-3.0-8B-Base from 4K to 128K using a progressive training strategy by increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to desired length of 128K.

Granite 3.1 - 2B - Base — `ibm-granite/granite-3.1-2b-base`

Granite-3.1-2B-Base extends the context length of Granite-3.0-2B-Base from 4K to 128K using a progressive training strategy by increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to desired length of 128K.

Granite 3.1 - 3B - A800M - Instruct — `ibm-granite/granite-3.1-3b-a800m-instruct`

Granite-3.1-3B-A800M-Instruct is a 3B parameter long-context instruct model finetuned from Granite-3.1-3B-A800M-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems.

Granite 3.1 - 3B - A800M - Base — `ibm-granite/granite-3.1-3b-a800m-base`

Granite-3.1-3B-A800M-Base extends the context length of Granite-3.0-3B-A800M-Base from 4K to 128K using a progressive training strategy by increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to desired length of 128K.

Granite 3.1 - 1B - A400M - Instruct — `ibm-granite/granite-3.1-1b-a400m-instruct`

Granite-3.1-1B-A400M-Instruct is a 8B parameter long-context instruct model finetuned from Granite-3.1-1B-A400M-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems.

Granite 3.1 - 1B - A400M - Base — `ibm-granite/granite-3.1-1b-a400m-base`

Granite-3.1-1B-A400M-Base extends the context length of Granite-3.0-1B-A400M-Base from 4K to 128K using a progressive training strategy by increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to desired length of 128K.

URA

URA-Llama 2.1 (8B) — `ura-hcmut/ura-llama-2.1-8b`

URA-Llama 2.1 (8B) is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

URA-Llama 2 (8B) — `ura-hcmut/ura-llama-2-8b`

URA-Llama 2 (8B) is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

URA-Llama 7B (7B) — `ura-hcmut/ura-llama-7b`

URA-Llama 7B (7B) is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

URA-Llama 13B (13B) — `ura-hcmut/ura-llama-13b`

URA-Llama 13B (13B) is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

URA-Llama 70B (70B) — `ura-hcmut/ura-llama-70b`

URA-Llama 70B (70B) is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

GemSUra 7B — `ura-hcmut/GemSUra-7B`

GemSUra 7B is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

GemSUra 2B — `ura-hcmut/GemSUra-2B`

GemSUra 2B is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

MixSUra — `ura-hcmut/MixSUra`

MixSUra is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text. It is a mixture of experts model with 8 active experts.

ViLM

VinaLLaMa — `vilm/vinallama-7b-chat`

VinaLLaMa is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

VinaLLaMa 2.7B — `vilm/vinallama-2.7b-chat`

VinaLLaMa 2.7B is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

VietCuna 7B (v3) — `vilm/vietcuna-7b-v3`

VietCuna 7B is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

VietCuna 3B (v2) — `vilm/vietcuna-3b-v2`

VietCuna 3B is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

Quyen (v0.1) — `vilm/Quyen-v0.1`

Quyen is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

Quyen Plus (v0.1) — `vilm/Quyen-Plus-v0.1`

Quyen Plus is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

Quyen Pro (v0.1) — `vilm/Quyen-Pro-v0.1`

Quyen Pro is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

Quyen Pro Max (v0.1) — `vilm/Quyen-Pro-Max-v0.1`

Quyen Pro Max is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

Quyen Mini (v0.1) — `vilm/Quyen-Mini-v0.1`

Quyen Mini is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

Quyen SE (v0.1) — `vilm/Quyen-SE-v0.1`

Quyen SE is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

Viet-Mistral

Vistral 7B Chat — `Viet-Mistral/Vistral-7B-Chat`

Vistral 7B Chat is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

VinAI

PhoGPT 7B5 Instruct — `vinai/PhoGPT-7B5-Instruct`

PhoGPT 7B5 Instruct is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

PhoGPT 4B Chat — `vinai/PhoGPT-4B-Chat`

PhoGPT 4B Chat is a model trained on a large corpus of Vietnamese text data, including books, articles, and websites. It is designed to understand and generate Vietnamese text.

CEIA-UFG

Gemma-3 Gaia PT-BR 4b Instruct — `CEIA-UFG/Gemma-3-Gaia-PT-BR-4b-it`

Gemma-3 Gaia PT-BR 4b Instruct is a model trained by CEIA-UFG for understanding and generating Brazilian Portuguese text.

Recogna NLP

Bode 13B Alpaca PT-BR — `recogna-nlp/bode-13b-alpaca-pt-br-no-peft`

Bode is a language model (LLM) for Portuguese, based on LLaMA 2 and fine-tuned with the Alpaca dataset translated into Portuguese. Suitable for instruction, text generation, translation and tasks in Portuguese.

22h

Cabrita PT-BR 7B — `22h/cabrita_7b_pt_850000`

Cabrita is an OpenLLaMA-based model, continuously trained in Portuguese (mC4-pt subset) for 850000 steps with efficient tokenization adapted to the language.

PORTULAN (University of Lisbon NLX)

Gervásio PT-BR/PT-PT 7B Decoder — `PORTULAN/gervasio-7b-portuguese-ptbr-decoder`

Gervásio PT* is a 7B parameter decoder model, adapted from LLaMA27B, trained for both Brazilian and European Portuguese. Fine-tuned with translated data from benchmarks such as GLUE and SuperGLUE.

TucanoBR (University of Bonn)

Tucano PT-BR 2b4 — `TucanoBR/Tucano-2b4`

Tucano is a series of decoder models based on LLaMA2, natively pre-trained in Portuguese using the GigaVerbo dataset (200B tokens), with the 2B model trained for 1.96M steps over 845h (515B tokens, 4 epochs).

Nicholas Kluge.

TeenyTinyLlama 460M PT-BR — `nicholasKluge/TeenyTinyLlama-460m`

TeenyTinyLlama-460m is a lightweight and efficient model based on LLaMA2, trained exclusively on Brazilian Portuguese. It uses RoPE embeddings and SwiGLU activations, with a refined SentencePiece tokenizer and a low-resource optimized architecture.

BigCode

SantaCoder (1.1B) — `bigcode/santacoder`

SantaCoder (1.1B parameters) model trained on the Python, Java, and JavaScript subset of The Stack (v1.1) (model card).

StarCoder (15.5B) — `bigcode/starcoder`

The StarCoder (15.5B parameter) model trained on 80+ programming languages from The Stack (v1.2) (model card).

Google

Codey PaLM-2 (Bison) — `google/code-bison@001`

A model fine-tuned to generate code based on a natural language description of the desired code. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

Codey PaLM-2 (Bison) — `google/code-bison@002`

A model fine-tuned to generate code based on a natural language description of the desired code. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

Codey PaLM-2 (Bison) — `google/code-bison-32k`

Codey with a 32K context. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

Vision-Language Models

Aleph Alpha

Luminous Base (13B) — `AlephAlpha/luminous-base`

Luminous Base (13B parameters) (docs)

Luminous Extended (30B) — `AlephAlpha/luminous-extended`

Luminous Extended (30B parameters) (docs)

Anthropic

Claude 3 Haiku (20240307) — `anthropic/claude-3-haiku-20240307`

Claude 3 is a a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3 Sonnet (20240229) — `anthropic/claude-3-sonnet-20240229`

Claude 3 is a a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3 Opus (20240229) — `anthropic/claude-3-opus-20240229`

Claude 3 is a a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3.5 Sonnet (20240620) — `anthropic/claude-3-5-sonnet-20240620`

Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost. (blog)

Claude 3.5 Sonnet (20241022) — `anthropic/claude-3-5-sonnet-20241022`

Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost (blog). This is an upgraded snapshot released on 2024-10-22 (blog).

Claude 3.7 Sonnet (20250219) — `anthropic/claude-3-7-sonnet-20250219`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog).

Claude 3.7 Sonnet (20250219, extended thinking) — `anthropic/claude-3-7-sonnet-20250219-thinking-10k`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog). Extended thinking is enabled with 10k budget tokens.

Claude 4 Sonnet (20250514) — `anthropic/claude-sonnet-4-20250514`

Claude 4 Sonnet is a hybrid model offering two modes - near-instant responses and extended thinking for deeper reasoning (blog).

Claude 4 Sonnet (20250514, extended thinking) — `anthropic/claude-sonnet-4-20250514-thinking-10k`

Claude 4 Sonnet is a hybrid model offering two modes - near-instant responses and extended thinking for deeper reasoning (blog). Extended thinking is enabled with 10k budget tokens.

Claude 4 Opus (20250514) — `anthropic/claude-opus-4-20250514`

Claude 4 Opus is a hybrid model offering two modes - near-instant responses and extended thinking for deeper reasoning (blog).

Claude 4 Opus (20250514, extended thinking) — `anthropic/claude-opus-4-20250514-thinking-10k`

Claude 4 Opus is a hybrid model offering two modes - near-instant responses and extended thinking for deeper reasoning (blog). Extended thinking is enabled with 10k budget tokens.

Claude 4.5 Sonnet (20250929) — `anthropic/claude-sonnet-4-5-20250929`

Claude 4.5 Sonnet is a model from Anthropic that shows particular strengths in software coding, in agentic tasks where it runs in a loop and uses tools, and in using computers. (blog, system card)

Claude 4.5 Haiku (20251001) — `anthropic/claude-haiku-4-5-20251001`

Claude 4.5 Haiku is a hybrid model from Anthropic in their small, fast model class that is particularly effective at coding tasks and computer use. (blog, system card)

Claude 4.5 Opus (20251124) — `anthropic/claude-opus-4-5-20251124`

Claude 4.5 Opus is Anthropic's most intelligent model to date and sets a new standard across coding, agents, computer use, and enterprise workflows. (blog)

Claude 4.6 Sonnet — `anthropic/claude-sonnet-4-6`

Claude 4.6 Sonnet is a Sonnet model from Anthropic that upgrades Sonnet's skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. (blog, system card)

Claude 4.6 Opus — `anthropic/claude-opus-4-6`

Claude 4.6 Opus is a large language model from Anthropic with strong capabilities in software engineering, agentic tasks, and long context reasoning, as well as in knowledge work. (blog, system card)

Claude 4.7 Opus — `anthropic/claude-opus-4-7`

Claude 4.7 Opus is a large language model with particular skills in areas such as software engineering, knowledge work, agentic tool use, and computer use. (blog, system card)

Claude 3.7 Sonnet (20250219) (DSPy Zero-Shot Predict) — `anthropic/claude-3-7-sonnet-20250219-dspy-zs-predict`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog).

Claude 3.7 Sonnet (20250219) (DSPy Zero-Shot ChainOfThought) — `anthropic/claude-3-7-sonnet-20250219-dspy-zs-cot`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog).

Claude 3.7 Sonnet (20250219) (DSPy BootstrapFewShotWithRandomSearch) — `anthropic/claude-3-7-sonnet-20250219-dspy-fs-bfrs`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog).

Claude 3.7 Sonnet (20250219) (DSPy MIPROv2) — `anthropic/claude-3-7-sonnet-20250219-dspy-fs-miprov2`

Claude 3.7 Sonnet is a Claude 3 family hybrid reasoning model that can produce near-instant responses or extended, step-by-step thinking that is made visible to the user (blog).

Google

Gemini Pro Vision — `google/gemini-pro-vision`

Gemini Pro Vision is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.0 Pro Vision — `google/gemini-1.0-pro-vision-001`

Gemini 1.0 Pro Vision is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.5 Pro (001) — `google/gemini-1.5-pro-001`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001) — `google/gemini-1.5-flash-001`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (0409 preview) — `google/gemini-1.5-pro-preview-0409`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (0514 preview) — `google/gemini-1.5-pro-preview-0514`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (0514 preview) — `google/gemini-1.5-flash-preview-0514`

Gemini 1.5 Flash is a smaller Gemini model. It has a 1 million token context window and allows interleaving text, images, audio and video as inputs. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (blog)

Gemini 1.5 Pro (001, default safety) — `google/gemini-1.5-pro-001-safety-default`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)

Gemini 1.5 Pro (001, BLOCK_NONE safety) — `google/gemini-1.5-pro-001-safety-block-none`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001, default safety) — `google/gemini-1.5-flash-001-safety-default`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)

Gemini 1.5 Flash (001, BLOCK_NONE safety) — `google/gemini-1.5-flash-001-safety-block-none`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (002) — `google/gemini-1.5-pro-002`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (002) — `google/gemini-1.5-flash-002`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 2.0 Flash (Experimental) — `google/gemini-2.0-flash-exp`

Gemini 2.0 Flash (Experimental) is a Gemini model that supports multimodal inputs like images, video and audio, as well as multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. (blog)

Gemini 1.5 Flash 8B — `google/gemini-1.5-flash-8b-001`

Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. (documentation)

Gemini 2.0 Flash — `google/gemini-2.0-flash-001`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash Lite (02-05 preview) — `google/gemini-2.0-flash-lite-preview-02-05`

Gemini 2.0 Flash Lite (02-05 preview) (model card, documentation)

Gemini 2.0 Flash Lite — `google/gemini-2.0-flash-lite-001`

Gemini 2.0 Flash Lite is the fastest and most cost efficient Flash model in the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash Thinking (01-21 preview) — `google/gemini-2.0-flash-thinking-exp-01-21`

Gemini 2.0 Flash Thinking (01-21 preview) (documentation)

Gemini 2.0 Pro (02-05 preview) — `google/gemini-2.0-pro-exp-02-05`

Gemini 2.0 Pro (02-05 preview) (documentation)

Gemini 2.5 Flash-Lite (thinking disabled) — `google/gemini-2.5-flash-lite-thinking-disabled`

Gemini 2.5 Flash-Lite with thinking disabled (blog)

Gemini 2.5 Flash-Lite — `google/gemini-2.5-flash-lite`

Gemini 2.5 Flash-Lite (blog)

Gemini 2.5 Flash (thinking disabled) — `google/gemini-2.5-flash-thinking-disabled`

Gemini 2.5 Flash with thinking disabled (documentation)

Gemini 2.5 Flash — `google/gemini-2.5-flash`

Gemini 2.5 Flash (documentation)

Gemini 2.5 Pro — `google/gemini-2.5-pro`

Gemini 2.5 Pro (documentation)

Gemini 3 Pro (Preview) — `google/gemini-3-pro-preview`

Gemini 3.0 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. (blog, blog)

Gemini 3.1 Pro (Preview) — `google/gemini-3.1-pro-preview`

Gemini 3.1 Pro is the next iteration in the Gemini 3 series of models, a suite of highly capable, natively multimodal reasoning models. (blog, model card)

Gemini 3 Flash (Preview) — `google/gemini-3-flash-preview`

Gemini 3.0 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. (blog)

Gemini 3.1 Flash-Lite (Preview) — `google/gemini-3.1-flash-lite-preview`

Gemini 3.1 Flash-Lite (Preview) is the fastest and most cost-efficient Gemini 3 series model. (blog)

Gemini Robotics-ER 1.5 — `google/gemini-robotics-er-1.5-preview`

Gemini Robotics-ER 1.5 is a vision-language model (VLM) designed for advanced reasoning in the physical world, allowing robots to interpret complex visual data, perform spatial reasoning, and plan actions from natural language commands.

PaliGemma (3B) Mix 224 — `google/paligemma-3b-mix-224`

PaliGemma is a versatile and lightweight vision-language model (VLM) inspired by PaLI-3 and based on open components such as the SigLIP vision model and the Gemma language model. Pre-trained with 224x224 input images and 128 token input/output text sequences. Finetuned on a mixture of downstream academic datasets. (blog)

PaliGemma (3B) Mix 448 — `google/paligemma-3b-mix-448`

PaliGemma is a versatile and lightweight vision-language model (VLM) inspired by PaLI-3 and based on open components such as the SigLIP vision model and the Gemma language model. Pre-trained with 448x448 input images and 512 token input/output text sequences. Finetuned on a mixture of downstream academic datasets. (blog)

Gemini 2.0 Flash (DSPy Zero-Shot Predict) — `google/gemini-2.0-flash-001-dspy-zs-predict`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash (DSPy Zero-Shot ChainOfThought) — `google/gemini-2.0-flash-001-dspy-zs-cot`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash (DSPy BootstrapFewShotWithRandomSearch) — `google/gemini-2.0-flash-001-dspy-fs-bfrs`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash (DSPy MIPROv2) — `google/gemini-2.0-flash-001-dspy-fs-miprov2`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

HuggingFace

IDEFICS 2 (8B) — `HuggingFaceM4/idefics2-8b`

IDEFICS 2 (8B parameters) is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs. (blog).

IDEFICS (9B) — `HuggingFaceM4/idefics-9b`

IDEFICS (9B parameters) is an open-source model based on DeepMind's Flamingo (blog).

IDEFICS-instruct (9B) — `HuggingFaceM4/idefics-9b-instruct`

IDEFICS-instruct (9B parameters) is the instruction-tuned version of IDEFICS 9B (blog).

IDEFICS (80B) — `HuggingFaceM4/idefics-80b`

IDEFICS (80B parameters) is an open-source model based on DeepMind's Flamingo (blog).

IDEFICS-instruct (80B) — `HuggingFaceM4/idefics-80b-instruct`

IDEFICS-instruct (80B parameters) is the instruction-tuned version of IDEFICS 80B (blog).

Microsoft

LLaVA 1.5 (7B) — `microsoft/llava-1.5-7b-hf`

LLaVa is an open-source chatbot trained by fine-tuning LlamA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

LLaVA 1.5 (13B) — `microsoft/llava-1.5-13b-hf`

LLaVa is an open-source chatbot trained by fine-tuning LlamA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

LLaVA 1.6 (7B) — `uw-madison/llava-v1.6-vicuna-7b-hf`

LLaVa is an open-source chatbot trained by fine-tuning LlamA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

LLaVA 1.6 (13B) — `uw-madison/llava-v1.6-vicuna-13b-hf`

LLaVa is an open-source chatbot trained by fine-tuning LlamA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

LLaVA 1.6 + Mistral (7B) — `uw-madison/llava-v1.6-mistral-7b-hf`

LLaVa is an open-source chatbot trained by fine-tuning LlamA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

LLaVA + Nous-Hermes-2-Yi-34B (34B) — `uw-madison/llava-v1.6-34b-hf`

LLaVa is an open-source chatbot trained by fine-tuning LlamA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

OpenFlamingo

OpenFlamingo (9B) — `openflamingo/OpenFlamingo-9B-vitl-mpt7b`

OpenFlamingo is an open source implementation of DeepMind's Flamingo models. This 9B-parameter model uses a CLIP ViT-L/14 vision encoder and MPT-7B language model (paper).

KAIST AI

LLaVA + Vicuna-v1.5 (13B) — `kaistai/prometheus-vision-13b-v1.0-hf`

LLaVa is an open-source chatbot trained by fine-tuning LlamA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

Mistral AI

BakLLaVA v1 (7B) — `mistralai/bakLlava-v1-hf`

BakLLaVA v1 is a Mistral 7B base augmented with the LLaVA 1.5 architecture. (blog)

Mistral Pixtral (2409) — `mistralai/pixtral-12b-2409`

Mistral Pixtral 12B is the first multimodal Mistral model for image understanding. (blog)

Mistral Pixtral Large (2411) — `mistralai/pixtral-large-2411`

Mistral Pixtral Large is a 124B open-weights multimodal model built on top of Mistral Large 2 (2407). (blog)

OpenAI

GPT-4 Turbo (2024-04-09) — `openai/gpt-4-turbo-2024-04-09`

GPT-4 Turbo (2024-04-09) is a large multimodal model that is optimized for chat but works well for traditional completions tasks. The model is cheaper and faster than the original GPT-4 model. Snapshot from 2024-04-09.

GPT-4o (2024-05-13) — `openai/gpt-4o-2024-05-13`

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o (2024-08-06) — `openai/gpt-4o-2024-08-06`

GPT-4o (2024-08-06) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o (2024-11-20) — `openai/gpt-4o-2024-11-20`

GPT-4o (2024-11-20) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o mini (2024-07-18) — `openai/gpt-4o-mini-2024-07-18`

GPT-4o mini (2024-07-18) is a multimodal model with a context window of 128K tokens and improved handling of non-English text. (blog)

GPT-4.1 (2025-04-14) — `openai/gpt-4.1-2025-04-14`

GPT-4.1 (2025-04-14) is a multimdodal model in the GPT-4.1 family, which outperforms the GPT-4o family, with major gains in coding and instruction following. They also have larger context windows of 1 million tokens and are able to better use that context with improved long-context comprehension. (blog)

GPT-4.1 mini (2025-04-14) — `openai/gpt-4.1-mini-2025-04-14`

GPT-4.1 mini (2025-04-14) is a multimdodal model in the GPT-4.1 family, which outperforms the GPT-4o family, with major gains in coding and instruction following. They also have larger context windows of 1 million tokens and are able to better use that context with improved long-context comprehension. (blog)

GPT-4.1 nano (2025-04-14) — `openai/gpt-4.1-nano-2025-04-14`

GPT-4.1 nano (2025-04-14) is a multimdodal model in the GPT-4.1 family, which outperforms the GPT-4o family, with major gains in coding and instruction following. They also have larger context windows of 1 million tokens and are able to better use that context with improved long-context comprehension. (blog)

GPT-5 (2025-08-07) — `openai/gpt-5-2025-08-07`

GPT-5 (2025-08-07) is a multimdodal model trained for real-world coding tasks and long-running agentic tasks. (blog, system card)

GPT-5 mini (2025-08-07) — `openai/gpt-5-mini-2025-08-07`

GPT-5 mini (2025-08-07) is a multimdodal model trained for real-world coding tasks and long-running agentic tasks. (blog, system card)

GPT-5 nano (2025-08-07) — `openai/gpt-5-nano-2025-08-07`

GPT-5 nano (2025-08-07) is a multimdodal model trained for real-world coding tasks and long-running agentic tasks. (blog, system card)

GPT-5.4 Pro (2026-03-05) — `openai/gpt-5.4-pro-2026-03-05`

GPT-5.4 Pro (2026-03-05) is a model in the GPT-5 model family that incorporates recent advances in reasoning, coding, and agentic workflows. (blog)

GPT-5.4 (2026-03-05) — `openai/gpt-5.4-2026-03-05`

GPT-5.4 (2026-03-05) is a model in the GPT-5 model family that incorporates recent advances in reasoning, coding, and agentic workflows. (blog)

GPT-5.4 mini (2026-03-17) — `openai/gpt-5.4-mini-2026-03-17`

GPT-5.4 mini (2026-03-17) is a more efficient model designed for high-volume workloads. (blog)

GPT-5.4 nano (2026-03-17) — `openai/gpt-5.4-nano-2026-03-17`

GPT-5.4 nano (2026-03-17) is a more efficient model designed for high-volume workloads. (blog)

GPT-5.2 (2025-12-11) — `openai/gpt-5.2-2025-12-11`

GPT-5.2 (2025-12-11) is a model in the GPT-5 model family that is intended for coding and agentic tasks across industries. (blog)

GPT-5.1 (2025-11-13) — `openai/gpt-5.1-2025-11-13`

GPT-5.1 (2025-11-13) is a model in the GPT-5 model family, and has similar training for code generation, bug fixing, refactoring, instruction following, long context and tool calling. (blog)

GPT-4V (1106 preview) — `openai/gpt-4-vision-preview`

GPT-4V is a large multimodal model that accepts both text and images and is optimized for chat (model card).

GPT-4V (1106 preview) — `openai/gpt-4-1106-vision-preview`

GPT-4V is a large multimodal model that accepts both text and images and is optimized for chat (model card).

GPT-4.5 (2025-02-27 preview) — `openai/gpt-4.5-preview-2025-02-27`

GPT-4.5 (2025-02-27 preview) is a large multimodal model that is designed to be more general-purpose than OpenAI's STEM-focused reasoning models. It was trained using new supervision techniques combined with traditional methods like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). (blog, system card)

o1 pro (2025-03-19) — `openai/o1-pro-2025-03-19`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post)

o1 pro (2025-03-19, low reasoning effort) — `openai/o1-pro-2025-03-19-low-reasoning-effort`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post) The requests' reasoning effort parameter in is set to low.

o1 pro (2025-03-19, high reasoning effort) — `openai/o1-pro-2025-03-19-high-reasoning-effort`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post) The requests' reasoning effort parameter in is set to high.

o1 (2024-12-17) — `openai/o1-2024-12-17`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post)

o1 (2024-12-17, low reasoning effort) — `openai/o1-2024-12-17-low-reasoning-effort`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post) The requests' reasoning effort parameter in is set to low.

o1 (2024-12-17, high reasoning effort) — `openai/o1-2024-12-17-high-reasoning-effort`

o1 is a new large language model trained with reinforcement learning to perform complex reasoning. (model card, blog post) The requests' reasoning effort parameter in is set to high.

o3 (2025-04-16) — `openai/o3-2025-04-16`

o3 is a reasoning model for math, science, coding, and visual reasoning tasks. (blog post)

o3 (2025-04-16, low reasoning effort) — `openai/o3-2025-04-16-low-reasoning-effort`

o3 is a reasoning model for math, science, coding, and visual reasoning tasks. (blog post)

o3 (2025-04-16, high reasoning effort) — `openai/o3-2025-04-16-high-reasoning-effort`

o3 is a reasoning model for math, science, coding, and visual reasoning tasks. (blog post)

o4-mini (2025-04-16) — `openai/o4-mini-2025-04-16`

o4-mini is an o-series model optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. (blog post)

o4-mini (2025-04-16, low reasoning effort) — `openai/o4-mini-2025-04-16-low-reasoning-effort`

o4-mini is an o-series model optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. (blog post)

o4-mini (2025-04-16, high reasoning effort) — `openai/o4-mini-2025-04-16-high-reasoning-effort`

o4-mini is an o-series model optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. (blog post)

o3-pro (2025-06-10, high reasoning effort) — `openai/o3-pro-2025-06-10-high-reasoning-effort`

o3-pro is an o-series model designed to think longer and provide the most reliable responses. (blog post)

GPT-4o (2024-05-13) (DSPy Zero-Shot Predict) — `openai/gpt-4o-2024-05-13-dspy-zs-predict`

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o (2024-05-13) (DSPy Zero-Shot ChainOfThought) — `openai/gpt-4o-2024-05-13-dspy-zs-cot`

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o (2024-05-13) (DSPy BootstrapFewShotWithRandomSearch) — `openai/gpt-4o-2024-05-13-dspy-fs-bfrs`

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o (2024-05-13) (DSPy MIPROv2) — `openai/gpt-4o-2024-05-13-dspy-fs-miprov2`

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

Alibaba Cloud

Qwen-VL — `qwen/qwen-vl`

Visual multimodal version of the Qwen large language model series (paper).

Qwen-VL Chat — `qwen/qwen-vl-chat`

Chat version of Qwen-VL (paper).

Qwen2.5-Omni (7B) — `qwen/qwen2.5-omni-7b`

The new flagship end-to-end multimodal model in the Qwen series that can process inputs including text, images, audio, and video (paper).

Alibaba Group

Qwen2-VL Instruct (7B) — `qwen/qwen2-vl-7b-instruct`

The second generation of Qwen2-VL models (paper).

Qwen2-VL Instruct (72B) — `qwen/qwen2-vl-72b-instruct`

The second generation of Qwen2-VL models (paper).

Qwen2.5-VL Instruct (3B) — `qwen/qwen2.5-vl-3b-instruct`

The second generation of Qwen2.5-VL models (blog).

Qwen2.5-VL Instruct (7B) — `qwen/qwen2.5-vl-7b-instruct`

The second generation of Qwen2.5-VL models (blog).

Qwen2.5-VL Instruct (32B) — `qwen/qwen2.5-vl-32b-instruct`

The second generation of Qwen2.5-VL models (blog).

Qwen2.5-VL Instruct (72B) — `qwen/qwen2.5-vl-72b-instruct`

The second generation of Qwen2.5-VL models (blog).

Writer

Palmyra Vision 003 — `writer/palmyra-vision-003`

Palmyra Vision 003 (internal only)

Reka AI

Reka-Core — `reka/reka-core`

Reka-Core

Reka-Core-20240415 — `reka/reka-core-20240415`

Reka-Core-20240415

Reka-Core-20240501 — `reka/reka-core-20240501`

Reka-Core-20240501

Reka-Flash (21B) — `reka/reka-flash`

Reka-Flash (21B)

Reka-Flash-20240226 (21B) — `reka/reka-flash-20240226`

Reka-Flash-20240226 (21B)

Reka-Edge (7B) — `reka/reka-edge`

Reka-Edge (7B)

Reka-Edge-20240208 (7B) — `reka/reka-edge-20240208`

Reka-Edge-20240208 (7B)

Text-to-image Models

Adobe

GigaGAN (1B) — `adobe/giga-gan`

GigaGAN is a GAN model that produces high-quality images extremely quickly. The model was trained on text and image pairs from LAION2B-en and COYO-700M. (paper).

Aleph Alpha

MultiFusion (13B) — `AlephAlpha/m-vader`

MultiFusion is a multimodal, multilingual diffusion model that extend the capabilities of Stable Diffusion v1.4 by integrating different pre-trained modules, which transfers capabilities to the downstream model (paper)

Craiyon

DALL-E mini (0.4B) — `craiyon/dalle-mini`

DALL-E mini is an open-source text-to-image model that attempt to reproduce OpenAI's DALL-E 1 (code).

DALL-E mega (2.6B) — `craiyon/dalle-mega`

DALL-E mega is an open-source text-to-image model that attempt to reproduce OpenAI's DALL-E 1 (code).

DeepFloyd

DeepFloyd IF Medium (0.4B) — `DeepFloyd/IF-I-M-v1.0`

DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model with state-of-the-art photorealism and language understanding (paper coming soon).

DeepFloyd IF Large (0.9B) — `DeepFloyd/IF-I-L-v1.0`

DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model with state-of-the-art photorealism and language understanding (paper coming soon).

DeepFloyd IF X-Large (4.3B) — `DeepFloyd/IF-I-XL-v1.0`

DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model with state-of-the-art photorealism and language understanding (paper coming soon).

dreamlike.art

Dreamlike Diffusion v1.0 (1B) — `huggingface/dreamlike-diffusion-v1-0`

Dreamlike Diffusion v1.0 is Stable Diffusion v1.5 fine tuned on high quality art (HuggingFace model card)

Dreamlike Photoreal v2.0 (1B) — `huggingface/dreamlike-photoreal-v2-0`

Dreamlike Photoreal v2.0 is a photorealistic model based on Stable Diffusion v1.5 (HuggingFace model card)

PromptHero

Openjourney (1B) — `huggingface/openjourney-v1-0`

Openjourney is an open source Stable Diffusion fine tuned model on Midjourney images (HuggingFace model card)

Openjourney v2 (1B) — `huggingface/openjourney-v2-0`

Openjourney v2 is an open source Stable Diffusion fine tuned model on Midjourney images. Openjourney v2 is now referred to as Openjourney v4 in Hugging Face (HuggingFace model card).

Microsoft

Promptist + Stable Diffusion v1.4 (1B) — `huggingface/promptist-stable-diffusion-v1-4`

Trained with human preferences, Promptist optimizes user input into model-preferred prompts for Stable Diffusion v1.4 (paper)

nitrosocke

Redshift Diffusion (1B) — `huggingface/redshift-diffusion`

Redshift Diffusion is an open source Stable Diffusion model fine tuned on high resolution 3D artworks (HuggingFace model card)

TU Darmstadt

Safe Stable Diffusion weak (1B) — `huggingface/stable-diffusion-safe-weak`

Safe Stable Diffusion is an extension to the Stable Diffusion that drastically reduces inappropriate content (paper).

Safe Stable Diffusion medium (1B) — `huggingface/stable-diffusion-safe-medium`

Safe Stable Diffusion is an extension to the Stable Diffusion that drastically reduces inappropriate content (paper)

Safe Stable Diffusion strong (1B) — `huggingface/stable-diffusion-safe-strong`

Safe Stable Diffusion is an extension to the Stable Diffusion that drastically reduces inappropriate content (paper)

Safe Stable Diffusion max (1B) — `huggingface/stable-diffusion-safe-max`

Safe Stable Diffusion is an extension to the Stable Diffusion that drastically reduces inappropriate content (paper)

Ludwig Maximilian University of Munich CompVis

Stable Diffusion v1.4 (1B) — `huggingface/stable-diffusion-v1-4`

Stable Diffusion v1.4 is a latent text-to-image diffusion model capable of generating photorealistic images given any text input (paper)

Runway

Stable Diffusion v1.5 (1B) — `huggingface/stable-diffusion-v1-5`

The Stable-Diffusion-v1-5 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned on 595k steps at resolution 512x512 on laion-aesthetics v2 5+ and 10% dropping of the text-conditioning to improve classifier-free guidance sampling (paper)

Stability AI

Stable Diffusion v2 base (1B) — `huggingface/stable-diffusion-v2-base`

The model is trained from scratch 550k steps at resolution 256x256 on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with punsafe=0.1 and an aesthetic score greater than 4.5. Then it is further trained for 850k steps at resolution 512x512 on the same dataset on images with resolution greater than 512x512 (paper)

Stable Diffusion v2.1 base (1B) — `huggingface/stable-diffusion-v2-1-base`

This stable-diffusion-2-1-base model fine-tunes stable-diffusion-2-base with 220k extra steps taken, with punsafe=0.98 on the same dataset (paper)

Stable Diffusion XL — `stabilityai/stable-diffusion-xl-base-1.0`

Stable Diffusion XL (SDXL) consists of an ensemble of experts pipeline for latent diffusion. (HuggingFace model card)

22 Hours

Vintedois (22h) Diffusion model v0.1 (1B) — `huggingface/vintedois-diffusion-v0-1`

Vintedois (22h) Diffusion model v0.1 is Stable Diffusion v1.5 that was finetuned on a large amount of high quality images with simple prompts to generate beautiful images without a lot of prompt engineering (HuggingFace model card)

Segmind

Segmind Stable Diffusion (0.74B) — `segmind/Segmind-Vega`

The Segmind-Vega Model is a distilled version of the Stable Diffusion XL (SDXL), offering a remarkable 70% reduction in size and an impressive 100% speedup while retaining high-quality text-to-image generation capabilities. Trained on diverse datasets, including Grit and Midjourney scrape data, it excels at creating a wide range of visual content based on textual prompts. (HuggingFace model card)

Segmind Stable Diffusion (1B) — `segmind/SSD-1B`

The Segmind Stable Diffusion Model (SSD-1B) is a distilled 50% smaller version of the Stable Diffusion XL (SDXL), offering a 60% speedup while maintaining high-quality text-to-image generation capabilities. It has been trained on diverse datasets, including Grit and Midjourney scrape data, to enhance its ability to create a wide range of visual content based on textual prompts. (HuggingFace model card)

Kakao

minDALL-E (1.3B) — `kakaobrain/mindall-e`

minDALL-E, named after minGPT, is an autoregressive text-to-image generation model trained on 14 million image-text pairs (code)

Lexica

Lexica Search with Stable Diffusion v1.5 (1B) — `lexica/search-stable-diffusion-1.5`

Retrieves Stable Diffusion v1.5 images Lexica users generated (docs).

OpenAI

DALL-E 2 (3.5B) — `openai/dall-e-2`

DALL-E 2 is a encoder-decoder-based latent diffusion model trained on large-scale paired text-image datasets. The model is available via the OpenAI API (paper).

DALL-E 3 — `openai/dall-e-3`

DALL-E 3 is a text-to-image generation model built natively on ChatGPT, used to prompt engineer automatically. The default style, vivid, causes the model to lean towards generating hyper-real and dramatic images. The model is available via the OpenAI API (paper).

DALL-E 3 (natural style) — `openai/dall-e-3-natural`

DALL-E 3 is a text-to-image generation model built natively on ChatGPT, used to prompt engineer automatically. The natural style causes the model to produce more natural, less hyper-real looking images. The model is available via the OpenAI API (paper).

DALL-E 3 HD — `openai/dall-e-3-hd`

DALL-E 3 is a text-to-image generation model built natively on ChatGPT, used to prompt engineer automatically. The HD version creates images with finer details and greater consistency across the image, but generation is slower. The default style, vivid, causes the model to lean towards generating hyper-real and dramatic images. The model is available via the OpenAI API (paper).

DALL-E 3 HD (natural style) — `openai/dall-e-3-hd-natural`

DALL-E 3 is a text-to-image generation model built natively on ChatGPT, used to prompt engineer automatically. The HD version creates images with finer details and greater consistency across the image, but generation is slower. The natural style causes the model to produce more natural, less hyper-real looking images. The model is available via the OpenAI API (paper).

Tsinghua

CogView2 (6B) — `thudm/cogview2`

CogView2 is a hierarchical transformer (6B-9B-9B parameters) for text-to-image generation that supports both English and Chinese input text (paper)

Audio-Language Models

Google

Gemini 1.5 Pro (001) — `google/gemini-1.5-pro-001`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001) — `google/gemini-1.5-flash-001`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (002) — `google/gemini-1.5-pro-002`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (002) — `google/gemini-1.5-flash-002`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 2.0 Flash (Experimental) — `google/gemini-2.0-flash-exp`

Gemini 2.0 Flash (Experimental) is a Gemini model that supports multimodal inputs like images, video and audio, as well as multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. (blog)

Gemini 1.5 Flash 8B — `google/gemini-1.5-flash-8b-001`

Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. (documentation)

Gemini 2.0 Flash — `google/gemini-2.0-flash-001`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash Lite (02-05 preview) — `google/gemini-2.0-flash-lite-preview-02-05`

Gemini 2.0 Flash Lite (02-05 preview) (model card, documentation)

Gemini 2.0 Flash Lite — `google/gemini-2.0-flash-lite-001`

Gemini 2.0 Flash Lite is the fastest and most cost efficient Flash model in the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash Thinking (01-21 preview) — `google/gemini-2.0-flash-thinking-exp-01-21`

Gemini 2.0 Flash Thinking (01-21 preview) (documentation)

Gemini 2.0 Pro (02-05 preview) — `google/gemini-2.0-pro-exp-02-05`

Gemini 2.0 Pro (02-05 preview) (documentation)

Gemini 2.5 Flash-Lite (thinking disabled) — `google/gemini-2.5-flash-lite-thinking-disabled`

Gemini 2.5 Flash-Lite with thinking disabled (blog)

Gemini 2.5 Flash-Lite — `google/gemini-2.5-flash-lite`

Gemini 2.5 Flash-Lite (blog)

Gemini 2.5 Flash (thinking disabled) — `google/gemini-2.5-flash-thinking-disabled`

Gemini 2.5 Flash with thinking disabled (documentation)

Gemini 2.5 Flash — `google/gemini-2.5-flash`

Gemini 2.5 Flash (documentation)

Gemini 2.5 Pro — `google/gemini-2.5-pro`

Gemini 2.5 Pro (documentation)

Gemini 3 Pro (Preview) — `google/gemini-3-pro-preview`

Gemini 3.0 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. (blog, blog)

Gemini 3.1 Pro (Preview) — `google/gemini-3.1-pro-preview`

Gemini 3.1 Pro is the next iteration in the Gemini 3 series of models, a suite of highly capable, natively multimodal reasoning models. (blog, model card)

Gemini 3 Flash (Preview) — `google/gemini-3-flash-preview`

Gemini 3.0 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. (blog)

Gemini 3.1 Flash-Lite (Preview) — `google/gemini-3.1-flash-lite-preview`

Gemini 3.1 Flash-Lite (Preview) is the fastest and most cost-efficient Gemini 3 series model. (blog)

Gemini 2.0 Flash (DSPy Zero-Shot Predict) — `google/gemini-2.0-flash-001-dspy-zs-predict`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash (DSPy Zero-Shot ChainOfThought) — `google/gemini-2.0-flash-001-dspy-zs-cot`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash (DSPy BootstrapFewShotWithRandomSearch) — `google/gemini-2.0-flash-001-dspy-fs-bfrs`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

Gemini 2.0 Flash (DSPy MIPROv2) — `google/gemini-2.0-flash-001-dspy-fs-miprov2`

Gemini 2.0 Flash is a member of the Gemini 2.0 series of models, a suite of highly-capable, natively multimodal models designed to power agentic systems. (model card, documentation)

OpenAI

Whisper-1 + GPT-4o (2024-11-20) — `openai/whisper-1_gpt-4o-2024-11-20`

Transcribes the text with Whisper-1 and then uses GPT-4o to generate a response.

GPT-4o Transcribe + GPT-4o (2024-11-20) — `openai/gpt-4o-transcribe_gpt-4o-2024-11-20`

Transcribes the text with GPT-4o Transcribe and then uses GPT-4o to generate a response.

GPT-4o mini Transcribe + GPT-4o (2024-11-20) — `openai/gpt-4o-mini-transcribe_gpt-4o-2024-11-20`

Transcribes the text with GPT-4o mini Transcribe and then uses GPT-4o to generate a response.

GPT-4o Audio (Preview 2024-10-01) — `openai/gpt-4o-audio-preview-2024-10-01`

GPT-4o Audio (Preview 2024-10-01) is a preview model that allows using use audio inputs to prompt the model (documentation).

GPT-4o Audio (Preview 2024-12-17) — `openai/gpt-4o-audio-preview-2024-12-17`

GPT-4o Audio (Preview 2024-12-17) is a preview model that allows using use audio inputs to prompt the model (documentation).

GPT-4o mini Audio (Preview 2024-12-17) — `openai/gpt-4o-mini-audio-preview-2024-12-17`

GPT-4o mini Audio (Preview 2024-12-17) is a preview model that allows using use audio inputs to prompt the model (documentation).

Alibaba Cloud

Qwen-Audio Chat — `qwen/qwen-audio-chat`

Auditory multimodal version of the Qwen large language model series (paper).

Qwen2-Audio Instruct (7B) — `qwen/qwen2-audio-7b-instruct`

The second version of auditory multimodal version of the Qwen large language model series (paper).

Qwen2.5-Omni (7B) — `qwen/qwen2.5-omni-7b`

The new flagship end-to-end multimodal model in the Qwen series that can process inputs including text, images, audio, and video (paper).

Stanford

Diva Llama 3 (8B) — `stanford/diva-llama`

Diva Llama 3 is an end-to-end Voice Assistant Model which can handle speech and text as inputs. It was trained using distillation loss. (paper)

ICTNLP

LLaMA-Omni (8B) — `ictnlp/llama-3.1-8b-omni`

The audio-visual multimodal version of the LLaMA 3.1 model (paper).

Models

Text Models

AI21 Labs

Jurassic-2 Large (7.5B) — ai21/j2-large

Jurassic-2 Grande (17B) — ai21/j2-grande

Jurassic-2 Jumbo (178B) — ai21/j2-jumbo

Jamba Instruct — ai21/jamba-instruct

Jamba 1.5 Mini — ai21/jamba-1.5-mini

Jamba 1.5 Large — ai21/jamba-1.5-large

AI Singapore

SEA-LION 7B — aisingapore/sea-lion-7b

SEA-LION 7B Instruct — aisingapore/sea-lion-7b-instruct

Llama3 8B CPT SEA-LIONv2 — aisingapore/llama3-8b-cpt-sea-lionv2-base

Llama3 8B CPT SEA-LIONv2.1 Instruct — aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct

Gemma2 9B CPT SEA-LIONv3 — aisingapore/gemma2-9b-cpt-sea-lionv3-base

Gemma2 9B CPT SEA-LIONv3 Instruct — aisingapore/gemma2-9b-cpt-sea-lionv3-instruct

Llama3.1 8B CPT SEA-LIONv3 — aisingapore/llama3.1-8b-cpt-sea-lionv3-base

Llama3.1 8B CPT SEA-LIONv3 Instruct — aisingapore/llama3.1-8b-cpt-sea-lionv3-instruct

Llama3.1 70B CPT SEA-LIONv3 — aisingapore/llama3.1-70b-cpt-sea-lionv3-base

Llama3.1 70B CPT SEA-LIONv3 Instruct — aisingapore/llama3.1-70b-cpt-sea-lionv3-instruct

Aleph Alpha

Luminous Base (13B) — AlephAlpha/luminous-base

Luminous Extended (30B) — AlephAlpha/luminous-extended

Luminous Supreme (70B) — AlephAlpha/luminous-supreme

Amazon

Amazon Nova Premier — amazon/nova-premier-v1:0

Amazon Nova Pro — amazon/nova-pro-v1:0

Amazon Nova Lite — amazon/nova-lite-v1:0

Amazon Nova Micro — amazon/nova-micro-v1:0

Amazon Nova 2 Pro — amazon/nova-2-pro-v1:0

Amazon Nova 2 Lite — amazon/nova-2-lite-v1:0

Amazon Titan Text Lite — amazon/titan-text-lite-v1

Amazon Titan Text Express — amazon/titan-text-express-v1

Mistral

Mistral 7B Instruct on Amazon Bedrock — mistralai/amazon-mistral-7b-instruct-v0:2

Mixtral 8x7B Instruct on Amazon Bedrock — mistralai/amazon-mixtral-8x7b-instruct-v0:1

Mistral Large(2402) on Amazon Bedrock — mistralai/amazon-mistral-large-2402-v1:0

Mistral Small on Amazon Bedrock — mistralai/amazon-mistral-small-2402-v1:0

Mistral Large(2407) on Amazon Bedrock — mistralai/amazon-mistral-large-2407-v1:0

Meta

Llama 3 8B Instruct on Amazon Bedrock — meta/amazon-llama3-8b-instruct-v1:0

Llama 3 70B Instruct on Amazon Bedrock — meta/amazon-llama3-70b-instruct-v1:0

Llama 3.1 405b Instruct on Amazon Bedrock. — meta/amazon-llama3-1-405b-instruct-v1:0

Llama 3.1 70b Instruct on Amazon Bedrock. — meta/amazon-llama3-1-70b-instruct-v1:0

Llama 3.1 8b Instruct on Amazon Bedrock. — meta/amazon-llama3-1-8b-instruct-v1:0

OPT (175B) — meta/opt-175b

OPT (66B) — meta/opt-66b

OPT (6.7B) — meta/opt-6.7b

OPT (1.3B) — meta/opt-1.3b

LLaMA (7B) — meta/llama-7b

LLaMA (13B) — meta/llama-13b

LLaMA (30B) — meta/llama-30b

LLaMA (65B) — meta/llama-65b

Llama 2 (7B) — meta/llama-2-7b

Llama 2 (13B) — meta/llama-2-13b

Llama 2 (70B) — meta/llama-2-70b

Llama 3 (8B) — meta/llama-3-8b

Llama 3 Instruct Turbo (8B) — meta/llama-3-8b-instruct-turbo

Llama 3 Instruct Lite (8B) — meta/llama-3-8b-instruct-lite

Llama 3 (70B) — meta/llama-3-70b

Llama 3 Instruct Turbo (70B) — meta/llama-3-70b-instruct-turbo

Llama 3 Instruct Lite (70B) — meta/llama-3-70b-instruct-lite

Llama 3.1 Instruct (8B) — meta/llama-3.1-8b-instruct

Llama 3.1 Instruct (70B) — meta/llama-3.1-70b-instruct

Llama 3.1 Instruct (405B) — meta/llama-3.1-405b-instruct

Llama 3.1 Instruct Turbo (8B) — meta/llama-3.1-8b-instruct-turbo

Llama 3.1 Instruct Turbo (70B) — meta/llama-3.1-70b-instruct-turbo

Llama 3.2 Instruct (1.23B) — meta/llama-3.2-1b-instruct

Llama 3.2 Instruct Turbo (3B) — meta/llama-3.2-3b-instruct-turbo

Llama 3.2 Vision Instruct Turbo (11B) — meta/llama-3.2-11b-vision-instruct-turbo

Llama 3.2 Vision Instruct Turbo (90B) — meta/llama-3.2-90b-vision-instruct-turbo

Llama 3.3 Instruct Turbo (70B) — meta/llama-3.3-70b-instruct-turbo

Llama 3.3 Instruct (70B) — meta/llama-3.3-70b-instruct

Llama 4 Scout (17Bx16E) Instruct — meta/llama-4-scout-17b-16e-instruct

Llama 4 Maverick (17Bx128E) Instruct — meta/llama-4-maverick-17b-128e-instruct

Llama 4 Maverick (17Bx128E) Instruct FP8 — meta/llama-4-maverick-17b-128e-instruct-fp8

Llama 3 Instruct (8B) — meta/llama-3-8b-chat

Llama 3 Instruct (70B) — meta/llama-3-70b-chat

Llama Guard (7B) — meta/llama-guard-7b

Llama Guard 2 (8B) — meta/llama-guard-2-8b

Jurassic-2 Large (7.5B) — `ai21/j2-large`

Jurassic-2 Grande (17B) — `ai21/j2-grande`

Jurassic-2 Jumbo (178B) — `ai21/j2-jumbo`

Jamba Instruct — `ai21/jamba-instruct`

Jamba 1.5 Mini — `ai21/jamba-1.5-mini`

Jamba 1.5 Large — `ai21/jamba-1.5-large`

SEA-LION 7B — `aisingapore/sea-lion-7b`

SEA-LION 7B Instruct — `aisingapore/sea-lion-7b-instruct`

Llama3 8B CPT SEA-LIONv2 — `aisingapore/llama3-8b-cpt-sea-lionv2-base`

Llama3 8B CPT SEA-LIONv2.1 Instruct — `aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct`

Gemma2 9B CPT SEA-LIONv3 — `aisingapore/gemma2-9b-cpt-sea-lionv3-base`

Gemma2 9B CPT SEA-LIONv3 Instruct — `aisingapore/gemma2-9b-cpt-sea-lionv3-instruct`

Llama3.1 8B CPT SEA-LIONv3 — `aisingapore/llama3.1-8b-cpt-sea-lionv3-base`

Llama3.1 8B CPT SEA-LIONv3 Instruct — `aisingapore/llama3.1-8b-cpt-sea-lionv3-instruct`

Llama3.1 70B CPT SEA-LIONv3 — `aisingapore/llama3.1-70b-cpt-sea-lionv3-base`

Llama3.1 70B CPT SEA-LIONv3 Instruct — `aisingapore/llama3.1-70b-cpt-sea-lionv3-instruct`

Luminous Base (13B) — `AlephAlpha/luminous-base`

Luminous Extended (30B) — `AlephAlpha/luminous-extended`

Luminous Supreme (70B) — `AlephAlpha/luminous-supreme`

Amazon Nova Premier — `amazon/nova-premier-v1:0`

Amazon Nova Pro — `amazon/nova-pro-v1:0`

Amazon Nova Lite — `amazon/nova-lite-v1:0`

Amazon Nova Micro — `amazon/nova-micro-v1:0`

Amazon Nova 2 Pro — `amazon/nova-2-pro-v1:0`

Amazon Nova 2 Lite — `amazon/nova-2-lite-v1:0`

Amazon Titan Text Lite — `amazon/titan-text-lite-v1`

Amazon Titan Text Express — `amazon/titan-text-express-v1`

Mistral 7B Instruct on Amazon Bedrock — `mistralai/amazon-mistral-7b-instruct-v0:2`

Mixtral 8x7B Instruct on Amazon Bedrock — `mistralai/amazon-mixtral-8x7b-instruct-v0:1`

Mistral Large(2402) on Amazon Bedrock — `mistralai/amazon-mistral-large-2402-v1:0`

Mistral Small on Amazon Bedrock — `mistralai/amazon-mistral-small-2402-v1:0`

Mistral Large(2407) on Amazon Bedrock — `mistralai/amazon-mistral-large-2407-v1:0`

Llama 3 8B Instruct on Amazon Bedrock — `meta/amazon-llama3-8b-instruct-v1:0`

Llama 3 70B Instruct on Amazon Bedrock — `meta/amazon-llama3-70b-instruct-v1:0`

Llama 3.1 405b Instruct on Amazon Bedrock. — `meta/amazon-llama3-1-405b-instruct-v1:0`

Llama 3.1 70b Instruct on Amazon Bedrock. — `meta/amazon-llama3-1-70b-instruct-v1:0`

Llama 3.1 8b Instruct on Amazon Bedrock. — `meta/amazon-llama3-1-8b-instruct-v1:0`

OPT (175B) — `meta/opt-175b`

OPT (66B) — `meta/opt-66b`

OPT (6.7B) — `meta/opt-6.7b`

OPT (1.3B) — `meta/opt-1.3b`

LLaMA (7B) — `meta/llama-7b`

LLaMA (13B) — `meta/llama-13b`

LLaMA (30B) — `meta/llama-30b`

LLaMA (65B) — `meta/llama-65b`

Llama 2 (7B) — `meta/llama-2-7b`

Llama 2 (13B) — `meta/llama-2-13b`

Llama 2 (70B) — `meta/llama-2-70b`

Llama 3 (8B) — `meta/llama-3-8b`

Llama 3 Instruct Turbo (8B) — `meta/llama-3-8b-instruct-turbo`

Llama 3 Instruct Lite (8B) — `meta/llama-3-8b-instruct-lite`

Llama 3 (70B) — `meta/llama-3-70b`

Llama 3 Instruct Turbo (70B) — `meta/llama-3-70b-instruct-turbo`

Llama 3 Instruct Lite (70B) — `meta/llama-3-70b-instruct-lite`

Llama 3.1 Instruct (8B) — `meta/llama-3.1-8b-instruct`

Llama 3.1 Instruct (70B) — `meta/llama-3.1-70b-instruct`

Llama 3.1 Instruct (405B) — `meta/llama-3.1-405b-instruct`

Llama 3.1 Instruct Turbo (8B) — `meta/llama-3.1-8b-instruct-turbo`

Llama 3.1 Instruct Turbo (70B) — `meta/llama-3.1-70b-instruct-turbo`

Llama 3.2 Instruct (1.23B) — `meta/llama-3.2-1b-instruct`

Llama 3.2 Instruct Turbo (3B) — `meta/llama-3.2-3b-instruct-turbo`

Llama 3.2 Vision Instruct Turbo (11B) — `meta/llama-3.2-11b-vision-instruct-turbo`

Llama 3.2 Vision Instruct Turbo (90B) — `meta/llama-3.2-90b-vision-instruct-turbo`

Llama 3.3 Instruct Turbo (70B) — `meta/llama-3.3-70b-instruct-turbo`

Llama 3.3 Instruct (70B) — `meta/llama-3.3-70b-instruct`

Llama 4 Scout (17Bx16E) Instruct — `meta/llama-4-scout-17b-16e-instruct`

Llama 4 Maverick (17Bx128E) Instruct — `meta/llama-4-maverick-17b-128e-instruct`

Llama 4 Maverick (17Bx128E) Instruct FP8 — `meta/llama-4-maverick-17b-128e-instruct-fp8`

Llama 3 Instruct (8B) — `meta/llama-3-8b-chat`

Llama 3 Instruct (70B) — `meta/llama-3-70b-chat`

Llama Guard (7B) — `meta/llama-guard-7b`

Llama Guard 2 (8B) — `meta/llama-guard-2-8b`

Llama Guard 3 (8B) — `meta/llama-guard-3-8b`

Claude v1.3 — `anthropic/claude-v1.3`

Claude Instant V1 — `anthropic/claude-instant-v1`

Claude Instant 1.2 — `anthropic/claude-instant-1.2`

Claude 2.0 — `anthropic/claude-2.0`

Claude 2.1 — `anthropic/claude-2.1`

Claude 3 Haiku (20240307) — `anthropic/claude-3-haiku-20240307`

Claude 3 Sonnet (20240229) — `anthropic/claude-3-sonnet-20240229`