Models

Text Models

AI21 Labs

Jurassic-2 Large (7.5B) — ai21/j2-large

Jurassic-2 Large (7.5B parameters) (docs)

Jurassic-2 Grande (17B) — ai21/j2-grande

Jurassic-2 Grande (17B parameters) (docs)

Jurassic-2 Jumbo (178B) — ai21/j2-jumbo

Jurassic-2 Jumbo (178B parameters) (docs)

Jamba Instruct — ai21/jamba-instruct

Jamba Instruct is an instruction-tuned version of Jamba, which uses a hybrid Transformer-Mamba mixture-of-experts (MoE) architecture that interleaves blocks of Transformer and Mamba layers. (blog)

Jamba 1.5 Mini — ai21/jamba-1.5-mini

Jamba 1.5 Mini is a long-context, hybrid SSM-Transformer instruction-following foundation model that is optimized for function calling, structured output, and grounded generation. (blog)

Jamba 1.5 Large — ai21/jamba-1.5-large

Jamba 1.5 Large is a long-context, hybrid SSM-Transformer instruction-following foundation model that is optimized for function calling, structured output, and grounded generation. (blog)

AI Singapore

SEA-LION (7B) — aisingapore/sea-lion-7b

SEA-LION is a collection of language models which has been pretrained and instruct-tuned on languages from the Southeast Asia region. It utilizes the MPT architecture and a custom SEABPETokenizer for tokenization.

SEA-LION Instruct (7B) — aisingapore/sea-lion-7b-instruct

SEA-LION is a collection of language models which has been pretrained and instruct-tuned on languages from the Southeast Asia region. It utilizes the MPT architecture and a custom SEABPETokenizer for tokenization.

Llama 3 CPT SEA-Lion v2 (8B) — aisingapore/llama3-8b-cpt-sea-lionv2-base

Llama 3 CPT SEA-Lion v2 (8B) is a multilingual model that underwent continued pre-training on 48B additional tokens, including tokens in Southeast Asian languages.

Llama 3 CPT SEA-Lion v2.1 Instruct (8B) — aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct

Llama 3 CPT SEA-Lion v2.1 Instruct (8B) is a multilingual model which has been fine-tuned with around 100,000 English instruction-completion pairs alongside a smaller pool of around 50,000 instruction-completion pairs from other Southeast Asian languages, such as Indonesian, Thai and Vietnamese.

Aleph Alpha

Luminous Base (13B) — AlephAlpha/luminous-base

Luminous Base (13B parameters) (docs)

Luminous Extended (30B) — AlephAlpha/luminous-extended

Luminous Extended (30B parameters) (docs)

Luminous Supreme (70B) — AlephAlpha/luminous-supreme

Luminous Supreme (70B parameters) (docs)

Amazon

Amazon Titan Text Lite — amazon/titan-text-lite-v1

Amazon Titan Text Lite is a lightweight, efficient model perfect for fine-tuning English-language tasks like summarization and copywriting. It caters to customers seeking a smaller, cost-effective, and highly customizable model. It supports various formats, including text generation, code generation, rich text formatting, and orchestration (agents). Key model attributes encompass fine-tuning, text generation, code generation, and rich text formatting.

Amazon Titan Text Express — amazon/titan-text-express-v1

Amazon Titan Text Express, with a context length of up to 8,000 tokens, excels in advanced language tasks like open-ended text generation and conversational chat. It's also optimized for Retrieval Augmented Generation (RAG). Initially designed for English, the model offers preview multilingual support for over 100 additional languages.

Mistral

Mistral 7B Instruct on Amazon Bedrock — mistralai/amazon-mistral-7b-instruct-v0:2

A 7B dense Transformer, fast-deployed and easily customisable. Small, yet powerful for a variety of use cases. Supports English and code, and a 32k context window.

Mixtral 8x7B Instruct on Amazon Bedrock — mistralai/amazon-mixtral-8x7b-instruct-v0:1

An 8x7B sparse Mixture-of-Experts model with stronger capabilities than Mistral 7B. Uses 12B active parameters out of 45B total. Supports multiple languages, code, and a 32k context window.

Mistral Large (2402) on Amazon Bedrock — mistralai/amazon-mistral-large-2402-v1:0

The most advanced Mistral AI Large Language model capable of handling any language task including complex multilingual reasoning, text understanding, transformation, and code generation.

Mistral Small on Amazon Bedrock — mistralai/amazon-mistral-small-2402-v1:0

Mistral Small is perfectly suited for straightforward tasks that can be performed in bulk, such as classification, customer support, or text generation. It provides outstanding performance at a cost-effective price point.

Mistral Large (2407) on Amazon Bedrock — mistralai/amazon-mistral-large-2407-v1:0

Mistral Large 2407 is an advanced Large Language Model (LLM) that supports dozens of languages and is trained on 80+ coding languages. It has best-in-class agentic capabilities with native function calling, JSON output, and reasoning capabilities.

Meta

Llama 3 8B Instruct on Amazon Bedrock — meta/amazon-llama3-8b-instruct-v1:0

Meta Llama 3 is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. Part of a foundational system, it serves as a bedrock for innovation in the global community. Ideal for limited computational power and resources, edge devices, and faster training times.

Llama 3 70B Instruct on Amazon Bedrock — meta/amazon-llama3-70b-instruct-v1:0

Meta Llama 3 is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. Part of a foundational system, it serves as a bedrock for innovation in the global community. Ideal for content creation, conversational AI, language understanding, R&D, and Enterprise applications.

Llama 3.1 405B Instruct on Amazon Bedrock — meta/amazon-llama3-1-405b-instruct-v1:0

Meta's Llama 3.1 offers multilingual models (8B, 70B, 405B) with 128K context, improved reasoning, and optimization for dialogue. It outperforms many open-source chat models and is designed for commercial and research use in multiple languages.

Llama 3.1 70B Instruct on Amazon Bedrock — meta/amazon-llama3-1-70b-instruct-v1:0

Meta's Llama 3.1 offers multilingual models (8B, 70B, 405B) with 128K context, improved reasoning, and optimization for dialogue. It outperforms many open-source chat models and is designed for commercial and research use in multiple languages.

Llama 3.1 8B Instruct on Amazon Bedrock — meta/amazon-llama3-1-8b-instruct-v1:0

Meta's Llama 3.1 offers multilingual models (8B, 70B, 405B) with 128K context, improved reasoning, and optimization for dialogue. It outperforms many open-source chat models and is designed for commercial and research use in multiple languages.

OPT (175B) — meta/opt-175b

Open Pre-trained Transformers (175B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper).

OPT (66B) — meta/opt-66b

Open Pre-trained Transformers (66B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper).

OPT (6.7B) — meta/opt-6.7b

Open Pre-trained Transformers (6.7B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper).

OPT (1.3B) — meta/opt-1.3b

Open Pre-trained Transformers (1.3B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper).

LLaMA (7B) — meta/llama-7b

LLaMA is a collection of foundation language models ranging from 7B to 65B parameters.

LLaMA (13B) — meta/llama-13b

LLaMA is a collection of foundation language models ranging from 7B to 65B parameters.

LLaMA (30B) — meta/llama-30b

LLaMA is a collection of foundation language models ranging from 7B to 65B parameters.

LLaMA (65B) — meta/llama-65b

LLaMA is a collection of foundation language models ranging from 7B to 65B parameters.

Llama 2 (7B) — meta/llama-2-7b

Llama 2 pretrained models are trained on 2 trillion tokens and have double the context length of Llama 1.

Llama 2 (13B) — meta/llama-2-13b

Llama 2 pretrained models are trained on 2 trillion tokens and have double the context length of Llama 1.

Llama 2 (70B) — meta/llama-2-70b

Llama 2 pretrained models are trained on 2 trillion tokens and have double the context length of Llama 1.

Llama 3 (8B) — meta/llama-3-8b

Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. (paper)
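As a rough illustration of what Grouped-Query Attention changes, the sketch below shows only the head-sharing pattern: several query heads are mapped onto one shared key/value head, which shrinks the key/value cache by the same ratio. It is not a full attention implementation, and the head counts (32 query heads, 8 key/value heads) and the function name are assumptions chosen for the example, not values stated on this page.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the index of the key/value head shared by query head `q_head`."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per shared K/V head
    return q_head // group_size


# Illustrative head counts (assumed for this sketch).
N_Q_HEADS = 32
N_KV_HEADS = 8

mapping = [kv_head_for_query_head(h, N_Q_HEADS, N_KV_HEADS) for h in range(N_Q_HEADS)]

# Relative K/V cache size compared with giving every query head its own K/V head.
kv_cache_ratio = N_KV_HEADS / N_Q_HEADS

print(mapping[:8])      # [0, 0, 0, 0, 1, 1, 1, 1]
print(kv_cache_ratio)   # 0.25
```

With these assumed counts, every group of four query heads reads the same keys and values, so the cache that must be kept during generation is a quarter of the size it would be with one key/value head per query head.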

Llama 3 Instruct Turbo (8B) — meta/llama-3-8b-instruct-turbo

Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. (paper) Turbo is Together's implementation, providing fast FP8 performance while maintaining quality, closely matching FP16 reference models. (blog)

Llama 3 Instruct Lite (8B) — meta/llama-3-8b-instruct-lite

Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. (paper) Lite is Together's implementation; it leverages a number of optimizations, including INT4 quantization, and provides the most cost-efficient and scalable Llama 3 models available anywhere while maintaining excellent quality relative to full-precision reference implementations. (blog)

Llama 3 (70B) — meta/llama-3-70b

Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. (paper)

Llama 3 Instruct Turbo (70B) — meta/llama-3-70b-instruct-turbo

Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. (paper) Turbo is Together's implementation, providing fast FP8 performance while maintaining quality, closely matching FP16 reference models. (blog)

Llama 3 Instruct Lite (70B) — meta/llama-3-70b-instruct-lite

Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. (paper) Lite is Together's implementation; it leverages a number of optimizations, including INT4 quantization, and provides the most cost-efficient and scalable Llama 3 models available anywhere while maintaining excellent quality relative to full-precision reference implementations. (blog)

Llama 3.1 Instruct Turbo (8B) — meta/llama-3.1-8b-instruct-turbo

Llama 3.1 (8B) is part of the Llama 3 family of dense Transformer models that natively support multilinguality, coding, reasoning, and tool usage. (paper, blog) Turbo is Together's implementation, providing a near negligible difference in quality from the reference implementation with faster performance and lower cost, currently using FP8 quantization. (blog)

Llama 3.1 Instruct Turbo (70B) — meta/llama-3.1-70b-instruct-turbo

Llama 3.1 (70B) is part of the Llama 3 family of dense Transformer models that natively support multilinguality, coding, reasoning, and tool usage. (paper, blog) Turbo is Together's implementation, providing a near negligible difference in quality from the reference implementation with faster performance and lower cost, currently using FP8 quantization. (blog)

Llama 3.1 Instruct Turbo (405B) — meta/llama-3.1-405b-instruct-turbo

Llama 3.1 (405B) is part of the Llama 3 family of dense Transformer models that natively support multilinguality, coding, reasoning, and tool usage. (paper, blog) Turbo is Together's implementation, providing a near negligible difference in quality from the reference implementation with faster performance and lower cost, currently using FP8 quantization. (blog)

Llama 3.2 Instruct Turbo (3B) — meta/llama-3.2-3b-instruct-turbo

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned text-only generative models in 1B and 3B sizes. (blog) Turbo is Together's implementation, providing a near negligible difference in quality from the reference implementation with faster performance and lower cost, currently using FP8 quantization. (blog)

Llama 3.2 Vision Instruct Turbo (11B) — meta/llama-3.2-11b-vision-instruct-turbo

The Llama 3.2 Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes. (blog) Turbo is Together's implementation, providing a near negligible difference in quality from the reference implementation with faster performance and lower cost, currently using FP8 quantization. (blog)

Llama 3.2 Vision Instruct Turbo (90B) — meta/llama-3.2-90b-vision-instruct-turbo

The Llama 3.2 Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes. (blog) Turbo is Together's implementation, providing a near negligible difference in quality from the reference implementation with faster performance and lower cost, currently using FP8 quantization. (blog)

Llama 3 Instruct (8B) — meta/llama-3-8b-chat

Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. It used SFT, rejection sampling, PPO and DPO for post-training. (paper)

Llama 3 Instruct (70B) — meta/llama-3-70b-chat

Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. It used SFT, rejection sampling, PPO and DPO for post-training. (paper)

Llama Guard (7B) — meta/llama-guard-7b

Llama-Guard is a 7B parameter Llama 2-based input-output safeguard model. It can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM: it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if it is unsafe under a policy, it also lists the violating subcategories.

Llama Guard 2 (8B) — meta/llama-guard-2-8b

Llama Guard 2 is an 8B parameter Llama 3-based LLM safeguard model. Similar to Llama Guard, it can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.

Llama Guard 3 (8B) — meta/llama-guard-3-8b

Llama Guard 3 is an 8B parameter Llama 3.1-based LLM safeguard model. Similar to Llama Guard, it can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.
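Because the Llama Guard models report their verdict as generated text, a caller usually has to parse that text before acting on it. The following is a minimal sketch of such a parser. It assumes, hypothetically, that the first non-empty line is the label ("safe" or "unsafe") and that any following lines hold comma-separated category codes; the exact output format and the codes used here ("S1", "S10") are assumptions for illustration and should be checked against the model card of the specific version in use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GuardVerdict:
    is_safe: bool
    categories: List[str] = field(default_factory=list)


def parse_guard_output(text: str) -> GuardVerdict:
    """Parse safeguard output under the assumed 'label, then categories' layout."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty safeguard output")

    label = lines[0].lower()
    if label == "safe":
        return GuardVerdict(is_safe=True)
    if label == "unsafe":
        categories: List[str] = []
        for ln in lines[1:]:
            # Accept one code per line or several separated by commas.
            categories.extend(c.strip() for c in ln.split(",") if c.strip())
        return GuardVerdict(is_safe=False, categories=categories)

    # Anything else is treated as an error rather than silently passed through.
    raise ValueError(f"unexpected safeguard label: {lines[0]!r}")


verdict = parse_guard_output("unsafe\nS1,S10")
print(verdict.is_safe, verdict.categories)
```

Raising on an unrecognized label, instead of defaulting to "safe", is a deliberate choice in this sketch: a malformed generation then fails closed.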

Anthropic

Claude v1.3 — anthropic/claude-v1.3

A 52B parameter language model, trained using reinforcement learning from human feedback (paper).

Claude Instant V1 — anthropic/claude-instant-v1

A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs).

Claude Instant 1.2 — anthropic/claude-instant-1.2

A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs).

Claude 2.0 — anthropic/claude-2.0

Claude 2.0 is a general purpose large language model developed by Anthropic. It uses a transformer architecture and is trained via unsupervised learning, RLHF, and Constitutional AI (including both a supervised and Reinforcement Learning (RL) phase). (model card)

Claude 2.1 — anthropic/claude-2.1

Claude 2.1 is a general purpose large language model developed by Anthropic. It uses a transformer architecture and is trained via unsupervised learning, RLHF, and Constitutional AI (including both a supervised and Reinforcement Learning (RL) phase). (model card)

Claude 3 Haiku (20240307) — anthropic/claude-3-haiku-20240307

Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3 Sonnet (20240229) — anthropic/claude-3-sonnet-20240229

Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3 Opus (20240229) — anthropic/claude-3-opus-20240229

Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3.5 Haiku (20241022) — anthropic/claude-3-5-haiku-20241022

Claude 3.5 Haiku is a Claude 3 family model which matches the performance of Claude 3 Opus at a similar speed to the previous generation of Haiku (blog).

Claude 3.5 Sonnet (20240620) — anthropic/claude-3-5-sonnet-20240620

Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost. (blog)

Claude 3.5 Sonnet (20241022) — anthropic/claude-3-5-sonnet-20241022

Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost (blog). This is an upgraded snapshot released on 2024-10-22 (blog).

BigScience

BLOOM (176B) — bigscience/bloom

BLOOM (176B parameters) is an autoregressive model trained on 46 natural languages and 13 programming languages (paper).

T0pp (11B) — bigscience/t0pp

T0pp (11B parameters) is an encoder-decoder model trained on a large set of different tasks specified in natural language prompts (paper).

BioMistral

BioMistral (7B) — biomistral/biomistral-7b

BioMistral 7B is an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central.

Cohere

Command — cohere/command

Command is Cohere’s flagship text generation model. It is trained to follow user commands and to be instantly useful in practical business applications. (docs and changelog)

Command Light — cohere/command-light

Command is Cohere’s flagship text generation model. It is trained to follow user commands and to be instantly useful in practical business applications. (docs and changelog)

Command R — cohere/command-r

Command R is a multilingual 35B parameter model with a context length of 128K that has been trained with conversational tool use capabilities.

Command R Plus — cohere/command-r-plus

Command R+ is a multilingual 104B parameter model with a context length of 128K that has been trained with conversational tool use capabilities.

Databricks

Dolly V2 (3B) — databricks/dolly-v2-3b

Dolly V2 (3B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-2.8b.

Dolly V2 (7B) — databricks/dolly-v2-7b

Dolly V2 (7B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-6.9b.

Dolly V2 (12B) — databricks/dolly-v2-12b

Dolly V2 (12B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-12b.

DBRX Instruct — databricks/dbrx-instruct

DBRX is a large language model with a fine-grained mixture-of-experts (MoE) architecture that uses 16 experts and chooses 4. It has 132B total parameters, of which 36B parameters are active on any input. (blog post)
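The figures in this description can be turned into two quick numbers: how many distinct sets of experts can be selected when 4 of 16 are chosen, and what share of the parameters is active for a given input. The sketch below is only that arithmetic. It assumes the selection is an unordered choice of 4 distinct experts at a single MoE layer, and the variable names are illustrative.

```python
from math import comb

# Values taken from the description above.
N_EXPERTS = 16
EXPERTS_PER_TOKEN = 4
TOTAL_PARAMS_B = 132   # billions
ACTIVE_PARAMS_B = 36   # billions

# Number of distinct unordered expert subsets a router could pick at one layer.
expert_subsets = comb(N_EXPERTS, EXPERTS_PER_TOKEN)

# Share of all parameters that participate in processing one input.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(expert_subsets)              # 1820
print(f"{active_fraction:.1%}")    # 27.3%
```

Under these assumptions there are 1820 possible expert subsets per layer, while only a little over a quarter of the parameters are used on any single input.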

DeepSeek

DeepSeek LLM Chat (67B) — deepseek-ai/deepseek-llm-67b-chat

DeepSeek LLM Chat is an open-source language model trained on 2 trillion tokens in both English and Chinese, and fine-tuned using supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). (paper)

EleutherAI

GPT-J (6B) — eleutherai/gpt-j-6b

GPT-J (6B parameters) is an autoregressive language model trained on The Pile (details).

GPT-NeoX (20B) — eleutherai/gpt-neox-20b

GPT-NeoX (20B parameters) is an autoregressive language model trained on The Pile (paper).

Pythia (1B) — eleutherai/pythia-1b-v0

Pythia (1B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

Pythia (2.8B) — eleutherai/pythia-2.8b-v0

Pythia (2.8B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

Pythia (6.9B) — eleutherai/pythia-6.9b

Pythia (6.9B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

Pythia (12B) — eleutherai/pythia-12b-v0

Pythia (12B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

EPFL LLM

Meditron (7B) — epfl-llm/meditron-7b

Meditron-7B is a 7 billion parameter model adapted to the medical domain from Llama-2-7B through continued pretraining on a comprehensively curated medical corpus.

Google

T5 (11B) — google/t5-11b

T5 (11B parameters) is an encoder-decoder model trained on a multi-task mixture, where each task is converted into a text-to-text format (paper).

UL2 (20B) — google/ul2

UL2 (20B parameters) is an encoder-decoder model trained on the C4 corpus. It's similar to T5 but trained with a different objective and slightly different scaling knobs (paper).

Flan-T5 (11B) — google/flan-t5-xxl

Flan-T5 (11B parameters) is T5 fine-tuned on 1.8K tasks (paper).

Gemini Pro — google/gemini-pro

Gemini Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.0 Pro (001) — google/gemini-1.0-pro-001

Gemini 1.0 Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.0 Pro (002) — google/gemini-1.0-pro-002

Gemini 1.0 Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.5 Pro (001) — google/gemini-1.5-pro-001

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001) — google/gemini-1.5-flash-001

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (0409 preview) — google/gemini-1.5-pro-preview-0409

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (0514 preview) — google/gemini-1.5-pro-preview-0514

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (0514 preview) — google/gemini-1.5-flash-preview-0514

Gemini 1.5 Flash is a smaller Gemini model. It has a 1 million token context window and allows interleaving text, images, audio and video as inputs. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (blog)

Gemini 1.5 Pro (001, default safety) — google/gemini-1.5-pro-001-safety-default

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)

Gemini 1.5 Pro (001, BLOCK_NONE safety) — google/gemini-1.5-pro-001-safety-block-none

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001, default safety) — google/gemini-1.5-flash-001-safety-default

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)

Gemini 1.5 Flash (001, BLOCK_NONE safety) — google/gemini-1.5-flash-001-safety-block-none

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (002) — google/gemini-1.5-pro-002

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (002) — google/gemini-1.5-flash-002

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemma (2B) — google/gemma-2b

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma Instruct (2B) — google/gemma-2b-it

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma (7B) — google/gemma-7b

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma Instruct (7B) — google/gemma-7b-it

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma 2 (9B) — google/gemma-2-9b

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma 2 Instruct (9B) — google/gemma-2-9b-it

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma 2 (27B) — google/gemma-2-27b

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

Gemma 2 Instruct (27B) — google/gemma-2-27b-it

Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)

PaLM-2 (Bison) — google/text-bison@001

The best value PaLM model. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

PaLM-2 (Bison) — google/text-bison@002

The best value PaLM model. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

PaLM-2 (Bison) — google/text-bison-32k

The best value PaLM model with a 32K context. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

PaLM-2 (Unicorn) — google/text-unicorn@001

The largest model in the PaLM family. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

MedLM (Medium) — google/medlm-medium

MedLM is a family of foundation models fine-tuned for the healthcare industry based on Google Research's medically-tuned large language model, Med-PaLM 2. (documentation)

MedLM (Large) — google/medlm-large

MedLM is a family of foundation models fine-tuned for the healthcare industry based on Google Research's medically-tuned large language model, Med-PaLM 2. (documentation)

Lightning AI

Lit-GPT — lightningai/lit-gpt

Lit-GPT is an optimized collection of open-source LLMs for finetuning and inference. It supports Falcon, Llama 2, Vicuna, LongChat, and other top-performing open-source large language models.

LMSYS

Vicuna v1.3 (7B) — lmsys/vicuna-7b-v1.3

Vicuna v1.3 (7B) is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

Vicuna v1.3 (13B) — lmsys/vicuna-13b-v1.3

Vicuna v1.3 (13B) is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

Microsoft

Phi-2 — microsoft/phi-2

Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value).

Phi-3 (7B) — microsoft/phi-3-small-8k-instruct

Phi-3-Small-8K-Instruct is a lightweight model trained with synthetic data and filtered publicly available website data with a focus on high-quality and reasoning-dense properties. (paper, blog)

Phi-3 (14B) — microsoft/phi-3-medium-4k-instruct

Phi-3-Medium-4K-Instruct is a lightweight model trained with synthetic data and filtered publicly available website data with a focus on high-quality and reasoning-dense properties. (paper, blog)

01.AI

Yi (6B) — 01-ai/yi-6b

The Yi models are large language models trained from scratch by developers at 01.AI.

Yi (34B) — 01-ai/yi-34b

The Yi models are large language models trained from scratch by developers at 01.AI.

Yi Chat (6B) — 01-ai/yi-6b-chat

The Yi models are large language models trained from scratch by developers at 01.AI.

Yi Chat (34B) — 01-ai/yi-34b-chat

The Yi models are large language models trained from scratch by developers at 01.AI.

Yi Large — 01-ai/yi-large

The Yi models are large language models trained from scratch by developers at 01.AI. (tweet)

Yi Large (Preview) — 01-ai/yi-large-preview

The Yi models are large language models trained from scratch by developers at 01.AI. (tweet)

Allen Institute for AI

OLMo (7B) — allenai/olmo-7b

OLMo is a series of Open Language Models trained on the Dolma dataset.

OLMo (7B Twin 2T) — allenai/olmo-7b-twin-2t

OLMo is a series of Open Language Models trained on the Dolma dataset.

OLMo (7B Instruct) — allenai/olmo-7b-instruct

OLMo is a series of Open Language Models trained on the Dolma dataset. The instruct version was trained on the Tulu SFT mixture and a cleaned version of the UltraFeedback dataset.

OLMo 1.7 (7B) — allenai/olmo-1.7-7b

OLMo is a series of Open Language Models trained on the Dolma dataset. The instruct version was trained on the Tulu SFT mixture and a cleaned version of the UltraFeedback dataset.

Mistral AI

Mistral v0.1 (7B) — mistralai/mistral-7b-v0.1

Mistral 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA) and Sliding-Window Attention (SWA). (blog post)
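To make Sliding-Window Attention concrete, the sketch below builds the attention mask for a short sequence: each position may attend to itself and to a fixed number of positions immediately before it, instead of to the whole prefix. The function name, the tiny sequence length, and the convention that the window size counts the current position are assumptions of this example, not details taken from the model's implementation.

```python
from typing import List


def sliding_window_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        # Causal (j <= i) and within the last `window` positions, counting i itself.
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

Each row has at most `window` allowed positions, so the per-token attention cost and the keys and values that must be kept stay bounded as the sequence grows. Information from earlier tokens can still reach later ones indirectly through stacked layers.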

Mistral Instruct v0.1 (7B) — mistralai/mistral-7b-instruct-v0.1

Mistral v0.1 Instruct 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA) and Sliding-Window Attention (SWA). The instruct version was fine-tuned using publicly available conversation datasets. (blog post)

Mistral Instruct v0.2 (7B) — mistralai/mistral-7b-instruct-v0.2

Mistral v0.2 Instruct 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA). Compared to v0.1, v0.2 has a 32k context window and no Sliding-Window Attention (SWA). (blog post)

Mistral Instruct v0.3 (7B) — mistralai/mistral-7b-instruct-v0.3

Mistral v0.3 Instruct 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA). Compared to v0.1, v0.2 has a 32k context window and no Sliding-Window Attention (SWA). (blog post)

Mixtral (8x7B 32K seqlen) — mistralai/mixtral-8x7b-32kseqlen

Mixtral is a mixture-of-experts model that has 46.7B total parameters but only uses 12.9B parameters per token. (blog post, tweet)

Mixtral Instruct (8x7B) — mistralai/mixtral-8x7b-instruct-v0.1

Mixtral Instruct (8x7B) is a version of Mixtral (8x7B) that was optimized through supervised fine-tuning and direct preference optimisation (DPO) for careful instruction following. (blog post)
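For readers unfamiliar with DPO, the following is a minimal sketch of the per-example loss from the published DPO objective, computed from summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response. The function name, the argument names, and the `beta` value are assumptions for illustration; this is not Mistral AI's training code.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical strength of the preference term
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# When the policy equals the reference model, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)
# Raising the chosen response's likelihood relative to the reference lowers the loss.
improved = dpo_loss(-0.5, -3.0, -1.0, -2.0)
print(baseline, improved)
```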

Mixtral (8x22B) — mistralai/mixtral-8x22b

Mistral AI's mixture-of-experts model that uses 39B active parameters out of 141B (blog post).

Mixtral Instruct (8x22B) — mistralai/mixtral-8x22b-instruct-v0.1

Mistral AI's mixture-of-experts model that uses 39B active parameters out of 141B (blog post).

Ministral 3B (2410) — mistralai/ministral-3b-2410

Ministral 3B (2410) is a model for on-device computing and at-the-edge use cases (blog).

Ministral 8B (2410) — mistralai/ministral-8b-2410

Ministral 8B (2410) is a model for on-device computing and at-the-edge use cases with a special interleaved sliding-window attention pattern for faster and memory-efficient inference (blog).

Mistral Small (2402) — mistralai/mistral-small-2402

Mistral Small is a multilingual model with a 32K tokens context window and function-calling capabilities. (blog)

Mistral Small (2409) — mistralai/mistral-small-2409

Mistral Small is a multilingual model with a 32K tokens context window and function-calling capabilities. (blog)

Mistral Medium (2312) — mistralai/mistral-medium-2312

Mistral is a transformer model that uses Grouped-Query Attention (GQA) and Sliding-Window Attention (SWA).

Mistral Large (2402) — mistralai/mistral-large-2402

Mistral Large is a multilingual model with a 32K tokens context window and function-calling capabilities. (blog)

Mistral Large 2 (2407) — mistralai/mistral-large-2407

Mistral Large 2 is a 123 billion parameter model that has a 128k context window and supports dozens of languages and 80+ coding languages. (blog)

Mistral Large (2411) — mistralai/mistral-large-2411

Mistral Large (2411) is a 123B parameter model that has a 128k context window. (blog)

Mistral NeMo (2407) — mistralai/open-mistral-nemo-2407

Mistral NeMo is a multilingual 12B model with a large context window of 128K tokens. (blog)

Mistral Pixtral (2409) — mistralai/pixtral-12b-2409

Mistral Pixtral 12B is the first multimodal Mistral model for image understanding. (blog)

Mistral Pixtral Large (2411) — mistralai/pixtral-large-2411

Mistral Pixtral Large is a 124B open-weights multimodal model built on top of Mistral Large 2 (2407). (blog)

MosaicML

MPT (7B) — mosaicml/mpt-7b

MPT (7B) is a Transformer trained from scratch on 1T tokens of text and code.

MPT-Instruct (7B) — mosaicml/mpt-instruct-7b

MPT-Instruct (7B) is a model for short-form instruction following. It is built by finetuning MPT (7B), a Transformer trained from scratch on 1T tokens of text and code.

MPT (30B) — mosaicml/mpt-30b

MPT (30B) is a Transformer trained from scratch on 1T tokens of text and code.

MPT-Instruct (30B) — mosaicml/mpt-instruct-30b

MPT-Instruct (30B) is a model for short-form instruction following. It is built by finetuning MPT (30B), a Transformer trained from scratch on 1T tokens of text and code.

nectec

Pathumma-llm-text-1.0.0 (7B) — nectec/Pathumma-llm-text-1.0.0

Pathumma-llm-text-1.0.0 (7B) is an instruction model built from OpenThaiLLM-Prebuilt-7B. (blog)

OpenThaiLLM-Prebuilt-7B (7B) — nectec/OpenThaiLLM-Prebuilt-7B

OpenThaiLLM-Prebuilt-7B (7B) is a pretrained Thai large language model with 7 billion parameters based on Qwen2.5-7B.

Neurips

Neurips Local — neurips/local

Neurips Local

NVIDIA

Megatron GPT2 — nvidia/megatron-gpt2

GPT-2 implemented in Megatron-LM (paper).

Nemotron-4 Instruct (340B) — nvidia/nemotron-4-340b-instruct

Nemotron-4 Instruct (340B) is an open weights model sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. 98% of the data used for model alignment was synthetically generated (paper).
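A back-of-the-envelope check of the single-node claim above, counting model weights only. The 80 GB per-GPU figure is an assumption about the H100 variant, and activations, KV cache, and runtime overhead are ignored, so treat this as a rough sketch and not a deployment guide.

```python
GPU_MEMORY_GB = 80  # assumption: 80 GB of memory per H100 GPU
NUM_GPUS = 8        # from the description: a single DGX H100 with 8 GPUs


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weights-only footprint in GB (1 billion params * N bytes = N GB)."""
    return params_billion * bytes_per_param


node_memory_gb = GPU_MEMORY_GB * NUM_GPUS
fp8_gb = weight_memory_gb(340, 1)   # FP8: 1 byte per parameter
bf16_gb = weight_memory_gb(340, 2)  # 16-bit: 2 bytes per parameter

print(f"node: {node_memory_gb} GB, FP8 weights: {fp8_gb} GB, 16-bit weights: {bf16_gb} GB")
print("FP8 fits:", fp8_gb < node_memory_gb, "| 16-bit fits:", bf16_gb < node_memory_gb)
```

Under these assumptions the FP8 weights leave headroom on one node, while 16-bit weights alone would already exceed it.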

Llama 3.1 Nemotron Instruct (70B) — nvidia/llama-3.1-nemotron-70b-instruct

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. It was trained using RLHF (specifically, REINFORCE), Llama-3.1-Nemotron-70B-Reward and HelpSteer2-Preference prompts on a Llama-3.1-70B-Instruct model. (paper)

OpenAI

GPT-2 (1.5B) — openai/gpt2

GPT-2 (1.5B parameters) is a transformer model trained on a large corpus of English text in a self-supervised fashion (paper).

davinci-002 — openai/davinci-002

Replacement for the GPT-3 curie and davinci base models.

babbage-002 — openai/babbage-002

Replacement for the GPT-3 ada and babbage base models.

GPT-3.5 Turbo Instruct — openai/gpt-3.5-turbo-instruct

Similar capabilities to GPT-3-era models. Compatible with the legacy Completions endpoint but not with Chat Completions.

GPT-3.5 Turbo (0301) — openai/gpt-3.5-turbo-0301

Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-03-01.

GPT-3.5 Turbo (0613) — openai/gpt-3.5-turbo-0613

Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-06-13.

GPT-3.5 Turbo (1106) — openai/gpt-3.5-turbo-1106

Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-11-06.

GPT-3.5 Turbo (0125) — openai/gpt-3.5-turbo-0125

Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2024-01-25.

gpt-3.5-turbo-16k-0613 — openai/gpt-3.5-turbo-16k-0613

Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-06-13 with a longer context length of 16,384 tokens.

GPT-4 Turbo (1106 preview) — openai/gpt-4-1106-preview

GPT-4 Turbo (preview) is a large multimodal model that is optimized for chat but works well for traditional completions tasks. The model is cheaper and faster than the original GPT-4 model. Preview snapshot from 2023-11-06.

GPT-4 (0314) — openai/gpt-4-0314

GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 from 2023-03-14.

gpt-4-32k-0314 — openai/gpt-4-32k-0314

GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 with a longer context length of 32,768 tokens from 2023-03-14.

GPT-4 (0613) — openai/gpt-4-0613

GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 from 2023-06-13.

gpt-4-32k-0613 — openai/gpt-4-32k-0613

GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 with a longer context length of 32,768 tokens from 2023-06-13.

GPT-4 Turbo (0125 preview) — openai/gpt-4-0125-preview

GPT-4 Turbo (preview) is a large multimodal model that is optimized for chat but works well for traditional completions tasks. The model is cheaper and faster than the original GPT-4 model. Preview snapshot from 2024-01-25. This snapshot is intended to reduce cases of “laziness” where the model doesn’t complete a task.

GPT-4 Turbo (2024-04-09) — openai/gpt-4-turbo-2024-04-09

GPT-4 Turbo (2024-04-09) is a large multimodal model that is optimized for chat but works well for traditional completions tasks. The model is cheaper and faster than the original GPT-4 model. Snapshot from 2024-04-09.

GPT-4o (2024-05-13) — openai/gpt-4o-2024-05-13

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o (2024-08-06) — openai/gpt-4o-2024-08-06

GPT-4o (2024-08-06) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o mini (2024-07-18) — openai/gpt-4o-mini-2024-07-18

GPT-4o mini (2024-07-18) is a multimodal model with a context window of 128K tokens and improved handling of non-English text. (blog)

o1-preview (2024-09-12) — openai/o1-preview-2024-09-12

o1-preview is a language model trained with reinforcement learning to perform complex reasoning that can produce a long internal chain of thought before responding to the user. (model card, blog post)

o1-mini (2024-09-12) — openai/o1-mini-2024-09-12

o1-mini is a cost-effective reasoning model for applications that require reasoning without broad world knowledge. (model card, blog post)
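The identifiers in this listing follow an `organization/model-name` pattern, and several OpenAI entries above end in a `YYYY-MM-DD` snapshot date. A small sketch of splitting such an identifier and recovering the date when it is spelled out in full; `ModelId` and `parse_model_id` are hypothetical helpers, not part of any library referenced here, and short `MMDD` suffixes such as `0613` are deliberately left unparsed because they do not carry a year.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class ModelId(NamedTuple):
    organization: str
    name: str
    snapshot: Optional[date]  # None when the name has no full YYYY-MM-DD suffix


_SNAPSHOT_SUFFIX = re.compile(r"(\d{4})-(\d{2})-(\d{2})$")


def parse_model_id(model_id: str) -> ModelId:
    """Split 'organization/name' and extract a trailing YYYY-MM-DD date if present."""
    organization, name = model_id.split("/", 1)
    match = _SNAPSHOT_SUFFIX.search(name)
    snapshot = date(*map(int, match.groups())) if match else None
    return ModelId(organization, name, snapshot)


print(parse_model_id("openai/gpt-4o-2024-05-13"))
print(parse_model_id("openai/gpt-4-0613"))
```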

OpenThaiGPT

OpenThaiGPT v1.0.0 (7B) — openthaigpt/openthaigpt-1.0.0-7b-chat

OpenThaiGPT v1.0.0 (7B) is a Thai language chat model based on Llama 2 that has been specifically fine-tuned for Thai instructions and enhanced by incorporating over 10,000 of the most commonly used Thai words into the dictionary. (blog post)

OpenThaiGPT v1.0.0 (13B) — openthaigpt/openthaigpt-1.0.0-13b-chat

OpenThaiGPT v1.0.0 (13B) is a Thai language chat model based on Llama 2 that has been specifically fine-tuned for Thai instructions and enhanced by incorporating over 10,000 of the most commonly used Thai words into the dictionary. (blog post)

OpenThaiGPT v1.0.0 (70B) — openthaigpt/openthaigpt-1.0.0-70b-chat

OpenThaiGPT v1.0.0 (70B) is a Thai language chat model based on Llama 2 that has been specifically fine-tuned for Thai instructions and enhanced by incorporating over 10,000 of the most commonly used Thai words into the dictionary. (blog post)

Qwen

Qwen — qwen/qwen-7b

7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 (7B) — qwen/qwen1.5-7b

7B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 (14B) — qwen/qwen1.5-14b

14B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 (32B) — qwen/qwen1.5-32b

32B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. The 32B version also includes grouped query attention (GQA). (blog)

Qwen1.5 (72B) — qwen/qwen1.5-72b

72B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 Chat (7B) — qwen/qwen1.5-7b-chat

7B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 Chat (14B) — qwen/qwen1.5-14b-chat

14B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 Chat (32B) — qwen/qwen1.5-32b-chat

32B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. The 32B version also includes grouped query attention (GQA). (blog)

Qwen1.5 Chat (72B) — qwen/qwen1.5-72b-chat

72B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)

Qwen1.5 Chat (110B) — qwen/qwen1.5-110b-chat

110B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. The 110B version also includes grouped query attention (GQA). (blog)

Qwen2 Instruct (72B) — qwen/qwen2-72b-instruct

72B-parameter chat version of the large language model series, Qwen2. Qwen2 uses Grouped-Query Attention (GQA) and has extended context length support up to 128K tokens. (blog)
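To make the GQA idea concrete: several query heads share one key-value head, which shrinks the KV cache compared with standard multi-head attention. The sketch below only shows the head-to-group mapping; the head counts (8 query heads, 2 key-value heads) and the function name are hypothetical and are not Qwen2's actual configuration.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Map a query head index to the key-value head it shares under GQA."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads  # query heads per shared KV head
    return q_head // group_size


# Hypothetical configuration: 8 query heads sharing 2 key-value heads.
mapping = [kv_head_for_query_head(h, num_q_heads=8, num_kv_heads=2) for h in range(8)]
print(mapping)
```

Setting `num_kv_heads` equal to `num_q_heads` recovers ordinary multi-head attention, and setting it to 1 gives multi-query attention.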

Qwen2.5 Instruct Turbo (7B) — qwen/qwen2.5-7b-instruct-turbo

Qwen2.5 Instruct Turbo (7B) was trained on 18 trillion tokens and supports 29 languages, and shows improvements over Qwen2 in knowledge, coding, mathematics, instruction following, generating long texts, and processing structured data. (blog) Turbo is Together's cost-efficient implementation, providing fast FP8 performance while maintaining quality, closely matching FP16 reference models. (blog)

Qwen2.5 Instruct Turbo (72B) — qwen/qwen2.5-72b-instruct-turbo

Qwen2.5 Instruct Turbo (72B) was trained on 18 trillion tokens and supports 29 languages, and shows improvements over Qwen2 in knowledge, coding, mathematics, instruction following, generating long texts, and processing structured data. (blog) Turbo is Together's cost-efficient implementation, providing fast FP8 performance while maintaining quality, closely matching FP16 reference models. (blog)

SAIL

Sailor (7B) — sail/sailor-7b

Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)

Sailor Chat (7B) — sail/sailor-7b-chat

Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)

Sailor (14B) — sail/sailor-14b

Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)

Sailor Chat (14B) — sail/sailor-14b-chat

Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)

SambaLingo

SambaLingo-Thai-Base — sambanova/sambalingo-thai-base

SambaLingo-Thai-Base is a pretrained bi-lingual Thai and English model that adapts Llama 2 (7B) to Thai by training on 38 billion tokens from the Thai split of the Cultura-X dataset. (paper)

SambaLingo-Thai-Chat — sambanova/sambalingo-thai-chat

SambaLingo-Thai-Chat is a chat model trained using direct preference optimization on SambaLingo-Thai-Base. SambaLingo-Thai-Base adapts Llama 2 (7B) to Thai by training on 38 billion tokens from the Thai split of the Cultura-X dataset. (paper)

SambaLingo-Thai-Base-70B — sambanova/sambalingo-thai-base-70b

SambaLingo-Thai-Base-70B is a pretrained bi-lingual Thai and English model that adapts Llama 2 (70B) to Thai by training on 26 billion tokens from the Thai split of the Cultura-X dataset. (paper)

SambaLingo-Thai-Chat-70B — sambanova/sambalingo-thai-chat-70b

SambaLingo-Thai-Chat-70B is a chat model trained using direct preference optimization on SambaLingo-Thai-Base-70B. SambaLingo-Thai-Base-70B adapts Llama 2 (70B) to Thai by training on 26 billion tokens from the Thai split of the Cultura-X dataset. (paper)

SCB10X

Typhoon (7B) — scb10x/typhoon-7b

Typhoon (7B) is a pretrained Thai large language model with 7 billion parameters based on Mistral 7B. (paper)

Typhoon v1.5 (8B) — scb10x/typhoon-v1.5-8b

Typhoon v1.5 (8B) is a pretrained Thai large language model with 8 billion parameters based on Llama 3 8B. (blog)

Typhoon v1.5 Instruct (8B) — scb10x/typhoon-v1.5-8b-instruct

Typhoon v1.5 Instruct (8B) is a pretrained Thai large language model with 8 billion parameters based on Llama 3 8B. (blog)

Typhoon v1.5 (72B) — scb10x/typhoon-v1.5-72b

Typhoon v1.5 (72B) is a pretrained Thai large language model with 72 billion parameters based on Qwen1.5-72B. (blog)

Typhoon v1.5 Instruct (72B) — scb10x/typhoon-v1.5-72b-instruct

Typhoon v1.5 Instruct (72B) is a pretrained Thai large language model with 72 billion parameters based on Qwen1.5-72B. (blog)

Typhoon 1.5X instruct (8B) — scb10x/llama-3-typhoon-v1.5x-8b-instruct

Llama-3-Typhoon-1.5X-8B-instruct is an 8 billion parameter instruct model designed for the Thai language based on Llama 3 Instruct. It utilizes the task-arithmetic model editing technique. (blog)

Typhoon 1.5X instruct (70B) — scb10x/llama-3-typhoon-v1.5x-70b-instruct

Llama-3-Typhoon-1.5X-70B-instruct is a 70 billion parameter instruct model designed for the Thai language based on Llama 3 Instruct. It utilizes the task-arithmetic model editing technique. (blog)
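A minimal sketch of task arithmetic in general terms: a "task vector" is the difference between fine-tuned and base weights, and scaled task vectors are added back onto a base model. The scalar "weights", the parameter names, and the scale value are toy assumptions; how SCB10X actually combined models is not described here.

```python
from typing import Dict, List

Weights = Dict[str, float]  # toy stand-in: one scalar per named parameter


def task_vector(base: Weights, tuned: Weights) -> Weights:
    """Difference between a fine-tuned model and the base it started from."""
    return {name: tuned[name] - base[name] for name in base}


def apply_task_vectors(base: Weights, vectors: List[Weights], scale: float = 1.0) -> Weights:
    """Add scaled task vectors onto the base weights."""
    edited = dict(base)
    for vector in vectors:
        for name, delta in vector.items():
            edited[name] += scale * delta
    return edited


# Hypothetical toy models sharing the same base.
base = {"w": 1.0, "b": 0.0}
tuned_thai = {"w": 1.5, "b": 0.25}
tuned_instruct = {"w": 0.75, "b": 0.5}

merged = apply_task_vectors(
    base,
    [task_vector(base, tuned_thai), task_vector(base, tuned_instruct)],
)
print(merged)
```

In practice the same arithmetic is applied tensor by tensor, and the scale is tuned on held-out data.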

Alibaba DAMO Academy

SeaLLM v2 (7B) — damo/seallm-7b-v2

SeaLLM v2 is a multilingual LLM for Southeast Asian (SEA) languages trained from Mistral (7B). (website)

SeaLLM v2.5 (7B) — damo/seallm-7b-v2.5

SeaLLM is a multilingual LLM for Southeast Asian (SEA) languages trained from Gemma (7B). (website)

Snowflake

Arctic Instruct — snowflake/snowflake-arctic-instruct

Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, resulting in 480B total and 17B active parameters chosen using top-2 gating.
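The parameter counts quoted above can be sanity-checked with simple arithmetic: every token passes through the dense part, while only the top-2 of the 128 experts are active. The helper name is hypothetical, and the calculation assumes the quoted sizes add up without further shared components.

```python
def moe_param_counts(dense_b: float, num_experts: int, expert_b: float, top_k: int):
    """Return (total, active) parameter counts in billions for a dense + MoE hybrid."""
    total = dense_b + num_experts * expert_b
    active = dense_b + top_k * expert_b
    return total, active


total_b, active_b = moe_param_counts(dense_b=10, num_experts=128, expert_b=3.66, top_k=2)
print(f"total ~ {total_b:.2f}B, active ~ {active_b:.2f}B")
```

This gives roughly 478B total and 17.3B active parameters, consistent with the rounded 480B and 17B figures in the description.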

Stability AI

StableLM-Base-Alpha (3B) — stabilityai/stablelm-base-alpha-3b

StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models pre-trained on a diverse collection of English datasets with a sequence length of 4096 to push beyond the context window limitations of existing open-source language models.

StableLM-Base-Alpha (7B) — stabilityai/stablelm-base-alpha-7b

StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models pre-trained on a diverse collection of English datasets with a sequence length of 4096 to push beyond the context window limitations of existing open-source language models.

Stanford

Alpaca (7B) — stanford/alpaca-7b

Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations.

TII UAE

Falcon (7B) — tiiuae/falcon-7b

Falcon-7B is a 7B parameters causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora.

Falcon-Instruct (7B) — tiiuae/falcon-7b-instruct

Falcon-7B-Instruct is a 7B parameters causal decoder-only model built by TII based on Falcon-7B and finetuned on a mixture of chat/instruct datasets.

Falcon (40B) — tiiuae/falcon-40b

Falcon-40B is a 40B parameters causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora.

Falcon-Instruct (40B) — tiiuae/falcon-40b-instruct

Falcon-40B-Instruct is a 40B parameters causal decoder-only model built by TII based on Falcon-40B and finetuned on a mixture of chat/instruct datasets.

Together

GPT-JT (6B) — together/gpt-jt-6b-v1

GPT-JT (6B parameters) is a fork of GPT-J (blog post).

GPT-NeoXT-Chat-Base (20B) — together/gpt-neoxt-chat-base-20b

GPT-NeoXT-Chat-Base (20B) is fine-tuned from GPT-NeoX, serving as a base model for developing open-source chatbots.

RedPajama-INCITE-Base-v1 (3B) — together/redpajama-incite-base-3b-v1

RedPajama-INCITE-Base-v1 (3B parameters) is a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible.

RedPajama-INCITE-Instruct-v1 (3B) — together/redpajama-incite-instruct-3b-v1

RedPajama-INCITE-Instruct-v1 (3B parameters) is a model fine-tuned for few-shot applications on the data of GPT-JT. It is built from RedPajama-INCITE-Base-v1 (3B), a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible.

RedPajama-INCITE-Base (7B) — together/redpajama-incite-base-7b

RedPajama-INCITE-Base (7B parameters) is a 7 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible.

RedPajama-INCITE-Instruct (7B) — together/redpajama-incite-instruct-7b

RedPajama-INCITE-Instruct (7B parameters) is a model fine-tuned for few-shot applications on the data of GPT-JT. It is built from RedPajama-INCITE-Base (7B), a 7 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible.

Upstage

Solar Pro Preview (22B) — upstage/solar-pro-preview-instruct

Solar Pro Preview (22B) is an open-weights model for single GPU inference that is a preview of the upcoming Solar Pro model (blog).

Solar Pro — upstage/solar-pro-241126

Solar Pro is an LLM designed for instruction following and for processing structured formats like HTML and Markdown. It supports English, Korean, and Japanese and has domain expertise in Finance, Healthcare, and Legal. (blog)

Writer

Palmyra Base (5B) — writer/palmyra-base

Palmyra Base (5B)

Palmyra Large (20B) — writer/palmyra-large

Palmyra Large (20B)

Silk Road (35B) — writer/silk-road

Silk Road (35B)

Palmyra X (43B) — writer/palmyra-x

Palmyra-X (43B parameters) is trained to adhere to instructions using human feedback and utilizes a technique called multiquery attention. Furthermore, a new feature called 'self-instruct' has been introduced, which includes the implementation of an early stopping criteria specifically designed for minimal instruction tuning (paper).

Palmyra X V2 (33B) — writer/palmyra-x-v2

Palmyra-X V2 (33B parameters) is a Transformer-based model, which is trained on extremely large-scale pre-training data. The pre-training data, more than 2 trillion tokens, is diverse and covers a wide range of areas. The model uses FlashAttention-2.

Palmyra X V3 (72B) — writer/palmyra-x-v3

Palmyra-X V3 (72B parameters) is a Transformer-based model, which is trained on extremely large-scale pre-training data. It is trained via unsupervised learning and DPO and uses multiquery attention.

Palmyra X-32K (33B) — writer/palmyra-x-32k

Palmyra-X-32K (33B parameters) is a Transformer-based model, which is trained on large-scale pre-training data. The pre-training data types are diverse and cover a wide range of areas. These data types are used in conjunction with the alignment mechanism to extend the context window.

Palmyra-X-004 — writer/palmyra-x-004

Palmyra-X-004 is a language model with a large context window of up to 128,000 tokens that excels in processing and understanding complex tasks.

Palmyra-Med 32K (70B) — writer/palmyra-med-32k

Palmyra-Med 32K (70B) is a model finetuned from Palmyra-X-003 intended for medical applications.

Palmyra-Med (70B) — writer/palmyra-med

Palmyra-Med (70B) is a model finetuned from Palmyra-X-003 intended for medical applications.

Palmyra-Fin 32K (70B) — writer/palmyra-fin-32k

Palmyra-Fin 32K (70B) is a model finetuned from Palmyra-X-003 intended for financial applications.

xAI

Grok Beta — xai/grok-beta

Grok Beta is a model from xAI.

Yandex

YaLM (100B) — yandex/yalm

YaLM (100B parameters) is an autoregressive language model trained on English and Russian text (GitHub).

IBM

Granite 3.0 base (2B) — ibm-granite/granite-3.0-2b-base

Granite-3.0-2B-Base is a decoder-only language model to support a variety of text-to-text generation tasks.

Granite 3.0 Instruct (2B) — ibm-granite/granite-3.0-2b-instruct

Granite-3.0-2B-Instruct is a 2B parameter model finetuned from Granite-3.0-2B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets.

Granite 3.0 instruct (8B) — ibm-granite/granite-3.0-8b-instruct

Granite-3.0-8B-Instruct is an 8B parameter model finetuned from Granite-3.0-8B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets.

Granite 3.0 base (8B) — ibm-granite/granite-3.0-8b-base

Granite-3.0-8B-Base is a decoder-only language model to support a variety of text-to-text generation tasks.

Granite 3.0 A800M instruct (3B) — ibm-granite/granite-3.0-3b-a800m-instruct

Granite-3.0-3B-A800M-Instruct is a 3B parameter model finetuned from Granite-3.0-3B-A800M-Base-4K using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets.

Granite 3.0 A800M base (3B) — ibm-granite/granite-3.0-3b-a800m-base

Granite-3.0-3B-A800M-Base is a decoder-only language model to support a variety of text-to-text generation tasks.

Granite 3.0 A400M instruct (1B) — ibm-granite/granite-3.0-1b-a400m-instruct

Granite-3.0-1B-A400M-Instruct is a 1B parameter model finetuned from Granite-3.0-1B-A400M-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets.

Granite 3.0 A400M base (1B) — ibm-granite/granite-3.0-1b-a400m-base

Granite-3.0-1B-A400M-Base is a decoder-only language model to support a variety of text-to-text generation tasks. It is trained from scratch following a two-stage training strategy.

BigCode

SantaCoder (1.1B) — bigcode/santacoder

SantaCoder (1.1B parameters) is a model trained on the Python, Java, and JavaScript subset of The Stack (v1.1) (model card).

StarCoder (15.5B) — bigcode/starcoder

StarCoder (15.5B parameters) is a model trained on 80+ programming languages from The Stack (v1.2) (model card).

Google

Codey PaLM-2 (Bison) — google/code-bison@001

A model fine-tuned to generate code based on a natural language description of the desired code. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

Codey PaLM-2 (Bison) — google/code-bison@002

A model fine-tuned to generate code based on a natural language description of the desired code. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

Codey PaLM-2 (Bison) — google/code-bison-32k

Codey with a 32K context. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)

Vision-Language Models

Aleph Alpha

Luminous Base (13B) — AlephAlpha/luminous-base

Luminous Base (13B parameters) (docs)

Luminous Extended (30B) — AlephAlpha/luminous-extended

Luminous Extended (30B parameters) (docs)

Anthropic

Claude 3 Haiku (20240307) — anthropic/claude-3-haiku-20240307

Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3 Sonnet (20240229) — anthropic/claude-3-sonnet-20240229

Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3 Opus (20240229) — anthropic/claude-3-opus-20240229

Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3.5 Sonnet (20240620) — anthropic/claude-3-5-sonnet-20240620

Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost. (blog)

Claude 3.5 Sonnet (20241022) — anthropic/claude-3-5-sonnet-20241022

Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost (blog). This is an upgraded snapshot released on 2024-10-22 (blog).

Google

Gemini Pro Vision — google/gemini-pro-vision

Gemini Pro Vision is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.0 Pro Vision — google/gemini-1.0-pro-vision-001

Gemini 1.0 Pro Vision is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.5 Pro (001) — google/gemini-1.5-pro-001

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001) — google/gemini-1.5-flash-001

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (0409 preview) — google/gemini-1.5-pro-preview-0409

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (0514 preview) — google/gemini-1.5-pro-preview-0514

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (0514 preview) — google/gemini-1.5-flash-preview-0514

Gemini 1.5 Flash is a smaller Gemini model. It has a 1 million token context window and allows interleaving text, images, audio and video as inputs. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (blog)

Gemini 1.5 Pro (001, default safety) — google/gemini-1.5-pro-001-safety-default

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)

Gemini 1.5 Pro (001, BLOCK_NONE safety) — google/gemini-1.5-pro-001-safety-block-none

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001, default safety) — google/gemini-1.5-flash-001-safety-default

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)

Gemini 1.5 Flash (001, BLOCK_NONE safety) — google/gemini-1.5-flash-001-safety-block-none

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (002) — google/gemini-1.5-pro-002

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (002) — google/gemini-1.5-flash-002

Gemini 1.5 Flash is a smaller multimodal Gemini model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

PaliGemma (3B) Mix 224 — google/paligemma-3b-mix-224

PaliGemma is a versatile and lightweight vision-language model (VLM) inspired by PaLI-3 and based on open components such as the SigLIP vision model and the Gemma language model. Pre-trained with 224x224 input images and 128 token input/output text sequences. Finetuned on a mixture of downstream academic datasets. (blog)

PaliGemma (3B) Mix 448 — google/paligemma-3b-mix-448

PaliGemma is a versatile and lightweight vision-language model (VLM) inspired by PaLI-3 and based on open components such as the SigLIP vision model and the Gemma language model. Pre-trained with 448x448 input images and 512 token input/output text sequences. Finetuned on a mixture of downstream academic datasets. (blog)

HuggingFace

IDEFICS 2 (8B) — HuggingFaceM4/idefics2-8b

IDEFICS 2 (8B parameters) is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs (blog).

IDEFICS (9B) — HuggingFaceM4/idefics-9b

IDEFICS (9B parameters) is an open-source model based on DeepMind's Flamingo (blog).

IDEFICS-instruct (9B) — HuggingFaceM4/idefics-9b-instruct

IDEFICS-instruct (9B parameters) is the instruction-tuned version of IDEFICS 9B (blog).

IDEFICS (80B) — HuggingFaceM4/idefics-80b

IDEFICS (80B parameters) is an open-source model based on DeepMind's Flamingo (blog).

IDEFICS-instruct (80B) — HuggingFaceM4/idefics-80b-instruct

IDEFICS-instruct (80B parameters) is the instruction-tuned version of IDEFICS 80B (blog).

Meta

Llama 3.2 Vision Instruct Turbo (11B) — meta/llama-3.2-11b-vision-instruct-turbo

The Llama 3.2 Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes. (blog) Turbo is Together's implementation, providing a near negligible difference in quality from the reference implementation with faster performance and lower cost, currently using FP8 quantization. (blog)

Llama 3.2 Vision Instruct Turbo (90B) — meta/llama-3.2-90b-vision-instruct-turbo

The Llama 3.2 Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes. (blog) Turbo is Together's implementation, providing a near negligible difference in quality from the reference implementation with faster performance and lower cost, currently using FP8 quantization. (blog)

Microsoft

LLaVA 1.5 (7B) — microsoft/llava-1.5-7b-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

LLaVA 1.5 (13B) — microsoft/llava-1.5-13b-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

LLaVA 1.6 (7B) — uw-madison/llava-v1.6-vicuna-7b-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

LLaVA 1.6 (13B) — uw-madison/llava-v1.6-vicuna-13b-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

LLaVA 1.6 + Mistral (7B) — uw-madison/llava-v1.6-mistral-7b-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

LLaVA + Nous-Hermes-2-Yi-34B (34B) — uw-madison/llava-v1.6-34b-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

OpenFlamingo

OpenFlamingo (9B) — openflamingo/OpenFlamingo-9B-vitl-mpt7b

OpenFlamingo is an open-source implementation of DeepMind's Flamingo models. This 9B-parameter model uses a CLIP ViT-L/14 vision encoder and an MPT-7B language model (paper).

KAIST AI

LLaVA + Vicuna-v1.5 (13B) — kaistai/prometheus-vision-13b-v1.0-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. (paper)

Mistral AI

BakLLaVA v1 (7B) — mistralai/bakLlava-v1-hf

BakLLaVA v1 is a Mistral 7B base augmented with the LLaVA 1.5 architecture. (blog)

Mistral Pixtral (2409) — mistralai/pixtral-12b-2409

Mistral Pixtral 12B is the first multimodal Mistral model for image understanding. (blog)

Mistral Pixtral Large (2411) — mistralai/pixtral-large-2411

Mistral Pixtral Large is a 124B open-weights multimodal model built on top of Mistral Large 2 (2407). (blog)

OpenAI

GPT-4 Turbo (2024-04-09) — openai/gpt-4-turbo-2024-04-09

GPT-4 Turbo (2024-04-09) is a large multimodal model that is optimized for chat but works well for traditional completion tasks. The model is cheaper and faster than the original GPT-4 model. Snapshot from 2024-04-09.

GPT-4o (2024-05-13) — openai/gpt-4o-2024-05-13

GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o (2024-08-06) — openai/gpt-4o-2024-08-06

GPT-4o (2024-08-06) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)

GPT-4o mini (2024-07-18) — openai/gpt-4o-mini-2024-07-18

GPT-4o mini (2024-07-18) is a multimodal model with a context window of 128K tokens and improved handling of non-English text. (blog)

GPT-4V (1106 preview) — openai/gpt-4-vision-preview

GPT-4V is a large multimodal model that accepts both text and images and is optimized for chat (model card).

GPT-4V (1106 preview) — openai/gpt-4-1106-vision-preview

GPT-4V is a large multimodal model that accepts both text and images and is optimized for chat (model card).

Alibaba Cloud

Qwen-VL — qwen/qwen-vl

Visual multimodal version of the Qwen large language model series (paper).

Qwen-VL Chat — qwen/qwen-vl-chat

Chat version of Qwen-VL (paper).

Writer

Palmyra Vision 003 — writer/palmyra-vision-003

Palmyra Vision 003 (internal only)

Reka AI

Reka-Core — reka/reka-core

Reka-Core

Reka-Core-20240415 — reka/reka-core-20240415

Reka-Core-20240415

Reka-Core-20240501 — reka/reka-core-20240501

Reka-Core-20240501

Reka-Flash (21B) — reka/reka-flash

Reka-Flash (21B)

Reka-Flash-20240226 (21B) — reka/reka-flash-20240226

Reka-Flash-20240226 (21B)

Reka-Edge (7B) — reka/reka-edge

Reka-Edge (7B)

Reka-Edge-20240208 (7B) — reka/reka-edge-20240208

Reka-Edge-20240208 (7B)

Text-to-image Models

Adobe

GigaGAN (1B) — adobe/giga-gan

GigaGAN is a GAN model that produces high-quality images extremely quickly. The model was trained on text and image pairs from LAION2B-en and COYO-700M. (paper).

Aleph Alpha

MultiFusion (13B) — AlephAlpha/m-vader

MultiFusion is a multimodal, multilingual diffusion model that extends the capabilities of Stable Diffusion v1.4 by integrating different pre-trained modules, which transfer their capabilities to the downstream model (paper).

Craiyon

DALL-E mini (0.4B) — craiyon/dalle-mini

DALL-E mini is an open-source text-to-image model that attempts to reproduce OpenAI's DALL-E 1 (code).

DALL-E mega (2.6B) — craiyon/dalle-mega

DALL-E mega is an open-source text-to-image model that attempts to reproduce OpenAI's DALL-E 1 (code).

DeepFloyd

DeepFloyd IF Medium (0.4B) — DeepFloyd/IF-I-M-v1.0

DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model with state-of-the-art photorealism and language understanding (paper coming soon).

DeepFloyd IF Large (0.9B) — DeepFloyd/IF-I-L-v1.0

DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model with state-of-the-art photorealism and language understanding (paper coming soon).

DeepFloyd IF X-Large (4.3B) — DeepFloyd/IF-I-XL-v1.0

DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model with state-of-the-art photorealism and language understanding (paper coming soon).

dreamlike.art

Dreamlike Diffusion v1.0 (1B) — huggingface/dreamlike-diffusion-v1-0

Dreamlike Diffusion v1.0 is Stable Diffusion v1.5 fine-tuned on high-quality art (HuggingFace model card).

Dreamlike Photoreal v2.0 (1B) — huggingface/dreamlike-photoreal-v2-0

Dreamlike Photoreal v2.0 is a photorealistic model based on Stable Diffusion v1.5 (HuggingFace model card)

PromptHero

Openjourney (1B) — huggingface/openjourney-v1-0

Openjourney is an open-source Stable Diffusion model fine-tuned on Midjourney images (HuggingFace model card).

Openjourney v2 (1B) — huggingface/openjourney-v2-0

Openjourney v2 is an open-source Stable Diffusion model fine-tuned on Midjourney images. Openjourney v2 is now referred to as Openjourney v4 on Hugging Face (HuggingFace model card).

Microsoft

Promptist + Stable Diffusion v1.4 (1B) — huggingface/promptist-stable-diffusion-v1-4

Trained with human preferences, Promptist optimizes user input into model-preferred prompts for Stable Diffusion v1.4 (paper)

nitrosocke

Redshift Diffusion (1B) — huggingface/redshift-diffusion

Redshift Diffusion is an open-source Stable Diffusion model fine-tuned on high-resolution 3D artworks (HuggingFace model card).

TU Darmstadt

Safe Stable Diffusion weak (1B) — huggingface/stable-diffusion-safe-weak

Safe Stable Diffusion is an extension of Stable Diffusion that drastically reduces inappropriate content (paper).

Safe Stable Diffusion medium (1B) — huggingface/stable-diffusion-safe-medium

Safe Stable Diffusion is an extension of Stable Diffusion that drastically reduces inappropriate content (paper).

Safe Stable Diffusion strong (1B) — huggingface/stable-diffusion-safe-strong

Safe Stable Diffusion is an extension of Stable Diffusion that drastically reduces inappropriate content (paper).

Safe Stable Diffusion max (1B) — huggingface/stable-diffusion-safe-max

Safe Stable Diffusion is an extension of Stable Diffusion that drastically reduces inappropriate content (paper).

Ludwig Maximilian University of Munich CompVis

Stable Diffusion v1.4 (1B) — huggingface/stable-diffusion-v1-4

Stable Diffusion v1.4 is a latent text-to-image diffusion model capable of generating photorealistic images given any text input (paper)

Runway

Stable Diffusion v1.5 (1B) — huggingface/stable-diffusion-v1-5

The Stable-Diffusion-v1-5 checkpoint was initialized with the weights of the Stable-Diffusion-v1-2 checkpoint and subsequently fine-tuned for 595k steps at resolution 512x512 on laion-aesthetics v2 5+, with 10% dropping of the text-conditioning to improve classifier-free guidance sampling (paper).

Stability AI

Stable Diffusion v2 base (1B) — huggingface/stable-diffusion-v2-base

The model is trained from scratch for 550k steps at resolution 256x256 on a subset of LAION-5B filtered to remove explicit pornographic material, using the LAION-NSFW classifier with punsafe=0.1 and an aesthetic score greater than 4.5. It is then further trained for 850k steps at resolution 512x512 on the same dataset, restricted to images with resolution greater than 512x512 (paper).

Stable Diffusion v2.1 base (1B) — huggingface/stable-diffusion-v2-1-base

This stable-diffusion-2-1-base model fine-tunes stable-diffusion-2-base for 220k extra steps with punsafe=0.98 on the same dataset (paper).

Stable Diffusion XL — stabilityai/stable-diffusion-xl-base-1.0

Stable Diffusion XL (SDXL) consists of an ensemble of experts pipeline for latent diffusion. (HuggingFace model card)

22 Hours

Vintedois (22h) Diffusion model v0.1 (1B) — huggingface/vintedois-diffusion-v0-1

Vintedois (22h) Diffusion model v0.1 is Stable Diffusion v1.5 fine-tuned on a large number of high-quality images with simple prompts, so that it generates beautiful images without a lot of prompt engineering (HuggingFace model card).

Segmind

Segmind Stable Diffusion (0.74B) — segmind/Segmind-Vega

The Segmind-Vega model is a distilled version of Stable Diffusion XL (SDXL), offering a 70% reduction in size and a 100% speedup while retaining high-quality text-to-image generation capabilities. Trained on diverse datasets, including Grit and Midjourney scrape data, it is designed to create a wide range of visual content from textual prompts. (HuggingFace model card)

Segmind Stable Diffusion (1B) — segmind/SSD-1B

The Segmind Stable Diffusion Model (SSD-1B) is a distilled version of Stable Diffusion XL (SDXL) that is 50% smaller and offers a 60% speedup while maintaining high-quality text-to-image generation capabilities. It has been trained on diverse datasets, including Grit and Midjourney scrape data, to enhance its ability to create a wide range of visual content from textual prompts. (HuggingFace model card)
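The speedup percentages quoted for the two Segmind models can be easy to misread, so here is a minimal arithmetic sketch. It assumes that "N% speedup" means an N% increase in throughput relative to SDXL; the model cards may define it differently, and the function name `speedup_to_factors` is hypothetical, introduced only for illustration.

```python
def speedup_to_factors(speedup_percent: float) -> tuple[float, float]:
    """Convert an 'N% speedup' claim into (throughput factor, latency fraction).

    Assumption: the speedup is a relative increase in throughput, so a
    100% speedup means twice as many images in the same amount of time.
    """
    if speedup_percent <= -100:
        raise ValueError("speedup_percent must be greater than -100")
    throughput_factor = 1.0 + speedup_percent / 100.0
    # Time per image is the reciprocal of throughput.
    latency_fraction = 1.0 / throughput_factor
    return throughput_factor, latency_fraction


if __name__ == "__main__":
    # Percentages taken from the Segmind-Vega and SSD-1B descriptions above.
    for name, percent in [("Segmind-Vega", 100.0), ("SSD-1B", 60.0)]:
        throughput, latency = speedup_to_factors(percent)
        print(f"{name}: {throughput:.2f}x throughput, "
              f"{latency:.3f}x the time per image")
```

Under this reading, a 100% speedup halves the time per image, while a 60% speedup reduces it to roughly five eighths of the original.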

Kakao

minDALL-E (1.3B) — kakaobrain/mindall-e

minDALL-E, named after minGPT, is an autoregressive text-to-image generation model trained on 14 million image-text pairs (code)

Lexica

Lexica Search with Stable Diffusion v1.5 (1B) — lexica/search-stable-diffusion-1.5

Retrieves Stable Diffusion v1.5 images generated by Lexica users (docs).

OpenAI

DALL-E 2 (3.5B) — openai/dall-e-2

DALL-E 2 is an encoder-decoder-based latent diffusion model trained on large-scale paired text-image datasets. The model is available via the OpenAI API (paper).

DALL-E 3 — openai/dall-e-3

DALL-E 3 is a text-to-image generation model built natively on ChatGPT, which is used to engineer prompts automatically. The default style, vivid, causes the model to lean towards generating hyper-real and dramatic images. The model is available via the OpenAI API (paper).

DALL-E 3 (natural style) — openai/dall-e-3-natural

DALL-E 3 is a text-to-image generation model built natively on ChatGPT, which is used to engineer prompts automatically. The natural style causes the model to produce more natural, less hyper-real looking images. The model is available via the OpenAI API (paper).

DALL-E 3 HD — openai/dall-e-3-hd

DALL-E 3 is a text-to-image generation model built natively on ChatGPT, which is used to engineer prompts automatically. The HD version creates images with finer details and greater consistency across the image, but generation is slower. The default style, vivid, causes the model to lean towards generating hyper-real and dramatic images. The model is available via the OpenAI API (paper).

DALL-E 3 HD (natural style) — openai/dall-e-3-hd-natural

DALL-E 3 is a text-to-image generation model built natively on ChatGPT, which is used to engineer prompts automatically. The HD version creates images with finer details and greater consistency across the image, but generation is slower. The natural style causes the model to produce more natural, less hyper-real looking images. The model is available via the OpenAI API (paper).

Tsinghua

CogView2 (6B) — thudm/cogview2

CogView2 is a hierarchical transformer (6B-9B-9B parameters) for text-to-image generation that supports both English and Chinese input text (paper)

Audio-Language Models

Google

Gemini 1.5 Pro (001) — google/gemini-1.5-pro-001

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001) — google/gemini-1.5-flash-001

Gemini 1.5 Flash is a smaller multimodal Gemini model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (002) — google/gemini-1.5-pro-002

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (002) — google/gemini-1.5-flash-002

Gemini 1.5 Flash is a smaller multimodal Gemini model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

OpenAI

GPT-4o Audio (Preview 2024-10-01) — openai/gpt-4o-audio-preview-2024-10-01

GPT-4o Audio (Preview 2024-10-01) is a preview model that allows using audio inputs to prompt the model (documentation).

Alibaba Cloud

Qwen-Audio Chat — qwen/qwen-audio-chat

Auditory multimodal version of the Qwen large language model series (paper).

Qwen2-Audio Instruct — qwen/qwen2-audio-instruct

The second version of the auditory multimodal version of the Qwen large language model series (paper).

Stanford

Diva Llama 3 (8B) — stanford/diva-llama

Diva Llama 3 is an end-to-end Voice Assistant Model which can handle speech and text as inputs. It was trained using distillation loss. (paper)

ICTNLP

LLaMA-Omni (8B) — ictnlp/llama-3.1-8b-omni

The audio-visual multimodal version of the LLaMA 3.1 model (paper).