Models
Text Models
AI21 Labs
J1-Jumbo v1 (178B) — ai21/j1-jumbo
Jurassic-1 Jumbo (178B parameters) (docs, tech report).
J1-Large v1 (7.5B) — ai21/j1-large
Jurassic-1 Large (7.5B parameters) (docs, tech report).
J1-Grande v1 (17B) — ai21/j1-grande
Jurassic-1 Grande (17B parameters) with a "few tweaks" to the training process (docs, tech report).
J1-Grande v2 beta (17B) — ai21/j1-grande-v2-beta
Jurassic-1 Grande v2 beta (17B parameters)
Jurassic-2 Large (7.5B) — ai21/j2-large
Jurassic-2 Large (7.5B parameters) (docs)
Jurassic-2 Grande (17B) — ai21/j2-grande
Jurassic-2 Grande (17B parameters) (docs)
Jurassic-2 Jumbo (178B) — ai21/j2-jumbo
Jurassic-2 Jumbo (178B parameters) (docs)
Jamba Instruct — ai21/jamba-instruct
Jamba Instruct is an instruction-tuned version of Jamba, which uses a hybrid Transformer-Mamba mixture-of-experts (MoE) architecture that interleaves blocks of Transformer and Mamba layers. (blog)
Jamba 1.5 Mini — ai21/jamba-1.5-mini
Jamba 1.5 Mini is a long-context, hybrid SSM-Transformer instruction-following foundation model that is optimized for function calling, structured output, and grounded generation. (blog)
Jamba 1.5 Large — ai21/jamba-1.5-large
Jamba 1.5 Large is a long-context, hybrid SSM-Transformer instruction-following foundation model that is optimized for function calling, structured output, and grounded generation. (blog)
AI Singapore
SEA-LION (7B) — aisingapore/sea-lion-7b
SEA-LION is a collection of language models which has been pretrained and instruct-tuned on languages from the Southeast Asia region. It utilizes the MPT architecture and a custom SEABPETokenizer for tokenization.
SEA-LION Instruct (7B) — aisingapore/sea-lion-7b-instruct
SEA-LION is a collection of language models which has been pretrained and instruct-tuned on languages from the Southeast Asia region. It utilizes the MPT architecture and a custom SEABPETokenizer for tokenization.
Aleph Alpha
Luminous Base (13B) — AlephAlpha/luminous-base
Luminous Base (13B parameters) (docs)
Luminous Extended (30B) — AlephAlpha/luminous-extended
Luminous Extended (30B parameters) (docs)
Luminous Supreme (70B) — AlephAlpha/luminous-supreme
Luminous Supreme (70B parameters) (docs)
Amazon
Amazon Titan Text Lite — amazon/titan-text-lite-v1
Amazon Titan Text Lite is a lightweight, efficient model perfect for fine-tuning English-language tasks like summarization and copywriting. It caters to customers seeking a smaller, cost-effective, and highly customizable model. It supports various formats, including text generation, code generation, rich text formatting, and orchestration (agents). Key model attributes encompass fine-tuning, text generation, code generation, and rich text formatting.
Amazon Titan Large — amazon/titan-tg1-large
Amazon Titan Large is an efficient model perfect for fine-tuning English-language tasks like summarization, article creation, and marketing campaigns.
Amazon Titan Text Express — amazon/titan-text-express-v1
Amazon Titan Text Express, with a context length of up to 8,000 tokens, excels in advanced language tasks like open-ended text generation and conversational chat. It's also optimized for Retrieval Augmented Generation (RAG). Initially designed for English, the model offers preview multilingual support for over 100 additional languages.
Anthropic
Claude v1.3 — anthropic/claude-v1.3
A 52B parameter language model, trained using reinforcement learning from human feedback (paper).
Claude Instant V1 — anthropic/claude-instant-v1
A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs).
Claude Instant 1.2 — anthropic/claude-instant-1.2
A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs).
Claude 2.0 — anthropic/claude-2.0
Claude 2.0 is a general purpose large language model developed by Anthropic. It uses a transformer architecture and is trained via unsupervised learning, RLHF, and Constitutional AI (including both a supervised and Reinforcement Learning (RL) phase). (model card)
Claude 2.1 — anthropic/claude-2.1
Claude 2.1 is a general purpose large language model developed by Anthropic. It uses a transformer architecture and is trained via unsupervised learning, RLHF, and Constitutional AI (including both a supervised and Reinforcement Learning (RL) phase). (model card)
Claude 3 Haiku (20240307) — anthropic/claude-3-haiku-20240307
Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).
Claude 3 Sonnet (20240229) — anthropic/claude-3-sonnet-20240229
Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).
Claude 3 Opus (20240229) — anthropic/claude-3-opus-20240229
Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).
Claude 3.5 Sonnet (20240620) — anthropic/claude-3-5-sonnet-20240620
Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost. (blog)
Anthropic-LM v4-s3 (52B) — anthropic/stanford-online-all-v4-s3
A 52B parameter language model, trained using reinforcement learning from human feedback (paper).
BigScience
BLOOM (176B) — bigscience/bloom
BLOOM (176B parameters) is an autoregressive model trained on 46 natural languages and 13 programming languages (paper).
T0pp (11B) — bigscience/t0pp
T0pp (11B parameters) is an encoder-decoder model trained on a large set of different tasks specified in natural language prompts (paper).
BioMistral
BioMistral (7B) — biomistral/biomistral-7b
BioMistral 7B is an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central.
Cohere
Cohere xlarge v20220609 (52.4B) — cohere/xlarge-20220609
Cohere xlarge v20220609 (52.4B parameters)
Cohere large v20220720 (13.1B) — cohere/large-20220720
Cohere large v20220720 (13.1B parameters), which is deprecated by Cohere as of December 2, 2022.
Cohere medium v20220720 (6.1B) — cohere/medium-20220720
Cohere medium v20220720 (6.1B parameters)
Cohere small v20220720 (410M) — cohere/small-20220720
Cohere small v20220720 (410M parameters), which is deprecated by Cohere as of December 2, 2022.
Cohere xlarge v20221108 (52.4B) — cohere/xlarge-20221108
Cohere xlarge v20221108 (52.4B parameters)
Cohere medium v20221108 (6.1B) — cohere/medium-20221108
Cohere medium v20221108 (6.1B parameters)
Command beta (6.1B) — cohere/command-medium-beta
Command beta (6.1B parameters) is fine-tuned from the medium model to respond well to instruction-like prompts (details).
Command beta (52.4B) — cohere/command-xlarge-beta
Command beta (52.4B parameters) is fine-tuned from the XL model to respond well to instruction-like prompts (details).
Command — cohere/command
Command is Cohere’s flagship text generation model. It is trained to follow user commands and to be instantly useful in practical business applications. (docs, changelog)
Command Light — cohere/command-light
Command is Cohere’s flagship text generation model. It is trained to follow user commands and to be instantly useful in practical business applications. (docs, changelog)
Command R — cohere/command-r
Command R is a multilingual 35B parameter model with a context length of 128K that has been trained with conversational tool use capabilities.
Command R Plus — cohere/command-r-plus
Command R+ is a multilingual 104B parameter model with a context length of 128K that has been trained with conversational tool use capabilities.
Databricks
Dolly V2 (3B) — databricks/dolly-v2-3b
Dolly V2 (3B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-2.8b.
Dolly V2 (7B) — databricks/dolly-v2-7b
Dolly V2 (7B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-6.9b.
Dolly V2 (12B) — databricks/dolly-v2-12b
Dolly V2 (12B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-12b.
DBRX Instruct — databricks/dbrx-instruct
DBRX is a large language model with a fine-grained mixture-of-experts (MoE) architecture that uses 16 experts and chooses 4. It has 132B total parameters, of which 36B parameters are active on any input. (blog post)
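The figures in the DBRX entry above can be checked with a little arithmetic. The sketch below uses only the numbers quoted in that description (16 experts, 4 chosen, 132B total and 36B active parameters); the variable names are illustrative and are not taken from any DBRX code.

```python
from math import comb

# Figures quoted in the DBRX description above (billions of parameters).
TOTAL_PARAMS_B = 132
ACTIVE_PARAMS_B = 36
NUM_EXPERTS = 16
EXPERTS_PER_TOKEN = 4

# Share of the parameters that take part in processing any single input.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Number of distinct expert subsets the router can select for one token.
expert_subsets = comb(NUM_EXPERTS, EXPERTS_PER_TOKEN)

print(f"active fraction: {active_fraction:.1%}")              # active fraction: 27.3%
print(f"expert subsets per token: {expert_subsets}")          # expert subsets per token: 1820
```

So roughly a quarter of the weights are used per input, while the router has 1,820 possible expert combinations to choose from.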
DeepSeek
DeepSeek LLM Chat (67B) — deepseek-ai/deepseek-llm-67b-chat
DeepSeek LLM Chat is an open-source language model trained on 2 trillion tokens in both English and Chinese, and fine-tuned with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). (paper)
EleutherAI
GPT-J (6B) — eleutherai/gpt-j-6b
GPT-J (6B parameters) autoregressive language model trained on The Pile (details).
GPT-NeoX (20B) — eleutherai/gpt-neox-20b
GPT-NeoX (20B parameters) autoregressive language model trained on The Pile (paper).
Pythia (1B) — eleutherai/pythia-1b-v0
Pythia (1B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.
Pythia (2.8B) — eleutherai/pythia-2.8b-v0
Pythia (2.8B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.
Pythia (6.9B) — eleutherai/pythia-6.9b
Pythia (6.9B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.
Pythia (12B) — eleutherai/pythia-12b-v0
Pythia (12B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.
EPFL LLM
Meditron (7B) — epfl-llm/meditron-7b
Meditron-7B is a 7 billion parameter model adapted to the medical domain from Llama-2-7B through continued pretraining on a comprehensively curated medical corpus.
Google
T5 (11B) — google/t5-11b
T5 (11B parameters) is an encoder-decoder model trained on a multi-task mixture, where each task is converted into a text-to-text format (paper).
UL2 (20B) — google/ul2
UL2 (20B parameters) is an encoder-decoder model trained on the C4 corpus. It's similar to T5 but trained with a different objective and slightly different scaling knobs (paper).
Flan-T5 (11B) — google/flan-t5-xxl
Flan-T5 (11B parameters) is T5 fine-tuned on 1.8K tasks (paper).
Gemini Pro — google/gemini-pro
Gemini Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)
Gemini 1.0 Pro (001) — google/gemini-1.0-pro-001
Gemini 1.0 Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)
Gemini 1.0 Pro (002) — google/gemini-1.0-pro-002
Gemini 1.0 Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)
Gemini 1.5 Pro (001) — google/gemini-1.5-pro-001
Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)
Gemini 1.5 Flash (001) — google/gemini-1.5-flash-001
Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)
Gemini 1.5 Pro (0409 preview) — google/gemini-1.5-pro-preview-0409
Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)
Gemini 1.5 Pro (0514 preview) — google/gemini-1.5-pro-preview-0514
Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)
Gemini 1.5 Flash (0514 preview) — google/gemini-1.5-flash-preview-0514
Gemini 1.5 Flash is a smaller Gemini model. It has a 1 million token context window and allows interleaving text, images, audio and video as inputs. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (blog)
Gemini 1.5 Pro (001, default safety) — google/gemini-1.5-pro-001-safety-default
Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)
Gemini 1.5 Pro (001, BLOCK_NONE safety) — google/gemini-1.5-pro-001-safety-block-none
Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)
Gemini 1.5 Flash (001, default safety) — google/gemini-1.5-flash-001-safety-default
Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)
Gemini 1.5 Flash (001, BLOCK_NONE safety) — google/gemini-1.5-flash-001-safety-block-none
Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)
Gemma (2B) — google/gemma-2b
Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)
Gemma Instruct (2B) — google/gemma-2b-it
Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)
Gemma (7B) — google/gemma-7b
Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)
Gemma Instruct (7B) — google/gemma-7b-it
Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)
Gemma 2 (9B) — google/gemma-2-9b
Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)
Gemma 2 Instruct (9B) — google/gemma-2-9b-it
Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)
Gemma 2 (27B) — google/gemma-2-27b
Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)
Gemma 2 Instruct (27B) — google/gemma-2-27b-it
Gemma is a family of lightweight, open models built from the research and technology that Google used to create the Gemini models. (model card, blog post)
PaLM-2 (Bison) — google/text-bison@001
The best value PaLM model. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)
PaLM-2 (Bison) — google/text-bison@002
The best value PaLM model. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)
PaLM-2 (Bison) — google/text-bison-32k
The best value PaLM model with a 32K context. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)
PaLM-2 (Unicorn) — google/text-unicorn@001
The largest model in the PaLM family. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)
MedLM (Medium) — google/medlm-medium
MedLM is a family of foundation models fine-tuned for the healthcare industry based on Google Research's medically-tuned large language model, Med-PaLM 2. (documentation)
MedLM (Large) — google/medlm-large
MedLM is a family of foundation models fine-tuned for the healthcare industry based on Google Research's medically-tuned large language model, Med-PaLM 2. (documentation)
Lightning AI
Lit-GPT — lightningai/lit-gpt
Lit-GPT is an optimized collection of open-source LLMs for finetuning and inference. It supports Falcon, Llama 2, Vicuna, LongChat, and other top-performing open-source large language models.
LMSYS
Vicuna v1.3 (7B) — lmsys/vicuna-7b-v1.3
Vicuna v1.3 (7B) is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
Vicuna v1.3 (13B) — lmsys/vicuna-13b-v1.3
Vicuna v1.3 (13B) is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
Meta
OPT (175B) — meta/opt-175b
Open Pre-trained Transformers (175B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper).
OPT (66B) — meta/opt-66b
Open Pre-trained Transformers (66B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper).
OPT (6.7B) — meta/opt-6.7b
Open Pre-trained Transformers (6.7B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper).
OPT (1.3B) — meta/opt-1.3b
Open Pre-trained Transformers (1.3B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper).
LLaMA (7B) — meta/llama-7b
LLaMA is a collection of foundation language models ranging from 7B to 65B parameters.
LLaMA (13B) — meta/llama-13b
LLaMA is a collection of foundation language models ranging from 7B to 65B parameters.
LLaMA (30B) — meta/llama-30b
LLaMA is a collection of foundation language models ranging from 7B to 65B parameters.
LLaMA (65B) — meta/llama-65b
LLaMA is a collection of foundation language models ranging from 7B to 65B parameters.
Llama 2 (7B) — meta/llama-2-7b
Llama 2 pretrained models are trained on 2 trillion tokens and have double the context length of Llama 1.
Llama 2 (13B) — meta/llama-2-13b
Llama 2 pretrained models are trained on 2 trillion tokens and have double the context length of Llama 1.
Llama 2 (70B) — meta/llama-2-70b
Llama 2 pretrained models are trained on 2 trillion tokens and have double the context length of Llama 1.
Llama 3 (8B) — meta/llama-3-8b
Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. (paper)
Llama 3 (70B) — meta/llama-3-70b
Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. (paper)
Llama 3.1 Instruct Turbo (8B) — meta/llama-3.1-8b-instruct-turbo
Llama 3.1 (8B) is part of the Llama 3 family of dense Transformer models that natively support multilinguality, coding, reasoning, and tool usage. (paper, blog) Turbo is Together's implementation, providing a near negligible difference in quality from the reference implementation with faster performance and lower cost, currently using FP8 quantization. (blog)
Llama 3.1 Instruct Turbo (70B) — meta/llama-3.1-70b-instruct-turbo
Llama 3.1 (70B) is part of the Llama 3 family of dense Transformer models that natively support multilinguality, coding, reasoning, and tool usage. (paper, blog) Turbo is Together's implementation, providing a near negligible difference in quality from the reference implementation with faster performance and lower cost, currently using FP8 quantization. (blog)
Llama 3.1 Instruct Turbo (405B) — meta/llama-3.1-405b-instruct-turbo
Llama 3.1 (405B) is part of the Llama 3 family of dense Transformer models that natively support multilinguality, coding, reasoning, and tool usage. (paper, blog) Turbo is Together's implementation, providing a near negligible difference in quality from the reference implementation with faster performance and lower cost, currently using FP8 quantization. (blog)
Llama 3 Instruct (8B) — meta/llama-3-8b-chat
Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. It used SFT, rejection sampling, PPO and DPO for post-training. (paper)
Llama 3 Instruct (70B) — meta/llama-3-70b-chat
Llama 3 is a family of language models that have been trained on more than 15 trillion tokens, and use Grouped-Query Attention (GQA) for improved inference scalability. It used SFT, rejection sampling, PPO and DPO for post-training. (paper)
Llama Guard (7B) — meta/llama-guard-7b
Llama-Guard is a 7B parameter Llama 2-based input-output safeguard model. It can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM: it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if it is unsafe under a policy, it also lists the violating subcategories.
Llama Guard 2 (8B) — meta/llama-guard-2-8b
Llama Guard 2 is an 8B parameter Llama 3-based LLM safeguard model. Similar to Llama Guard, it can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.
Llama Guard 3 (8B) — meta/llama-guard-3-8b
Llama Guard 3 is an 8B parameter Llama 3.1-based LLM safeguard model. Similar to Llama Guard, it can be used for classifying content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.
Microsoft/NVIDIA
TNLG v2 (530B) — microsoft/TNLGv2_530B
TNLG v2 (530B parameters) autoregressive language model trained on a filtered subset of the Pile and CommonCrawl (paper).
TNLG v2 (6.7B) — microsoft/TNLGv2_7B
TNLG v2 (6.7B parameters) autoregressive language model trained on a filtered subset of the Pile and CommonCrawl (paper).
Microsoft
Phi-2 — microsoft/phi-2
Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value).
Phi-3 (7B) — microsoft/phi-3-small-8k-instruct
Phi-3-Small-8K-Instruct is a lightweight model trained with synthetic data and filtered publicly available website data with a focus on high-quality, reasoning-dense properties. (paper, blog)
Phi-3 (14B) — microsoft/phi-3-medium-4k-instruct
Phi-3-Medium-4K-Instruct is a lightweight model trained with synthetic data and filtered publicly available website data with a focus on high-quality, reasoning-dense properties. (paper, blog)
01.AI
Yi (6B) — 01-ai/yi-6b
The Yi models are large language models trained from scratch by developers at 01.AI.
Yi (34B) — 01-ai/yi-34b
The Yi models are large language models trained from scratch by developers at 01.AI.
Yi Chat (6B) — 01-ai/yi-6b-chat
The Yi models are large language models trained from scratch by developers at 01.AI.
Yi Chat (34B) — 01-ai/yi-34b-chat
The Yi models are large language models trained from scratch by developers at 01.AI.
Yi Large — 01-ai/yi-large
The Yi models are large language models trained from scratch by developers at 01.AI. (tweet)
Yi Large (Preview) — 01-ai/yi-large-preview
The Yi models are large language models trained from scratch by developers at 01.AI. (tweet)
Allen Institute for AI
OLMo (7B) — allenai/olmo-7b
OLMo is a series of Open Language Models trained on the Dolma dataset.
OLMo (7B Twin 2T) — allenai/olmo-7b-twin-2t
OLMo is a series of Open Language Models trained on the Dolma dataset.
OLMo (7B Instruct) — allenai/olmo-7b-instruct
OLMo is a series of Open Language Models trained on the Dolma dataset. The instruct version was trained on the Tulu SFT mixture and a cleaned version of the UltraFeedback dataset.
OLMo 1.7 (7B) — allenai/olmo-1.7-7b
OLMo is a series of Open Language Models trained on the Dolma dataset. The instruct version was trained on the Tulu SFT mixture and a cleaned version of the UltraFeedback dataset.
Mistral AI
Mistral v0.1 (7B) — mistralai/mistral-7b-v0.1
Mistral 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA) and Sliding-Window Attention (SWA). (blog post)
Mistral Instruct v0.1 (7B) — mistralai/mistral-7b-instruct-v0.1
Mistral v0.1 Instruct 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA) and Sliding-Window Attention (SWA). The instruct version was fine-tuned using publicly available conversation datasets. (blog post)
Mistral Instruct v0.2 (7B) — mistralai/mistral-7b-instruct-v0.2
Mistral v0.2 Instruct 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA). Compared to v0.1, v0.2 has a 32k context window and no Sliding-Window Attention (SWA). (blog post)
Mistral Instruct v0.3 (7B) — mistralai/mistral-7b-instruct-v0.3
Mistral v0.3 Instruct 7B is a 7.3B parameter transformer model that uses Grouped-Query Attention (GQA). Compared to v0.1, v0.2 has a 32k context window and no Sliding-Window Attention (SWA). (blog post)
Mixtral (8x7B 32K seqlen) — mistralai/mixtral-8x7b-32kseqlen
Mixtral is a mixture-of-experts model that has 46.7B total parameters but only uses 12.9B parameters per token. (blog post, tweet).
Mixtral Instruct (8x7B) — mistralai/mixtral-8x7b-instruct-v0.1
Mixtral Instruct (8x7B) is a version of Mixtral (8x7B) that was optimized through supervised fine-tuning and direct preference optimisation (DPO) for careful instruction following. (blog post).
Mixtral (8x22B) — mistralai/mixtral-8x22b
Mistral AI's mixture-of-experts model that uses 39B active parameters out of 141B (blog post).
Mixtral Instruct (8x22B) — mistralai/mixtral-8x22b-instruct-v0.1
Mistral AI's mixture-of-experts model that uses 39B active parameters out of 141B (blog post).
Mistral Small (2402) — mistralai/mistral-small-2402
Mistral Small is a multilingual model with a 32K tokens context window and function-calling capabilities. (blog)
Mistral Medium (2312) — mistralai/mistral-medium-2312
Mistral is a transformer model that uses Grouped-Query Attention (GQA) and Sliding-Window Attention (SWA).
Mistral Large (2402) — mistralai/mistral-large-2402
Mistral Large is a multilingual model with a 32K tokens context window and function-calling capabilities. (blog)
Mistral Large 2 (2407) — mistralai/mistral-large-2407
Mistral Large 2 is a 123 billion parameter model that has a 128k context window and supports dozens of languages and 80+ coding languages. (blog)
Mistral NeMo (2407) — mistralai/open-mistral-nemo-2407
Mistral NeMo is a multilingual 12B model with a large context window of 128K tokens. (blog)
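For the two Mixtral sizes listed above, the descriptions give both total and per-token (active) parameter counts. As a rough sketch, the share of weights used per token can be computed directly from those quoted figures; the dictionary keys and the helper name `active_share` are illustrative only.

```python
# (total, active) parameter counts in billions, as quoted in the Mixtral entries above.
MIXTRAL_PARAMS_B = {
    "mixtral-8x7b": (46.7, 12.9),
    "mixtral-8x22b": (141.0, 39.0),
}


def active_share(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters used for each token."""
    return active_b / total_b


shares = {name: active_share(*params) for name, params in MIXTRAL_PARAMS_B.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of parameters active per token")
# mixtral-8x7b: 27.6% of parameters active per token
# mixtral-8x22b: 27.7% of parameters active per token
```

Both sizes use a little over a quarter of their weights for each token, which is why their per-token compute is closer to that of a much smaller dense model.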
MosaicML
MPT (7B) — mosaicml/mpt-7b
MPT (7B) is a Transformer trained from scratch on 1T tokens of text and code.
MPT-Instruct (7B) — mosaicml/mpt-instruct-7b
MPT-Instruct (7B) is a model for short-form instruction following. It is built by finetuning MPT (7B), a Transformer trained from scratch on 1T tokens of text and code.
MPT (30B) — mosaicml/mpt-30b
MPT (30B) is a Transformer trained from scratch on 1T tokens of text and code.
MPT-Instruct (30B) — mosaicml/mpt-instruct-30b
MPT-Instruct (30B) is a model for short-form instruction following. It is built by finetuning MPT (30B), a Transformer trained from scratch on 1T tokens of text and code.
Neurips
Neurips Local — neurips/local
Neurips Local
NVIDIA
Megatron GPT2 — nvidia/megatron-gpt2
GPT-2 implemented in Megatron-LM (paper).
Nemotron-4 Instruct (340B) — nvidia/nemotron-4-340b-instruct
Nemotron-4 Instruct (340B) is an open weights model sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. 98% of the data used for model alignment was synthetically generated (paper).
OpenAI
GPT-2 (1.5B) — openai/gpt2
GPT-2 (1.5B parameters) is a transformer model trained on a large corpus of English text in a self-supervised fashion (paper).
davinci-002 — openai/davinci-002
Replacement for the GPT-3 curie and davinci base models.
babbage-002 — openai/babbage-002
Replacement for the GPT-3 ada and babbage base models.
davinci (175B) — openai/davinci
Original GPT-3 (175B parameters) autoregressive language model (paper, docs).
curie (6.7B) — openai/curie
Original GPT-3 (6.7B parameters) autoregressive language model (paper, docs).
babbage (1.3B) — openai/babbage
Original GPT-3 (1.3B parameters) autoregressive language model (paper, docs).
ada (350M) — openai/ada
Original GPT-3 (350M parameters) autoregressive language model (paper, docs).
GPT-3.5 (text-davinci-003) — openai/text-davinci-003
text-davinci-003 model that involves reinforcement learning (PPO) with reward models. Derived from text-davinci-002 (docs).
GPT-3.5 (text-davinci-002) — openai/text-davinci-002
text-davinci-002 model that involves supervised fine-tuning on human-written demonstrations. Derived from code-davinci-002 (docs).
GPT-3.5 (text-davinci-001) — openai/text-davinci-001
text-davinci-001 model that involves supervised fine-tuning on human-written demonstrations (docs).
text-curie-001 — openai/text-curie-001
text-curie-001 model that involves supervised fine-tuning on human-written demonstrations (docs).
text-babbage-001 — openai/text-babbage-001
text-babbage-001 model that involves supervised fine-tuning on human-written demonstrations (docs).
text-ada-001 — openai/text-ada-001
text-ada-001 model that involves supervised fine-tuning on human-written demonstrations (docs).
GPT-3.5 Turbo Instruct — openai/gpt-3.5-turbo-instruct
Similar capabilities as GPT-3 era models. Compatible with legacy Completions endpoint and not Chat Completions.
GPT-3.5 Turbo (0301) — openai/gpt-3.5-turbo-0301
Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-03-01.
GPT-3.5 Turbo (0613) — openai/gpt-3.5-turbo-0613
Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-06-13.
GPT-3.5 Turbo (1106) — openai/gpt-3.5-turbo-1106
Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-11-06.
GPT-3.5 Turbo (0125) — openai/gpt-3.5-turbo-0125
Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2024-01-25.
gpt-3.5-turbo-16k-0613 — openai/gpt-3.5-turbo-16k-0613
Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-06-13 with a longer context length of 16,384 tokens.
GPT-4 Turbo (1106 preview) — openai/gpt-4-1106-preview
GPT-4 Turbo (preview) is a large multimodal model that is optimized for chat but works well for traditional completions tasks. The model is cheaper and faster than the original GPT-4 model. Preview snapshot from 2023-11-06.
GPT-4 (0314) — openai/gpt-4-0314
GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 from 2023-03-14.
gpt-4-32k-0314 — openai/gpt-4-32k-0314
GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 with a longer context length of 32,768 tokens from 2023-03-14.
GPT-4 (0613) — openai/gpt-4-0613
GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 from 2023-06-13.
gpt-4-32k-0613 — openai/gpt-4-32k-0613
GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 with a longer context length of 32,768 tokens from 2023-06-13.
GPT-4 Turbo (0125 preview) — openai/gpt-4-0125-preview
GPT-4 Turbo (preview) is a large multimodal model that is optimized for chat but works well for traditional completions tasks. The model is cheaper and faster than the original GPT-4 model. Preview snapshot from 2024-01-25. This snapshot is intended to reduce cases of “laziness” where the model doesn’t complete a task.
GPT-4 Turbo (2024-04-09) — openai/gpt-4-turbo-2024-04-09
GPT-4 Turbo (2024-04-09) is a large multimodal model that is optimized for chat but works well for traditional completions tasks. The model is cheaper and faster than the original GPT-4 model. Snapshot from 2024-04-09.
GPT-4o (2024-05-13) — openai/gpt-4o-2024-05-13
GPT-4o (2024-05-13) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)
GPT-4o (2024-08-06) — openai/gpt-4o-2024-08-06
GPT-4o (2024-08-06) is a large multimodal model that accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. (blog)
GPT-4o mini (2024-07-18) — openai/gpt-4o-mini-2024-07-18
GPT-4o mini (2024-07-18) is a multimodal model with a context window of 128K tokens and improved handling of non-English text. (blog)
OpenThaiGPT
OpenThaiGPT v1.0.0 (7B) — openthaigpt/openthaigpt-1.0.0-7b-chat
OpenThaiGPT v1.0.0 (7B) is a Thai language chat model based on Llama 2 that has been specifically fine-tuned for Thai instructions and enhanced by incorporating over 10,000 of the most commonly used Thai words into the dictionary. (blog post)
OpenThaiGPT v1.0.0 (13B) — openthaigpt/openthaigpt-1.0.0-13b-chat
OpenThaiGPT v1.0.0 (13B) is a Thai language chat model based on Llama 2 that has been specifically fine-tuned for Thai instructions and enhanced by incorporating over 10,000 of the most commonly used Thai words into the dictionary. (blog post)
OpenThaiGPT v1.0.0 (70B) — openthaigpt/openthaigpt-1.0.0-70b-chat
OpenThaiGPT v1.0.0 (70B) is a Thai language chat model based on Llama 2 that has been specifically fine-tuned for Thai instructions and enhanced by incorporating over 10,000 of the most commonly used Thai words into the dictionary. (blog post)
Qwen
Qwen — qwen/qwen-7b
7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)
Qwen1.5 (7B) — qwen/qwen1.5-7b
7B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)
Qwen1.5 (14B) — qwen/qwen1.5-14b
14B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)
Qwen1.5 (32B) — qwen/qwen1.5-32b
32B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. The 32B version also includes grouped query attention (GQA). (blog)
Qwen1.5 (72B) — qwen/qwen1.5-72b
72B-parameter version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)
Qwen1.5 Chat (7B) — qwen/qwen1.5-7b-chat
7B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)
Qwen1.5 Chat (14B) — qwen/qwen1.5-14b-chat
14B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)
Qwen1.5 Chat (32B) — qwen/qwen1.5-32b-chat
32B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. The 32B version also includes grouped query attention (GQA). (blog)
Qwen1.5 Chat (72B) — qwen/qwen1.5-72b-chat
72B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. (blog)
Qwen1.5 Chat (110B) — qwen/qwen1.5-110b-chat
110B-parameter chat version of the large language model series, Qwen 1.5 (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen is a family of transformer models with SwiGLU activation, RoPE, and multi-head attention. The 110B version also includes grouped query attention (GQA). (blog)
Qwen2 Instruct (72B) — qwen/qwen2-72b-instruct
72B-parameter chat version of the large language model series, Qwen2. Qwen2 uses Group Query Attention (GQA) and has extended context length support up to 128K tokens. (blog)
SAIL
Sailor (7B) — sail/sailor-7b
Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)
Sailor Chat (7B) — sail/sailor-7b-chat
Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)
Sailor (14B) — sail/sailor-14b
Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)
Sailor Chat (14B) — sail/sailor-14b-chat
Sailor is a suite of Open Language Models tailored for South-East Asia, focusing on languages such as Indonesian, Thai, Vietnamese, Malay, and Lao. These models were continually pre-trained from Qwen1.5. (paper)
SambaLingo
SambaLingo-Thai-Base — sambanova/sambalingo-thai-base
SambaLingo-Thai-Base is a pretrained bi-lingual Thai and English model that adapts Llama 2 (7B) to Thai by training on 38 billion tokens from the Thai split of the Cultura-X dataset. (paper)
SambaLingo-Thai-Chat — sambanova/sambalingo-thai-chat
SambaLingo-Thai-Chat is a chat model trained using direct preference optimization on SambaLingo-Thai-Base. SambaLingo-Thai-Base adapts Llama 2 (7B) to Thai by training on 38 billion tokens from the Thai split of the Cultura-X dataset. (paper)
SambaLingo-Thai-Base-70B — sambanova/sambalingo-thai-base-70b
SambaLingo-Thai-Base-70B is a pretrained bi-lingual Thai and English model that adapts Llama 2 (70B) to Thai by training on 26 billion tokens from the Thai split of the Cultura-X dataset. (paper)
SambaLingo-Thai-Chat-70B — sambanova/sambalingo-thai-chat-70b
SambaLingo-Thai-Chat-70B is a chat model trained using direct preference optimization on SambaLingo-Thai-Base-70B. SambaLingo-Thai-Base-70B adapts Llama 2 (70B) to Thai by training on 26 billion tokens from the Thai split of the Cultura-X dataset. (paper)
SCB10X
Typhoon (7B) — scb10x/typhoon-7b
Typhoon (7B) is a pretrained Thai large language model with 7 billion parameters based on Mistral 7B. (paper)
Typhoon v1.5 (8B) — scb10x/typhoon-v1.5-8b
Typhoon v1.5 (8B) is a pretrained Thai large language model with 8 billion parameters based on Llama 3 8B. (blog)
Typhoon v1.5 Instruct (8B) — scb10x/typhoon-v1.5-8b-instruct
Typhoon v1.5 Instruct (8B) is an instruction-tuned Thai large language model with 8 billion parameters based on Llama 3 8B. (blog)
Typhoon v1.5 (72B) — scb10x/typhoon-v1.5-72b
Typhoon v1.5 (72B) is a pretrained Thai large language model with 72 billion parameters based on Qwen1.5-72B. (blog)
Typhoon v1.5 Instruct (72B) — scb10x/typhoon-v1.5-72b-instruct
Typhoon v1.5 Instruct (72B) is an instruction-tuned Thai large language model with 72 billion parameters based on Qwen1.5-72B. (blog)
Typhoon 1.5X instruct (8B) — scb10x/llama-3-typhoon-v1.5x-8b-instruct
Llama-3-Typhoon-1.5X-8B-instruct is an 8 billion parameter instruct model designed for the Thai language based on Llama 3 Instruct. It utilizes the task-arithmetic model editing technique. (blog)
Typhoon 1.5X instruct (70B) — scb10x/llama-3-typhoon-v1.5x-70b-instruct
Llama-3-Typhoon-1.5X-70B-instruct is a 70 billion parameter instruct model designed for the Thai language based on Llama 3 Instruct. It utilizes the task-arithmetic model editing technique. (blog)
Alibaba DAMO Academy
SeaLLM v2 (7B) — damo/seallm-7b-v2
SeaLLM v2 is a multilingual LLM for Southeast Asian (SEA) languages trained from Mistral (7B). (website)
SeaLLM v2.5 (7B) — damo/seallm-7b-v2.5
SeaLLM is a multilingual LLM for Southeast Asian (SEA) languages trained from Gemma (7B). (website)
Snowflake
Arctic Instruct — snowflake/snowflake-arctic-instruct
Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, resulting in 480B total parameters and 17B active parameters, with the active experts chosen using top-2 gating.
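The parameter counts quoted for Arctic can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the function and constant names are hypothetical, and it assumes that the total is the dense model plus all 128 experts, while the active count is the dense model plus the two experts picked by top-2 gating.

```python
# Back-of-the-envelope check of the Arctic parameter counts quoted above.
# All names here are illustrative; the figures come from the description.
DENSE_B = 10.0      # dense transformer, in billions of parameters
NUM_EXPERTS = 128   # experts in the residual MoE MLP
EXPERT_B = 3.66     # size of each expert, in billions of parameters
TOP_K = 2           # experts selected per token by top-2 gating


def arctic_param_counts(dense_b=DENSE_B, num_experts=NUM_EXPERTS,
                        expert_b=EXPERT_B, top_k=TOP_K):
    """Return (total, active) parameter counts in billions.

    Assumes total = dense + all experts, active = dense + top_k experts.
    """
    total = dense_b + num_experts * expert_b
    active = dense_b + top_k * expert_b
    return total, active


total_b, active_b = arctic_param_counts()
print(f"total ~ {total_b:.1f}B, active ~ {active_b:.1f}B")
```

Under these assumptions the sums come to roughly 478B total and 17.3B active, which rounds to the 480B and 17B figures in the description.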
Stability AI
StableLM-Base-Alpha (3B) — stabilityai/stablelm-base-alpha-3b
StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models pre-trained on a diverse collection of English datasets with a sequence length of 4096 to push beyond the context window limitations of existing open-source language models.
StableLM-Base-Alpha (7B) — stabilityai/stablelm-base-alpha-7b
StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models pre-trained on a diverse collection of English datasets with a sequence length of 4096 to push beyond the context window limitations of existing open-source language models.
Stanford
Alpaca (7B) — stanford/alpaca-7b
Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations.
TII UAE
Falcon (7B) — tiiuae/falcon-7b
Falcon-7B is a 7B-parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora.
Falcon-Instruct (7B) — tiiuae/falcon-7b-instruct
Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by TII based on Falcon-7B and finetuned on a mixture of chat/instruct datasets.
Falcon (40B) — tiiuae/falcon-40b
Falcon-40B is a 40B-parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora.
Falcon-Instruct (40B) — tiiuae/falcon-40b-instruct
Falcon-40B-Instruct is a 40B-parameter causal decoder-only model built by TII based on Falcon-40B and finetuned on a mixture of chat/instruct datasets.
Together
GPT-JT (6B) — together/gpt-jt-6b-v1
GPT-JT (6B parameters) is a fork of GPT-J (blog post).
GPT-NeoXT-Chat-Base (20B) — together/gpt-neoxt-chat-base-20b
GPT-NeoXT-Chat-Base (20B) is fine-tuned from GPT-NeoX, serving as a base model for developing open-source chatbots.
RedPajama-INCITE-Base-v1 (3B) — together/redpajama-incite-base-3b-v1
RedPajama-INCITE-Base-v1 (3B parameters) is a base model that aims to replicate the LLaMA recipe as closely as possible.
RedPajama-INCITE-Instruct-v1 (3B) — together/redpajama-incite-instruct-3b-v1
RedPajama-INCITE-Instruct-v1 (3B parameters) is a model fine-tuned for few-shot applications on the data of GPT-JT. It is built from RedPajama-INCITE-Base-v1 (3B), a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible.
RedPajama-INCITE-Base (7B) — together/redpajama-incite-base-7b
RedPajama-INCITE-Base (7B parameters) is a base model that aims to replicate the LLaMA recipe as closely as possible.
RedPajama-INCITE-Instruct (7B) — together/redpajama-incite-instruct-7b
RedPajama-INCITE-Instruct (7B parameters) is a model fine-tuned for few-shot applications on the data of GPT-JT. It is built from RedPajama-INCITE-Base (7B), a 7 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible.
Tsinghua
GLM (130B) — tsinghua/glm
GLM (130B parameters) is an open bilingual (English & Chinese) bidirectional dense model that was trained using the General Language Model (GLM) procedure (paper).
Writer
Palmyra Base (5B) — writer/palmyra-base
Palmyra Base (5B)
Palmyra Large (20B) — writer/palmyra-large
Palmyra Large (20B)
InstructPalmyra (30B) — writer/palmyra-instruct-30
InstructPalmyra (30B parameters) is trained using reinforcement learning techniques based on feedback from humans.
Palmyra E (30B) — writer/palmyra-e
Palmyra E (30B)
Silk Road (35B) — writer/silk-road
Silk Road (35B)
Palmyra X (43B) — writer/palmyra-x
Palmyra-X (43B parameters) is trained to adhere to instructions using human feedback and utilizes a technique called multiquery attention. Furthermore, a new feature called 'self-instruct' has been introduced, which includes the implementation of an early stopping criterion specifically designed for minimal instruction tuning (paper).
Palmyra X V2 (33B) — writer/palmyra-x-v2
Palmyra-X V2 (33B parameters) is a Transformer-based model trained on extremely large-scale pre-training data. The pre-training data comprises more than 2 trillion tokens, and its types are diverse and cover a wide range of areas. The model uses FlashAttention-2.
Palmyra X V3 (72B) — writer/palmyra-x-v3
Palmyra-X V3 (72B parameters) is a Transformer-based model trained on extremely large-scale pre-training data. It is trained via unsupervised learning and DPO and uses multiquery attention.
Palmyra X-32K (33B) — writer/palmyra-x-32k
Palmyra-X-32K (33B parameters) is a Transformer-based model trained on large-scale pre-training data. The pre-training data types are diverse and cover a wide range of areas. These data types are used in conjunction with an alignment mechanism to extend the context window.
Yandex
YaLM (100B) — yandex/yalm
YaLM (100B parameters) is an autoregressive language model trained on English and Russian text (GitHub).
BigCode
SantaCoder (1.1B) — bigcode/santacoder
SantaCoder (1.1B parameters) model trained on the Python, Java, and JavaScript subset of The Stack (v1.1) (model card).
StarCoder (15.5B) — bigcode/starcoder
The StarCoder (15.5B parameter) model trained on 80+ programming languages from The Stack (v1.2) (model card).
Codey PaLM-2 (Bison) — google/code-bison@001
A model fine-tuned to generate code based on a natural language description of the desired code. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)
Codey PaLM-2 (Bison) — google/code-bison@002
A model fine-tuned to generate code based on a natural language description of the desired code. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)
Codey PaLM-2 (Bison) — google/code-bison-32k
Codey with a 32K context. PaLM 2 (Pathways Language Model) is a Transformer-based model trained using a mixture of objectives that was evaluated on English and multilingual language, and reasoning tasks. (report)
OpenAI
code-davinci-002 — openai/code-davinci-002
Codex-style model that is designed for pure code-completion tasks (docs).
code-davinci-001 — openai/code-davinci-001
code-davinci-001 model
code-cushman-001 (12B) — openai/code-cushman-001
Codex-style model that is a stronger, multilingual version of the Codex (12B) model in the Codex paper.
HEIM (text-to-image evaluation)
For a list of text-to-image models, please visit the models page of the HEIM results website.