Models

Text Models

AI21 Labs

J1-Jumbo v1 (178B) — `ai21/j1-jumbo`

Jurassic-1 Jumbo (178B parameters) (docs, tech report).

J1-Large v1 (7.5B) — `ai21/j1-large`

Jurassic-1 Large (7.5B parameters) (docs, tech report).

J1-Grande v1 (17B) — `ai21/j1-grande`

Jurassic-1 Grande (17B parameters) with a "few tweaks" to the training process (docs, tech report).

J1-Grande v2 beta (17B) — `ai21/j1-grande-v2-beta`

Jurassic-1 Grande v2 beta (17B parameters)

Jurassic-2 Large (7.5B) — `ai21/j2-large`

Jurassic-2 Large (7.5B parameters) (docs)

Jurassic-2 Grande (17B) — `ai21/j2-grande`

Jurassic-2 Grande (17B parameters) (docs)

Jurassic-2 Jumbo (178B) — `ai21/j2-jumbo`

Jurassic-2 Jumbo (178B parameters) (docs)

Jamba Instruct — `ai21/jamba-instruct`

Jamba Instruct is an instruction-tuned version of Jamba, which uses a hybrid Transformer-Mamba mixture-of-experts (MoE) architecture that interleaves blocks of Transformer and Mamba layers. (blog)

Jamba 1.5 Mini — `ai21/jamba-1.5-mini`

Jamba 1.5 Mini is a long-context, hybrid SSM-Transformer instruction-following foundation model that is optimized for function calling, structured output, and grounded generation. (blog)

Jamba 1.5 Large — `ai21/jamba-1.5-large`

Jamba 1.5 Large is a long-context, hybrid SSM-Transformer instruction-following foundation model that is optimized for function calling, structured output, and grounded generation. (blog)
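The identifiers listed above all appear to follow an `organization/model-name` pattern. As a minimal sketch, assuming that pattern holds for every entry, the snippet below splits an identifier into its two parts and groups names by organization. The `ModelId` type and `parse_model_id` helper are hypothetical names introduced for illustration; they are not part of any library API.

```python
from typing import Dict, List, NamedTuple


class ModelId(NamedTuple):
    """Hypothetical container for the two parts of a model identifier."""
    organization: str
    name: str


def parse_model_id(identifier: str) -> ModelId:
    """Split 'organization/model-name' on the first slash.

    Assumption: every identifier contains exactly one organization prefix
    followed by a slash, as in the entries listed in this document.
    """
    organization, sep, name = identifier.partition("/")
    if not sep or not organization or not name:
        raise ValueError(f"Expected 'organization/model-name', got {identifier!r}")
    return ModelId(organization, name)


# Group a few of the identifiers from this page by organization.
identifiers = [
    "ai21/j1-jumbo",
    "ai21/jamba-instruct",
    "ai21/jamba-1.5-mini",
]

by_org: Dict[str, List[str]] = {}
for identifier in identifiers:
    parsed = parse_model_id(identifier)
    by_org.setdefault(parsed.organization, []).append(parsed.name)

print(by_org)
```

Splitting on the first slash only keeps any later slashes inside the model name, which is a conservative choice if an identifier ever contains more than one.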

AI Singapore

SEA-LION (7B) — `aisingapore/sea-lion-7b`

SEA-LION is a collection of language models that have been pretrained and instruct-tuned on languages from the Southeast Asia region. The models use the MPT architecture and a custom SEABPETokenizer for tokenization.

SEA-LION Instruct (7B) — `aisingapore/sea-lion-7b-instruct`

SEA-LION is a collection of language models that have been pretrained and instruct-tuned on languages from the Southeast Asia region. The models use the MPT architecture and a custom SEABPETokenizer for tokenization.

Aleph Alpha

Luminous Base (13B) — `AlephAlpha/luminous-base`

Luminous Base (13B parameters) (docs)

Luminous Extended (30B) — `AlephAlpha/luminous-extended`

Luminous Extended (30B parameters) (docs)

Luminous Supreme (70B) — `AlephAlpha/luminous-supreme`

Luminous Supreme (70B parameters) (docs)

Amazon

Amazon Titan Text Lite — `amazon/titan-text-lite-v1`

Amazon Titan Text Lite is a lightweight, efficient model perfect for fine-tuning English-language tasks like summarization and copywriting. It caters to customers seeking a smaller, cost-effective, and highly customizable model. It supports various formats, including text generation, code generation, rich text formatting, and orchestration (agents). Key model attributes encompass fine-tuning, text generation, code generation, and rich text formatting.

Amazon Titan Large — `amazon/titan-tg1-large`

Amazon Titan Large is an efficient model well suited to fine-tuning on English-language tasks such as summarization, article creation, and marketing campaigns.

Amazon Titan Text Express — `amazon/titan-text-express-v1`

Amazon Titan Text Express, with a context length of up to 8,000 tokens, excels in advanced language tasks like open-ended text generation and conversational chat. It's also optimized for Retrieval Augmented Generation (RAG). Initially designed for English, the model offers preview multilingual support for over 100 additional languages.

Anthropic

Claude v1.3 — `anthropic/claude-v1.3`

A 52B-parameter language model trained using reinforcement learning from human feedback (paper).

Claude Instant V1 — `anthropic/claude-instant-v1`

A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs).

Claude Instant 1.2 — `anthropic/claude-instant-1.2`

A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs).

Claude 2.0 — `anthropic/claude-2.0`

Claude 2.0 is a general-purpose large language model developed by Anthropic. It uses a transformer architecture and is trained via unsupervised learning, RLHF, and Constitutional AI (including both a supervised phase and a reinforcement learning (RL) phase). (model card)

Claude 2.1 — `anthropic/claude-2.1`

Claude 2.1 is a general-purpose large language model developed by Anthropic. It uses a transformer architecture and is trained via unsupervised learning, RLHF, and Constitutional AI (including both a supervised phase and a reinforcement learning (RL) phase). (model card)

Claude 3 Haiku (20240307) — `anthropic/claude-3-haiku-20240307`

Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3 Sonnet (20240229) — `anthropic/claude-3-sonnet-20240229`

Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3 Opus (20240229) — `anthropic/claude-3-opus-20240229`

Claude 3 is a family of models that possess vision and multilingual capabilities. They were trained with various methods such as unsupervised learning and Constitutional AI (blog).

Claude 3.5 Sonnet (20240620) — `anthropic/claude-3-5-sonnet-20240620`

Claude 3.5 Sonnet is a Claude 3 family model which outperforms Claude 3 Opus while operating faster and at a lower cost. (blog)

Anthropic-LM v4-s3 (52B) — `anthropic/stanford-online-all-v4-s3`

A 52B-parameter language model trained using reinforcement learning from human feedback (paper).

BigScience

BLOOM (176B) — `bigscience/bloom`

BLOOM (176B parameters) is an autoregressive model trained on 46 natural languages and 13 programming languages (paper).

T0pp (11B) — `bigscience/t0pp`

T0pp (11B parameters) is an encoder-decoder model trained on a large set of different tasks specified in natural language prompts (paper).

BioMistral

BioMistral (7B) — `biomistral/biomistral-7b`

BioMistral 7B is an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central.

Cohere

Cohere xlarge v20220609 (52.4B) — `cohere/xlarge-20220609`

Cohere xlarge v20220609 (52.4B parameters)

Cohere large v20220720 (13.1B) — `cohere/large-20220720`

Cohere large v20220720 (13.1B parameters), which is deprecated by Cohere as of December 2, 2022.

Cohere medium v20220720 (6.1B) — `cohere/medium-20220720`

Cohere medium v20220720 (6.1B parameters)

Cohere small v20220720 (410M) — `cohere/small-20220720`

Cohere small v20220720 (410M parameters), which is deprecated by Cohere as of December 2, 2022.

Cohere xlarge v20221108 (52.4B) — `cohere/xlarge-20221108`

Cohere xlarge v20221108 (52.4B parameters)

Cohere medium v20221108 (6.1B) — `cohere/medium-20221108`

Cohere medium v20221108 (6.1B parameters)

Command beta (6.1B) — `cohere/command-medium-beta`

Command beta (6.1B parameters) is fine-tuned from the medium model to respond well to instruction-like prompts (details).

Command beta (52.4B) — `cohere/command-xlarge-beta`

Command beta (52.4B parameters) is fine-tuned from the XL model to respond well to instruction-like prompts (details).

Command — `cohere/command`

Command is Cohere’s flagship text generation model. It is trained to follow user commands and to be instantly useful in practical business applications. (docs, changelog)

Command Light — `cohere/command-light`

Command is Cohere’s flagship text generation model. It is trained to follow user commands and to be instantly useful in practical business applications. (docs, changelog)

Command R — `cohere/command-r`

Command R is a multilingual 35B parameter model with a context length of 128K that has been trained with conversational tool use capabilities.

Command R Plus — `cohere/command-r-plus`

Command R+ is a multilingual 104B parameter model with a context length of 128K that has been trained with conversational tool use capabilities.

Databricks

Dolly V2 (3B) — `databricks/dolly-v2-3b`

Dolly V2 (3B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-2.8b.

Dolly V2 (7B) — `databricks/dolly-v2-7b`

Dolly V2 (7B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-6.9b.

Dolly V2 (12B) — `databricks/dolly-v2-12b`

Dolly V2 (12B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-12b.

DBRX Instruct — `databricks/dbrx-instruct`

DBRX is a large language model with a fine-grained mixture-of-experts (MoE) architecture that uses 16 experts and chooses 4. It has 132B total parameters, of which 36B parameters are active on any input. (blog post)
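The figures in the DBRX description can be turned into two quick calculations. The sketch below assumes the 4 active experts are chosen as an unordered subset of the 16, and simply takes the ratio of the stated active and total parameter counts; the constant names are illustrative only.

```python
from math import comb

# Figures taken from the DBRX description above.
NUM_EXPERTS = 16        # experts available in each MoE layer
ACTIVE_EXPERTS = 4      # experts chosen for a given input
TOTAL_PARAMS_B = 132    # total parameters, in billions
ACTIVE_PARAMS_B = 36    # parameters active on any input, in billions

# Number of distinct unordered subsets of 4 experts out of 16
# (assumption: order of the chosen experts does not matter).
expert_combinations = comb(NUM_EXPERTS, ACTIVE_EXPERTS)

# Share of the model's parameters that take part in processing an input.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"Possible expert subsets: {expert_combinations}")        # 1820
print(f"Share of parameters active: {active_fraction:.1%}")     # 27.3%
```

In other words, only a bit more than a quarter of the parameters are used for any given input, which is the usual motivation for a mixture-of-experts design.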

DeepSeek

DeepSeek LLM Chat (67B) — `deepseek-ai/deepseek-llm-67b-chat`

DeepSeek LLM Chat is an open-source language model trained on 2 trillion tokens in both English and Chinese, and fine-tuned with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). (paper)

EleutherAI

GPT-J (6B) — `eleutherai/gpt-j-6b`

GPT-J (6B parameters) is an autoregressive language model trained on The Pile (details).

GPT-NeoX (20B) — `eleutherai/gpt-neox-20b`

GPT-NeoX (20B parameters) is an autoregressive language model trained on The Pile (paper).

Pythia (1B) — `eleutherai/pythia-1b-v0`

Pythia (1B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

Pythia (2.8B) — `eleutherai/pythia-2.8b-v0`

Pythia (2.8B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

Pythia (6.9B) — `eleutherai/pythia-6.9b`

Pythia (6.9B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

Pythia (12B) — `eleutherai/pythia-12b-v0`

Pythia (12B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers.

EPFL LLM

Meditron (7B) — `epfl-llm/meditron-7b`

Meditron-7B is a 7 billion parameter model adapted to the medical domain from Llama-2-7B through continued pretraining on a comprehensively curated medical corpus.

Google

T5 (11B) — `google/t5-11b`

T5 (11B parameters) is an encoder-decoder model trained on a multi-task mixture, where each task is converted into a text-to-text format (paper).

UL2 (20B) — `google/ul2`

UL2 (20B parameters) is an encoder-decoder model trained on the C4 corpus. It's similar to T5 but trained with a different objective and slightly different scaling knobs (paper).

Flan-T5 (11B) — `google/flan-t5-xxl`

Flan-T5 (11B parameters) is T5 fine-tuned on 1.8K tasks (paper).

Gemini Pro — `google/gemini-pro`

Gemini Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.0 Pro (001) — `google/gemini-1.0-pro-001`

Gemini 1.0 Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.0 Pro (002) — `google/gemini-1.0-pro-002`

Gemini 1.0 Pro is a multimodal model able to reason across text, images, video, audio and code. (paper)

Gemini 1.5 Pro (001) — `google/gemini-1.5-pro-001`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001) — `google/gemini-1.5-flash-001`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (0409 preview) — `google/gemini-1.5-pro-preview-0409`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Pro (0514 preview) — `google/gemini-1.5-pro-preview-0514`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (0514 preview) — `google/gemini-1.5-flash-preview-0514`

Gemini 1.5 Flash is a smaller Gemini model. It has a 1 million token context window and allows interleaving text, images, audio and video as inputs. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (blog)

Gemini 1.5 Pro (001, default safety) — `google/gemini-1.5-pro-001-safety-default`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)

Gemini 1.5 Pro (001, BLOCK_NONE safety) — `google/gemini-1.5-pro-001-safety-block-none`

Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)

Gemini 1.5 Flash (001, default safety) — `google/gemini-1.5-flash-001-safety-default`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and uses default safety settings. (paper)

Gemini 1.5 Flash (001, BLOCK_NONE safety) — `google/gemini-1.5-flash-001-safety-block-none`

Gemini 1.5 Flash is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. This model is accessed through Vertex AI and has all safety thresholds set to BLOCK_NONE. (paper)