Model request API

HELM represents model calls with the shared Request and RequestResult dataclasses in helm.common.request. These classes are the common boundary between scenarios, clients, local execution, and cached raw results.

Use this API when you need to make a model request from Python code or inspect the exact request and response fields used by HELM runs.

Request and response formats

`Request` `dataclass`

A Request specifies how to query a language model (given a prompt, complete it). It is the unified representation for communicating with various APIs (e.g., GPT-3, Jurassic).

`model_deployment: str = ''` `class-attribute` `instance-attribute`

Which model deployment to query -> Determines the Client. Refers to a deployment in the model deployment registry.

`model: str = ''` `class-attribute` `instance-attribute`

Which model to use -> Determines the Engine. Refers to a model metadata in the model registry.

`embedding: bool = False` `class-attribute` `instance-attribute`

Whether to query embedding instead of text response

`prompt: str = ''` `class-attribute` `instance-attribute`

What prompt do condition the language model on

`temperature: float = 1.0` `class-attribute` `instance-attribute`

Temperature parameter that governs diversity

`num_completions: int = 1` `class-attribute` `instance-attribute`

Generate this many completions (by sampling from the model)

`top_k_per_token: int = 1` `class-attribute` `instance-attribute`

Take this many highest probability candidates per token in the completion

`max_tokens: int = 100` `class-attribute` `instance-attribute`

Maximum number of tokens to generate (per completion)

`stop_sequences: List[str] = field(default_factory=list)` `class-attribute` `instance-attribute`

Stop generating once we hit one of these strings.

`echo_prompt: bool = False` `class-attribute` `instance-attribute`

Should prompt be included as a prefix of each completion? (e.g., for evaluating perplexity of the prompt)

`top_p: float = 1` `class-attribute` `instance-attribute`

Same from tokens that occupy this probability mass (nucleus sampling)

`presence_penalty: float = 0` `class-attribute` `instance-attribute`

Penalize repetition (OpenAI & Writer only)

`frequency_penalty: float = 0` `class-attribute` `instance-attribute`

Penalize repetition (OpenAI & Writer only)

`random: Optional[str] = None` `class-attribute` `instance-attribute`

Used to control randomness. Expect different responses for the same request but with different values for random.

`messages: Optional[List[Dict[str, str]]] = None` `class-attribute` `instance-attribute`

Used for chat models. (OpenAI only for now). if messages is specified for a chat model, the prompt is ignored. Otherwise, the client should convert the prompt into a message.

`multimodal_prompt: Optional[MultimediaObject] = None` `class-attribute` `instance-attribute`

Multimodal prompt with media objects interleaved (e.g., text, video, image, text, ...)

`image_generation_parameters: Optional[ImageGenerationParameters] = None` `class-attribute` `instance-attribute`

Parameters for image generation.

`response_format: Optional[ResponseFormat] = None` `class-attribute` `instance-attribute`

EXPERIMENTAL: Response format. Currently only supported by OpenAI and Together.

`model_host: str` `property`

Returns the model host (referring to the deployment). Not to be confused with the model creator organization (referring to the model).

'openai/davinci' => 'openai'

'together/bloom' => 'together'

`model_engine: str` `property`

Returns the model engine (referring to the model). This is often the same as self.model_deploymentl.split("/")[1], but not always. For example, one model could be served on several servers (each with a different model_deployment) In that case we would have for example: 'aws/bloom-1', 'aws/bloom-2', 'aws/bloom-3' => 'bloom' This is why we need to keep track of the model engine with the model metadata. Example: 'openai/davinci' => 'davinci'

`validate()`

`RequestResult` `dataclass`

What comes back due to a Request.

`success: bool` `instance-attribute`

Whether the request was successful

`embedding: List[float]` `instance-attribute`

Fixed dimensional embedding corresponding to the entire prompt

`completions: List[GeneratedOutput]` `instance-attribute`

List of completion

`cached: bool` `instance-attribute`

Whether the request was actually cached

`request_time: Optional[float] = None` `class-attribute` `instance-attribute`

How long the request took in seconds

`request_datetime: Optional[int] = None` `class-attribute` `instance-attribute`

When was the request sent? We keep track of when the request was made because the underlying model or inference procedure backing the API might change over time. The integer represents the current time in seconds since the Epoch (January 1, 1970).

`error: Optional[str] = None` `class-attribute` `instance-attribute`

If success is false, what was the error?

`error_flags: Optional[ErrorFlags] = None` `class-attribute` `instance-attribute`

Describes how to treat errors in the request.

`batch_size: Optional[int] = None` `class-attribute` `instance-attribute`

Batch size (TogetherClient only)

`batch_request_time: Optional[float] = None` `class-attribute` `instance-attribute`

How long it took to process the batch? (TogetherClient only)

`render_lines() -> List[str]`

`GeneratedOutput` `dataclass`

A GeneratedOutput is a single generated output that may contain text or multimodal content.

`text: str` `instance-attribute`

`logprob: float` `instance-attribute`

`tokens: List[Token]` `instance-attribute`

`finish_reason: Optional[Dict[str, Any]] = None` `class-attribute` `instance-attribute`

`multimodal_content: Optional[MultimediaObject] = None` `class-attribute` `instance-attribute`

`thinking: Optional[Thinking] = None` `class-attribute` `instance-attribute`

`add(other: GeneratedOutput) -> GeneratedOutput`

`render_lines() -> List[str]`

`Token` `dataclass`

A Token represents one token position in a Sequence, which has the chosen text as well as the top probabilities under the model.

`text: str` `instance-attribute`

`logprob: float` `instance-attribute`

`render_lines() -> List[str]`

Making a local request through `AutoClient`

Use AutoClient when you want HELM to select the concrete client from the model_deployment field and use local credentials directly. This is the recommended path for making model requests from Python code. AutoClient requires a credentials mapping, a file storage path, and a cache backend configuration. The example below uses BlackHoleCacheBackendConfig, which does not persist cache entries.

from helm.clients.auto_client import AutoClient
from helm.common.cache_backend_config import BlackHoleCacheBackendConfig
from helm.common.request import Request

client = AutoClient(
    credentials={"openaiApiKey": "YOUR_OPENAI_API_KEY"},
    file_storage_path="prod_env/cache",
    cache_backend_config=BlackHoleCacheBackendConfig(),
)

request = Request(
    model_deployment="openai/gpt-4o-mini",
    model="openai/gpt-4o-mini",
    prompt="Explain HELM in one sentence.",
    max_tokens=64,
    temperature=0.0,
)

result = client.make_request(request)

if result.success:
    print(result.completions[0].text)
else:
    print(result.error)

See helm.clients.auto_client.AutoClient for the complete local-client interface.

Using a persistent cache

Use SqliteCacheBackendConfig when you want HELM to persist request results locally:

from helm.common.cache_backend_config import SqliteCacheBackendConfig

cache_backend_config = SqliteCacheBackendConfig(path="prod_env/cache")

Pass this value as cache_backend_config when constructing AutoClient.

Call request.validate() before dispatch if you construct requests dynamically and want to fail early on incompatible prompt fields.

Model request API

Request and response formats

Request dataclass

model_deployment: str = '' class-attribute instance-attribute

model: str = '' class-attribute instance-attribute

embedding: bool = False class-attribute instance-attribute

prompt: str = '' class-attribute instance-attribute

temperature: float = 1.0 class-attribute instance-attribute

num_completions: int = 1 class-attribute instance-attribute

top_k_per_token: int = 1 class-attribute instance-attribute

max_tokens: int = 100 class-attribute instance-attribute

stop_sequences: List[str] = field(default_factory=list) class-attribute instance-attribute

echo_prompt: bool = False class-attribute instance-attribute

top_p: float = 1 class-attribute instance-attribute

presence_penalty: float = 0 class-attribute instance-attribute

frequency_penalty: float = 0 class-attribute instance-attribute

random: Optional[str] = None class-attribute instance-attribute

messages: Optional[List[Dict[str, str]]] = None class-attribute instance-attribute

multimodal_prompt: Optional[MultimediaObject] = None class-attribute instance-attribute

image_generation_parameters: Optional[ImageGenerationParameters] = None class-attribute instance-attribute

response_format: Optional[ResponseFormat] = None class-attribute instance-attribute

model_host: str property

model_engine: str property

validate()

RequestResult dataclass

success: bool instance-attribute

embedding: List[float] instance-attribute

completions: List[GeneratedOutput] instance-attribute

cached: bool instance-attribute

request_time: Optional[float] = None class-attribute instance-attribute

request_datetime: Optional[int] = None class-attribute instance-attribute

error: Optional[str] = None class-attribute instance-attribute

error_flags: Optional[ErrorFlags] = None class-attribute instance-attribute

batch_size: Optional[int] = None class-attribute instance-attribute

batch_request_time: Optional[float] = None class-attribute instance-attribute

render_lines() -> List[str]

GeneratedOutput dataclass

text: str instance-attribute

logprob: float instance-attribute

tokens: List[Token] instance-attribute

finish_reason: Optional[Dict[str, Any]] = None class-attribute instance-attribute

multimodal_content: Optional[MultimediaObject] = None class-attribute instance-attribute

thinking: Optional[Thinking] = None class-attribute instance-attribute

__add__(other: GeneratedOutput) -> GeneratedOutput

render_lines() -> List[str]

Token dataclass

text: str instance-attribute

logprob: float instance-attribute

render_lines() -> List[str]

Making a local request through AutoClient

Using a persistent cache

`Request` `dataclass`

`model_deployment: str = ''` `class-attribute` `instance-attribute`

`model: str = ''` `class-attribute` `instance-attribute`

`embedding: bool = False` `class-attribute` `instance-attribute`

`prompt: str = ''` `class-attribute` `instance-attribute`

`temperature: float = 1.0` `class-attribute` `instance-attribute`

`num_completions: int = 1` `class-attribute` `instance-attribute`

`top_k_per_token: int = 1` `class-attribute` `instance-attribute`

`max_tokens: int = 100` `class-attribute` `instance-attribute`

`stop_sequences: List[str] = field(default_factory=list)` `class-attribute` `instance-attribute`

`echo_prompt: bool = False` `class-attribute` `instance-attribute`

`top_p: float = 1` `class-attribute` `instance-attribute`

`presence_penalty: float = 0` `class-attribute` `instance-attribute`

`frequency_penalty: float = 0` `class-attribute` `instance-attribute`

`random: Optional[str] = None` `class-attribute` `instance-attribute`

`messages: Optional[List[Dict[str, str]]] = None` `class-attribute` `instance-attribute`

`multimodal_prompt: Optional[MultimediaObject] = None` `class-attribute` `instance-attribute`

`image_generation_parameters: Optional[ImageGenerationParameters] = None` `class-attribute` `instance-attribute`

`response_format: Optional[ResponseFormat] = None` `class-attribute` `instance-attribute`

`model_host: str` `property`

`model_engine: str` `property`

`validate()`

`RequestResult` `dataclass`

`success: bool` `instance-attribute`

`embedding: List[float]` `instance-attribute`

`completions: List[GeneratedOutput]` `instance-attribute`

`cached: bool` `instance-attribute`

`request_time: Optional[float] = None` `class-attribute` `instance-attribute`

`request_datetime: Optional[int] = None` `class-attribute` `instance-attribute`

`error: Optional[str] = None` `class-attribute` `instance-attribute`

`error_flags: Optional[ErrorFlags] = None` `class-attribute` `instance-attribute`

`batch_size: Optional[int] = None` `class-attribute` `instance-attribute`

`batch_request_time: Optional[float] = None` `class-attribute` `instance-attribute`

`render_lines() -> List[str]`

`GeneratedOutput` `dataclass`

`text: str` `instance-attribute`

`logprob: float` `instance-attribute`

`tokens: List[Token]` `instance-attribute`

`finish_reason: Optional[Dict[str, Any]] = None` `class-attribute` `instance-attribute`

`multimodal_content: Optional[MultimediaObject] = None` `class-attribute` `instance-attribute`

`thinking: Optional[Thinking] = None` `class-attribute` `instance-attribute`

`add(other: GeneratedOutput) -> GeneratedOutput`

`render_lines() -> List[str]`

`Token` `dataclass`

`text: str` `instance-attribute`

`logprob: float` `instance-attribute`

`render_lines() -> List[str]`

Making a local request through `AutoClient`