
Models

Tutorial

Please see the model section in the installation guide for an overview of the different models and how to configure them.

This page documents the configuration objects used to specify the behavior of a language model (LM).

In most cases, you will want to use the GenericAPIModelConfig object.
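For orientation, here is a minimal sketch of constructing this configuration in Python. The import path is assumed from the module path shown in the headings below, and all values are illustrative only.

from sweagent.agent.models import GenericAPIModelConfig, RetryConfig

# Illustrative values; any litellm-supported model name can be used.
model_config = GenericAPIModelConfig(
    name="gpt-4o",
    per_instance_cost_limit=3.0,  # stop working on an instance once it has cost this much
    temperature=0.0,
    retry=RetryConfig(),          # default retry behavior (see RetryConfig below)
)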

API LMs

sweagent.agent.models.GenericAPIModelConfig pydantic-model

Bases: BaseModel

This configuration object specifies an LM like GPT-4 or similar. The model will be served with the help of the litellm library.

Config:

  • extra: forbid

Fields:

name pydantic-field

name: str

Name of the model.

per_instance_cost_limit pydantic-field

per_instance_cost_limit: float = 3.0

Cost limit for every instance (task).

total_cost_limit pydantic-field

total_cost_limit: float = 0.0

Total cost limit.

per_instance_call_limit pydantic-field

per_instance_call_limit: int = 0

Per instance call limit.

temperature pydantic-field

temperature: float = 0.0

Sampling temperature

top_p pydantic-field

top_p: float | None = 1.0

Sampling top-p

api_base pydantic-field

api_base: str | None = None

api_version pydantic-field

api_version: str | None = None

api_key pydantic-field

api_key: SecretStr | None = None

API key for the model. We recommend setting this via environment variables or a .env file instead. You can concatenate more than one key by separating them with :::, e.g., key1:::key2. If the field starts with $, it is interpreted as the name of an environment variable (see the usage sketch after get_api_keys below).

stop pydantic-field

stop: list[str] = []

Custom stop sequences

completion_kwargs pydantic-field

completion_kwargs: dict[str, Any] = {}

Additional kwargs to pass to litellm.completion

convert_system_to_user pydantic-field

convert_system_to_user: bool = False

Whether to convert system messages to user messages. This is useful for models that do not support system messages like o1.

retry pydantic-field

retry: RetryConfig

Retry configuration: how often to retry after a failure (e.g., from a rate limit), how long to wait between retries, etc.

delay pydantic-field

delay: float = 0.0

Minimum delay before querying (this can help avoid overusing the API if you are sharing it with other people).

fallbacks pydantic-field

fallbacks: list[dict[str, Any]] = []

List of fallbacks to try if the main model fails. See https://docs.litellm.ai/docs/completion/reliable_completions#fallbacks-sdk for more information.

choose_api_key_by_thread pydantic-field

choose_api_key_by_thread: bool = True

Whether to choose the API key based on the thread name (if multiple are configured). This ensures that with run-batch, we use the same API key within a single thread so that prompt caching still works.

max_input_tokens pydantic-field

max_input_tokens: int | None = None

If set, this will override the max input tokens for the model that we usually look up from litellm.model_cost. Use this for local models or if you want to set a custom max input token limit. If this value is exceeded, a ContextWindowExceededError will be raised. Set this to 0 to disable this check.

max_output_tokens pydantic-field

max_output_tokens: int | None = None

If set, this will override the max output tokens for the model that we usually look up from litellm.model_cost. Use this for local models or if you want to set a custom max output token limit. If this value is exceeded, a ContextWindowExceededError will be raised. Set this to 0 to disable this check.

litellm_model_registry pydantic-field

litellm_model_registry: str | None = None

If set, this will override the default model registry for litellm. Use this for local models or models not (yet) in the default litellm model registry for tracking costs.

custom_tokenizer pydantic-field

custom_tokenizer: dict[str, Any] | None = None

Override the default tokenizer for the model. Use the arguments of litellm.create_pretrained_tokenizer. Basic example: {"identifier": "hf-internal-testing/llama-tokenizer"}
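Putting several of the override fields above together, here is a hedged sketch of configuring a local or self-hosted model; the model name, endpoint, and token limits are placeholders, not recommended values.

from sweagent.agent.models import GenericAPIModelConfig, RetryConfig

# All values below are placeholders for illustration.
local_config = GenericAPIModelConfig(
    name="openai/my-local-model",          # litellm-style model name (placeholder)
    api_base="http://localhost:8000/v1",   # placeholder endpoint of the local server
    max_input_tokens=32768,                # override the litellm.model_cost lookup
    max_output_tokens=4096,
    custom_tokenizer={"identifier": "hf-internal-testing/llama-tokenizer"},
    retry=RetryConfig(),
)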

id property

id: str

get_api_keys

get_api_keys() -> list[str]

Returns a list of API keys that were explicitly set in this config. Does not return API keys that were set via environment variables/.env

Source code in sweagent/agent/models.py
def get_api_keys(self) -> list[str]:
    """Returns a list of API keys that were explicitly set in this config.
    Does not return API keys that were set via environment variables/.env
    """
    if self.api_key is None:
        return []
    api_key = self.api_key.get_secret_value()
    if not api_key:
        return []
    if api_key.startswith("$"):
        env_var_name = api_key[1:]
        api_key = os.getenv(env_var_name, "")
        if not api_key:
            get_logger("swea-config", emoji="🔧").warning(f"Environment variable {env_var_name} not set")
            return []
    return api_key.split(":::")
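As a usage sketch of the key formats described for api_key above (the key values and the environment variable name are hypothetical):

from sweagent.agent.models import GenericAPIModelConfig, RetryConfig

# Hypothetical key values; pydantic coerces the string into a SecretStr.
config = GenericAPIModelConfig(name="gpt-4o", api_key="key1:::key2", retry=RetryConfig())
print(config.get_api_keys())  # ['key1', 'key2']

# A value starting with "$" is resolved from the environment at call time:
config = GenericAPIModelConfig(name="gpt-4o", api_key="$MY_API_KEY", retry=RetryConfig())
# get_api_keys() looks up the MY_API_KEY environment variable and splits its value on ":::".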

choose_api_key

choose_api_key() -> str | None

Chooses an API key based on the API keys explicitly set in this config. If no API keys are set, returns None (which means that the API key will be taken from the environment variables/.env file).

Source code in sweagent/agent/models.py
def choose_api_key(self) -> str | None:
    """Chooses an API key based on the API keys explicitly set in this config.
    If no API keys are set, returns None (which means that the API key will be
    taken from the environment variables/.env file).
    """
    api_keys = self.get_api_keys()
    if not api_keys:
        return None
    if not self.choose_api_key_by_thread:
        return random.choice(api_keys)
    thread_name = threading.current_thread().name
    if thread_name not in _THREADS_THAT_USED_API_KEYS:
        _THREADS_THAT_USED_API_KEYS.append(thread_name)
    thread_idx = _THREADS_THAT_USED_API_KEYS.index(thread_name)
    key_idx = thread_idx % len(api_keys)
    get_logger("config", emoji="🔧").debug(
        f"Choosing API key {key_idx} for thread {thread_name} (idx {thread_idx})"
    )
    return api_keys[key_idx]

sweagent.agent.models.RetryConfig pydantic-model

Bases: BaseModel

This configuration object specifies how many times to retry a failed LM API call.

Fields:

retries pydantic-field

retries: int = 20

Number of retries

min_wait pydantic-field

min_wait: float = 10

Minimum wait time between retries (random exponential wait)

max_wait pydantic-field

max_wait: float = 120

Maximum wait time between retries (random exponential wait)
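As an illustrative sketch (the values are arbitrary), the retry behavior is tuned by nesting a RetryConfig into the model configuration:

from sweagent.agent.models import GenericAPIModelConfig, RetryConfig

# Retry up to 5 times, waiting between 1 and 30 seconds (random exponential backoff).
config = GenericAPIModelConfig(
    name="gpt-4o",
    retry=RetryConfig(retries=5, min_wait=1, max_wait=30),
)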

Manual models for testing

The following two models allow you to test your environment by prompting you for actions. This can also be very useful for creating your first demonstrations.

sweagent.agent.models.HumanModel

HumanModel(config: HumanModelConfig, tools: ToolConfig)

Bases: AbstractModel

Model that allows for human-in-the-loop

Source code in sweagent/agent/models.py
def __init__(self, config: HumanModelConfig, tools: ToolConfig):
    """Model that allows for human-in-the-loop"""
    self.logger = get_logger("swea-lm", emoji="🤖")
    self.config: HumanModelConfig = config
    self.stats = InstanceStats()

    # Determine which commands require multi-line input
    self.multi_line_command_endings = {
        command.name: command.end_name for command in tools.commands if command.end_name is not None
    }
    self._readline_histfile = REPO_ROOT / ".swe-agent-human-history"
    self._load_readline_history()

logger instance-attribute

logger = get_logger('swea-lm', emoji='🤖')

config instance-attribute

config: HumanModelConfig = config

stats instance-attribute

stats = InstanceStats()

multi_line_command_endings instance-attribute

multi_line_command_endings = {command.name: command.end_name for command in tools.commands if command.end_name is not None}

query

query(history: History, action_prompt: str = '> ', n: int | None = None, **kwargs) -> dict | list[dict]

Wrapper to separate action prompt from formatting

Source code in sweagent/agent/models.py
def query(self, history: History, action_prompt: str = "> ", n: int | None = None, **kwargs) -> dict | list[dict]:
    """Wrapper to separate action prompt from formatting"""
    out = []
    n_samples = n or 1
    for _ in range(n_samples):
        try:
            out.append(self._query(history, action_prompt))
        except KeyboardInterrupt:
            print("^C (exit with ^D)")
            out.append(self.query(history, action_prompt))
        except EOFError:
            if self.config.catch_eof:
                print("\nGoodbye!")
                out.append({"message": "exit"})
            else:
                # Re-raise EOFError when catch_eof is disabled
                raise
    if n is None:
        return out[0]
    return out

sweagent.agent.models.HumanThoughtModel

HumanThoughtModel(config: HumanModelConfig, tools: ToolConfig)

Bases: HumanModel

Source code in sweagent/agent/models.py
def __init__(self, config: HumanModelConfig, tools: ToolConfig):
    """Model that allows for human-in-the-loop"""
    self.logger = get_logger("swea-lm", emoji="🤖")
    self.config: HumanModelConfig = config
    self.stats = InstanceStats()

    # Determine which commands require multi-line input
    self.multi_line_command_endings = {
        command.name: command.end_name for command in tools.commands if command.end_name is not None
    }
    self._readline_histfile = REPO_ROOT / ".swe-agent-human-history"
    self._load_readline_history()

query

query(history: History, **kwargs) -> dict

Logic for handling user input (both thought + action) to pass to SWEEnv

Source code in sweagent/agent/models.py
def query(self, history: History, **kwargs) -> dict:
    """Logic for handling user input (both thought + action) to pass to SWEEnv"""
    thought_all = ""
    thought = input("Thought (end w/ END_THOUGHT): ")
    while True:
        if "END_THOUGHT" in thought:
            thought = thought.split("END_THOUGHT")[0]
            thought_all += thought
            break
        thought_all += thought
        thought = input("... ")

    action = super()._query(history, action_prompt="Action: ")["message"]

    return {"message": f"{thought_all}\n```\n{action}\n```"}

Replay model for testing and demonstrations

sweagent.agent.models.ReplayModel

ReplayModel(config: ReplayModelConfig, tools: ToolConfig)

Bases: AbstractModel

Model used for replaying a trajectory (i.e., taking all the actions from the .traj file and re-issuing them).

Source code in sweagent/agent/models.py
def __init__(self, config: ReplayModelConfig, tools: ToolConfig):
    """Model used for replaying a trajectory (i.e., taking all the actions for the `.traj` file
    and re-issuing them.
    """
    self.config = config
    self.stats = InstanceStats()

    if not self.config.replay_path.exists():
        msg = f"Replay file {self.config.replay_path} not found"
        raise FileNotFoundError(msg)

    self._replays = [
        list(json.loads(x).values())[0] for x in Path(self.config.replay_path).read_text().splitlines(keepends=True)
    ]
    self._replay_idx = 0
    self._action_idx = 0
    self.use_function_calling = tools.use_function_calling
    self.submit_command = tools.submit_command
    self.logger = get_logger("swea-lm", emoji="🤖")

config instance-attribute

config = config

stats instance-attribute

stats = InstanceStats()

use_function_calling instance-attribute

use_function_calling = use_function_calling

submit_command instance-attribute

submit_command = submit_command

logger instance-attribute

logger = get_logger('swea-lm', emoji='🤖')

query

query(history: History) -> dict

Logic for tracking which replay action to pass to SWEEnv

Source code in sweagent/agent/models.py
def query(self, history: History) -> dict:
    """Logic for tracking which replay action to pass to SWEEnv"""
    self.stats.api_calls += 1
    actions = self._replays[self._replay_idx]
    try:
        action = actions[self._action_idx]
    except IndexError:
        # log error
        self.logger.error("Reached end of replay trajectory without submitting. Submitting now.")
        if self.use_function_calling:
            action = {
                "message": f"Calling `{self.submit_command}` to submit.",
                "tool_calls": [
                    {
                        "type": "function",
                        "id": "call_submit",
                        "function": {
                            "name": self.submit_command,
                            "arguments": "{}",
                        },
                    }
                ],
            }
        else:
            action = f"```\n{self.submit_command}\n```"

    self._action_idx += 1

    # Assuming `submit` is always last action of replay trajectory
    if isinstance(action, str) and action == "submit":
        self._next_replay()
        return {"message": action}

    # Handle both dict and string actions
    if isinstance(action, dict):
        return action
    return {"message": action}