Models

Required reading

See the model section in the installation guide for a primer before reading the rest of this page.

Related pages

See the model config reference for the full list of model options
To control how the agent extracts the actions from the model response, see the action parsers reference

Notes for specific models

Local models

See the model section in the installation guide. Remember to unset spending limits and configure the action parser if you cannot support function calling.

For "cost" tracking with local models, you can optionally provide a custom litellm_model_registry file in your configuration. This allows you to define custom pricing information for your local models instead of disabling cost limits entirely. See the local models section for detailed instructions.

Anthropic Claude

Prompt caching makes SWE-agent several times more affordable. While this is done automatically for models like gpt-4o, care has to be taken for Anthropic Claude, as you need to manually set the cache break points.

For this, include the following history processor:

agent:
  history_processors:
  - type: cache_control
    last_n_messages: 2

Other history processors

Other history processors might interfere with the prompt caching if you are not careful. However, if your history processor is only modifying the last observation, you can combine as done here.

Anthropic Claude gives you 4 cache break points per key. You need two of them for a single agent run (because the break points are both used to retrieve and set the cache). Therefore, you can only run two parallel instances of SWE-agent with run-batch per key. To support more parallel running instances, supply multiple keys as described below.

We recommend that you check how often you hit the cache. A very simple way is to go to your trajectory directory and grep like so:

grep -o "cached_tokens=[0-9]*" django__django-11299.debug.log

Note that the maximum number of output tokens of Claude 3.7/4 can be extended with extra headers. See this issue in litellm and and swe-agent PR #1035 for omore information. Since #1036 you can also manually set the maximum output tokens and override the information from litellm.

To use extended thinking, you can set the following in your config:

agent:
  name: 'claude-sonnet-4-20250514'
  model:
    temperature: 1.
    completion_kwargs:
      reasoning_effort: 'high'

o1

Make sure to set

agent:
    model:
        top_p: null
        temperature: 1.

as other values aren't supported by o1.

Using multiple keys

We support rotating through multiple keys for run-batch. For this, concatenate all keys with ::: and set them via the --agent.model.api_key flag. Every thread (i.e., every parallel running agent that is working on one task instance) will stick to one key during the entire run, i.e., this does not break prompt caching.

Custom cost tracking

If you want to track costs for models not in the default litellm registry, you can provide a custom model registry file. This is particularly useful for:

New models not yet supported by litellm's default registry
Overriding default / old cost values in litellm
Local models that you want to track "costs" for, to compare to other results

This file will override entries in the litellm community model cost file.

Create a JSON file with your model's cost information following the litellm model registry format:

my_model_registry.json

{
  "ollama/llama2": {
    "max_tokens": 8192,
    "input_cost_per_token": 0.00002,
    "output_cost_per_token": 0.00006,
    "litellm_provider": "ollama",
    "mode": "chat"
  },
  "my-custom-provider/my-new-model": {
    "max_tokens": 8192,
    "max_input_tokens": 8192,
    "max_output_tokens": 8192,
    "input_cost_per_token": 0.000001,
    "output_cost_per_token": 0.000002,
    "litellm_provider": "openai",
    "mode": "chat"
  }
}

Then specify this registry in your config:

config/your_config.yaml

agent:
  model:
    litellm_model_registry: "my_model_registry.json"  # Path to your custom registry
    ...

If you need to modify the tokenizer that is used when calculating costs, you can set the custom_tokenizer setting in the model config.

Models for testing

We also provide models for testing SWE-agent without spending any credits

HumanModel and HumanThoughtModel will prompt for input from the user that stands in for the output of the LM. This can be used to create new demonstrations.
ReplayModel takes a trajectory as input and "replays it"
InstantEmptySubmitTestModel will create an empty reproduce.py and then submit

bug_report Something broken/unclear?

Open an issue on GitHub!

help Open-ended discussions

Join our Slack!

Our projects