Coding Agents
Build agents that can write, test, debug, and execute code autonomously. Covers the tool-calling loop pattern, code execution sandboxing, and model selection for coding tasks.
Overview
A coding agent combines an LLM with tools — file I/O, code execution, shell commands, web search — in an iterative loop. The model plans, takes action (tool call), observes the result, and continues until the task is complete.
AICredits gives you access to the best coding models (Claude Sonnet, GPT-4o, Gemini) through a single API, making it easy to compare and swap models without changing your agent code.
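One way to keep models swappable is to pass the model ID in as a parameter instead of hard-coding it. The sketch below is illustrative only: `AgentRunner`, `compare_models`, and `stub_runner` are hypothetical names, and the stub stands in for a real agent call so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical signature: an agent runner takes (task, model_id) and returns the final answer.
AgentRunner = Callable[[str, str], str]


def compare_models(task: str, models: List[str], run: AgentRunner) -> Dict[str, str]:
    """Run the same task against each model ID and collect the answers."""
    results: Dict[str, str] = {}
    for model in models:
        try:
            results[model] = run(task, model)
        except Exception as e:  # Keep going so one failing model does not hide the others
            results[model] = f"Error: {e}"
    return results


def stub_runner(task: str, model: str) -> str:
    """Offline stand-in for a real agent call (assumption for this demo)."""
    return f"[{model}] would handle: {task}"


results = compare_models(
    "Reverse a string",
    ["anthropic/claude-sonnet-4.5", "openai/gpt-4o"],
    stub_runner,
)
for model, answer in results.items():
    print(model, "->", answer)
```

In a real agent, the runner would forward the model ID to the chat completions call, so the rest of the agent code stays unchanged.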
Agent Architecture
- Plan — Model receives the task and plans the steps needed to complete it.
- Act — Model calls a tool (write file, run code, search, etc.).
- Observe — Tool output is fed back to the model as a tool result.
- Repeat — Model continues the loop until finish_reason is "stop" (task complete).
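The four steps can be traced without any API call by swapping the model for a scripted stand-in. This is a minimal sketch, not the AICredits API: `fake_model`, `TOOLS`, and `run_loop` are hypothetical names, and the dictionaries only imitate the general shape of chat messages with tool calls.

```python
import json


def fake_model(messages):
    """Scripted stand-in for an LLM: request one tool call, then give a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {
            "finish_reason": "tool_calls",
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "id": "call_1",
                    "function": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})},
                }],
            },
        }
    last_result = messages[-1]["content"]
    return {
        "finish_reason": "stop",
        "message": {"role": "assistant", "content": f"The sum is {last_result}."},
    }


# Hypothetical tool registry: tool name -> callable returning text
TOOLS = {"add": lambda a, b: str(a + b)}


def run_loop(task, max_iterations=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iterations):
        reply = fake_model(messages)                 # Plan: model decides the next step
        messages.append(reply["message"])
        if reply["finish_reason"] == "stop":         # Repeat: stop once the model is done
            return reply["message"]["content"], messages
        for call in reply["message"]["tool_calls"]:  # Act: run each requested tool
            args = json.loads(call["function"]["arguments"])
            result = TOOLS[call["function"]["name"]](**args)
            # Observe: feed the tool output back as a tool message
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    return "Max iterations reached.", messages


answer, history = run_loop("What is 2 + 3?")
print(answer)                          # The sum is 5.
print([m["role"] for m in history])    # ['user', 'assistant', 'tool', 'assistant']
```

The message history grows by one assistant message and one tool message per tool call, which is why long-running agents need an iteration cap.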
Basic Coding Agent
A minimal agent that can read files, write files, and run Python code:
import json
import subprocess
import tempfile
from pathlib import Path

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path to read"},
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute Python code and return stdout/stderr",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python code to execute"},
                },
                "required": ["code"],
            },
        },
    },
]

Tool Calling Loop
def execute_tool(name: str, args: dict) -> str:
    """Dispatch a tool call and return its result (or an error message) as text."""
    if name == "read_file":
        try:
            return Path(args["path"]).read_text()
        except Exception as e:
            return f"Error: {e}"
    elif name == "write_file":
        try:
            path = Path(args["path"])
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(args["content"])
            return f"Wrote {len(args['content'])} characters to {args['path']}"
        except Exception as e:
            return f"Error: {e}"
    elif name == "run_python":
        with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
            f.write(args["code"])
            tmp = f.name
        try:
            result = subprocess.run(
                ["python", tmp],
                capture_output=True, text=True, timeout=30
            )
            output = result.stdout + result.stderr
            return output[:2000]  # Truncate so large outputs do not flood the context
        except subprocess.TimeoutExpired:
            return "Error: execution timed out after 30 seconds"
        finally:
            Path(tmp).unlink(missing_ok=True)  # Clean up the temporary script
    return f"Unknown tool: {name}"


def run_agent(task: str, max_iterations: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are a coding assistant. Use tools to complete tasks."},
        {"role": "user", "content": task},
    ]
    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="anthropic/claude-sonnet-4.5",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        message = response.choices[0].message
        messages.append(message)

        # The model is done when it stops without requesting any tool
        if response.choices[0].finish_reason == "stop" or not message.tool_calls:
            return message.content or ""

        # Run every requested tool and feed each result back to the model
        for tool_call in message.tool_calls:
            try:
                args = json.loads(tool_call.function.arguments)
                result = execute_tool(tool_call.function.name, args)
            except json.JSONDecodeError as e:
                result = f"Error: invalid tool arguments: {e}"
            print(f"Tool: {tool_call.function.name} → {result[:100]}...")
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })
    return "Max iterations reached."


result = run_agent(
    "Create a Python function that calculates Fibonacci numbers iteratively, "
    "write it to fibonacci.py, and run it to verify the first 10 numbers."
)
print(result)

Code Execution Tool
Never run LLM-generated code directly in your production environment. Use Docker containers, VMs, or a dedicated sandbox service. Add resource limits, network isolation, and timeout enforcement for production.
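import os
import tempfile

import docker


def run_python_sandboxed(code: str, timeout: int = 10) -> str:
    client_docker = docker.from_env()
    with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
        f.write(code)
        tmp_path = f.name
    container = None
    try:
        # containers.run() does not accept a timeout argument, so start the
        # container detached and enforce the deadline with wait() instead.
        container = client_docker.containers.run(
            "python:3.12-slim",
            "python /code/script.py",
            volumes={tmp_path: {"bind": "/code/script.py", "mode": "ro"}},
            network_disabled=True,
            mem_limit="128m",
            cpu_period=100000,
            cpu_quota=50000,  # 50% of one CPU
            detach=True,
        )
        status = container.wait(timeout=timeout)
        output = container.logs(stdout=True, stderr=True).decode("utf-8")[:2000]
        if status.get("StatusCode", 0) != 0:
            return f"Runtime error: {output}"
        return output
    except Exception as e:
        # Includes the timeout case, where wait() raises before the script exits
        return f"Execution error: {e}"
    finally:
        if container is not None:
            container.remove(force=True)  # Kills the container if it is still running
        os.remove(tmp_path)

Choosing a Model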
| Model | Coding Strength | Notes |
|---|---|---|
| openai/gpt-5.4 | Excellent | Latest GPT: strong reasoning, fast tool calls |
| anthropic/claude-sonnet-4.5 | Excellent | Best overall coding agent, great tool use |
| openai/gpt-4o | Excellent | Strong across all languages, reliable tool calls |
| openai/o3-mini | Very High | Best for hard algorithmic problems, slow |
| openai/gpt-4o-mini | Good | Fast and cheap for simple tasks |
| google/gemini-2.0-flash | Good | Very fast, good for quick iterations |
| deepseek/deepseek-chat | Good | Cost-effective, strong on Python/Go |
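The notes in the table can be turned into a simple routing rule. The sketch below is one possible mapping, not a recommendation from AICredits: the task categories, `MODEL_BY_TASK`, `DEFAULT_MODEL`, and `pick_model` are all hypothetical, and only the model IDs come from the table.

```python
# Hypothetical task categories mapped to model IDs taken from the table above.
MODEL_BY_TASK = {
    "hard_algorithm": "openai/o3-mini",            # slow, but strongest on hard problems
    "quick_iteration": "google/gemini-2.0-flash",  # very fast
    "simple": "openai/gpt-4o-mini",                # fast and cheap
}

# Fallback for anything that does not match a category (assumption).
DEFAULT_MODEL = "anthropic/claude-sonnet-4.5"


def pick_model(task_kind: str) -> str:
    """Return the model ID for a task category, falling back to the default."""
    return MODEL_BY_TASK.get(task_kind, DEFAULT_MODEL)


print(pick_model("simple"))      # openai/gpt-4o-mini
print(pick_model("refactoring")) # anthropic/claude-sonnet-4.5
```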
Safety Considerations
- Sandbox all code execution — Never run LLM-generated code outside an isolated environment.
- Limit filesystem access — Restrict which directories the agent can read/write.
- Set iteration limits — Cap the agent loop to prevent runaway costs (max 10–20 iterations).
- Log all tool calls — Keep an audit trail of every action the agent takes.
- Rate limit per user — Prevent users from triggering expensive long-running agents repeatedly.
- Review tool definitions — Only give the agent tools it actually needs — principle of least privilege.
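Several of the points above can be enforced in code around the tool dispatcher. The following is a minimal sketch under stated assumptions: `WORKSPACE`, `resolve_in_workspace`, `audited`, and `SlidingWindowLimiter` are hypothetical helpers, the workspace directory name is made up, and `Path.is_relative_to` requires Python 3.9 or later.

```python
import json
import logging
import time
from collections import defaultdict, deque
from pathlib import Path

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent.audit")

# Hypothetical sandbox root: the only directory the agent may read or write.
WORKSPACE = Path("agent_workspace").resolve()


def resolve_in_workspace(user_path: str) -> Path:
    """Resolve a model-supplied path and reject anything outside WORKSPACE."""
    candidate = (WORKSPACE / user_path).resolve()
    if not candidate.is_relative_to(WORKSPACE):
        raise PermissionError(f"Path escapes workspace: {user_path}")
    return candidate


def audited(execute_tool):
    """Wrap a (name, args) -> str tool dispatcher so every call is logged."""
    def wrapper(name: str, args: dict) -> str:
        start = time.monotonic()
        result = execute_tool(name, args)
        audit_log.info(json.dumps({
            "tool": name,
            "args": args,
            "result_chars": len(result),
            "seconds": round(time.monotonic() - start, 3),
        }, default=str))
        return result
    return wrapper


class SlidingWindowLimiter:
    """Allow at most max_runs agent runs per user within window_seconds."""

    def __init__(self, max_runs: int, window_seconds: float):
        self.max_runs = max_runs
        self.window_seconds = window_seconds
        self._runs = defaultdict(deque)  # user_id -> timestamps of recent runs

    def allow(self, user_id: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        runs = self._runs[user_id]
        while runs and now - runs[0] >= self.window_seconds:
            runs.popleft()  # Drop runs that have left the window
        if len(runs) >= self.max_runs:
            return False
        runs.append(now)
        return True


# Usage: log a stub tool call and check a path before touching the filesystem.
logged_tool = audited(lambda name, args: f"ran {name}")
print(logged_tool("read_file", {"path": "notes.txt"}))
print(resolve_in_workspace("src/app.py"))
```

File tools would call `resolve_in_workspace` before any read or write, and the agent entry point would check `allow(user_id)` before starting a run.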