Coding Agents

Build agents that can write, test, debug, and execute code autonomously. This guide covers the tool-calling loop pattern, code execution sandboxing, and model selection for coding tasks.

Overview

A coding agent combines an LLM with tools — file I/O, code execution, shell commands, web search — in an iterative loop. The model plans, takes action (tool call), observes the result, and continues until the task is complete.

AICredits gives you access to several strong coding models (Claude Sonnet, GPT-4o, Gemini) through a single OpenAI-compatible API, so you can compare and swap models by changing the model name rather than your agent code.

Agent Architecture

Plan — Model receives the task and plans the steps needed to complete it.
Act — Model calls a tool (write file, run code, search, etc.).
Observe — Tool output is fed back to the model as a tool result.
Repeat — Model continues the loop until finish_reason is "stop" (task complete).
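
To make the Observe step concrete, the sketch below shows what the message history could look like after one tool round trip in the OpenAI-style chat format used later on this page. The call ID, the file name, and the `unanswered_tool_calls` helper are made up for illustration. The point is that every tool call the model issues needs a matching `tool` message, tagged with the same ID, before the next request is sent.

```python
import json

# Hypothetical history after one Act/Observe round trip (IDs and paths are made up).
messages = [
    {"role": "system", "content": "You are a coding assistant. Use tools to complete tasks."},
    {"role": "user", "content": "What does notes.txt say?"},
    # Act: the model requests a tool call instead of answering.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "read_file",
                    "arguments": json.dumps({"path": "notes.txt"}),
                },
            }
        ],
    },
    # Observe: the tool output goes back, tagged with the same call ID.
    {"role": "tool", "tool_call_id": "call_1", "content": "remember to add tests"},
]


def unanswered_tool_calls(history: list[dict]) -> list[str]:
    """Return IDs of tool calls that have no matching tool result yet."""
    requested = [
        call["id"]
        for msg in history
        if msg["role"] == "assistant"
        for call in (msg.get("tool_calls") or [])
    ]
    answered = {msg["tool_call_id"] for msg in history if msg["role"] == "tool"}
    return [call_id for call_id in requested if call_id not in answered]


print(unanswered_tool_calls(messages))       # []
print(unanswered_tool_calls(messages[:-1]))  # ['call_1']
```

An empty list means the history is ready for the next request. A check like this can be handy while debugging a loop, since OpenAI-compatible endpoints typically reject a history in which a tool call was never answered.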

Basic Coding Agent

A minimal agent that can read files, write files, and run Python code:

Define agent tools

import json
import subprocess
import tempfile
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path to read"},
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute Python code and return stdout/stderr",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python code to execute"},
                },
                "required": ["code"],
            },
        },
    },
]

Tool Calling Loop

Agent loop

def execute_tool(name: str, args: dict) -> str:
    if name == "read_file":
        try:
            return Path(args["path"]).read_text()
        except Exception as e:
            return f"Error: {e}"

    elif name == "write_file":
        try:
            path = Path(args["path"])
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(args["content"])
            return f"Wrote {len(args['content'])} characters to {args['path']}"
        except Exception as e:
            return f"Error: {e}"

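    elif name == "run_python":
        # NOTE: this runs the code directly on the host; see "Code Execution Tool" for sandboxing.
        with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
            f.write(args["code"])
            tmp = f.name
        try:
            result = subprocess.run(
                ["python", tmp],
                capture_output=True, text=True, timeout=30
            )
            output = result.stdout + result.stderr
            return output[:2000]  # truncate so long output doesn't flood the context
        except subprocess.TimeoutExpired:
            return "Error: execution timed out after 30 seconds"
        finally:
            Path(tmp).unlink(missing_ok=True)  # clean up the temp script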

    return f"Unknown tool: {name}"


def run_agent(task: str, max_iterations: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are a coding assistant. Use tools to complete tasks."},
        {"role": "user", "content": task},
    ]

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="anthropic/claude-sonnet-4.5",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )

        message = response.choices[0].message
        messages.append(message)

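        # Finish when the model stops requesting tools. Checking only for "stop"
        # would loop again on other finish reasons such as "length".
        if not message.tool_calls:
            return message.content or ""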

        if message.tool_calls:
            for tool_call in message.tool_calls:
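                try:
                    args = json.loads(tool_call.function.arguments)
                    result = execute_tool(tool_call.function.name, args)
                except (json.JSONDecodeError, KeyError) as e:
                    # Report malformed or missing arguments to the model so it can retry
                    result = f"Error: invalid tool arguments: {e}"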
                print(f"Tool: {tool_call.function.name} → {result[:100]}...")

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                })

    return "Max iterations reached."


result = run_agent(
    "Create a Python function that calculates Fibonacci numbers iteratively, "
    "write it to fibonacci.py, and run it to verify the first 10 numbers."
)
print(result)

Code Execution Tool

Never run LLM-generated code directly in your production environment. Use Docker containers, VMs, or a dedicated sandbox service. Add resource limits, network isolation, and timeout enforcement for production.

Sandboxed code execution (Docker)

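import os
import tempfile

import docker
import requests

def run_python_sandboxed(code: str, timeout: int = 10) -> str:
    client_docker = docker.from_env()

    with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
        f.write(code)
        tmp_path = f.name

    container = None
    try:
        # Start detached so the wait below can enforce the timeout;
        # containers.run() itself does not take a timeout argument.
        container = client_docker.containers.run(
            "python:3.12-slim",
            "python /code/script.py",
            volumes={tmp_path: {"bind": "/code/script.py", "mode": "ro"}},
            detach=True,
            network_disabled=True,
            mem_limit="128m",
            cpu_period=100000,
            cpu_quota=50000,  # 50% of one CPU
        )
        status = container.wait(timeout=timeout)["StatusCode"]
        output = container.logs(stdout=True, stderr=True).decode("utf-8", errors="replace")
        if status != 0:
            return f"Runtime error (exit code {status}): {output[:2000]}"
        return output[:2000]
    except (requests.exceptions.ReadTimeout, requests.exceptions.ConnectionError):
        return f"Execution error: timed out after {timeout} seconds"
    except Exception as e:
        return f"Execution error: {e}"
    finally:
        if container is not None:
            container.remove(force=True)  # also stops a container that is still running
        os.unlink(tmp_path)  # clean up the temp script on the host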
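
One way to wire a sandboxed runner into the agent is to dispatch tools through a registry rather than an if/elif chain, so the `run_python` slot can point at whichever executor suits the environment. The following is a minimal sketch under that assumption: `make_dispatcher`, `run_tool`, and the stand-in handler are hypothetical names, not part of the code above.

```python
from typing import Callable

ToolHandler = Callable[[dict], str]


def make_dispatcher(handlers: dict[str, ToolHandler]) -> Callable[[str, dict], str]:
    """Build an execute_tool-style function from a name -> handler mapping."""

    def dispatch(name: str, args: dict) -> str:
        handler = handlers.get(name)
        if handler is None:
            return f"Unknown tool: {name}"
        try:
            return handler(args)
        except Exception as e:  # report failures to the model instead of crashing the loop
            return f"Error: {e}"

    return dispatch


# Stand-in executor for illustration only. In the agent, this slot would hold
# something like: lambda args: run_python_sandboxed(args["code"])
def fake_sandbox(args: dict) -> str:
    return f"(sandboxed) would run {len(args['code'])} characters of code"


run_tool = make_dispatcher({"run_python": fake_sandbox})

print(run_tool("run_python", {"code": "print(1)"}))
print(run_tool("run_python", {}))    # missing argument is reported as an error string
print(run_tool("delete_repo", {}))   # unregistered tools are refused
```

Because only registered names are callable, the registry doubles as an explicit allowlist of what the agent may do.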

Choosing a Model

Model	Coding Strength	Notes
`openai/gpt-5.4`	Excellent	Latest GPT — strong reasoning, fast tool calls
`anthropic/claude-sonnet-4.5`	Excellent	Best overall coding agent, great tool use
`openai/gpt-4o`	Excellent	Strong across all languages, reliable tool calls
`openai/o3-mini`	Very High	Best for hard algorithmic problems, slow
`openai/gpt-4o-mini`	Good	Fast and cheap for simple tasks
`google/gemini-2.0-flash`	Good	Very fast, good for quick iterations
`deepseek/deepseek-chat`	Good	Cost-effective, strong on Python/Go
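
The ratings above are coarse, so it can help to run the same task against a few candidates and compare the results yourself. The sketch below assumes a runner that behaves like `run_agent` with an added `model` parameter (the version earlier on this page hardcodes the model); `compare_models` and `stub_run` are hypothetical names, and the stub exists only so the example runs without network access.

```python
import time
from typing import Callable


def compare_models(
    task: str,
    models: list[str],
    run: Callable[[str, str], str],
) -> list[dict]:
    """Run the same task once per model and record the answer and wall-clock time.

    `run(task, model)` is assumed to behave like run_agent with an added
    `model` parameter; it is injected so the harness works with any runner.
    """
    results = []
    for model in models:
        start = time.perf_counter()
        try:
            answer = run(task, model)
        except Exception as e:  # one failing model should not abort the comparison
            answer = f"Error: {e}"
        results.append({
            "model": model,
            "seconds": round(time.perf_counter() - start, 3),
            "answer": answer,
        })
    return results


# Stub runner so the sketch executes offline; replace with a real agent call.
def stub_run(task: str, model: str) -> str:
    return f"[{model}] done: {task}"


for row in compare_models(
    "Write fibonacci.py",
    ["anthropic/claude-sonnet-4.5", "openai/gpt-4o-mini"],
    stub_run,
):
    print(row["model"], row["seconds"], row["answer"])
```

Wall-clock time is only one signal. For a real comparison you would likely also record token usage and whether the generated code passed its tests.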

Safety Considerations

Sandbox all code execution — Never run LLM-generated code outside an isolated environment.
Limit filesystem access — Restrict which directories the agent can read/write.
Set iteration limits — Cap the agent loop to prevent runaway costs (max 10–20 iterations).
Log all tool calls — Keep an audit trail of every action the agent takes.
Rate limit per user — Prevent users from triggering expensive long-running agents repeatedly.
Review tool definitions — Only give the agent tools it actually needs — principle of least privilege.
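
Two of the points above, limiting filesystem access and logging tool calls, can be sketched in a few lines. This is an illustrative outline rather than a hardened implementation: `resolve_in_workspace`, `audited`, and the `agent.audit` logger name are hypothetical, and the path check assumes Python 3.9 or newer for `Path.is_relative_to`.

```python
import json
import logging
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent.audit")  # hypothetical logger name


def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve a path requested by the model and refuse anything outside workspace."""
    root = workspace.resolve()
    target = (root / requested).resolve()  # collapses ".." and follows symlinks
    if not target.is_relative_to(root):
        raise PermissionError(f"{requested!r} is outside the workspace")
    return target


def audited(name: str, args: dict, handler) -> str:
    """Run one tool call and write a JSON audit record for it."""
    try:
        result, ok = handler(args), True
    except Exception as e:
        result, ok = f"Error: {e}", False
    audit_log.info(json.dumps({"tool": name, "args": args, "ok": ok}))
    return result


# Usage with a throwaway workspace directory.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    (workspace / "notes.txt").write_text("hello")

    def read_file(args: dict) -> str:
        return resolve_in_workspace(workspace, args["path"]).read_text()

    print(audited("read_file", {"path": "notes.txt"}, read_file))         # hello
    print(audited("read_file", {"path": "../../etc/passwd"}, read_file))  # refused with an error string
```

Resolving before checking matters: a plain string-prefix comparison would miss `..` segments and symlinks that point outside the workspace. In a real deployment the audit record would probably also carry a user ID and timestamp, and large arguments such as file contents would be truncated before logging.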