Coding Agents

Build agents that can write, test, debug, and execute code autonomously. Covers the tool-calling loop pattern, code execution sandboxing, and model selection for coding tasks.

Overview

A coding agent combines an LLM with tools — file I/O, code execution, shell commands, web search — in an iterative loop. The model plans, takes action (tool call), observes the result, and continues until the task is complete.

AICredits gives you access to leading coding models (such as Claude Sonnet, GPT-4o, and Gemini) through a single API, so you can compare and swap models by changing the model ID, without changing your agent code.
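
Because only the model ID changes between models, a comparison harness can stay small. The sketch below assumes a hypothetical `compare_models` helper and a caller-supplied `run_task` callable (for example, a wrapper around your own agent loop that accepts a model ID). Neither is part of the AICredits API.

```python
import time
from typing import Callable


def compare_models(
    run_task: Callable[[str, str], str],
    models: list[str],
    task: str,
) -> list[dict]:
    """Run the same task against each model ID and record output and latency.

    run_task(model, task) is a hypothetical callable supplied by the caller.
    """
    results = []
    for model in models:
        started = time.perf_counter()
        try:
            output = run_task(model, task)
            error = None
        except Exception as exc:
            # Keep going so one failing model does not stop the comparison.
            output = None
            error = str(exc)
        results.append({
            "model": model,
            "output": output,
            "error": error,
            "seconds": time.perf_counter() - started,
        })
    return results
```

Passing the runner in as an argument keeps the harness independent of any particular client, which also makes it easy to exercise with a stub before spending credits.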

Agent Architecture

  1. Plan — Model receives the task and plans the steps needed to complete it.
  2. Act — Model calls a tool (write file, run code, search, etc.).
  3. Observe — Tool output is fed back to the model as a tool result.
  4. Repeat — Model continues the loop until finish_reason is "stop" (task complete).
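
One pass through the loop leaves a message history shaped roughly like the sketch below, assuming the OpenAI-style chat format used in the rest of this guide. The tool call ID, file path, and file contents are made-up placeholders.

```python
import json

# Illustrative history after one Plan -> Act -> Observe round.
messages = [
    {"role": "system", "content": "You are a coding assistant. Use tools to complete tasks."},
    {"role": "user", "content": "What does main.py print?"},
    # Act: the assistant requests a tool call instead of answering directly.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_001",  # placeholder ID
            "type": "function",
            "function": {
                "name": "read_file",
                # Arguments travel as a JSON-encoded string, not a dict.
                "arguments": json.dumps({"path": "main.py"}),
            },
        }],
    },
    # Observe: the tool result is linked to the request through tool_call_id.
    {"role": "tool", "tool_call_id": "call_001", "content": "print('hello')"},
]

# Every tool result should answer a tool call the assistant actually made.
requested = {call["id"] for m in messages for call in m.get("tool_calls", [])}
answered = {m["tool_call_id"] for m in messages if m["role"] == "tool"}
print(answered <= requested)
```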

Basic Coding Agent

A minimal agent that can read files, write files, and run Python code:

Define agent tools
import json
import subprocess
import tempfile
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-key-here",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path to read"},
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute Python code and return stdout/stderr",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python code to execute"},
                },
                "required": ["code"],
            },
        },
    },
]
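
Models occasionally omit a required argument or send malformed JSON, so it can help to check arguments against the tool schema before dispatching. The sketch below is one minimal approach. `parse_tool_args` is a hypothetical helper, and it only checks required and unknown keys rather than performing full JSON Schema validation.

```python
import json
from typing import Optional, Tuple


def parse_tool_args(raw_arguments: str, parameters: dict) -> Tuple[Optional[dict], Optional[str]]:
    """Return (args, None) on success or (None, error_message) on failure."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return None, f"Error: arguments are not valid JSON ({exc.msg})"
    if not isinstance(args, dict):
        return None, "Error: arguments must be a JSON object"

    missing = [key for key in parameters.get("required", []) if key not in args]
    if missing:
        return None, f"Error: missing required argument(s): {', '.join(missing)}"

    unknown = [key for key in args if key not in parameters.get("properties", {})]
    if unknown:
        return None, f"Error: unknown argument(s): {', '.join(unknown)}"
    return args, None


# Schema copied from the write_file tool definition above.
write_file_params = {
    "type": "object",
    "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
    "required": ["path", "content"],
}

args, error = parse_tool_args('{"path": "notes.txt"}', write_file_params)
print(error)  # Error: missing required argument(s): content
```

Returning the error string as the tool result gives the model a chance to correct the call on its next turn.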

Tool Calling Loop

Agent loop
def execute_tool(name: str, args: dict) -> str:
    if name == "read_file":
        try:
            return Path(args["path"]).read_text()
        except Exception as e:
            return f"Error: {e}"

    elif name == "write_file":
        try:
            path = Path(args["path"])
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(args["content"])
            return f"Wrote {len(args['content'])} characters to {args['path']}"
        except Exception as e:
            return f"Error: {e}"

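    elif name == "run_python":
        # Write the code to a temporary script; delete=False lets the child process open it.
        with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
            f.write(args["code"])
            tmp = f.name
        try:
            result = subprocess.run(
                ["python", tmp],
                capture_output=True, text=True, timeout=30
            )
        except subprocess.TimeoutExpired:
            # Report the timeout to the model instead of crashing the agent loop.
            return "Error: execution timed out after 30 seconds"
        finally:
            Path(tmp).unlink(missing_ok=True)  # always remove the temporary script
        output = result.stdout + result.stderr
        return output[:2000]  # truncate so large outputs do not flood the context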

    return f"Unknown tool: {name}"


def run_agent(task: str, max_iterations: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are a coding assistant. Use tools to complete tasks."},
        {"role": "user", "content": task},
    ]

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="anthropic/claude-sonnet-4.5",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )

        message = response.choices[0].message
        messages.append(message)

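        # A "stop" finish reason, or a reply with no tool calls, is the final answer.
        if response.choices[0].finish_reason == "stop" or not message.tool_calls:
            return message.content or ""

        for tool_call in message.tool_calls:
            try:
                args = json.loads(tool_call.function.arguments)
                result = execute_tool(tool_call.function.name, args)
            except Exception as e:
                # Malformed arguments or tool failures go back to the model so it can retry.
                result = f"Error: {e}"
            print(f"Tool: {tool_call.function.name} -> {result[:100]}...")

            # Every tool call needs a matching tool message with the same ID.
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })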

    return "Max iterations reached."


result = run_agent(
    "Create a Python function that calculates Fibonacci numbers iteratively, "
    "write it to fibonacci.py, and run it to verify the first 10 numbers."
)
print(result)

Code Execution Tool

Never run LLM-generated code directly in your production environment. Use Docker containers, VMs, or a dedicated sandbox service. Add resource limits, network isolation, and timeout enforcement for production.

Sandboxed code execution (Docker)
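import os
import tempfile

import docker

def run_python_sandboxed(code: str, timeout: int = 10) -> str:
    client_docker = docker.from_env()

    with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
        f.write(code)
        tmp_path = f.name

    container = None
    try:
        # Start detached so that wait() can enforce the timeout;
        # containers.run() does not take a timeout argument itself.
        container = client_docker.containers.run(
            "python:3.12-slim",
            "python /code/script.py",
            volumes={tmp_path: {"bind": "/code/script.py", "mode": "ro"}},
            detach=True,
            network_disabled=True,  # no network access
            mem_limit="128m",       # cap memory
            cpu_period=100000,
            cpu_quota=50000,        # at most 50% of one CPU
        )
        status = container.wait(timeout=timeout)
        output = container.logs(stdout=True, stderr=True).decode("utf-8", errors="replace")
        if status.get("StatusCode", 0) != 0:
            return f"Runtime error: {output[:2000]}"
        return output[:2000]
    except Exception as e:
        # Covers the timeout raised by wait() as well as Docker API errors.
        return f"Execution error: {e}"
    finally:
        if container is not None:
            container.remove(force=True)  # also kills the container if it is still running
        os.unlink(tmp_path)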

Choosing a Model

| Model                       | Coding Strength | Notes                                            |
|-----------------------------|-----------------|--------------------------------------------------|
| openai/gpt-5.4              | Excellent       | Latest GPT; strong reasoning, fast tool calls    |
| anthropic/claude-sonnet-4.5 | Excellent       | Best overall coding agent, great tool use        |
| openai/gpt-4o               | Excellent       | Strong across all languages, reliable tool calls |
| openai/o3-mini              | Very High       | Best for hard algorithmic problems, slow         |
| openai/gpt-4o-mini          | Good            | Fast and cheap for simple tasks                  |
| google/gemini-2.0-flash     | Good            | Very fast, good for quick iterations             |
| deepseek/deepseek-chat      | Good            | Cost-effective, strong on Python/Go              |
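
If an agent handles mixed workloads, the table can be turned into a simple routing rule. The sketch below is an illustration only: the task categories (`simple`, `algorithmic`, `general`) and the `pick_model` helper are hypothetical, and the mapping just restates the notes in the table rather than any benchmark.

```python
# Hypothetical task categories mapped to model IDs from the table above.
MODEL_BY_TASK = {
    "simple": "openai/gpt-4o-mini",            # fast and cheap for simple tasks
    "algorithmic": "openai/o3-mini",           # hard algorithmic problems, slower
    "general": "anthropic/claude-sonnet-4.5",  # overall coding agent work
}


def pick_model(task_kind: str) -> str:
    """Return a model ID for the task kind, falling back to the general default."""
    return MODEL_BY_TASK.get(task_kind, MODEL_BY_TASK["general"])


print(pick_model("algorithmic"))  # openai/o3-mini
print(pick_model("refactor"))     # unknown kind, falls back to the general model
```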

Safety Considerations

  • Sandbox all code execution — Never run LLM-generated code outside an isolated environment.
  • Limit filesystem access — Restrict which directories the agent can read/write.
  • Set iteration limits — Cap the agent loop to prevent runaway costs (max 10–20 iterations).
  • Log all tool calls — Keep an audit trail of every action the agent takes.
  • Rate limit per user — Prevent users from triggering expensive long-running agents repeatedly.
  • Review tool definitions — Only give the agent tools it actually needs — principle of least privilege.
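
Some of these points can be enforced in code before a tool ever runs. The sketch below shows one way to confine file paths to a workspace directory and to write an audit record for each tool call. The `WORKSPACE` location, the `resolve_in_workspace` and `audited` helpers, and the logger name are all assumptions for illustration.

```python
import json
import logging
import tempfile
from pathlib import Path
from typing import Callable

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent.audit")  # hypothetical logger name

# Hypothetical workspace root; a temp directory keeps this example self-contained.
WORKSPACE = Path(tempfile.mkdtemp(prefix="agent-workspace-")).resolve()


def resolve_in_workspace(user_path: str) -> Path:
    """Resolve a model-supplied path and reject anything outside WORKSPACE."""
    candidate = (WORKSPACE / user_path).resolve()
    try:
        candidate.relative_to(WORKSPACE)
    except ValueError:
        raise PermissionError(f"Path escapes workspace: {user_path}")
    return candidate


def audited(execute: Callable[[str, dict], str]) -> Callable[[str, dict], str]:
    """Wrap a tool executor so every call and a short result preview are logged."""
    def wrapper(name: str, args: dict) -> str:
        result = execute(name, args)
        audit_log.info(json.dumps({"tool": name, "args": args, "result_preview": result[:200]}))
        return result
    return wrapper


print(resolve_in_workspace("src/app.py").name)
try:
    resolve_in_workspace("../../etc/passwd")
except PermissionError:
    print("blocked")
```

Resolving the path before the check means `..` segments, absolute paths, and symlinks that point outside the workspace are all rejected by the same rule.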
