LLM Function Calling: Build AI Agents That Actually Do Things

Function calling turns passive LLMs into active agents that can fetch data, call APIs, and trigger workflows — here's how to do it right.

Author

AICredits Team

Published

22 Mar 2026

Reading time

11 min read

From chatbot to agent

A plain LLM is a text-in, text-out system. It reasons well, but it cannot fetch your database row, check a live weather feed, or send an email. Function calling (also called "tool use") closes that gap. You describe a set of functions to the model, and when the model decides it needs one, it returns a structured JSON object naming the function and its arguments. Your code executes the real function, feeds the result back into the conversation, and the model continues. That loop — describe, invoke, observe, respond — is what turns a passive chatbot into an agent that actually does things.

This guide covers every layer of that loop: how to define tools, how to write descriptions that actually work, how to build a full agent in Python, how to handle errors, and how to avoid the security traps that catch developers off-guard.

The execution loop

Before writing any code, understand the control flow:

1. User message arrives: "What's the weather in Mumbai and convert 500 USD to INR?"
2. First LLM call: you send the message plus your tool definitions. The model does not call the function itself. It returns a tool_calls object naming the function(s) and the arguments it wants to pass.
3. Your code executes the real function(s) with those arguments.
4. You append the result(s) to the conversation as role: "tool" messages.
5. Second LLM call: the model sees the results and generates a natural-language answer.

Steps 3–5 can repeat. A multi-step agent loops until the model stops requesting tools and produces a final answer.
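
The control flow above can be sketched without any network calls. The following is a minimal illustration, assuming a hypothetical fake_model stub in place of the real LLM call and a hard-coded get_time tool; neither is part of any SDK, and the message shapes are simplified.

```python
import json

def get_time(city: str) -> dict:
    """Hypothetical tool used only for this illustration."""
    return {"city": city, "time": "12:00"}

TOOLS_BY_NAME = {"get_time": get_time}

def fake_model(messages: list) -> dict:
    """Stand-in for the LLM call: requests a tool once, then answers."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {"id": "call_1", "name": "get_time",
                 "arguments": json.dumps({"city": "Mumbai"})}
            ],
        }
    data = json.loads(tool_results[-1]["content"])
    return {"role": "assistant",
            "content": f"It is {data['time']} in {data['city']}.",
            "tool_calls": []}

def run_loop(user_message: str, max_turns: int = 5) -> list:
    messages = [{"role": "user", "content": user_message}]   # step 1
    for _ in range(max_turns):
        reply = fake_model(messages)                          # steps 2 and 5
        messages.append(reply)
        if not reply["tool_calls"]:
            break                                             # final answer reached
        for call in reply["tool_calls"]:
            fn = TOOLS_BY_NAME[call["name"]]
            result = fn(**json.loads(call["arguments"]))      # step 3
            messages.append({"role": "tool",                  # step 4
                             "tool_call_id": call["id"],
                             "content": json.dumps(result)})
    return messages

transcript = run_loop("What time is it in Mumbai?")
for message in transcript:
    print(message["role"], "->", message.get("content"))
```

The transcript makes the alternation visible: user, assistant (tool request), tool (result), assistant (final answer).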

Defining tools: the JSON schema format

Tools are described using a JSON Schema subset. Every tool has three fields:

name — a snake_case identifier the model will use in its tool_calls output
description — a natural-language explanation of what the tool does and when to use it
parameters — a JSON Schema object describing the arguments

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": (
                "Returns the current temperature, conditions, and humidity for a city. "
                "Use this when the user asks about weather, temperature, or climate in a specific location. "
                "Do NOT use for historical weather data."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'Mumbai' or 'New Delhi'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit. Default to celsius for Indian cities."
                    }
                },
                "required": ["city"]
            }
        }
    }
]

Why the description field matters more than anything else

The model has no runtime access to your code. All it knows about a tool is its name, its parameter schema, and what you wrote in description. The description carries most of the weight and largely determines:

Whether the model calls the tool at all — if the description is vague, the model may not recognise it applies to the user's intent.
Whether it calls the right tool — when you have multiple tools, the model chooses by comparing descriptions against the user's request.
Whether it passes sensible arguments — descriptions on individual parameters guide argument construction.

Bad description: "Gets weather"

Good description: "Returns the current temperature, conditions (sunny/cloudy/rainy), and humidity percentage for any city worldwide. Call this when the user asks what the weather is like, whether to bring an umbrella, or the current temperature in a location."

Rules of thumb:

State what the function returns, not just what it does.
Specify the conditions under which the model should (and should not) call it.
Include examples of triggering phrases in natural language.
For parameters, describe the expected format explicitly: "ISO 8601 date string, e.g. '2026-03-22'".
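
Some of these rules can be checked mechanically before a tool ever reaches the model. The sketch below is one possible lint pass, not an established tool: the function name lint_tool, the 12-word threshold, and the "mentions return" heuristic are all assumptions chosen for illustration.

```python
def lint_tool(tool: dict, min_words: int = 12) -> list:
    """Return a list of human-readable warnings for one tool definition."""
    fn = tool["function"]
    warnings = []
    description = fn.get("description", "")
    if len(description.split()) < min_words:
        warnings.append(f"{fn['name']}: description is shorter than {min_words} words")
    if "return" not in description.lower():
        warnings.append(f"{fn['name']}: description does not say what is returned")
    properties = fn.get("parameters", {}).get("properties", {})
    for param, schema in properties.items():
        if not schema.get("description"):
            warnings.append(f"{fn['name']}.{param}: parameter has no description")
    return warnings

# The "bad description" case from above, written out as a full definition.
vague_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Gets weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

for warning in lint_tool(vague_tool):
    print(warning)
```

A check like this will not tell you whether a description is good, only whether it is obviously thin; reading real model transcripts remains the better test.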

Single vs parallel function calls

When the model identifies that multiple tools can be called independently, with no result depending on another, models that support parallel tool calls can emit multiple tool_calls in a single response. For the weather+currency example above, a capable model will typically request both get_current_weather and convert_currency in the same turn.

Your code should handle this:

import json
 
def handle_tool_calls(tool_calls, available_functions):
    results = []
    for call in tool_calls:
        fn_name = call.function.name
        fn_args = json.loads(call.function.arguments)
        fn = available_functions.get(fn_name)
        if fn:
            result = fn(**fn_args)
        else:
            result = {"error": f"Unknown function: {fn_name}"}
        results.append({
            "tool_call_id": call.id,
            "role": "tool",
            "content": json.dumps(result)
        })
    return results

Always iterate over tool_calls — never assume there is only one.
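
Because parallel tool calls are independent by construction, they can also be executed concurrently rather than one after another. The following is a sketch under assumptions: the helper names are hypothetical, the tools are assumed to be thread-safe and I/O-bound, and SimpleNamespace objects stand in for the call objects the SDK returns.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace

def run_one(call, available_functions):
    """Execute a single tool call and wrap the result as a tool message."""
    fn = available_functions.get(call.function.name)
    if fn is None:
        result = {"error": f"Unknown function: {call.function.name}"}
    else:
        try:
            result = fn(**json.loads(call.function.arguments))
        except Exception as exc:  # keep one failure from sinking the whole batch
            result = {"error": str(exc)}
    return {"tool_call_id": call.id, "role": "tool", "content": json.dumps(result)}

def handle_tool_calls_concurrently(tool_calls, available_functions, max_workers=4):
    """Run independent tool calls in threads; results keep the request order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: run_one(c, available_functions), tool_calls))

# Hypothetical stand-ins for the tool call objects in a model response.
def make_call(call_id, name, args):
    return SimpleNamespace(
        id=call_id,
        function=SimpleNamespace(name=name, arguments=json.dumps(args)),
    )

demo_functions = {
    "get_current_weather": lambda city: {"city": city, "temperature": 31},
    "convert_currency": lambda amount, src, dst: {"amount": amount, "from": src, "to": dst},
}
demo_calls = [
    make_call("call_a", "get_current_weather", {"city": "Mumbai"}),
    make_call("call_b", "convert_currency", {"amount": 500, "src": "USD", "dst": "INR"}),
]
tool_messages = handle_tool_calls_concurrently(demo_calls, demo_functions)
```

pool.map returns results in input order, so each tool message still lines up with its tool_call_id regardless of which call finishes first.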

Building a real agent: weather, calculator, and web search

Here is a complete, runnable agent with three tools using AICredits' OpenAI-compatible endpoint.

import json
import math
import os
import httpx
from openai import OpenAI
 
client = OpenAI(
    api_key=os.environ["AICREDITS_API_KEY"],
    base_url="https://api.aicredits.in/v1"
)
 
# ── Tool implementations ────────────────────────────────────────────────────
 
def get_current_weather(city: str, unit: str = "celsius") -> dict:
    """Simulated weather API — replace with real HTTP call in production."""
    mock_data = {
        "mumbai":    {"temp": 31, "condition": "humid and partly cloudy", "humidity": 82},
        "delhi":     {"temp": 26, "condition": "hazy sunshine", "humidity": 55},
        "bangalore": {"temp": 24, "condition": "pleasant with light breeze", "humidity": 68},
    }
    key = city.lower().replace(" ", "")
    data = mock_data.get(key, {"temp": 28, "condition": "clear", "humidity": 60})
    temp = data["temp"]
    if unit == "fahrenheit":
        temp = round(temp * 9 / 5 + 32, 1)
    return {"city": city, "temperature": temp, "unit": unit,
            "condition": data["condition"], "humidity": data["humidity"]}
 
 
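def calculate(expression: str) -> dict:
    """
    Evaluate a mathematical expression restricted to an allowlist of syntax.
    Supports: + - * / % ** sqrt log log10 sin cos tan abs round pi e and numeric literals.
    """
    import ast  # imported here so this function is self-contained

    safe_names = {
        "sqrt": math.sqrt, "log": math.log, "log10": math.log10,
        "sin": math.sin, "cos": math.cos, "tan": math.tan,
        "pi": math.pi, "e": math.e, "abs": abs, "round": round,
    }
    # eval() with empty __builtins__ alone is not a sandbox: attribute access such as
    # ().__class__ can still escape it. Reject every AST node outside this allowlist.
    allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
               ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.Pow,
               ast.USub, ast.UAdd)
    try:
        tree = ast.parse(expression, mode="eval")
        for node in ast.walk(tree):
            if not isinstance(node, allowed):
                raise ValueError(f"Disallowed syntax: {type(node).__name__}")
            if isinstance(node, ast.Name) and node.id not in safe_names:
                raise ValueError(f"Unknown name: {node.id}")
            if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
                raise ValueError("Only numeric literals are allowed")
        result = eval(compile(tree, "<expression>", "eval"), {"__builtins__": {}}, safe_names)  # noqa: S307
        return {"expression": expression, "result": result}
    except Exception as exc:
        return {"error": str(exc), "expression": expression}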
 
 
def web_search(query: str, max_results: int = 3) -> dict:
    """Simulated web search — replace with SerpAPI or Brave Search in production."""
    # In a real implementation you would call an actual search API here.
    return {
        "query": query,
        "results": [
            {"title": f"Result {i+1} for: {query}", "snippet": f"Relevant snippet {i+1}..."}
            for i in range(max_results)
        ],
        "note": "This is a mock result. Integrate a real search API for production."
    }
 
 
AVAILABLE_FUNCTIONS = {
    "get_current_weather": get_current_weather,
    "calculate": calculate,
    "web_search": web_search,
}
 
# ── Tool definitions ────────────────────────────────────────────────────────
 
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": (
                "Returns the current temperature, weather conditions, and humidity for a city. "
                "Use when the user asks about current weather, temperature, or whether to carry an umbrella. "
                "Do not use for historical or forecasted weather."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. 'Mumbai'"},
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit. Use celsius unless the user specifies otherwise."
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": (
                "Evaluates a mathematical expression and returns the numeric result. "
                "Use for arithmetic, algebra, trigonometry, square roots, and logarithms. "
                "Expression must be a valid Python math expression, e.g. 'sqrt(144) + 2 ** 8'."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "A Python-evaluable math expression, e.g. '3.14 * 5 ** 2'"
                    }
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": (
                "Searches the web for up-to-date information and returns a list of result snippets. "
                "Use when the user asks about recent events, news, or facts you are not confident about. "
                "Do not use for calculations or weather — dedicated tools exist for those."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query string"},
                    "max_results": {
                        "type": "integer",
                        "description": "Number of results to return (1–5). Default is 3.",
                        "default": 3
                    }
                },
                "required": ["query"]
            }
        }
    }
]
 
# ── Agent loop ──────────────────────────────────────────────────────────────
 
def run_agent(user_message: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
 
    for iteration in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=TOOLS,
            tool_choice="auto",
        )
 
        message = response.choices[0].message
        messages.append(message)  # always append the assistant turn
 
        # No tool calls — model produced a final answer
        if not message.tool_calls:
            return message.content
 
        # Execute every requested tool call
        print(f"[iteration {iteration + 1}] model requested {len(message.tool_calls)} tool(s)")
        for call in message.tool_calls:
            fn_name = call.function.name
            fn_args = json.loads(call.function.arguments)
            print(f"  -> {fn_name}({fn_args})")
 
            fn = AVAILABLE_FUNCTIONS.get(fn_name)
            if fn:
                result = fn(**fn_args)
            else:
                result = {"error": f"Function '{fn_name}' is not registered."}
 
            messages.append({
                "tool_call_id": call.id,
                "role": "tool",
                "content": json.dumps(result)
            })
 
    return "Agent reached maximum iterations without a final answer."
 
 
if __name__ == "__main__":
    answer = run_agent(
        "What's the weather in Mumbai today? Also, what is sqrt(1764) + log10(1000000)?"
    )
    print("\nFinal answer:\n", answer)

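Run it and you will typically see the model issue both tool calls in a single turn (parallel function calling), receive both results, and compose a single coherent answer. If the model requests them one at a time instead, the loop handles that too: it simply runs another iteration.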

Handling errors gracefully

Tool functions will sometimes fail: the city name is ambiguous, the API is down, the expression is invalid. Always return a structured error response and feed it back to the model:

def safe_call(fn, fn_args: dict) -> dict:
    try:
        return fn(**fn_args)
    except TypeError as exc:
        return {"error": f"Invalid arguments: {exc}"}
    except httpx.HTTPStatusError as exc:
        return {"error": f"Upstream API error {exc.response.status_code}: {exc.response.text}"}
    except Exception as exc:
        return {"error": f"Unexpected error: {exc}"}

Feed the error back as a tool message. The model will usually try a corrected call, ask the user for clarification, or explain that it cannot complete the task — all better than crashing your application.
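
One failure happens before the function is even called: the model's arguments string may not be valid JSON, or may not be a JSON object. A minimal sketch of guarding that step follows; parse_arguments, execute, and double are hypothetical names for illustration.

```python
import json

def parse_arguments(raw: str):
    """Return (args, error). Exactly one of the two is None."""
    try:
        args = json.loads(raw or "{}")
    except json.JSONDecodeError as exc:
        return None, {"error": f"Arguments were not valid JSON: {exc.msg}"}
    if not isinstance(args, dict):
        return None, {"error": "Arguments must be a JSON object."}
    return args, None

def execute(fn, raw_arguments: str) -> dict:
    """Parse the raw arguments string, then call fn; never raise to the caller."""
    args, error = parse_arguments(raw_arguments)
    if error:
        return error
    try:
        return fn(**args)
    except TypeError as exc:  # wrong or missing keyword arguments
        return {"error": f"Invalid arguments: {exc}"}

def double(x: float) -> dict:
    """Toy tool used only for this illustration."""
    return {"result": x * 2}

print(execute(double, '{"x": 21}'))   # well-formed call
print(execute(double, '{"x": 21'))    # truncated JSON
print(execute(double, '[1, 2]'))      # valid JSON, wrong shape
```

In every case the caller gets back a dict that can be serialised into a tool message, so the conversation continues instead of the process crashing.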

Tool choice: auto, required, and none

The tool_choice parameter controls when the model uses tools:

| Value | Behaviour | When to use |
|---|---|---|
| "auto" | Model decides whether to call a tool | Default for most agents |
| "none" | Model never calls tools | Force a plain text response (e.g. final summarisation step) |
| "required" | Model must call at least one tool | Guarantee structured output extraction |
| {"type": "function", "function": {"name": "..."}} | Force a specific tool | Data extraction pipelines where you always need a particular schema |

For structured extraction workflows — parsing a resume into a JSON object, extracting entities from a document — using tool_choice="required" with a single schema-shaped tool is cleaner than prompt-engineering JSON output.
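
As an illustration of the extraction pattern, the sketch below builds the request for a resume parser and shows where the structured output ends up. The tool name record_resume, its fields, and the sample arguments string are invented for this example; the request is only constructed, not sent.

```python
import json

RESUME_TOOL = {
    "type": "function",
    "function": {
        "name": "record_resume",  # hypothetical schema-shaped tool
        "description": "Records the structured fields extracted from a resume.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Candidate's full name"},
                "years_experience": {"type": "integer",
                                     "description": "Total years of work experience"},
                "skills": {"type": "array", "items": {"type": "string"},
                           "description": "Skills listed in the resume"},
            },
            "required": ["name", "skills"],
        },
    },
}

def build_extraction_request(resume_text: str, model: str = "gpt-4o-mini") -> dict:
    """Keyword arguments intended for client.chat.completions.create(**kwargs)."""
    return {
        "model": model,
        "messages": [{"role": "user",
                      "content": f"Extract the fields from this resume:\n{resume_text}"}],
        "tools": [RESUME_TOOL],
        # Naming the tool forces this exact schema; with a single tool defined,
        # tool_choice="required" has the same effect.
        "tool_choice": {"type": "function", "function": {"name": "record_resume"}},
    }

def parse_extraction(arguments_json: str) -> dict:
    """The structured output is simply the forced call's arguments."""
    return json.loads(arguments_json)

request = build_extraction_request("Priya Sharma, 6 years, Python and SQL")
# A real response carries the result in message.tool_calls[0].function.arguments;
# the string below is a made-up sample of that field.
sample_arguments = '{"name": "Priya Sharma", "years_experience": 6, "skills": ["Python", "SQL"]}'
record = parse_extraction(sample_arguments)
print(record["name"], record["skills"])
```

The tool never needs a real implementation here: the call's arguments are the result, and the schema does the work that a long "respond only in JSON" prompt would otherwise attempt.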

Security: never trust the model's arguments blindly

The model constructs function arguments from natural language. A malicious user, or malicious text inside a document or tool result that the model reads, can steer the model into emitting dangerous arguments. This is known as prompt injection.

Rules:

Validate every argument against an allowlist or schema before executing.
Never pass model-supplied arguments directly to SQL, shell commands, or filesystem operations.
Apply least-privilege: the function the model calls should only be able to do what it says it can do.
Gate destructive functions: writes, sends, and deletes should require a confirmation step, and should be rate-limited as well.

import re
 
def safe_get_weather(city: str, unit: str = "celsius") -> dict:
    # Validate city: ASCII letters, spaces, and hyphens only (\s would also admit tabs and newlines)
    if not re.fullmatch(r"[A-Za-z \-]{1,100}", city):
        return {"error": "Invalid city name."}
    if unit not in ("celsius", "fahrenheit"):
        unit = "celsius"
    return get_current_weather(city, unit)

Think of model-supplied arguments the same way you think of user-supplied HTTP request parameters — untrusted by default.
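
Per-function checks like the one above can be complemented by a generic check against the schema you already declared for the tool. The following is a deliberately small sketch that covers only required keys, unexpected keys, primitive types, and enum; validate_arguments is a hypothetical helper, and a full JSON Schema validator library would cover far more of the specification.

```python
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate_arguments(args: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the arguments passed."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument '{name}'")
    for name, value in args.items():
        if name not in properties:
            problems.append(f"unexpected argument '{name}'")
            continue
        spec = properties[name]
        expected = JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for non-boolean types
        if expected and (not isinstance(value, expected)
                         or (isinstance(value, bool) and spec["type"] != "boolean")):
            problems.append(f"'{name}' should be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"'{name}' must be one of {spec['enum']}")
    return problems

weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

print(validate_arguments({"city": "Mumbai", "unit": "kelvin"}, weather_schema))
print(validate_arguments({"unit": "celsius", "cmd": "rm -rf /"}, weather_schema))
```

Returning the problem list to the model as a tool error gives it a chance to correct the call, while the real function is never reached with arguments outside the declared contract.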

Streaming with function calls

When you enable streaming (stream=True), the model sends tool_calls as delta chunks. Each chunk may contain a partial function name, a partial JSON argument string, or both. You need to accumulate them:

def stream_agent_turn(messages, tools):
    tool_call_accum = {}  # index -> {id, name, arguments}

    # stream=True makes create() return an iterator of chunks
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:  # some chunks (e.g. a usage report) carry no choices
            continue
        delta = chunk.choices[0].delta

        if delta.content:
            print(delta.content, end="", flush=True)

        for tc_delta in delta.tool_calls or []:
            entry = tool_call_accum.setdefault(
                tc_delta.index, {"id": None, "name": "", "arguments": ""}
            )
            if tc_delta.id:  # the id usually arrives only on the first fragment
                entry["id"] = tc_delta.id
            if tc_delta.function and tc_delta.function.name:
                entry["name"] += tc_delta.function.name
            if tc_delta.function and tc_delta.function.arguments:
                entry["arguments"] += tc_delta.function.arguments

    return tool_call_accum  # parse and execute after stream ends

The function arguments JSON will be complete once the stream finishes. Parse it then, not during streaming.
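
The post-stream parsing step might look like the sketch below. finalize_tool_calls is a hypothetical helper, and the fragments are made-up data in the id/name/arguments shape that stream_agent_turn accumulates.

```python
import json

def finalize_tool_calls(tool_call_accum: dict) -> list:
    """Turn accumulated fragments into parsed calls, ordered by index."""
    calls = []
    for idx in sorted(tool_call_accum):
        entry = tool_call_accum[idx]
        try:
            args = json.loads(entry["arguments"] or "{}")
            error = None
        except json.JSONDecodeError as exc:
            # A cut-off stream leaves a partial JSON string behind.
            args, error = None, f"Incomplete or invalid JSON: {exc.msg}"
        calls.append({"id": entry["id"], "name": entry["name"],
                      "arguments": args, "error": error})
    return calls

# Simulated delta fragments: (index, id, name part, arguments part).
fragments = [
    (0, "call_1", "get_current_weather", ""),
    (0, None, None, '{"city": '),
    (0, None, None, '"Mumbai"}'),
]
accum = {}
for idx, call_id, name, arg_part in fragments:
    entry = accum.setdefault(idx, {"id": None, "name": "", "arguments": ""})
    if call_id:
        entry["id"] = call_id
    if name:
        entry["name"] += name
    entry["arguments"] += arg_part

print(finalize_tool_calls(accum))
```

Note that '{"city": ' on its own is not parseable, which is why parsing mid-stream fails; only the concatenation of all fragments is valid JSON.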

Cost implications: keep your schemas lean

Tool definitions are sent with every request and billed as input tokens. For a typical 3-tool schema, that is roughly 300–600 tokens per request, paid whether the model uses the tools or not. For a high-volume API endpoint making 100 000 calls per day, that comes to 30–60 million input tokens per day spent on tool definitions alone.

Practical advice:

Remove tools the user's task cannot possibly need. If the request is clearly a calculation, do not send the web_search tool definition.
Keep descriptions precise but not verbose. Two sentences per tool is usually enough.
Use enum for constrained string parameters — it removes the need for a long description of valid values.
For agents with many tools (10+), consider a "tool router" first call that identifies which subset of tools to load for the actual task.
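
Trimming the tool list per request does not have to involve a second model call. The sketch below swaps the LLM-based router for a plain keyword match, which is cruder but free; the routing table and the select_tools helper are assumptions made for illustration, and the stub definitions omit real descriptions and schemas.

```python
KEYWORDS_BY_TOOL = {  # hypothetical routing table, tuned per application
    "get_current_weather": ("weather", "temperature", "umbrella", "rain"),
    "calculate": ("calculate", "sqrt", "+", "*", "percent"),
    "web_search": ("news", "latest", "search", "who is"),
}

def select_tools(user_message: str, tools: list) -> list:
    """Return only the tool definitions whose keywords appear in the message."""
    text = user_message.lower()
    selected = [
        tool for tool in tools
        if any(k in text for k in KEYWORDS_BY_TOOL.get(tool["function"]["name"], ()))
    ]
    # Fall back to the full set when nothing matches, so the agent is never toolless.
    return selected or tools

def stub(name: str) -> dict:
    """Minimal placeholder definition; real ones carry descriptions and schemas."""
    return {"type": "function",
            "function": {"name": name, "description": "", "parameters": {}}}

all_tools = [stub("get_current_weather"), stub("calculate"), stub("web_search")]
chosen = select_tools("Do I need an umbrella in Mumbai?", all_tools)
print([t["function"]["name"] for t in chosen])
```

Keyword matching will misroute some requests, so the fallback to the full tool set matters; an LLM-based router trades an extra call for better recall.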

Real-world use cases

CRM lookup agent: define lookup_contact(email), update_contact(id, fields), list_open_deals(contact_id). The model can answer "what is the status of Priya's deal?" by chaining lookup_contact and list_open_deals; update_contact comes into play only when the user asks for a change.

Database query agent: define run_sql(query), executed under a read-only database role and restricted to an allowlist of tables. Users ask in English; the model generates the SQL, your code checks and executes it, and the model interprets the result in plain language.

Email workflow agent: define search_inbox(query), read_email(id), send_email(to, subject, body). Wrap send_email with a human-in-the-loop confirmation step before actually dispatching.

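Data enrichment pipeline: define enrich_company(domain) that calls a data enrichment API. Use tool_choice="required" so every response is a structured tool call. For a list of 50 domains, loop in your own code and send one request per domain (or a small batch) rather than relying on the model to emit exactly 50 calls in a single turn; you then get structured output for each.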
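
The human-in-the-loop step mentioned for the email workflow agent can be added as a wrapper around the tool function, so the agent loop itself stays unchanged. This is one possible shape, with hypothetical names throughout; the send_email body just records the message instead of talking to a mail server.

```python
from typing import Callable

def require_confirmation(fn: Callable[..., dict],
                         confirm: Callable[[str, dict], bool]) -> Callable[..., dict]:
    """Wrap a tool so it only runs after confirm(name, kwargs) returns True."""
    def guarded(**kwargs) -> dict:
        if not confirm(fn.__name__, kwargs):
            return {"status": "cancelled", "reason": "User declined the action."}
        return fn(**kwargs)
    guarded.__name__ = fn.__name__
    return guarded

sent = []  # stands in for a real mail server

def send_email(to: str, subject: str, body: str) -> dict:
    """Hypothetical sender used only for this illustration."""
    sent.append({"to": to, "subject": subject, "body": body})
    return {"status": "sent", "to": to}

def console_confirm(name: str, kwargs: dict) -> bool:
    """Ask on the terminal; a web app would show a dialog instead."""
    return input(f"Run {name} with {kwargs}? [y/N] ").strip().lower() == "y"

# In a real agent you would register require_confirmation(send_email, console_confirm).
# The lambdas below simulate the user's answer so the example runs unattended.
declined = require_confirmation(send_email, lambda name, kwargs: False)(
    to="priya@example.com", subject="Hi", body="Hello")
approved = require_confirmation(send_email, lambda name, kwargs: True)(
    to="priya@example.com", subject="Hi", body="Hello")
print(declined, approved)
```

A declined action is returned to the model as an ordinary tool result, so it can tell the user the email was not sent rather than assuming success.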

Summary

Function calling is the bridge between a model that reasons and a system that acts. Get the fundamentals right — precise descriptions, proper argument validation, robust error handling, and a clean execution loop — and you can build agents that are reliable enough to put in front of real users. The pattern scales from a single weather lookup to multi-step workflows that coordinate a dozen APIs.

To start building with any model that supports tool use (GPT-4o, Claude, Gemini, and others) via a single INR-billed endpoint with no foreign card required, sign up at aicredits.in and set base_url to https://api.aicredits.in/v1, as in the examples above.
