
LLM Function Calling: Build AI Agents That Actually Do Things
Function calling turns passive LLMs into active agents that can fetch data, call APIs, and trigger workflows — here's how to do it right.
Author: AICredits Team · Published: 22 Mar 2026 · Reading time: 11 min read
From chatbot to agent
A plain LLM is a text-in, text-out system. It reasons well, but it cannot fetch your database row, check a live weather feed, or send an email. Function calling (also called "tool use") closes that gap. You describe a set of functions to the model, and when the model decides it needs one, it returns a structured JSON object naming the function and its arguments. Your code executes the real function, feeds the result back into the conversation, and the model continues. That loop — describe, invoke, observe, respond — is what turns a passive chatbot into an agent that actually does things.
This guide covers every layer of that loop: how to define tools, how to write descriptions that actually work, how to build a full agent in Python, how to handle errors, and how to avoid the security traps that catch developers off-guard.
The execution loop
Before writing any code, understand the control flow:
1. User message arrives: "What's the weather in Mumbai and convert 500 USD to INR?"
2. First LLM call: you send the message plus your tool definitions. The model does not call the function itself. It returns a tool_calls object naming the function(s) and the arguments it wants to pass.
3. Your code executes the real function(s) with those arguments.
4. You append the result(s) to the conversation as role: "tool" messages.
5. Second LLM call: the model sees the results and generates a natural-language answer.
Steps 3–5 can repeat. A multi-step agent loops until the model stops requesting tools and produces a final answer.
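To make steps 1 to 4 concrete, the sketch below builds the message list you would send on the second call, in the OpenAI-style chat format this guide uses. The helper name tool_round_trip and the id "call_demo_1" are made up for illustration; in a real loop you copy the id the model returned. Note that the arguments travel as a JSON string, not a dict.

```python
import json


def tool_round_trip(user_text, call_id, fn_name, fn_args, fn_result):
    """Build the history after one describe-invoke-observe cycle (hypothetical helper)."""
    return [
        # Step 1: the user's request
        {"role": "user", "content": user_text},
        # Step 2: the assistant turn that requests a tool instead of answering
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": call_id,
                "type": "function",
                # Arguments are a JSON-encoded string
                "function": {"name": fn_name, "arguments": json.dumps(fn_args)},
            }],
        },
        # Step 4: the result of step 3, linked back to the request via tool_call_id
        {"role": "tool", "tool_call_id": call_id, "content": json.dumps(fn_result)},
    ]


history = tool_round_trip(
    "What's the weather in Mumbai?",
    "call_demo_1",
    "get_current_weather",
    {"city": "Mumbai"},
    {"city": "Mumbai", "temperature": 31, "unit": "celsius"},
)
for m in history:
    print(m["role"])  # user, assistant, tool
```

The tool_call_id link is what lets the model match each result to the call that produced it, which matters once a single turn contains several calls.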
Defining tools: the JSON schema format
Tools are described using a subset of JSON Schema. In the OpenAI-compatible format used throughout this guide, each entry in the tools list has "type": "function" plus a function object with three fields:
- name: a snake_case identifier the model will use in its tool_calls output
- description: a natural-language explanation of what the tool does and when to use it
- parameters: a JSON Schema object describing the arguments
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": (
                "Returns the current temperature, conditions, and humidity for a city. "
                "Use this when the user asks about weather, temperature, or climate in a specific location. "
                "Do NOT use for historical weather data."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'Mumbai' or 'New Delhi'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit. Default to celsius for Indian cities."
                    }
                },
                "required": ["city"]
            }
        }
    }
]
```

Why the description field matters more than anything else
The model has no runtime access to your code. All it knows about a tool is what you wrote in description. This single field determines:
- Whether the model calls the tool at all — if the description is vague, the model may not recognise it applies to the user's intent.
- Whether it calls the right tool — when you have multiple tools, the model chooses by comparing descriptions against the user's request.
- Whether it passes sensible arguments — descriptions on individual parameters guide argument construction.
Bad description: "Gets weather"
Good description: "Returns the current temperature, conditions (sunny/cloudy/rainy), and humidity percentage for any city worldwide. Call this when the user asks what the weather is like, whether to bring an umbrella, or the current temperature in a location."
Rules of thumb:
- State what the function returns, not just what it does.
- Specify the conditions under which the model should (and should not) call it.
- Include examples of triggering phrases in natural language.
- For parameters, describe the expected format explicitly: "ISO 8601 date string, e.g. '2026-03-22'".
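Applied to a single tool, those rules might look like the definition below. This is a sketch for the convert_currency tool mentioned in the parallel-calls discussion; the guide does not define it, so its name, parameters, and wording are hypothetical. The small undocumented_params helper (also hypothetical) checks the last rule mechanically.

```python
# Hypothetical tool definition; not a real API, just the rules of thumb applied.
convert_currency_tool = {
    "type": "function",
    "function": {
        "name": "convert_currency",
        "description": (
            # What it returns
            "Returns the converted amount and the exchange rate used. "
            # When to call it, with a triggering phrase, and when not to
            "Call this when the user asks to convert money between currencies, "
            "e.g. 'how much is 500 USD in INR?'. "
            "Do not use for stock prices or cryptocurrency."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "amount": {"type": "number", "description": "Amount to convert, e.g. 500"},
                "from_currency": {
                    "type": "string",
                    "description": "ISO 4217 code of the source currency, e.g. 'USD'",
                },
                "to_currency": {
                    "type": "string",
                    "description": "ISO 4217 code of the target currency, e.g. 'INR'",
                },
                # Explicit format for a date parameter
                "date": {
                    "type": "string",
                    "description": "ISO 8601 date for a historical rate, e.g. '2026-03-22'. Omit for today's rate.",
                },
            },
            "required": ["amount", "from_currency", "to_currency"],
        },
    },
}


def undocumented_params(tool):
    """Return the names of parameters that have no description."""
    props = tool["function"]["parameters"]["properties"]
    return [name for name, spec in props.items() if not spec.get("description")]


print(undocumented_params(convert_currency_tool))  # []
```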
Single vs parallel function calls
When the model identifies that multiple tools can be called independently, without one result depending on another, many current models will emit multiple tool_calls in a single response. For the weather-and-currency example above, a capable model will typically request both get_current_weather and a convert_currency tool in the same turn.
Your code should handle this:
```python
import json


def handle_tool_calls(tool_calls, available_functions):
    results = []
    for call in tool_calls:
        fn_name = call.function.name
        fn_args = json.loads(call.function.arguments)
        fn = available_functions.get(fn_name)
        if fn:
            result = fn(**fn_args)
        else:
            result = {"error": f"Unknown function: {fn_name}"}
        results.append({
            "tool_call_id": call.id,
            "role": "tool",
            "content": json.dumps(result)
        })
    return results
```

Always iterate over tool_calls; never assume there is only one.
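Because the calls in a parallel batch are independent, your code can also execute them concurrently instead of one after another. This is an optional variation, not something the API requires. The sketch assumes your tools are I/O-bound (HTTP calls, database lookups), which is where threads help; run_one and handle_tool_calls_concurrently are hypothetical names, and SimpleNamespace objects stand in for the SDK's tool call objects in the demo.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def run_one(call, available_functions):
    """Execute one tool call and wrap the outcome as a role: "tool" message."""
    fn = available_functions.get(call.function.name)
    try:
        if fn is None:
            result = {"error": f"Unknown function: {call.function.name}"}
        else:
            result = fn(**json.loads(call.function.arguments))
    except Exception as exc:  # one failing tool should not sink the whole batch
        result = {"error": str(exc)}
    return {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}


def handle_tool_calls_concurrently(tool_calls, available_functions, max_workers=4):
    # executor.map yields results in input order, so the tool messages
    # line up with the order of tool_calls
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda call: run_one(call, available_functions), tool_calls))


# Demo: stand-ins for SDK tool call objects (only .id and .function are used)
def fake_call(call_id, name, args):
    return SimpleNamespace(id=call_id,
                           function=SimpleNamespace(name=name, arguments=json.dumps(args)))


calls = [fake_call("call_a", "add", {"a": 2, "b": 3}), fake_call("call_b", "missing", {})]
msgs = handle_tool_calls_concurrently(calls, {"add": lambda a, b: {"sum": a + b}})
print([m["tool_call_id"] for m in msgs])  # ['call_a', 'call_b']
print(msgs[0]["content"])                 # {"sum": 5}
```

For CPU-heavy tools, threads add little in CPython; the sequential loop is simpler and just as fast there.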
Building a real agent: weather, calculator, and web search
Here is a complete, runnable agent with three tools using AICredits' OpenAI-compatible endpoint.
```python
import json
import math
import os

import httpx
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AICREDITS_API_KEY"],
    base_url="https://api.aicredits.in/v1"
)

# ── Tool implementations ────────────────────────────────────────────────────

def get_current_weather(city: str, unit: str = "celsius") -> dict:
    """Simulated weather API; replace with a real HTTP call in production."""
    mock_data = {
        "mumbai": {"temp": 31, "condition": "humid and partly cloudy", "humidity": 82},
        "delhi": {"temp": 26, "condition": "hazy sunshine", "humidity": 55},
        "bangalore": {"temp": 24, "condition": "pleasant with light breeze", "humidity": 68},
    }
    key = city.lower().replace(" ", "")
    data = mock_data.get(key, {"temp": 28, "condition": "clear", "humidity": 60})
    temp = data["temp"]
    if unit == "fahrenheit":
        temp = round(temp * 9 / 5 + 32, 1)
    return {"city": city, "temperature": temp, "unit": unit,
            "condition": data["condition"], "humidity": data["humidity"]}


def calculate(expression: str) -> dict:
    """
    Evaluate a mathematical expression in a restricted namespace.
    Supports: + - * / ** sqrt log log10 sin cos tan pi e abs round and numeric literals.
    Note: eval() with empty __builtins__ is not a true sandbox; prefer an
    AST-based evaluator before exposing this to untrusted input.
    """
    safe_names = {
        "sqrt": math.sqrt, "log": math.log, "log10": math.log10,
        "sin": math.sin, "cos": math.cos, "tan": math.tan,
        "pi": math.pi, "e": math.e, "abs": abs, "round": round,
    }
    try:
        result = eval(expression, {"__builtins__": {}}, safe_names)  # noqa: S307
        return {"expression": expression, "result": result}
    except Exception as exc:
        return {"error": str(exc), "expression": expression}


def web_search(query: str, max_results: int = 3) -> dict:
    """Simulated web search; replace with SerpAPI or Brave Search in production."""
    # In a real implementation you would call an actual search API here.
    return {
        "query": query,
        "results": [
            {"title": f"Result {i+1} for: {query}", "snippet": f"Relevant snippet {i+1}..."}
            for i in range(max_results)
        ],
        "note": "This is a mock result. Integrate a real search API for production."
    }


AVAILABLE_FUNCTIONS = {
    "get_current_weather": get_current_weather,
    "calculate": calculate,
    "web_search": web_search,
}

# ── Tool definitions ────────────────────────────────────────────────────────

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": (
                "Returns the current temperature, weather conditions, and humidity for a city. "
                "Use when the user asks about current weather, temperature, or whether to carry an umbrella. "
                "Do not use for historical or forecasted weather."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. 'Mumbai'"},
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit. Use celsius unless the user specifies otherwise."
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": (
                "Evaluates a mathematical expression and returns the numeric result. "
                "Use for arithmetic, algebra, trigonometry, square roots, and logarithms. "
                "Expression must be a valid Python math expression, e.g. 'sqrt(144) + 2 ** 8'."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "A Python-evaluable math expression, e.g. '3.14 * 5 ** 2'"
                    }
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": (
                "Searches the web for up-to-date information and returns a list of result snippets. "
                "Use when the user asks about recent events, news, or facts you are not confident about. "
                "Do not use for calculations or weather; dedicated tools exist for those."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query string"},
                    "max_results": {
                        "type": "integer",
                        "description": "Number of results to return (1–5). Default is 3.",
                        "default": 3
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# ── Agent loop ──────────────────────────────────────────────────────────────

def run_agent(user_message: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]

    for iteration in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=TOOLS,
            tool_choice="auto",
        )
        message = response.choices[0].message
        # Always append the assistant turn: tool results must follow the message
        # that requested them. Serialise to a plain dict, dropping null fields,
        # so the history stays portable across OpenAI-compatible backends.
        messages.append(message.model_dump(exclude_none=True))

        # No tool calls: the model produced a final answer
        if not message.tool_calls:
            return message.content

        # Execute every requested tool call
        print(f"[iteration {iteration + 1}] model requested {len(message.tool_calls)} tool(s)")
        for call in message.tool_calls:
            fn_name = call.function.name
            fn_args = json.loads(call.function.arguments)
            print(f"  -> {fn_name}({fn_args})")

            fn = AVAILABLE_FUNCTIONS.get(fn_name)
            if fn:
                result = fn(**fn_args)
            else:
                result = {"error": f"Function '{fn_name}' is not registered."}

            messages.append({
                "tool_call_id": call.id,
                "role": "tool",
                "content": json.dumps(result)
            })

    return "Agent reached maximum iterations without a final answer."


if __name__ == "__main__":
    answer = run_agent(
        "What's the weather in Mumbai today? Also, what is sqrt(1764) + log10(1000000)?"
    )
    print("\nFinal answer:\n", answer)
```

Run it and you should typically see the model issue both tool calls in the same turn (parallel function calling), receive both results, and compose a single coherent answer. Some models split the calls across two turns instead; the loop handles either case.
Handling errors gracefully
The function will sometimes fail — the city name is ambiguous, the API is down, the expression is invalid. Always return a structured error response and feed it back to the model:
```python
import httpx


def safe_call(fn, fn_args: dict) -> dict:
    """Run a tool and convert any failure into an error payload for the model."""
    try:
        return fn(**fn_args)
    except TypeError as exc:
        # Usually a missing or unexpected argument name supplied by the model
        return {"error": f"Invalid arguments: {exc}"}
    except httpx.HTTPStatusError as exc:
        return {"error": f"Upstream API error {exc.response.status_code}: {exc.response.text}"}
    except Exception as exc:
        return {"error": f"Unexpected error: {exc}"}
```

Feed the error back as a tool message. The model will often try a corrected call, ask the user for clarification, or explain that it cannot complete the task. Any of these is better than crashing your application.
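safe_call covers failures inside the tool, but json.loads(call.function.arguments) in the agent loop runs before it and can also fail if the model emits malformed or truncated JSON. A small sketch, assuming you want that case handled the same way (returned to the model as an error instead of raised); parse_tool_arguments is a hypothetical helper name.

```python
import json


def parse_tool_arguments(raw_arguments):
    """Return (args, error); exactly one of the two is None."""
    try:
        # Treat a missing or empty argument string as "no arguments"
        args = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return None, {"error": f"Arguments were not valid JSON: {exc.msg} at position {exc.pos}"}
    if not isinstance(args, dict):
        # Valid JSON, but not the object that fn(**args) needs
        return None, {"error": "Arguments must be a JSON object."}
    return args, None


args, err = parse_tool_arguments('{"city": "Mumbai"')  # truncated JSON
print(err is not None)                                # True
print(parse_tool_arguments('{"city": "Mumbai"}'))     # ({'city': 'Mumbai'}, None)
```

In the loop, append the error payload as the tool message when err is set, and only call safe_call(fn, args) otherwise.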
Tool choice: auto, required, and none
The tool_choice parameter controls when the model uses tools:
| Value | Behaviour | When to use |
|---|---|---|
| "auto" | Model decides whether to call a tool | Default for most agents |
| "none" | Model never calls tools | Force a plain text response (e.g. final summarisation step) |
| "required" | Model must call at least one tool | Guarantee structured output extraction |
| {"type": "function", "function": {"name": "..."}} | Force a specific tool | Data extraction pipelines where you always need a particular schema |
For structured extraction workflows — parsing a resume into a JSON object, extracting entities from a document — using tool_choice="required" with a single schema-shaped tool is cleaner than prompt-engineering JSON output.
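A sketch of that extraction pattern, using the named-tool form of tool_choice from the table so the model has to answer through the schema. RESUME_TOOL, record_resume, and the field names are hypothetical. To run it for real, pass the request dict to client.chat.completions.create(**...) with the client from the agent example; the demo below uses a SimpleNamespace stand-in for the response message so it runs offline. The arguments are still model-generated, so validate them before trusting them.

```python
import json
from types import SimpleNamespace

# Hypothetical extraction schema; field names are illustrative.
RESUME_TOOL = {
    "type": "function",
    "function": {
        "name": "record_resume",
        "description": "Records the structured fields extracted from a resume.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Candidate's full name"},
                "email": {"type": "string", "description": "Contact email address"},
                "skills": {"type": "array", "items": {"type": "string"},
                           "description": "Technical skills, one per item"},
            },
            "required": ["name", "skills"],
        },
    },
}


def build_extraction_request(document_text, model="gpt-4o-mini"):
    """Keyword arguments for client.chat.completions.create(**kwargs)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Extract the resume fields by calling record_resume."},
            {"role": "user", "content": document_text},
        ],
        "tools": [RESUME_TOOL],
        # Force this specific tool, so the reply is a record_resume call
        "tool_choice": {"type": "function", "function": {"name": "record_resume"}},
    }


def parse_extraction(message):
    """Return the structured arguments from the forced tool call."""
    return json.loads(message.tool_calls[0].function.arguments)


# Offline demo with a stand-in for response.choices[0].message
fake_message = SimpleNamespace(tool_calls=[SimpleNamespace(function=SimpleNamespace(
    name="record_resume",
    arguments='{"name": "Priya Sharma", "skills": ["Python", "SQL"]}'))])
print(parse_extraction(fake_message)["skills"])  # ['Python', 'SQL']
```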
Security: never trust the model's arguments blindly
The model constructs function arguments from natural language. That means a malicious user, or malicious text the agent reads (a web page, an email, a search result), can steer the model into emitting dangerous arguments. This is known as prompt injection.
Rules:
- Validate every argument against an allowlist or schema before executing.
- Never pass model-supplied arguments directly to SQL, shell commands, or filesystem operations.
- Apply least-privilege: the function the model calls should only be able to do what it says it can do.
- Gate destructive functions: rate-limit writes, sends, and deletes, and require a confirmation step before they execute.
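One way to apply the first rule is to check arguments against the same parameters schema you already sent the model. The sketch below is a deliberately small, hand-rolled subset (required keys, unknown keys, basic types, enum); it is not a full JSON Schema validator, and in production you would more likely use a dedicated library such as jsonschema. validate_arguments is a hypothetical name, and schema checks complement domain checks like the city-name regex, not replace them.

```python
# JSON Schema type names mapped to Python types
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "array": list, "object": dict}


def validate_arguments(args, parameters_schema):
    """Return a list of problems; an empty list means the arguments passed."""
    problems = []
    props = parameters_schema.get("properties", {})
    for name in parameters_schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument '{name}'")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            # Reject anything the schema never offered
            problems.append(f"unexpected argument '{name}'")
            continue
        expected = JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for numeric types
        bool_as_number = isinstance(value, bool) and spec.get("type") in ("integer", "number")
        if expected and (not isinstance(value, expected) or bool_as_number):
            problems.append(f"argument '{name}' should be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"argument '{name}' must be one of {spec['enum']}")
    return problems


weather_params = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}
print(validate_arguments({"city": "Mumbai"}, weather_params))  # []
print(validate_arguments({"unit": "kelvin", "path": "/etc"}, weather_params))
```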
```python
import re


def safe_get_weather(city: str, unit: str = "celsius") -> dict:
    # Validate city: letters, spaces, and hyphens only, 1 to 100 characters
    if not re.fullmatch(r"[A-Za-z\s\-]{1,100}", city):
        return {"error": "Invalid city name."}
    # Fall back to a safe default instead of passing an unknown unit through
    if unit not in ("celsius", "fahrenheit"):
        unit = "celsius"
    return get_current_weather(city, unit)
```

Think of model-supplied arguments the same way you think of user-supplied HTTP request parameters: untrusted by default.
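The same principle applies to the calculate tool in the agent above. eval() with empty __builtins__ is a well-known weak sandbox, because expressions can still reach dangerous objects through attribute access such as ().__class__. A stricter sketch, assuming you only need arithmetic plus the functions listed in that tool's docstring, parses the expression with Python's ast module and evaluates only the node types it explicitly allows. The exponent and base caps are a rough guard against enormous results, not a complete resource limit.

```python
import ast
import math
import operator

# Whitelisted operators and names; anything else is rejected
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
FUNCTIONS = {"sqrt": math.sqrt, "log": math.log, "log10": math.log10,
             "sin": math.sin, "cos": math.cos, "tan": math.tan, "abs": abs, "round": round}
CONSTANTS = {"pi": math.pi, "e": math.e}


def _eval_node(node):
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.Name) and node.id in CONSTANTS:
        return CONSTANTS[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        # Rough guard so inputs like '9 ** 9 ** 9' cannot exhaust CPU or memory
        if isinstance(node.op, ast.Pow) and (abs(right) > 1000 or abs(left) > 1e100):
            raise ValueError("exponentiation too large")
        return BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_eval_node(node.operand))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCTIONS and not node.keywords):
        return FUNCTIONS[node.func.id](*[_eval_node(arg) for arg in node.args])
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def calculate(expression: str) -> dict:
    """Alternative to the eval()-based calculate tool, same return shape."""
    try:
        value = _eval_node(ast.parse(expression, mode="eval"))
        if isinstance(value, complex):  # e.g. (-8) ** 0.5; not JSON-serialisable
            raise ValueError("complex results are not supported")
        return {"expression": expression, "result": value}
    except (ValueError, SyntaxError, ZeroDivisionError, OverflowError, TypeError) as exc:
        return {"error": str(exc), "expression": expression}


print(calculate("sqrt(1764) + log10(1000000)")["result"])  # 48.0
print(calculate("().__class__"))  # returns an error payload instead of evaluating
```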
Streaming with function calls
When you enable streaming (stream=True), the model sends tool_calls as delta chunks. Each chunk may contain a partial function name, a partial JSON argument string, or both. You need to accumulate them:
```python
def stream_agent_turn(messages, tools):
    tool_call_accum = {}  # index -> {"id", "name", "arguments"} accumulated so far

    # create(..., stream=True) yields raw ChatCompletionChunk objects
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:  # some chunks (e.g. usage-only) carry no choices
            continue
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            for tc_delta in delta.tool_calls:
                idx = tc_delta.index
                entry = tool_call_accum.setdefault(idx, {"id": None, "name": "", "arguments": ""})
                if tc_delta.id:  # the id normally arrives only on the first fragment
                    entry["id"] = tc_delta.id
                if tc_delta.function and tc_delta.function.name:
                    entry["name"] += tc_delta.function.name
                if tc_delta.function and tc_delta.function.arguments:
                    entry["arguments"] += tc_delta.function.arguments

    return tool_call_accum  # parse and execute after the stream ends
```

The arguments JSON is only complete once the stream finishes, so parse it then, not while chunks are still arriving.
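After the stream ends, the accumulator still has to become the same shapes the non-streaming loop uses: an assistant message to append to messages, and parsed arguments to dispatch. A sketch, assuming the {index: {id, name, arguments}} layout that stream_agent_turn builds; finalize_tool_calls is a hypothetical name, and content is left as None unless you also collected streamed text.

```python
import json


def finalize_tool_calls(tool_call_accum, content=None):
    """Turn accumulated stream fragments into an assistant message plus parsed calls."""
    tool_calls, parsed = [], []
    for idx in sorted(tool_call_accum):  # restore the order the model emitted
        entry = tool_call_accum[idx]
        tool_calls.append({
            "id": entry["id"],
            "type": "function",
            "function": {"name": entry["name"], "arguments": entry["arguments"]},
        })
        # Only now is each argument string complete enough to parse
        parsed.append((entry["id"], entry["name"], json.loads(entry["arguments"] or "{}")))
    assistant_message = {"role": "assistant", "content": content, "tool_calls": tool_calls}
    return assistant_message, parsed


accum = {
    1: {"id": "call_b", "name": "calculate", "arguments": '{"expression": "2+2"}'},
    0: {"id": "call_a", "name": "get_current_weather", "arguments": '{"city": "Mumbai"}'},
}
assistant_message, parsed = finalize_tool_calls(accum)
print([name for _, name, _ in parsed])  # ['get_current_weather', 'calculate']
```

Append assistant_message first, then one role: "tool" message per parsed entry, exactly as in the non-streaming loop.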
Cost implications: keep your schemas lean
Tool definitions are sent with every request and billed as input tokens, whether or not the model ends up calling a tool. A typical 3-tool schema like the one in this guide runs to roughly 300 to 600 tokens per request. For a high-volume API endpoint making 100,000 calls per day, that adds up quickly.
Practical advice:
- Remove tools the user's task cannot possibly need. If the request is clearly a calculation, do not send the web_search tool definition.
- Keep descriptions precise but not verbose. Two sentences per tool is usually enough.
- Use enum for constrained string parameters; it removes the need for a long description of valid values.
- For agents with many tools (10+), consider a "tool router" first call that identifies which subset of tools to load for the actual task.
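To put numbers on the overhead: at 500 tokens of tool definitions per request (the middle of the range above) and 100,000 requests per day, the schemas alone add 500 × 100,000 = 50 million input tokens a day. A small sketch for estimating your own schemas follows. The 4-characters-per-token ratio is a rough rule of thumb, not how any provider actually counts tool schemas, so check response.usage.prompt_tokens on real requests for the billed figure.

```python
import json


def estimate_schema_tokens(tools, chars_per_token=4.0):
    """Rough token estimate for a tools list (heuristic, not a real tokenizer)."""
    return round(len(json.dumps(tools)) / chars_per_token)


def daily_schema_tokens(tokens_per_request, requests_per_day):
    # Definitions are resent on every request, so cost grows linearly with traffic
    return tokens_per_request * requests_per_day


example_tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Returns the current temperature and conditions for a city.",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]},
    },
}]
print(estimate_schema_tokens(example_tools))
print(daily_schema_tokens(500, 100_000))  # 50000000
```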
Real-world use cases
CRM lookup agent: define lookup_contact(email), update_contact(id, fields), list_open_deals(contact_id). The model can answer "what is the status of Priya's deal?" by chaining all three calls.
Database query agent: define run_sql(query) (read-only, allowlist of tables). Users ask in English; the model generates and executes the SQL, then interprets the result in plain language.
Email workflow agent: define search_inbox(query), read_email(id), send_email(to, subject, body). Wrap send_email with a human-in-the-loop confirmation step before actually dispatching.
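Data enrichment pipeline: define enrich_company(domain) that calls a data enrichment API. With a list of 50 domains, tool_choice="required" guarantees at least one tool call, and a capable model may emit one call per domain in parallel, giving you structured output for each. For large batches it is usually more predictable to loop over the domains in your own code and make one forced call per domain.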
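The human-in-the-loop step for send_email in the email workflow can be a thin wrapper around the real function. A sketch under stated assumptions: send_email here is a stub, and console_confirm stands in for whatever approval UI or queue your application actually has. Returning a cancelled result instead of raising lets the model tell the user what happened.

```python
import functools
from typing import Callable


def require_confirmation(fn: Callable[..., dict],
                         confirm: Callable[[str, dict], bool]) -> Callable[..., dict]:
    """Wrap a side-effecting tool so it only runs after explicit approval.

    confirm(name, args) returns True to proceed.
    """
    @functools.wraps(fn)
    def wrapper(**kwargs) -> dict:
        if not confirm(fn.__name__, kwargs):
            # Report the refusal to the model instead of raising
            return {"status": "cancelled", "reason": "The user declined this action."}
        return fn(**kwargs)
    return wrapper


def send_email(to: str, subject: str, body: str) -> dict:
    """Hypothetical stub; a real version would call your mail provider."""
    return {"status": "sent", "to": to, "subject": subject}


def console_confirm(name: str, args: dict) -> bool:
    answer = input(f"Allow {name} with {args}? [y/N] ")
    return answer.strip().lower() == "y"


# Register only the guarded version, so the model can never reach the raw function
AVAILABLE_FUNCTIONS = {"send_email": require_confirmation(send_email, console_confirm)}
```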
Summary
Function calling is the bridge between a model that reasons and a system that acts. Get the fundamentals right — precise descriptions, proper argument validation, robust error handling, and a clean execution loop — and you can build agents that are reliable enough to put in front of real users. The pattern scales from a single weather lookup to multi-step workflows that coordinate a dozen APIs.
To start building with any model that supports tool use (GPT-4o, Claude, Gemini, and others) through a single INR-billed endpoint with no foreign card required, sign up at aicredits.in. The examples above already use https://api.aicredits.in/v1 as their base_url.