The Prompting Cheat Sheet: 10 Patterns Every Developer Should Know

A practical reference for the prompting techniques that actually matter in production — system prompts, chain-of-thought, output schemas, few-shot examples, and more.

Author

AICredits Team

Published

10 Apr 2026

Reading time

9 min read

Why prompting still matters in 2026

Modern LLMs are capable out of the box, but the difference between a useful output and a frustrating one almost always comes down to how the prompt is structured. A well-written prompt consistently outperforms a vague one — even on the same model — and can let you use a cheaper model instead of a frontier one, saving significant cost at scale.

This cheat sheet covers the ten patterns you will reach for most often when building real applications.

Pattern 1: The role-context-task system prompt

Structure your system prompt in three parts: role (who the model is), context (what it knows about the situation), and task (what it should do).

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.aicredits.in/v1",
    api_key="sk-your-aicredits-key",
)
 
system_prompt = """
Role: You are a senior customer support agent for a SaaS product.
Context: The user has a paid plan and has been a customer for more than six months.
Task: Answer their question concisely. Always offer one follow-up action at the end.
"""
 
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Why was I charged twice this month?"},
    ],
)
print(response.choices[0].message.content)

This pattern dramatically reduces off-topic responses. The role anchors the model's persona, the context sets constraints, and the task focuses output.

Pattern 2: Explicit output format specification

Tell the model exactly how to format its output. If you need JSON, say so explicitly and state what not to include.

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": (
            "Extract the name, email, and issue from this message. "
            "Return valid JSON only. No markdown, no explanation. "
            'Format: {"name": string, "email": string, "issue": string}\n\n'
            "Message: Hi I'm Priya ([email protected]) and my invoice is wrong."
        ),
    }],
)

LLMs default to verbose prose. Specific format instructions — including what NOT to include — give far more consistent outputs.

Pattern 3: Chain-of-thought for reasoning tasks

For tasks requiring reasoning, add "Think step by step" and hide intermediate reasoning in XML tags:

import re
 
system = """Think through the problem step by step inside <thinking> tags.
Then give your final answer inside <answer> tags."""
 
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "A user bought 3 items at ₹120 each with a 15% discount. Total?"},
    ],
)
 
raw = response.choices[0].message.content
answer = re.search(r"<answer>(.*?)</answer>", raw, re.DOTALL)
print(answer.group(1).strip() if answer else raw)

Pattern 4: Few-shot examples

Include 2–3 input/output examples in your system prompt to demonstrate exact behaviour. Keep examples short and representative of the most common case.

messages = [
    {"role": "system", "content": "Classify support tickets as: billing, technical, or general. Return only the label."},
    {"role": "user",      "content": "My invoice shows the wrong amount"},
    {"role": "assistant", "content": "billing"},
    {"role": "user",      "content": "The API keeps returning 429 errors"},
    {"role": "assistant", "content": "technical"},
    {"role": "user",      "content": "How do I add a team member?"},  # new input
]

Pattern 5: Constrained fill-in template

Instead of asking the model to generate freely, give it a template to complete:

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": (
            'Complete this JSON. Fill values only, do not change keys:\n'
            '{"sentiment": ___, "confidence": ___, "reason": ___}\n\n'
            'Text: "Delivery was fast but packaging was damaged."'
        ),
    }],
)

Pattern 6: Negative constraints

LLMs respond well to "do not" instructions for style and format control:

Do not use bullet points.
Do not start with a greeting.
Do not repeat the question back to me.
Do not include a conclusion paragraph.

Combine with positive instructions for best results.

Pattern 7: Context injection with delimiters

Wrap external content in XML-style tags so the model distinguishes context from instructions:

system = "Answer the question using only the information inside <doc> tags."
user = f"<doc>{document_text}</doc>\n\nQuestion: {user_question}"

This prevents prompt injection and keeps the model from treating document content as instructions.

Pattern 8: Confidence and uncertainty signalling

Ask the model to signal uncertainty explicitly and make it structured:

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[{
        "role": "system",
        "content": 'Return JSON: {"answer": string|null, "confidence": "high"|"medium"|"low", "reason": string}. Set answer to null if unsure.',
    }, {
        "role": "user",
        "content": "What was the exact date of the AICredits Series A?",
    }],
)

Pattern 9: Two-step draft + self-critique

For high-stakes outputs, ask the model to critique and rewrite its own response:

draft_response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Write a refund policy for a SaaS product."}],
)
draft = draft_response.choices[0].message.content
 
final_response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[
        {"role": "user",      "content": "Write a refund policy for a SaaS product."},
        {"role": "assistant", "content": draft},
        {"role": "user",      "content": "Review your response. Fix any gaps or unclear terms. Rewrite it."},
    ],
)
print(final_response.choices[0].message.content)

Pattern 10: Token budget management

Explicitly constrain response length to reduce output token costs by 30–60%:

# Add to system prompt or user message:
# "Reply in 50 words or fewer."
# "Write no more than three sentences."
# "Give a one-line answer."
 
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    max_tokens=100,  # hard cap as a safety net
    messages=[
        {"role": "system", "content": "Reply in 2 sentences maximum."},
        {"role": "user",   "content": "What is an LLM gateway?"},
    ],
)

Short, direct answers cost less and users prefer them in conversational interfaces.

Using the Anthropic SDK with AICredits (Python & TypeScript)

7 min read

How to Get Structured JSON Output from Any LLM (Reliably)