
The Prompting Cheat Sheet: 10 Patterns Every Developer Should Know
A practical reference for the prompting techniques that actually matter in production — system prompts, chain-of-thought, output schemas, few-shot examples, and more.
Author
AICredits Team
Published
10 Apr 2026
Reading time
9 min read
Why prompting still matters in 2026
Modern LLMs are capable out of the box, but the difference between a useful output and a frustrating one almost always comes down to how the prompt is structured. A well-written prompt consistently outperforms a vague one — even on the same model — and can let you use a cheaper model instead of a frontier one, saving significant cost at scale.
This cheat sheet covers the ten patterns you will reach for most often when building real applications.
Pattern 1: The role-context-task system prompt
Structure your system prompt in three parts: role (who the model is), context (what it knows about the situation), and task (what it should do).
from openai import OpenAI
client = OpenAI(
base_url="https://api.aicredits.in/v1",
api_key="sk-your-aicredits-key",
)
system_prompt = """
Role: You are a senior customer support agent for a SaaS product.
Context: The user has a paid plan and has been a customer for more than six months.
Task: Answer their question concisely. Always offer one follow-up action at the end.
"""
response = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": "Why was I charged twice this month?"},
],
)
print(response.choices[0].message.content)This pattern dramatically reduces off-topic responses. The role anchors the model's persona, the context sets constraints, and the task focuses output.
Pattern 2: Explicit output format specification
Tell the model exactly how to format its output. If you need JSON, say so explicitly and state what not to include.
response = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[{
"role": "user",
"content": (
"Extract the name, email, and issue from this message. "
"Return valid JSON only. No markdown, no explanation. "
'Format: {"name": string, "email": string, "issue": string}\n\n'
"Message: Hi I'm Priya ([email protected]) and my invoice is wrong."
),
}],
)LLMs default to verbose prose. Specific format instructions — including what NOT to include — give far more consistent outputs.
Pattern 3: Chain-of-thought for reasoning tasks
For tasks requiring reasoning, add "Think step by step" and hide intermediate reasoning in XML tags:
import re
system = """Think through the problem step by step inside <thinking> tags.
Then give your final answer inside <answer> tags."""
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20241022",
messages=[
{"role": "system", "content": system},
{"role": "user", "content": "A user bought 3 items at ₹120 each with a 15% discount. Total?"},
],
)
raw = response.choices[0].message.content
answer = re.search(r"<answer>(.*?)</answer>", raw, re.DOTALL)
print(answer.group(1).strip() if answer else raw)Pattern 4: Few-shot examples
Include 2–3 input/output examples in your system prompt to demonstrate exact behaviour. Keep examples short and representative of the most common case.
messages = [
{"role": "system", "content": "Classify support tickets as: billing, technical, or general. Return only the label."},
{"role": "user", "content": "My invoice shows the wrong amount"},
{"role": "assistant", "content": "billing"},
{"role": "user", "content": "The API keeps returning 429 errors"},
{"role": "assistant", "content": "technical"},
{"role": "user", "content": "How do I add a team member?"}, # new input
]Pattern 5: Constrained fill-in template
Instead of asking the model to generate freely, give it a template to complete:
response = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[{
"role": "user",
"content": (
'Complete this JSON. Fill values only, do not change keys:\n'
'{"sentiment": ___, "confidence": ___, "reason": ___}\n\n'
'Text: "Delivery was fast but packaging was damaged."'
),
}],
)Pattern 6: Negative constraints
LLMs respond well to "do not" instructions for style and format control:
Do not use bullet points.
Do not start with a greeting.
Do not repeat the question back to me.
Do not include a conclusion paragraph.
Combine with positive instructions for best results.
Pattern 7: Context injection with delimiters
Wrap external content in XML-style tags so the model distinguishes context from instructions:
system = "Answer the question using only the information inside <doc> tags."
user = f"<doc>{document_text}</doc>\n\nQuestion: {user_question}"This prevents prompt injection and keeps the model from treating document content as instructions.
Pattern 8: Confidence and uncertainty signalling
Ask the model to signal uncertainty explicitly and make it structured:
response = client.chat.completions.create(
model="openai/gpt-4o-mini",
response_format={"type": "json_object"},
messages=[{
"role": "system",
"content": 'Return JSON: {"answer": string|null, "confidence": "high"|"medium"|"low", "reason": string}. Set answer to null if unsure.',
}, {
"role": "user",
"content": "What was the exact date of the AICredits Series A?",
}],
)Pattern 9: Two-step draft + self-critique
For high-stakes outputs, ask the model to critique and rewrite its own response:
draft_response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20241022",
messages=[{"role": "user", "content": "Write a refund policy for a SaaS product."}],
)
draft = draft_response.choices[0].message.content
final_response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20241022",
messages=[
{"role": "user", "content": "Write a refund policy for a SaaS product."},
{"role": "assistant", "content": draft},
{"role": "user", "content": "Review your response. Fix any gaps or unclear terms. Rewrite it."},
],
)
print(final_response.choices[0].message.content)Pattern 10: Token budget management
Explicitly constrain response length to reduce output token costs by 30–60%:
# Add to system prompt or user message:
# "Reply in 50 words or fewer."
# "Write no more than three sentences."
# "Give a one-line answer."
response = client.chat.completions.create(
model="openai/gpt-4o-mini",
max_tokens=100, # hard cap as a safety net
messages=[
{"role": "system", "content": "Reply in 2 sentences maximum."},
{"role": "user", "content": "What is an LLM gateway?"},
],
)Short, direct answers cost less and users prefer them in conversational interfaces.
Related Articles
Continue in Docs
Need implementation commands and endpoint details? Go to quickstart or API reference.