In-context learning is the ability of a large language model to pick up a new task from examples in the prompt, without any weight updates. Learn how it works and how to use it.
More about In-Context Learning
In-context learning (ICL) is the ability of a large language model to learn a new task purely from examples provided inside the prompt, without any gradient updates or retraining. You place a few worked examples in front of a new input, and the model generalises from the pattern to produce the right output.
ICL was one of the defining surprises of the GPT-3 era. Earlier NLP systems had to be trained or fine-tuned for each new task. With a sufficiently large pretrained model, you could skip training entirely for many tasks by just showing the model what you wanted. That capability is the foundation of modern prompt-based workflows.
How In-Context Learning Works
A standard ICL prompt has three parts:
- A short description of the task.
- A handful of input-output example pairs.
- The actual input the model should handle.
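The three-part structure above can be sketched as plain prompt assembly. This is a minimal, hypothetical helper; the task, example pairs, and `Input:`/`Output:` labels are illustrative assumptions, not a required format.

```python
def build_icl_prompt(task_description, examples, new_input):
    """Assemble a standard ICL prompt: task description,
    worked input-output pairs, then the new input awaiting
    the model's completion."""
    lines = [task_description, ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")  # blank line between demonstrations
    # the real input, with "Output:" left open for the model
    lines.append(f"Input: {new_input}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_icl_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great product, works perfectly.", "positive"),
     ("Broke after two days.", "negative")],
    "Exactly what I needed.",
)
print(prompt)
```

The model then completes the text after the final `Output:`, conditioned on the pattern in the demonstrations, in a single forward pass.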
The model processes the whole prompt in a single forward pass. It does not update its weights. It simply uses the examples to condition its next-token predictions on patterns that match the task. In that sense, "learning" is a slight misnomer: the model is not changing, it is pattern-matching on the fly.
Mechanistically, ICL appears to work because large models have absorbed so many task demonstrations during pretraining that seeing a new demonstration activates an implicit prior about what the task is. Active research is still teasing apart exactly how and why this emerges in models beyond a certain size.
In-Context Learning vs. Fine-Tuning
Both let you specialise a model, but at very different levels of cost and flexibility:
- In-context learning: examples live in the prompt. Zero training time. Model weights unchanged. You can switch tasks between requests.
- Fine-tuning: examples update the model weights. Training time required. Model is now specialised. Switching tasks means using a different model.
ICL is faster to iterate on and better for rapidly changing requirements. Fine-tuning is better when you want lower per-request cost, shorter prompts, or very consistent behaviour at scale.
In-Context Learning vs. Few-Shot and Zero-Shot
The terms overlap:
- Zero-shot learning: one instruction, no examples.
- Few-shot learning: one instruction, a handful of examples.
- In-context learning: the underlying mechanism by which few-shot prompting works in LLMs.
In practice, "few-shot" describes what you do and "in-context learning" describes why it works. Most writing treats them as synonyms.
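The distinction is easiest to see side by side. A quick sketch, using an illustrative translation task (the specific phrases are assumptions):

```python
instruction = "Translate English to French."

# Zero-shot: one instruction, no examples.
zero_shot = (
    f"{instruction}\n"
    "English: cheese\n"
    "French:"
)

# Few-shot: the same instruction plus a worked demonstration.
# In-context learning is the mechanism that lets the model use it.
few_shot = (
    f"{instruction}\n"
    "English: sea otter\n"
    "French: loutre de mer\n"
    "English: cheese\n"
    "French:"
)
```

Both prompts end at an open `French:` slot; the only difference is whether the model sees a demonstration of the pattern first.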
Why In-Context Learning Matters for Chatbots
ICL is what lets teams build a capable chatbot in hours instead of weeks:
- Rapid iteration: tweak the prompt, see the change immediately. No training cycles.
- Per-customer or per-intent customisation: different examples for different use cases, all running on the same underlying model.
- Format control: show two examples of the JSON shape you want and the model almost always matches it.
- Tone control: demonstrate the voice you want rather than describing it.
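The format-control point is worth making concrete. A minimal sketch of a chatbot prompt that demonstrates a JSON reply shape twice; the field names (`intent`, `escalate`) and the customer messages are hypothetical:

```python
import json

# Two hypothetical demonstrations of the desired JSON shape.
examples = [
    ("My order hasn't arrived yet.",
     {"intent": "shipping", "escalate": False}),
    ("I want to speak to a human right now.",
     {"intent": "complaint", "escalate": True}),
]

parts = ["Answer each customer message with JSON matching the examples.", ""]
for message, reply in examples:
    parts.append(f"Customer: {message}")
    # json.dumps guarantees each demonstration is itself valid JSON,
    # so the model sees the exact shape it should reproduce
    parts.append(f"Reply: {json.dumps(reply)}")
    parts.append("")
parts.append("Customer: Do you ship to Canada?")
parts.append("Reply:")
prompt = "\n".join(parts)
print(prompt)
```

With two well-formed demonstrations in place, the completion after the final `Reply:` almost always matches the demonstrated schema.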
Combined with retrieval-augmented generation (RAG), ICL is the standard pattern for production chatbots. The retrieval step pulls domain-specific context into the prompt; ICL lets the model apply task-specific reasoning patterns you have demonstrated.
SiteSpeak leans heavily on ICL internally. The system prompt includes short examples of the tone, format, and escalation behaviour the chatbot should follow, and the model applies those patterns to every new customer question without any training on a per-customer basis.
When In-Context Learning Is Enough, and When It Is Not
ICL works well for:
- Tasks with a clear structural pattern (classification, extraction, format conversion).
- Tone and format consistency.
- Low-to-medium volume workloads where prompt token cost is acceptable.
- Use cases that change often and would otherwise require constant retraining.
It struggles with:
- Very domain-specific tasks where pretraining did not cover the vocabulary.
- Extremely high volume, where the recurring cost of long example-filled prompts adds up.
- Tasks where outputs must be perfectly consistent across millions of calls.
- Cases where demonstrating the task is harder than describing it.
If ICL plateaus and you cannot improve it with better examples or better retrieval, fine-tuning is the usual next step.
Limitations and Pitfalls
ICL has sharp edges:
- Order sensitivity: reordering examples can change outputs.
- Label imbalance: if three of your four examples are the same class, the model skews toward that class.
- Example contamination: examples that are too close to the real input can cause the model to copy rather than generalise.
- Context window pressure: long example sets eat into the space available for chat history and retrieved documents.
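The first two pitfalls, order sensitivity and label imbalance, can be partially mitigated when selecting examples. A minimal sketch of one such selection helper; the function name and the balancing strategy are illustrative assumptions, not a standard recipe:

```python
from collections import defaultdict
from itertools import zip_longest

def balanced_examples(pool, k_per_class):
    """Pick an equal number of examples per class and interleave
    the classes, so no single label dominates the example set or
    clusters in one region of the prompt."""
    by_class = defaultdict(list)
    for inp, label in pool:
        by_class[label].append((inp, label))
    # cap each class at k_per_class examples
    picks = [items[:k_per_class] for items in by_class.values()]
    # interleave: one example from each class in turn
    return [ex for group in zip_longest(*picks)
            for ex in group if ex is not None]

pool = [("good", "pos"), ("bad", "neg"), ("great", "pos"),
        ("awful", "neg"), ("fine", "pos")]
balanced = balanced_examples(pool, 2)
```

This yields an alternating, equally weighted example set; in practice you would also test a few orderings, since reordering can still shift outputs.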
Building good ICL prompts is its own skill, closer to teaching than to programming.