Zero cost, no fine-tuning required: Adding a few words to the prompt can double the creativity of large models.
God Translation Bureau is a compilation team under 36Kr that focuses on technology, business, the workplace, and everyday life, mainly introducing new technologies, new ideas, and new trends from abroad.
Editor's note: Is AI becoming more and more boring? The real culprit may be humans themselves. The latest research from Stanford University has found that, without any retraining, a simple instruction of just a few words can break the seal of "safety alignment" and double the suppressed creativity of large models. This article is a compilation.
Does ChatGPT always give you the same boring answers? This new technique can more than double the creativity of any AI model, and it requires no training. Here's how it works.
I asked ChatGPT to tell me a joke about coffee, and I tried five times.
The same joke. Every single time. Without exception.
"Why did the coffee call the police? Because it got 'mugged'!"
* (Note: The joke is a pun on "mugged", which plays on "mug", the cup, and "mugged", meaning robbed.) *
I tried adjusting the temperature parameter. Changing various phrasings. Using creative system prompts. None of it worked.
I thought to myself: Is this it? Have we reached the ceiling of AI creativity?
It turns out that I asked the wrong question.
The Day Everything Changed
Three weeks ago, a research paper was published that completely overturned our understanding of AI alignment.
No multi-billion-dollar retraining. No complex fine-tuning. Just eight words unlocked the creativity that we thought was lost forever.
This paper is from Stanford University, Northeastern University, and West Virginia University. The technique is called "Verbalized Sampling". It's so simple it's almost stupid; I actually laughed out loud when I first tried it.
Because it really works.
Let me show you what they discovered.
The Problem No One Wants to Admit
The truth is uncomfortable:
Post-training alignment has ruined our AI models.
When OpenAI, Google, and Anthropic trained ChatGPT, Gemini, and Claude to be "useful and harmless", something catastrophic happened at the underlying level. The models collapsed.
Ask any aligned model for creative output, whether poetry, jokes, stories, or ideas, and you'll always get the most stereotyped, safest, and most boring answer. The same one every time.
The AI community calls this "mode collapse". Everyone blames the algorithm.
RLHF (Reinforcement Learning from Human Feedback). DPO (Direct Preference Optimization). Reward models. We thought these training techniques permanently damaged the model's creativity.
We were wrong.
The Real Culprit: Your Brain
The Stanford team dug deeper. They analyzed 6,874 human preference ratings in the HelpSteer dataset.
Their findings were shocking.
Human annotators are biased - and systematically so.
When humans rate AI output, they're not just picking the "best" answer. They're picking the most familiar one. The most traditional. The most typical.
This is not intentional. It's cognitive psychology at work:
- Mere-exposure effect: We prefer things we've seen before.
- Availability heuristic: Common answers feel more "correct".
- Processing fluency: Content that's easy to understand seems of higher quality.
- Schema congruity: Information that fits our mental models scores higher.
The numbers are harsh: the typicality bias weight is α = 0.57 ± 0.07 (p < 10⁻¹⁴).
What does this mean in plain language? When we trained AI to cater to human preferences, we accidentally trained it to be dull.
The most ironic thing is: Creativity hasn't disappeared. It's just trapped.
A Solution in a Few Words
Stop asking:
"Tell me a joke about coffee"
Try asking like this:
"Generate 5 jokes about coffee with their probabilities"
It's that simple.
No retraining. No API changes. No special permissions.
Just a different way of asking the question.
When I first tried it, I got five completely different coffee jokes. Each one was unique. Each one was really funny.
The fifth joke? "What do you call a cow that has just given birth? 'De-calf-inated'!"
* (Note: "De-calf-inated" is a pun on "decaffeinated"; a "calf" is a young cow.) *
I've never seen ChatGPT generate such content before.
Why It Really Works (The Science)
Different prompts collapse into different modes.
When you ask for "one" response, the model gives you the most "typical" answer - the peak of the probability distribution.
When you ask for "five" responses, the model gives you a list of related items.
But when you ask for the response to include "probabilities"? A miracle happens.
The model interprets it as: "Give me samples drawn from the true distribution learned during pre-training", not the collapsed, over-aligned version.
It's like the difference between asking someone: "What flavor of ice cream do you like?" and "List all ice cream flavors and your level of preference for each."
The second question forces the other person to think more deeply and diversely.
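To make the idea concrete: once the model returns several candidates with verbalized probabilities, you can treat them as a small discrete distribution and sample from it yourself. Here is a minimal sketch in Python; the candidate jokes and their weights are invented for illustration:

```python
import random

# Hypothetical (response, verbalized probability) pairs returned by the model
candidates = [
    ("Why did the coffee call the police? It got mugged.", 0.35),
    ("Espresso yourself.", 0.20),
    ("Decaf? That's a latte of nothing.", 0.15),
    ("I like my jokes like my coffee: dark.", 0.15),
    ("What do you call sad coffee? Depresso.", 0.15),
]

texts = [text for text, _ in candidates]
weights = [prob for _, prob in candidates]

# Normalize, since models rarely return probabilities that sum exactly to 1
total = sum(weights)
weights = [w / total for w in weights]

# Draw one response according to the verbalized distribution
rng = random.Random(42)
choice = rng.choices(texts, weights=weights, k=1)[0]
print(choice)
```

Sampling from the verbalized distribution, rather than always taking the top candidate, is what keeps repeated queries from collapsing back onto one answer.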
How to Use It Right Away (3 Methods)
Method 1: Copy-Paste Method (Works with Any Chatbot)
Open ChatGPT, Claude, Gemini, or any AI model. Paste the following content:
<instructions>
Generate 5 responses to the user query, each within a separate <response> tag. Each <response> must include a <text> and a numeric <probability>. Randomly sample responses from the full distribution.
</instructions>
[Your actual prompt here]
Example:
<instructions>
Generate 5 responses to the user query, each within a separate <response> tag. Each <response> must include a <text> and a numeric <probability>. Randomly sample responses from the full distribution.
</instructions>
Write a 100-word story about an astronaut who discovers something unexpected.
Want more? Just ask: "Give me 5 more".
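If you want to post-process that output programmatically, the `<response>` blocks can be pulled out with a regular expression. Model output is rarely strict XML, so lenient pattern matching is safer than a full XML parser. A sketch, using a made-up raw output string:

```python
import re

# Hypothetical raw model output following the template above
raw = """
<response><text>Joke one</text><probability>0.30</probability></response>
<response><text>Joke two</text><probability>0.25</probability></response>
"""

# Non-greedy matching tolerates extra whitespace between tags
pattern = re.compile(
    r"<response>\s*<text>(.*?)</text>\s*"
    r"<probability>([\d.]+)</probability>\s*</response>",
    re.DOTALL,
)
responses = [(text.strip(), float(prob)) for text, prob in pattern.findall(raw)]
for text, prob in responses:
    print(f"{prob:.2f}  {text}")
```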
Method 2: System Prompts (Professional Operation)
If you're using ChatGPT's custom instructions or developing an AI application, add this content to your system prompts:
You are a helpful assistant.
For each query, please generate a set of five possible responses, each within a separate <response> tag.
Responses should each include a <text> and a numeric <probability>.
Please sample at random from the tails of the distribution, such that the probability of each response is less than 0.10.
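Models don't always obey the "less than 0.10" instruction, so you can also enforce the threshold yourself by filtering the verbalized probabilities client-side. A minimal sketch with invented candidates, using the same 0.10 cutoff as the prompt above:

```python
# Hypothetical candidates with verbalized probabilities from the model
candidates = [
    ("A very typical answer", 0.45),
    ("A fairly common answer", 0.20),
    ("An unusual answer", 0.08),
    ("A rare answer", 0.04),
]

TAU = 0.10  # keep only low-probability (tail) responses

tail = [(text, p) for text, p in candidates if p < TAU]
for text, p in tail:
    print(f"{p:.2f}  {text}")
```

Lowering the cutoff pushes sampling further into the tail and makes outputs more unusual; raising it admits more conventional responses.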
This will automatically make each response more creative.
Method 3: Python Package (For Developers Only)
Install the official Verbalized Sampling package:
pip install verbalized-sampling
Use it in your code:
from verbalized_sampling import verbalize

# Generate diverse responses
dist = verbalize(
    "Write a marketing tagline for a coffee shop",
    k=5,
    tau=0.10,
    temperature=0.9,
)

# Sample from the distribution
tagline = dist.sample(seed=42)
print(tagline.text)
The Results Are Crazy
The Stanford team tested this with every major AI model and task:
Creative Writing
- The diversity of poetry, stories, and jokes increased 1.6-2.1 times
- 66.8% of the base model's creativity was recovered (versus only 23.8% without it)
- Human preference ratings increased by 25.7% (based on 2,700 rating tests)
Dialogue and Communication
- It performs as well as fine-tuned models on persuasion tasks
- Responses are more human-like and less mechanical
Open-ended Questions
- The diversity of answers to questions with multiple valid perspectives increased 1.9 times
Synthetic Data Generation
- Training data generated with VS improved downstream task accuracy by 14-28%
There's also a trend that really shocked me:
The larger the model, the more it benefits.
GPT-4.1 gets twice the diversity improvement of GPT-4.1-Mini.
The larger the model, the more trapped creativity there is waiting to be unlocked.
What Does This Really Mean?
For two years, we've always thought that alignment ruined AI.
We thought mode collapse was permanent damage. A compromise we had to make for safety and usefulness.
We were completely wrong.
Creativity has never disappeared. We just forgot how to access it.
This is not just a prompting trick. It's a fundamental insight into how aligned models work:
Mode collapse is not an algorithm problem; it's a prompting problem.
Diversity still exists, encoded in the model's weights. Post-training didn't erase it; it just made some modes more accessible than others.
What Can You Do with It?
This week, I've used Verbalized Sampling for everything:
Brainstorming: Instead of getting 3 variations of the same idea, I got truly different starting points.
Content creation: Blog titles, social media posts, email subject lines - all of them are more creative.
Problem-solving: It provides multiple solution paths instead of just the one "safe" suggestion.
Image generation: When I feed diverse prompts to Midjourney or DALL-E, I get more diverse visual outputs.
Synthetic data: Train smaller models with more diverse examples.
Someone on Twitter tested it for generating jokes and said: "Ask ChatGPT for five answers instead of one, and watch the boring content disappear."
He's right.
The Bigger Picture
This changes the way we think about AI alignment.
For years, researchers have been worried that making AI "safe" means making it "stupid". Worried that creativity and usefulness are in conflict.
Verbalized Sampling proves that's not the case.
Safety still exists. When I tested it on factual questions and common-sense reasoning, the accuracy didn't decline. Safety didn't degrade.
But creativity is back.
It's been right under our noses all along.
Try It Yourself
Now open ChatGPT.
Ask it: "Generate 5 creative project ideas about learning Python, each with its probability."
See what happens.
Then ask the same question without the probability part. Compare the results.
You'll immediately see the difference.
The AI you thought was "limited in ability" is actually just waiting for the right question.
Resources for Further Reading
Read the paper: arxiv.org/abs/2510.01171
GitHub repository: github.com/CHATS-lab/verbalized-sampling
Official website: verbalized-sampling.com
Interactive demo: There's a Colab notebook available on GitHub
Conclusion
Is prompt engineering dead?
Maybe not. But it's definitely been reborn.
For two years, we optimized prompts, trying to squeeze more creativity out of aligned models. We failed because we were asking the wrong questions.
We don't need better prompts. We need better questions.
Sometimes, the answer is as simple as asking for five answers instead of one.
AI's creativity bottleneck has just been solved by eight words.
Translator: boxi.