Understanding Randomness in LLMs: Why ChatGPT Often Picks 73, 42, or 79


Table of Contents

The Important Detail Is the Clean Session
Humans Prefer “Interesting” Randomness
Why Different Languages Drift Toward Different Numbers
The Explanation Afterward Is the Real Story
Why This Tiny Experiment Matters More Than It Looks
Opinionated Insight

Over the last few days, a small AI experiment has been circulating across LinkedIn and Reddit: open a completely fresh ChatGPT session, ask for a random number between 1 and 100, and observe what happens. Surprisingly often, the answer is 73. Sometimes it is 42. In English sessions, 79 also appears more often than the 1-in-100 share a uniform draw would give it.

At first glance, this looks like harmless internet trivia. Another small ritual in the growing collection of “look what AI did” screenshots. But technically, it is a useful demonstration of how Large Language Models handle randomness, context, probability, and explanation.

The important part is not that an LLM sometimes chooses 73. The important part is why this behavior feels random to humans while not being mathematically random at all.

A Large Language Model is not secretly rolling a perfect digital die unless it has been explicitly connected to a real random number generator. In normal text generation, it predicts the next plausible token based on training data, conversational context, sampling parameters, and learned human patterns. When users ask for a random number, the model often produces something that looks psychologically random rather than something drawn from a uniform distribution.
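The contrast can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model of ChatGPT: the scores in `ILLUSTRATIVE_BOOST` are invented, and `sample_like_a_model` and `sample_uniform` are hypothetical helper names. The only point is that sampling from a skewed distribution still varies from draw to draw while concentrating heavily on a few values, whereas a real random number generator gives every number the same 1-in-100 chance.

```python
import math
import random
from collections import Counter

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores: every number starts at 0.0, a few "favorites" get a boost.
# The boosts are invented for illustration, not measured from any model.
NUMBERS = list(range(1, 101))
ILLUSTRATIVE_BOOST = {73: 4.0, 42: 3.5, 79: 3.0}
LOGITS = [ILLUSTRATIVE_BOOST.get(n, 0.0) for n in NUMBERS]

def sample_like_a_model(rng, temperature=1.0):
    """Draw one number from the skewed distribution, the way token sampling would."""
    probs = softmax(LOGITS, temperature)
    return rng.choices(NUMBERS, weights=probs, k=1)[0]

def sample_uniform(rng):
    """Draw one number from a genuinely uniform distribution over 1..100."""
    return rng.randint(1, 100)

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
skewed = Counter(sample_like_a_model(rng) for _ in range(2000))
uniform = Counter(sample_uniform(rng) for _ in range(2000))
print("skewed top 3: ", skewed.most_common(3))
print("uniform top 3:", uniform.most_common(3))
```

Lowering `temperature` in this sketch pushes even more probability onto the boosted numbers, which is one reason the same answers can recur so reliably.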

That difference sounds small. It is not.

It explains why a clean session behaves differently from a context-rich conversation, why certain languages drift toward different numbers, and why the model can produce a confident explanation after the fact that sounds more intentional than the selection process probably was.

The Important Detail Is the Clean Session

The effect is easiest to observe when the conversation starts without prior context, because the model has very little to anchor itself to besides its broad training distribution. In that situation, the system falls back much more strongly on patterns that appeared statistically “convincing” across enormous amounts of human-written text. That is why the same handful of numbers appears surprisingly often across completely unrelated users.

Once a conversation accumulates history, however, the underlying mechanics do not change at all. The model is still doing probabilistic next-token prediction. What changes is the probability landscape itself. Context continuously reshapes which outputs appear most plausible for the current interaction.

For example, a conversation with a mathematician, a gamer, a finance professional, or somebody discussing science fiction may all drift toward different numbers because different associations become statistically more relevant. A user heavily discussing Douglas Adams, probability theory, engineering culture, or gaming mechanics may increase the likelihood of numbers such as 42, 73, or other culturally loaded values appearing. The numbers adapt to context, but the mechanism behind the selection remains identical.
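A toy version of "same mechanism, different landscape" might look like the following. Everything here is an assumption made for illustration: the base scores, the context labels, and the size of each bias are invented, and `distribution_for` is a hypothetical helper. The selection code is identical for every context; only the scores that feed it change.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented base scores for a context-free session.
BASE_LOGITS = {n: 0.0 for n in range(1, 101)}
BASE_LOGITS.update({73: 2.2, 42: 2.0, 79: 1.8})

# Invented context effects: each conversation topic nudges some scores upward.
CONTEXT_BIAS = {
    "clean session": {},
    "discussing Douglas Adams": {42: 3.0},
    "discussing prime numbers": {73: 2.5, 79: 1.0},
}

def distribution_for(context):
    """Return {number: probability} for one context; the mechanism never changes."""
    bias = CONTEXT_BIAS[context]
    numbers = sorted(BASE_LOGITS)
    logits = [BASE_LOGITS[n] + bias.get(n, 0.0) for n in numbers]
    return dict(zip(numbers, softmax(logits)))

for context in CONTEXT_BIAS:
    dist = distribution_for(context)
    favorite = max(dist, key=dist.get)
    print(f"{context}: most likely {favorite} (p={dist[favorite]:.2f})")
```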

That distinction matters because many people intuitively assume the clean-session experiment somehow reveals a “default favorite number” hidden inside the model. In reality, it reveals something more fundamental: the model is not performing randomness in the mathematical sense. It is continuously predicting what humans are statistically likely to perceive as plausible, fitting, or convincing within the current conversational state.

The clean-session setup merely strips away most contextual noise, making the underlying statistical tendencies much easier to notice.

This is also why the behavior is not perfectly deterministic. The model still samples probabilistically across candidate tokens, so variation exists. But the outputs remain stable enough that thousands of independent users repeatedly encounter the same numbers. That consistency would be highly unusual for true randomness, but it makes perfect sense for a probability-driven language system trained on human-generated patterns.
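How unusual that consistency would be can be estimated with a binomial tail probability. The sketch below assumes independent sessions and a uniform draw from 1 to 100, so a given number has probability 1/100 per session. The scenario of 10 hits in 100 sessions is hypothetical, chosen only to show the order of magnitude, and `binomial_tail` is a helper name introduced here.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k hits in n draws."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

P_SINGLE = 1 / 100  # chance that one uniform draw from 1..100 hits a given number

# Hypothetical scenario: 100 independent fresh sessions, 10 or more answer 73.
n_sessions = 100
hits = 10
print(f"expected hits under uniformity: {n_sessions * P_SINGLE:.1f}")
print(f"P(at least {hits} hits): {binomial_tail(n_sessions, hits, P_SINGLE):.2e}")
```

Under uniformity, one hit per hundred sessions is the expectation. Seeing the same number many times over is the signature of a skewed distribution, not of bad luck.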

And this leads directly to the core misunderstanding behind the entire experiment: humans are actually not particularly good at recognizing or generating randomness themselves. We strongly associate randomness with irregularity, memorability, and the absence of obvious structure. Large Language Models learn exactly those human biases from the data they are trained on. So when people ask for a “random” number, the model often generates something that looks psychologically random to humans rather than something mathematically uniform.

Humans Prefer “Interesting” Randomness

If you ask people to invent random numbers manually, they rarely choose 50, because it feels lazy and not creative enough.

They also tend to avoid numbers close to the boundaries, such as 1, 2, 99, or 100, because humans subconsciously think randomness should avoid obvious patterns. Round numbers feel “too intentional”, while edge values feel “unlikely”, even though every number is equally probable under a uniform distribution.

Instead, humans drift toward numbers that feel irregular enough to appear authentic. Numbers like 73 or 79 work extremely well psychologically because they sound arbitrary while still carrying enough structure to feel memorable.

And that is precisely the kind of pattern an LLM absorbs during training.

People often describe Large Language Models as reasoning systems, but under the hood they are still prediction engines trained on enormous amounts of human-produced text. Every generated token is essentially the result of continuously answering one statistical question:

“Given all previous context, what is the most plausible next token?”

That mechanism works exceptionally well for language because language itself is deeply pattern-driven. Randomness, however, is almost the opposite of language. True randomness is wonderfully boring. Human attempts at randomness are full of recognizable biases, and the model learns those biases extremely well.

So when somebody asks:

“Give me a random number between 1 and 100.”

the model is often not approximating mathematical randomness at all. Instead, it is approximating what humans culturally associate with the appearance of randomness.

That is why the outputs feel strangely human.

Why Different Languages Drift Toward Different Numbers

The language differences are probably the most fascinating part of the entire experiment.

German sessions frequently drift toward 73. English sessions very often produce either 42 or 79. That is not because the model suddenly switches mathematical logic depending on the language. The explanation is much more grounded and much less mystical.

Training distributions differ massively across languages because internet culture differs massively across languages.

In English-speaking technical culture, 42 is almost impossible to avoid because of The Hitchhiker’s Guide to the Galaxy. The phrase “the answer to life, the universe, and everything” became deeply embedded in developer humor, engineering discussions, gaming culture, and online communities over decades.

That means the number 42 carries unusually high semantic weight in English-language datasets. The moment a conversation feels slightly playful, philosophical, or nerd-adjacent, the statistical attractiveness of 42 increases dramatically.

The case of 73 is slightly different. Its popularity is partially reinforced by The Big Bang Theory and the famous Sheldon Cooper monologue about why 73 is supposedly the “best number”. Completely unnecessary mathematical trivia suddenly became internet folklore:

73 is the 21st prime number
37 mirrored becomes 73
21 mirrored becomes 12
37 is the 12th prime number
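The four claims in this list are easy to check mechanically. The following is a small verification sketch; `is_prime`, `prime_index`, and `mirror` are helper names introduced here for illustration.

```python
def is_prime(n):
    """Trial division; more than fast enough for two-digit numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def prime_index(p):
    """Return the 1-based position of p among the primes (2 is the 1st prime)."""
    if not is_prime(p):
        raise ValueError(f"{p} is not prime")
    return sum(1 for k in range(2, p + 1) if is_prime(k))

def mirror(n):
    """Reverse the decimal digits of n."""
    return int(str(n)[::-1])

# Each assertion corresponds to one line of the list above.
assert prime_index(73) == 21
assert mirror(37) == 73
assert mirror(21) == 12
assert prime_index(37) == 12
print("all four claims hold")
```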

Objectively useless information. Which is exactly why the internet remembered it forever.

Over time, 73 accumulated a strange identity in technical and nerd-heavy discussions. It feels mathematically quirky without looking artificially selected. From a probabilistic language perspective, it became an extremely effective candidate for “convincing fake randomness.”

Finally, 79 is even more interesting because it lacks the same mainstream cultural anchor. It behaves more like an optimized compromise number: high enough to feel intentional, far enough from the boundaries to avoid suspicion, and uncommon enough to appear genuinely arbitrary. Ironically, it looks exactly like the kind of number humans invent when trying very hard to look random.

Which means it fits the training distribution perfectly.

The Explanation Afterward Is the Real Story

The truly important observation is not the number selection itself. The explanation afterward tells us far more about how modern AI systems behave.

When users ask:

“Why did you choose this number?”

the model immediately generates a coherent narrative around the result. It explains human bias, memorability, cultural references, or perceived randomness characteristics. The explanation often sounds surprisingly intelligent because LLMs are exceptionally good at generating internally consistent language.

But consistency is not the same thing as transparency.

The model did not first conduct a rigorous mathematical analysis and then independently derive the number through logical reasoning. In most cases the process works roughly the other way around. A statistically plausible output gets generated first, and afterward the system constructs a narrative that humans would perceive as convincing and coherent.

That sounds subtle, but it is one of the most important concepts people need to understand about generative AI.

Humans are naturally tempted to interpret fluent explanations as evidence of deep understanding or intentional thought processes. In reality, Large Language Models are often performing highly sophisticated forms of probabilistic pattern reconstruction. The generated explanation answers a different question entirely:

“What explanation would humans likely accept as reasonable for this answer?”

Most of the time the model is extremely good at solving exactly that problem.

Why This Tiny Experiment Matters More Than It Looks

At first glance this entire discussion feels like harmless AI trivia. Funny LinkedIn experiment. Slightly nerdy observation. Interesting conversation starter for a coffee break at a conference.

But underneath it sits a surprisingly clean demonstration of how modern AI systems actually behave in production environments.

Large Language Models are not databases, not calculators, not reasoning engines in the classical sense, and definitely not objective truth machines. They are systems shaped by training distributions, cultural patterns, reinforcement tuning, conversational context, and statistical probabilities across billions of human-generated examples.

That does not make them useless. Quite the opposite. These systems are extraordinarily powerful exactly because human communication itself is highly structured and pattern-based.

But it does mean we should stop projecting magical properties into outputs that are often better explained through probability distributions than through intentional reasoning.

Ironically, a tiny “pick a random number” experiment explains this more effectively than many highly academic AI discussions do.

Because once you notice what the model is actually optimizing for, you stop seeing randomness.

You start seeing compressed human behavior patterns reflected back at you through probability.

Opinionated Insight

The number 73 is not the problem. The problem starts one second later, when the model explains the choice with the confidence of someone who apparently had a strategy all along.

That is where people get fooled. Not because the explanation is bad, but because it is usually good enough to pass the first smell test. The model may say that 73 feels less obvious, more irregular, or more random-looking. That may even be plausible. But plausible is not the same as causal.

This is where many AI discussions go off the rails. People ask a model why it did something, receive a polished explanation, and treat it as if the system had opened the black box and handed over the audit log. It did not. In many cases, the model is generating a convincing explanation for an answer that already exists.

That is not transparency. That is courtroom performance, and sometimes the performance is excellent.

The same pattern does not stop at toy prompts about random numbers. It shows up exactly where companies are now trying to put generative AI into production workflows. Why was this customer flagged? Why was this invoice classified as suspicious? Why did this document land in that category? Why did the agent trigger this workflow?

If the architecture has proper observability, traceability, retrieval grounding, deterministic decision points, and external state, those questions can be answered properly. But if the system is just an LLM wrapped in a nice interface, the answer may still sound excellent while being operationally useless.

That is the part many AI demos conveniently skip. A demo loves fluency, but production needs accountability. A demo can survive a beautiful explanation, but production needs evidence, lineage, state, timestamps, inputs, thresholds, and reproducibility.
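What "evidence" can mean in practice is sketched below as a deterministic decision point that writes its own audit record. This is a minimal illustration, not a reference architecture: `DecisionRecord`, `digest_inputs`, `decide`, the threshold of 0.8, the `rules-v1` label, and the sample invoice are all hypothetical. The score may come from a model, but the decision is a plain comparison, and the record captures the inputs digest, the threshold, and the timestamp so the question "why was this flagged?" can be answered without asking the model to explain itself.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class DecisionRecord:
    """Everything needed to explain one decision later, without asking the model."""
    decision: str
    score: float
    threshold: float
    inputs_digest: str
    rule_version: str
    decided_at: str

def digest_inputs(inputs):
    """Hash the inputs in canonical form so identical data gives an identical digest."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def decide(inputs, score, threshold=0.8, rule_version="rules-v1", now=None):
    """Apply a fixed threshold to a score and record the evidence.

    The score may come from a model; the decision itself is a plain comparison,
    so the same score and threshold always yield the same outcome.
    """
    timestamp = now if now is not None else datetime.now(timezone.utc)
    return DecisionRecord(
        decision="suspicious" if score >= threshold else "ok",
        score=score,
        threshold=threshold,
        inputs_digest=digest_inputs(inputs),
        rule_version=rule_version,
        decided_at=timestamp.isoformat(),
    )

# Hypothetical invoice and score, for illustration only.
record = decide({"invoice_id": "INV-001", "amount": 1250.0}, score=0.91)
print(record)
```

The design choice is to keep the language model out of the final comparison: it can contribute a score, but the step that triggers a workflow is reproducible and leaves a trail.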

This is why I get nervous when people casually call LLMs reasoning engines without adding a few heavy asterisks. Yes, they can reason in useful ways when the task is framed properly. But they are also exceptionally good at producing language that resembles reasoning. Those two capabilities overlap just enough to be powerful and just enough to be dangerous.

So no, this experiment is not just a party trick. It is a small warning label attached to a much bigger machine. Fluency is not transparency, coherence is not causality, and confidence is not evidence. If your production AI system cannot show the difference, then the random number is the least random thing in the room.
