Number Bias — Systematic Randomness Failure Across 7,695 Trials
Background
This study ran 7,695 independent trials across 20 experiment files asking Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) to select a random number between 0 and 100. The methodology used a 10-round elimination structure: trials were run in 10 rounds (roughly 770 trials per round, pooled across all files), with each round's winning numbers excluded from subsequent rounds to test distributional adaptation. An interactive research dashboard with full data is available.
Key Findings
Of 101 possible values (0–100), only 53 were ever selected; the remaining 48 never appeared once in all 7,695 trials. The most-selected number, 42, was chosen 763 times (9.9% of all trials) against an expected uniform rate of 0.99% (1/101), a 10× over-representation. A chi-squared goodness-of-fit test against the uniform distribution returned p < 10⁻⁹⁹, an extreme deviation from uniform randomness. The Shannon entropy ratio was 67.3% (100% would indicate perfect uniformity).
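Both headline statistics can be recomputed from a per-number tally. A minimal stdlib sketch (the study's full count table is not reproduced here, so `counts` stands for whatever tally you have):

```python
import math

def entropy_ratio(counts, support=101):
    """Shannon entropy of the observed selection frequencies,
    normalized by log2(support), the entropy of a uniform draw."""
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total)
             for c in counts.values() if c > 0)
    return h / math.log2(support)

def chi_squared_stat(counts, support=101):
    """Chi-squared statistic against a uniform null over 0..support-1.
    (Turning this into a p-value needs the chi2 CDF, e.g. via scipy.)"""
    total = sum(counts.values())
    expected = total / support
    return sum((counts.get(k, 0) - expected) ** 2 / expected
               for k in range(support))
```

On a perfectly uniform tally the ratio is 1.0 and the statistic is 0; the reported 67.3% and p < 10⁻⁹⁹ correspond to heavy concentration on a few values.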
The Distribution
The top 5 most selected numbers: 42 (763), 73 (744), 28 (720), 67 (658), 47 (505). Together these five account for 3,390 selections, or 44.1% of all trials. Outside a narrow 40–73 band where selections concentrate, the low end (0–39) and the top quarter (76–100) are systematically under-represented.
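As a quick sanity check, the top-5 share follows directly from the reported per-number counts:

```python
# Reported counts for the five most-selected numbers
top5 = {42: 763, 73: 744, 28: 720, 67: 658, 47: 505}
share = sum(top5.values()) / 7695  # 3,390 of 7,695 trials
print(f"{share:.1%}")  # → 44.1%
```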
Bias Patterns
- Prime number bias: 87.1% of selections in the largest experiment were prime or prime-adjacent.
- Cultural numerology: 42 (Hitchhiker’s Guide), 73 (Sheldon’s number from The Big Bang Theory), and 47 (the Star Trek number) dominate the distribution. All three carry strong cultural resonance in English-language training data.
- Edge avoidance: numbers 0–6 and 95–100 are almost entirely absent, suggesting the model avoids values that “feel” non-random to a human observer.
- Round number avoidance: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 are all either absent or severely under-represented.
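These patterns are mechanically checkable. A sketch of the classification used informally above, where the "edge" and "round" bands are taken from the text and "prime-adjacent" is assumed to mean within 1 of a prime:

```python
def is_prime(n):
    """Trial division; fine for the 0-100 range in question."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def classify(n):
    """Tag a number with the bias-pattern categories described above."""
    tags = []
    if is_prime(n):
        tags.append("prime")
    elif is_prime(n - 1) or is_prime(n + 1):
        tags.append("prime-adjacent")
    if n <= 6 or n >= 95:
        tags.append("edge")
    if n % 10 == 0:
        tags.append("round")
    return tags
```

Under this scheme every one of the top-5 numbers lands in the prime or prime-adjacent bucket: 73, 67, and 47 are prime, while 42 and 28 sit next to 41/43 and 29.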
Round-by-Round Adaptation Failure
In Round 1, the model selected only 2 unique numbers across 769 trials: 47 (480 times) and 42 (289 times). When these were excluded in Round 2, the model collapsed onto 73 (736 out of 770 trials — 95.6%). The elimination structure reveals that the model has a rigid preference hierarchy rather than a genuine randomness capability: remove its top choice and it moves to the next preferred number rather than distributing across the remaining space.
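The collapse is exactly what a rigid preference list would produce. A toy model of that behavior (the ordering below is inferred from the two reported rounds; beyond 73 it is hypothetical):

```python
# Ordering inferred from Rounds 1-2 (47 and 42 dominate Round 1,
# 73 absorbs Round 2); the tail of the list is an assumption.
PREFERENCE = [47, 42, 73, 28, 67]

def rigid_pick(excluded):
    """Return the most-preferred number not yet excluded -- the
    behavior the elimination rounds suggest, instead of a uniform
    draw over the remaining 0-100 values."""
    for n in PREFERENCE:
        if n not in excluded:
            return n
```

With nothing excluded this model returns 47; excluding {47, 42} makes it return 73, mirroring the 95.6% Round 2 collapse.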
Significance
This is not a safety finding in the conventional sense. It is an alignment finding: the model’s “randomness” is a learned approximation shaped by RLHF training data distributions, cultural priors in the training corpus, and human expectations about what random numbers “look like.” Any downstream application relying on LLM-generated random numbers — shuffling, sampling, game logic, A/B test assignment — is operating on a systematically biased distribution. The model does not generate random numbers. It generates numbers that feel random to the kind of human who writes training data.
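For any of those downstream uses, the remedy is to draw from a real RNG and, at most, feed the value to the model as input. A minimal sketch using Python's standard library:

```python
import secrets

def uniform_pick(low=0, high=100):
    """Uniform integer in [low, high] from the OS CSPRNG --
    suitable for shuffling, sampling, or A/B assignment,
    unlike an LLM's learned approximation of randomness."""
    return low + secrets.randbelow(high - low + 1)
```

`secrets.randbelow` avoids the modulo bias of naive `rand() % n` constructions as well as the cultural-prior bias documented above.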