I'm no statistician, but the part about halfway through that says not to use PRNGs for random assignment into bins seems wrong to me?
Sure, I can understand why for a research trial you might just want to be totally safe and use a source of true randomness, but for all practical purposes a decent PRNG used for sorting balls into buckets is totally indistinguishable from true randomness, is it not?
I was half expecting this to have been written a few decades ago when really bad PRNGs were in common usage, but the article seems to be timestamped 2025.
Plenty of previously well regarded PRNGs are distinguishable in quite surprising ways.
Perhaps you could say otherwise for a CSPRNG.
For almost all practical purposes, a decent PRNG is just as good as a CSPRNG. Cryptographic security only becomes relevant in an adversarial situation. Otherwise, you would need whatever weakness exists in the PRNG to be somehow correlated with what you are trying to measure.
If practical purposes include simulating physical processes, then the problems with PRNGs become quite important.
All models are wrong, some are useful.
Yes, there are theoretical issues with assuming PRNGs are truly random. However, there are also theoretical issues with assuming that Newton's law of universal gravitation is true.
I am confident in saying that more experiments have gone wrong due to not considering relativity than have gone wrong due to the proper usage of a statistically sound (even if not cryptographically so) PRNG.
I also suspect that both classes of errors are dwarfed by errors from improper usage, or from PRNGs that are not statistically sound.
Those are qualitatively different issues. Newtonian gravity is very precise in a certain regime, according to centuries of experiments. The issues with specific PRNGs are in many cases not even carefully analyzed.
The observation that it is impossible to enumerate 300 bits of assignments with a 256-bit random generator is a very clever one. Just as almost all assignments are balanced (300 vs. ~295 bits), I wonder if there's a similar equipartition-property argument that 256 bits will generate the vast majority of the outcomes, but my Shannon-fu is too weak to tell.
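The bit counts are easy to check in plain Python, assuming N = 300 subjects and a balanced 150/150 split:

    # Rough check of the bit counts above: how many bits are needed to index
    # every possible assignment of 300 subjects into two groups?
    import math

    all_assignments = 2 ** 300          # any 0/1 labelling of 300 subjects
    balanced = math.comb(300, 150)      # balanced 150/150 labellings only
    print(math.log2(all_assignments))   # 300.0
    print(math.log2(balanced))          # ~295.6, the "295 bits"
    print(math.log2(balanced) > 256)    # True: a 256-bit seed can't reach them all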
Maybe that's the problem? With 256 bits we will only get the "typical" assignments and not the edge cases which are the ones that are important for randomisation tests?
"If N = 300, even a 256-bit seed arbitrarily precludes all but an unknown, haphazardly selected, non-random, and infinitesimally small fraction of permissible assignments. This introduces enormous bias into the assignment process and makes total nonsense of the p-value computed by a randomization test."
The first sentence is obviously true, but I'm going to need to see some evidence for "enormous bias" and "total nonsense". Let's leave aside lousy, small-state, or badly-seeded PRNGs. Are there any non-cryptographic examples in which a well-designed PRNG with 256 bits of well-seeded random state produces results different enough from a TRNG to be visible to a user?
The argument against PRNGs this paper makes isn't that the PRNG produces results that can be distinguished from a TRNG's, but that the 256-bit seed deterministically chooses a single shuffling. If you need 300 bits to truly shuffle the assignment but you only have 256 bits, then there are a lot of potential assignments that can never actually happen. With this argument it doesn't matter what the PRNG is; the fact that it's deterministic is all that matters. And this invalidates the p-value, because the p-value assumes that all possible assignments are equiprobable, when in fact a lot of possible assignments have a probability of zero.
I imagine you could change the p-value test to randomly sample assignments generated via the exact same process that was used to generate the experiment's actual assignment; as you run more and more iterations of this, the calculated p-value should converge to the correct value. But then the question becomes: is the p-value calculated this way the same as the one you'd get if you had actually used equiprobable assignment to begin with?
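A minimal sketch of that resampling idea (not the article's procedure): the outcome data are made up, Python's Mersenne Twister stands in for whatever 256-bit-seeded PRNG did the original assignment, and the assign/mean_diff helpers are purely illustrative.

    # Randomization test where the null re-assignments are drawn by the *same*
    # seeded-PRNG procedure that produced the experiment's actual assignment.
    import random
    import statistics

    def assign(seed, n):
        """Balanced 0/1 assignment of n subjects from a 256-bit-seeded PRNG."""
        rng = random.Random(seed)
        labels = [0] * (n // 2) + [1] * (n - n // 2)
        rng.shuffle(labels)
        return labels

    def mean_diff(outcomes, labels):
        treated = [y for y, g in zip(outcomes, labels) if g == 1]
        control = [y for y, g in zip(outcomes, labels) if g == 0]
        return statistics.mean(treated) - statistics.mean(control)

    n = 300
    outcomes = [random.gauss(0, 1) for _ in range(n)]   # placeholder data
    observed = mean_diff(outcomes, assign(random.getrandbits(256), n))

    # Re-run the identical assignment procedure with fresh 256-bit seeds.
    reps = 10_000
    hits = sum(
        abs(mean_diff(outcomes, assign(random.getrandbits(256), n))) >= abs(observed)
        for _ in range(reps)
    )
    print((hits + 1) / (reps + 1))   # Monte Carlo p-value estimate

Whether the number this converges to matches the value from a test over all equiprobable assignments is exactly the question raised above.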
Ultimately, this all comes down to the fact that it's not hard to use true randomness for the whole thing, and true randomness produces statistically valid results. If you use true randomness for assignment then you can't screw up the p-value test, so there's no reason at all to even consider how to safely use a PRNG here; all that does is open the door to messing up.
If you have 300 bits of shuffling entropy, you have a lot of potential assignments that can never happen because you won't test them before the universe runs out. No matter how you pick them.
Of course a PRNG generates the same sequence every time with the same seed, but that's true of every RNG, even a TRNG where the "seed" is your current space and time coordinates. To get more results from the distribution you have to use more seeds. You can't just run an RNG once, get some value, and then declare the RNG is biased towards the value you got. That's not a useful definition of bias.
The number of possible assignments has to be effectively close to an integer multiple of the number of shuffles.
It doesn't matter how many universes it would take to generate all of them; some assignments are less likely than others.
And why does it matter in the context of randomly assigning participants in an experiment into groups? It is not plausible that any theoretical "gaps" in the pseudorandomness are related to the effect you are trying to measure, and unlikely that there is a "pattern" created in how the participants get assigned. You just do one assignment. You do not need to pick a true random configuration, just one random enough.
As far as p-values are concerned, the issue raised could very well be measured with simulations and permutations. I really doubt, though, that the distribution of p-values from pseudorandom assignments with gaps would fail to converge very fast to the "real" distribution you would get from all permutations, due to some version of a law of large numbers. A lot of resampling/permutation techniques work by permuting only a negligible fraction of all possible permutations, and the distribution of the statistics extracted converges pretty fast. As long as the way the gaps are formed is independent of the effects measured, it sounds implausible that the p-values one gets are problematic because of them.
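One way to see the convergence point: the Monte Carlo p-value from B sampled permutations is just a binomial proportion, so its uncertainty depends only on B, not on what (tiny) fraction of the ~2^300 possible assignments those B draws cover. A back-of-the-envelope sketch with illustrative numbers:

    # Standard error of a Monte Carlo permutation p-value as a function of the
    # number of sampled permutations B (illustrative true p-value of 0.05).
    import math

    p = 0.05
    for B in (1_000, 10_000, 100_000):
        se = math.sqrt(p * (1 - p) / B)
        print(f"B={B:>7}: p ~ {p} +/- {1.96 * se:.4f} (95% interval)")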
P-values assume something weaker than "all assignments are equiprobable." If the subset of possible assignments is nice in the right ways (which any good PRNG will provide) then the resulting value will be approximately the same.
> then you can't screw up the p-value test
Bakker and Wicherts (2011) would like to disagree! Apparently 15 % screw up the calculation of the p-value.
So here's how I would think about it intuitively:
We can create a balanced partitioning of the 300 turkeys with a 300-bit random number having an equal number of 1's and 0's.
Now suppose I randomly pick a 300-bit number, still with an equal number of 0's and 1's, but this time the first 20 bits are always 0's and the last 20 bits are always 1's. In this scenario, only the middle 260 bits (turkeys) are randomly assigned, and the remaining 40 are deterministic.
We can quibble over what constitutes an "enormous" bias, but the scenario above feels like an inadequate experiment design to me.
As it happens, log2(260 choose 130) ~= 256.
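Which is easy to confirm:

    import math
    print(math.log2(math.comb(260, 130)))   # ~255.7, i.e. roughly 256 bits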
> Are there any non-cryptographic examples in which a well-designed PRNG with 256 bits of well-seeded random state produces results different enough from a TRNG to be visible to a user?
One example that comes to mind is shuffling a deck of playing cards. You need approximately 225 bits of entropy to ensure that every possible 52-card ordering can be represented. Suppose you wanted to simulate a game of blackjack with more than one deck, or some other card game with 58 or more cards: 256 bits is not enough there.
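Those numbers are easy to check; the deck-size threshold is the smallest n with log2(n!) > 256:

    # Bits needed to index every ordering of an n-card deck is log2(n!).
    import math

    print(math.log2(math.factorial(52)))   # ~225.6 bits for a standard deck

    n = 1
    while math.log2(math.factorial(n)) <= 256:
        n += 1
    print(n)   # 58: the first deck size with more orderings than 2^256 seeds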
It's an interesting observation and that's a nice example you provided but does it actually matter? Just because certain sequences can't occur doesn't necessarily mean the bias has any practical impact. It's bias in the theoretical sense but not, I would argue, in the practical sense that is actually relevant. At least it seems to me at first glance, but I would be interested to learn more if anyone thinks otherwise.
For example. Suppose I have 2^128 unique playing cards. I randomly select 2^64 of them and place them in a deck. Someone proceeds to draw 2^8 cards from that deck, replacing and reshuffling between each draw. Does it really matter that those draws weren't technically independent with respect to the larger set? In a sense they are independent so long as you view what happened as a single instance of a procedure that has multiple phases as opposed to multiple independent instances. And in practice with a state space so much larger than the sample set the theoretical aspect simply doesn't matter one way or the other.
We can take this even farther. Don't replace and reshuffle after each card is drawn. Since we are only drawing 2^8 of 2^64 total cards this lack of independence won't actually matter in practice. You would need to replicate the experiment a truly absurd number of times in order to notice the issue.
By the definition of a cryptographically secure PRNG, no. They, with overwhelming probability, produce results indistinguishable from truly random numbers no matter what procedure you use to tell them apart.
MCMC can be difficult for this reason. There are concepts like "k-dimensional equidistribution" etc. etc... where in some ways the requirements on a PRNG are far, far higher than for a cryptographically sound PRNG, but also in many ways less so, because you don't care about adversarial issues, and would prefer speed.
If you can't generate all possible assignments, you care about second and third order properties etc. of the sequence.
Does there exist a single MCMC example that performs poorer when fed by a CSPRNG (with any fixed seed, including all zeroes; no state reuse within the simulation) as opposed to any other RNG source?
> There are concepts like "k-dimensional equidistribution" etc. etc... where in some ways the requirements of a PRNG are far, far, higher than a cryptographically sound PRNG
Huh? If you can chew through however many gigabytes of the supposed CSPRNG’s output, do some statistics, and with a non-negligible probability tell if the bytes in fact came from the CSPRNG in question or an actual iid random source, then you’ve got a distinguisher and the CSPRNG is broken.
It all comes down to actual specific statistical tests, and how hard they are to break in specific applications.
No CSPRNG is absolutely perfect; no CSPRNG has ever absolutely passed every statistical test thrown at it.
MCMC stresses very different statistical tests than the typical CSPRNG test suites do.
Every PRNG is absolutely broken if you want to be absolute about it. MCMC and crypto applications push on different aspects where statistical issues will cause application level failures.
See e.g. this paper https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf
(it's not the end all be all, but it's a good survey of why this stuff matters and why it's different)
> no CSPRNG has ever absolutely passed every statistical test thrown at it
As far as I know (admittedly not a high standard), there is no published statistical test that you could run on, for example, a single AES-256-CTR bitstream set up with a random key and IV, running on a single computer, that would be able to tell you with a meaningful likelihood ratio that you were looking at a pseudorandom rather than truly random input before the computer in question broke down. (I’m assuming related-key attacks are out of scope if we’re talking about an RNG for simulation purposes.)
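For concreteness, here is a hedged sketch of the kind of generator being described: an AES-256-CTR keystream used as a deterministic simulation RNG. It assumes the third-party `cryptography` package, and the key/nonce handling and float conversion are purely illustrative, not anyone's recommended setup.

    # AES-256-CTR keystream as a deterministic bitstream for simulation.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(32)     # 256-bit key (the "seed")
    nonce = os.urandom(16)   # CTR nonce (the "IV")
    stream = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()

    def uniform01():
        """One double in [0, 1) from 64 bits of keystream."""
        block = stream.update(b"\x00" * 8)   # encrypting zeros yields raw keystream
        return (int.from_bytes(block, "little") >> 11) / (1 << 53)

    print([round(uniform01(), 4) for _ in range(5)])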
I am aware of tests vs. ChaCha20 at https://www.pcg-random.org/index.html but I am not aware of tests vs. AES-256-CTR.
However at some point, 100x faster performance w/o an exploitable attack vector is also relevant! (though sometimes people find ways).
CSPRNGs are mostly worried about very specific attack vectors, and sure, they're likely to be completely unpredictable. But other applications care more about other properties, like lack of k-dimensional equidistribution, and that hurts them far more.
The idea that CSPRNGs are the be-all and end-all of RNGs holds CS back.
Are they claiming that ChaCha20 deviates measurably from equally distributed in k dimensions in tests, or just that it hasn't been proven to be equally distributed? I can't find any reference for the former, and I'd find that surprising. The latter is not surprising or meaningful, since the same structure that makes cryptanalysis difficult also makes that hard to prove or disprove.
For emphasis, an empirically measurable deviation from k-equidistribution would be a cryptographic weakness (since it means that knowing some members of the k-tuple helps you guess the others). So that would be a strong claim requiring specific support.
By O’Neill’s definition (§2.5.3 in the report) it’s definitely not equidistributed in higher dimensions (does not eventually go through every possible k-tuple for large but still reasonable k) simply because its state is too small for that. Yet this seems completely irrelevant, because you’d need utterly impossible amounts of compute to actually reject the hypothesis that a black-box bitstream generator (that is actually ChaCha20) has this property. (Or I assume you would, as such a test would be an immediate high-profile cryptography paper.)
Contrary to GP’s statement, I can’t find any claims of an actual test anywhere in the PCG materials, just “k-dimensional equidistribution: no”, which I’m guessing means what I’ve just said. This is, at worst, correct but a bit terse and very slightly misleading on O’Neill’s part; how GP could derive any practical consequences from it, however, I haven’t been able to understand.
Cryptographic operations when done correctly result in full chaos within the discrete domain (the so-called avalanche effect). Any bias of any kind gives rise to a distinguisher and the primitive is regarded as broken.
One way to imagine what symmetric cryptography does is a cellular automaton that is completely shuffled every iteration. In the case of Keccak/SHA3, that is almost exactly what happens too.
There has never been a perfect CSPRNG.
> There has never been a perfect CSPRNG.
What is a perfect CSPRNG?
"before the computer in question broke down."
A good MCMC simulation might test that! E.g., say, training a large diffusion model. It takes far more compute time than the mean time to failure of a single computer.
Also, the standards of those tests are different from the question of whether the generator biases the statistical model fitted with MCMC.
It reminded me of an experiment where each subject was presented with a pseudorandomised sequence of trials, except that, unknown to the researchers, every time the experiment was run the same (default) seed was used, which resulted in all subjects being presented with the same "random" sequence of trials.
I read this and came away a bit sheepish, not really grasping the significance of the extreme focus on PRNGs and entropy for basic things. Glad to see the rest of the comments agreeing. "What every experimenter must know"...
Starts interesting, then veers into the usual "true random number" bullshit. Use radioactive decay as a source of your random numbers!
How do we know it's truly random?
We don't. We know with great certainty it's either random or needs non-local variables, which would let information travel faster than c, among other things. (This is a consistent result of the Bell test.)
Most prefer to believe in randomness.
The only known explanation of what's going on in quantum mechanics is a multiversal one^[1]. Using radioactive decay of an atom as an example: there are an uncountably infinite number of universes that are initially "fungible" (identical in every way), and over time the universes gradually differentiate themselves with the atom going from a non-decayed to decayed state, at different times in each universe. But you will be in all of those universes. So if you thought the atom would decay in, let's say 5 seconds, there would be some universes where you were right and some where you were wrong. That makes it impossible to ever make reliable specific predictions about when the atom will decay. So, in practice that just looks like perfect randomness.
^[1]: There are other interpretations, of course. And those other interpretations are equally explanatory. But they do not claim to be explanations of what is actually happening to unobserved quantum particles. There is also Bohmian mechanics, but I don't know how many people take it seriously.
> usual "true random number" bullshit
What's bullshit about it? This is how TRNGs in security enclaves work. They collect entropy from the environment, and use that to continuously reseed a PRNG, which generates bits.
If you're talking "true" in the philosophical sense, that doesn't exist -- the whole concept of randomness relies on an oracle.
What PRNGs lack compared to TRNGs is security (i.e. preventing someone from being able to use past values to predict future values). It's not that they somehow produce statistically invalid results (e.g. they generate 3s more often than 2s or something). Unless they're very poorly constructed.
Maybe people have bad memories from linear congruential generators; these could go really bad (https://en.wikipedia.org/wiki/Marsaglia%27s_theorem)
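For a concrete example of "really bad": the classic RANDU generator satisfies an exact three-term linear recurrence, which is why its consecutive triples famously fall on only 15 planes in the unit cube. A tiny demonstration (plain Python, RANDU's historical constants):

    # RANDU: x_{n+1} = 65539 * x_n mod 2^31. Because 65539^2 = 6*65539 - 9 (mod 2^31),
    # every output satisfies x[n+2] = 6*x[n+1] - 9*x[n] exactly.
    M = 2 ** 31

    def randu(seed, count):
        x = seed
        for _ in range(count):
            x = (65539 * x) % M
            yield x

    xs = list(randu(seed=1, count=1000))
    assert all((xs[i + 2] - 6 * xs[i + 1] + 9 * xs[i]) % M == 0
               for i in range(len(xs) - 2))
    print("every RANDU triple satisfies x[n+2] = 6*x[n+1] - 9*x[n] (mod 2^31)")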
I don't think hardware random number generators are bullshit, but it's easy to overstate their importance. Outside of cryptography, there aren't a whole lot of cases that truly require that much care in how random numbers are generated. For the kind of examples the article opens with (web page A/B testing, clinical trials, etc.) you'll never have sample sizes large enough to justify worrying about the difference between a half-decent PRNG and a "true" random number generator.