Problem One: An Off-By-One Error
Astute programmers will have noticed that the algorithm in question contains an off-by-one error. The algorithm is supposed to traverse the initial deck while swapping each card with any other card. Unlike most Pascal functions, the function Random(n) actually returns a number between 0 and n-1 instead of a number between 1 and n. The algorithm uses the following snippet of code to choose which card to swap with the current card: <random_number := random(51)+1;>. The formula sets random_number to a value between 1 and 51. In short, the algorithm in question never chooses to swap the current card with the last card. When ctr finally reaches the last card, 52, that card is swapped with any other card except itself. That means this shuffling algorithm never allows the 52nd card to end up in the 52nd place. This is an obvious, but easily correctable, violation of fairness.
Problem Two: Bad Distribution Of ShufflesA closer examination of the shuffling algorithm reveals that, regardless of the off-by-one problem, it doesn't return an even distribution of decks. The basic algorithm at the heart of the shuffle is shown in Figure 2.
A closer examination of the algorithm reveals that, regardless of the off-by-one error, it doesn't return an even distribution of shuffles. That is, some shuffles are more likely to be produced than others are. This uneven distribution can be leveraged into an advantage if a tipped-off player is willing to sit at the table long enough.
To illustrate this problem using a small example, we'll shuffle a deck consisting of only three cards (i.e, n=3) using the algorithm described above.
Figure 2: How not to shuffle cards
for (i is 1 to n) Swap i with random position between 1 and n
Figure 2 contains the algorithm we used to shuffle our deck of three cards, and also depicts the tree of all possible decks using this shuffling algorithm. If our random number source is a good one, then each leaf on the tree in Figure 2 has an equal probability of being produced.
Given even this small example, you can see that the algorithm does not produce shuffles with equal probability. It will produce the decks 231, 213, and 132 more often than the decks 312, 321, 123. If you were betting on the first card and you knew about these probabilities, you would know that card 2 is more likely to appear than any other card. The uneven probabilities become increasingly exaggerated as the number of cards in the deck increase. When a full deck of 52 cards is shuffled using the algorithm listed above (n=52), the unequal distribution of decks skews the probabilities of certain hands and changes the betting odds. Experienced poker players (who play the odds as a normal course of business) can take advantage of the skewed probabilities.
Figure 3: How to shuffle cards
for (i is 1 to 3) Swap i with random position between i and 3
Figure 3 provides a much better shuffling algorithm. The crucial difference between the two algorithms is that number of possible swap positions decreases as you progress through the deck. Once again, we show a tree illustrating this algorithm on our sample deck of three cards. The change between this new algorithm and the one used by ASF is that each card i is swapped with a card from the range [i, n], not [1, n]. This reduces the number of leaves from the 3^3 = 27 given by the bad algorithm listed above to 3! = 6. The change is important because the n! number of unique leaves means that the new shuffling algorithm generates each possible deck only once. Notice that each possible shuffle is produced once and only once so that each deck has an equal probability of occurring. Now that's fair!
The first set of software flaws we discussed merely changes the probabilities that certain cards will come up. The associated skews can be used by a clever gambler to gain an edge, but the flaws really don't constitute a complete break in the system. By contrast, the third flaw, which we explain in this section, is a doozy that allows online poker to be completely compromised. A short tutorial on pseudo-random number generators sets the stage for the rest of our story.
Suppose we want to generate a random number between 1 and 52, where every number has an equal probability of appearing. Ideally, we would generate a value on the range from 0 to 1 where every value will occur with equal probability, regardless of the previous value, then multiply that value by 52. Note that there are an infinite number of values between 0 and 1. Also note that computers do not offer infinite precision!
In order to program a computer to do something like the algorithm presented above, a pseudo-random number generator typically produces an integer on the range from 0 to N and returns that number divided by N. The resulting number is always between 0 and 1. Subsequent calls to the generator take the integer result from the first run and pass it through a function to produce a new integer between 0 and N, then return the new integer divided by N. This means the number of unique values returned by any pseudo-random number generator is limited by number of integers between 0 and N. In most common random number generators, N is 2^32 (approximately 4 billion) which is the largest value that will fit into a 32-bit number. Put another way, there are at most 4 billion possible values produced by this sort of number generator. To tip our hand a bit, this 4 billion number is not all that large.
A number known as the seed is provided to a pseudo-random generator as an initial integer to pass through the function. The seed is used to get the ball rolling. Notice that there is nothing unpredictable about the output of a pseudo-random generator. Each value returned by a pseudo-random number generator is completely determined by the previous value it returned (and ultimately, the seed that started it all). If we know the integer used to compute any one value then we know every subsequent value returned from the generator.
The pseudo-random number generator distributed with Borland compilers makes a good example and is reproduced in Figure 4. If we know that the current value of RandSeed is 12345, then the next integer produced will be 1655067934 and the value returned will be 20. The same thing happens every time (which should not be surprising to anyone since computers are completely deterministic).