=========================================================================

ANALYSIS OF WESTERN CANADA RETAILER LOTTERY WINS

[For the CBC National News, January 2009.]

by Jeffrey S. Rosenthal

(Dr. Rosenthal is a professor in the Department of Statistics at the
University of Toronto, and the author of "Struck by Lightning: The Curious
World of Probabilities". His web site is probability.ca.)

*** INTRODUCTION AND SUMMARY:

We consider major Western Canada lottery wins, to see whether or not
lottery retailers have won more prizes than could reasonably have arisen
by pure chance alone.

We consider three different groups of prizes:

(I) $10,000+ prizes during the period Nov 1 2003 - Oct 31 2006;
(II) $10,000+ prizes during the period June 1 2007 - Sept 30 2008;
(III) $1000+ prizes during the period June 1 2007 - Sept 30 2008.

Using the fairest available assumptions, we conclude:

- The 67 retailer wins in group (I) are significantly too many to have
  arisen by pure chance alone, and this conclusion is quite strong and
  quite robust to changes in assumptions.

- The 30 retailer wins in group (II) are again somewhat too high to have
  arisen by pure chance alone; in this case the statistical evidence is
  clear although not as overwhelming or robust as for group (I).

- The 265 retailer wins of group (III) are again too high to have arisen
  by pure chance alone. The statistical evidence for this is again clear,
  in fact slightly stronger than for group (II), but again not as
  overwhelming or robust as for group (I).

- The conclusions for individual provinces and territories indicate that
  for group (I), the numbers of retailer wins for AB, SK, and MB are all
  too high; for group (II), MB is too high; and for group (III), MB and NN
  are too high. In fact, for group (II), once MB is eliminated, the number
  of retailer wins in the other four regions combined is not excessive.

These conclusions provide convincing statistical evidence that the
retailers are winning more major lottery prizes than can be reasonably
explained by pure chance alone.

*** KNOWN DATA (see Source list at the end):

Total regional adult population = 4,314,000. [S1, pp. 64, 74]

Total regional lottery-selling outlets = 4058. [S2]

Regional number of total major wins (subtracting off those of winners
residing outside the WCLC region), and retailer major wins:

(I) $10,000+, Nov 1 2003 - Oct 31 2006: 1610-24=1586, 67 [S1, p. 74]

(II) $10,000+, June 1 2007 - Sept 30 2008: 794, 30
    = sum of: $10,000+, April 1 2008 - Sept 30 2008: 325, 11 [S2]
        plus: $10,000+, June 1 2007 - March 31 2008: 469, 19 [S2]

(III) $1000+, June 1 2007 - Sept 30 2008: 9388, 265
    = sum of: $1000+, April 1 2008 - Sept 30 2008: 3940-48=3892, 96 [S2]
        plus: $1000+, June 1 2007 - March 31 2008: 5527-31=5496, 169 [S2]

*** AVERAGE NUMBER OF RETAILERS PER LOTTERY-SELLING OUTLET:

This figure is somewhat less certain. However, the WCLC in [S2]
determined the figure 9391 / 780 = 12.04, based on a detailed Saskatchewan
pilot project. This figure appears to be carefully determined. It is
somewhat larger than certain other estimates, which makes it more fair to
the retailers. So, we use the figure 12.04 in all our calculations.

(Note: here and throughout, "retailers" means ALL lottery-selling retail
store owners and employees.)

*** GAMING INTENSITY RATIO:

This is the ratio of the average lottery spending by retailers compared to
that of the general adult population (including non-participants).

This figure is also somewhat uncertain. A fifth estate survey (September
2006) estimated this ratio to be 1.5, and a detailed Corporate Research
Associates study (commissioned by the ALC in November 2006) found a very
similar figure of 1.52. Meanwhile, a Research Dimensions document
commissioned by the OLG in October 2006 reported a somewhat larger ratio,
1.9, and that figure was then cited and used in [S1].
In our calculations, we consider both the 1.52 figure (which appears to be
the most carefully determined), and also the 1.9 figure (since it is more
fair to the retailers, and originated with the OLG, and was used in [S1]).

*** STATISTICAL METHODOLOGY USED:

Our method proceeds by first computing:

    Expected number of retailer wins =
          (total number of wins)
        * (total number of retail outlets)
        * (average number of employees per outlet)
        * (gaming intensity ratio)
        / (total adult population)

Then, the factor by which the actual number of retailer wins exceeds the
expected number is given by:

    factor = actual / expected.

More importantly, the probability of observing the actual number or more
of retailer wins, given the expected number, is found from the right-hand
tail of a Poisson distribution, computed using the "R" command:

    ppois(actual-1, expected, lower.tail=FALSE)

If this probability is very small (i.e. less than 0.05, or even better
less than 0.01), then that provides convincing statistical evidence
against the hypothesis that the actual number of retailer wins occurred by
pure chance alone.

We also consider the "robustness" of our result, i.e. how much larger the
expected number could be, while still leaving the above probability less
than 0.01. The more robust the result, the more certain that the
conclusion cannot be "explained away" by incorrect assumptions or other
influences.

*** RESULTS (I) -- $10,000+ WINNERS, Nov 1 2003 - Oct 31 2006:

-- Using the gaming intensity ratio of 1.52:

Expected number = 1586 * 4058 * 12.04 * 1.52 / 4314000 = 27.30

Factor by which actual exceeds expected: 67 / 27.30 = 2.45

Probability of observing 67 or more:
    ppois(66, 27.30, lower.tail=FALSE) = 1.06 x 10^(-10)
or about one chance in ten billion.

-- Using the gaming intensity ratio of 1.9:

Expected number = 1586 * 4058 * 12.04 * 1.9 / 4314000 = 34.13

Factor by which actual exceeds expected: 67 / 34.13 = 1.96

Probability of observing 67 or more:
    ppois(66, 34.13, lower.tail=FALSE) = 4.29 x 10^(-7)
which is less than one chance in 2.3 million.

Robustness: even with the higher gaming intensity ratio (1.9), and even if
the expected number of such retailer wins were actually 44% higher than we
computed, i.e. if it changed from 34.13 to 34.13 * 1.44 = 49.15, the
probability of observing 67 or more would still be less than 1%.

Conclusion: the 67 retailer wins of $10,000+ during this period are
significantly too many to have arisen by pure chance alone.

*** RESULTS (II) -- $10,000+ WINNERS, June 1 2007 - Sept 30 2008:

-- Using the gaming intensity ratio of 1.52:

Expected number = 794 * 4058 * 12.04 * 1.52 / 4314000 = 13.67

Factor by which actual exceeds expected: 30 / 13.67 = 2.19

Probability of observing 30 or more:
    ppois(29, 13.67, lower.tail=FALSE) = 9.1 x 10^(-5)
which is about one chance in 11,000.

-- Using the gaming intensity ratio of 1.9:

Expected number = 794 * 4058 * 12.04 * 1.9 / 4314000 = 17.09

Factor by which actual exceeds expected: 30 / 17.09 = 1.76

Probability of observing 30 or more:
    ppois(29, 17.09, lower.tail=FALSE) = 0.00294
which is about one chance in 340.

Robustness: even with the higher gaming intensity ratio (1.9), and even if
the expected number of such retailer wins were actually 9% higher than we
computed, i.e. if it changed from 17.09 to 17.09 * 1.09 = 18.62, the
probability of observing 30 or more would still be less than 1%.

Conclusion: the 30 retailer wins are again too high to have arisen by pure
chance alone, and the statistical evidence is clear although not as
overwhelming or robust as for the longer time period of group (I).
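
The calculations for groups (I) and (II) above can be reproduced with a few
lines of R. What follows is only a minimal sketch: the function names
retailer_test and robust_margin, and their argument names, are illustrative
and do not appear in the original analysis. The robustness margin is found
here with R's uniroot root-finder, which is one possible approach and not
necessarily how the figures of 44% and 9% were originally obtained.

```r
# Illustrative sketch only: names are hypothetical, not from the report.
# Defaults are the regional figures quoted under KNOWN DATA.
retailer_test <- function(total_wins, actual, ratio,
                          pop = 4314000, outlets = 4058, per_outlet = 12.04) {
  expected <- total_wins * outlets * per_outlet * ratio / pop
  c(expected    = expected,
    factor      = actual / expected,
    # P(X >= actual) for X ~ Poisson(expected)
    probability = ppois(actual - 1, expected, lower.tail = FALSE))
}

# Largest multiplier m for which P(X >= actual | mean = m * expected) < alpha.
# Assumes the unadjusted probability is already below alpha.
robust_margin <- function(actual, expected, alpha = 0.01) {
  gap <- function(m) ppois(actual - 1, m * expected, lower.tail = FALSE) - alpha
  uniroot(gap, lower = 1, upper = actual / expected, tol = 1e-9)$root
}

g1 <- retailer_test(1586, 67, ratio = 1.9)   # group (I)
g2 <- retailer_test(794, 30, ratio = 1.9)    # group (II)
print(signif(rbind(group_I = g1, group_II = g2), 4))

# Multipliers by which "expected" could grow before the 1% threshold is crossed
m1 <- robust_margin(67, g1[["expected"]])
m2 <- robust_margin(30, g2[["expected"]])
print(c(group_I = m1, group_II = m2))
```

Passing ratio = 1.52 instead of 1.9 gives the other set of figures. Because
the code uses unrounded expected values, its output may differ from the
quoted numbers in the last digit.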

*** RESULTS (III) -- $1000+ WINNERS, June 1 2007 - Sept 30 2008:

-- Using the gaming intensity ratio of 1.52:

Expected number = 9388 * 4058 * 12.04 * 1.52 / 4314000 = 161.6

Factor by which actual exceeds expected: 265 / 161.6 = 1.64

Probability of observing 265 or more:
    ppois(264, 161.6, lower.tail=FALSE) = 5.95 x 10^(-14)
or less than one chance in 16 trillion.

-- Using the gaming intensity ratio of 1.9:

Expected number = 9388 * 4058 * 12.04 * 1.9 / 4314000 = 202.0

Factor by which actual exceeds expected: 265 / 202.0 = 1.31

Probability of observing 265 or more:
    ppois(264, 202.0, lower.tail=FALSE) = 1.294 x 10^(-5)
or less than one chance in 77,000.

Robustness: even with the higher gaming intensity ratio (1.9), and even if
the expected number of such retailer wins were actually 13% higher than we
computed, i.e. if it changed from 202.0 to 202.0 * 1.13 = 228.28, the
probability of observing 265 or more would still be less than 1%.

Conclusion: the 265 retailer wins are again too high to have arisen by
pure chance alone. The statistical evidence for this is again clear, in
fact slightly stronger than for the concurrent $10,000+ prizes, but again
not as overwhelming or robust as for the longer 2003-2006 time period.

*** PROVINCIAL/TERRITORIAL BREAKDOWN:

For completeness, we also consider each of the provinces and territories
separately.
In each case, we list the following values in order:

"name": the two-letter name of the province or territory, except "NN"
    means "Northwest Territories and Nunavut (combined)";
"pop": the total regional adult population (from [S1]);
"outlets": the number of regional retail outlets (from [S2]);
"total": the total number of lottery wins in the specified region for the
    specified group (from [S1], [S2]);
"actual": the actual number of those wins by retailers (from [S1], [S2]);
"expected": the expected number of retailer wins, computed as above;
"factor": the factor by which actual exceeds expected, computed as above;
"probability": the probability of observing "actual" or more wins by
    retailers by pure chance alone, computed as above.

PROV/TER RESULTS (I): $10,000+, Nov 1 2003 - Oct 31 2006: [data: S1, p. 74]

name; pop; outlets; total; actual; expected; factor; probability;

-- Using the gaming intensity ratio of 1.52:
AB; 2595000; 2392; 915; 33; 15.43530; 2.137957; 6.823999e-05;
SK; 749000; 780; 275; 15; 5.241017; 2.86204; 0.0003677701;
MB; 898000; 837; 358; 16; 6.106639; 2.620099; 0.0006120801;
NN; 48000; 20; 22; 0; 0.1677573; 0; 1;
YT; 24000; 29; 16; 3; 0.3538155; 8.478996; 0.0056751;

-- Using the gaming intensity ratio of 1.9:
AB; 2595000; 2392; 915; 33; 19.29412; 1.710366; 0.002803098;
SK; 749000; 780; 275; 15; 6.551271; 2.289632; 0.003174625;
MB; 898000; 837; 358; 16; 7.633298; 2.096079; 0.005413604;
NN; 48000; 20; 22; 0; 0.2096967; 0; 1;
YT; 24000; 29; 16; 3; 0.4422693; 6.783197; 0.01038687;

Conclusion: the numbers of retailer wins for AB, SK, and MB are all too
high, and that of YT is borderline, while NN is fine.
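
The group (I) table above can be regenerated row by row with vectorised R.
This is a sketch under the same assumptions as the main calculation (12.04
retailers per outlet in every region); the data frame prov and the function
prov_table are illustrative names, not part of the original analysis.

```r
# Illustrative sketch: rebuild the group (I) provincial table.
# Input columns are copied from the table above [S1, S2].
prov <- data.frame(
  name    = c("AB", "SK", "MB", "NN", "YT"),
  pop     = c(2595000, 749000, 898000, 48000, 24000),
  outlets = c(2392, 780, 837, 20, 29),
  total   = c(915, 275, 358, 22, 16),
  actual  = c(33, 15, 16, 0, 3)
)

prov_table <- function(d, ratio, per_outlet = 12.04) {
  d$expected    <- d$total * d$outlets * per_outlet * ratio / d$pop
  d$factor      <- d$actual / d$expected
  # P(X >= actual); for actual = 0 this is ppois(-1, ...) = 1, as in the NN row
  d$probability <- ppois(d$actual - 1, d$expected, lower.tail = FALSE)
  d
}

tab_152 <- prov_table(prov, ratio = 1.52)
tab_190 <- prov_table(prov, ratio = 1.9)
print(tab_152, row.names = FALSE)
print(tab_190, row.names = FALSE)
```

Replacing the total and actual columns with the figures for groups (II) or
(III) should give the corresponding tables in the same way.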

PROV/TER RESULTS (II): $10,000+, June 1 2007 - Sept 30 2008: [data: S2]

name; pop; outlets; total; actual; expected; factor; probability;

-- Using the gaming intensity ratio of 1.52:
AB; 2595000; 2392; 478; 14; 8.063466; 1.736226; 0.03609787;
SK; 749000; 780; 118; 2; 2.248873; 0.8893345; 0.6571851;
MB; 898000; 837; 173; 13; 2.950973; 4.405326; 1.363198e-05;
NN; 48000; 20; 17; 1; 0.1296307; 7.714224; 0.1215802;
YT; 24000; 29; 8; 0; 0.1769077; 0; 1;

-- Using the gaming intensity ratio of 1.9:
AB; 2595000; 2392; 478; 14; 10.07933; 1.388981; 0.1413882;
SK; 749000; 780; 118; 2; 2.811091; 0.7114676; 0.7708035;
MB; 898000; 837; 173; 13; 3.688717; 3.524261; 0.0001266384;
NN; 48000; 20; 17; 1; 0.1620383; 6.171379; 0.1495914;
YT; 24000; 29; 8; 0; 0.2211347; 0; 1;

Conclusion: the number of retailer wins for MB is too high, while all the
others are fine when considered individually.

For this group, since MB is the most problematic, we also consider the
numbers when we eliminate MB and study the other four regions combined:

    pop = 2595000 + 749000 + 48000 + 24000 = 3416000
    outlets = 2392 + 780 + 20 + 29 = 3221
    total = 478 + 118 + 17 + 8 = 621
    actual = 14 + 2 + 1 + 0 = 17
    expected = 621 * 3221 * 12.04 * 1.9 / 3416000 = 13.40
    factor = 17 / 13.40 = 1.27
    probability = ppois(16, 13.40, lower.tail=FALSE) = 0.1945538
or about 19.5%, not particularly small at all.

Conclusion: for group (II), once MB is eliminated, the number of retailer
wins for the other regions combined is *not* particularly excessive (at
least when using a gaming intensity ratio of 1.9). So, for this group, the
problem is largely confined to MB alone.
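
The pooled calculation for group (II) with MB removed can be sketched in R
as follows. The helper pooled_test and the data frame regions_II are
hypothetical names introduced for illustration only. As in the calculation
above, populations, outlets, and wins are simply summed over the regions
that are kept.

```r
# Illustrative sketch: pool the regions that remain after excluding some.
# Group (II) inputs are copied from the table above [S1, S2].
regions_II <- data.frame(
  name    = c("AB", "SK", "MB", "NN", "YT"),
  pop     = c(2595000, 749000, 898000, 48000, 24000),
  outlets = c(2392, 780, 837, 20, 29),
  total   = c(478, 118, 173, 17, 8),
  actual  = c(14, 2, 13, 1, 0)
)

pooled_test <- function(d, exclude, ratio = 1.9, per_outlet = 12.04) {
  keep     <- d[!(d$name %in% exclude), ]
  total    <- sum(keep$total)
  actual   <- sum(keep$actual)
  expected <- total * sum(keep$outlets) * per_outlet * ratio / sum(keep$pop)
  c(total       = total,
    actual      = actual,
    expected    = expected,
    factor      = actual / expected,
    probability = ppois(actual - 1, expected, lower.tail = FALSE))
}

no_mb <- pooled_test(regions_II, exclude = "MB")
print(signif(no_mb, 4))
```

The exclude argument accepts several region names at once, e.g.
exclude = c("MB", "NN"), so the same helper would cover any combination of
omitted regions.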

PROV/TER RESULTS (III): $1000+, June 1 2007 - Sept 30 2008: [data: S2]

name; pop; outlets; total; actual; expected; factor; probability;

-- Using the gaming intensity ratio of 1.52:
AB; 2595000; 2392; 5365; 133; 90.50313; 1.469562; 1.718556e-05;
SK; 749000; 780; 1578; 46; 30.07391; 1.529565; 0.004131552;
MB; 898000; 837; 2252; 74; 38.41383; 1.926390; 2.267545e-07;
NN; 48000; 20; 122; 11; 0.9302907; 11.82426; 4.835532e-09;
YT; 24000; 29; 71; 1; 1.570056; 0.6369199; 0.7919665;

-- Using the gaming intensity ratio of 1.9:
AB; 2595000; 2392; 5365; 133; 113.1289; 1.17565; 0.03689428;
SK; 749000; 780; 1578; 46; 37.59238; 1.223652; 0.1012544;
MB; 898000; 837; 2252; 74; 48.01728; 1.541112; 0.0003011532;
NN; 48000; 20; 122; 11; 1.162863; 9.45941; 4.555422e-08;
YT; 24000; 29; 71; 1; 1.96257; 0.5095359; 0.8595031;

Conclusion: the number of retailer wins for MB and NN appear to be too
high, and AB is borderline, while SK and YT are fine.

We again eliminate MB and study the other four regions combined:

    pop = 2595000 + 749000 + 48000 + 24000 = 3416000
    outlets = 2392 + 780 + 20 + 29 = 3221
    total = 5365 + 1578 + 122 + 71 = 7136
    actual = 133 + 46 + 11 + 1 = 191
    expected = 7136 * 3221 * 12.04 * 1.9 / 3416000 = 153.92
    factor = 191 / 153.92 = 1.24
    probability = ppois(190, 153.92, lower.tail=FALSE) = 0.00215
or about one chance in 465, still very small.

Or eliminate MB and NN, and study the other three regions combined:

    pop = 2595000 + 749000 + 24000 = 3368000
    outlets = 2392 + 780 + 29 = 3201
    total = 5365 + 1578 + 71 = 7014
    actual = 133 + 46 + 1 = 180
    expected = 7014 * 3201 * 12.04 * 1.9 / 3368000 = 152.50
    factor = 180 / 152.50 = 1.18
    probability = ppois(179, 152.50, lower.tail=FALSE) = 0.0162
or about 1.6%, small but not overwhelmingly small.

Conclusion: for group (III), even if MB is eliminated, the number of
retailer wins for the other four regions combined is still excessive.
However, if MB and NN are both eliminated, then the other three regions
combined are only borderline excessive. So, the problem for this group is
"mostly but not entirely" confined to MB and NN.

*** ADDITIONAL COMMENTS:

- There is huge variation in the number of retailers per outlet depending
  on the TYPE of outlet. For example, an OLG document in 2006 assumed 4
  retailers per Independent Convenience store, but 40 retailers per
  Supermarket. (By contrast, [S2] assumes for simplicity an equal 12
  employees per outlet of all different types, which is surely
  inaccurate.) If WCLC itemised the retailer major lottery wins by type
  of retail outlet, then our analysis could be made much more precise.

- The number of retail OWNERS is probably known precisely, or at least can
  be estimated much more accurately than the total number of retailers.
  So, if WCLC itemised the retailer lottery wins by owners versus
  employees, then our analysis could again be made much more precise.

- It is conceivable that retailers have a greater tendency to select those
  lottery games with higher probability of major prizes, and that this
  could partially "explain" their winning more major prizes than expected.
  However, there is virtually no evidence to support this theory. In any
  case, such effects would probably be rather small, and are probably more
  than balanced out by our using generous assumptions (e.g. the figures of
  12.04 and 1.9) and considering the robustness.

- It seems quite plausible that retailers actually won even more major
  prizes than reported, since lists of retailers were not kept, so
  retailer winners were required to self-report and some may have gone
  undetected.

- Although the study [S1] is generally well-conducted and very useful, it
  apparently misinterpreted the Research Dimensions gaming intensity ratio
  of 1.9 as representing the ratio of lottery spending by retailers
  compared to the SUBSET of the adult population that plays lotteries
  (which they take to be 75% of the total adult population), rather than
  to the ENTIRE adult population (which is what Research Dimensions
  actually reported). So, [S1] effectively used a gaming intensity ratio
  of 1.9 / 0.75 = 2.53, which was too large. They also used (for their
  "high end" comparison) a previous WCLC estimate of 15 retail employees
  per outlet, but we now know (from the later Saskatchewan study) that
  12.04 is more accurate.

- The statistical results for individual provinces are somewhat less
  reliable, since some assumptions (e.g. number of retailers per outlet)
  may differ for certain regions (especially the territories), and also
  the statistical significance is somewhat diminished when we go "hunting"
  for conclusions in many different ways. Still, the results provide an
  indication of where problems may lie. Also, while the smaller population
  in certain regions makes the "factor" (actual / expected) more variable,
  this is taken into account appropriately by the "probability"
  calculations, so a very small probability is still significant.

*** SOURCES CITED:

[S1] Ernst & Young report for WCLC, October 2007.

[S2] WCLC document of 2008-12-08.

=========================================================================