How Much to Lose With Bad Lottery Numbers


The Idea

I think most people have played a lottery such as 6aus49 or Eurojackpot (if you’re in Germany or elsewhere in Europe) at least once in their lives. Some even play regularly, although gambling will, statistically speaking, lose them money in the long run. Despite this, lotteries still entice people into accepting small losses for the chance of large gains.

In this article, I want to put numbers on some strategic recommendations for playing lotteries that are often cited but rarely quantified. There are at least two types. First, there are rules that promise to help select numbers which are more likely to win. For example, check out this article by tz.de where three such rules are explained. However, as far as I know, such rules don’t actually help, so they are not my focus.

The second type, which is also mentioned in the above article, is to avoid popular numbers. The goal is not to alter the probability of winning but to improve the expected profit if a win occurs, because a prize category is shared among fewer winners. Typically, it is suggested to stay away from numbers associated with dates or special patterns. For example, dates consist mainly of smaller numbers: 1 to 12 for the months and 1 to 31 for the days. In the following, I check to what extent this rule can impact the expected profit.

The Implementation

The data used here is from the regional Lotto provider Sachsenlotto. The site offers a data set covering lottery numbers from 1955 to 2017, which includes overall stakes and profit per prize category. On 4th May 2013, the official rules changed (additional numbers were dropped and new prize categories were introduced, e.g. 2 correct numbers + correct super number). For that reason, I focus on the recent period from 2014 to 2017, in which a total of 418 draws took place. The following table presents some summary statistics of the prizes per category.


Category                       Mean (€)    SD (€)      Missing obs.
6 correct with super number    9,020,644   7,166,456   2
6 correct                      1,075,450   2,470,764   9
5 correct with super number    11,627      5,039       0
5 correct                      3,719       1,320       0
4 correct with super number    200.3       56.78       0
4 correct                      44.09       10.77       0
3 correct with super number    21.33       4.48        0
3 correct                      10.56       1.7         0

To be clear, the main point here is to analyze the impact of the drawn lottery numbers on the expected profit within the 8 prize categories above. The lottery 6aus49 actually consists of 9 prize categories. However, I have omitted the category “2 correct with super number” because its prize is fixed at 5 €. First, the following variable is defined:

    \[CNT^{i,j} = \sum_{k=1}^{6} \mathbf{I}(n_k \geq i) \mathbf{I}(n_k \leq j),\]

where n_1,\dots, n_6 are the drawn lottery numbers, so CNT^{i,j} simply counts how many of the drawn numbers lie between i and j (inclusive). Note that \mathbf{I} denotes the indicator function. Two special cases of this, namely CNT^{1,12} and CNT^{1,31}, will be used to explain the prize P_c per category c using OLS as follows:

    \[ P_{c,t} = \alpha \, CNT^{i,j}_{t} + \beta \, STAKE_t + \gamma \, D_{SA,t} + \delta + \varepsilon_t \]

STAKE_t is the overall stake in draw t, and D_{SA,t} is a dummy indicating whether draw t took place on a Saturday.
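
The two definitions above can be sketched in Python. This is a minimal sketch and not the code used for this article: the helper names (`cnt`, `ols`, `solve_linear_system`), the eight synthetic draws and the "true" coefficients are all made up for illustration. The stake is expressed in millions of euros here, an assumption made only to keep the normal equations well conditioned. Because the synthetic prizes contain no noise, the fit should recover the coefficients that generated them.

```python
def cnt(numbers, i, j):
    """Count how many drawn numbers n satisfy i <= n <= j."""
    return sum(1 for n in numbers if i <= n <= j)


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs stay untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * target for r, target in zip(rows, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free example draws (NOT the Sachsenlotto data):
# (six drawn numbers, stake in million EUR, Saturday dummy).
draws = [
    ((1, 5, 9, 12, 30, 45), 20.0, 0),
    ((2, 14, 20, 33, 40, 49), 45.0, 1),
    ((13, 18, 22, 35, 41, 47), 22.0, 0),
    ((3, 7, 19, 28, 36, 44), 50.0, 1),
    ((4, 8, 11, 25, 39, 48), 25.0, 0),
    ((6, 10, 15, 27, 31, 42), 40.0, 1),
    ((16, 21, 24, 29, 37, 46), 48.0, 1),
    ((12, 17, 23, 32, 38, 43), 21.0, 0),
]

# Hypothetical "true" coefficients, used only to generate the prizes.
ALPHA, BETA, GAMMA, DELTA = -1500.0, -10.0, 2000.0, 16000.0

# Design matrix columns: CNT^{1,12}, STAKE, D_SA, intercept.
design = [[cnt(nums, 1, 12), stake, sat, 1.0] for nums, stake, sat in draws]
prizes = [ALPHA * c + BETA * s + GAMMA * d + DELTA * one
          for c, s, d, one in design]

alpha, beta, gamma, delta = ols(design, prizes)
print(f"alpha={alpha:.2f} beta={beta:.2f} gamma={gamma:.2f} delta={delta:.2f}")
```

Swapping `cnt(nums, 1, 12)` for `cnt(nums, 1, 31)` gives the second specification. A statistics package would additionally report the standard errors, t values and p-values shown in the results tables.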

The Results

To simplify things, I present only the results for the category “5 correct with super number”, as shown in the following two tables.

Numbers between 1 and 12:

              Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)   16233        1501         10.81     3.613e-24   ***
CNT^{1,12}    -1551        229.4        -6.763    4.623e-11   ***
STAKE         -9.449e-05   5.871e-05    -1.609    0.1083
D_{SA}        2328         1589         1.465     0.1436

Numbers between 1 and 31:

              Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)   19033        1803         10.55     3.198e-23   ***
CNT^{1,31}    -1043        213.6        -4.883    1.498e-06   ***
STAKE         -9.105e-05   6.018e-05    -1.513    0.131
D_{SA}        2046         1628         1.257     0.2095

Firstly, the results clearly indicate a highly significant effect of CNT^{i,j} on expected profit; the data support the claim that people frequently pick popular numbers. Secondly, and in line with the intuition behind the idea, the effect is stronger for CNT^{1,12} than for CNT^{1,31}: smaller numbers are more likely to represent a date and hence are chosen more often. Thirdly, the effect is relatively large. The average prize in this category is 11626.62 €. According to the regression model, the prize decreases by 1551.44 € for each additional number from 1 to 12 in the winning set, which translates to a 13.34 % difference relative to that average. Note that the variable STAKE (not surprisingly) has no statistically significant impact. If it did, then deciding which day to play on (Wednesday or Saturday) would make a difference because people play more on Saturdays. For a clear picture, see the following plot where a clear Wed/Sat pattern is observable.
