Abstract
The first significant digits of collected numbers follow Benford’s law. We analyze the first significant digit of the runs made by several famous batsmen from various countries in one-day cricket and, based on this, examine the characteristics of each batsman’s game plan. Duck-outs (scores of zero) are excluded for all batsmen.
1. Introduction
The first significant digit of collected numbers does not follow a uniform distribution, as one might expect. According to the astronomer and mathematician Simon Newcomb, “the first pages of tables of logarithms wear out much faster than the last ones”. The law that the first significant digit follows is
P[first significant digit = d] = log10(1 + 1/d), d = 1, 2, …, 9.
Dr. Frank Benford (1938), a physicist working for General Electric in the 1930s, independently studied the significant digits of naturally occurring numbers. He collected several data sets, such as the areas of rivers, American League baseball statistics, numbers appearing in Reader’s Digest, death rates, and atomic weights of elements, and found that the law now named after him fits these data sets well.
Interestingly, M. Nigrini (1996) applied Benford’s law to tax return data to detect fraudulent figures. Ley (1996) extended the applications of Benford’s law to stock market indexes. Theodore P. Hill (1995) provided a statistical derivation of Benford’s law.
We collected the one-day cricket runs made by famous one-day batsmen in world cricket and applied Benford’s law to the first significant digit, excluding duck-outs. We also test whether the digit in the units place follows a uniform distribution.
Section 2 explains Benford’s law in detail. Section 3 describes one-day cricket. Section 4 applies the significant-digit analysis to the runs made by several famous one-day batsmen, and Section 5 gives the conclusions.
The data sets are available from http://www.howstat.com.au/.
2. Benford’s law
The first significant digit is distributed over the set {1, 2, …, 9} as P[first significant digit = d] = log10((d+1)/d), d = 1, 2, …, 9. That is, in any collection of such numbers, the digit 1 occurs as the first significant digit about 30.1% of the time [log10(2) = 0.301], whereas the digit 9 occurs only about 4.6% of the time [log10(10/9) = 0.046].
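As a quick check of the percentages quoted above, the following minimal Python sketch evaluates the formula for each digit. The function name `benford_probabilities` is only an illustrative choice and does not come from any library.

```python
import math

def benford_probabilities():
    """Return {d: P[first significant digit = d]} for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_probabilities()
for d, p in probs.items():
    # Print each probability as a percentage with three decimals.
    print(f"{d}: {100 * p:.3f}%")
```

The nine probabilities sum to 1 because the product of the ratios (d+1)/d for d = 1, …, 9 is exactly 10.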
The first significant digit of the areas of 40 European countries, measured in square kilometers (P.M. Lee, 1989), follows Benford’s law. The following table gives the details.
Digit             | 1      | 2      | 3      | 4     | 5     | 6     | 7     | 8     | 9
True data (%)     | 25     | 17.5   | 15     | 15    | 7.5   | 5     | 5     | 2.5   | 7.5
Benford’s law (%) | 30.103 | 17.609 | 12.494 | 9.691 | 7.918 | 6.695 | 5.799 | 5.115 | 4.576
Table 2.1
The units in which the quantities are measured may vary; for example, square miles may be used instead of square kilometers. Because of its scale-invariance property, Benford’s law still applies to the rescaled data. Likewise, if the data set is converted from one base (say, base 10) to another (say, base 100), Benford’s law still applies; that is, Benford’s law is base invariant. For a detailed discussion of the scale- and base-invariance properties, see Theodore P. Hill (1995).
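The scale-invariance property can be illustrated with a small Python sketch. It does not use real country areas: as a stand-in, it assumes a synthetic data set (the first 1000 powers of 2, a sequence known to follow Benford’s law closely) and rescales it by an approximate square-kilometer to square-mile factor. The helper names and the constant `KM2_TO_MI2` are illustrative assumptions.

```python
import math

def first_digit(x):
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])

def digit_frequencies(values):
    """Relative frequency of each first significant digit 1..9."""
    counts = [0] * 10
    for v in values:
        counts[first_digit(v)] += 1
    return {d: counts[d] / len(values) for d in range(1, 10)}

KM2_TO_MI2 = 0.386102  # approximate square miles per square kilometer

# Synthetic stand-in data (assumption): powers of 2, not real areas.
areas_km2 = [2.0 ** k for k in range(1000)]
areas_mi2 = [a * KM2_TO_MI2 for a in areas_km2]

freq_km2 = digit_frequencies(areas_km2)
freq_mi2 = digit_frequencies(areas_mi2)
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(d, round(freq_km2[d], 3), round(freq_mi2[d], 3), round(benford, 3))
```

Both columns of frequencies stay close to the Benford probabilities, which is what scale invariance predicts when the unit of measurement changes.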
3. One-day Cricket
Cricket was born in England. Because of Great Britain’s rule over many countries in the 18th and 19th centuries, cricket spread to its colonies. Cricket was traditionally played over five days, a format called Test cricket. To increase viewers’ enthusiasm, a limited form of the game that is completed in a single day was introduced; it is called one-day cricket. Cricket is very popular in Australia, New Zealand, England, the Indian subcontinent, South Africa and the West Indies.
To learn more about this game, the URL http://encyclopedia.thefreedictionary.com/One-day%20cricket may be helpful.
4. Application
Runs made by several batsmen from various countries were collected. For each batsman, Table 4.1 gives the test statistic of the chi-square goodness-of-fit test of the first significant digit against Benford’s law, together with its p-value.
Player                 | Chi-square test for Benford’s law on first significant digit
                       | Test statistic | P-value
Michael G Bevan        | 14.90506693    | 0.061017873
Andrew Flower          | 12.41756102    | 0.133523039
Jacques H Kallis       | 7.746278365    | 0.458638789
Desmond L Haynes       | 6.475067741    | 0.594174459
Rahul S Dravid         | 6.446699507    | 0.597325635
Stephen R Waugh        | 6.223625986    | 0.622198182
Adam C Gilchrist       | 5.901606244    | 0.658252569
Ricky T Ponting        | 5.858942527    | 0.663028872
Inzamam-Ul-Haq         | 5.779427034    | 0.671923685
Saurav C Ganguly       | 4.257584281    | 0.833167603
Pinnaduwage A De Silva | 4.209530898    | 0.837741214
Mohammad Azharuddin    | 3.534059895    | 0.896530246
Mark E Waugh           | 2.993049155    | 0.934792991
Brian S Lara           | 2.885163313    | 0.941356018
Sachin R Tendulkar     | 2.285404993    | 0.97098821
Allan R Border         | 1.823578373    | 0.985950882
Herschelle H Gibbs     | 1.477624181    | 0.993073213
Sanath T Jayasurya     | 1.318275438    | 0.995330207
Table 4.1
Table 4.1 shows that, at the 5% significance level, the hypothesis that the first significant digit follows Benford’s law is not rejected for any batsman in the list, since every p-value exceeds 0.05.
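The p-values in Table 4.1 can be recovered from the test statistics with a short Python sketch. It assumes 8 degrees of freedom (nine digit classes minus one), which the text does not state explicitly but which is consistent with the tabulated values. For an even number of degrees of freedom the upper-tail probability of the chi-square distribution has a closed form, so only the standard library is needed. The function name `chi2_sf_even_df` is an illustrative choice.

```python
import math

def chi2_sf_even_df(x, df=8):
    """Upper-tail probability P[X > x] for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Test statistics taken from Table 4.1; df = 8 is an assumption.
p_bevan = chi2_sf_even_df(14.90506693)
p_jayasurya = chi2_sf_even_df(1.318275438)
print(f"Bevan: {p_bevan:.4f}, Jayasurya: {p_jayasurya:.4f}")
```

A batsman's fit to Benford’s law would be rejected at the 5% level only if the resulting p-value fell below 0.05.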
For Michael G Bevan, values higher than the test statistic 14.91 would be expected to occur about 6.1% of the time under Benford’s law, whereas for Sanath T Jayasurya, values higher than the test statistic 1.318 would be expected to occur about 99.5% of the time. These are the highest and the lowest test statistics in the list, respectively. The true data for these two players are given in Table 4.2.
Digit                     | 1      | 2      | 3      | 4      | 5      | 6     | 7     | 8     | 9
Bevan’s true data (%)     | 23.037 | 13.089 | 19.895 | 13.613 | 11.518 | 5.759 | 7.853 | 4.712 | 0.524
Jayasurya's true data (%) | 32.5   | 15.833 | 10.833 | 10.833 | 7.5    | 7.5   | 3.75  | 5.833 | 5.417
Benford’s law data (%)    | 30.103 | 17.609 | 12.494 | 9.691  | 7.918  | 6.695 | 5.799 | 5.115 | 4.576
Table 4.2
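The chi-square statistic for Bevan in Table 4.1 can be reproduced from his row in Table 4.2 with the following Python sketch. It rests on an assumption inferred from the numbers rather than stated in the text: the statistic is computed directly on the percentages, as the sum of (observed − expected)² / expected over the nine digits. A test on raw innings counts would scale this value by the number of innings divided by 100.

```python
import math

# Bevan's first-digit percentages, copied from Table 4.2.
bevan_percent = [23.037, 13.089, 19.895, 13.613, 11.518, 5.759, 7.853, 4.712, 0.524]

# Expected percentages under Benford's law for digits 1..9.
benford_percent = [100 * math.log10(1 + 1 / d) for d in range(1, 10)]

def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (o - e)^2 / e over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

stat = chi_square_statistic(bevan_percent, benford_percent)
print(round(stat, 3))
```

The largest contributions come from digit 3, which is over-represented, and digit 9, which is almost absent.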
The first significant digits of the one-day runs of Andrew Flower and Michael G Bevan deviate the most from Benford’s law, although not significantly. Both players scored well in most of their games, which reduced the share of 1’s among the first significant digits of their runs. This suggests that these two players tended to score close to their average in every match. Their p-values, 13.4% and 6.1% respectively, are the lowest in the list. Sanath T Jayasurya has the highest p-value, 99.5%. For Sachin R Tendulkar, the p-value is 97%; his more than 30 centuries, each of which has 1 as its first significant digit, may have some effect on this.
5. References:
[1] Benford, F. (1938), “The Law of Anomalous Numbers,” Proceedings of the American Philosophical Society, 78, 551-572.
[2] Hill, T. P. (1995), “A Statistical Derivation of the Significant-Digit Law,” Statistical Science, 10, 4, 354-363.
[3] Ley, E. (1996), “On the Peculiar Distribution of the U.S. Stock Indexes’ Digits,” The American Statistician, 50, 311-313.
[4] Nigrini, M. (1996), “A Taxpayer Compliance Application of Benford’s Law,” Journal of the American Taxation Association, 18, 72-91.