Collecting entropy from device_attach() times.
Mariusz Gromada
mariusz.gromada at gmail.com
Mon Sep 24 21:57:06 UTC 2012
On 2012-09-23 17:17, Pawel Jakub Dawidek wrote:
> On Sun, Sep 23, 2012 at 02:37:48AM +0200, Mariusz Gromada wrote:
>> On 2012-09-22 21:53, Pawel Jakub Dawidek wrote:
>>> Mariusz, can you confirm my findings?
>>
>> Pawel,
>>
>> Your conclusions can easily be confirmed by analysing the shape of the
>> EDF (empirical distribution function). The maximum difference between
>> the cumulative distribution functions (the D-statistic) gives you a
>> general overview, the shape of the function gives you a strong
>> intuition, and the p-value gives you a formal test.
>> D-statistic values (your data):
>>
>> 6bit: 0.33%
>> 7bit: 0.29%
>> 8bit: 0.27%
>> 9bit: 0.21%
>> 10bit: 6.34%
>> 11bit: 19.07%
>> 12bit: 54.80%
>>
>> What I would say: increasing the number of bits from 6 to 9 does not
>> affect the "uniformity" of the distribution, while reaching the 10th
>> bit results in a sudden increase in the difference measure, and the
>> more bits, the larger the difference. Shape analysis of the
>> distribution from the 10th bit on shows a non-linear function. The
>> difference curve shows no "randomness" at all: the chart is completely
>> free of noise (a pure functional relation). These are very strong
>> indicators that starting from the 10th bit the distribution changes
>> and is no longer uniform.
>>
>> To formally confirm the above conclusion at, e.g., the 5% significance
>> level (i.e. a 95% confidence level), I need some extra data regarding
>> sample sizes. Please send me the number of observations collected in
>> each of the 6-12 bit experiments.
>
> Total number of observations was 162833.
>
OK, finally I have some formal results. To be completely honest, I need
to point out that we in fact have discrete data (for example the
integers 0, 1, ..., 63, not continuous numbers spread between 0 and
63). That is why, instead of testing against a continuous uniform
distribution, I am going to use the two-sample Kolmogorov-Smirnov test
against an ideal discrete sample.
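A minimal sketch of what the two-sample test measures, using base R only: D is the largest vertical gap between the empirical CDFs of the two samples. The helper name ks_d and the simulated 6-bit sample are hypothetical; they only serve to check that the hand computation agrees with the statistic reported by ks.test.

```r
# Two-sample Kolmogorov-Smirnov D statistic computed by hand: the largest
# vertical gap between the empirical CDFs (EDFs) of the two samples.
ks_d <- function(x, y) {
  Fx <- ecdf(x)                 # Fx(t) = share of x values <= t
  Fy <- ecdf(y)
  # Both EDFs are step functions that only jump at observed values,
  # so the largest gap is attained at one of the pooled data points.
  z <- sort(unique(c(x, y)))
  max(abs(Fx(z) - Fy(z)))
}

set.seed(42)
x <- sample(0:63, 5000, replace = TRUE)   # hypothetical 6-bit sample
y <- 0:63                                 # ideal reference: each value once
d_manual <- ks_d(x, y)
# ks.test may warn about ties for discrete data; only the statistic is used here
d_ks <- unname(suppressWarnings(ks.test(x, y))$statistic)
cat("manual D:", d_manual, " ks.test D:", d_ks, "\n")
```

Depending on the R version, ks.test may warn that p-values are approximate when ties are present, which is expected for discrete data like this; the warning is suppressed above because only D is compared.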
The methodology is simple:
- Pawel's data is called the empirical sample.
- The theoretical sample is generated as the sequence of unique integers
from 0 to 2**n - 1, where n is the number of bits. Each number appears
in the theoretical sample exactly once, representing an ideal uniform
distribution.
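As a sanity check of this methodology, one can run the same comparison on data that is uniform by construction. The sketch below is hypothetical (simulated values, an arbitrary seed, 8 bits chosen only as an example) and merely illustrates that a genuinely uniform sample of the same size as Pawel's is not rejected against the ideal reference sequence.

```r
# Sanity check on data that is uniform by construction (hypothetical values):
# compare a simulated n-bit sample with the ideal reference 0..2^n - 1.
set.seed(2012)
n_bits    <- 8
n_obs     <- 162833                           # same size as Pawel's sample
simulated <- sample(0:(2^n_bits - 1), n_obs, replace = TRUE)
reference <- 0:(2^n_bits - 1)                 # each value exactly once
# ties are unavoidable with discrete data, so any ties warning is suppressed
res <- suppressWarnings(ks.test(simulated, reference))
cat("D =", unname(res$statistic), " p-value =", res$p.value, "\n")
```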
The calculations are done in R.
Loading the empirical data from files:
> e6 = read.table("E:\\pawel\\dhr2_6bit_sorted.txt")
> e7 = read.table("E:\\pawel\\dhr2_7bit_sorted.txt")
> e8 = read.table("E:\\pawel\\dhr2_8bit_sorted.txt")
> e9 = read.table("E:\\pawel\\dhr2_9bit_sorted.txt")
> e10 = read.table("E:\\pawel\\dhr2_10bit_sorted.txt")
> e11 = read.table("E:\\pawel\\dhr2_11bit_sorted.txt")
> e12 = read.table("E:\\pawel\\dhr2_12bit_sorted.txt")
Generating ideal theoretical data:
> t6 = c(0:(2**6-1))
> t7 = c(0:(2**7-1))
> t8 = c(0:(2**8-1))
> t9 = c(0:(2**9-1))
> t10 = c(0:(2**10-1))
> t11 = c(0:(2**11-1))
> t12 = c(0:(2**12-1))
Performing KS tests:
> ks.test(e6, t6)
D = 0.0032, p-value = 1
> ks.test(e7, t7)
D = 0.0029, p-value = 1
> ks.test(e8, t8)
D = 0.0027, p-value = 1
> ks.test(e9, t9)
D = 0.0022, p-value = 1
> ks.test(e10, t10)
D = 0.0634, p-value = 0.0005562
> ks.test(e11, t11)
D = 0.1907, p-value < 2.2e-16
> ks.test(e12, t12)
D = 0.5479, p-value < 2.2e-16
As you can see, the D-statistics are almost the same as those calculated
by Pawel (up to rounding). The p-values are very informative because of
the very large number of observations Pawel collected. Between 6 and 9
bits the estimated p-values are equal to 1 (after rounding), which means
that the null hypothesis that the compared distributions are equal
cannot be rejected at any reasonable significance level. Conclusion: for
6 to 9 bits the data is indistinguishable from uniformly distributed
random values.
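The p-values above can be approximately reproduced from the rounded D values and the two sample sizes with the large-sample (Kolmogorov) distribution of the statistic, assuming the empirical sample has 162833 observations and the reference sequence has 2**n values. The function name ks_asymp_p is hypothetical; it evaluates the two usual series for the Kolmogorov distribution, one converging fast for small lambda and one for large lambda. Because D is printed with only four digits, small differences from the ks.test output are expected.

```r
# Large-sample p-value of the two-sample KS test (hypothetical helper).
# lambda = D * sqrt(n * m / (n + m)); the p-value is P(K > lambda) for the
# Kolmogorov distribution K, using the series that converges fast in each range.
ks_asymp_p <- function(D, n, m, terms = 100) {
  lambda <- D * sqrt(n * m / (n + m))
  if (lambda <= 0) return(1)
  k <- seq_len(terms)
  if (lambda < 1) {
    # P(K <= lambda) = sqrt(2 pi) / lambda * sum exp(-(2k - 1)^2 pi^2 / (8 lambda^2))
    cdf <- sqrt(2 * pi) / lambda * sum(exp(-(2 * k - 1)^2 * pi^2 / (8 * lambda^2)))
    1 - cdf
  } else {
    # P(K > lambda) = 2 * sum (-1)^(k - 1) exp(-2 k^2 lambda^2)
    2 * sum((-1)^(k - 1) * exp(-2 * k^2 * lambda^2))
  }
}

n_obs <- 162833                          # observations per experiment (from the thread)
p9  <- ks_asymp_p(0.0022, n_obs, 2^9)    # D values as printed by ks.test above
p10 <- ks_asymp_p(0.0634, n_obs, 2^10)
p11 <- ks_asymp_p(0.1907, n_obs, 2^11)
cat("9 bits:", p9, " 10 bits:", p10, " 11 bits:", p11, "\n")
```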
Additionally, starting from 10 bits we observe a dramatic decrease in
the p-value (from 100% to about 0.06%, and far less for 11 and 12 bits).
Such a low p-value means that the null hypothesis that the compared
distributions are equal must be rejected at any conventional
significance level (5%, 1%, even 0.1%). Conclusion: from 10 bits on,
the data is clearly not uniformly random.
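The same decision can be expressed through the large-sample critical value of the two-sample test at the 5% level: reject when D > c(alpha) * sqrt((n + m) / (n * m)), with c(alpha) = sqrt(-ln(alpha / 2) / 2), which is about 1.358 for alpha = 0.05. A sketch, assuming the D values reported above and reference sizes of 2**n; ks_crit is a hypothetical helper name, and with heavily tied discrete data the asymptotic threshold is only approximate.

```r
# Large-sample critical value of the two-sample KS test (hypothetical helper):
# reject H0 at level alpha when D > c(alpha) * sqrt((n + m) / (n * m)),
# where c(alpha) = sqrt(-log(alpha / 2) / 2), about 1.358 for alpha = 0.05.
ks_crit <- function(alpha, n, m) {
  sqrt(-log(alpha / 2) / 2) * sqrt((n + m) / (n * m))
}

n_obs <- 162833                               # observations per experiment
bits  <- 6:12
m     <- 2^bits                               # size of the ideal reference sequence
D     <- c(0.0032, 0.0029, 0.0027, 0.0022,    # D values printed by ks.test above
           0.0634, 0.1907, 0.5479)
crit  <- ks_crit(0.05, n_obs, m)
# 'reject' compares against the unrounded critical values
res   <- data.frame(bits = bits, D = D, crit = round(crit, 4), reject = D > crit)
print(res)
```

Because m = 2**n is far smaller than 162833, the critical value is governed almost entirely by the size of the reference sequence: with 64 reference values the test only flags gaps larger than about 0.17, while with 1024 values the threshold drops to about 0.043.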
I did the same comparison for the earlier real device_attach() data
(2081 observations). The R code and the results are below:
> e16 = read.table("E:\\pawel\\device_attach_16bit.log")
> t16 = c(0:(2**16-1))
> ks.test(e16, t16)
D = 0.0178, p-value = 0.5422
Again, the D-statistic and the p-value are almost the same as those
calculated "manually" earlier. The p-value is high (not as high as in
the 6-9 bit tests, but consider the much smaller number of observations:
2081 vs 162833), so there is no evidence against the 16-bit values being
uniformly distributed, which is consistent with having captured real
16-bit entropy.
Regards,
Mariusz
More information about the freebsd-security mailing list