Collecting entropy from device_attach() times.

Mon Sep 24 21:57:06 UTC 2012

On 2012-09-23 17:17, Pawel Jakub Dawidek wrote:
> On Sun, Sep 23, 2012 at 02:37:48AM +0200, Mariusz Gromada wrote:
>> On 2012-09-22 21:53, Pawel Jakub Dawidek wrote:
>>> Mariusz, can you confirm my findings?
>>
>> Pawel,
>>
>> Your conclusions can be easily confirmed by shape analysis of the EDF.
>> Usually maximum quantile difference (called D-statistic) gives you a
>> kind of overview, function shape gives you a strong feeling, p-value
>> gives you a formal proof.
>> D-statistic values (your data):
>>
>>    6bit:   0.33%
>>    7bit:   0.29%
>>    8bit:   0.27%
>>    9bit:   0.21%
>> 10bit:   6.34%
>> 11bit:  19.07%
>> 12bit:  54.80%
>>
>> What I would say: increasing the number of bits from 6 to 9 does not
>> affect distribution "uniformity"; reaching the tenth bit results in a
>> sudden increase in the difference measure - the more bits, the larger
>> the observed difference. Distribution shape analysis for the 10th bit
>> shows a non-linear function. There is no "randomness" in the quantile
>> difference curve - the chart shows a complete lack of noise (a pure
>> functional relation). These are very strong indicators that starting
>> from the 10th bit the distribution changes and is no longer uniform.
>>
>> To formally confirm the above conclusion at, e.g., a 5% significance
>> level (i.e. a 95% confidence level), I need some extra data regarding
>> sample sizes. Please send me the number of observations collected in
>> each of the 6-12 bit experiments.
>
> Total number of observations was 162833.
>

Ok, finally I have some formal results. To be completely honest, I need
to point out that we in fact have discrete data (for example the
integers 0, 1, ..., 63, not continuous numbers spread between 0 and
63). That is why I am going to use the two-sample Kolmogorov-Smirnov
test. The methodology is simple:

- Pawel’s data will be called the empirical data.
- Theoretical data will be generated as a sequence of unique integers
from 0 to 2**n - 1, where n is the number of bits. Assumption: each
number appears in the theoretical data exactly once, representing an
ideal uniform distribution.

The calculations are done in R (from CRAN).

Loading the empirical data from files:

 > e6 = read.table("E:\\pawel\\dhr2_6bit_sorted.txt")
 > e7 = read.table("E:\\pawel\\dhr2_7bit_sorted.txt")
 > e8 = read.table("E:\\pawel\\dhr2_8bit_sorted.txt")
 > e9 = read.table("E:\\pawel\\dhr2_9bit_sorted.txt")
 > e10 = read.table("E:\\pawel\\dhr2_10bit_sorted.txt")
 > e11 = read.table("E:\\pawel\\dhr2_11bit_sorted.txt")
 > e12 = read.table("E:\\pawel\\dhr2_12bit_sorted.txt")

Generating ideal theoretical data:

 > t6 = c(0:(2**6-1))
 > t7 = c(0:(2**7-1))
 > t8 = c(0:(2**8-1))
 > t9 = c(0:(2**9-1))
 > t10 = c(0:(2**10-1))
 > t11 = c(0:(2**11-1))
 > t12 = c(0:(2**12-1))
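
A quick sanity check can be worth doing before testing. This is only a
sketch: the temporary file and its values are made up for illustration
and are not Pawel's data. read.table() returns a data frame, so the
single column is pulled out as a plain vector and checked against the
range expected for n bits:

```r
# Hypothetical stand-in for one of the *_6bit_sorted.txt files
tmp <- tempfile(fileext = ".txt")
writeLines(c("0", "3", "3", "17", "63"), tmp)

n_bits <- 6
e <- read.table(tmp)   # data frame with a single column, named V1 by default
x <- e$V1              # plain integer vector

# Every value should fit in n_bits bits
in_range <- all(x >= 0 & x <= 2^n_bits - 1)
print(in_range)
```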

Performing KS tests:

 > ks.test(e6, t6)
D = 0.0032, p-value = 1

 > ks.test(e7, t7)
D = 0.0029, p-value = 1

 > ks.test(e8, t8)
D = 0.0027, p-value = 1

 > ks.test(e9, t9)
D = 0.0022, p-value = 1

 > ks.test(e10, t10)
D = 0.0634, p-value = 0.0005562

 > ks.test(e11, t11)
D = 0.1907, p-value < 2.2e-16

 > ks.test(e12, t12)
D = 0.5479, p-value < 2.2e-16

As you can see, the D-statistics are almost the same as those calculated
by Pawel (allowing for rounding). The p-values are very interesting
because of the very large number of observations generated by Pawel.
Between 6 and 9 bits the estimated p-values are equal to 1, which means
that it is impossible (at any significance level) to reject the null
hypothesis that the compared distributions are equal. Final conclusion:
it has to be random, and for sure it is random!

Additionally, starting from 10 bits we can observe a dramatic decrease in
the p-value (from 100% to ca. 0.06%, and much less for 11-12 bits). Such
a low p-value means that we have to reject the null hypothesis that the
compared distributions are equal. Final conclusion: it cannot be random,
and for sure it is not random.
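
The same decision can be cross-checked at the 5% significance level with
the usual large-sample approximation of the two-sample KS critical
value, c(alpha) * sqrt((n + m) / (n * m)) with
c(alpha) = sqrt(-log(alpha / 2) / 2). This is only a rough sketch, not
what ks.test() computes internally; the helper name ks_crit is mine, the
sample size 162833 is the one Pawel reported, and the D values are
copied from the output above:

```r
# Approximate two-sample KS critical value (large-sample formula)
ks_crit <- function(n, m, alpha = 0.05) {
  sqrt(-log(alpha / 2) / 2) * sqrt((n + m) / (n * m))
}

n_emp <- 162833             # empirical sample size reported by Pawel
bits  <- c(9, 10)
d_obs <- c(0.0022, 0.0634)  # D values from the ks.test output above

crit <- ks_crit(n_emp, 2^bits)
res  <- data.frame(bits, d_obs, crit = round(crit, 4), reject = d_obs > crit)
print(res)
```

The threshold comes out at roughly 0.060 for 9 bits and 0.043 for 10
bits, so D for 9 bits stays well below it while D for 10 bits exceeds
it, which agrees with the p-values.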

I did the same comparison for the previous real device attach data (2081
obs.). The R code and the results are below:

 > e16 = read.table("E:\\pawel\\device_attach_16bit.log")
 > t16 = c(0:(2**16-1))
 > ks.test(e16, t16)
D = 0.0178, p-value = 0.5422

Again, the D-statistic and p-value are almost the same as previously
calculated "manually". The p-value is very high (not as high as in the
6-12 bit tests, but consider the much lower number of observations: 2081
vs 162833), giving near certainty that you have captured real 16-bit
entropy!
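
For anyone who wants to reproduce the "manual" D-statistic, here is a
minimal sketch. It assumes D is taken as the largest absolute difference
between the two empirical distribution functions; the helper name ks_d
is mine, and the empirical sample is simulated with sample(), so it is
not the device_attach data:

```r
# Largest absolute gap between the two empirical CDFs
ks_d <- function(x, y) {
  grid <- sort(unique(c(x, y)))
  max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}

set.seed(1)
n_bits <- 6
emp  <- sample(0:(2^n_bits - 1), 10000, replace = TRUE)  # simulated stand-in
theo <- 0:(2^n_bits - 1)                                 # ideal uniform data

d_manual  <- ks_d(emp, theo)
# ks.test() may warn about ties in discrete data; the statistic is unaffected
d_builtin <- unname(suppressWarnings(ks.test(emp, theo))$statistic)
print(c(manual = d_manual, builtin = d_builtin))
```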

Regards,
Mariusz