locks and kernel randomness...

Konstantin Belousov kostikbel at gmail.com
Wed Feb 25 09:06:48 UTC 2015


On Tue, Feb 24, 2015 at 08:57:58PM -0800, Harrison Grundy wrote:
> <... snip ...>
> 
> The timing attack I talked to you about on IRC works like this:
> 
> A userland process creates as many threads as there are CPUs, and by
> manipulating the load they generate, gets it so they're all flagged as
> interactive and at the same priority. (alternating spin and sleep with
> a 2% duty cycle would work, for instance)
> 
> It would also be possible to coerce a userland process, like Apache, to
> behave this way.
> 
> These threads now have the ability to preempt all timeshare tasks on
> all CPUs for slice_size time, by waking up and spinning at the same
> time. This means they can get very precise knowledge about scheduling,
> by timing when they get to run, versus when they have to wait.
Ok, this definitely looks feasible.
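
For reference, a minimal userland sketch of that load pattern could look
like the code below.  Only the "one thread per CPU, roughly 2% duty cycle"
part comes from the description above; the 100 ms period, the specific
APIs, and the lack of explicit CPU pinning are my own illustrative choices.

/*
 * Illustrative only: one thread per CPU, ~2% duty cycle
 * (2 ms spin, 98 ms sleep).
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define	PERIOD_NS	100000000LL	/* 100 ms period (my choice) */
#define	SPIN_NS		2000000LL	/* 2 ms spin -> ~2% duty cycle */

static long long
elapsed_ns(const struct timespec *a, const struct timespec *b)
{
	return ((b->tv_sec - a->tv_sec) * 1000000000LL +
	    (b->tv_nsec - a->tv_nsec));
}

static void *
duty_cycle(void *arg)
{
	struct timespec start, now, rest;

	(void)arg;
	for (;;) {
		clock_gettime(CLOCK_MONOTONIC, &start);
		/* Burn CPU for SPIN_NS so ULE keeps the thread "interactive". */
		do
			clock_gettime(CLOCK_MONOTONIC, &now);
		while (elapsed_ns(&start, &now) < SPIN_NS);
		/*
		 * Sleep the rest of the period.  Timing how late the next
		 * wakeup actually runs is where the scheduling observation
		 * would be made.
		 */
		rest.tv_sec = 0;
		rest.tv_nsec = PERIOD_NS - SPIN_NS;
		nanosleep(&rest, NULL);
	}
	return (NULL);
}

int
main(void)
{
	long i, ncpu;
	pthread_t td;

	ncpu = sysconf(_SC_NPROCESSORS_ONLN);
	for (i = 0; i < ncpu; i++)
		pthread_create(&td, NULL, duty_cycle, NULL);
	pause();
	return (0);
}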

> 
> By watching CPU0, one of these threads can measure balance_ticks.
> 
> This is important because balance_ticks directly exposes the last 7
> bits it gets back from random(). (The value gets applied to
> balance_interval to keep the balancer from running on exactly the same
> interval)
> 
> This means that if an attacker can trigger the use of random, or is
> willing to wait long enough for a race, they can determine the value
> of those bits that were passed along to anyone who called random() at
> the same time.
> 
> It also means that they can eventually discover the state of the RNG,
> and predict future values.
> 
> The security implications of disclosing the values this way aren't as
> severe as they might seem, simply because random() isn't really used in
> any cryptographically sensitive areas, but there are definite
> consequences, like predicting firewall port values, and NFS client
> transaction IDs.
> 
> It is, however, surprising to learn that the balance_interval sysctl
> has security implications.

So this is an argument for removing the current random() call from
sched_balance().  There are no such implications for using e.g.
get_cyclecount() in sched_balance(), since on x86 userspace can read
the underlying counter directly.
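
For the x86 case, a trivial unprivileged program shows that nothing new
would be exposed; __rdtsc() is just the compiler intrinsic for the RDTSC
instruction:

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

int
main(void)
{
	/*
	 * Whatever low bits a get_cyclecount()-based balance_ticks might
	 * leak are already directly readable from userland.
	 */
	uint64_t tsc = __rdtsc();

	printf("tsc = %ju, low 7 bits = %ju\n",
	    (uintmax_t)tsc, (uintmax_t)(tsc & 0x7f));
	return (0);
}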

On other architectures, where the counter backing get_cyclecount() is not
accessible to userspace, it is still fine to use it in sched_balance(),
simply because the counter keeps ticking; it carries no hidden generator
state that could be reconstructed from a few leaked bits.

Do you agree with these statements?

Also, as I understand from your other responses, you did test the patch
that uses get_cyclecount() on non-x86 machines?  I am trying to understand
what testing was done for the get_cyclecount() patch for sched_balance(),
i.e. whether it is ready to be committed.

