newfs silently fails if random is not ready (?)
Xin Li
delphij at FreeBSD.org
Wed Sep 5 05:00:43 UTC 2018
On 9/4/18 21:39, Conrad Meyer wrote:
> With current libc, I instead see:
>
> load: 0.10 cmd: blocked_random_poc 1668 [randseed] 1.27r 0.00u 0.00s 0% 2328k (SIGINFO)
>
> $ procstat -kk 1668
> PID TID COMM TDNAME KSTACK
> 1668 100609 blocked_random_poc - mi_switch+0xd3
> sleepq_catch_signals+0x386 sleepq_timedwait_sig+0x12 _sleep+0x272
> read_random_uio+0xb3 sys_getrandom+0xa3 amd64_syscall+0x940
> fast_syscall_common+0x101
>
> and:
>
> $ truss ./blocked_random_poc
> ...
> getrandom(0x7fffffffd340,40,0) ERR#35 'Resource temporarily unavailable'
> thr_self(0x7fffffffd310) = 0 (0x0)
> thr_kill(100609,SIGKILL) = 0 (0x0)
> SIGNAL 9 (SIGKILL) code=SI_NOINFO
>
> So getrandom(2) (via READ_RANDOM_UIO) is returning a bogus EAGAIN
> after we have already slept until random was seeded. This bubbles up
> to getentropy(3) -> arc4random(3), which sees a surprising failure
> from getentropy(3) and raises KILL against the program.
>
> I believe the EWOULDBLOCK (the same errno value as EAGAIN on FreeBSD)
> is just a boring leak of tsleep(9)'s timeout condition. This may be
> sufficient to fix the problem:
>
> --- a/sys/dev/random/randomdev.c
> +++ b/sys/dev/random/randomdev.c
> @@ -156,6 +156,10 @@ READ_RANDOM_UIO(struct uio *uio, bool nonblock)
>                 error = tsleep(&random_alg_context, PCATCH, "randseed", hz/10);
>                 if (error == ERESTART || error == EINTR)
>                         break;
> +               /* Squash hz/10 timeout condition */
> +               if (error == EWOULDBLOCK)
> +                       error = 0;
> +               KASSERT(error == 0, ("unexpected %d", error));
>         }
>         if (error == 0) {
>                 read_rate_increment((uio->uio_resid + sizeof(uint32_t))/sizeof(uint32_t));
+markm, re
I think the proposed change is reasonable. Note that the same theory
should apply to the tsleep_sbt() case below it as well, which should be
handled similarly.
> Best,
> Conrad
>
>
> On Tue, Sep 4, 2018 at 8:13 PM, Conrad Meyer <cem at freebsd.org> wrote:
>> Hi Lev,
>>
>> I made a first attempt at reproducing this problem on a fast
>> desktop-class system. First step: add a way to revert to unseeded
>> status:
>>
>> --- a/sys/dev/random/fortuna.c
>> +++ b/sys/dev/random/fortuna.c
>> @@ -39,6 +39,7 @@ __FBSDID("$FreeBSD$");
>>
>> #ifdef _KERNEL
>> #include <sys/param.h>
>> +#include <sys/fail.h>
>> #include <sys/kernel.h>
>> #include <sys/lock.h>
>> #include <sys/malloc.h>
>> @@ -384,6 +385,17 @@ random_fortuna_pre_read(void)
>>                 return;
>>         }
>>
>> +       /*
>> +        * When set, pretend we do not have enough entropy to reseed yet.
>> +        */
>> +       KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_pre_read, {
>> +               if (RETURN_VALUE != 0) {
>> +                       RANDOM_RESEED_UNLOCK();
>> +                       return;
>> +               }
>> +       });
>> +
>> +
>>  #ifdef _KERNEL
>>         fortuna_state.fs_lasttime = now;
>>  #endif
>> @@ -442,5 +454,11 @@ bool
>>  random_fortuna_seeded(void)
>>  {
>>
>> +       /* When set, act as if we are not seeded. */
>> +       KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_seeded, {
>> +               if (RETURN_VALUE != 0)
>> +                       fortuna_state.fs_counter = UINT128_ZERO;
>> +       });
>> +
>>         return (!uint128_is_zero(fortuna_state.fs_counter));
>>  }
>>
>>
>> Second step, enable the failpoints and launch repro program:
>>
>> $ sudo sysctl debug.fail_point.random_fortuna_pre_read='return(1)'
>> debug.fail_point.random_fortuna_pre_read: off -> return(1)
>> $ sudo sysctl debug.fail_point.random_fortuna_seeded='return(1)'
>> debug.fail_point.random_fortuna_seeded: off -> return(1)
>>
>> $ cat ./blocked_random_poc.c
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>>
>> int
>> main(int argc, char **argv)
>> {
>>         printf("%x\n", arc4random());
>>         return (0);
>> }
>>
>>
>> $ ./blocked_random_poc
>> ...
>>
>>
>> Third step, I looked at what that process was doing:
>>
>> Curiously, it is not in getrandom() at all, but instead in the ARND
>> sysctl fallback. I probably need to rebuild world (libc) to test this
>> with the new ChaCha-based libc arc4random.
>>
>> $ procstat -kk 1196
>> PID TID COMM TDNAME KSTACK
>> 1196 100435 blocked_random_poc - read_random+0x3d
>> sysctl_kern_arnd+0x3a sysctl_root_handler_locked+0x89
>> sysctl_root.isra.8+0x167 userland_sysctl+0x126 sys___sysctl+0x7b
>> amd64_syscall+0x940 fast_syscall_common+0x101
>>
>>
>> When I unblocked the failpoints, it completed successfully:
>>
>> $ sudo sysctl debug.fail_point.random_fortuna_pre_read='off'
>> debug.fail_point.random_fortuna_pre_read: return(1) -> off
>> $ sudo sysctl debug.fail_point.random_fortuna_seeded=off
>> debug.fail_point.random_fortuna_seeded: return(1) -> off
>>
>> ...
>> 9e5eb30f
>>
>>
>> Best,
>> Conrad