From nobody Tue Nov 09 11:52:18 2021 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 5C43718448C3 for ; Tue, 9 Nov 2021 11:52:33 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4HpRDd11MZz4fPp; Tue, 9 Nov 2021 11:52:33 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.16.1/8.16.1) with ESMTPS id 1A9BqIGJ037346 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Tue, 9 Nov 2021 13:52:21 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua 1A9BqIGJ037346 Received: (from kostik@localhost) by tom.home (8.16.1/8.16.1/Submit) id 1A9BqIbN037345; Tue, 9 Nov 2021 13:52:18 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 9 Nov 2021 13:52:18 +0200 From: Konstantin Belousov To: Andriy Gapon Cc: FreeBSD Current Subject: Re: thread on sleepqueue does not wake up after timeout Message-ID: References: <20191022104434.GM73312@kib.kiev.ua> <3a67f9a9-31cf-5814-4a68-8bdd6063b21e@FreeBSD.org> <20191022131633.GN73312@kib.kiev.ua> <9c131a2a-cc94-4d93-1ba8-595c0151e366@FreeBSD.org> <32fe4f76-155b-8b99-5782-4daafd4219d5@FreeBSD.org> List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <32fe4f76-155b-8b99-5782-4daafd4219d5@FreeBSD.org> X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.5 X-Spam-Checker-Version: SpamAssassin 3.4.5 (2021-03-20) on tom.home X-Rspamd-Queue-Id: 4HpRDd11MZz4fPp X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N On Tue, Nov 09, 2021 at 11:58:30AM +0200, Andriy Gapon wrote: > > Two years have passed since my original report, some things have changed, > but we are still seeing the problem. > > To recap, the problem is that sometimes a callout for a sleepq timeout would > fire to early. The code in sleepq_timeout would ignore such a wake-up. > And, as there would not be another callout, there will never be a wake-up. > > This has been observed on VMWare only. We have seen it on different > versions of ESXi (different environments, upgrades, etc) starting with 5.0, > through 5.5 and 6.0, and now on 6.5. > > We use 12.2 now and are still seeing the problem occasionally. > > On advice from Kostik I added some diagnostic code. > The code checks for the too early condition and panics when it happens. > It also provides some additional information. > > Here is an example of such a panic. > Unread portion of the kernel message buffer: > panic: too early sleepq timeout, 190543869738008 > 190543869603412 (190543869798505) > cpuid = 1 > time = 1636195325 > > (kgdb) bt > #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 > #1 doadump (textdump=153080096) at /usr/src/sys/kern/kern_shutdown.c:447 > #2 0xffffffff8038f1cc in db_fncall_generic (addr=, nargs=0, > args=, rv=) at > /usr/src/sys/ddb/db_command.c:622 > #3 db_fncall (dummy1=, dummy2=, > dummy3=, dummy4=) at > /usr/src/sys/ddb/db_command.c:670 > #4 0xffffffff8038eaf6 in db_command (last_cmdp=, > cmd_table=, dopager=0) at /usr/src/sys/ddb/db_command.c:494 > #5 0xffffffff80393a68 in db_script_exec (scriptname=, > warnifnotfound=) at /usr/src/sys/ddb/db_script.c:304 > #6 0xffffffff80393892 in db_script_kdbenter (eventname=) at > /usr/src/sys/ddb/db_script.c:326 > #7 0xffffffff80391ac3 in db_trap (type=, code= out>) at /usr/src/sys/ddb/db_main.c:251 > #8 0xffffffff807821e2 in kdb_trap (type=3, code=0, tf=0xfffffe01091fd410) > at /usr/src/sys/kern/subr_kdb.c:700 > #9 0xffffffff809d870e in trap (frame=0xfffffe01091fd410) at > /usr/src/sys/amd64/amd64/trap.c:583 > #10 > #11 kdb_enter (why=0xffffffff80b17af9 "panic", msg=0xffffffff80b17af9 > "panic") at /usr/src/sys/kern/subr_kdb.c:486 > #12 0xffffffff8073ddce in vpanic (fmt=, ap=) > at /usr/src/sys/kern/kern_shutdown.c:975 > #13 0xffffffff8073dc23 in panic (fmt=0xffffffff81178120 "") > at /usr/src/sys/kern/kern_shutdown.c:909 > #14 0xffffffff8078f09c in sleepq_timeout (arg=0xfffff8005b6a1000) at > /usr/src/sys/kern/subr_sleepqueue.c:1042 > #15 0xffffffff8075594a in softclock_call_cc (c=0xfffff8005b6a1470, > cc=0xffffffff811a0580 , now=190543869798505, direct=1) at > /usr/src/sys/kern/kern_timeout.c:768 > #16 0xffffffff8075567c in callout_process (now=190543869798505) at > /usr/src/sys/kern/kern_timeout.c:511 > #17 0xffffffff80a3d455 in handleevents (now=190543869798505, fake=0) at > /usr/src/sys/kern/kern_clocksource.c:213 > #18 0xffffffff80a3da46 in timercb (et=0xffffffff81251238 , > arg=) at /usr/src/sys/kern/kern_clocksource.c:357 > #19 0xffffffff80a680cb in lapic_handle_timer (frame=0xfffffe01091fd7d0) at > /usr/src/sys/x86/x86/local_apic.c:1339 > #20 > #21 0xffffffff81d90654 in buf_hash_find > ... > > (kgdb) fr 18 > #18 0xffffffff80a3da46 in timercb (et=0xffffffff81251238 , > arg=) at /usr/src/sys/kern/kern_clocksource.c:357 > 357 /usr/src/sys/kern/kern_clocksource.c: No such file or directory. > (kgdb) i loc > state = > next = 0xfffffe00015b7b80 > now = 190543869798505 > bcast = > cpu = > > (kgdb) fr 15 > #15 0xffffffff8075594a in softclock_call_cc (c=0xfffff8005b6a1470, > cc=0xffffffff811a0580 , now=190543869798505, direct=1) at > /usr/src/sys/kern/kern_timeout.c:768 > 768 /usr/src/sys/kern/kern_timeout.c: No such file or directory. > (kgdb) p *c > $1 = {c_links = {le = {le_next = 0x0, le_prev = 0xfffffe000029f398}, sle = > {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xfffffe000029f398}}, > c_time = 190543869738008, c_precision = 2684354, c_arg = 0xfffff8005b6a1000, > c_func = 0xffffffff8078f000 , c_lock = 0x0, c_flags = 2, > c_iflags = 272, c_cpu = 1, c_exec_time = 190543869798505, c_lines = {u128 = > 5514281696343066435261632100963648521, u16 = {1033, 755, 1225, 628, 409, > 651, 788, 1062}}} > > Here is an explanation for the numbers reproted in the panic message (sorted > from earliest to latest): > 190543869603412 - 'now' as seen in sleepq_timeout(), returned by sbinuptime(); > 190543869738008 - td_sleeptimo, also c_time in the callout; > 190543869798505 - 'now' as captured when the LAPIC timer fired, seen in the > stack trace and also recorded as c_exec_time in the callout. > > Just in case, here is the code that printed the panic message: > panic("too early sleepq timeout, %jd > %jd (%jd)", > > > (intmax_t)td->td_sleeptimo, > > > (intmax_t)sbinuptime(), > > > (intmax_t)td->td_slpcallout.c_exec_time); > > As can be seen, sbinuptime() in sleepq_timeout() returned a value smaller > than what sbinuptime() returned in timercb(). > > It seems that the code in callout_process() was right to fire the callout > because now > td_sleeptimo there. > But sleepq_timeout() thought that it was premature as sbinuptime() < > td_sleeptimo there. > > In the above case the eventtimer is obviously LAPIC, the timecounter is HPET. > But we also saw the same issue when we changed the timecounter to ACPI-fast. > I think that we haven't tried any other choices because: > kern.timecounter.tc.ACPI-fast.quality: 900 > kern.timecounter.tc.i8254.quality: 0 > kern.timecounter.tc.HPET.quality: 950 > kern.timecounter.tc.TSC-low.quality: -100 > > I see three possibilities of why sbinuptime() could go backwards: > - a bug in FreeBSD time keeping code that gets triggered only under very > specific conditions that don't normally happen > - a bug in VMWare, e.g. in HPET and ACPI timer emulation > - something exotic, like correct C code miscompiled to incorrect machine > code or weird memory model violation or etc... > > I'd appreciate any suggestions on additional diagnostics to narrow down / > rule out the possibilities. I do not have any other idea than to try and see if sbinuptime() might go backward, by running a parallel test of it on several CPUs. Might be, also record the raw timecounter value from the hw register. > > P.S. > As a workaround I could modify sleepq_timeout() to get "current time" from > c_exec_time (added by us) instead of sbinuptime(). c_exec_time is the value > of 'now' in callout_process() when it decides that the callout should fire. > But I'd like to get to the bottom of the issue. This was done quite recently in 6df1359e5542f69179c142 by mav.