Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?
- Reply: Kornel_Dulęba : "Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?"
- In reply to: Mark Millard via freebsd-arm : "Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Sun, 12 Dec 2021 21:38:19 UTC
On 2021-Dec-12, at 00:59, Mark Millard <marklmi@yahoo.com> wrote: > On 2021-Dec-12, at 00:29, Mark Millard via freebsd-arm <freebsd-arm@freebsd.org> wrote: > >> On 2021-Dec-11, at 16:19, Mark Millard <marklmi@yahoo.com> wrote: >> >>> [I've cut out the history: just presenting some new evidence.] >>> >>> First, a little context from getting to the db> prompt. >>> >>> db> ps >>> pid ppid pgrp uid state wmesg wchan cmd >>> 18 0 0 0 DL syncer 0xffff000000eca5a8 [syncer] >>> 17 0 0 0 DL vlruwt 0xffffa00007d2ea60 [vnlru] >>> 16 0 0 0 DL (threaded) [bufdaemon] >>> 100089 D qsleep 0xffff000000ec9478 [bufdaemon] >>> 100092 D - 0xffff000000c11100 [bufspacedaemon-0] >>> 100093 D - 0xffff000000c21680 [bufspacedaemon-1] >>> 9 0 0 0 DL psleep 0xffff000000ef0650 [vmdaemon] >>> 8 0 0 0 DL (threaded) [pagedaemon] >>> 100087 D psleep 0xffff000000ee2b38 [dom0] >>> 100094 D launds 0xffff000000ee2b44 [laundry: dom0] >>> 100095 D umarcl 0xffff0000007b38d8 [uma] >>> 7 0 0 0 DL mmcsd d 0xffffa00007b72e00 [mmcsd0boot1: mmc/sd] >>> 6 0 0 0 DL mmcsd d 0xffffa00007b71300 [mmcsd0boot0: mmc/sd] >>> 5 0 0 0 DL mmcreq 0xffff00009b5d0710 [mmcsd0: mmc/sd card] >>> 4 0 0 0 DL - 0xffff000000ccc020 [rand_harvestq] >>> 15 0 0 0 DL (threaded) [usb] >>> . . . >>> >>> and "mmcreq" is from the while loop in: >>> >>> static int >>> mmc_wait_for_req(struct mmc_softc *sc, struct mmc_request *req) >>> { >>> >>> req->done = mmc_wakeup; >>> req->done_data = sc; >>> if (__predict_false(mmc_debug > 1)) { >>> device_printf(sc->dev, "REQUEST: CMD%d arg %#x flags %#x", >>> req->cmd->opcode, req->cmd->arg, req->cmd->flags); >>> if (req->cmd->data) { >>> printf(" data %d\n", (int)req->cmd->data->len); >>> } else >>> printf("\n"); >>> } >>> MMCBR_REQUEST(device_get_parent(sc->dev), sc->dev, req); >>> MMC_LOCK(sc); >>> while ((req->flags & MMC_REQ_DONE) == 0) >>> msleep(req, &sc->sc_mtx, 0, "mmcreq", 0); >>> MMC_UNLOCK(sc); >>> if (__predict_false(mmc_debug > 2 || (mmc_debug > 0 && >>> req->cmd->error != MMC_ERR_NONE))) >>> device_printf(sc->dev, "CMD%d RESULT: %d\n", >>> req->cmd->opcode, req->cmd->error); >>> return (0); >>> } >>> >>> So it appears that the error report: >>> >>> mmcsd0: Error indicated: 4 Failed >>> >>> ends up associated with (req->flags & MMC_REQ_DONE) == 0 staying >>> true in the above code: an unbounded loop with MMC_LOCK(sc) active. >>> The "4" in the error report seems to be from: >>> >>> #define MMC_ERR_FAILED 4 >>> >>> It looks like there are some problems with handling errors, problems >>> such that it gets stuck looping (no panic, no progress). >>> >>> That seems to be separate from why the MMC_ERR_FAILED was generated >>> in the first place. So: 2 problems, not just one. Thus it may be a >>> good context for tackling the looping problem with a known example >>> failure to look at. >>> >>> >>> >>> Just for reference, I tried "boot -v" with debug.verbose_sysinit=1 in place, >>> just to capture and report the tail of the output for the boot failure. >>> >>> . . . >>> subsystem f000000 >>> release_aps(0)... Release APs...done >>> done. >>> intr_irq_shuffle(0)... Trying to mount root from ufs:/dev/gpt/Rock64root []... >>> done. >>> netisr_start(0)... done. >>> taskqgroup_bind_softirq(0)... done. >>> GEOM: new disk mmcsd0 >>> GEOM: new disk mmcsd0boot0 >>> GEOM: new disk mmcsd0boot1 >>> smp_after_idle_runnable(0)... done. >>> taskqgroup_bind_if_config_tqg(0)... done. >>> taskqgroup_bind_if_io_tqg(0)... done. >>> tmr_setup_user_access(0)... done. >>> subsystem f000001 >>> mmcsd0: Error indicated: 4 Failed >>> epoch_init_smp(0)... done. >>> subsystem f100000 >>> racctd_init(0)... done. >>> subsystem fffffff >>> start_periodic_resettodr(0)... done. >>> oktousecallout(0)... done. >>> clknode_finish(0)... Unresolved linked clock found: hdmi_phy >>> Unresolved linked clock found: usb480m_phy >>> done. >>> regulator_constraint(0)... done. >>> regulator_shutdown(0)... regulator: shutting down unused regulators >>> regulator: shutting down vcc_sd... busy >>> done. >>> uhub0: 1 port with 1 removable, self powered >>> uhub2: 2 ports with 2 removable, self powered >>> uhub3: 1 port with 1 removable, self powered >>> uhub1: 1 port with 1 removable, self powered >>> ugen4.2: <Samsung PSSD T7 Touch> at usbus4 >>> umass0 on uhub2 >>> umass0: <Samsung PSSD T7 Touch, class 0/0, rev 3.20/1.00, addr 1> on usbus4 >>> umass0: SCSI over Bulk-Only; quirks = 0x0000 >>> umass0:0:0: Attached to scbus0 >>> pass0 at umass-sim0 bus 0 scbus0 target 0 lun 0 >>> pass0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device >>> pass0: Serial Number REPLACED >>> pass0: 400.000MB/s transfers >>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 >>> da0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device >>> da0: Serial Number REPLACED >>> da0: 400.000MB/s transfers >>> da0: 953869MB (1953525168 512 byte sectors) >>> da0: quirks=0x2<NO_6_BYTE> >>> da0: Delete methods: <NONE(*),ZERO> >>> random: unblocking device. >>> >>> No more output after that. >> >> As for why MMC_ERR_FAILED results, the following code diff is >> intended to suggest what I think may be incomplete about sticking >> to what the device-specific code supports vs. does not support >> (not supporting HS200 here). The code does compile in my context >> but is untested. > > It is now tested (at least to be a useful hack): no longer am I > running an older 1400042 kernel. For reference, > > # uname -apKU > FreeBSD Rock64_RPi_4_3_2v1p2 14.0-CURRENT FreeBSD 14.0-CURRENT #18 main-n251456-22c4ab6cb015-dirty: Sun Dec 12 00:34:53 PST 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA53-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA53 arm64 aarch64 1400043 1400043 > > And it reports during the boot (other than the "REPLACED"): > > mmcsd0: 125GB <MMCHC DJNB4R 0.7 SN REPLACED MFG 06/2016 by 21 0x0000> at mmc0 52.0MHz/8bit/1016-block > > So it no longer sets up a mode that the rk3328-specific-code does not > actually support. > > (Nothing that I've done here deals with the looping issue when there > is a MMC_ERR_FAILED or the like.) > >> The email handling may mess up some leading >> whitespace --but, again, I'm only trying to suggest a type of >> change. >> >> # git -C /usr/main-src/ diff /usr/main-src/sys/dev/mmc >> diff --git a/sys/dev/mmc/mmc.c b/sys/dev/mmc/mmc.c >> index 9c73dfd57ce0..dffd1c382684 100644 >> --- a/sys/dev/mmc/mmc.c >> +++ b/sys/dev/mmc/mmc.c >> @@ -59,6 +59,7 @@ __FBSDID("$FreeBSD$"); >> #include <sys/param.h> >> #include <sys/systm.h> >> #include <sys/kernel.h> >> +#include <sys/kobj.h> >> #include <sys/malloc.h> >> #include <sys/lock.h> >> #include <sys/module.h> >> @@ -1512,6 +1513,8 @@ mmc_timing_to_string(enum mmc_bus_timing timing) >> static bool >> mmc_host_timing(device_t dev, enum mmc_bus_timing timing) >> { >> + kobjop_desc_t kobj_desc; >> + kobj_method_t *kobj_method; >> int host_caps; >> >> host_caps = mmcbr_get_caps(dev); >> @@ -1543,14 +1546,37 @@ mmc_host_timing(device_t dev, enum mmc_bus_timing timing) >> case bus_timing_mmc_ddr52: >> return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_DDR52)); >> case bus_timing_mmc_hs200: >> - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_120) || >> - HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_180)); >> case bus_timing_mmc_hs400: >> - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_120) || >> - HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_180)); >> case bus_timing_mmc_hs400es: >> - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400 | >> - MMC_CAP_MMC_ENH_STROBE)); >> + /* >> + * Disable eMMC modes that require use of >> + * MMC_SEND_TUNING_BLOCK_HS200 to set things up if either the >> + * tune or re-tune method is the default NULL implementation. >> + */ >> + kobj_desc = &mmcbr_tune_desc; >> + kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, >> + kobj_desc); >> + if (kobj_method == &kobj_desc->deflt) >> + return (false); >> + kobj_desc = &mmcbr_retune_desc; >> + kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, >> + kobj_desc); >> + if (kobj_method == &kobj_desc->deflt) { >> + return (false); >> + } >> + >> + /* >> + * Otherwise track the host capabilities. >> + */ >> + if (timing == bus_timing_mmc_hs200) >> + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_120) || >> + HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_180)); >> + if (timing == bus_timing_mmc_hs400) >> + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_120) || >> + HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_180)); >> + if (timing == bus_timing_mmc_hs400es) >> + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400 | >> + MMC_CAP_MMC_ENH_STROBE)); >> } >> >> #undef HOST_TIMING_CAP >> >> >> In other words: have mmc_host_timing avoid returning true for some >> combinations that definitely do not have sufficient software support >> present at the time. (So far as I can tell, the rk3328's get the >> NULL-implementations as things are.) >> >> I expect that this sort of thing would go back to using >> MMC_CAP_MMC_DDR52 for the rk3328's, as an example. Working, but in a >> slower mode, the same mode as FreeBSD was previously using. >> >> A possible incompleteness in the suggestion is that there is also a >> drive-strength setting involved. If that also had "kobj" interfacing >> and NULL-implementation possibilities, then in the future there would >> be more to test for possibly forcing return-false than I did above. >> >> Hopefully this sort of thing would help, possibly more than just for >> rk3328's. > > As for what was happening without my patch . . . sys/dev/mmc/mmcbr_if.m defines: static int null_retune(device_t brdev __unused, device_t reqdev __unused, bool reset __unused) { return (0); } static int null_tune(device_t brdev __unused, device_t reqdev __unused, bool hs400 __unused) { return (0); } . . . # # Called by the mmcbus with the bridge claimed to execute initial tuning. # METHOD int tune { device_t brdev; device_t reqdev; bool hs400; } DEFAULT null_tune; # # Called by the mmcbus with the bridge claimed to execute re-tuning. # METHOD int retune { device_t brdev; device_t reqdev; bool reset; } DEFAULT null_retune; . . . It is these success-reporting no-op routines that were being used to attempt the tuning: so there was no tuning done. The code that I added detects that these routines would be used and avoids allowing contexts that would involve putting them to use with HS200 mode. I'll note that there is another such null_* routine that the code (even with my patch) does not deal with avoiding the use of: . . . static int null_switch_vccq(device_t brdev __unused, device_t reqdev __unused) { return (0); } . . . # # Called by the mmcbus to switch the signaling voltage (VCCQ). # METHOD int switch_vccq { device_t brdev; device_t reqdev; } DEFAULT null_switch_vccq; . . . /usr/main-src/sys/dev/sdhci/sdhci.c has somewhat analogous code for somewhat analogous null_* routines. null_set_uhs_timing for that is from sys/dev/sdhci/sdhci_if.m (but the other two are again the above null_tune and null_retune routines, so not repeated here): . . . static void null_set_uhs_timing(device_t brdev __unused, struct sdhci_slot *slot __unused) { } . . . METHOD void set_uhs_timing { device_t brdev; struct sdhci_slot *slot; } DEFAULT null_set_uhs_timing; . . . sdhci_init_slot(device_t dev, struct sdhci_slot *slot, int num) in sdhci.c looks like (in part): . . . /* * Disable UHS-I and eMMC modes if the set_uhs_timing method is the * default NULL implementation. */ kobj_desc = &sdhci_set_uhs_timing_desc; kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, kobj_desc); if (kobj_method == &kobj_desc->deflt) host_caps &= ~(MMC_CAP_UHS_SDR12 | MMC_CAP_UHS_SDR25 | MMC_CAP_UHS_SDR50 | MMC_CAP_UHS_DDR50 | MMC_CAP_UHS_SDR104 | MMC_CAP_MMC_DDR52 | MMC_CAP_MMC_HS200 | MMC_CAP_MMC_HS400); #define SDHCI_CAP_MODES_TUNING(caps2) \ (((caps2) & SDHCI_TUNE_SDR50 ? MMC_CAP_UHS_SDR50 : 0) | \ MMC_CAP_UHS_DDR50 | MMC_CAP_UHS_SDR104 | MMC_CAP_MMC_HS200 | \ MMC_CAP_MMC_HS400) /* * Disable UHS-I and eMMC modes that require (re-)tuning if either * the tune or re-tune method is the default NULL implementation. */ kobj_desc = &mmcbr_tune_desc; kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, kobj_desc); if (kobj_method == &kobj_desc->deflt) goto no_tuning; kobj_desc = &mmcbr_retune_desc; kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, kobj_desc); if (kobj_method == &kobj_desc->deflt) { no_tuning: host_caps &= ~(SDHCI_CAP_MODES_TUNING(caps2)); } . . . What I've done in my patch is analogous to what the the code shown after the #define SDHCI_CAP_MODES_TUNING above does, translated to fit the mmc's pre-existing code structure. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)