Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?
- Reply: Mark Millard via freebsd-arm : "Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?"
- In reply to: Mark Millard via freebsd-arm : "Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Sun, 12 Dec 2021 08:29:25 UTC
On 2021-Dec-11, at 16:19, Mark Millard <marklmi@yahoo.com> wrote: > [I've cut out the history: just presenting some new evidence.] > > First, a little context from getting to the db> prompt. > > db> ps > pid ppid pgrp uid state wmesg wchan cmd > 18 0 0 0 DL syncer 0xffff000000eca5a8 [syncer] > 17 0 0 0 DL vlruwt 0xffffa00007d2ea60 [vnlru] > 16 0 0 0 DL (threaded) [bufdaemon] > 100089 D qsleep 0xffff000000ec9478 [bufdaemon] > 100092 D - 0xffff000000c11100 [bufspacedaemon-0] > 100093 D - 0xffff000000c21680 [bufspacedaemon-1] > 9 0 0 0 DL psleep 0xffff000000ef0650 [vmdaemon] > 8 0 0 0 DL (threaded) [pagedaemon] > 100087 D psleep 0xffff000000ee2b38 [dom0] > 100094 D launds 0xffff000000ee2b44 [laundry: dom0] > 100095 D umarcl 0xffff0000007b38d8 [uma] > 7 0 0 0 DL mmcsd d 0xffffa00007b72e00 [mmcsd0boot1: mmc/sd] > 6 0 0 0 DL mmcsd d 0xffffa00007b71300 [mmcsd0boot0: mmc/sd] > 5 0 0 0 DL mmcreq 0xffff00009b5d0710 [mmcsd0: mmc/sd card] > 4 0 0 0 DL - 0xffff000000ccc020 [rand_harvestq] > 15 0 0 0 DL (threaded) [usb] > . . . > > and "mmcreq" is from the while loop in: > > static int > mmc_wait_for_req(struct mmc_softc *sc, struct mmc_request *req) > { > > req->done = mmc_wakeup; > req->done_data = sc; > if (__predict_false(mmc_debug > 1)) { > device_printf(sc->dev, "REQUEST: CMD%d arg %#x flags %#x", > req->cmd->opcode, req->cmd->arg, req->cmd->flags); > if (req->cmd->data) { > printf(" data %d\n", (int)req->cmd->data->len); > } else > printf("\n"); > } > MMCBR_REQUEST(device_get_parent(sc->dev), sc->dev, req); > MMC_LOCK(sc); > while ((req->flags & MMC_REQ_DONE) == 0) > msleep(req, &sc->sc_mtx, 0, "mmcreq", 0); > MMC_UNLOCK(sc); > if (__predict_false(mmc_debug > 2 || (mmc_debug > 0 && > req->cmd->error != MMC_ERR_NONE))) > device_printf(sc->dev, "CMD%d RESULT: %d\n", > req->cmd->opcode, req->cmd->error); > return (0); > } > > So it appears that the error report: > > mmcsd0: Error indicated: 4 Failed > > ends up associated with (req->flags & MMC_REQ_DONE) == 0 staying > true in the above code: an unbounded loop with MMC_LOCK(sc) active. > The "4" in the error report seems to be from: > > #define MMC_ERR_FAILED 4 > > It looks like there are some problems with handling errors, problems > such that it gets stuck looping (no panic, no progress). > > That seems to be separate from why the MMC_ERR_FAILED was generated > in the first place. So: 2 problems, not just one. Thus it may be a > good context for tackling the looping problem with a known example > failure to look at. > > > > Just for reference, I tried "boot -v" with debug.verbose_sysinit=1 in place, > just to capture and report the tail of the output for the boot failure. > > . . . > subsystem f000000 > release_aps(0)... Release APs...done > done. > intr_irq_shuffle(0)... Trying to mount root from ufs:/dev/gpt/Rock64root []... > done. > netisr_start(0)... done. > taskqgroup_bind_softirq(0)... done. > GEOM: new disk mmcsd0 > GEOM: new disk mmcsd0boot0 > GEOM: new disk mmcsd0boot1 > smp_after_idle_runnable(0)... done. > taskqgroup_bind_if_config_tqg(0)... done. > taskqgroup_bind_if_io_tqg(0)... done. > tmr_setup_user_access(0)... done. > subsystem f000001 > mmcsd0: Error indicated: 4 Failed > epoch_init_smp(0)... done. > subsystem f100000 > racctd_init(0)... done. > subsystem fffffff > start_periodic_resettodr(0)... done. > oktousecallout(0)... done. > clknode_finish(0)... Unresolved linked clock found: hdmi_phy > Unresolved linked clock found: usb480m_phy > done. > regulator_constraint(0)... done. > regulator_shutdown(0)... regulator: shutting down unused regulators > regulator: shutting down vcc_sd... busy > done. > uhub0: 1 port with 1 removable, self powered > uhub2: 2 ports with 2 removable, self powered > uhub3: 1 port with 1 removable, self powered > uhub1: 1 port with 1 removable, self powered > ugen4.2: <Samsung PSSD T7 Touch> at usbus4 > umass0 on uhub2 > umass0: <Samsung PSSD T7 Touch, class 0/0, rev 3.20/1.00, addr 1> on usbus4 > umass0: SCSI over Bulk-Only; quirks = 0x0000 > umass0:0:0: Attached to scbus0 > pass0 at umass-sim0 bus 0 scbus0 target 0 lun 0 > pass0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device > pass0: Serial Number REPLACED > pass0: 400.000MB/s transfers > da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 > da0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device > da0: Serial Number REPLACED > da0: 400.000MB/s transfers > da0: 953869MB (1953525168 512 byte sectors) > da0: quirks=0x2<NO_6_BYTE> > da0: Delete methods: <NONE(*),ZERO> > random: unblocking device. > > No more output after that. As for why MMC_ERR_FAILED results, the following code diff is intended to suggest what I think may be incomplete about sticking to what the device-specific code supports vs. does not support (not supporting HS200 here). The code does compile in my context but is untested. The email handling may mess up some leading whitespace --but, again, I'm only trying to suggest a type of change. # git -C /usr/main-src/ diff /usr/main-src/sys/dev/mmc diff --git a/sys/dev/mmc/mmc.c b/sys/dev/mmc/mmc.c index 9c73dfd57ce0..dffd1c382684 100644 --- a/sys/dev/mmc/mmc.c +++ b/sys/dev/mmc/mmc.c @@ -59,6 +59,7 @@ __FBSDID("$FreeBSD$"); #include <sys/param.h> #include <sys/systm.h> #include <sys/kernel.h> +#include <sys/kobj.h> #include <sys/malloc.h> #include <sys/lock.h> #include <sys/module.h> @@ -1512,6 +1513,8 @@ mmc_timing_to_string(enum mmc_bus_timing timing) static bool mmc_host_timing(device_t dev, enum mmc_bus_timing timing) { + kobjop_desc_t kobj_desc; + kobj_method_t *kobj_method; int host_caps; host_caps = mmcbr_get_caps(dev); @@ -1543,14 +1546,37 @@ mmc_host_timing(device_t dev, enum mmc_bus_timing timing) case bus_timing_mmc_ddr52: return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_DDR52)); case bus_timing_mmc_hs200: - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_120) || - HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_180)); case bus_timing_mmc_hs400: - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_120) || - HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_180)); case bus_timing_mmc_hs400es: - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400 | - MMC_CAP_MMC_ENH_STROBE)); + /* + * Disable eMMC modes that require use of + * MMC_SEND_TUNING_BLOCK_HS200 to set things up if either the + * tune or re-tune method is the default NULL implementation. + */ + kobj_desc = &mmcbr_tune_desc; + kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, + kobj_desc); + if (kobj_method == &kobj_desc->deflt) + return (false); + kobj_desc = &mmcbr_retune_desc; + kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, + kobj_desc); + if (kobj_method == &kobj_desc->deflt) { + return (false); + } + + /* + * Otherwise track the host capabilities. + */ + if (timing == bus_timing_mmc_hs200) + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_120) || + HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_180)); + if (timing == bus_timing_mmc_hs400) + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_120) || + HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_180)); + if (timing == bus_timing_mmc_hs400es) + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400 | + MMC_CAP_MMC_ENH_STROBE)); } #undef HOST_TIMING_CAP In other words: have mmc_host_timing avoid returning true for some combinations that definitely do not have sufficient software support present at the time. (So far as I can tell, the rk3328's get the NULL-implementations as things are.) I expect that this sort of thing would go back to using MMC_CAP_MMC_DDR52 for the rk3328's, as an example. Working, but in a slower mode, the same mode as FreeBSD was previously using. A possible incompleteness in the suggestion is that there is also a drive-strength setting involved. If that also had "kobj" interfacing and NULL-implementation possibilities, then in the future there would be more to test for possibly forcing return-false than I did above. Hopefully this sort of thing would help, possibly more than just for rk3328's. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)