[PATCH] Netdump for review and testing -- preliminary version
Attilio Rao
attilio at freebsd.org
Wed Oct 13 11:04:57 UTC 2010
2010/10/9 Robert Watson <rwatson at freebsd.org>:
> On Fri, 8 Oct 2010, Attilio Rao wrote:
>
>>> GENERAL FRAMEWORK ARCHITECTURE
>>>
>>> Netdump is composed, right now, by an userland "server" and a kernel
>>> "client". The former is run on the target machine (where the dump will
>>> phisically happen) and it is responsible for receiving the packets
>>> containing coredumps frame and for correctly writing them on-disk. The
>>> latter is part of the kernel installed on the source machine (where the dump
>>> is initiated) and is responsible for building correctly UDP packets
>>> containing the coredump frames, pushing through the network interface and
>>> routing them appropriately.
>
> Hi Attilio:
>
> Network dumps would be a great addition to the FreeBSD debugging suite! A
> few casual comments and questions, I need to spend some more time pondering
> the implications of the current netdump design later in the week.
>
> (1) Did you consider using tftp as the network dump protocol, rather than a
> custom protocol? It's also a simple UDP-based, ACKed file transfer
> protocol, with the advantage that it's widely supported by exiting tftp
> daemons, packet sniffers, security devices, etc. This wouldn't require
> using a stock tftpd, although that would certainly be a benefit as well.
> The post-processing of crashdumps that seems to happen in the netdump
> server could, presumably, happen "offline" as easily...?
I don't understand the "offline" question but, really, I don't know
why tftp wasn't considered in the first place.
Let me do some small search and see how much we could benefit from it.
> (2) Have you thought about adding a checksum to the dump format, since
> packets are going over the wire on UDP, etc? Even an MD5 sum wouldn't be
> too hard to arrange: all the code is in the kernel already, requires
> relatively little state storage, and is designed for streamed data.
Someone else already brought this point and I agree, we could use a
checksum here.
> (3) As the bounds checking/etc behavior in dumping grows more complex, it
> seems a shame to replicate it in architecture-specific code. Could we pull
> the mediaoffset/mediasize checking into common code? Also, a reserved
> size/offset of "0" meaning "no limit" sounds slightly worrying; it might be
> slightly more conservative to add a flags field to dumperinfo and have a
> flag indicating "size is no object" for netdump, rather than overloading the
> existing fields.
I don't agree with you here.
The real problem is that dumpsys is highly disk-specific (I've
commented about it somewhere, maybe the e-mail or maybe the commit
logs).
What we'd need is to have something like netdumpsys() which tries to
use network, but it is not short to make and the mediasize+mediaoffset
concept really rappresents an higher bound which can't be 0. I think
it is a reasonable compromise so far but it is subjected to further
improvements for sure.
> Some specific patch comments:
>
> + * XXX: This should be split into machdep and non-machdep parts
>
> What MD parts are in the file?
This is just a stale comment.
> The sysctl parts of the patch have a number of issues:
Sysctl are still not overhauled just because I'm not sure we want to
keep them. That is one of the points I raised in the main e-mail and
I'd really would like to hear opinions about if we should keep them
rather than having a setup userland process, etc.
I'm sorry about this, but please keep in mind that the file still
needs a lot of cleanup so some comments are a bit out of date and
other parts may not be still perfectly overhauled.
> +sysctl_force_crash(SYSCTL_HANDLER_ARGS)
>
> Does this belong in the netdump code? We already have some of these options
> in debug.kdb.*, but perhaps others should be added there as well.
For this specific case, I'd really axe it out rather.
> + /*
> + * get and fill a header mbuf, then chain data as an
> extended
> + * mbuf.
> + */
> + MGETHDR(m, M_DONTWAIT, MT_DATA);
>
> The idea of calling into the mbuf allocator in this context is just freaky,
> and may have some truly awful side effects. I suppose this is the cost of
> trying to combine code paths in the network device driver rather than have
> an independent path in the netdump case, but it's quite unfortunate and will
> significantly reduce the robustness of netdumps in the face of, for example,
> mbuf starvation.
I'm not sure in which other way we could fix that actually. Maybe a
pre-allocated pool of mbufs?
> + if (ntohs(ah->ar_hrd) != ARPHRD_ETHER &&
> + ntohs(ah->ar_hrd) != ARPHRD_IEEE802 &&
> + ntohs(ah->ar_hrd) != ARPHRD_ARCNET &&
> + ntohs(ah->ar_hrd) != ARPHRD_IEEE1394) {
> + NETDDEBUG("nd_handle_arp: unknown hardware address fmt "
> + "0x%2D)\n", (unsigned char *)&ah->ar_hrd, "");
> + return;
> + }
>
> Are you sure you don't want to just check for ETHER here?
I'd really like to hear Ryan's or Ed's idea here.
> + /* XXX: Probably should check if we're the recipient MAC address */
> + /* Done ethernet processing. Strip off the ethernet header */
>
> Yes, quite probably. What if you panic in promiscuous mode?
>
> + /*
> + * The first write (at offset 0) is the kernel dump header. Flag it
> + * for the server to treat specially. XXX: This doesn't strip out
> the
> + * footer KDH, although it shouldn't hurt anything.
> + */
>
> The footer allows us to confirm that the tail end of a dump matches the
> beginning; while probably not strictly necessary in this scenario, it's not
> a bad idea given the lack of a checksum.
So I assume you are in favor of it, right?
> + /* Default the nic to the first available interface */
> + if ((ifn = TAILQ_FIRST(&ifnet)) != NULL) do {
> + if ((ifn->if_flags & IFF_UP) == 0)
> + continue;
>
> More locking needed.
Please refer to the second patch I posted.
It should have proper locking and actually this code is just trimmed
(more locking in V_ifnet accessings, but not in this codepath).
>
> - void *if_pspare[7];
> + struct netdump_methods *if_ndumpfuncs; /* netdump virtual methods
> */
> + void *if_pspare[6];
> int if_ispare[4];
>
> In HEAD, at least, you should add your field and not use the spare. The
> spare should only be used on a merge to a KBI-stable branch (such as 8.x).
Thanks, for picking this.
> I need to ponder your changes to ifnet and to the drivers some more; I
> recognize the benefits of maximal code alignment, but I worry a lot about
> these code paths having side effects (not least, calls into memory
> allocators, which in turn can enter VM, etc). I wonder if, given that, we
> need to teach the mbuf allocator to be much more conservative if netdumping
> is active...
Sorry, which calls from drivers should get in the memory allocator?
Are you just referring to the mbuf headers allocation?
Changes to the drivers are mostly stright-forward -- they just try to
do polling in locked or unlocked mode, following DDB entering or not
(basically how other DDB-agnostic routines already do in FreeBSD for
the known locking problems we discussed several times in the past).
Thanks for your valuable feedback, looking forward to hear more.
Attilio
--
Peace can only be achieved by understanding - A. Einstein
More information about the freebsd-net
mailing list