[patch] DTrace disk IO

George Neville-Neil gnn at neville-neil.com
Fri Sep 28 20:16:25 UTC 2012


On Sep 17, 2012, at 04:00 , Matt Burke <mattblists at icritical.com> wrote:

> I've recently been trying to get a grip on measuring disk IO latency (per transaction), and have found it to be rather difficult given the asynchronous nature of the beast, and also I can't find a way of translating the bio start of transaction timestamps to anything I can use in DTrace when pulling them out.
> 

Interesting.  Have you looked at all at the I/O module I added in HEAD?

Best,
George

> So I knocked up this little patch against releng/9.1 to put a couple of DTrace probes in the right places to pick up crucial data like the now+then timestamps while they're present.
> 
> The predicate on the probe is needed to pick up the right firing - for reasons I've not been able to fathom because gstat et al give correct data, devstat_end_transaction is called multiple times for a given operation - from g_disk_done(), then g_io_deliver() - without anything useful in the bio struct (device name, number, etc). There also seem to be a lot of firings coming from the following path which I don't understand, again without anything useful in the bio:
>              kernel`devstat_end_transaction+0x13b
>              kernel`g_io_deliver+0x1b0
>              kernel`g_io_schedule_up+0xa6
>              kernel`g_up_procbody+0x5c
>              kernel`fork_exit+0x11f
>              kernel`0xffffffff80c1c3fe
> 
> Catching flushes is also proving problematic. It would seem that devstat_end_transaction_bio() is called, but the bio and devstat structs are virtually empty. bp->bio_dev, bp->bio_disk, ds->device_name, ds->device_number, ds_unit_number are all null/empty, so I know that one disk has flushed, and I know how long it took, but I can't find out which disk it was.
> 
> 
> Thoughts?
> 
> 
> 
> Index: sys/kern/subr_devstat.c
> ===================================================================
> --- sys/kern/subr_devstat.c	(revision 240481)
> +++ sys/kern/subr_devstat.c	(working copy)
> @@ -29,6 +29,7 @@
> #include <sys/cdefs.h>
> __FBSDID("$FreeBSD$");
> 
> +#include "opt_kdtrace.h"
> #include <sys/param.h>
> #include <sys/kernel.h>
> #include <sys/systm.h>
> @@ -41,9 +42,22 @@
> #include <sys/conf.h>
> #include <vm/vm.h>
> #include <vm/pmap.h>
> +#include <sys/sdt.h>
> 
> #include <machine/atomic.h>
> 
> +SDT_PROVIDER_DEFINE(devstat);
> +SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction, stat, stat);
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 0, "struct devstat *");
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 1, "uint32_t");
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 2, "struct bintime *");
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 3, "struct bintime *");
> +SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction_bio, stat, stat);
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, stat, 0, "struct devstat *");
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, stat, 1, "struct bio *");
> +
> +
> +
> static int devstat_num_devs;
> static long devstat_generation = 1;
> static int devstat_version = DEVSTAT_VERSION;
> @@ -312,6 +326,8 @@
> 
> 	ds->end_count++;
> 	atomic_add_rel_int(&ds->sequence0, 1);
> +
> +	SDT_PROBE(devstat, subr_devstat, devstat_end_transaction, stat, ds, bytes, now, then, 0);
> }
> 
> void
> @@ -332,6 +348,8 @@
> 	else 
> 		flg = DEVSTAT_NO_DATA;
> 
> +	SDT_PROBE(devstat, subr_devstat, devstat_end_transaction_bio, stat, ds, bp, 0, 0, 0);
> +
> 	devstat_end_transaction(ds, bp->bio_bcount - bp->bio_resid,
> 				DEVSTAT_TAG_SIMPLE, flg, NULL, &bp->bio_t0);
> }
> 
> 
> 
> 
> Sample dtrace script:
> =====================
> 
> BEGIN
> {
>        bio_cmds[1] =  "READ";
>        bio_cmds[2] =  "WRITE";
>        bio_cmds[4] =  "DELETE";
>        bio_cmds[8] =  "GETATTR";
>        bio_cmds[16] = "FLUSH";
> }
> 
> devstat::devstat_end_transaction_bio:
> {
>        self->bio = args[1];
> }
> 
> devstat::devstat_end_transaction:
> /self->bio && args[0]->device_number/
> {
>        diff_frac = args[2]->frac - args[3]->frac;
>        diff_ufrac = (diff_frac < 0) ? (args[3]->frac - args[2]->frac) : diff_frac;
>        diff = (1000000000*(diff_ufrac>>32))>>32;
> 
>        printf("%d\t%s%d\t%s\t%d\t0.%09d\n", timestamp,
>                args[0]->device_name, args[0]->unit_number,
>                bio_cmds[self->bio->bio_cmd],
>                args[1],
>                diff
>                );
> }
> 
> 
> -- 
> The information contained in this message is confidential and intended for the addressee only. If you have received this message in error, or there are any problems with its content, please contact the sender. 
> 
> iCritical is a trading name of Critical Software Ltd. Registered in England: 04909220.
> Registered Office: IC2, Keele Science Park, Keele, Staffordshire, ST5 5NH.
> 
> This message has been scanned for security threats by iCritical. www.icritical.com
> 
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"



More information about the freebsd-fs mailing list