RFC: Changes in DTrace to allow for distributed operation

Tue Jan 3 18:01:48 UTC 2017

Hello,

> I think some concrete examples of what you'd like to be able to
> accomplish with this would help a lot. Is the goal here to be able to
> trace multiple independent systems at once and somehow aggregate their
> trace records? That's what "distributed tracing" suggests to me, but
> from reading your email it seems as though you're primarily interested
> in some form of remote tracing whereby one could execute something like
> 
> # dtrace -n 'fbt$guest::kern_ioctl:entry'

The ultimate goal is indeed to have multiple DTrace instances cooperate and
trace multiple independent systems at once. With the proposed implementation,
remote tracing would implicitly be implemented, as one could easily attach to a
remote machine that one has access to and trace it.

The reasoning behind doing this in DTrace itself, as opposed to a daemon, is
that global state could then easily be kept in the scripts themselves. We could
use all the types of variables that we have now. In addition to that,
instance-local variables could be added, allowing for a concise way to
represent per-instance aggregation.

One could create scripts that allow certain probes to fire only if a certain
condition has been satisfied on another instance. This could aid in debugging
interactions between pieces of software, such as a web server and a database
on two different machines, while giving one full visibility of the events
taking place in both instances.

The scripts would be written in the following way:

# dtrace -n 'instance:provider:module:function:name'

for a number of reasons:

(1) It allows for tracing of multiple providers in an arbitrary number of
different OS instances

(2) With these semantics, even userland probes can be traced. For example, one
could write:

# dtrace -n 'vm-[a-zA-Z0-9]:pid$target:libc:memcpy:entry'

resulting in each of the userland probes in each of the instances (not
necessarily virtual machines) that match the glob being installed and reporting
back to the host.
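
To illustrate the instance matching described above, here is a minimal Python
sketch. It is not the DTrace implementation; the function names are
hypothetical, and it assumes that the instance field follows shell-style glob
rules (approximated here with Python's fnmatch) and that an empty instance
field matches every instance, as empty fields do in existing probe
descriptions.

```python
from fnmatch import fnmatchcase


def split_probe(desc):
    """Split 'instance:provider:module:function:name' into its five fields."""
    fields = desc.split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 colon-separated fields, got %d" % len(fields))
    return tuple(fields)


def matching_instances(desc, instances):
    """Return the instance names selected by the first field of desc."""
    instance_glob = split_probe(desc)[0]
    # Assumption: an empty instance field acts as a wildcard.
    if instance_glob == "":
        return list(instances)
    return [name for name in instances if fnmatchcase(name, instance_glob)]


known = ["vm-a", "vm-7", "vm-10", "db-1"]
selected = matching_instances("vm-[a-zA-Z0-9]:pid$target:libc:memcpy:entry", known)
```

Note that a bracket expression matches exactly one character, so under these
assumptions 'vm-[a-zA-Z0-9]' selects vm-a and vm-7 but not vm-10; a pattern
such as 'vm-*' would be needed to select all three.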

> on a hypervisor and get records (in real time?) from a guest. Assuming
> I'm not completely off-base, this is a cool idea, but I think your
> objective needs to be more clearly defined before it's possible to
> evaluate the merits of different designs, especially when you're
> proposing adding new concepts to the core DTrace code.

In the case of a hypervisor, this approach allows for real time reporting of
events using the hypercall ABI that I've implemented for that purpose[1]. This
would also allow for destructive actions to be implemented in the case of
virtual machines. This would prove to be more problematic for remote machines,
as we don't want to wait for the host's response (which could take a very long
time over the network) in order to do something.

In such a case, the host could, during initialization, ask the guest DTrace to
perform a certain action once a probe fires. This of course requires one to
trust that DTrace instance to do the right thing.
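
The idea of handing actions to the guest ahead of time can be sketched roughly
as follows. This is only a Python model of the control flow, not code from the
PoC: the GuestInstance class, its methods, and the "stop" action are all
hypothetical names chosen for illustration.

```python
class GuestInstance:
    """Hypothetical guest-side DTrace instance; not part of any real API."""

    def __init__(self):
        self.actions = {}  # probe description -> callable installed by the host
        self.log = []      # records to be reported back to the host later

    def install(self, probe, action):
        # Called once during initialization, at the host's request.
        self.actions[probe] = action

    def fire(self, probe):
        # Runs locally when the probe fires, with no round trip to the host.
        action = self.actions.get(probe)
        if action is None:
            return None
        result = action(probe)
        self.log.append((probe, result))
        return result


guest = GuestInstance()
guest.install("fbt::kern_ioctl:entry", lambda probe: "stop")
outcome = guest.fire("fbt::kern_ioctl:entry")
```

The guest decides and acts on its own once the probe fires, and the host only
sees the logged record afterwards, which is why the host has to trust the
guest instance.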

Some of these changes already exist and are being implemented in my GitHub
repository[2]. I'm currently working on a PoC that allows for some of the
simpler things to operate, which would allow for benchmarks and stress testing
of the core concept.

I hope that's cleared up some of the vagueness regarding the problem I am trying
to solve.

[1] https://reviews.freebsd.org/D8100
[2] https://github.com/dstolfa/freebsd/tree/dtrace-vm

-- 
Best regards,
Domagoj Stolfa