DTrace QuickSheet
Version 1.2.0
Note: Due to the HTML version and formatting of this QuickSheet, it may be necessary to increase your browser window size to prevent line wrapping in the code samples.
Overview

The dtrace utility is a highly customizable tool that gives powerful insight into the internal operations of a running system. dtrace uses the D language to define what information is desired, how it is processed, and how it will be formatted. It is available on a number of operating systems (examples here are from OS X 10.5 and Solaris 10).

The D language is a description of what events should be watched (called probes) and what information should be pulled, managed, and displayed (called clauses). It is not a procedural language: it does not have functions or a predictable program flow, but is instead defined by probes that fire in response to external events specified through the D language. While test clauses (predicates) and code blocks (actions) maintain some similarities to C, the language is more about defining the conditions under which probes will fire and what should be done as each probe fires.

One consequence of this (out of order execution) is that probe blocks can be placed within the D program in virtually any order. Unlike a C program or a shell script, a D program has no pre-defined flow. Instead the "flow" of the application is driven by external applications that trigger probes that dtrace is watching. (Note: Multiple clauses for the same probe will fire in the order they appear in the D script. This can be used to introduce a "flow" and conditional execution of some instructions in a dtrace script.)

Syntax

Probe clause (block) format
Probe description

The probe description is a definition of an "event" that will cause the clause to "fire". A probe description can define something like a system call (syscall), a timer (profile), or other performance related event. A full description has four sections (provider:module:function:name). Leading sections can be omitted, in which case the sections that are given are matched from the right, so the following are equivalent:

    dtrace:::BEGIN
    :::BEGIN
    BEGIN

Multiple probes can be specified by explicitly listing them on multiple lines separated by commas, by leaving sections blank (so they will match all), or by using a wild-card pattern match much like file name globbing on the shell command line.
Predicate

The predicate is an optional, additional test that can be placed upon the probe to limit the execution of the actions section to a preferred subset of instances. A predicate in D will look familiar to anyone who has ever written a shell test or a C expression. D is more like shell in that it allows comparisons of strings using an equality test ("==") and more like C in that it allows the use of concepts like the increment operator ("++"). Below is an example of a predicate that increments a counter and exits after 60 seconds (the firing probe in this case is a 1 second timer).

    profile:::tick-1sec
    /i++ >= 59/
    {
        exit(0);
    }

Actions

Actions are commands that should be executed when the probe fires and the predicate (if it exists) evaluates to true. This block of "code" is encased in { } braces and generally follows C syntax.

Hello World!

Hello World! in dtrace serves to introduce a few D concepts but is a bit misleading as it suggests that D is an imperative language. The point here is that this example should be considered for its syntax and layout but not its purpose. In this example a single D probe clause, which resembles a block of C code, contains some familiar functions. The key difference is that the probe clause is event driven and is not part of a sequence of instructions that flow from start to finish. We defeat this behavior by putting all our instructions into the BEGIN clause and then calling exit() from the BEGIN clause. This has the effect of collapsing the probe clause into a single instruction / code block.

In the Hello World example we have defined a single probe that fires upon start (BEGIN), displays some text (printf()), and then exits. The BEGIN probe is a pseudo-probe in that it is not driven by externally watched processes but is provided by dtrace itself and always fires when the script is first run.

    #!/usr/sbin/dtrace -s

    /* Don't print all the garbage Dtrace prints */
    #pragma D option quiet

    /* Upon starting... */
    dtrace:::BEGIN
    {
        /* ...formatted print of familiar string... */
        printf("Hell of a world!\n");
        /* ...and exit(); */
        exit(0);
    }

The above D language sample can be saved to a file (HelloWorld.d), set executable, and run on a dtrace enabled system. Like most "Hello World!" examples it is not very useful in itself and serves only as an introduction to the language.
Providers

Providers are "modules" or "listeners" that provide specific types of probes that dtrace can be told to watch for. As seen in the Hello World! example the dtrace provider provides a "pseudo-probe" that will fire upon the start of any dtrace session. This probe is frequently used to initialize variables and provide any setup required for the D script.
Provider Arguments

Arguments to probes can take two different formats. The args[x] format is typed based upon the individual probe while argx is always a 64-bit int. The following examples show how each is used. Note that these are not arguments that you provide, as with a standard API, but are provided to you by the probe as additional insight into the event the probe signals, such as syscall / function arguments, return values, or system statistics.

This is an example of a simple io probe that accesses all three structs passed in the io probe args[] parameters. (Note: The ?: clause is wrapped for space. See the "Code Snippets" section below for a variation and additional explanation of this example.)

    args[0] is bufinfo_t struct
    args[1] is devinfo_t struct
    args[2] is fileinfo_t struct

    io:::done
    {
        printf("%d bytes : %s\n", args[0]->b_bcount,
            (args[2]->fi_pathname == "<none>") ?
                args[1]->dev_pathname : args[2]->fi_pathname);
    }

The following example includes two probes that use the argX arguments. In the syscall::read:entry probe arg0, arg1, and arg2 are the three arguments to the read() system call. We are interested in arg1, the pointer to the buffer that read() will fill. The buffer does not hold the new data until read() returns, so the entry probe only saves the pointer in a thread-local variable (self->buf; thread-local variables are covered in the Variables section). In the syscall::read:return probe we check arg0 for a return value of 1 from read() (meaning 1 byte was read, an expected value in this case) and then copyin() 1 byte from the application address space to the dtrace address space. (Pointers are relative to the application address space and are not valid locally unless explicitly translated with a function like copyin().) An expanded version of this script is included in the "Code Snippets" section later in this page.

    syscall::read:entry
    /execname == "passwd"/
    {
        /* remember where read() will put the data */
        self->buf = arg1;
    }

    syscall::read:return
    /execname == "passwd" && arg0 == 1 && self->buf != 0/
    {
        /* the buffer is filled now, so copy the byte in */
        got = *(char *)copyin(self->buf, 1);
        printf("%c", got);
        self->buf = 0;
    }
Variables

The D language supports two basic variable types: scalars (integers, pointers) and associative arrays (a dynamic, keyed array). Scalars can be declared as explicit types, but will assume a type based upon the initial use. (Unlike C, variables do not need to be declared prior to use in D.)

Global variables can be explicitly declared and typed (int, float, double, etc...) outside of probe clauses (blocks), typically at the top of the D file. Unlike C, D global variables cannot be (explicitly) initialized to a value when they are declared (they are initialized to 0 by default). Variables can be initialized to a specific value in a BEGIN block.

Each probe provides a number of variables specific to the probe such as pid and execname. These variables are local to the probe clause (block) and will be set according to the kind of probe and what fired it.

(Reserved) Probe variables

The following variables are (an incomplete list of) pre-defined variables that are probe (firing) specific and available within the probe predicate or actions (code) block.
The key difference between arg0 and args[0] is that arg0 is always the same type (64 bit int) while args[0] typically refers to a struct that will vary based upon the firing probe. The 64 bit int of arg0 may be an integer or a pointer depending upon the probe that fires. The definitions of the providers and specific probes will define which of these to use and what data to expect inside.

DTrace variables (macros):

Macros are variables that are relative to the dtrace process (script) and not the probe or other variable values.
Macros can be explicitly cast to a string by preceding them with double "$" characters. For example if $1 is the value 123:

    $1 will be the integer 123
    $$1 will be the string "123"

Variable type examples:

In the following (nonsensical) code snippet:

    pid is the pid of the process firing the read probe
    $pid is a macro evaluating to the pid of the dtrace process
    $1 is a macro containing the first argument to the dtrace script. As $1 it is treated as a number; as $$1 it is referenced as a string.
    execname is the name of the executable file firing the read probe
    re and rr are dtrace global variables

    syscall::read:entry
    /pid != $pid/
    {
        /* this read() entry is NOT from dtrace */
        re++;
    }

    syscall::read:return
    /execname == $$1/
    {
        /* this read() return is a specific process */
        rr++;
    }

Arrays

Arrays in the D language are not (necessarily) keyed on indexes like they are in C, but can be keyed on multiple different data types such as numbers, strings, or a mixture of each. Another difference from C is that arrays in D are dynamic and do not explicitly need to be allocated to a particular size.
Aggregations

Aggregations allow for collection and management of large amounts of data. Aggregating functions will coalesce this data into a manageable set. Aggregations are printed by default on exit or can be explicitly printed using the printa() function.

Collect how many times write was called by process name

    syscall::write:entry
    {
        @NumWrtByExe[execname] = count();
    }

This is the total of all writes by process name and pid

    syscall::write:return
    {
        @TotalWrtByPIDExe[pid, execname] = sum(arg0);
    }

This is the average write size by process name

    syscall::write:return
    {
        @AvgWrtByExe[execname] = avg(arg0);
    }

Get aggregated numbers for a single process

    syscall::write:return
    /pid == $1/
    {
        @minw = min(arg0);
        @maxw = max(arg0);
        @avgw = avg(arg0);
    }

    END
    {
        printa("Writes: min: %@d max: %@d avg: %@d\n", @minw, @maxw, @avgw);
    }

Get a distribution of write sizes for all runs of ls

    syscall::write:return
    /execname == "ls"/
    {
        @DistWrite = quantize(arg0);
    }

Print a distribution of write calls over time for a PID

    BEGIN
    {
        beginsec = timestamp / 1000000000;
        i = 0;
    }

    syscall::write:entry
    /pid == $1/
    {
        nowsec = timestamp / 1000000000;
        @TimeDistWrite = lquantize(nowsec - beginsec, 1, 60, 1);
    }

    profile:::tick-1sec
    /i++ >= 59/
    {
        exit(0);
    }

Thread and clause local variables

Thread local variables are local to the thread that fired the probe and are denoted with the self-> prefix. In the following example, the thread local variable ensures that the timestamp read in the return probe belongs to the same read() call and not to another thread's read() entry:

    syscall::read:entry
    {
        self->stime = timestamp;
    }

    syscall::read:return
    /self->stime != 0/
    {
        printf("%s read() %d nsecs\n", execname, timestamp - self->stime);
    }

Clause local variables are local to a clause (block) and are denoted with the this-> prefix.

Application space pointers

Pointers to application data structures are valid in the address space of the application being watched and not in dtrace. Attempts to de-reference such a pointer (access the data) directly will therefore refer to an address in the application and not to one dtrace can read.
It is necessary to copy the item into the dtrace environment so it can be used. In the following example the execname variable and arg0 both are (effectively) strings. execname is a string data type local to dtrace while arg0 is a pointer to an array of characters used in the open() call. The pointer arg0 is a pointer for the application and is invalid in dtrace. For this reason we use copyinstr(arg0) to convert it to a local string variable as well.

    syscall::open:entry
    {
        printf("%s open() by %s\n", copyinstr(arg0), execname);
    }

Kernel variables

DTrace can reference kernel variables by prefixing the variable name with a backquote (`) character.

    trace(`kmem_flags);

Constants

Constants are declared in D using the "inline" keyword. The key difference between inline variables and a C #define is that constants in D have a type. The following code sample is an example of a constant declaration and use in D.

    inline int MAX_VALUE = 10;

    BEGIN
    {
        myvalue = MAX_VALUE;
    }
dtrace command line

List all probes

    dtrace -l

Run quiet

    dtrace -q

Print version, exit

    dtrace -V

Print all probes provided by the syscall provider.

    dtrace -l -P syscall

Print all entry probes provided by the syscall provider (1/2 of previous)

    dtrace -l -n syscall:::entry

Print just the syscall names from the previous list

    dtrace -l -n syscall:::entry | awk '{ print $3 }'

Create a simple aggregation from the command line. (Note: The single @ is an un-named / default aggregation. When Ctrl-C is used to exit, this aggregation will be traced (printed).)

    dtrace -n 'syscall:::entry { @ = count(); }'

Trace all open()s using the ID of the probe. (Note: The probe ID will vary; this one is from the OS X version. The default action (when none is explicitly specified) is to trace the syscall.)

    dtrace -i 17602

Count all open()s using the ID of the probe. (Note: The probe ID will vary; this one is from the OS X version. The @ aggregation will be printed on exit by default.)

    dtrace -i '17602 { @ = count(); }'

Scripting

DTrace scripts can be written like "normal" Unix scripts in that they can be started with a shebang (#!) and the path to the dtrace interpreter on the first line. This makes the script a "pure" D script. The benefit of this is simplicity, but it comes at the expense of some flexibility such as parameter checking and extra functionality. One alternative is to create a shell wrapper that includes D code that is (potentially modified and) passed to the dtrace binary.

In the following (shell script) example D code is generated on the fly. The ${PID} variable will be substituted with a literal value when the D script is written to disk and run. In this (incomplete) shell script the ${PID} variable can be generated and validated by a shell function or system tool and then inserted into the D script as a literal rather than a parameter. This also allows checking for such items as "--help" and "-h" on the script command line.
The end result is the ability for a D script to respond to user requests and perform error checking more like what users expect from similar utilities. This allows you to use D as a method to create standardized tools with a more user friendly experience and output.

    PID=`getmypid`

    cat > /tmp/dtracetmp.$$ <<EOF
    #!/usr/sbin/dtrace -s

    syscall:::entry
    /pid == ${PID}/
    {
        @[probefunc] = count();
    }
    EOF

    dtrace -s /tmp/dtracetmp.$$

Options

Options can be declared in a dtrace script using the #pragma declaration or by passing them as arguments to the interpreter (using the #!/usr/sbin/dtrace method). Most scripts here, specifically those with explicit printf() statements, assume the following "quiet" option:

    #pragma D option quiet

Many examples elsewhere assume that the quiet option is not enabled and will produce significantly different results. One key example is an empty action block. With the quiet option, dtrace will print nothing from an empty block as nothing is explicitly told to print. Without the quiet option, an empty block will print the name and ID of the firing probe as well as the CPU it fired on.

Scripts calling destructive functions (those capable of modifying the system or calling other binaries) need to use the "destructive" option:

    #pragma D option destructive

Comments

Comments in D follow the same rules as C. The following code example shows a valid comment (and how not to comment).

    /* This is a valid D comment */
    // This will cause a compiler error in D.

Portability

I have successfully run and tested D scripts between Solaris and OS X. This does not mean that they are 100% compatible, as probes differ between the two platforms. D scripts are not compatible with ProbeVue (on AIX).
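The shell wrapper idea from the Scripting section can be sketched a little further. The following is only a minimal illustration, not the author's tool: the helper names (usage, is_valid_pid, emit_d_script, build_script) and the script name syscount.sh are hypothetical, and the generated D program is the same syscall count used in the example above. The functions only build the D text; running it is left to the caller.

```shell
#!/bin/sh
# Sketch of a wrapper that checks its argument before generating D code.
# All function names and "syscount.sh" are hypothetical, chosen for illustration.

usage() {
    echo "Usage: syscount.sh <PID>"
}

# Succeeds only if $1 is a non-empty string of decimal digits.
is_valid_pid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# Writes the D program to stdout with the PID inserted as a literal.
emit_d_script() {
    cat <<EOF
#pragma D option quiet

syscall:::entry
/pid == $1/
{
    @[probefunc] = count();
}
EOF
}

# Handles -h / --help and bad input, otherwise prints the generated D program.
build_script() {
    case "$1" in
        -h|--help) usage; return 0 ;;
    esac
    if ! is_valid_pid "$1"; then
        usage >&2
        return 1
    fi
    emit_d_script "$1"
}

# Typical use on a dtrace enabled system (not executed here):
#   build_script "$1" > /tmp/dtracetmp.$$ && dtrace -s /tmp/dtracetmp.$$
```

Keeping the generation step separate from the call to dtrace means the argument checks can be exercised without root privileges, and a bad PID never reaches the D compiler.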
DTrace functions
Destructive calls require the destructive option to be set. One method of doing this is using the #pragma D option destructive pragma.

if-then

DTrace allows for a simple if-then decision using the ? : syntax from the C language. This does not allow advanced flow control but can be used to conditionally set a value. For example:

    mystring = (myvalue > 0) ? "Some" : "None";

Simple flow control can be achieved using multiple probes (of the same probe identifier) with different predicates / actions. The probes will be processed in the order they appear in the DTrace script. (An example is available in the Code Snippets section.)
Code snippets

Note: Code may wrap in the two column format. It may be necessary to resize the browser to see code formatted correctly.

60 second count down timer

    profile:::tick-1sec
    /i++ >= 59/
    {
        exit(0);
    }

Print out kernel variables (and exit)

    BEGIN
    {
        printf("rlim_fd_cur = %d\n", `rlim_fd_cur);
        printf("rlim_fd_max = %d\n", `rlim_fd_max);
        exit(0);
    }

Timestamp beginning and end of dtrace session

    BEGIN
    {
        printf("dtrace started: %Y\n", walltimestamp);
    }

    END
    {
        printf("dtrace ended: %Y\n", walltimestamp);
    }

List every exec()ed process

    proc:genunix::exec
    {
        printf("%s(%d) by %s(%d)\n", basename(args[0]), pid, execname, ppid);
    }

Run a command using the system() call

Note: This simply calls kill on itself.

    #pragma D option destructive

    BEGIN
    {
        system("kill %d", pid);
    }

Print syscall stats for a PID using vmstat-like timing parameters

Note: The defaultargs option is used. This sets all unassigned parameters ($n macros) to 0 or "". The full text of the dtrace script is included here to show the option usage.

    #!/usr/sbin/dtrace -s
    #pragma D option quiet
    #pragma D option defaultargs

    /* Counts all the syscalls (by PID) iteratively
       $0 <PID> [delay] [count] */

    dtrace:::BEGIN
    {
        dlymax = ($2 > 0) ? $2 : 1;
        cntmax = ($3 > 0) ? $3 : -1;
        dlycnt = 0;
        cntcnt = 0;
    }

    syscall:::entry
    /pid == $1/
    {
        @syscalls[probefunc] = count();
    }

    /* pre-increment so the report fires every dlymax ticks, not dlymax + 1 */
    profile:::tick-1sec
    /++dlycnt >= dlymax/
    {
        printa(@syscalls);
        clear(@syscalls);
        dlycnt = 0;
        cntcnt++;
    }

    profile:::tick-1sec
    /cntcnt == cntmax/
    {
        exit(0);
    }
Place uptime-like display in output (Here put in a BEGIN probe)

    BEGIN
    {
        load1_i = `hp_avenrun[0] / 65536;
        load1_f = ((`hp_avenrun[0] % 65536) * 100) / 65536;
        load5_i = `hp_avenrun[1] / 65536;
        load5_f = ((`hp_avenrun[1] % 65536) * 100) / 65536;
        load15_i = `hp_avenrun[2] / 65536;
        load15_f = ((`hp_avenrun[2] % 65536) * 100) / 65536;
        printf("load average: %d.%02d, %d.%02d, %d.%02d\n",
            load1_i, load1_f, load5_i, load5_f, load15_i, load15_f);
    }

Parse and print bitwise flags

    dtrace:::BEGIN
    {
        flags = "DESPRWA";
    }

    io:::done
    {
        flags[0] = args[0]->b_flags & B_DONE ? 'D' : '-';
        flags[1] = args[0]->b_flags & B_ERROR ? 'E' : '-';
        flags[2] = args[0]->b_flags & B_PAGEIO ? 'S' : '-';
        flags[3] = args[0]->b_flags & B_PHYS ? 'P' : '-';
        flags[4] = args[0]->b_flags & B_READ ? 'R' : '-';
        flags[5] = args[0]->b_flags & B_WRITE ? 'W' : '-';
        flags[6] = args[0]->b_flags & B_ASYNC ? 'A' : '-';
        printf("%d bytes %s %s\n", args[0]->b_bcount, flags,
            (args[2]->fi_pathname == "<none>") ?
                args[1]->dev_pathname : args[2]->fi_pathname);
    }

Snoop on the passwd command to capture the passwd typed

Note: The read buffer is not filled until read() returns, so the entry probe only saves the buffer pointer and the byte is copied in by the return probe.

    #!/usr/sbin/dtrace -s
    #pragma D option quiet

    char got;

    BEGIN
    {
        printf("Watching for passwd processes.\n");
    }

    syscall::read:entry
    /execname == "passwd"/
    {
        /* remember where read() will put the data */
        self->buf = arg1;
    }

    syscall::read:return
    /execname == "passwd" && arg0 == 1 && self->buf != 0/
    {
        got = *(char *)copyin(self->buf, 1);
        printf("%c", got);
        self->buf = 0;
    }

    syscall::rexit:
    /execname == "passwd"/
    {
        printf("\n");
    }

(Lame) Example of introducing flow control into a DTrace script.

If the first argument is "print" then the second argument will be printed in the output. The point here is to demonstrate that the probes will fire in order and some can be conditional while others can always fire. (Note: The defaultargs option matters here. Without it, $$1 or $$2 may not be defined and dtrace will throw an error. $$1 means that $1 is to be evaluated as a string.) An example of calling this script follows the code snippet.
    #!/usr/sbin/dtrace -s
    #pragma D option quiet
    #pragma D option defaultargs

    BEGIN
    {
        printf("I am starting");
    }

    BEGIN
    /$$1 == "print"/
    {
        printf(" %s", $$2);
    }

    BEGIN
    {
        printf(".");
        exit(0);
    }

Here is an example of the script running:

    # ./flow.d
    I am starting.
    # ./flow.d print "to like dtrace"
    I am starting to like dtrace.
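The uptime-like snippet earlier in this section prints a decimal load average using only integer math, which is the usual workaround since D has no floating point arithmetic. The arithmetic can be tried outside of dtrace with a small shell sketch. fmt_load is a hypothetical helper, and the sample inputs are made-up values assumed to be scaled by 65536 in the same way the snippet treats hp_avenrun.

```shell
#!/bin/sh
# Sketch of the integer-only formatting used by the uptime-like snippet.
# fmt_load is a hypothetical helper; its argument is assumed to be a value
# scaled by 65536, as in that snippet.

fmt_load() {
    raw=$1
    int_part=$(( raw / 65536 ))                    # whole part
    frac_part=$(( (raw % 65536) * 100 / 65536 ))   # two decimal digits, truncated
    printf '%d.%02d\n' "$int_part" "$frac_part"
}

fmt_load 98304    # 1.5 * 65536, prints 1.50
fmt_load 3277     # about 0.05 * 65536, prints 0.05
```

The %02d conversion is what keeps a small fraction such as 5 from being printed as ".5". The fraction is truncated rather than rounded, which is also how the D version behaves.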
This is by no means a comprehensive comparison of the D (Sun/DTrace) and Vue (IBM/ProbeVue) languages, but only a quick highlight of a few of the most obvious differences. Additional discussion on the subject can be found in the Introduction to Dynamic Tracing white paper.

Providers

The Vue language has only four providers (syscalls, UFT, probevue, and an interval timer). D has considerably more providers. In this respect, D appears to be oriented as a fundamental provider of any and all performance statistics for the Solaris operating system while Vue is best viewed as one tool amongst many existing trace tools in AIX. The Vue language simply does not appear to have this vision and scope of usage, although an equivalent set of providers is on the Vue development roadmap.

Aggregations

Vue uses a list data type that is used as a parameter for the aggregating functions. D uses an aggregate data type that can be passed to quantize() or lquantize() for some meaningful interpretation of the data. The D aggregate data type seems considerably better planned than the list type found in Vue.

Furthermore, I prefer to reset the aggregation every few seconds as I want to know what the latest, and not cumulative, per-second numbers are. The Vue language does not have an equivalent to the D clear() or trunc() functions. As a result I was forced to calculate my own running average using primitive data types even though the min(), max(), avg(), count(), and sum() functions existed in Vue. It is my opinion that the Vue list data type provides no advantage over just calculating the values on your own.

It is a bit frustrating that you cannot reset your list in Vue. The aggregations provided on a list will always include all data since the script started, and this is even more dangerous when you consider how large the list can potentially grow! Vue allows for the initialization of a list only in the BEGIN block. Once data has been append()ed to a list, it cannot be removed. After my first use of the list data type in Vue, I immediately converted my Vue script to calculate min(), max(), count(), and a running average without the use of list().
Floating Point Math

Neither D nor Vue allows for floating point math. D (or more specifically, dtrace) seems to let you know when floating point math will not work, while Vue simply does integer math and assigns the result to a float. Examples exist on the ProbeVue QuickSheet for creating floating point "like" output with integers (when calculating percentages).

Flow Control

D does not allow for explicit flow control in an action block. Flow control can be created in D by using the predicates of multiple identical probes. Each probe, predicate, and action block then becomes a conditional statement and action block. (An example of this is found in the Code snippets section in this document.) I think of this much like the fall-through behavior of a C case statement.

Vue allows for explicit if-then-else flow control in an action block. This means that a Vue probe can have a predicate (test) like D, but then have additional "if" conditionals within the action block. As expected, these conditionals can be nested.

Function Prototypes

The Vue language requires that function probes (syscall, UFT) include a function prototype if you wish to retrieve the parameters or return values. These can be inserted at the top of the Vue script or included from a command line option on the probevue command line. There is no convenient method to #include function prototypes in Vue scripts at this time.

D allows for C macros (pre-processor directives) to be included and leveraged. Vue will not process C pre-processor directives, so all .h include files must be scrubbed of these items to ensure that they are read correctly. By convention, these Vue header files are .i files.

Other annoyances

I was unable to glob (*) the function name in the syscall provider in Vue (like I can in D). (This has the tendency to make D work much like truss.) This is because Vue only supports a subset of all system calls. The closest functionality Vue offers is the ability (like D) to have multiple probe points for a single action block.

Additional information can be found in the ProbeVue section and the ProbeVue QuickSheet.
Release (specific) Notes

This QuickSheet was developed differently in that it was written in HTML rather than the PDF format that I typically use. This has several notable differences from the other (PDF) format.

First, it is release early, release often, where the PDF document was typically well worked over for formatting issues prior to publishing. The HTML version is a bit more fluid as its primary target is not print (like the PDF versions). The content is complete but is subject to change more so than the PDF versions.

Second, the formatting of HTML is not quite as controlled as with the LaTeX layout, so your code viewing may vary by the browser type and size that you are using. I have used multiple columns in the PDF versions with much success.

Finally, the size of a two sheet PDF limits me to the most important concepts only. With HTML I have allowed this document to swell to around 10 pages (printed). This makes the document a bit less "Quick".
QuickSheet Notes

This content is free to distribute as long as credit to the author and tablespace.net is retained with the distribution. Every effort has been made to ensure that the content is as accurate as possible, but this is no guarantee as to the suitability or usability for any purpose. Carefully research and consider any action you inflict upon your command line.

William Favorite <wfavorite@tablespace.net>
http://www.tablespace.net

Additional Info

Sun DTrace Wiki