Basics. Profiling Types

To work with dotTrace successfully, you need a good understanding of profiling types. A profiling type defines which application data dotTrace collects during a profiling session and in how much detail. When configuring a profiling session, you can choose from the following profiling types: sampling, tracing, line-by-line, and timeline.

Sampling

Short overview

dotTrace periodically takes samples of call stack data.

Pros:

Accurate measurement of function execution time
Small snapshot
Very lightweight: low memory usage; the time required to run an application under the profiler does not change significantly

Cons:

Number of calls for a function is not measured
Not all call stacks and functions are captured
No info about ETW events, TPL events, native functions.

When to use:

Evaluation of overall application performance and finding most obvious performance bottlenecks, i.e., the slowest functions.

Details

Sampling means periodically taking samples. A sample is a set of call stacks, one per thread, captured at one moment during a profiling session. This raises two questions: (1) how long is the pause between two consecutive samples, and (2) how long does it take to capture a sample? The answers to these questions help estimate the accuracy of the sampling method.

dotTrace captures call stacks of all existing threads within the process, sequentially without pauses. It also takes into account threads that are locked or sleeping. The time required for capturing a call stack cannot be precisely determined because it depends on the stack depth and the number of native and managed stack frames. Therefore, the time required to take a sample necessarily varies from sample to sample and depends on the number of currently running threads.

dotTrace pauses between samples. A pause is the time that elapses between the moment dotTrace finishes processing thread activity for one sample and the moment it starts processing for the next one. The length of each pause is a random value between 5 and 11 milliseconds. Randomizing the pause reduces the probability of having gaps in call stacks. During these pauses, the application continues running normally.

One consequence of this is that, since the time between samples is at least 5 milliseconds, methods that run quickly enough may not be caught and shown in a snapshot. However, this does not prevent dotTrace from getting the correct time data. Two situations are possible. If a method is fast and is called many times, it will be caught and shown in a snapshot. If a method is fast, but is called rarely, then it may be omitted in a snapshot, but its time will be included in total time of its parent. In other words, if the total time of a method is significant, it will be counted.
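The reasoning above can be checked with a small simulation. This is a toy model, not dotTrace's implementation: it assumes a single thread, instantaneous samples, and a made-up workload with the hypothetical functions Main, Slow, Fast, Other, and Rare. Only the 5 to 11 millisecond pause comes from the text.

```python
import random

def simulate_sampling(timeline, seed=0):
    """Estimate total time per function by sampling a single-threaded timeline.

    timeline: list of (call_stack, duration_ms) segments in execution order,
              where call_stack is a tuple such as ("Main", "Fast").
    Returns {function: estimated total time in ms}.
    """
    rng = random.Random(seed)
    # Precompute the end time of every segment so we can find what runs at time t.
    ends, t = [], 0.0
    for stack, duration in timeline:
        t += duration
        ends.append((t, stack))
    total = t

    hits, n_samples, idx = {}, 0, 0
    now = rng.uniform(5, 11)                  # first pause
    while now < total:
        while ends[idx][0] <= now:            # advance to the segment running at `now`
            idx += 1
        for frame in ends[idx][1]:            # every frame on the stack gets the sample
            hits[frame] = hits.get(frame, 0) + 1
        n_samples += 1
        now += rng.uniform(5, 11)             # random pause before the next sample

    if n_samples == 0:
        return {}
    return {frame: total * count / n_samples for frame, count in hits.items()}

# Hypothetical workload: one slow call, a fast method called 20,000 times,
# and a fast method called only once.
timeline = [(("Main", "Slow"), 300.0)]
timeline += [(("Main", "Fast"), 0.2), (("Main", "Other"), 0.8)] * 20000
timeline += [(("Main", "Rare"), 0.2)]

estimated = simulate_sampling(timeline)
for name in ("Main", "Slow", "Fast", "Rare"):
    print(name, round(estimated.get(name, 0.0), 1))
```

In this model, Fast runs for only 0.2 ms per call, far less than the shortest pause, yet its estimated total stays close to the real 4000 ms because it is called so often. Rare is most likely never sampled, but the total time of Main still covers the whole run.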

All in all, this profiling method provides time data that helps reveal problematic call stacks, but it does not provide the number of function calls. Still, this method is the fastest and can be a solid first step in localizing performance problems.

Timeline

Short overview

dotTrace records application events and writes data about how the application state changed. This profiling type is based on ETW events. Put very simply, you get data similar to what sampling provides, but with all events shown on a timeline.

Pros:

Accurate measurement of function execution time. Function calls are shown on the timeline
Lightweight: the time required to run an application under the profiler does not change significantly
Provides ETW event data, e.g., data about memory allocation, garbage collections, I/O operations, and so on
Can provide TPL data: await and continuation blocks for async functions
Can provide data on native functions in the call tree (requires symbol files)

Cons:

Number of calls for a function is not measured
Not all call stacks and functions are captured
Snapshots might be quite large

When to use:

Evaluation of overall application performance and finding most obvious performance bottlenecks, i.e., the slowest functions
Identifying the cause of user interface freezes
Identifying excessive garbage collections and I/O operations
Determining issues in multithreaded applications, such as irregular work distribution, lock contention, serialized execution, and others

Details

The timeline profiling type is very close to sampling. Both collect call stack data and help you find performance bottlenecks. The main difference is that during timeline profiling, dotTrace doesn't collect samples by itself but gets application data from Event Tracing for Windows (ETW).

The main benefit of timeline profiling is that it shows not only which calls your application made but also how these calls were distributed in time. This can be extremely helpful when analyzing the behavior of multithreaded applications where the chronological order of events matters: for example, when determining sync delays, the cause of UI freezes, and so on.

Another benefit is that timeline profiling collects a wider range of data. In addition to call stack data, it records memory allocation, garbage collection, and I/O events.
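To illustrate why the time dimension matters, here is a minimal sketch of time-stamped events and a time-range query. The Event fields, thread names, and event values are all hypothetical; they are not the ETW schema or the dotTrace snapshot format.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time_ms: float   # when the event was recorded
    thread: str      # hypothetical thread name
    kind: str        # "sample", "gc", or "io" in this toy model
    detail: str      # top-of-stack function or event description

def events_between(events, start_ms, end_ms, thread=None):
    """Return events with start_ms <= time < end_ms, optionally for one thread, in time order."""
    selected = [
        e for e in events
        if start_ms <= e.time_ms < end_ms and (thread is None or e.thread == thread)
    ]
    return sorted(selected, key=lambda e: e.time_ms)

# Made-up recording: the UI thread waits on a lock between 2 ms and 4 ms.
events = [
    Event(0.0, "UI", "sample", "OnClick"),
    Event(1.0, "Worker", "sample", "LoadData"),
    Event(2.0, "UI", "sample", "WaitForLock"),
    Event(2.5, "Worker", "io", "file read"),
    Event(3.0, "UI", "sample", "WaitForLock"),
    Event(3.5, "Worker", "gc", "gen0 collection"),
    Event(4.0, "UI", "sample", "Render"),
]

# What happened on all threads while the UI thread was waiting?
window = events_between(events, 2.0, 4.0)
for e in window:
    print(e.time_ms, e.thread, e.kind, e.detail)
```

Selecting a time range and then looking at every thread inside it is the kind of question that aggregated call stack data alone cannot answer, because it has no record of when each stack was observed.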

Tracing

Short overview

The CLR notifies dotTrace each time a function is entered and each time it is left. dotTrace measures the time between these two notifications.

Pros:

Accurate measurement of number of function calls
All call stacks and functions are captured, except inlined functions

Cons:

Inaccurate measurement of function execution time, because the time distortion grows with the number of function calls
Heavyweight: more time required to run an application under the profiler; snapshots may be quite large; higher memory usage
No info about ETW events, TPL events, native functions

When to use:

Evaluation of algorithm complexity, e.g., when performance issues are related to frequent function calls.

Details

Unlike sampling, tracing revolves around a function, or more precisely, around function entry and exit.

dotTrace receives notifications from the CLR when a function is entered and when it is left, even if it is left because of an exception. The time between these two notifications is considered the execution time of the function.
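The entry and exit bookkeeping can be sketched as follows. This is only an analogy: the wrapper below stands in for the notifications, whereas the CLR delivers them to the profiler itself. The names traced, call_counts, total_seconds, and parse are hypothetical.

```python
import time
from collections import defaultdict
from functools import wraps

call_counts = defaultdict(int)      # number of calls per function
total_seconds = defaultdict(float)  # accumulated time per function

def traced(func):
    """Record a call count and the time between entry and exit for func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        entered = time.perf_counter()            # "function entered"
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on a normal return and when an exception propagates,
            # so a function left because of an exception is still counted.
            total_seconds[func.__name__] += time.perf_counter() - entered
            call_counts[func.__name__] += 1      # "function left"
    return wrapper

@traced
def parse(text):
    return int(text)

for text in ["1", "2", "oops"]:
    try:
        parse(text)
    except ValueError:
        pass                                     # the third call leaves via an exception

print(call_counts["parse"], total_seconds["parse"])
```

Because every call passes through the wrapper, the call count is exact. The same property is what makes each call more expensive than it would be without instrumentation.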

On the one hand, the snapshot contains every function that was executed during profiling and was not inlined by the JIT compiler, along with detailed timing data. On the other hand, the JIT compiler generates a special prologue and epilogue for each function, and the CLR needs extra time to execute this code. dotTrace does not measure this time or subtract it from the total function time, so the total time might be distorted. The distortion grows linearly with the number of function calls: the more times a function is called, the bigger the distortion becomes. And the shorter a function's execution time, the less accurate its total time can be.

For example, suppose you have a simple function Inc() { _value++; } that is called millions of times. Of course, it can be optimized and takes very little time anyway. However, if it runs under dotTrace with the tracing type selected, each call adds overhead that can be much larger than the actual execution time of the function. As a result, the reported total time can be higher than the time reported by sampling or the time the function takes without the profiler.
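The Inc() example above can be put into numbers with a simple linear model. The per-call figures below are assumptions chosen only for illustration; real overhead depends on the runtime, the machine, and the function.

```python
def reported_total_ms(calls, body_ms_per_call, overhead_ms_per_call):
    """Total time reported under a linear model: each call pays a fixed extra cost."""
    return calls * (body_ms_per_call + overhead_ms_per_call)

# Assumed, illustrative values (not measured):
calls = 5_000_000
body_ms = 0.000002       # 2 ns of real work per Inc() call
overhead_ms = 0.0001     # 100 ns added per call by entry/exit instrumentation

true_ms = calls * body_ms
reported_ms = reported_total_ms(calls, body_ms, overhead_ms)
distortion_ms = reported_ms - true_ms

print(f"true: {true_ms:.1f} ms, reported: {reported_ms:.1f} ms, distortion: {distortion_ms:.1f} ms")

# Doubling the number of calls doubles the distortion.
distortion_2x = reported_total_ms(2 * calls, body_ms, overhead_ms) - 2 * calls * body_ms
print(f"distortion at 2x calls: {distortion_2x:.1f} ms")
```

Under these assumptions, almost all of the reported time is overhead, which is why the timing of small, frequently called functions should be read with caution in a tracing snapshot while their call counts remain reliable.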

The CLR may introduce additional overhead. It applies various optimizations, and depending on the CLR version and the chosen profiling method, some of them may be turned off or performed differently, so the results may differ.

On the whole, you always get the correct number of function calls, but the total function time may be inaccurate. Because tracing takes more time than sampling and may also slow down your application significantly, it is better to profile individual parts of an application or specific scenarios.

Line-by-line

Short overview

dotTrace measures execution time of each code line.

Pros:

Possibility to study a function in detail

Cons:

Inaccurate measurement of function execution time
Extremely heavyweight: more time required to run an application under the profiler compared to tracing; large snapshots; higher memory usage compared to tracing
No info about ETW events, TPL events, native functions.
Requires PDB files

When to use:

Advanced cases, for example, when you want to analyze performance only inside a particular function.

Details

This method is similar to tracing, but here the target of investigation is a statement, not a function. To profile a function line by line, dotTrace requires PDB files. If the corresponding PDB files are not available, the method works the same way as tracing.

dotTrace measures the time required to execute a statement and how many times it is executed. As you can likely imagine, this method is even slower than tracing because dotTrace performs time-counting work for each statement.
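As a rough analogy for per-statement bookkeeping, the sketch below uses Python's sys.settrace hook to count how many times each line of one function runs. It is not how dotTrace works: Python code objects carry their own line numbers, while dotTrace relies on PDB files to map code to source lines. Timing is left out to keep the sketch short, and count_line_hits and sum_of_squares are hypothetical names.

```python
import sys
from collections import Counter

def count_line_hits(func, *args):
    """Run func(*args) and count executions of each of its source lines.

    Keys of the returned Counter are line offsets from the `def` line.
    """
    hits = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None                  # ignore lines of any other function
        if event == "line":
            hits[frame.f_lineno - code.co_firstlineno] += 1
        return tracer                    # keep receiving events for this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)               # always remove the hook
    return result, hits

def sum_of_squares(n):
    total = 0                # offset 1
    for i in range(n):       # offset 2
        total += i * i       # offset 3
    return total             # offset 4

result, hits = count_line_hits(sum_of_squares, 1000)
for offset in sorted(hits):
    print(f"line +{offset}: executed {hits[offset]} times")
```

The hook fires once per executed line, so the loop body alone triggers it a thousand times here. That per-statement cost is the reason this kind of profiling is slower than tracing, which pays only once per function call.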

Line-by-line is an effective method after you have narrowed the scope of investigation and want to focus on certain functions.

11 February 2024