dotTrace 2016.3 Help

Optimize App Performance and Memory Traffic

WARNING! This tutorial was written for dotTrace 2016.2. Please note that some UI controls (e.g., filters) were significantly redesigned in dotTrace 2016.3. For example, all controls from the Analysis Subject filter were moved under the Events filter.

It is well known that memory traffic heavily impacts application performance: the higher the traffic, the slower your app. The problem is not how frequently your application allocates memory (allocation is almost free in performance terms), but how the memory that is no longer needed gets collected. The garbage collection (GC) mechanism that does this is convenient, but unfortunately this convenience has its cost.

First, GC itself demands some CPU time. For example, one of its stages is the detection of unused objects, a complex operation that involves building object reference graphs. Second, to perform Gen0 and Gen1 GCs, the garbage collector must obtain exclusive access to a part of the managed heap. This, in turn, suspends all managed threads except the one that triggered the "blocking GC." As the user interface thread is also suspended, users may experience UI freezes at these moments.

That is why you should always try to optimize your application to reduce memory traffic and minimize the impact of GCs on app responsiveness.
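To see this effect outside of a profiler, you can watch the GC collection counters directly. The following minimal console sketch (not part of the sample application; written for illustration only) uses the standard .NET `GC.CollectionCount` API to show how an allocation-heavy loop drives Gen0 collections:

```csharp
using System;

class TrafficDemo
{
    static void Main()
    {
        int gen0Before = GC.CollectionCount(0);

        // Allocation-heavy loop: each += creates a brand-new string object,
        // so the total allocated memory grows quadratically with the loop length.
        string s = "";
        for (int i = 0; i < 10_000; i++)
        {
            s += "x";
        }

        int gen0After = GC.CollectionCount(0);
        Console.WriteLine($"Gen0 collections triggered: {gen0After - gen0Before}");
    }
}
```

The exact number of collections depends on the runtime and heap configuration, but the difference will be clearly non-zero, which is exactly the kind of pressure dotTrace visualizes on the GC bar.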

In this tutorial, we will learn how to use timeline profiling to detect excessive GCs and their causes.

Sample Application

We use the same basic sample application as in the Get Started with Timeline Profiling tutorial. This application is used to reverse lines in text files, e.g. ABC => CBA.

The source code for the app is available on GitHub.

/help/img/dotnet/2016.3/t2_working_app.png

With the Select Files button, a user chooses the text files to be processed. The Process Files button runs a separate BackgroundWorker thread (named FileProcessing) which reverses lines in the files. A label in the left-hand corner of the main window displays the progress. After the processing is finished, the label shows All files were successfully processed.

Imagine the following scenario: When testing the application, you find out that text files are processed not as fast as you expected. Moreover, in some cases you experience minor lags during file processing.

Let's use timeline profiling to analyze these performance drawbacks!

Step 1. Running the Profiler and Getting a Snapshot

  1. Open the MassFileProcessing.sln solution in Visual Studio.
  2. Run the profiler by choosing ReSharper | Profile | Run Startup Configuration Performance Profiling....
  3. In Profiling type, select Timeline.
    /help/img/dotnet/2016.3/t2_profiling_config.png
  4. Click Run. dotTrace will run our application and display a special controller window used to control the profiling process.
    /help/img/dotnet/2016.3/t1_profiling_controller.png
    Now, let's try to reproduce a performance issue in our app.
  5. Click Select Files and choose five text files that come with the application in the Text Files folder.
    /help/img/dotnet/2016.3/t1_app.png
  6. Click Process Files to start file processing.
  7. Once the processing is over, collect a timeline profiling snapshot by clicking Get Snapshot and Wait in the controller window. The snapshot will be opened in a separate Performance Profiler tool window in Visual Studio.
  8. Close the application. This will also close the controller window.

Step 2. Starting Analysis

  1. In the Performance Profiler tool window, click Show Threads. This will open a list of application threads in a separate Threads tool window.
    /help/img/dotnet/2016.3/t2_performance_profiler_window.png
    You can see that dotTrace has detected several managed threads. These are the Main thread, the Garbage Collection thread that is used to perform background GC, and the FileProcessing thread used to process files. In addition, there are two threads that do no work: the Finalizer thread and an auxiliary Thread Pool thread.
  2. As we are concerned with slow file processing, let's zoom into the period where the FileProcessing thread processes files. To do this, use Mouse Wheel on the Threads diagram.
    /help/img/dotnet/2016.3/t2_threads_zoomed.png
    This automatically adds the filter by the visible time interval (1950 ms). Note how this filter affects others: all values are recalculated for the visible time range. The filter now applied to the snapshot data is "Select all time intervals within the visible time range for all threads". Let's remember 2 seconds as the approximate time required for processing files.
  3. On the process overview diagram (at the top of the Threads window), look at the GC bar. It appears that many blocking GCs are performed during file processing. This indicates large memory traffic that, without a doubt, impacts application performance.

    Now, let's find out what's really behind this traffic. There are two ways this can be done:

    • Identify the threads and methods that are running during the GCs. These must be the threads and methods that triggered the collections.
    • Identify the methods that allocate the largest amounts of memory. The logic is simple: the main reason for triggering a GC is resizing the heap due to memory allocation. Thus, the more memory a method allocates, the more GCs it triggers.

Looking ahead, we can tell that the second way is much easier and slightly more reliable. Nevertheless, for educational purposes, let's try both.

Step 3. Where Does Memory Traffic Come From? Analyzing Garbage Collections

  1. In the list of filters on the top of the Threads window (or the Performance Profiler window), select the Blocking GC value in the Blocking GC filter. The resulting filter will be "Select all time intervals within the visible time range for all shown threads where blocking GCs take place".
    /help/img/dotnet/2016.3/t2_blocking_gc_filter.png
  2. Now, let's identify the thread where the GCs happen. For example, we can suppose that this was the Main thread.

    Select the Main thread on the Threads diagram. The resulting filter is now "Select all time intervals within the visible time range on the Main thread where blocking GCs take place".

    /help/img/dotnet/2016.3/t2_main_thread_filter.png
  3. Open and take a look at the Thread States filter. Now, it shows what the main thread did during all those GCs. Most of the time (97.3%) it was Waiting. This means that GC took place on some other thread (obviously on the FileProcessing thread) and the main thread had to wait until the GCs were finished.
    /help/img/dotnet/2016.3/t2_thread_state_filter.png
    Now, let's find out what method in the FileProcessing thread is to blame for all those GCs.
  4. Remove the filter by Main thread. To do this, uncheck the corresponding checkbox in the list of threads.
  5. Instead, select the FileProcessing thread in the list. The resulting filter is now "Select all time intervals within the visible time range for the FileProcessing thread where blocking GCs take place".
    /help/img/dotnet/2016.3/t2_threadpool_filter.png
  6. Look at the Thread State filter. It shows that for 99.1% of the GC time the FileProcessing thread was Running. This confirms that this thread is responsible for these GCs.
    /help/img/dotnet/2016.3/t2_running_filter.png
  7. Select Running in Thread State.
  8. In the Performance Profiler tool window, click Top Methods to see the list of user methods with the highest own time.
    /help/img/dotnet/2016.3/t2_top_methods_filter.png
    Taking into account all applied filters, Top Methods now shows the top methods that triggered the GCs. As you can see, StringReverser.Reverse is the method that probably generates most of the memory traffic in the application.

Step 3. Where Does Memory Traffic Come From? Analyzing Memory Allocations

Now let's try a simpler way of analyzing memory traffic. As we mentioned earlier, the idea is to identify methods that allocate the largest amounts of memory.

  1. Remove all filters by clicking the /help/img/dotnet/2016.3/t2_clear_all_filters.png button either in the Performance Profiler or Threads tool window.
  2. In the Events filter, select Memory Allocation.
    /help/img/dotnet/2016.3/t2_memory_allocation_filter.png
  3. Look at the Threads diagram. The FileProcessing thread allocates enormous amounts of memory – 5882 MB. There is no doubt it's to blame for high memory traffic in our app.
  4. Look at Top Methods. Now, it sorts methods by the amount of allocated memory. The StringReverser.Reverse method leads by a wide margin with 5496 MB.
    /help/img/dotnet/2016.3/t2_top_methods_filter_allocations.png

Step 4. Improving the Code

Let's look at the method's code and try to find out what is wrong with it.

  1. Right-click the StringReverser.Reverse method in Top Methods and select Navigate to Code in the context menu.
    /help/img/dotnet/2016.3/t2_top_methods_filter_navigate_to_code.png
  2. Look at the code. It appears that this method is used to reverse lines from a text file. It takes a string as input and returns the reversed string.
    public string Reverse(string line)
    {
        char[] charArray = line.ToCharArray();
        string stringResult = null;
        for (int i = charArray.Length; i > 0; i--)
        {
            stringResult += charArray[i - 1];
        }
        return stringResult;
    }
    Apparently, the problem is in the way it reverses a string. The thing is that string is an immutable type: its contents cannot be changed after it is created. This means that each time we add a character to stringResult with the += operation, a new string is allocated in memory. It would be much more effective to employ the special StringBuilder class, or to handle the string as an array of char and use the Array.Reverse method. Let's try the second option.
  3. Correct the StringReverser.Reverse method as shown below.
    public string Reverse(string line)
    {
        char[] charArray = line.ToCharArray();
        Array.Reverse(charArray);
        return new string(charArray);
    }
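For reference, the first option mentioned above (using StringBuilder) could look like the sketch below. It is equally valid from a memory-traffic standpoint, since appends mutate StringBuilder's internal buffer instead of allocating a new string per character; Array.Reverse simply keeps the code shorter:

```csharp
public string Reverse(string line)
{
    // Pre-sizing the builder to the line length avoids buffer resizes.
    var builder = new System.Text.StringBuilder(line.Length);
    for (int i = line.Length - 1; i >= 0; i--)
    {
        builder.Append(line[i]);
    }
    return builder.ToString();
}
```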

Step 5. Verifying the Fix

  1. Rebuild the solution and perform profiling one more time as described in Step 1.
  2. After the snapshot is opened, switch Events to Memory Allocation.
  3. Look at the Threads diagram. Our improvement works! Memory traffic has dropped from almost 6 GB to 166 MB for the FileProcessing thread.
    /help/img/dotnet/2016.3/t2_performance_profiler_window_fixed.png
  4. If you zoom into the time interval where FileProcessing processed files, you'll see that file processing has sped up significantly. Now, it takes only about 300 ms to process all files, compared to 2 s before the fix.
Last modified: 3 April 2017