CLion 2021.1 Help

Profiler

With CLion's CPU profiler integration, you can analyze the performance metrics collected for your application (both kernel and user code). The profiler is available on Linux and macOS, and the implementation is based on the Perf and DTrace tools respectively.

Perf and DTrace sample at a fixed rate: they interrupt the application to collect the program counter and stack traces, which are then translated into profiling reports. Such reports can be long and difficult to analyze, so CLion provides visualization for the profiler's output data.

Prerequisites

  1. On Linux, install the Perf tool for your particular kernel release.

    Use uname -r to find out the exact version, and then install the corresponding linux-tools package. For example:

    $ uname -r
    4.15.0-36-generic
    $ sudo apt-get install linux-tools-4.15.0-36-generic

  2. Adjust kernel options

    • perf_event_paranoid: controls the use of the performance events data by non-root users.

      Set the value to be less than 2 to let the profiler collect performance information without root privileges:

      sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'

      You can find the description of possible values in the kernel documentation. Usually, 1 or 0 is enough for the profiler to run and collect data. However, if you get empty profiling results (the No profiler data message), your system setup might require -1, the least secure option, which allows all users to use all performance events.

    • kptr_restrict: sets restrictions on exposing kernel addresses.

      To have kernel symbols properly resolved, disable the protection offered by kptr_restrict by setting its value to 0:

      sudo sh -c 'echo 0 >/proc/sys/kernel/kptr_restrict'

    By default, these changes affect your current OS session only. To keep the settings across system reboots, run:

    sudo sh -c 'echo kernel.perf_event_paranoid=1 >> /etc/sysctl.d/99-perf.conf'
    sudo sh -c 'echo kernel.kptr_restrict=0 >> /etc/sysctl.d/99-perf.conf'
    sudo sh -c 'sysctl --system'

    Upon the first launch of the profiler, CLion checks whether kernel variables are already set up and suggests the necessary changes.

  • On macOS, the only required tool is DTrace, which is most likely installed by default. To check, run the dtrace command in the terminal.

CLion automatically detects the Perf or DTrace executable if its location is included in the PATH environment variable. You can also set the path manually in Settings / Preferences | Build, Execution, Deployment | Dynamic Analysis Tools | Profiler.

Run profiling

Prepare the build

  1. The profiler relies on debug information to provide meaningful output data and navigation, so Debug configurations are preferable for profiling.

  2. Compiler optimizations, such as inlining, can influence profiling results. To make sure none of the frames are missing due to inlining, set the optimization level to -O0 in your CMakeLists.txt:

    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O0")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0")

    Also, compilers can use the frame pointer register as a general-purpose register for optimization purposes, which may lead to broken stack traces. On Linux, the profiler implementation does not depend on this, but on macOS, we recommend setting the -fno-omit-frame-pointer compilation flag for gcc and both -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer for clang.

Configure sampling frequency

  • The default sampling rate value is rather high, which might require a lot of disk space for long-running programs.

    If required, you can change the profiler's sampling frequency in Settings / Preferences | Build, Execution, Deployment | Dynamic Analysis Tools | Profiler.

    When choosing a sampling rate, keep in mind other timer-driven activities that may be scheduled in your system. For example, the default value is set to 99 Hz instead of 100 Hz to avoid lockstep sampling with other activity that may run at a frequency of 100 Hz.
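
    The lockstep effect can be illustrated with a small model. The sketch below is an illustration only, not a description of how Perf or DTrace schedule samples: count_busy_samples and print_lockstep_demo are hypothetical names, and the model assumes an idealized periodic activity that is busy at the start of every period, no timer jitter, and a sampler that starts in phase with the activity.

```cpp
#include <cstdio>

// Counts how many of `sample_count` samples land inside the busy window of a
// periodic activity. Idealized model (an assumption, not profiler internals):
//  - the activity repeats `activity_hz` times per second and is busy for the
//    first `busy_percent` percent of every period;
//  - sample k is taken at exactly k / sampling_hz seconds.
// The phase of sample k within the activity period equals
// ((k * activity_hz) mod sampling_hz) / sampling_hz, so integers are enough.
long count_busy_samples(long sampling_hz, long activity_hz,
                        long busy_percent, long sample_count) {
    long hits = 0;
    for (long k = 0; k < sample_count; ++k) {
        long phase_numerator = (k * activity_hz) % sampling_hz;
        // Busy when phase < busy_percent / 100.
        if (phase_numerator * 100 < busy_percent * sampling_hz) {
            ++hits;
        }
    }
    return hits;
}

// Compares a 100 Hz and a 99 Hz sampler against a 100 Hz activity
// that is busy 10% of the time (about 10 seconds of samples each).
void print_lockstep_demo() {
    std::printf("100 Hz sampler: %ld of 1000 samples busy\n",
                count_busy_samples(100, 100, 10, 1000));
    std::printf(" 99 Hz sampler: %ld of 990 samples busy\n",
                count_busy_samples(99, 100, 10, 990));
}
```

    In this model, the 100 Hz sampler hits the busy window on every sample (1000 of 1000), although the activity is busy only 10% of the time. The 99 Hz sampler drifts through the activity period and reports 100 of 990 samples, which is close to the real share.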

Run the profiler

  1. Use one of the following options:

    • Select a run configuration from the list on the toolbar and click the Profile button, or call Run | Profile from the main menu.

    • Alternatively, select Profile from the left gutter menu of a program entry point or a function that you want to profile.

    On macOS, you can also attach the profiler to a running process (call Run | Attach Profiler to Process).

  2. When you launch profiling, CLion notifies you if the profiler is attached successfully.

    After the application stops and the profiling data is ready, CLion shows a balloon with a link to the CPU Profiler tool window (also accessible from the main menu: View | Tool Windows | CPU Profiler).

    To stop the profiler before the application stops, use the Stop Session button in the Profiler tool window.

Read the profiling report

In the CPU Profiler tool window, you can see the collected data presented in three tabs: Flame Graph, Call Tree, and Method List. The left-hand part lists the application threads and All threads merged. On Linux, CLion shows meaningful thread names if they were set in the program; on macOS, thread names are shown as IDs.
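
One way to set a thread name from the program is the non-portable pthread_setname_np call. The following is a minimal sketch under stated assumptions, not the only way to name threads: the helper names (name_current_thread, current_thread_name, run_named_worker) are hypothetical, and on older glibc versions you may need to build with -pthread. On Linux, names are limited to 15 characters plus the terminating null.

```cpp
#include <pthread.h>

#include <string>
#include <thread>

// Names the calling thread. Returns true on success.
// On Linux, a name longer than 15 characters makes the call fail.
bool name_current_thread(const char* name) {
#if defined(__APPLE__)
    // macOS variant: takes only the name and applies to the calling thread.
    return pthread_setname_np(name) == 0;
#else
    return pthread_setname_np(pthread_self(), name) == 0;
#endif
}

// Reads back the name of the calling thread.
std::string current_thread_name() {
    char buffer[64] = {};
    pthread_getname_np(pthread_self(), buffer, sizeof(buffer));
    return buffer;
}

// Starts a worker that names itself before doing its work and
// returns the name the worker observed for itself.
std::string run_named_worker(const char* name) {
    std::string observed;
    std::thread worker([&] {
        name_current_thread(name);
        observed = current_thread_name();
        // ... the work you want to profile would go here ...
    });
    worker.join();
    return observed;
}
```

For example, run_named_worker("io-worker") starts a thread that is expected to appear under the name io-worker in the thread list when profiling on Linux.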

Navigate the report

The Profiler tool window allows you to jump between the tabs while staying focused on a specific method.

Right-click the necessary method and select another view in which you want to open it:

  • Locate the selected method in another tab (for example, Focus on method in Methods List for a Flame Graph block).

  • Navigate to the source code (Jump to Source).
  • Copy frame information to the clipboard: either only the frame name (Copy Frame) or the sequence of frame names from the stack bottom up to the selected frame (Copy Stack up to Frame).

Export profiling results

  1. On the left frame of the Profiler tool window (View | Tool Windows | Profiler), click the Export button.

  2. In the dialog that opens, name the file, specify the folder in which you want to save it, and click Save.

Flame Graph

Raw profiling data collected by Perf or DTrace is a call tree summary. Flame Graphs visualize it as a collection of stack traces: the rectangles stand for frames of the call stack, ordered by width.

Each block represents a function in the stack (a stack frame). The width of each block corresponds to the CPU time used by the method (or the allocation size, in case of allocation profiling). The Y-axis shows the stack depth, growing from the bottom up. The X-axis shows the stack profile sorted from the most resource-consuming functions to the least consuming ones.

When reading the flame graph, focus on the widest blocks. These blocks are the functions most represented in the profile. You can start from the bottom and move up, following the code flow from parent to child methods, or use the opposite direction to explore the top blocks that show the functions running directly on the CPU.
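
To make the relation between samples and block widths concrete, here is a minimal sketch of how widths could be derived from sampled stacks. It is an illustration under assumptions, not CLion's implementation: the Stack alias and the block_widths function are hypothetical, and each call path is identified by its frame names joined with semicolons.

```cpp
#include <map>
#include <string>
#include <vector>

// One sampled stack, with frames ordered from the entry point to the leaf.
using Stack = std::vector<std::string>;

// For every call path (for example "main;parse;read"), counts the samples
// whose stack passes through that path. The count plays the role of the
// block width: the more samples include a frame at that position, the
// wider its block.
std::map<std::string, int> block_widths(const std::vector<Stack>& samples) {
    std::map<std::string, int> widths;
    for (const Stack& stack : samples) {
        std::string path;
        for (const std::string& frame : stack) {
            if (!path.empty()) {
                path += ';';
            }
            path += frame;
            ++widths[path];  // every prefix of the stack is one block
        }
    }
    return widths;
}
```

With four samples (main > parse, main > parse > read twice, and main > render), the block for main spans all four samples, parse spans three, read spans two, and render spans one. The bottom block is always at least as wide as the blocks stacked on top of it.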

Show details in tooltips

  • Hover the mouse pointer over a block to display a tooltip.

    The tooltips show the fully qualified method name, the percentage of the parent sample time, and the percentage of total sample time.

Zoom the graph

  • Use the Zoom in and Zoom out buttons to zoom the graph.

  • To focus on a specific method, double-click the corresponding block on the graph.

  • To restore the original size of the graph, click 1:1.

  • If you want to locate a specific function on the graph, start typing its name. The graph highlights all blocks with the names matching your search request.

    Use Previous Occurrence and Next Occurrence for fast navigation between search results. You can also search either in the whole graph or just in a specific subtree.

Capture the graph

You can capture and export the graph separately from other data in the report.

  • Click Capture Image and select Copy to Clipboard, or select Save to export the graph as an image in the .png format.

Call Tree

The Call Tree tab presents information about the program's call stacks that were sampled during profiling. The top-level All threads merged option shows all threads merged together into a single tree. There's also a top-down call tree for each thread.

For each method, the tab shows the following information:

  • Function name

  • Percentage of total sample time or parent's sample time

  • The total sample count

  • Recursive calls

Collapse recursive calls

A complex application that has multiple recursive methods may be very difficult to analyze. In a regular Call Tree view, recursive calls are displayed as they are called – one after another, which in case of complex call stacks with multiple recursive calls leads to almost infinite stack scrolling.

CLion detects a recursion when the same method is called higher up in the call stack. In this case, the subtree is taken out of the call tree and then attached back to the first invocation of that method. This way you can bypass recursion and focus on methods that consume most of the resources and calls that they make.

Collapsing recursive calls allows you to see the total amount of time spent in these calls as if there were no recursion.
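
The idea can be sketched on a single sampled stack. The code below is a simplified illustration of the folding described above and an assumption about its effect, not CLion's actual algorithm: collapse_recursion is a hypothetical name, and the sketch works on one stack at a time instead of a whole call tree.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Folds recursion in one sampled stack (frames ordered from the entry point
// to the leaf). When a frame repeats a method that is already on the stack,
// everything after the first invocation of that method is dropped, so the
// frames that follow are attached to the first invocation.
std::vector<std::string> collapse_recursion(const std::vector<std::string>& stack) {
    std::vector<std::string> collapsed;
    for (const std::string& frame : stack) {
        auto first = std::find(collapsed.begin(), collapsed.end(), frame);
        if (first != collapsed.end()) {
            // Recursive call: return to the first invocation of this method.
            collapsed.erase(first + 1, collapsed.end());
        } else {
            collapsed.push_back(frame);
        }
    }
    return collapsed;
}
```

In this sketch, the stack main > fib > fib > fib > add becomes main > fib > add, so the sample is attributed to a single fib node. With indirect recursion, main > a > b > a > c becomes main > a > c.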

Folded recursive calls are marked with the Recursion icon on the Call Tree tab. Click it to open the recursive call tree in a separate tab. You can preview the number of merged stacks in a tooltip.

What-if: focus on specific methods

CLion allows you to examine specific methods in the Call Tree: you can exclude particular methods or, the other way around, focus only on the methods that interest you at the moment.

Right-click the necessary method on the Call Tree tab and select one of the following options to open the results in a dedicated tab:

  • Focus on Subtree: show only the selected method call. The sample time counter of the parent method shows only the time spent in the selected subtree.

  • Focus on Call: show the selected method and the methods that call it. When this option is enabled, every time frame shows only the time spent in the selected method.

  • Exclude Subtree: ignore the selected method call.

  • Exclude Call: ignore all calls to the selected method.

Method List

The Method List tab collects all methods in the profiled data and sorts them by cumulative sample time. For each function from the list, you can view Back Traces and Merged Callees.
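
As an illustration of what a cumulative figure means, the sketch below counts, for every method, the samples in which the method appears anywhere on the stack, and sorts the result in descending order. This is an assumption made for the example, not a description of CLion's internals: the Stack alias and methods_by_cumulative_samples are hypothetical names, and a method that occurs several times in one stack (recursion) is counted once for that sample.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One sampled stack, with frames ordered from the entry point to the leaf.
using Stack = std::vector<std::string>;
using MethodRow = std::pair<std::string, int>;  // method name, cumulative samples

// Returns all methods sorted by the number of samples they appear in.
std::vector<MethodRow> methods_by_cumulative_samples(const std::vector<Stack>& samples) {
    std::map<std::string, int> totals;
    for (const Stack& stack : samples) {
        // A set removes repeated frames, so recursion is not counted twice.
        std::set<std::string> methods_in_sample(stack.begin(), stack.end());
        for (const std::string& method : methods_in_sample) {
            ++totals[method];
        }
    }
    std::vector<MethodRow> rows(totals.begin(), totals.end());
    // Stable sort keeps alphabetical order among methods with equal counts.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const MethodRow& a, const MethodRow& b) {
                         return a.second > b.second;
                     });
    return rows;
}
```

For the samples main > fib > fib, main > io, and main > fib, the result lists main with 3 samples, fib with 2, and io with 1.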

Last modified: 02 April 2021