CLion 2019.2 Help

Profiler

With CLion's CPU profiler integration, you can analyze performance metrics collected for your application (both kernel and user code). The profiler is available on Linux and macOS, and the implementation is based on the Perf and DTrace tools, respectively. Currently, you can run the profiler for CMake and Gradle projects. Note that it is not supported for the WSL toolchain or remote development mode.

Perf and DTrace interrupt the application at a fixed sampling rate to collect the program counter and stack traces, which are then translated into profiling reports. Such reports can be long and difficult to analyze, so CLion provides visualization for the profiler's output data.

Prerequisites

  1. Install the Perf tool for your particular kernel release.

    Use uname -r to find out the exact version, and then install the corresponding linux-tools package. For example:

    $ uname -r
    4.15.0-36-generic
    $ sudo apt-get install linux-tools-4.15.0-36-generic

  2. Adjust kernel options

    • perf_event_paranoid - controls the use of the performance events data by non-root users.

      Set the value to be less than 2 to let the profiler collect performance information without root privileges:

      sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'

    • kptr_restrict - sets restrictions on exposing kernel addresses.

      To have kernel symbols properly resolved, disable the protection offered by kptr_restrict by setting its value to 0:

      sudo sh -c 'echo 0 >/proc/sys/kernel/kptr_restrict'

    By default, these changes affect your current OS session only. To keep the settings across system reboots, run:

    sudo sh -c 'echo kernel.perf_event_paranoid=1 >> /etc/sysctl.d/99-perf.conf'
    sudo sh -c 'echo kernel.kptr_restrict=0 >> /etc/sysctl.d/99-perf.conf'
    sudo sh -c 'sysctl --system'

    Upon the first launch of the profiler, CLion checks whether kernel variables are already set up and suggests the necessary changes:

    adjust linux kernel variables for the profiler
  3. For human-readable names in the output and jump-to-source navigation, the profiler requires addr2line.

    This tool is a part of the binutils package, so you likely have it on your system by default. If not, install the package separately:

    sudo apt-get install binutils

  • On macOS, the only required tool is DTrace, which is most likely installed by default. To check, run the dtrace command in the terminal.

CLion automatically detects the Perf or DTrace executable if its location is included in the PATH environment variable. You can also set the path manually in Settings / Preferences | Build, Execution, Deployment | Dynamic Analysis Tools | Profiler.

Run profiling

Prepare the build

  1. The profiler relies on debug information to provide meaningful output data and navigation, so Debug configurations are preferred for profiling.

  2. Compiler optimizations, such as inlining, can influence profiling results. To make sure none of the frames are missing due to inlining, set the optimization level to -O0 in your CMakeLists.txt:

    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O0")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0")

    Also, compilers can use the frame pointer register as a general-purpose register for optimization purposes, which may lead to broken stack traces. On Linux, the profiler implementation does not depend on this, but on macOS, we recommend setting the -fno-omit-frame-pointer compilation flag for gcc and both -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer for clang.
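To illustrate how inlining can hide a frame, here is a minimal sketch of a tiny hot function that an optimizing compiler would normally merge into its caller. The KEEP_FRAME macro and the function names are hypothetical. The sketch assumes GCC or Clang, where the noinline attribute asks the compiler to keep the function as a separate call. It is an illustration of the effect, not a replacement for the -O0 setting described above.

```cpp
#include <cstdint>

// Hypothetical helper macro (not part of CLion or CMake): asks GCC or Clang
// not to inline the function, so it should keep its own stack frame.
#if defined(__GNUC__) || defined(__clang__)
#define KEEP_FRAME __attribute__((noinline))
#else
#define KEEP_FRAME
#endif

// Small enough that an optimizing compiler would normally inline it,
// which would attribute its samples to the caller's frame instead.
KEEP_FRAME std::uint64_t square(std::uint64_t x) {
    return x * x;
}

// Caller of the hot function: sums i * i for i in [0, n).
std::uint64_t sum_of_squares(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        total += square(i);
    }
    return total;
}
```

Without the attribute and with optimizations enabled, samples taken inside square would likely be reported under sum_of_squares only.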

Configure sampling frequency

  • The default sampling rate value is rather high, which might require a lot of disk space for long-running programs.

    If required, you can change the profiler's sampling frequency in Settings / Preferences | Build, Execution, Deployment | Dynamic Analysis Tools | Profiler.

    profiler settings

    When choosing a sampling rate, keep in mind other timer-driven activities that may be scheduled on your system. For example, the default value is 99 Hz rather than 100 Hz to avoid sampling in lockstep with other activity that runs at 100 Hz.
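The lockstep effect can be shown with a small simulation. This is a sketch under simplifying assumptions: a hypothetical periodic activity is busy for the first part of each period, the first sample is taken exactly when a period starts, and both the function name and its parameters are invented for illustration. Integer arithmetic is used so the result does not depend on floating-point rounding.

```cpp
#include <cstdint>

// Returns the fraction of samples that land inside the busy part of a
// periodic activity. One time tick is 1 / (sample_hz * activity_hz) seconds,
// so a sample period is activity_hz ticks and an activity period is
// sample_hz ticks.
double observed_busy_fraction(std::uint64_t sample_hz,
                              std::uint64_t activity_hz,
                              std::uint64_t duty_percent,
                              std::uint64_t seconds) {
    const std::uint64_t total = sample_hz * seconds;
    std::uint64_t busy = 0;
    for (std::uint64_t k = 0; k < total; ++k) {
        // Position of sample k inside the current activity period, in ticks.
        const std::uint64_t phase = (k * activity_hz) % sample_hz;
        // Busy when the phase falls within the first duty_percent of the period.
        if (phase * 100 < duty_percent * sample_hz) {
            ++busy;
        }
    }
    return total == 0 ? 0.0 : static_cast<double>(busy) / static_cast<double>(total);
}
```

With an activity that runs at 100 Hz and is busy 10% of the time, sampling at 100 Hz always hits the same phase and reports it as busy in every sample, while sampling at 99 Hz drifts through the period and reports roughly 10%.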

Run the profiler

  1. To run the profiler, use the Profile button on the main toolbar or call Run | Profile. Another option is to choose Profile from the left gutter menu:

    run gutter menu with the profiler option

    Note that on macOS, you can also attach the profiler to a running process (call Run | Attach Profiler to Process):

    attach profiler to a process

  2. When you launch profiling, CLion notifies you if the profiler is attached successfully.

    After the application stops, and the profiling data is ready, CLion shows a balloon with a link to the CPU Profiler tool window (also accessible from the main menu View | Tool Windows | CPU Profiler):

    profiling finished balloon

    To stop the profiler before the application stops, use the Stop button in the Profiler tool window.

Interpret the results

In the CPU Profiler tool window, you can see the collected data presented in three tabs: Flame Graph, Call Tree, and Method List. The left-hand part lists the application threads and All threads merged. On Linux, CLion shows meaningful thread names if they were set in the program; on macOS, threads are shown by their IDs.
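One way to set a thread name from inside a program on Linux is the prctl call with PR_SET_NAME. The following is a minimal sketch; the wrapper function names are hypothetical, and it assumes the profiler picks up the kernel-level thread name set this way. The kernel limits the name to 16 bytes including the terminating null, and longer names are truncated.

```cpp
#include <string>
#include <sys/prctl.h>  // Linux-specific

// Sets the name of the calling thread. Returns true on success.
bool set_current_thread_name(const std::string& name) {
    return prctl(PR_SET_NAME, name.c_str()) == 0;
}

// Reads the name of the calling thread back from the kernel.
std::string current_thread_name() {
    char buffer[16] = {0};  // PR_GET_NAME needs room for 16 bytes
    if (prctl(PR_GET_NAME, buffer) != 0) {
        return std::string();
    }
    return std::string(buffer);
}
```

Call set_current_thread_name at the start of each worker thread's function, for example with a short name such as "worker-io". The glibc function pthread_setname_np(pthread_t, const char*) is an alternative that can also name threads other than the calling one.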

To search in the profiling results, you can start typing right in the tool window area, and the results will be highlighted in the currently opened tab:

type in the profiler tool window to start the search

A context menu is available in all three tabs of the Profiler tool window. It allows you to locate the selected function in another tab (for example, Focus on method in Methods List for a Flame Graph block), navigate to the source code (Jump to Source), and copy frame information to the clipboard: either the frame name only (Copy Frame) or the sequence of frame names from the stack bottom up to the selected frame (Copy Stack up to Frame).

context menu for tab elements

Flame Graph

Raw profiling data collected by Perf or DTrace is a call tree summary. Flame Graphs visualize it as a collection of stack traces: the rectangles stand for frames of the call stack, ordered by width.

Each block represents a function in the stack (a stack frame). The width of each block corresponds to the CPU time used by the method (or the allocation size, in case of allocation profiling). The Y-axis shows the stack depth, going from the bottom up. The X-axis shows the stack profile sorted from the most resource-consuming functions to the least consuming ones.

When reading the flame graph, focus on the widest blocks. These blocks are the functions most represented in the profile. You can start from the bottom and move up, following the code flow from parent to child methods, or go in the opposite direction to explore the top blocks, which show the functions running directly on the CPU.

Hover the mouse over any block to view the details:

block details in the flame graph
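The relationship between sampled stacks and block widths can be sketched as follows. This is an illustration only, not CLion's implementation: the SampledStack type and the block_widths function are hypothetical, and each block is identified by its path of frame names joined with semicolons.

```cpp
#include <map>
#include <string>
#include <vector>

// One distinct call stack and the number of samples in which it was seen.
struct SampledStack {
    std::vector<std::string> frames;  // root first, leaf last
    unsigned long samples;
};

// Every stack prefix corresponds to one flame graph block. The width of a
// block is the number of samples whose stack starts with that prefix.
std::map<std::string, unsigned long> block_widths(
        const std::vector<SampledStack>& stacks) {
    std::map<std::string, unsigned long> widths;
    for (const auto& stack : stacks) {
        std::string path;
        for (const auto& frame : stack.frames) {
            if (!path.empty()) {
                path += ';';
            }
            path += frame;
            widths[path] += stack.samples;
        }
    }
    return widths;
}
```

For example, if main calls compute in 70 of 100 samples and parse in the remaining 30, the main block spans the full width, and the compute and parse blocks stacked on top of it take 70% and 30% of that width.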

Call Tree

The Call Tree tab shows the program call tree with the percentage of each function in the total profiling time. The optional number right after the percentage presents a filtered sequence of calls. Click it to expand this sequence.

call tree tab in the profiler results

To configure and filter the Call Tree view, use the Presentation Settings button.

Method List

The Method List tab collects all methods in the profiled data and sorts them by cumulative sample time. For each function in the list, you can view Back Traces and Merged Callees.

method list tab in the profiler results
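A list sorted by cumulative sample time can be derived from sampled stacks roughly as follows. This is a sketch, not CLion's implementation: the SampledStack type and the method_list function are hypothetical, and it assumes that a method is counted once per stack even when it appears several times through recursion.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One distinct call stack and the number of samples in which it was seen.
struct SampledStack {
    std::vector<std::string> frames;  // root first, leaf last
    unsigned long samples;
};

// Returns (method name, cumulative samples) pairs, largest first.
// Cumulative samples count every stack in which the method appears anywhere.
std::vector<std::pair<std::string, unsigned long>> method_list(
        const std::vector<SampledStack>& stacks) {
    std::map<std::string, unsigned long> cumulative;
    for (const auto& stack : stacks) {
        // Deduplicate so that recursive frames are not counted twice.
        const std::set<std::string> unique(stack.frames.begin(),
                                           stack.frames.end());
        for (const auto& name : unique) {
            cumulative[name] += stack.samples;
        }
    }
    std::vector<std::pair<std::string, unsigned long>> rows(cumulative.begin(),
                                                            cumulative.end());
    // Stable sort keeps alphabetical order among methods with equal counts.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const std::pair<std::string, unsigned long>& a,
                        const std::pair<std::string, unsigned long>& b) {
                         return a.second > b.second;
                     });
    return rows;
}
```

The entry point of a thread typically ends up at the top of such a list because it is present in every stack of that thread.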
Last modified: 22 August 2019