CLion 2018.3 Help

Profiler

Introduction

With CLion's CPU profiler integration, you can analyze performance metrics collected for your application (both kernel and user code). The profiler is available on Linux and macOS, and the implementation is based on the Perf and DTrace tools, respectively. Currently, you can run the profiler for CMake and Gradle projects. Note that it is not supported for the WSL toolchain or in remote development mode.

Perf and DTrace sample at a fixed rate: they interrupt the application and collect the program counter and stack traces, which are then translated into profiling reports. Such reports can be long and difficult to analyze, so CLion provides visualization for the profiler's output data.

Prerequisites

Before you start using the profiler, make sure to install the following tools:

  1. On Linux, install the Perf tool for your particular kernel release. Use uname -r to find out the exact version, and then install the corresponding linux-tools package. For example:

    marinak@marinak-VirtualBox:~$ uname -r
    4.15.0-36-generic
    marinak@marinak-VirtualBox:~$ sudo apt-get install linux-tools-4.15.0-36-generic

  2. Adjust kernel options. Before starting the profiler on Linux, you need to set up two kernel options:

    • perf_event_paranoid - controls the use of performance events data by non-root users. Set its value to be less than 2 to let the profiler collect performance information without root privileges:

      sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'

    • kptr_restrict - sets restrictions on exposing kernel addresses. To have kernel symbols properly resolved, disable the protection offered by kptr_restrict by setting its value to 0:

      sudo sh -c 'echo 0 >/proc/sys/kernel/kptr_restrict'

    By default, these changes affect your current OS session only. To keep the settings across system reboots, run:

    sudo sh -c 'echo kernel.perf_event_paranoid=1 >> /etc/sysctl.d/99-perf.conf'
    sudo sh -c 'echo kernel.kptr_restrict=0 >> /etc/sysctl.d/99-perf.conf'
    sudo sh -c 'sysctl --system'

    Upon the first launch of the profiler, CLion checks whether kernel variables are already set up and suggests the necessary changes:

    [Screenshot: adjust Linux kernel variables for the profiler]

  3. For human-readable names in the output and jump-to-source navigation, the profiler requires addr2line. This tool is a part of the binutils package, so you likely have it on your system by default. If not, you need to install the package separately: apt-get install binutils.

  • On macOS, the only required tool is DTrace, which is most likely installed by default. To check, run the dtrace command in the terminal.
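
As a quick sanity check of step 2 on Linux, the two thresholds can be expressed in a small shell function. This is only a sketch: check_perf_sysctl is a hypothetical helper name, not part of CLion or Perf, and it takes the values as arguments so that the rules from the text (perf_event_paranoid below 2, kptr_restrict equal to 0) stay visible.

```shell
#!/bin/sh
# Hypothetical helper: report whether the given kernel option values
# match the requirements described above.
#   $1 - value of perf_event_paranoid (must be less than 2)
#   $2 - value of kptr_restrict (must be 0)
check_perf_sysctl() {
    paranoid=$1
    kptr=$2
    status=0
    if [ "$paranoid" -ge 2 ]; then
        echo "perf_event_paranoid=$paranoid: set it to a value below 2"
        status=1
    fi
    if [ "$kptr" -ne 0 ]; then
        echo "kptr_restrict=$kptr: set it to 0"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "kernel options are ready for profiling"
    fi
    return "$status"
}

# Usage on Linux (reads the current values from /proc):
#   check_perf_sysctl "$(cat /proc/sys/kernel/perf_event_paranoid)" \
#                     "$(cat /proc/sys/kernel/kptr_restrict)"
```

The function returns a non-zero status when either option needs adjusting, so it can also be used as a guard in a setup script.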

CLion automatically detects the Perf or DTrace executable if its location is included in the PATH environment variable. You can also set the path manually in Settings / Preferences | Build, Execution, Deployment | Dynamic Analysis Tools | Profiler.
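
To see in advance whether a PATH lookup would succeed, you can run a similar lookup in the terminal. The sketch below is an illustration only: find_profiler_tool is a hypothetical helper, and it does not reproduce CLion's own detection logic, which is not described here. It relies only on the standard command -v lookup.

```shell
#!/bin/sh
# Hypothetical helper: print the first of the given tools found in PATH.
# Candidate names are passed as arguments, for example "perf" on Linux
# or "dtrace" on macOS.
find_profiler_tool() {
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool: $path"
            return 0
        fi
    done
    echo "none of the tools found in PATH: $*" >&2
    return 1
}

# Usage:
#   find_profiler_tool perf dtrace
```

If the lookup fails, either extend PATH or set the executable location manually in the Profiler settings mentioned above.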

Using the profiler

Prepare the build

  1. The profiler relies on debug information to provide meaningful output data and navigation, so prefer Debug configurations for profiling.

  2. Compiler optimizations such as inlining can influence profiling results. To make sure none of the frames are missing due to inlining, set the optimization level to -O0 in your CMakeLists.txt:

    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O0")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0")

    Also, compilers can use the frame pointer register as a general-purpose register for optimization purposes, which may lead to broken stack traces. On Linux, the profiler implementation does not depend on this, but on macOS we recommend setting the -fno-omit-frame-pointer compilation flag for gcc and both -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer for clang.
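
If you configure the build from the command line, the flags from the steps above can be collected in one place. This is a minimal sketch: profiling_flags is a hypothetical helper name, and the mapping simply restates the recommendations above (-O0 for both compilers, plus the frame pointer flags for gcc and clang).

```shell
#!/bin/sh
# Hypothetical helper: print the compiler flags recommended above.
#   $1 - compiler family: "gcc" or "clang"
profiling_flags() {
    case "$1" in
        gcc)
            echo "-O0 -fno-omit-frame-pointer"
            ;;
        clang)
            echo "-O0 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"
            ;;
        *)
            echo "unknown compiler family: $1" >&2
            return 1
            ;;
    esac
}

# Usage (assumes a build directory located directly under the project root):
#   cmake -DCMAKE_BUILD_TYPE=Debug \
#         -DCMAKE_CXX_FLAGS="$(profiling_flags clang)" ..
```

Passing the flags on the command line keeps CMakeLists.txt unchanged, which may be convenient when the profiling build is only temporary.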

Configure sampling frequency

  • The default sampling rate value is rather high, which might require a lot of disk space for long-running programs. If required, you can change the profiler's sampling frequency in Settings / Preferences | Build, Execution, Deployment | Dynamic Analysis Tools | Profiler.

    [Screenshot: profiler settings]

    When choosing a sampling rate, mind other timer-driven activities that may be scheduled in your system. For example, the default value is set to 99 Hertz instead of 100 Hertz to avoid lockstep sampling with other activity that may run at a frequency of 100 Hz.
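
The lockstep effect can be illustrated with simple arithmetic. The sketch below assumes a hypothetical periodic activity that repeats every 10 ms (100 Hz) and prints where each sample lands inside that period; sample_phases is an illustrative name, and integer division is used, so the values are rounded down to whole microseconds.

```shell
#!/bin/sh
# Print the phase (in microseconds) at which each of the first N samples
# lands inside the 10 ms period of a hypothetical 100 Hz activity.
#   $1 - sampling rate in Hz
#   $2 - number of samples N
sample_phases() {
    rate=$1
    n=$2
    i=1
    out=""
    while [ "$i" -le "$n" ]; do
        t_us=$((i * 1000000 / rate))   # time of the i-th sample
        phase=$((t_us % 10000))        # offset within the 10 ms period
        out="$out${out:+ }$phase"
        i=$((i + 1))
    done
    echo "$out"
}

sample_phases 100 5   # prints: 0 0 0 0 0
sample_phases 99 5    # prints: 101 202 303 404 505
```

At 100 Hz every sample hits the same point of the period, so the profile would see the periodic activity either always or never. At 99 Hz the sampling point drifts through the period, which gives a more representative picture.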

Run the profiler

  1. To run the profiler, use the Profile button on the main toolbar or the Run | Profile action in the main menu. Another option is to choose Profile from the left gutter menu:

    [Screenshot: run gutter menu with the profiler option]

  2. When you launch the profiling, CLion notifies you if the profiler is attached successfully. After the application stops and the profiling data is ready, CLion shows a balloon with a link to the CPU Profiler tool window (also accessible from the main menu View | Tool Windows | CPU Profiler):

    [Screenshot: profiling finished balloon]

    To stop the profiler before the application stops, use the stop button in the Profiler tool window.

Interpreting the results

In the CPU Profiler tool window, you can see the collected data presented in three tabs - Flame Chart, Call Tree, and Method List:

[Screenshot: profiler tool window on Linux]

On the left side, there is a list of application threads and the All threads merged option. On Linux, you can view meaningful thread names if they were set in the program, and on macOS, the thread names are shown as IDs.

Initially, the Profiler tool window shows a notification with helpful tips for navigation and search. For example, you can start typing right in the tool window area, and search results will be highlighted in the currently opened tab:

[Screenshot: type in the profiler tool window to start the search]

A right-click context menu is available in all tabs of the Profiler tool window. It allows you to locate the selected function in another tab (for example, Focus on method in Methods List for the Flame Chart blocks), navigate to the source code (Jump to Source), and copy frame information to the clipboard: either only the frame name (Copy Frame) or the sequence of frame names from the stack bottom up to the selected frame (Copy Stack up to Frame):

[Screenshot: context menu for tab elements]

Flame Chart

Raw profiling data collected by Perf or DTrace is a call tree summary showing the calls and the percentage of time in each of the code branches. Flame charts visualize such reports as collections of stack traces: the y-axis shows stack depth, growing from the bottom up, and the x-axis shows the stack profile sorted from the most CPU-consuming functions to the least.

When reading the flame chart, focus on the widest blocks, which correspond to the functions that appear most often in the profile. You can start from the bottom and move up, following the code flow from parent to child functions, or use the opposite direction to explore the top blocks that show the functions running directly on the CPU.
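
To get a feel for why some blocks are wider than others, it may help to count identical stacks by hand. The sketch below is not how CLion builds the chart. It assumes a simplified text input (one sampled stack per line, frames separated by semicolons, outermost first), and fold_stacks and the function names in the sample data are made up for illustration.

```shell
#!/bin/sh
# Count identical stacks read from standard input and print them widest
# first as "<samples> <stack>". Input format (an assumption for this
# sketch): one sample per line, frames separated by ';', outermost first.
fold_stacks() {
    awk '{ count[$0]++ }
         END { for (s in count) print count[s], s }' |
        sort -k1,1nr -k2
}

# Hypothetical samples; the function names are made up for illustration.
fold_stacks <<'EOF'
main;parse;read_token
main;render;draw_line
main;render;draw_line
main;render;draw_line
main;parse;read_token
main;render
EOF
```

In this made-up data, the main;render;draw_line stack was sampled three times out of six, so its top block would span half of the chart width, while main at the bottom would span the whole width.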

In the Flame Chart tab, you can hover the mouse over any block to view the details:

[Screenshot: block details in the flame chart]

Call Tree

The Call Tree tab shows the program call tree with the percentage of each function in the total profiling time. To configure and filter the Call Tree view, use the settings and filter buttons.

[Screenshot: call tree tab in the profiler results]
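
One way to think about the percentages is as the share of samples in which a function appears anywhere on the stack. The sketch below computes such shares under stated assumptions and is not CLion's implementation: the input is a simplified text format (one sampled stack per line, frames separated by semicolons, outermost first), and inclusive_percent and the sample function names are hypothetical.

```shell
#!/bin/sh
# For every function, print the share of samples in which it appears
# anywhere on the stack, as "<percent> <function>", largest share first.
# Input format (an assumption for this sketch): one sample per line,
# frames separated by ';', outermost first.
inclusive_percent() {
    awk -F';' '
        {
            total++
            split("", seen)              # reset the per-sample set
            for (i = 1; i <= NF; i++) {
                if (!($i in seen)) {     # count a recursive frame once
                    seen[$i] = 1
                    hits[$i]++
                }
            }
        }
        END {
            for (f in hits) printf "%.1f %s\n", 100 * hits[f] / total, f
        }' |
        sort -k1,1nr -k2
}

# Hypothetical samples; the function names are made up for illustration.
inclusive_percent <<'EOF'
main;parse;read_token
main;render;draw_line
main;render;draw_line
main;render
EOF
```

Here main is present in every sample and render in three out of four, so they are reported with the largest shares, which mirrors how a parent node in a call tree accounts for the time of its children.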

Method List

The Method List tab shows the list of functions sorted by the number of samples. For each function from the list, you can view Back Traces and Merged Callees. A right-click context menu is available for functions in the list and in the Back Traces and Merged Callees views.

[Screenshot: method list tab in the profiler results]

Last modified: 7 December 2018