CPU profiler

GoLand collects and visualizes CPU profiles, traces, and heap profiles. To collect this data, GoLand uses the pprof package. The IDE includes four profilers that you can run from the user interface: CPU, memory, blocking (contention), and mutex.

Profiling results help you locate performance issues, but you have to implement the code improvements yourself. For more information, see the profiling documentation at go.dev and the description of the pprof package at pkg.go.dev.

After the analysis is complete, the profiler visualizes the results in reports.

Before you start

Before running a profile, make sure that:

Go is installed. To install, upgrade, or configure Go, refer to GOROOT.
GoLand is installed on your machine.
The Go project you want to profile is open in the IDE.

All examples in this topic are available in a sample project on GitHub.

CPU profiling

The CPU profiler measures how much CPU time each function consumes during program execution.

Example program

The following program sorts a slice of random integers using an inefficient bubble sort algorithm:

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func BubbleSort(nums []int) {
	for i := 0; i < len(nums); i++ {
		for j := 0; j < len(nums)-i-1; j++ {
			if nums[j] > nums[j+1] {
				nums[j], nums[j+1] = nums[j+1], nums[j]
			}
		}
	}
}

func main() {
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	nums := r.Perm(60000)
	BubbleSort(nums)
	fmt.Println("Sorted 60,000 numbers")
}

You can run this program with go run main.go or by selecting Run from the gutter menu.

Create a test for profiling

Create a unit test that runs the sorting function:

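package main

import (
	"math/rand"
	"sort"
	"testing"
)

func TestBubbleSort(t *testing.T) {
	nums := rand.Perm(60000)
	BubbleSort(nums)
	// Fail if the result is not in ascending order.
	if !sort.IntsAreSorted(nums) {
		t.Fatal("BubbleSort did not sort the slice")
	}
}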

Run CPU profiling

Open the _test.go file.
Click the Run icon in the gutter next to the test function.
Select Profile with CPU Profiler.

Analyze CPU profiling results

GoLand presents CPU profiling data in three views:

Flame graph: visualizes how CPU time is distributed across functions.
The Flame Graph tab shows function calls and the percentage of time each call takes to execute. Each block represents a function in the stack (a stack frame). The Y-axis shows the stack depth (bottom-up), while the X-axis represents functions sorted by CPU usage, from the most to the least resource-consuming.
When reading the flame graph, focus on the widest blocks: they represent the functions that consume the most CPU time. Hover over any block to view detailed information.
Call tree: displays how functions call one another and how much time each call takes.
The Call Tree tab provides detailed information about the program’s call stacks sampled during profiling. It includes:
- Method names
- Percentage of total sample time (can be toggled to show parent call time)
- Total sample count
- Number of filtered calls
Method list: provides a tabular view of all functions with cumulative execution time and total CPU usage percentage.
The Method List tab lists all methods found in the profiling data, sorted by cumulative sample time. Each method entry includes a Back Traces tree and a Merged Callees tree.
The Back Traces tab displays the hierarchy of callers, showing which methods invoke the selected one.
The Merged Callees tab shows call traces that start from the selected method. The Callee List summarizes the methods further down the call hierarchy.
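The difference between a function's own time and its cumulative time in these views can be illustrated with a small sketch. It tallies hypothetical sampled call stacks into per-function counts. This is a conceptual model only, not GoLand's implementation, and the type counts, the function aggregate, and the sample data are all assumptions made for the example.

```go
package main

import "fmt"

// counts holds sample totals for one function.
type counts struct {
	Self       int // samples where the function was at the top of the stack
	Cumulative int // samples where the function appeared anywhere in the stack
}

// aggregate tallies samples per function.
// Each stack is ordered from the root caller to the leaf function.
func aggregate(stacks [][]string) map[string]counts {
	out := make(map[string]counts)
	for _, stack := range stacks {
		seen := make(map[string]bool) // count a recursive function once per sample
		for i, fn := range stack {
			c := out[fn]
			if !seen[fn] {
				c.Cumulative++
				seen[fn] = true
			}
			if i == len(stack)-1 {
				c.Self++
			}
			out[fn] = c
		}
	}
	return out
}

func main() {
	// Hypothetical samples, not real profiler output.
	stacks := [][]string{
		{"main", "BubbleSort"},
		{"main", "BubbleSort"},
		{"main", "BubbleSort"},
		{"main", "rand.Perm"},
	}
	for fn, c := range aggregate(stacks) {
		fmt.Printf("%-12s self=%d cumulative=%d\n", fn, c.Self, c.Cumulative)
	}
}
```

In this model, main has a high cumulative count but no self samples, which is why a wide block near the bottom of a flame graph is not necessarily the function doing the work.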

In this example, most time is spent in the nested loop of BubbleSort, which indicates an O(n²) bottleneck.
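The quadratic cost can be checked by counting comparisons. The sketch below uses the same loops as BubbleSort and compares the count with the closed form n(n-1)/2. The function countComparisons is a hypothetical helper added for illustration. For n = 60,000 the formula gives 1,799,970,000 comparisons.

```go
package main

import "fmt"

// countComparisons runs the same loops as BubbleSort and returns
// how many element comparisons were performed.
func countComparisons(nums []int) int {
	comparisons := 0
	for i := 0; i < len(nums); i++ {
		for j := 0; j < len(nums)-i-1; j++ {
			comparisons++
			if nums[j] > nums[j+1] {
				nums[j], nums[j+1] = nums[j+1], nums[j]
			}
		}
	}
	return comparisons
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		nums := make([]int, n)
		for i := range nums {
			nums[i] = n - i // reverse order; the count does not depend on the input
		}
		fmt.Printf("n=%d comparisons=%d formula=%d\n",
			n, countComparisons(nums), n*(n-1)/2)
	}
}
```

Because this version has no early exit, the number of comparisons depends only on the slice length, so even an already sorted input pays the full quadratic cost.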

Optimize and compare results

To improve performance, replace bubble sort with quicksort, which has an average time complexity of O(n log n). In the test, increase the number of random integers to 10 million so that sorting takes a measurable amount of time. Otherwise, the algorithm completes almost instantly (around 0.01 seconds).

func QuickSort(nums []int) {
	if len(nums) < 2 {
		return
	}
	quickSortHelper(nums, 0, len(nums)-1)
}

func quickSortHelper(nums []int, low, high int) {
	if low < high {
		pivotIndex := partition(nums, low, high)
		quickSortHelper(nums, low, pivotIndex-1)
		quickSortHelper(nums, pivotIndex+1, high)
	}
}

func partition(nums []int, low, high int) int {
	pivot := nums[high]
	i := low - 1

	for j := low; j < high; j++ {
		if nums[j] <= pivot {
			i++
			nums[i], nums[j] = nums[j], nums[i]
		}
	}

	nums[i+1], nums[high] = nums[high], nums[i+1]
	return i + 1
}

When you rerun the CPU profiler, the program finishes much faster and consumes significantly less CPU time, which indicates improved performance.
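A faster profile is only meaningful if the new implementation still sorts correctly. The following sketch checks this under the assumption that the input comes from rand.Perm(n), so a correct result must be exactly 0 through n-1. The helper sortsPermutation is hypothetical. The sorting functions are copied from this section.

```go
package main

import (
	"fmt"
	"math/rand"
)

// QuickSort, quickSortHelper, and partition are copied from this section.
func QuickSort(nums []int) {
	if len(nums) < 2 {
		return
	}
	quickSortHelper(nums, 0, len(nums)-1)
}

func quickSortHelper(nums []int, low, high int) {
	if low < high {
		pivotIndex := partition(nums, low, high)
		quickSortHelper(nums, low, pivotIndex-1)
		quickSortHelper(nums, pivotIndex+1, high)
	}
}

func partition(nums []int, low, high int) int {
	pivot := nums[high]
	i := low - 1
	for j := low; j < high; j++ {
		if nums[j] <= pivot {
			i++
			nums[i], nums[j] = nums[j], nums[i]
		}
	}
	nums[i+1], nums[high] = nums[high], nums[i+1]
	return i + 1
}

// sortsPermutation reports whether sortFn orders a random permutation
// of 0..n-1. After a correct sort, every value equals its index.
func sortsPermutation(sortFn func([]int), n int) bool {
	nums := rand.Perm(n)
	sortFn(nums)
	for i, v := range nums {
		if v != i {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("QuickSort sorts correctly:", sortsPermutation(QuickSort, 100000))
}
```

The same helper accepts BubbleSort, so both implementations can be checked with one function.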

Benchmark the optimized code

To confirm the improvement, benchmark both implementations:

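func BenchmarkBubbleSort(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer() // exclude input generation from the measurement
		nums := rand.Perm(1000)
		b.StartTimer()
		BubbleSort(nums)
	}
}

func BenchmarkQuicksort(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer() // exclude input generation from the measurement
		nums := rand.Perm(1000)
		b.StartTimer()
		QuickSort(nums)
	}
}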

Benchmark results show that QuickSort() completes in a fraction of the time that BubbleSort() takes.

17 December 2025