@andrewhodel
Last active May 14, 2025 18:23
Go profiling with pprof

Importing net/http/pprof registers HTTP endpoints that serve profile data, which you can view or chart with the go tool pprof command line tool. The block and mutex profiles additionally require the runtime.Set* calls shown below.

1. import pprof into the go program

import _ "net/http/pprof"

If your application is not already running an http or https server, add net/http to the program imports and the following code to the start of the main function:

go http.ListenAndServe(":8550", nil)

// SetBlockProfileRate controls the fraction of goroutine blocking events
// that are reported in the blocking profile. The profiler aims to sample
// an average of one blocking event per rate nanoseconds spent blocked.
// To include every blocking event in the profile, pass rate = 1.
// To turn off profiling entirely, pass rate <= 0.
runtime.SetBlockProfileRate(1)

// SetMutexProfileFraction controls the fraction of mutex contention events
// that are reported in the mutex profile. On average 1/rate events are
// reported. The previous rate is returned.
// To turn off profiling entirely, pass rate 0. To just read the current
// rate, pass rate < 0. (For n>1 the details of sampling may change.)
//
// You need a rate greater than 0 to gather contended mutex data. With a
// rate greater than 1 only a sampled subset of events is reported; a rate
// of 1 reports every event.
// For example, if a caller holds a lock for 1s while 5 other goroutines
// wait the entire second to acquire the lock, its unlock call stack will
// report 5s of contention.
runtime.SetMutexProfileFraction(1)

2. open profiles from the http server with pprof

pprof is a tool that runs on any machine with Go installed and fetches the profile files over HTTP from the server you added to your program.

  1. goroutine stack trace:

go tool pprof "http://127.0.0.1:8550/debug/pprof/goroutine"

  2. memory heap sample:

go tool pprof "http://127.0.0.1:8550/debug/pprof/heap"

  3. sampling of all past memory allocations:

go tool pprof "http://127.0.0.1:8550/debug/pprof/allocs"

  4. threadcreate stack trace:

go tool pprof "http://127.0.0.1:8550/debug/pprof/threadcreate"

  5. blocking of synchronization primitives, available with runtime.SetBlockProfileRate(1):

go tool pprof "http://127.0.0.1:8550/debug/pprof/block"

  6. holders of contended mutexes, available with runtime.SetMutexProfileFraction(1):

go tool pprof "http://127.0.0.1:8550/debug/pprof/mutex"

In the contended mutex report, the time in the left column tells you immediately whether any mutexes are contended.

Showing nodes accounting for 86.11hrs, 99.49% of 86.55hrs total
Dropped 117 nodes (cum <= 0.43hrs)
      flat  flat%   sum%        cum   cum%
  51.76hrs 59.81% 59.81%   51.76hrs 59.81%  runtime.selectgo
  34.35hrs 39.69% 99.49%   34.35hrs 39.69%  sync.(*Cond).Wait

Contended does not mean locked.

Contended mutexes are just busy. If you want to find a mutex you never unlocked:

To find a locked mutex by line number, download the goroutine stack and look at every running goroutine with curl -o goroutine_stack.txt "http://127.0.0.1:8550/debug/pprof/goroutine?debug=2", then open it with less goroutine_stack.txt.

Start at the top of the file and search for Lock. You will find something similar to goroutine 1242 [sync.Mutex.Lock, 1039 minutes]:. Near the end of that line is a duration showing how long the goroutine has been waiting. Beneath that line is the function trace with line numbers that lets you find the locked mutex in your code.

The first block containing Lock and a duration is near the oldest, which lets you find the mutex that was never unlocked.

The duration that the mutex has been locked is not available in the stack trace printed when you hold ctrl and press \ to exit the running program. You must download the goroutine stack with debug=2 to view that information.

view graphs, charts and flame graphs in the browser

First install graphviz, which pprof uses to render the charts: sudo yum install graphviz.

Then run pprof with any profile (goroutine, heap, allocs, threadcreate, block or mutex) and these additional command line parameters: go tool pprof -http 0.0.0.0:8999 "http://127.0.0.1:8550/debug/pprof/goroutine".

Open a browser to the IP address of the machine running go tool pprof and port 8999.

  1. View all in-use functions hierarchically with go tool pprof -http 0.0.0.0:8999 "http://127.0.0.1:8550/debug/pprof/goroutine" by clicking View -> Graph.

CPU profiling

The CPU profile is not available as a Profile. It has a special API, the StartCPUProfile and StopCPUProfile functions, because it streams output to a writer during profiling.

That means you need to write code to start and stop output to a file that you can then read with go tool pprof.

  1. Add SetCPUProfileRate to the start of the main function.

// SetCPUProfileRate sets the CPU profiling rate to hz samples per second.
// If hz <= 0, SetCPUProfileRate turns off profiling.
// If the profiler is on, the rate cannot be changed without first turning it off.
runtime.SetCPUProfileRate(27)

  2. Import runtime/pprof:

import "runtime/pprof"

  3. Start and stop CPU profiling in your program with:

f, perr := os.Create("/tmp/cpu.pprof")
if perr != nil {
	fmt.Println(perr)
	os.Exit(1)
}
pprof.StartCPUProfile(f)

// the code you want to profile

pprof.StopCPUProfile()

Then you can view flame graphs of CPU usage with go tool pprof -http 0.0.0.0:8999 /tmp/cpu.pprof

go tool pprof

go tool pprof is a command line tool; once its interactive prompt opens, type help for more information.

------------------------------------------------------------------------------

IMPORTANT!

Use -sample_index=alloc_space; the default -sample_index=inuse_space shows only live memory and can hide memory leaks that cause out of memory.

go tool pprof -sample_index=alloc_space "http://127.0.0.1:8550/debug/pprof/heap"

This is example output from the memory heap profiling tool in pprof.

go tool pprof "http://127.0.0.1:8550/debug/pprof/heap"
Fetching profile over HTTP from 127.0.0.1:8550/debug/pprof/heap
Saved profile in /Users/zip/pprof/pprof.listener.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: listener
Type: inuse_space
Time: Aug 5, 2023 at 7:18pm (CDT)
Entering interactive mode (type "help" for commands, "o" for options)

top - show functions using the most memory

(pprof) top
Showing nodes accounting for 159.41MB, 98.13% of 162.44MB total
Dropped 25 nodes (cum <= 0.81MB)
Showing top 10 nodes out of 33
      flat  flat%   sum%        cum   cum%
  155.61MB 95.80% 95.80%   155.61MB 95.80%  reflect.unsafe_NewArray
    3.29MB  2.03% 97.82%     3.29MB  2.03%  encoding/gob.(*encBuffer).writeByte
    0.50MB  0.31% 98.13%        1MB  0.62%  main.collectors_read_journal_entry
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).Decode
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).DecodeValue
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).decOpFor.func2
         0     0% 98.13%   155.61MB 95.80%  encoding/gob.(*Decoder).decOpFor.func3
         0     0% 98.13%   155.61MB 95.80%  encoding/gob.(*Decoder).decOpFor.func4
         0     0% 98.13%    74.01MB 45.56%  encoding/gob.(*Decoder).decodeArrayHelper
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).decodeMap

tree - show a text rendering of call graph

peek - show callers/callees of functions

(pprof) peek reflect.unsafe_NewArray
Showing nodes accounting for 162.44MB, 100% of 162.44MB total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
----------------------------------------------------------+-------------
                                          155.61MB   100% |   reflect.MakeSlice
  155.61MB 95.80% 95.80%   155.61MB 95.80%                | reflect.unsafe_NewArray
----------------------------------------------------------+-------------
(pprof) peek reflect.MakeSlice
Showing nodes accounting for 162.44MB, 100% of 162.44MB total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
----------------------------------------------------------+-------------
                                          155.61MB   100% |   encoding/gob.(*Decoder).decodeSlice
         0     0%     0%   155.61MB 95.80%                | reflect.MakeSlice
                                          155.61MB   100% |   reflect.unsafe_NewArray
----------------------------------------------------------+-------------

go tool trace, data similar to strace's output

Collect trace data with the go tool trace command. To collect and view a 5-second execution trace:

curl -o trace.out "http://127.0.0.1:8550/debug/pprof/trace?seconds=5"
go tool trace trace.out

end program and output goroutine stack

Hold ctrl and press \ (this sends SIGQUIT).

Remember that the duration the mutex has been locked is not available in the stack trace printed when you hold ctrl and press \ to exit the running program. You must download the goroutine stack with debug=2 to view that information.

downloading the goroutine stack with debug=2

Download the goroutine stack; it is similar to the output of a Go program that crashes and lists every running goroutine on exit.

curl -o goroutine_stack.txt "http://127.0.0.1:8550/debug/pprof/goroutine?debug=2"

This is useful for solving the error runtime: goroutine stack exceeds 1000000000-byte limit. Reading the goroutine stack shows you which functions recurse or are started too often.

// this fails with runtime: goroutine stack exceeds 1000000000-byte limit
// because Go does not optimize tail calls, so every recursive call grows the stack
func a() {
	if queue_item_exists {
		// process queue item
		a()
	} else {
		time.Sleep(time.Second)
		a()
	}
}

// this works because go a() starts a new goroutine with a fresh stack
// and the current call returns
func a() {
	if queue_item_exists {
		// process queue item
		go a()
	} else {
		time.Sleep(time.Second)
		go a()
	}
}