Understanding the Go Runtime: Memory, Goroutines & GC Explained
When developers talk about Go’s power — its efficiency, simplicity, and strong concurrency model — much of that capability comes directly from the Go runtime. The runtime handles memory management, garbage collection, goroutine scheduling, and system integration, all transparently so you can focus on business logic.
This post explains what the Go runtime is, how each component works, and why understanding it makes you a better Go developer.
What is the Go Runtime?
The Go runtime is the underlying system that manages:
- Memory allocation and deallocation
- Garbage collection (automatic memory cleanup)
- Concurrency management (goroutines and scheduling)
- System calls and standard library integrations
It acts as an engine that abstracts low-level resource management, giving Go developers the performance of a systems language without the manual memory management burden.
1. Memory Management and Allocation
Memory management in Go is automatic. The runtime allocates memory when you use new or make, and frees it when it’s no longer referenced.
Unlike C or C++, you don’t call malloc or free. Unlike garbage-collected languages with unpredictable pauses (older JVM GCs, for example), Go’s runtime is designed for low-latency operation.
2. Garbage Collection (GC)
Go uses a concurrent, tri-color mark-and-sweep garbage collector. Here’s what makes it notable:
- Concurrent: The GC runs alongside your program on separate goroutines. It doesn’t stop the world for long stretches.
- Low-latency: Pause times are kept to sub-millisecond levels in modern Go versions.
- Incremental improvement: Go’s GC has gotten dramatically better with each major release.
This design makes Go an excellent choice for latency-sensitive systems like web servers, APIs, and microservices that handle lots of short-lived objects.
When Does GC Run?
The Go GC is triggered automatically based on heap growth: with the default setting, a new cycle starts when the heap has grown 100% (i.e., doubled) since the previous collection. You can tune this with the GOGC environment variable (default: 100; lower values collect more often).
// Force a GC cycle manually (rarely needed in production):
import "runtime"
runtime.GC()
3. Concurrency and Goroutines
Goroutines are Go’s lightweight concurrency primitive — user-space threads managed by the Go runtime. They’re far cheaper than OS threads:
| | OS Thread | Goroutine |
|---|---|---|
| Stack size | ~1-8 MB (fixed) | ~2 KB initially (grows dynamically) |
| Creation cost | Expensive (kernel syscall) | Cheap (runtime call) |
| Practical limit | Thousands | Millions |
Spawn a goroutine with the go keyword:
go func() {
fmt.Println("I'm running concurrently")
}()
Communication via Channels
Goroutines communicate safely through channels — a core Go pattern for synchronization:
ch := make(chan int)
go func() {
ch <- 42
}()
result := <-ch
fmt.Println(result) // Output: 42
4. The Go Scheduler (M:N Model)
The Go scheduler maps M goroutines onto N OS threads — hence the M:N threading model. This is managed by three key abstractions:
- G (Goroutine): A unit of concurrent work.
- M (Machine): An OS thread.
- P (Processor): A scheduling context that runs goroutines on a machine. The number of Ps is controlled by GOMAXPROCS (defaults to the number of CPU cores).
Work-Stealing Algorithm
Each P has a local run queue of goroutines. When a P runs out of work, it steals goroutines from another P’s queue. This keeps all CPU cores busy and balances load automatically without developer intervention.
import "runtime"
// Set the number of Ps — how many OS threads may execute Go code simultaneously:
runtime.GOMAXPROCS(4) // Use 4 CPU cores
5. Standard Libraries and System Calls
The Go runtime integrates closely with the standard library. For example:
- net/http leverages goroutines to handle each connection concurrently, making scalable web servers trivial to write.
- Computationally expensive crypto/ operations are handled by the runtime's thread pool under the hood.
- Cross-platform system calls are abstracted so Go code runs the same on Linux, macOS, and Windows.
Why Developers Should Care About the Go Runtime
| Concern | Why the Runtime Matters |
|---|---|
| Performance tuning | Understand GC pressure; avoid allocating short-lived objects in hot paths |
| Concurrency bugs | Know how goroutines are scheduled to avoid deadlocks and starvation |
| Resource efficiency | Size goroutines and channels appropriately for your workload |
| Profiling | Use pprof to measure GC cycles, goroutine counts, and memory allocations |
Profiling Your Go Application
import (
    "net/http"
    _ "net/http/pprof"
)
// Expose pprof endpoints in your HTTP server:
go func() {
    http.ListenAndServe("localhost:6060", nil)
}()
Then analyze with go tool pprof http://localhost:6060/debug/pprof/heap.
Key Takeaways
- The Go runtime handles memory, GC, goroutine scheduling, and system calls automatically.
- Go’s concurrent GC achieves low-latency pauses suitable for production web services.
- Goroutines are cheap to create and managed by an M:N scheduler with work-stealing.
- The M:N model maximizes CPU utilization across all available cores via GOMAXPROCS.
- Understanding runtime behavior is essential for performance-sensitive Go applications — use pprof to profile and validate assumptions.
The Go runtime is what makes the language so well-suited for high-throughput, low-latency server-side applications. The more you understand it, the more effectively you can leverage it.