go pprof 采样何时进行

import (
	_ "net/http/pprof"
)
func main() {
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}

这段代码大家应该都很熟悉，go tool pprof工具的通用代码，用来做性能等分析。我这两天尝试用了一下，不免有了一个疑问：pprof是要采样的，这个采样是何时进行的？是程序启动就开始采样还是当我curl localhost:$PORT/debug/pprof/$PROFILE_TYPE开始？

为什么会考虑这个呢，主要是在看的框架里集成了pprof，默认打开，如果是程序一开始就采样，那对于性能是有损耗的。这种问题别人回答你，你也不一定相信，我们还是来看看pprof的代码吧。

标准库的代码是在$GOROOT/src下面，我们找一下pprof，在我的mac上路径是/usr/local/Cellar/go/1.9.2/libexec/src/net/http/pprof/pprof.go

首先是init函数，注册了我们用到的分析urls：

func init() {
    http.Handle("/debug/pprof/", http.HandlerFunc(Index))
    http.Handle("/debug/pprof/cmdline", http.HandlerFunc(Cmdline))
    http.Handle("/debug/pprof/profile", http.HandlerFunc(Profile))
    http.Handle("/debug/pprof/symbol", http.HandlerFunc(Symbol))
    http.Handle("/debug/pprof/trace", http.HandlerFunc(Trace))
}

这是在程序启动就执行的，这也是我们感觉只要import就万事大吉的原因。其实看到这里应该就明白了，profile只是普通的函数调用而已，程序启动只是注册了handler，真正的采样应该是在请求之后执行的。

但是来都来了，我们不妨往下看看，毕竟需要看标准库的机会不多。我们来看一下cpu profile，也就是Profile函数：

// Profile responds with the pprof-formatted cpu profile.
// The package initialization registers it as /debug/pprof/profile.
func Profile(w http.ResponseWriter, r *http.Request) {
    sec, _ := strconv.ParseInt(r.FormValue("seconds"), 10, 64)
    if sec == 0 {
        sec = 30
    }
    if durationExceedsWriteTimeout(r, float64(sec)) {
        w.Header().Set("Content-Type", "text/plain; charset=utf-8")
        w.Header().Set("X-Go-Pprof", "1")
        w.WriteHeader(http.StatusBadRequest)
        fmt.Fprintln(w, "profile duration exceeds server's WriteTimeout")
        return
    }
    // Set Content Type assuming StartCPUProfile will work,
    // because if it does it starts writing.
    w.Header().Set("Content-Type", "application/octet-stream")
    if err := pprof.StartCPUProfile(w); err != nil {
        // StartCPUProfile failed, so no writes yet.
        // Can change header back to text content
        // and send error code.
        w.Header().Set("Content-Type", "text/plain; charset=utf-8")
        w.Header().Set("X-Go-Pprof", "1")
        w.WriteHeader(http.StatusInternalServerError)
        fmt.Fprintf(w, "Could not enable CPU profiling: %s\n", err)
                return
    }
    sleep(w, time.Duration(sec)*time.Second)
    pprof.StopCPUProfile()
}

获取了一个默认30秒的时间，执行StartCPUProfile，里面应该是个goroutine，调用返回后sleep了一下，结束cpu profile。采样应该是在StartCPUProfile，执行seconds时间。

再来看看StartCPUProfile

func StartCPUProfile(w io.Writer) error {
    // The runtime routines allow a variable profiling rate,
    // but in practice operating systems cannot trigger signals
    // at more than about 500 Hz, and our processing of the
    // signal is not cheap (mostly getting the stack trace).
    // 100 Hz is a reasonable choice: it is frequent enough to
    // produce useful data, rare enough not to bog down the
    // system, and a nice round number to make it easy to
    // convert sample counts to seconds. Instead of requiring
    // each client to specify the frequency, we hard code it.
    const hz = 100
    cpu.Lock()
    defer cpu.Unlock()
    if cpu.done == nil {
        cpu.done = make(chan bool)
    }
    // Double-check.
    if cpu.profiling {
        return fmt.Errorf("cpu profiling already in use")
    }
    cpu.profiling = true
    runtime.SetCPUProfileRate(hz)
    go profileWriter(w)
    return nil
}

注释解释了取样频率的由来，我们先看一下CPU主频的概念：

CPU的主频，即CPU内核工作的时钟频率（CPU Clock Speed）。CPU的主频的基本单位是赫兹（Hz），但更多的是以兆赫兹（MHz）或吉赫兹（GHz）为单位。时钟频率的倒数即为时钟周期。时钟周期的基本单位为秒（s），但更多的是以毫秒（ms）、微妙（us）或纳秒（ns）为单位。在一个时钟周期内，CPU执行一条运算指令。也就是说，在1000 Hz的CPU主频下，每1毫秒可以执行一条CPU运算指令。在1 MHz的CPU主频下，每1微妙可以执行一条CPU运算指令。而在1 GHz的CPU主频下，每1纳秒可以执行一条CPU运算指令。

在默认情况下，Go语言的运行时系统会以100 Hz的的频率对CPU使用情况进行取样。也就是说每秒取样100次，即每10毫秒会取样一次。为什么使用这个频率呢？因为100 Hz既足够产生有用的数据，又不至于让系统产生停顿。并且100这个数上也很容易做换算，比如把总取样计数换算为每秒的取样数。实际上，这里所说的对CPU使用情况的取样就是对当前的Goroutine的堆栈上的程序计数器的取样。由此，我们就可以从样本记录中分析出哪些代码是计算时间最长或者说最耗CPU资源的部分了。

代码很容易理解，锁住cpu，设置runtime的采样频率，profileWriter就是实际采样了。

所以结论是，采样在请求之后才会进行，线上打开采样的接口没有问题。

参考资料