Testing for Zombie Goroutines

Post by Saul Shanabrook

Recently I have been working on some stage lighting control software, written in Go. In my tests, I spin up a websocket TCP server plus some workers that output lighting levels continuously. I want to test not only that those parts work, but also that they are stopped properly after the test is over.

To deal with cancellation, I am using the awesome golang.org/x/net/context library. It has a couple of nice features, but here I am only using the context.WithCancel function to get a cancel function, which I call after each test to clean up.
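A minimal sketch of the cancellation pattern described above. It assumes the standard library `context` package (which has offered the same `WithCancel` API since Go 1.7) rather than golang.org/x/net/context, and `runWorker` and `stopsAfterCancel` are hypothetical names, not functions from the project:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runWorker is a hypothetical stand-in for a lighting-output worker.
// It ticks until ctx is cancelled, then closes done so that callers
// can observe that it has exited.
func runWorker(ctx context.Context, interval time.Duration, done chan<- struct{}) {
	defer close(done)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// A real worker would output lighting levels here.
		}
	}
}

// stopsAfterCancel starts a worker, cancels its context, and reports
// whether the worker exited within one second.
func stopsAfterCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go runWorker(ctx, time.Millisecond, done)

	cancel()
	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("worker stopped after cancel:", stopsAfterCancel())
}
```

The `done` channel is one way to let a test wait for the worker to exit instead of sleeping for a fixed time.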

Using GoConvey for testing, the setup would look something like this:

package main

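import (
    "fmt"
    "net/http"
    "testing"
    "time"

    // websocket.Dialer below matches the gorilla/websocket API (assumed)
    "github.com/gorilla/websocket"
    // dot import so Convey, So, Reset and ShouldBeNil can be used unqualified
    . "github.com/smartystreets/goconvey/convey"
    "golang.org/x/net/context"

    "github.com/lucibus/subicul/testutils"
)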


func TestServer(t *testing.T) {  
    Convey("when i start the server", t, func() {
        port := 9001
        ctx, cancelF := context.WithCancel(context.Background())
        err := MakeStateServer(ctx, port)
        So(err, ShouldBeNil)

        url := fmt.Sprintf("ws://localhost:%v/", port)
        d := websocket.Dialer{}     

        Convey("I should be able to connect", func() {
            conn, _, err := d.Dial(url, http.Header{})
            So(err, ShouldBeNil)
            conn.Close()
        })

        Reset(func() {
            cancelF()
            time.Sleep(time.Millisecond)
            So("subicul", ShouldNotBeRunningGoroutines)
        })

    })
}

Here is the interesting function, ShouldNotBeRunningGoroutines:

package main

import (  
    "bufio"
    "bytes"
    "runtime/pprof"
    "strings"
)

// ShouldNotBeRunningGoroutines takes in the name of the current module as
// `actual` and returns a blank string if no other goroutines are running
// in that module, besides testing goroutines.
// If there are other goroutines running, it will output the full stacktrace.
// It does this by parsing the full stack trace of all currently running
// goroutines and seeing if any of them are within this module and are not
// testing goroutines.
func ShouldNotBeRunningGoroutines(actual interface{}, _ ...interface{}) string {  
    // this function has to take an interface{} type so that you can use it with
    // GoConvey's `So(...)` function.
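    module, ok := actual.(string)
    if !ok {
        // fail the assertion instead of panicking on a bad argument
        return "ShouldNotBeRunningGoroutines expects the module name as a string"
    }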

    var b bytes.Buffer
    // passes 1 as the debug parameter so there are function names and line numbers
    pprof.Lookup("goroutine").WriteTo(&b, 1)
    scanner := bufio.NewScanner(&b)
    // each line of this stack trace is one path in one goroutine that is running
    for scanner.Scan() {
        t := scanner.Text()
        // now we wanna check when this line we are looking at shows a goroutine
        // that is running a file in this module that is not a test or a dep
        runningInModule := strings.Contains(t, module)
        runningTest := strings.Contains(t, "test")
        runningExternal := strings.Contains(t, "Godeps") || strings.Contains(t, "vendor")
        runningOtherFileInModule := runningInModule && !runningTest && !runningExternal
        if runningOtherFileInModule {
            // if we find that it is in fact running another goroutine from this
            // package then output the full stacktrace, with debug level 2 to show
            // more information. This goes into a fresh buffer, because `b` is
            // still being drained by the scanner.
            var full bytes.Buffer
            pprof.Lookup("goroutine").WriteTo(&full, 2)
            return "Was running other goroutine: " + t + "\n\n" + full.String()
        }
    }
    return ""
}

As you can see, this is a pretty hacky/horrible way of checking whether we have any zombie goroutines left running. It basically searches the whole stack trace for a line that points to a file in your module and that isn't a test file or a vendored dependency.
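To make the matching rule concrete, here is the same per-line check pulled out into a pure function and run against a few sample lines. The function name `isModuleFrame` and the sample stack lines are made up for illustration; real profile output will differ in addresses and paths. The last sample shows one way the substring approach can miss a leak: a file called `latest.go` contains the letters "test" and so gets skipped.

```go
package main

import (
	"fmt"
	"strings"
)

// isModuleFrame applies the same substring rules as the loop in
// ShouldNotBeRunningGoroutines to a single line of the goroutine profile.
func isModuleFrame(line, module string) bool {
	if !strings.Contains(line, module) {
		return false
	}
	if strings.Contains(line, "test") {
		return false
	}
	return !strings.Contains(line, "Godeps") && !strings.Contains(line, "vendor")
}

func main() {
	// Hypothetical lines, shaped like debug=1 profile frames.
	samples := []string{
		"#\t0x4011a2\tgithub.com/lucibus/subicul.output+0x52\t/go/src/github.com/lucibus/subicul/output.go:41",
		"#\t0x4012b3\tgithub.com/lucibus/subicul.TestServer+0x10\t/go/src/github.com/lucibus/subicul/server_test.go:30",
		"#\t0x4013c4\tgithub.com/gorilla/websocket.read+0x20\t/go/src/github.com/lucibus/subicul/Godeps/_workspace/src/github.com/gorilla/websocket/conn.go:600",
		"#\t0x4014d5\tgithub.com/lucibus/subicul.poll+0x30\t/go/src/github.com/lucibus/subicul/latest.go:12",
	}
	for _, s := range samples {
		fmt.Printf("%v\t%s\n", isModuleFrame(s, "subicul"), s)
	}
}
```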

At first I thought I could just check whether there was more than one goroutine running, because I assumed the test would use one goroutine, so if any others were left over after I cancelled, I should assume it was my fault. However, I found that there were often multiple testing goroutines running at once.
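For comparison, a counting approach could look roughly like the sketch below. Instead of expecting exactly one goroutine, it records a baseline with `runtime.NumGoroutine` and polls until the count drops back to it. The helper names `settlesTo` and `demo` are hypothetical, and the approach is still fragile for the reason given above: if the test framework starts or stops goroutines of its own between the two measurements, the comparison is off.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// settlesTo polls runtime.NumGoroutine until the count is at most want,
// or gives up once timeout has elapsed.
func settlesTo(want int, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for runtime.NumGoroutine() > want {
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(time.Millisecond)
	}
	return true
}

// demo starts three goroutines that block until stop is closed. It
// reports whether the count was back at the baseline while they were
// still blocked, and whether it got back there after they were released.
func demo() (whileBlocked, afterStop bool) {
	baseline := runtime.NumGoroutine()
	stop := make(chan struct{})
	for i := 0; i < 3; i++ {
		go func() { <-stop }()
	}
	whileBlocked = settlesTo(baseline, 20*time.Millisecond)
	close(stop)
	afterStop = settlesTo(baseline, time.Second)
	return whileBlocked, afterStop
}

func main() {
	whileBlocked, afterStop := demo()
	fmt.Println("at baseline while goroutines are blocked:", whileBlocked)
	fmt.Println("at baseline after stop:", afterStop)
}
```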

Another option is just to not care if you leave any zombie goroutines around, since they don't use that much memory. I wanted to care, however, since some of my goroutines had side effects (opening up files or serial ports), so I wanted to make sure they died when I told them to.
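When a goroutine holds a resource, one option is to have the test wait for it explicitly and then check that the resource was released. The sketch below is an illustration under assumptions, not the project's code: it uses the standard library `context` package, a temporary file stands in for a serial port, and `holdFile` and `cleanedUp` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"sync"
)

// holdFile is a hypothetical worker with a side effect: it keeps a
// temporary file open until ctx is cancelled, then closes and removes it.
// It sends the file name (or "" on failure) on path once it is set up.
func holdFile(ctx context.Context, wg *sync.WaitGroup, path chan<- string) {
	// Deferred calls run last-in-first-out, so wg.Done fires only
	// after the file has been closed and removed.
	defer wg.Done()
	f, err := os.CreateTemp("", "worker-*")
	if err != nil {
		path <- ""
		return
	}
	defer os.Remove(f.Name())
	defer f.Close()
	path <- f.Name()
	<-ctx.Done()
}

// cleanedUp starts the worker, cancels it, waits for it to return, and
// reports whether the temporary file is gone afterwards.
func cleanedUp() bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	var wg sync.WaitGroup
	path := make(chan string, 1)

	wg.Add(1)
	go holdFile(ctx, &wg, path)
	name := <-path

	cancel()
	wg.Wait()
	if name == "" {
		return false
	}
	_, err := os.Stat(name)
	return os.IsNotExist(err)
}

func main() {
	fmt.Println("resource released after cancel:", cleanedUp())
}
```

Waiting on the `sync.WaitGroup` replaces the fixed `time.Sleep` with a deterministic handoff, at the cost of threading the wait group through every worker.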

I am, however, very new to Go, so I am most likely thinking about this whole problem wrong. If you agree, please comment or let me know (@saulshanabrook) where my thinking has gone astray. I will also be at GopherCon this week and would love to talk.

This post was inspired by Peter Bourgon's great talk, which asked the Go community to talk more about their approaches to using the context package.