TLS Handshake Timeouts in Go

I recently came across a situation where a Go program (built with go1.8.3) had spawned far more goroutines than required. Profiling revealed the cause to be goroutines blocked on reading from a client while negotiating the TLS handshake. The goroutine stack trace led me to the following code segment in crypto/tls:

// File: crypto/tls/conn.go
// Line: 589 (go1.8.3)

// Read header, payload.
// recordHeaderLen is a package-level constant set to 5
if err := b.readFromUntil(c.conn, recordHeaderLen); err != nil {

A server-side goroutine can be stuck here for a long time if the client doesn’t write anything to the wire. The obvious fix would be to set a TLSHandshakeTimeout on http.Server. There is just one problem: no such setting exists.

http.Server exposes a ReadTimeout and a ReadHeaderTimeout. The latter only starts to apply once the TLS handshake has completed successfully, so it does not help our cause. The former would fix our problem, but it comes with a caveat: the entire request body must be read within the ReadTimeout, which means we cannot stream large request bodies.
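The caveat is easiest to see with a slow upload. The sketch below is illustrative only: slowUpload, limit and pause are names and values invented for this example, and the server is a plain HTTP test server from net/http/httptest. It posts a two-byte body with a pause in the middle and reports what the handler managed to read under each setting.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

const (
	limit = 300 * time.Millisecond // timeout under test (arbitrary value)
	pause = 3 * limit              // how long the client stalls mid-body
)

// slowUpload posts a two-byte body with a pause between the bytes and returns
// how many bytes the handler read, plus the error that ended the read.
// With wholeRequest set the server uses ReadTimeout, otherwise ReadHeaderTimeout.
func slowUpload(wholeRequest bool) (int64, error) {
	type result struct {
		n   int64
		err error
	}
	done := make(chan result, 1)

	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			n, err := io.Copy(io.Discard, r.Body)
			select {
			case done <- result{n, err}:
			default: // never block the handler on reporting
			}
		}))
	if wholeRequest {
		srv.Config.ReadTimeout = limit
	} else {
		srv.Config.ReadHeaderTimeout = limit
	}
	srv.Start()
	defer srv.Close()

	pr, pw := io.Pipe()
	go func() {
		defer pw.Close()
		pw.Write([]byte("a"))
		time.Sleep(pause)
		pw.Write([]byte("b"))
	}()

	// The client may or may not see an error when the server gives up;
	// only the handler's view of the body matters here.
	if resp, err := http.Post(srv.URL, "text/plain", pr); err == nil {
		resp.Body.Close()
	}

	select {
	case res := <-done:
		return res.n, res.err
	case <-time.After(10 * pause):
		return 0, errors.New("handler never ran")
	}
}

func main() {
	for _, wholeRequest := range []bool{false, true} {
		n, err := slowUpload(wholeRequest)
		fmt.Printf("ReadTimeout=%v: handler read %d bytes, err=%v\n", wholeRequest, n, err)
	}
}
```

With only ReadHeaderTimeout set, the handler should receive both bytes; with ReadTimeout set to the same value, the body read is expected to fail part-way through.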

Indirect TLSHandshakeTimeout

http.Server exposes a callback, ConnState, which fits our needs perfectly, i.e. it allows us to terminate connections stuck in a TLS handshake while still supporting the streaming of large request bodies.

// ConnState specifies an optional callback function that is
// called when a client connection changes state. See the
// ConnState type and associated constants for details.
ConnState func(net.Conn, ConnState)

ConnState is called for the first time immediately after Accept. The ConnState at this point is StateNew.

The next time it is called is when the server has read at least one byte of the request. The ConnState is now StateActive.
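These first two transitions can be observed directly. Here is a small sketch, assuming a TLS test server from net/http/httptest; traceStates is a hypothetical helper written for this post. It records every state reported while a single HTTPS request is served.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync"
)

// traceStates serves one HTTPS request from a test server and returns the
// names of the connection states reported up to the end of that request.
func traceStates() []string {
	var (
		mu     sync.Mutex
		states []string
	)
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) { io.WriteString(w, "ok") }))
	srv.Config.ConnState = func(_ net.Conn, cs http.ConnState) {
		mu.Lock()
		defer mu.Unlock()
		states = append(states, cs.String())
	}
	srv.StartTLS()
	defer srv.Close()

	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		return nil
	}
	io.Copy(io.Discard, resp.Body)
	resp.Body.Close()

	// Copy under the lock: the callback can still fire while we return.
	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), states...)
}

func main() {
	fmt.Println(traceStates())
}
```

The slice should start with "new" followed by "active"; whether later states such as "idle" have been recorded by the time the function returns depends on timing.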

In between these state transitions, the server has, if required, negotiated the TLS handshake. If a ReadDeadline is set on the conn immediately after it enters StateNew, it is only overridden by the ReadHeaderTimeout when the server attempts to read the request. Thus, the only meaningful activity carried out by the server after the custom ReadDeadline is set is the TLS handshake. I’ve included the code for this below:

// Requires the net, net/http and time packages.
var TLSHandshakeTimeout = 30 * time.Second // very generous timeout

s := &http.Server{
	ReadHeaderTimeout: 1 * time.Minute,
	ConnState: func(conn net.Conn, cs http.ConnState) {
		switch cs {
		case http.StateNew:
			// The connection was just accepted; the TLS handshake comes next.
			conn.SetReadDeadline(time.Now().Add(TLSHandshakeTimeout))
		default:
			// NOTE: this is a good place to track connection level metrics :)
		}
	},
}
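To check that the deadline really cuts off a stalled handshake, one option is to connect with a raw TCP client that never speaks. The sketch below is a rough test harness rather than production code: silentClient and the tiny handshakeTimeout value are assumptions made for the demo, and no other server timeouts are configured, so the deadline set in StateNew is the only one in play. Behaviour may differ between Go versions.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// handshakeTimeout is deliberately tiny so the demo finishes quickly.
const handshakeTimeout = 300 * time.Millisecond

// silentClient connects to a TLS test server, never sends a byte, and reports
// whether the server hung up before the client's own safety deadline fired,
// along with how long the client waited.
func silentClient() (bool, time.Duration) {
	srv := httptest.NewUnstartedServer(http.NotFoundHandler())
	srv.Config.ErrorLog = log.New(io.Discard, "", 0) // hide the expected handshake error
	srv.Config.ConnState = func(conn net.Conn, cs http.ConnState) {
		if cs == http.StateNew {
			conn.SetReadDeadline(time.Now().Add(handshakeTimeout))
		}
	}
	srv.StartTLS()
	defer srv.Close()

	conn, err := net.Dial("tcp", srv.Listener.Addr().String())
	if err != nil {
		return false, 0
	}
	defer conn.Close()

	start := time.Now()
	conn.SetReadDeadline(start.Add(10 * handshakeTimeout)) // client-side safety net

	// Read until the connection ends; a timeout here means the server
	// never closed the connection on its own.
	_, err = io.Copy(io.Discard, conn)
	var netErr net.Error
	clientTimedOut := errors.As(err, &netErr) && netErr.Timeout()
	return !clientTimedOut, time.Since(start)
}

func main() {
	hungUp, waited := silentClient()
	fmt.Printf("server hung up: %v after about %v\n", hungUp, waited.Round(10*time.Millisecond))
}
```

In production the timeout would be far more generous, as in the snippet above; the short value here only keeps the check fast.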

Resources

Cloudflare has a great blog post on timeouts in Go's net/http that I found very helpful while researching solutions for this problem.