Re: cloud_controller_ng performance degrades slowly over time

Matt Cholick

So things got a little more interesting yesterday. We have some custom
Golang jobs for auditing and some other purposes. A recent change added the
consult_agent template to the box with these jobs and I saw similar
behavior: requests with dns resolution taking +5s consistently, fixable by
restarting the process. I wrote a small Golang app to try and reproduce it.
I got similar, but not identical behavior in the test app.

package main

import (

func main() {
for {
startOpen := time.Now()
conn, err := net.Dial("tcp", "")
if err != nil {
fmt.Printf("%#v", err)
} else {
endOpen := time.Now()
endClose := time.Now()
openDur := endOpen.Sub(startOpen) / 1e6
closeDur := endClose.Sub(endOpen) / 1e6

fmt.Printf("dns_open: %dms | dns_close: %dms\n", openDur, closeDur)

After some time, I saw:
dns_open: 14ms | dns_close: 0ms
dns_open: 5019ms | dns_close: 1ms
dns_open: 25ms | dns_close: 1ms
dns_open: 27ms | dns_close: 1ms
dns_open: 20ms | dns_close: 0ms
dns_open: 5015ms | dns_close: 0ms
dns_open: 14ms | dns_close: 0ms
dns_open: 5018ms | dns_close: 1ms
dns_open: 15ms | dns_close: 0ms
dns_open: 5018ms | dns_close: 1ms
dns_open: 19ms | dns_close: 0ms
dns_open: 5046ms | dns_close: 0ms

To reiterate, the behavior in the actual app seemed to be every single
request. The test app went into a similar mode where a roughly every other
request failed. Here's the strace for that section:

On some other runs it was quite consistently alternating slow and fast
calls (for ~300 calls in a row alternating slow/fast). Just like the Ruby,
once things get into an unhappy state they stay that way until restarting
the process. (The process ran for 12 hours and never recovered).

I can capture a Ruby strace too if you'd like, but I figured this was more
useful as it added more novel information to the mix.

These Golang jobs have run in production for months without showing this
behavior and, within a few hours of including the consul template, the
issue manifested.

Join { to automatically receive all group messages.