Yes, we see similar behavior. The top slowest calls, with average response times (see the aggregation sketch after the list):
* GET v2/organizations/([^/?#]+)/users (9s)
* GET v2/organizations/([^/?#]+)/managers (7.48s)
* GET v2/spaces/([^/?#]+)/managers (6.45s)
* PUT v2/service_brokers/([^/?#]+) (6.25s)
* GET v2/spaces/([^/?#]+)/developers (5.81s)
* DELETE v2/spaces/([^/?#]+) (5.66s)
* POST v2/service_instances (4.95s)
* GET v2/apps/([^/?#]+)/summary (4.21s)
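For context, a rough sketch of how per-route averages like the ones above can be pulled out of access logs. This isn't from our actual tooling; the log line format (method, path, "response_time:<secs>") and the GUID-collapsing regex are assumptions, so adjust them to whatever the router/nginx actually emits:

# Sketch: aggregate average response time per normalized route from an access log.
# The LINE_RE format and "access.log" filename are placeholders/assumptions.
import re
from collections import defaultdict

LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+".*response_time:(?P<secs>[\d.]+)'
)
GUID_RE = re.compile(
    r'/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
)

def route_averages(log_path, top_n=8):
    totals = defaultdict(lambda: [0.0, 0])  # route -> [sum of seconds, request count]
    with open(log_path) as f:
        for line in f:
            m = LINE_RE.search(line)
            if not m:
                continue
            # Collapse GUID path segments so every org/space falls into one route bucket.
            route = GUID_RE.sub('/([^/?#]+)', m.group('path').split('?')[0])
            key = f"{m.group('method')} {route}"
            totals[key][0] += float(m.group('secs'))
            totals[key][1] += 1
    averages = {k: s / n for k, (s, n) in totals.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

if __name__ == "__main__":
    for route, avg in route_averages("access.log"):
        print(f"{route} ({avg:.2f}s)")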
service_brokers, delete spaces, and service_instances all communicate with
a service broker, which in turn communicates with external services, so
slowness does seem legitimate there. apps/summary is slow in a similar way,
around its communication with hm9000.
It definitely looks like a network problem, but that wouldn't explain why
things get immediately better after restarting the cloud controller (a monit
restart of the job, not a restart of the whole VM). If the network were the
cause, I would also expect Postgres to be slow, since those machines sit next
to everything else on the network (though that traffic goes over private,
internal IPs rather than public ones, so the path is a bit different).
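One quick way to poke at that (just a sketch, with placeholder addresses, not something we've run): time bare TCP connects from the controller VM to Postgres over the internal path and to a broker over the public path. If only the public path is slow, a general network problem looks less likely.

# Sketch: compare TCP connect latency over the private path (Postgres) vs the
# public path (a service broker). Hostnames and ports below are placeholders.
import socket
import time

TARGETS = {
    "postgres (private)": ("10.0.16.11", 5432),             # placeholder internal address
    "service broker (public)": ("broker.example.com", 443),  # placeholder public endpoint
}

def avg_connect_time(host, port, attempts=5):
    samples = []
    for _ in range(attempts):
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=5):
            pass
        samples.append(time.monotonic() - start)
    return sum(samples) / len(samples)

for name, (host, port) in TARGETS.items():
    avg = avg_connect_time(host, port)
    print(f"{name}: avg TCP connect {avg * 1000:.1f} ms over 5 attempts")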
I've tried looking back through history, but the response times of those
users/managers/developers endpoints vary so much with the number of users in
the organization that averages over time don't show any meaningful trend: we
just start hearing from users when it gets really slow, since those are the
endpoints that do a lot of paging for the big orgs.
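If it helps, here's a minimal sketch (assuming the standard v2 paged response shape with next_url/total_pages/resources; API host, org GUID, and token are placeholders) for timing each page of the users listing for a single org, so per-page latency can be compared across orgs of different sizes instead of relying on the whole-request average:

# Sketch: walk the paged /v2/organizations/:guid/users listing and time each page.
# API_HOST, ORG_GUID, and TOKEN are placeholders.
import time
import requests

API_HOST = "https://api.example.com"   # placeholder
ORG_GUID = "<org-guid>"                # placeholder
TOKEN = "<oauth-token>"                # placeholder

def time_user_pages(api_host, org_guid, token):
    url = f"{api_host}/v2/organizations/{org_guid}/users?results-per-page=100"
    page = 1
    while url:
        start = time.monotonic()
        resp = requests.get(url, headers={"Authorization": f"bearer {token}"})
        resp.raise_for_status()
        body = resp.json()
        elapsed = time.monotonic() - start
        print(f"page {page}/{body.get('total_pages', '?')}: "
              f"{len(body.get('resources', []))} users in {elapsed:.2f}s")
        next_url = body.get("next_url")
        url = f"{api_host}{next_url}" if next_url else None
        page += 1

if __name__ == "__main__":
    time_user_pages(API_HOST, ORG_GUID, TOKEN)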
-Matt