Noisy Neighbor Nozzle CLI - Quickly Determine High Volume Log Producers #loggregator

Adam Hevenor

Nearly all log reliability problems on CF can be attributed to one of two main factors: improper scaling or "noisy neighbor" applications. For the former, you can install logmon, check the scaling recommendations published by Pivotal, and learn about our capacity planning methodology in our whitepaper. For the latter, Operators have had to rely on their own solutions until now. The Noisy Neighbor Nozzle is a set of nozzle applications and a CLI plugin that make it easy to identify applications that are logging excessively (after some time playing with it, we built in a color threshold at 1,000,000 logs per minute). We worked with several customers as part of Pivotal's early access program to make this nozzle easy to deploy and quick to use from the command line. We have also included an example "accumulator" that sends the results to Datadog, so operators can operationalize their monitoring of these events. One more thing: it's backwards compatible with nearly all versions of Cloud Foundry, since it uses a simple methodology of counting logs through the Firehose.
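To make the counting approach concrete, here is a minimal sketch in Python. It is not the nozzle's actual code and does not use a real Firehose client: the (timestamp, app_id) pairs stand in for log envelopes, and the function names are hypothetical. Only the 1,000,000 logs per minute threshold comes from the description above.

```python
from collections import Counter

# Threshold from the post; everything else here is illustrative.
THRESHOLD_PER_MINUTE = 1_000_000


def count_logs_per_minute(envelopes):
    """Count log envelopes per (minute, app_id).

    `envelopes` is an iterable of (unix_timestamp_seconds, app_id) pairs,
    a hypothetical stand-in for log envelopes read from the Firehose.
    """
    counts = Counter()
    for timestamp, app_id in envelopes:
        minute = int(timestamp // 60)  # bucket by whole minute
        counts[(minute, app_id)] += 1
    return counts


def noisy_apps(counts, threshold=THRESHOLD_PER_MINUTE):
    """Return the app IDs that reached the threshold in any one minute."""
    return sorted({app for (_, app), n in counts.items() if n >= threshold})


# Tiny made-up sample, with a low threshold so the result is visible.
sample = [(0, "app-a"), (10, "app-a"), (59, "app-b"), (61, "app-a")]
counts = count_logs_per_minute(sample)
print(noisy_apps(counts, threshold=2))
```

With the sample data, only app-a logs twice within the same minute, so it is the only app reported at a threshold of 2.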

To learn more about how this helps Operators, check out my post on Rapid Troubleshooting of the Cloud Foundry Logging System on Medium.

Jouke Waleson <jt.waleson@...>

We've been hit by noisy loggers more than once, so this will be a very useful tool for identifying them quickly!

Any thoughts on integrating this with CF Top? It's becoming a standard tool for investigating CF issues, at least in my team. I think noisy loggers would be a good addition, so that operators would only need a single tool.


Adam Hevenor

I can see that overlap, but there is an important scope difference to point out here. The Noisy Neighbor Nozzle requires Operator scope to read from the firehose and is intended for platform-wide monitoring, whereas CF Top can be used by app developers to monitor the applications within their space. This is important because apps from one space can affect other spaces, although I can see the appeal of also adding this to the information app developers have access to.


With the CF Top plugin there are two fields, "LOG_OUT" and "LOG_ERR", which give the total number of messages logged since cf top started running. Although it doesn't give you a rate of messages per minute, it can quickly identify chatty apps. If you have the "doppler.firehose" scope it will show logging totals for all apps; if not, it will only show apps you have authority to see. You can sort by any column, so sorting on the "LOG_OUT" column will show the top loggers.
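Since the totals are cumulative, a rough per-minute rate can be estimated by hand from two readings of the same column taken a known time apart. The sketch below shows only that arithmetic; it is not a cf top feature, the function name is hypothetical, and the numbers are made up.

```python
def logs_per_minute(total_before, total_after, seconds_elapsed):
    """Estimate a per-minute rate from two cumulative totals.

    The totals would be two readings of a cumulative counter such as
    LOG_OUT, taken `seconds_elapsed` seconds apart.
    """
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    return (total_after - total_before) * 60 / seconds_elapsed


# Made-up readings: 12,000 messages, then 15,000 messages 30 seconds later.
rate = logs_per_minute(12_000, 15_000, 30)
print(rate)  # 3,000 messages in 30 seconds is 6000.0 per minute
```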


Adam Hevenor

One thing to watch out for when using cf nozzle or cf top to identify noisy neighbors by reading from the firehose: reading with a single client can put back pressure on the Loggregator system by being a "slow consumer" (this is especially true if you are on a slow connection). This can cause load in Traffic Controller and Doppler and result in dropped messages. This is fine for small deployments but potentially damaging for large deployments with high rates of Firehose throughput. The Noisy Neighbor Nozzle mitigates this by providing a scalable application that can shard the Firehose stream.
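As a rough illustration of why sharding helps, the sketch below splits a stream across several consumers, lets each count only its own share, and then merges the partial counts. It is a simulation under stated assumptions, not the nozzle's implementation: the round-robin split is a simplification (Loggregator decides how envelopes are actually distributed), and all function names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def shard(envelopes, n_consumers):
    """Deal envelopes across consumers round-robin (a simplification)."""
    shards = [[] for _ in range(n_consumers)]
    for target, envelope in zip(cycle(range(n_consumers)), envelopes):
        shards[target].append(envelope)
    return shards


def merge(partials):
    """Sum the per-consumer counts into platform-wide totals."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Made-up stream in which each item is just the app ID of one log envelope.
stream = ["app-a", "app-b", "app-a", "app-a", "app-c", "app-b"]
shards = shard(stream, 3)

# Each consumer counts only the envelopes it received.
partials = [Counter(s) for s in shards]
totals = merge(partials)
print(totals.most_common())
```

No single consumer has to keep up with the whole stream, yet the merged totals match what one client reading everything would have counted.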