Telemetry within BOSH


Mike Lloyd <mike@...>
 

Hey folks,

 

I have an interesting use case in front of me that I’m trying to approach both sanely and sustainably: I want highly structured telemetry data from the BOSH Director for downstream analytics. My goal is a comprehensive and clear view of where BOSH is spending its time. I could see any type of telemetry data being immensely useful for operators and CFF developers, as it can give insight into where improvements can be made.

 

Currently I’m exploring 3 options:

  • Dump of the Director database.
    • Pro: Anything that’s logged to the database is available and can be used with external data visualisation solutions.
    • Cons: Schema relations and changes between versions make this difficult to sustain; the database schema isn’t well documented; potentially lots of SQL script maintenance; visibility is limited to what’s stored in the database.
  • Adding telemetry hooks to the Director and BOSH Agent.
    • Pro: Every action the Director takes can be tracked and measured, covering the majority of BOSH.
    • Cons: Immense amount of initial work; finding a telemetry framework; handling telemetry output.
  • Adding telemetry hooks to the BOSH CLI.
    • Pro: Every action the CLI takes can be tracked and measured.
    • Cons: Only CLI actions can be tracked; immense amount of initial work; finding a telemetry framework; handling telemetry output.
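
To make the "telemetry hook" options more concrete, here is a minimal sketch of what a timing hook that emits structured records could look like. It is not taken from the Director, Agent, or CLI code: the names `make_recorder`, `record`, the `create_vm` action, and the record fields are all hypothetical and chosen only for illustration.

```python
import json
import time
from contextlib import contextmanager


def make_recorder(sink):
    """Build a context manager that times a block and hands one record to `sink`.

    `sink` is any callable taking a dict (hypothetical; it could write to a
    file, a queue, or an external collector).
    """
    @contextmanager
    def record(action, **attrs):
        start = time.monotonic()
        status = "ok"
        try:
            yield
        except Exception:
            status = "error"
            raise  # never swallow the original failure
        finally:
            sink({
                "action": action,
                "status": status,
                "duration_s": round(time.monotonic() - start, 6),
                "attrs": attrs,
            })
    return record


# Collect records in memory for the example.
records = []
record = make_recorder(records.append)

with record("create_vm", deployment="example"):
    time.sleep(0.01)  # stand-in for the real work being measured

print(json.dumps(records[0], sort_keys=True))
```

Keeping the sink pluggable is one way to defer the "handling telemetry output" question: the hook only decides what a record contains, not where it goes.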

 

Looking across the industry, telemetry is very prevalent, and it’s almost always an opt-in model, so anything I explore would be opt-in as well. Since I haven’t seen anything like this discussed on the mailing lists before, I wanted to surface my explorations to get others’ thoughts and opinions on telemetry within BOSH.

 

Respectfully,

 

Mike Lloyd

t: @mxplusc

g: @mxplusb
Professional Member, ACM

 


Marco Voelz
 

Dear Mike,


Thanks for bringing this up. More insight into what the Director is doing could definitely be helpful for a number of things.


Concerning your use-case, I hope you can help me understand a few points:

  • What is it that you're trying to achieve? My current understanding is that you're trying to run some analytics on the data you want to gather. Let's say you have all the data for your Director – how are you going to use it, specifically?
  • At which level of granularity would you imagine this data to be gathered? When I read 'where the Director spends its time', I can imagine anything from 'each single statement in the code' to 'each interaction with the IaaS/CPI'.
  • To get a better understanding of where you're coming from: Have you looked at particular libraries/tools/software for gathering telemetry data in other services you have worked with that you could share as examples?

Thanks and warm regards
Marco


From: cf-bosh@... <cf-bosh@...> on behalf of Mike Lloyd <mike@...>
Sent: Thursday, December 13, 2018 1:12:41 AM
To: cf-bosh@...
Subject: [cf-bosh] Telemetry within BOSH
 


 


Damzog Jochen (CI/OSC1)
 

Hi Marco and Mike,

 

We have had a similar requirement for a long time. I noted it down some time ago:

 

Subscription mechanism for events

BOSH has a notion of events, see http://bosh.io/docs/events. These events are fired on each activity taken by the Director, such as deployments and the creation and deletion of VMs. They could very well be used to trigger operations outside of BOSH that either prepare or complement the action taken by the Director. For example, we would like to use these events to trigger the creation or deletion of firewall configurations.

In order to use events to trigger external actions, we need a mechanism to subscribe to them. There are multiple ways to achieve this. For example, the Director could forward these events to a messaging system (like RabbitMQ), or it could offer a mechanism to register webhooks.

For our use case of setting up firewall (FW) rules, it would be useful to configure a synchronous coupling between the Director action and the external action, to ensure FW rules are applied before a particular VM is started. This coupling, however, should remain configurable if implemented, because many other use cases will probably prefer to be executed asynchronously.
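
The difference between synchronous and asynchronous coupling can be sketched as follows. This is only an illustration of the idea, not an existing BOSH API: the `EventBus` class, its methods, and the event field names used here are assumptions made for the example.

```python
import queue
import threading


class EventBus:
    """Toy event bus with per-subscriber synchronous or asynchronous delivery."""

    def __init__(self):
        self._sync_handlers = []
        self._async_handlers = []
        self._queue = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def subscribe(self, handler, synchronous=False):
        target = self._sync_handlers if synchronous else self._async_handlers
        target.append(handler)

    def publish(self, event):
        # Synchronous subscribers run inline: the caller is blocked until they
        # finish, and an exception here would abort the calling action.
        for handler in self._sync_handlers:
            handler(event)
        # Asynchronous subscribers are queued and run on the worker thread.
        for handler in self._async_handlers:
            self._queue.put((handler, event))

    def _drain(self):
        while True:
            handler, event = self._queue.get()
            try:
                handler(event)
            except Exception:
                pass  # a failing async subscriber must not stop delivery
            finally:
                self._queue.task_done()

    def wait_idle(self):
        """Block until all queued asynchronous deliveries have been handled."""
        self._queue.join()


log = []
bus = EventBus()
# Hypothetical firewall subscriber: must finish before the VM is started.
bus.subscribe(lambda e: log.append(("firewall", e["action"])), synchronous=True)
# Hypothetical audit subscriber: fine to run later.
bus.subscribe(lambda e: log.append(("audit", e["action"])))


def create_vm(name):
    bus.publish({"action": "create", "object_type": "vm", "object_name": name})
    log.append(("started", name))  # stand-in for actually starting the VM


create_vm("vm-0")
bus.wait_idle()
```

In this sketch the firewall handler is guaranteed to have run before the VM "starts", while the audit handler may run at any point afterwards. A webhook or message-queue integration would replace the in-process handlers with network calls, but the ordering question stays the same.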

 

 

Best regards

Jochen Damzog

Service Integration and Brokering (CI/OSC1)
Robert Bosch GmbH | Postfach 30 02 20 | 70442 Stuttgart | GERMANY
| www.bosch.com
Tel. +49 711 811-18977 | Mobil +49 173 8673114 | Fax +49 711 811 |
Jochen.Damzog@...



From: cf-bosh@... <cf-bosh@...> on behalf of Marco Voelz
Sent: Thursday, December 13, 2018 16:46
To: cf-bosh@...
Subject: Re: [cf-bosh] Telemetry within BOSH

 
