Re: [abacus] Handling Notifications


Jean-Sebastien Delfino
 

I've begun to work on supporting notifications in abacus.
Great! What you've described here looks like a good start to me. I've
included a few comments and questions below:

We already have a github issue and pivotal tracker story associated with
it as seen here:
https://github.com/cloudfoundry-incubator/cf-abacus/issues/161
https://www.pivotaltracker.com/story/show/101097834
and one more, created from Github #161 by the CF gitbot:
https://www.pivotaltracker.com/n/projects/1385414/stories/107779224
I'm guessing that's the one that'll track the commits under Github #161.

There are two ways that this will probably need to work. One via a
webhook and another via letting a client keep an open connection that
would receive a stream of notifications.

I've asked Subhash (@sinduri, author of Github #161) to post more details
of what he needs here. I'm guessing it's Webhooks but will let him
elaborate.

For the registry, I have considered Eureka <...> I'm not sure if also
using it as a registry for handling notifications would kind of warp the
purpose of its use.

IMO Eureka works fine as a registry of IP addresses for performance
monitoring of our services, but I'm not so sure about it for persistent
registrations of Webhooks and corresponding triggers/criteria, as AIUI
that'd be a big deviation from what it was originally designed for. So +1
to what you said about warping its purpose :)

Documents would probably be keyed to the given URL just for supporting
updates on new criteria for a certain callback URL and prevent multiple
documents for the same URL.

How about keying by criteria as well to know when to trigger a Webhook
call? And maybe allow multiple registrations per URL? (e.g. please call me
back at http://foo.com/bar on new usage for org abc, and also when usage
for app xyz goes over 1000; that would probably be two separate
registration docs for a single URL)
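
As a rough sketch of that keying idea (not taken from the Abacus code
base), the document id could be derived from the callback URL together
with its criteria, so one URL can hold several registrations while an
identical re-registration still updates the same doc. The Registry class,
the registration_id helper and the criteria field names (org, app, event,
usage_over) are all hypothetical, and the dict stands in for a real
database:

```python
import hashlib
import json


def registration_id(url, criteria):
    """Derive a stable document id from the callback URL and its criteria."""
    # sort_keys makes the id independent of the order of the criteria fields
    canonical = json.dumps({"url": url, "criteria": criteria}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Registry:
    """In-memory stand-in for the database holding registration docs."""

    def __init__(self):
        self.docs = {}

    def register(self, url, criteria):
        doc_id = registration_id(url, criteria)
        # Registering the same URL and criteria again overwrites the same doc
        self.docs[doc_id] = {"url": url, "criteria": criteria}
        return doc_id

    def for_url(self, url):
        """Return every registration doc that points at the given URL."""
        return [d for d in self.docs.values() if d["url"] == url]


registry = Registry()
# Two different criteria for one URL give two separate docs
registry.register("http://foo.com/bar", {"org": "abc", "event": "new_usage"})
registry.register("http://foo.com/bar", {"app": "xyz", "usage_over": 1000})
# Same criteria in a different field order maps back to the first doc
registry.register("http://foo.com/bar", {"event": "new_usage", "org": "abc"})
```

With this keying, the third call does not create a new document, so the
URL ends up with exactly two registrations.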

If a call is an error, it'd be best to retry for a set amount of
time/tries until that particular URL is marked unreachable (and thus isn't
called in subsequent notifications).

+1 for a sort of quarantine on an unreachable Webhook. How about slow
Webhooks causing back pressure problems? Would we quarantine these too?
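
One possible way to treat slow Webhooks like unreachable ones, sketched
under assumptions: count a "strike" for every failed or slow call and
quarantine the URL after a few strikes in a row. The WebhookHealth class
and its thresholds (max_failures, slow_after_seconds) are made up for
illustration and are not part of Abacus:

```python
class WebhookHealth:
    """Tracks consecutive bad calls per URL and quarantines repeat offenders."""

    def __init__(self, max_failures=3, slow_after_seconds=5.0):
        self.max_failures = max_failures
        self.slow_after_seconds = slow_after_seconds
        self.strikes = {}
        self.quarantined = set()

    def record(self, url, ok, elapsed_seconds):
        """Record the outcome of one callback attempt."""
        if ok and elapsed_seconds <= self.slow_after_seconds:
            # A fast, successful call clears earlier strikes
            self.strikes[url] = 0
            return
        # Errors and slow responses both count as a strike
        self.strikes[url] = self.strikes.get(url, 0) + 1
        if self.strikes[url] >= self.max_failures:
            self.quarantined.add(url)

    def should_call(self, url):
        """Quarantined URLs are skipped in subsequent notifications."""
        return url not in self.quarantined


health = WebhookHealth(max_failures=2, slow_after_seconds=1.0)
health.record("http://foo.com/bar", ok=True, elapsed_seconds=3.0)   # slow
health.record("http://foo.com/bar", ok=False, elapsed_seconds=0.1)  # error
```

After those two calls http://foo.com/bar is quarantined. How a URL would
leave quarantine again is left open here.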


... And a few more questions that would be good to discuss here as well:

- should we let the rating service app do this or have a separate
notification service app?

- with partitioning of the orgs across multiple deployments of our apps
(for scalability or regional deployments for example) do I need to first
find the right service to register with (e.g. register with the us-south
notification service or the eu-west notification service)? or can I
register with a central notification service that will then figure out
which deployment instance will call me back?

- how do we secure the registration calls and Webhook callbacks?

- do we replay notifications when we can't deliver them?

- can I register to receive all notifications from a certain point in the
logical stream of notifications matching some criteria? (e.g. call me back
if this org consumed too much per hour at any point since last week)

- do we have specific examples of notification trigger criteria? (I've
asked Subhash to post some here to initiate that discussion).
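
On the question of securing the Webhook callbacks, one common option
(only a sketch, nothing decided in this thread) is to share a secret at
registration time and sign each callback body with an HMAC that the
receiver verifies. The secret value, the payload fields and the sign and
verify helpers below are hypothetical:

```python
import hashlib
import hmac
import json


def sign(secret, body):
    """Return a hex HMAC-SHA256 signature of the callback body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret, body, signature):
    """Check a received signature using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, body), signature)


# Hypothetical secret exchanged when the Webhook was registered
secret = b"shared-secret-from-registration"
body = json.dumps({"org": "abc", "event": "new_usage"}).encode("utf-8")

# The sender would attach this value to the callback, e.g. in a header
signature = sign(secret, body)
```

The receiver recomputes the signature over the raw body and rejects the
call if verify returns False. This covers the callbacks only; securing
the registration calls themselves would need a separate mechanism.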

More thoughts?

- Jean-Sebastien

On Wed, Nov 11, 2015 at 1:21 PM, Benjamin Cheng <bscheng(a)us.ibm.com> wrote:

I've begun to work on supporting notifications in abacus. We already have
a github issue and pivotal tracker story associated with it as seen here:
https://github.com/cloudfoundry-incubator/cf-abacus/issues/161
https://www.pivotaltracker.com/story/show/101097834
Abacus needs some support for sending a notification to a client when
some criteria are met.

Right now, I'm in an exploratory phase. There are two ways that this will
probably need to work. One via a webhook and another via letting a client
keep an open connection that would receive a stream of notifications.
Here's a very basic breakdown.

In terms of the webhook, there are two things to consider:
- Maintaining the information of the clients and their criteria
- Calling the provided callback URLs when a certain criterion is met

For the registry, I have considered Eureka (specifically, the module
inside of abacus) since the registration spec has a metadataType property
that holds a sequence of anything. Eureka is used as a registry for middle
tier services (or application instances in our case), so I'm not sure if
also using it as a registry for handling notifications would kind of warp
the purpose of its use. There are also some other things, such as the
heartbeats, that I'm not sure would be the right fit in terms of a webhook.
Maintaining the list of the URLs in a database with a separate
"Notification Registry/Handler" app could work as well. Documents would
probably be keyed to the given URL just for supporting updates on new
criteria for a certain callback URL and prevent multiple documents for the
same URL. Each document would hold a list of criteria if that's the case.

When some usage gets rated, that usage would be checked against the list
of URLs. All the matching URLs would be called in parallel. If a call is
successful, then there's nothing additional that needs to be done. If a
call is an error, it'd be best to retry for a set amount of time/tries
until that particular URL is marked unreachable (and thus isn't called in
subsequent notifications).
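
The flow just described could look roughly like the sketch below. It is
an illustration under assumptions, not Abacus code: the matches,
call_with_retries and notify helpers, the shape of the registration docs
and the equality-based matching are all hypothetical, and post stands in
for the real HTTP call. It only bounds the number of tries; a real
version would probably also wait between tries and cap the total time.

```python
from concurrent.futures import ThreadPoolExecutor


def matches(criteria, usage):
    """A registration matches when every criteria field equals the usage field."""
    return all(usage.get(key) == value for key, value in criteria.items())


def call_with_retries(post, url, usage, max_tries=3):
    """Return True as soon as one attempt succeeds, False after max_tries."""
    for _ in range(max_tries):
        try:
            if post(url, usage):
                return True
        except Exception:
            pass  # treat an exception like a failed call and try again
    return False


def notify(registrations, usage, post, unreachable, max_tries=3):
    """Call all matching, still reachable URLs in parallel."""
    urls = sorted({r["url"] for r in registrations
                   if r["url"] not in unreachable
                   and matches(r["criteria"], usage)})
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(
            lambda url: call_with_retries(post, url, usage, max_tries), urls))
    for url, ok in zip(urls, results):
        if not ok:
            # Skipped in subsequent notifications
            unreachable.add(url)
    return dict(zip(urls, results))


registrations = [
    {"url": "http://foo.com/bar", "criteria": {"org": "abc"}},
    {"url": "http://down.example/hook", "criteria": {"org": "abc"}},
    {"url": "http://other.example/hook", "criteria": {"org": "zzz"}},
]


def fake_post(url, usage):
    """Stand-in for an HTTP POST; pretends one host is always down."""
    return "down" not in url


unreachable = set()
results = notify(registrations, {"org": "abc", "app": "xyz"},
                 fake_post, unreachable)
```

Here the two registrations for org abc are called, the one that keeps
failing ends up in unreachable, and the registration for another org is
never called.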

I haven't begun on the other avenue of keeping an open connection but
wanted to start a discussion with just the webhook for now.

Any thoughts, comments, or suggestions on this?
