I'm adding a 'processed' time field to the usage docs, hoping that'll help
maintain the history of usage within the slack window we've been discussing
here (more precisely that will help you know how much of that history can
roll out of the configured slack window when you process a new usage doc).
That field will also allow us to more clearly distinguish between the usage
event 'start', 'stop' and 'processed' times.
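
To make the "roll out of the slack window" idea concrete, here is a minimal sketch, not actual Abacus code. It assumes the day history is a list ordered newest first ([D, D-1, D-2]), that the 'processed' times are available as UTC datetimes, and that quantities are plain numbers; the function name and argument names are hypothetical.

```python
from datetime import datetime, timezone


def roll_day_windows(windows, previous_processed, processed):
    """Shift a newest-first day history by the number of day boundaries crossed.

    Assumption: both 'processed' times are datetimes in the same time zone.
    """
    days_elapsed = (processed.date() - previous_processed.date()).days
    if days_elapsed <= 0:
        # Same day (or out-of-order doc): nothing rolls out of the window.
        return list(windows)
    size = len(windows)
    # One fresh, empty window per elapsed day, capped at the history size.
    fresh = [0] * min(days_elapsed, size)
    # Older entries fall off the end once they are outside the slack window.
    return (fresh + list(windows))[:size]


previous = datetime(2015, 10, 11, 23, 0, tzinfo=timezone.utc)
current = datetime(2015, 10, 12, 1, 0, tzinfo=timezone.utc)
rolled = roll_day_windows([5, 3, 1], previous, current)
```

Here one day boundary was crossed, so the oldest entry drops out and an empty window is opened for the new day.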
On Mon, Oct 12, 2015 at 9:04 AM, Jean-Sebastien Delfino <jsdelfino(a)gmail.com
That makes sense to me. What you've described will enable refinements of
accumulated usage for a month as we continue to receive delayed usage
during the first few days of the next month.
To illustrate this with an example: with a 48h time window, on Sept 30 you
can retrieve the Sept 30 usage doc and find 'provisional' usage for Sept in
the 'month time window', not including usage that has not yet been submitted
to Abacus. Later on Oct 2nd you can retrieve the Oct 2nd usage doc and find
the 'final usage' for Sept in the 'month - 1 time window'. I think this is
better than waiting for Oct 2nd to 'close the Sept window', as our users
typically want to see both their *real time* usage for Sept before Oct 2nd
and their final usage later once it has settled for sure.
I also like that with that approach you don't need to go back to your Sept
30 usage doc to patch it up with delayed usage, as that way you're also
keeping a record of the Sept usage that was really known to us on Sept 30.
Another interesting aspect of this is that the history you're going to
maintain will allow us to write 'marker' usage docs when we transition from
one time window to another. Since a usage doc contains both the usage for
the day and the previous day, you can write the first document you process
each day, as a marker, in a reporting db and that'll give you an easy and
efficient way to retrieve the accumulated usage for the previous day. For
example, to retrieve the usage accumulated at the end of Oct 11, just
retrieve the 'marker' usage doc for Oct 12 and get the usage in its 'day -
1 time window'. That could help us implement the kind of query that Georgi
mentioned on the chat last week when he was looking for an efficient way to
retrieve daily usage for all the days of the month.
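
A rough sketch of the 'marker' idea, with an in-memory dict standing in for the reporting db. The doc layout (a "day" list ordered newest first, [D, D-1]) and the function names are assumptions for illustration, not the actual Abacus schema.

```python
from datetime import date, timedelta

# Hypothetical stand-in for the reporting db: one marker usage doc per day.
markers = {}


def record_marker(doc_day, usage_doc):
    """Keep only the first usage doc processed on a given day as its marker."""
    markers.setdefault(doc_day, usage_doc)


def usage_at_end_of(day):
    """Accumulated usage at the end of `day`, read from the next day's marker."""
    marker = markers.get(day + timedelta(days=1))
    if marker is None:
        return None
    # Assumed layout: index 0 is the current day, index 1 is 'day - 1'.
    return marker["day"][1]


# Illustrative numbers only: two docs processed on Oct 12, the first one wins.
record_marker(date(2015, 10, 12), {"day": [2, 40]})
record_marker(date(2015, 10, 12), {"day": [7, 40]})
oct_11_total = usage_at_end_of(date(2015, 10, 11))
```

Retrieving daily usage for a whole month then becomes one lookup per day rather than a scan over every usage doc.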
Finally, looking at the array of numbers/objects currently used to
maintain our time windows, I'm wondering if keeping the 'yearly' and
'forever' usage time windows is not a bit overkill (and could actually
become a problem).
That data is going to be duplicated in all individual usage docs for
little value IMO as the yearly usage at least is easy to reconstruct at
reporting time with a query over 12 monthly usage docs. Also, maintaining
that 'forever' usage will require us to keep usage docs around for resource
instances that may have been deleted a long time ago, and will complicate our
database partitioning scheme as these old resource instances will cause the
databases to grow forever. So, I'd prefer to let old usage data sit in old
monthly database partitions instead of having to carry that old data over
each month forever just to maintain these 'forever' time windows.
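
As a sketch of how cheap the yearly reconstruction could be at reporting time, assuming the final monthly totals have already been read from the monthly usage docs into a mapping keyed by (year, month). The mapping shape and the numbers are hypothetical.

```python
def yearly_usage(monthly_totals, year):
    """Sum the (up to 12) final monthly totals for one year."""
    return sum(total for (y, _month), total in monthly_totals.items()
               if y == year)


# Illustrative numbers only.
totals = {(2014, 12): 99, (2015, 9): 10, (2015, 10): 5}
usage_2015 = yearly_usage(totals, 2015)
```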
In other words, I'm suggesting we change our current array of 7 time
windows [Forever, Y, M, D, h, m, s] to 5 windows [M, D, h, m, s]. Combined
with your slack window proposal, with a 2D slack time we'll be looking at
an array like this: [[M, M-1], [D, D-1, D-2], [h], [m], [s]]. With a 48h
slack time the hourly array will have 49 entries [h, h-1, h-2, h-3, etc]
instead of one.
On Sun, Oct 11, 2015 at 6:04 AM, Benjamin Cheng <bscheng(a)us.ibm.com>
One of the things that needs to be supported in Abacus is the handling of
delayed usage submissions within a particular slack window after the usage
has happened. For example, given a slack window of 48 hours, a service
provider will be able to submit usage dated as far back as September 30th
on October 2nd.
One idea we were discussing for this was changing the quantity from an
array of numbers/objects to an array of arrays of numbers/objects, and
using an environment variable, currently going to be called SLACK, to hold
the configuration of the slack window. SLACK would follow a format of
[0-9]+[YMDhms], giving the width of the slack window and the precision to
which it should be maintained. 2D and 48h both cover the same amount of
time, but 48h will keep track of the history to the hour level while 2D
will only keep it to the day level. If this environment
variable isn't configured, the current idea is to have no slack window as
The general formula for the length of each array in a time window would
be as follows: 1 (for usage covered in the current window) + (number
of windows to cover the configured slack window for the particular time
For example, given a slack of 48h, the year time window would be 1 + 1.
Month would be 1 + 1. Day would be 1 + 2. Hours would be 1 + 48.
Minutes/Seconds would stay at 1.
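
The SLACK format and the length formula above could be sketched as follows. This is a rough illustration under assumptions, not the Abacus implementation: the function names are hypothetical, unit lengths are nominal lower bounds (28-day month, 365-day year), so the result is only approximate for slack values given in M or Y, and an unset SLACK is taken to mean one entry per window.

```python
import re

# Time windows from coarsest to finest, with an assumed lower bound on the
# length of each unit in seconds.
UNITS = [("Y", 365 * 86400), ("M", 28 * 86400), ("D", 86400),
         ("h", 3600), ("m", 60), ("s", 1)]


def parse_slack(slack):
    """Split a SLACK value such as '48h' into (width, precision)."""
    match = re.fullmatch(r"([0-9]+)([YMDhms])", slack)
    if match is None:
        raise ValueError("SLACK must match [0-9]+[YMDhms]: %r" % slack)
    return int(match.group(1)), match.group(2)


def window_lengths(slack=None):
    """Length of each time window array: 1 + windows needed to cover the slack."""
    if slack is None:
        # Assumption: no slack configured means no history is kept.
        return {unit: 1 for unit, _ in UNITS}
    width, precision = parse_slack(slack)
    slack_seconds = width * dict(UNITS)[precision]
    lengths = {}
    finer_than_precision = False
    for unit, unit_seconds in UNITS:
        if finer_than_precision:
            # Windows finer than the SLACK precision keep no history.
            lengths[unit] = 1
        else:
            # Integer ceiling of slack_seconds / unit_seconds.
            lengths[unit] = 1 + -(-slack_seconds // unit_seconds)
        if unit == precision:
            finer_than_precision = True
    return lengths


lengths_48h = window_lengths("48h")
lengths_2d = window_lengths("2D")
```

With "48h" this gives 2 yearly, 2 monthly, 3 daily and 49 hourly entries, with minutes and seconds staying at 1; with "2D" the hourly array also stays at 1, since history is only kept to the day level.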
Thoughts on this idea?