[abacus] Accepting delayed usage within a slack window
Jean-Sebastien Delfino
Hi Ben,
I'm adding a 'processed' time field to the usage docs, hoping that will help maintain the history of usage within the slack window we've been discussing here. More precisely, it will help you know how much of that history can roll out of the configured slack window when you process a new usage doc. That field will also allow us to distinguish more clearly between the usage event 'start', 'stop' and 'processed' times.
HTH
- Jean-Sebastien
On Mon, Oct 12, 2015 at 9:04 AM, Jean-Sebastien Delfino <jsdelfino(a)gmail.com> wrote:
> Hi Ben,
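A minimal sketch of how the 'processed' time could drive that roll-out, assuming each time window is kept as a list ordered [current, -1, -2, ...] as in the slack window proposal further down this thread. The function names and the shape of the data are hypothetical and not taken from the Abacus code base.

```python
from datetime import datetime, timezone

def day_index(t):
    """Index that increases by one at each day boundary."""
    return t.toordinal()

def month_index(t):
    """Index that increases by one at each month boundary."""
    return t.year * 12 + (t.month - 1)

def roll(window, crossed):
    """Shift a window list right by the number of boundaries crossed.

    Fresh zero slots open at the front and the oldest slots roll out
    of the slack window at the back.
    """
    if crossed <= 0:
        return list(window)
    return ([0] * crossed + list(window))[:len(window)]

def accumulate(window, index_fn, last_processed, processed, start, quantity):
    """Roll the window up to the 'processed' time, then add delayed usage
    to the slot matching the usage 'start' time."""
    rolled = roll(window, index_fn(processed) - index_fn(last_processed))
    slot = index_fn(processed) - index_fn(start)
    if slot < 0 or slot >= len(rolled):
        raise ValueError('usage falls outside the slack window')
    rolled[slot] += quantity
    return rolled

# Usage that started on Sept 30, processed on Oct 2, with a 2 day slack.
utc = timezone.utc
days = accumulate([5, 3, 0], day_index,
                  datetime(2015, 10, 1, 23, 0, tzinfo=utc),   # previous doc
                  datetime(2015, 10, 2, 9, 0, tzinfo=utc),    # processed
                  datetime(2015, 9, 30, 12, 0, tzinfo=utc),   # start
                  2)
print(days)  # [0, 5, 5]
```

The difference between the previous and the new 'processed' times decides how many slots roll out, while the difference between the 'processed' and 'start' times decides which slot the delayed usage lands in.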
Michael Maximilien
+1
So long as this does not prevent DBs from being sharded, even if the penalty for queries over the distant past is higher (e.g., they are slow). And, as we discussed last Friday, this slack value can be fixed for now and made configurable later in future stories.
I am hoping others who are interested in this feature chime in here as well.
Best,
max
On Mon, Oct 12, 2015 at 7:53 PM, Jean-Sebastien Delfino <jsdelfino(a)gmail.com> wrote:
> The benefit in having the year window allows only having to go to a single database as opposed to a potential 12 databases with month windows
Jean-Sebastien Delfino
> The benefit in having the year window allows only having to go to a single database as opposed to a potential 12 databases with month windows
Correct, if your resource instance has incurred usage in the last month. But if no usage has been submitted for a resource instance since Jan, for example, then we still need to run a descending query back to Jan, giving us a max of 12 database partitions to scan for old/inactive resource instances when we do that in Dec (which is typically when people start to get more interested in their yearly usage).
> but I think that probably doesn't outweigh having to duplicate the yearly data on every document.
+1, that's what I was thinking.
- Jean-Sebastien
On Mon, Oct 12, 2015 at 5:57 PM, Benjamin Cheng <bscheng(a)us.ibm.com> wrote:
> I'm leaning towards agreeing with you in terms of reducing the number of
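To make the worst case concrete, here is a rough sketch of that descending scan over monthly partitions. The (year, month) partition keys and the lookup callable are assumptions made for illustration only; the real partition naming and query API are not described in this thread.

```python
def partitions_to_scan(year, month, oldest_month=1):
    """List (year, month) partitions, newest first, back to oldest_month."""
    return [(year, m) for m in range(month, oldest_month - 1, -1)]

def find_latest_usage(partitions, lookup):
    """Walk the partitions newest first and stop at the first usage doc.

    lookup is a hypothetical callable that returns the latest usage doc
    for a resource instance in one partition, or None if there is none.
    """
    for partition in partitions:
        doc = lookup(partition)
        if doc is not None:
            return partition, doc
    return None

# In Dec, an instance with no usage since Jan needs all 12 lookups.
print(len(partitions_to_scan(2015, 12)))  # 12
```

An instance that submitted usage in the current month is found on the first lookup, so the 12 partition scan only applies to old or inactive resource instances.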
Benjamin Cheng
I'm leaning towards agreeing with you in terms of reducing the number of windows. I agree with what you've said on forever. The only case I can point out is in years. The benefit in having the year window allows only having to go to a single database as opposed to a potential 12 databases with month windows, but I think that probably doesn't outweigh having to duplicate the yearly data on every document.
Jean-Sebastien Delfino
Hi Ben,
That makes sense to me. What you've described will enable refinements of the accumulated usage for a month as we continue to receive delayed usage during the first few days of the next month.

To illustrate this with an example: with a 48h time window, on Sept 30 you can retrieve the Sept 30 usage doc and find 'provisional' usage for Sept in the 'month time window', not including usage that has not yet been submitted to Abacus. Later, on Oct 2nd, you can retrieve the Oct 2nd usage doc and find the 'final usage' for Sept in the 'month - 1 time window'.

I think this is better than waiting for Oct 2nd to 'close the Sept window', as our users typically want to see both their *real time* usage for Sept before Oct 2nd and their final usage later, once it has settled for sure. I also like that with that approach you don't need to go back to your Sept 30 usage doc to patch it up with delayed usage, as that way you're also keeping a record of the Sept usage that was really known to us on Sept 30.

Another interesting aspect of this is that the history you're going to maintain will allow us to write 'marker' usage docs when we transition from one time window to another. Since a usage doc contains both the usage for the day and the previous day, you can write the first document you process each day, as a marker, in a reporting db, and that will give you an easy and efficient way to retrieve the accumulated usage for the previous day. For example, to retrieve the usage accumulated at the end of Oct 11, just retrieve the 'marker' usage doc for Oct 12 and get the usage in its 'day - 1 time window'. That could help us implement the kind of query that Georgi mentioned on the chat last week when he was looking for an efficient way to retrieve daily usage for all the days of the month.

Finally, looking at the array of numbers/objects currently used to maintain our time windows, I'm wondering if keeping the 'yearly' and 'forever' usage time windows isn't a bit of overkill (and could actually become a problem). That data is going to be duplicated in all individual usage docs for little value IMO, as the yearly usage at least is easy to reconstruct at reporting time with a query over 12 monthly usage docs. Also, maintaining that 'forever' usage will require us to keep usage docs around for resource instances that may have been deleted a long time ago, and will complicate our database partitioning scheme, as these old resource instances will cause the databases to grow forever. So, I'd prefer to let old usage data sit in old monthly database partitions instead of having to carry that old data over each month forever just to maintain these 'forever' time windows.

In other words, I'm suggesting we change our current array of 7 time windows [Forever, Y, M, D, h, m, s] to 5 windows [M, D, h, m, s]. Combined with your slack window proposal, with a 2D slack time we'll be looking at an array as follows: [[M, M-1], [D, D-1, D-2], [h], [m], [s]]. With a 48h slack time the array will have 49 hourly entries [h, h-1, h-2, h-3, etc] instead of one.

Thoughts?
- Jean-Sebastien
On Sun, Oct 11, 2015 at 6:04 AM, Benjamin Cheng <bscheng(a)us.ibm.com> wrote:
> One of the things that need to be supported in abacus is the handling of
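The 'marker' usage doc idea could be sketched roughly as follows. The doc layout used here (a 'processed' timestamp and a 'windows' dict whose 'D' entry is [D, D-1, D-2]) is an assumption for illustration, not the actual Abacus usage doc schema.

```python
from datetime import date, datetime, timedelta

def marker_docs(usage_docs):
    """Keep the first usage doc processed on each day as that day's marker."""
    markers = {}
    for doc in sorted(usage_docs, key=lambda d: d['processed']):
        markers.setdefault(doc['processed'].date(), doc)
    return markers

def usage_at_end_of(day, markers):
    """Read the usage accumulated at the end of a day from the next day's
    marker doc, using its 'day - 1' slot. Returns None without a marker."""
    marker = markers.get(day + timedelta(days=1))
    if marker is None:
        return None
    return marker['windows']['D'][1]

docs = [
    {'processed': datetime(2015, 10, 11, 23, 0), 'windows': {'D': [40, 38, 0]}},
    {'processed': datetime(2015, 10, 12, 0, 5), 'windows': {'D': [1, 40, 38]}},
    {'processed': datetime(2015, 10, 12, 9, 0), 'windows': {'D': [7, 42, 38]}},
]
markers = marker_docs(docs)
print(usage_at_end_of(date(2015, 10, 11), markers))  # 40
```

Note that a marker only reflects the usage known when it was written: in the sample data, delayed usage for Oct 11 that arrives later on Oct 12 shows up in later docs (42) but not in the marker (40).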
Benjamin Cheng
One of the things that need to be supported in abacus is the handling of delayed usage submissions within a particular slack window after the usage has happened. For example, given a slack window of 48 hours, a service provider will be able to submit usage back to September 30th on October 2nd.
An idea that we were discussing for this was augmenting the quantity from an array of numbers/objects to an array of arrays of numbers/objects, and using an environment variable, currently going to be called SLACK, to hold the configuration of the slack window. SLACK would follow the format [0-9]+[YMDhms], giving the width of the slack window and the precision to which it should be maintained. 2D and 48h both cover the same amount of time, but 48h will keep track of the history to the hour level while 2D will only keep it to the day level. If this environment variable isn't configured, the current idea is to have no slack window as the default.
The general formula for the length of each array in a time window would be as follows: 1 (for usage covered in the current window) + (number of windows needed to cover the configured slack window for the particular time window). For example, given a slack of 48h, the year time window would be 1 + 1, month would be 1 + 1, day would be 1 + 2, hours would be 1 + 48, and minutes/seconds would stay at 1.
Thoughts on this idea?
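As a rough illustration of that formula, the sketch below parses a SLACK value and computes the array length for each time window. The function names are hypothetical, and the choice of 28 days per month is an assumption (the shortest month) so that the month and year counts err on the high side.

```python
import math
import re

# Time windows from coarsest to finest, as named in the proposal.
UNITS = ['Y', 'M', 'D', 'h', 'm', 's']

# How many of each unit fit in the next coarser one. The 28 days per
# month is an assumption (shortest month), not part of the proposal.
PER_COARSER = {'M': 12, 'D': 28, 'h': 24, 'm': 60, 's': 60}

def parse_slack(slack):
    """Parse a SLACK value such as '48h' into (width, unit)."""
    if not slack:
        return None  # no slack window configured
    match = re.fullmatch(r'([0-9]+)([YMDhms])', slack)
    if match is None:
        raise ValueError(f'invalid SLACK value: {slack!r}')
    return int(match.group(1)), match.group(2)

def window_lengths(slack):
    """Length of each time window array: 1 for the current window plus
    the number of windows needed to cover the slack."""
    parsed = parse_slack(slack)
    lengths = {}
    for i, unit in enumerate(UNITS):
        extra = 0
        if parsed is not None:
            width, precision = parsed
            p = UNITS.index(precision)
            if i <= p:
                # Number of precision units in one window of this unit.
                ratio = 1
                for finer in UNITS[i + 1:p + 1]:
                    ratio *= PER_COARSER[finer]
                extra = math.ceil(width / ratio)
            # Windows finer than the slack precision keep no history.
        lengths[unit] = 1 + extra
    return lengths

print(window_lengths('48h'))
# {'Y': 2, 'M': 2, 'D': 3, 'h': 49, 'm': 1, 's': 1}
```

With SLACK set to 2D instead, the day, month and year lengths stay the same but the hour array drops back to a single entry, which matches the difference in precision described above.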