[abacus] Refactor Aggregated Usage and Aggregated Rated Usage data model
Saravanakumar A. Srinivasan
I have started to look into two user stories ([1] and [2]), both titled "Organize the usage report data model for better querying and DB utilization".
Current state of the Abacus processing pipeline, starting from the Usage Accumulator:

a) Usage Accumulator processes metered usage for a resource instance, accumulates the usage at resource instance scope, and then forwards the accumulated usage for that resource instance to the Usage Aggregator.

b) Usage Aggregator processes accumulated usage for a resource instance, aggregates the usage at the following scopes: organization.resources, organization.resources.plans, organization.spaces.resources, organization.spaces.resources.plans, organization.spaces.consumers.resources and organization.spaces.consumers.resources.plans, and then forwards the aggregated usage for the organization to the Usage Rating Service.

c) Usage Rating Service processes aggregated usage for an organization and rates it at the following scopes: organization.resources.plans, organization.spaces.resources.plans, and organization.spaces.consumers.resources.plans.

d) Usage Reporting Service processes rated usage for an organization and summarizes usage and charge at all aggregation scopes. See [3] for a sample Abacus usage report.

Initial thoughts on the changes needed to optimize steps b, c, and d:

b) Usage Aggregator processes accumulated usage for a resource instance, aggregates and rates the usage at consumer scope (equivalent to the scopes organization.spaces.consumers.resources and organization.spaces.consumers.resources.plans), and then maintains a normalized aggregated usage doc for the organization that contains references to all consumer scoped documents belonging to that organization.

c) Eliminate the Usage Rating Service and split the current rating step between the Usage Aggregator and the Usage Reporting Service.

d) Usage Reporting Service processes the normalized aggregated usage for an organization, follows the references to get all consumer scoped documents that belong to the organization, aggregates and rates the consumer scoped usage at all other scopes, and then summarizes usage and charge at all aggregation scopes.
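To make the six aggregation scopes in step b concrete, here is a minimal Python sketch. It is not Abacus code (Abacus itself is not written in Python); the flat record layout, the field names and the `aggregate` helper are all assumptions made for illustration. It only shows how one accumulated usage record contributes to a total at each scope.

```python
from collections import defaultdict

# Scope name -> record fields that identify a bucket within the organization.
# The scope names come from the thread; the field names are assumptions.
SCOPES = {
    "organization.resources": ("resource",),
    "organization.resources.plans": ("resource", "plan"),
    "organization.spaces.resources": ("space", "resource"),
    "organization.spaces.resources.plans": ("space", "resource", "plan"),
    "organization.spaces.consumers.resources": ("space", "consumer", "resource"),
    "organization.spaces.consumers.resources.plans":
        ("space", "consumer", "resource", "plan"),
}


def aggregate(records):
    """Sum the quantity of each record into every aggregation scope."""
    totals = {scope: defaultdict(int) for scope in SCOPES}
    for rec in records:
        for scope, fields in SCOPES.items():
            key = (rec["org"],) + tuple(rec[f] for f in fields)
            totals[scope][key] += rec["quantity"]
    return totals


# Hypothetical accumulated usage for three consumers in one organization.
SAMPLE = [
    {"org": "o1", "space": "s1", "consumer": "app1",
     "resource": "storage", "plan": "basic", "quantity": 10},
    {"org": "o1", "space": "s1", "consumer": "app2",
     "resource": "storage", "plan": "basic", "quantity": 5},
    {"org": "o1", "space": "s2", "consumer": "app3",
     "resource": "storage", "plan": "standard", "quantity": 7},
]

totals = aggregate(SAMPLE)
```

In this sketch every record touches all six scopes, which is why a single org level doc holding all of them grows with the number of consumers.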
Any comments?

[1] https://www.pivotaltracker.com/story/show/107598654
[2] https://www.pivotaltracker.com/story/show/107598652
[3] https://gist.github.com/sasrin/697437b33d38bdddf825#file-report-json

Thanks,
Saravanakumar Srinivasan (Assk),
Bay Area Lab, 1001, E Hillsdale Blvd, Ste 400, Foster City, CA - 94404.
E-mail: sasrin(a)us.ibm.com
Phone: 650 645 8251 (T/L 367-8251)
Jean-Sebastien Delfino
Hi all,
Here's an update on this topic and the design discussions Assk, Ben and I had in the last few days.

I'll start with a description of the problem we're trying to solve here: Abacus currently computes and stores the aggregated usage at various levels within an org in real time. Each time new usage for resource instances gets submitted, we compute your latest aggregated usage at the org, space, app, resource and plan level, and store that in a new document keyed by the org id and the current time.

We effectively write a history of your org's aggregated usage in the Abacus database. That design allows us to efficiently report your latest usage, report your usage history, or trigger usage limit alerts in real time, for example, simply because we always have your latest usage for a given time in hand in a single doc, as opposed to having to run complex database queries pulling all your usage data into an aggregation when it's needed.

So, that design is all good until somebody creates a thousand (or even a hundred) apps in the org. With many apps, our aggregated usage (JSON) docs get pretty big, as we're keeping track of the aggregated usage for each app. JSON is not very space-efficient at representing all that data (that's a euphemism), and since we're writing a new doc for each new submitted usage, we eventually overload our Couch database with these big JSON docs.

Long story short... this discussion is about trying to optimize our data model for aggregated usage to fix that problem.

It's also an example of the typical tension in systems that need to stream a lot of data, compute some aggregates, and make quick decisions based on them: (a) do you pro-actively compute and store the aggregated values in real time as you're consuming your stream of input data? or (b) do you just write the input data and then run a mix of pseudo-real time and batch queries over and over on that data to compute the aggregates later?
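The tension between (a) and (b) can be illustrated with a small Python sketch. This is not how Abacus is implemented; the two classes, their method names and the in-memory lists standing in for the database are assumptions chosen only to show where each approach pays its cost.

```python
import json


class EagerAggregator:
    """Approach (a): keep running totals and write a full snapshot doc
    for every submitted usage, so the latest usage is a single read."""

    def __init__(self):
        self.totals = {}
        self.history = []  # one serialized snapshot per submission

    def submit(self, app, quantity):
        self.totals[app] = self.totals.get(app, 0) + quantity
        self.history.append(json.dumps(self.totals))

    def latest(self):
        return json.loads(self.history[-1])


class LazyAggregator:
    """Approach (b): write only the raw input and aggregate at read time."""

    def __init__(self):
        self.log = []  # one small serialized doc per submission

    def submit(self, app, quantity):
        self.log.append(json.dumps({"app": app, "quantity": quantity}))

    def latest(self):
        totals = {}
        for raw in self.log:  # every read scans all stored usage
            doc = json.loads(raw)
            totals[doc["app"]] = totals.get(doc["app"], 0) + doc["quantity"]
        return totals


def bytes_written(docs):
    """Total size of the serialized docs, as a rough proxy for DB load."""
    return sum(len(d) for d in docs)
```

Submitting one usage for each of a few dozen hypothetical apps to both classes gives the same `latest()` result, but `bytes_written(eager.history)` grows roughly with the square of the number of apps, while the lazy log grows linearly and pays for it with a full scan on every read.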
Our current design is along the lines of (a), but we're starting to also poke at ideas from the (b) camp to mitigate some of the issues of the (a) camp.

The initial proposal described by Assk earlier in this thread was to split the single org level doc containing all the usage aggregations within the org into smaller docs: one doc per app, for example (aka consumer in Abacus, as we support usage from other things than pure apps). That's what he was calling 'normalized' usage, since the exercise of coming up with that new structure would be similar to a 'normalization' of the data in the relational database sense, as opposed to the 'denormalization' we went through to design the structure of our current aggregated usage doc (a JSON hierarchical structure including some data duplication).

Now, while that data 'normalization' would help reduce the size of the docs and the amount of data written to record the history of your org's aggregated usage, in the last few days we've also started to realize that it would, on the other hand, increase the amount of data we'd have to read to retrieve all the little docs representing the current aggregated usage and 'join' them into a complete view of the org's aggregated usage before adding new usage to it...

Like I said before, a tension between two approaches: (a) writes a lot of data, is cheap on reads, (b) writes the minimum, requires a lot of reads...
nothing's easy or perfect :)

So the next step here is going to be an evaluation of some of the trade-offs between:

a) write all the aggregated usage data for an org in one doc like we do now, but simplify and refactor a bit the JSON format we use to represent it, in an attempt to make that JSON representation much smaller;

b) split the aggregated usage in separate docs, one per app, linked together by a parent doc per org containing their ids, and optimize (with caching for example) the reads and 'joins' of all the docs forming the aggregated usage for the org;

c) a middle-ground approach where we'll store the aggregated usage per app in separate docs, but maintain the aggregated usage at the upper levels (org, space, resource, plan) in the parent doc linking the app usage docs together, and explore what constraints or limitations that would impose on our ability to trigger real time usage limit alerts at any org, space, resource, plan, app etc level.

This is a rather complex subject, so please feel free to ask questions or send any thoughts here, or in the tracker and Github issues referenced by Assk earlier if that's easier.

Thanks!

- Jean-Sebastien

On Fri, Nov 20, 2015 at 11:09 AM, Saravanakumar A Srinivasan <sasrin(a)us.ibm.com> wrote:
> Started to look into two user stories([1] and [2]) titled "Organize the
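As a rough illustration of options b) and c) from the message above, the following Python sketch uses a plain dict in place of the document database. The doc ids, the field names and both helper functions are hypothetical and are not the Abacus data model; the point is only the shape of a parent org doc that links per-consumer docs and, for option c), also carries the upper-level totals.

```python
def consumer_doc_id(org, space, consumer):
    """Hypothetical id scheme for a per-consumer usage doc."""
    return f"consumer/{org}/{space}/{consumer}"


def add_usage(db, org, space, consumer, resource, quantity):
    """Record usage in the consumer doc and update the parent org doc."""
    parent = db.setdefault(f"org/{org}", {"consumer_ids": [], "resources": {}})
    cid = consumer_doc_id(org, space, consumer)
    doc = db.setdefault(cid, {"resources": {}})
    doc["resources"][resource] = doc["resources"].get(resource, 0) + quantity

    # Options b) and c): the parent links to every consumer doc by id.
    if cid not in parent["consumer_ids"]:
        parent["consumer_ids"].append(cid)

    # Option c) only: the parent also maintains the upper-level aggregate,
    # so an org-level check needs a single doc read.
    parent["resources"][resource] = (
        parent["resources"].get(resource, 0) + quantity
    )


def join_consumers(db, org):
    """Option b) read path: follow the ids and 'join' the consumer docs."""
    totals = {}
    for cid in db[f"org/{org}"]["consumer_ids"]:
        for resource, quantity in db[cid]["resources"].items():
            totals[resource] = totals.get(resource, 0) + quantity
    return totals
```

In this sketch `join_consumers` reads one doc per consumer, which is the read cost of option b), while reading `db["org/<id>"]["resources"]` directly is the single-doc read that option c) keeps for the upper levels.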
Saravanakumar A. Srinivasan
> c) a middle-ground approach where we'll store the aggregated usage per app in separate docs, but maintain the aggregated usage at the upper levels (org, space, resource, plan) in the parent doc linking the app usage docs together, and explore what constraints or limitations that would impose on our ability to trigger real time usage limit alerts at any org, space, resource, plan, app etc level.

As a first step (refer to [1] for more details) to refactor the usage data model using the middle-ground approach, we have removed the Usage Rating Service from the Abacus pipeline (refer to the commit at [2]) and moved the entire rating implementation from the Usage Rating Service to the Usage Aggregator (refer to the commit at [3]).

With these commits, if you are using Abacus, be aware that the Abacus pipeline has become shorter and you have one less application (Usage Rating Service) to manage.

[1] https://github.com/cloudfoundry-incubator/cf-abacus/issues/184
[2] https://github.com/cloudfoundry-incubator/cf-abacus/commit/1488e1ae2e4547a010151ad2245f3a3f1ff2e488
[3] https://github.com/cloudfoundry-incubator/cf-abacus/commit/c661b7bdd35e70e985583570cb9920b90ced44a8
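To picture what folding the rating step into the aggregation step means, here is a small Python sketch. It does not reflect the code in the commits above: the price table, the field names, the `aggregate_and_rate` function and the "cost = quantity x unit price" rule are all assumptions made for illustration.

```python
from decimal import Decimal

# Hypothetical unit prices per (resource, plan); not real Abacus pricing.
PRICES = {
    ("storage", "basic"): Decimal("0.10"),
    ("storage", "standard"): Decimal("0.25"),
}


def aggregate_and_rate(aggregated, usage, prices=PRICES):
    """Add one usage record to the running aggregate and rate it in the
    same step, instead of forwarding it to a separate rating service."""
    key = (usage["resource"], usage["plan"])
    entry = aggregated.setdefault(key, {"quantity": 0, "cost": Decimal("0")})
    entry["quantity"] += usage["quantity"]
    # Assumed rating rule: cost is the aggregated quantity times unit price.
    entry["cost"] = entry["quantity"] * prices[key]
    return aggregated
```

Calling `aggregate_and_rate` once per incoming record keeps quantity and cost side by side in one structure, which is the effect of having one less pipeline stage to deploy.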
|