[abacus] Reporting Query


Benjamin Cheng
 

In abacus, a user can get a report for an organization by passing a timestamp. This is done via a descending query that starts at the given timestamp and ends at the beginning of the previous year. This ensures that the report will have usage data going back as far as the previous year for the given organization. Until time windows support was recently added, this query only looked at the current day.
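As a rough illustration of the range described above, here is a minimal Python sketch. The function name `report_query_range`, the use of millisecond timestamps, and the use of UTC are assumptions made for the example, not taken from the abacus code.

```python
from datetime import datetime, timezone

def report_query_range(t_ms):
    """Return (start_ms, end_ms) for a descending query.

    Hypothetical helper: the query starts at the given timestamp and
    walks back to the beginning of the previous year (UTC assumed).
    """
    t = datetime.fromtimestamp(t_ms // 1000, tz=timezone.utc)
    # January 1st, 00:00:00 of the year before the requested timestamp
    end = datetime(t.year - 1, 1, 1, tzinfo=timezone.utc)
    return t_ms, int(end.timestamp()) * 1000

# Example: a report requested for 2015-10-07 is bounded by 2014-01-01
t = int(datetime(2015, 10, 7, tzinfo=timezone.utc).timestamp()) * 1000
start, end = report_query_range(t)
print(datetime.fromtimestamp(end // 1000, tz=timezone.utc).isoformat())
```

Because the query is descending, the start key is the later time and the end key is the earlier one.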

With time windows, each record holds the usage accumulated/aggregated/rated over the last second/minute/hour/day/month/year/forever, in sync with the document's end time. Querying back to the beginning of the previous year was chosen so that the user can see usage going back to the previous year for any given timestamp.
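To make the time windows concrete, the following sketch computes where each window would begin for a given document end time. The function `window_starts`, the millisecond timestamp, the UTC boundaries, and the use of `None` for the "forever" window are all assumptions for illustration; the thread does not say how abacus represents them.

```python
from datetime import datetime, timezone

def window_starts(end_ms):
    """Map each time window name to its start, given a document end time.

    Hypothetical helper: boundaries are calendar-aligned in UTC.
    """
    t = datetime.fromtimestamp(end_ms // 1000, tz=timezone.utc)
    return {
        "second": t,
        "minute": t.replace(second=0),
        "hour": t.replace(minute=0, second=0),
        "day": t.replace(hour=0, minute=0, second=0),
        "month": t.replace(day=1, hour=0, minute=0, second=0),
        "year": t.replace(month=1, day=1, hour=0, minute=0, second=0),
        "forever": None,  # unbounded: no start boundary
    }

end_ms = int(datetime(2015, 10, 7, 16, 20, 30, tzinfo=timezone.utc).timestamp()) * 1000
for name, start in window_starts(end_ms).items():
    print(name, start)
```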

Here are a couple of questions I have regarding this:
- In addition to partitioning by bucketing, abacus partitions its databases by period, one period per month. Therefore, each month with usage has its own set of databases, with a minimum of 1. In the worst case (assuming only 1 database per month), that is possibly 24 databases (i.e. 2014-01 to 2015-12) that the report would have to search for the last usage of an organization. Does it make sense to look through all of those databases if, for instance, the organization hasn't had usage in the past 3 years?
- In terms of the report, does it make sense to return the latest record within the past year? For example, say a user queried the monthly or yearly usage for an organization in 2015-10, but the last time the organization had any usage was 2014-04. Does it make sense to return 2014-04 to the user, or would it be better to inform the user that there is no usage within the specified time range? I guess in a way, this is asking what a report should contain to make the information useful to the user.
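The worst case in the first question can be sketched by listing the monthly partitions a descending query would touch. This assumes one database per month and a hypothetical `YYYY-MM` partition key; the helper name `partitions_to_search` is made up for the example.

```python
def partitions_to_search(year, month):
    """List monthly partition keys, newest first, from the requested
    month back to January of the previous year (hypothetical key format)."""
    keys = []
    y, m = year, month
    while (y, m) >= (year - 1, 1):
        keys.append(f"{y:04d}-{m:02d}")
        # Step back one month, rolling over to December of the prior year
        y, m = (y, m - 1) if m > 1 else (y - 1, 12)
    return keys

# A report requested in 2015-12 may have to search 24 partitions
keys = partitions_to_search(2015, 12)
print(len(keys), keys[0], keys[-1])
```

Under these assumptions, an organization with no recent usage forces the query to visit every one of those partitions before it can conclude there is nothing to return.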

Thanks.


Jean-Sebastien Delfino
 

Hi Ben,

> Choosing to query up to the beginning of the previous year was a choice
> based upon allowing the user to know usage up to the previous year for any
> given timestamp.

If I wanted my usage for last year, I'd just get a report with a 12/31/2014
time (or 1/2015 to see any delayed usage left over from 2014, but that part
probably deserves a different discussion thread...). So I don't think the
query needs to automatically go that far back.

> abacus partitions by period in terms of month for its databases
Correct, and that's really useful to manage their growth and archival and
accommodate for schema changes over time.

> Does it make sense to have to look through all of those databases if the
> organization hasn't had usage in the past 3 years for instance?

I don't think so, as long as we clarify what you get out of each report
(more on that below).

> In terms of report, does it make sense to return the latest record within
> the past year? For example, say a user queried the monthly or yearly usage
> for an organization in 2015-10, but the last time the organization had any
> usage was 2014-04. Does it make sense to return 2014-04 to the user or
> would it be better to inform the user that there is no usage within the
> specific time range?

The typical use case is to get your usage for the month. So I'd suggest
keeping this really simple for now:
- you have usage in 10/2015, you get a report;
- you don't have usage in 10/2015, you get a 'sorry no usage, nothing to
report';
- you don't have usage in 10/2015 and you still want to see your yearly
usage, ask for your 09/2015 report; still nothing for 09? Ask for 08...

With that approach, we avoid confusing the caller with some magic... (as
magically returning the 09/2015 report when the request was for 10/2015
just in case you'd want to see your yearly usage could be pretty confusing
IMO).
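The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions: the partitions are modeled as an in-memory dict keyed by a hypothetical `YYYY-MM` string, and `monthly_report` and the shape of the returned dict are invented for the example rather than taken from abacus.

```python
def monthly_report(partitions, org_id, year, month):
    """Look only at the requested month's partition (hypothetical layout:
    {"YYYY-MM": {org_id: usage}}) and never fall back to older months."""
    key = f"{year:04d}-{month:02d}"
    usage = partitions.get(key, {}).get(org_id)
    if usage is None:
        # No magic: the caller asks for an earlier month explicitly
        return {"period": key, "status": "no usage, nothing to report"}
    return {"period": key, "status": "ok", "usage": usage}

# Made-up sample data for the example
partitions = {
    "2015-09": {"org-a": {"monthly": 12, "yearly": 40}},
    "2014-04": {"org-b": {"monthly": 3, "yearly": 3}},
}

print(monthly_report(partitions, "org-a", 2015, 10))  # nothing for 10/2015
print(monthly_report(partitions, "org-a", 2015, 9))   # caller asks for 09/2015
```

Each request touches a single monthly partition, so the cost no longer depends on how long ago the organization last had usage.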

> I guess in a way, this is asking what a report should entail that would
> make the information useful to the user.

With the addition of more reporting time windows (you can get your usage
for the month, day, hour etc) we can probably imagine many different types
of reporting queries (I want my usage for this hour, accumulated today
until this hour, yearly until now...). I'd suggest to get more concrete
input from our users on the types of queries they're most interested in,
before attempting to automate all these combinations in the reporting
service.

Until then, would the proposal I've described above work?

- Jean-Sebastien

In reply to Benjamin Cheng <bscheng(a)us.ibm.com>, Wed, Oct 7, 2015 at 4:20 PM.


Benjamin Cheng
 

Yes, I think your proposal makes sense. I would prefer that approach over what I've detailed above, which retrieves everything within a potential 2-year timeframe to serve purposes the user most likely did not have in mind when making the query.


Jean-Sebastien Delfino
 

OK great. I've been looking into our database partitioning to fix issue #69
[1] (related to this discussion as well) earlier this week so I'll go ahead
and make that simple change then.

[1] https://github.com/cloudfoundry-incubator/cf-abacus/issues/69

HTH

- Jean-Sebastien

In reply to Benjamin Cheng <bscheng(a)us.ibm.com>, Fri, Oct 9, 2015 at 4:59 PM.