Running Numbers

Running numbers are, in simple terms, when numbers are added up to each other. Like you want to know year-to-date (YTD) values. Summing numbers is maybe an often used case, but the possibilities are not restrichted to it. You could as well as do a moving average.

For this case we work with a YTD example. Let’s assume we have a table with the revenue for a certain category per month.

ea_month    id       amount    ea_year    circle_id
April       92570    1000      2014        1
April       92571    3000      2014        2
April       92572    2000      2014        3
March       92573    3000      2014        1
March       92574    2500      2014        2
March       92575    3750      2014        3
February    92576    2000      2014        1
February    92577    2500      2014        2
February    92578    1450      2014        3   

To get the total of the month up to now you might be tempted to use a subquery like get all the amount of the previous months of the current year of my product. It actually works, but for every row fetched a subquery is executed. Which can easily lead up to thousands of queries and therefore take ages like this example:

select branch_id,
       net_profit as store_profit,
       (select sum(net_profit) from store_sales as s2 where s2.city = s1.city) as city_profit,
       store_profit / city_profit * 100 as store_percentage_of_city_profit
    from store_sales as s1
    order by branch_id;
+-----------+--------------+-------------+---------------------------------+
| BRANCH_ID | STORE_PROFIT | CITY_PROFIT | STORE_PERCENTAGE_OF_CITY_PROFIT |
|-----------+--------------+-------------+---------------------------------|
|         1 |     10000.00 |    25000.00 |                     40.00000000 |
|         2 |     15000.00 |    25000.00 |                     60.00000000 |
|         3 |     10000.00 |    19000.00 |                     52.63157900 |
|         4 |      9000.00 |    19000.00 |                     47.36842100 |
+-----------+--------------+-------------+---------------------------------+

Postgres provides a so called window function called OVER. So in a more less natural language you tell in SQL you want to have a sum() OVER a dataset PARTITIONED by time and/or product.

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id
                         ORDER BY ea_year, ea_month) AS cum_amt
FROM   tbl
ORDER  BY circle_id, month;

To actually have a YTD the year has to be part of the PARTITION, otherwise it will be over all time.

The calculation will be done per row. So if the table has daily data, but the output is desired on a monthly basis, then the data has to be grouped and summed up per month first, before a ytd on a monthly basis would work.