Running Numbers
Running numbers are, in simple terms, when numbers are added up to each other. Like you want to know year-to-date (YTD) values. Summing numbers is maybe an often used case, but the possibilities are not restrichted to it. You could as well as do a moving average.
For this case we work with a YTD example. Let’s assume we have a table with the revenue for a certain category per month.
ea_month id amount ea_year circle_id April 92570 1000 2014 1 April 92571 3000 2014 2 April 92572 2000 2014 3 March 92573 3000 2014 1 March 92574 2500 2014 2 March 92575 3750 2014 3 February 92576 2000 2014 1 February 92577 2500 2014 2 February 92578 1450 2014 3
To get the total of the month up to now you might be tempted to use a subquery like get all the amount of the previous months of the current year of my product. It actually works, but for every row fetched a subquery is executed. Which can easily lead up to thousands of queries and therefore take ages like this example:
select branch_id, net_profit as store_profit, (select sum(net_profit) from store_sales as s2 where s2.city = s1.city) as city_profit, store_profit / city_profit * 100 as store_percentage_of_city_profit from store_sales as s1 order by branch_id; +-----------+--------------+-------------+---------------------------------+ | BRANCH_ID | STORE_PROFIT | CITY_PROFIT | STORE_PERCENTAGE_OF_CITY_PROFIT | |-----------+--------------+-------------+---------------------------------| | 1 | 10000.00 | 25000.00 | 40.00000000 | | 2 | 15000.00 | 25000.00 | 60.00000000 | | 3 | 10000.00 | 19000.00 | 52.63157900 | | 4 | 9000.00 | 19000.00 | 47.36842100 | +-----------+--------------+-------------+---------------------------------+
Postgres provides a so called window function called OVER. So in a more less natural language you tell in SQL you want to have a sum() OVER a dataset PARTITIONED by time and/or product.
SELECT ea_month, id, amount, ea_year, circle_id , sum(amount) OVER (PARTITION BY circle_id ORDER BY ea_year, ea_month) AS cum_amt FROM tbl ORDER BY circle_id, month;
To actually have a YTD the year has to be part of the PARTITION, otherwise it will be over all time.
The calculation will be done per row. So if the table has daily data, but the output is desired on a monthly basis, then the data has to be grouped and summed up per month first, before a ytd on a monthly basis would work.