Pipelining Time Series Calculations for Cache Efficiency
I always like to investigate new technology, and this week I found a nice automatic technique for improving cache use that I had previously only seen people implement by hand.
Consider a database query with three steps (three SQL SELECTs). Some databases write the results of each step to temporary tables in main memory: when the first step finishes, those intermediate results are pulled back into CPU cache to be transformed by the second step, then written out to a new temporary table in main memory, and so on.
To eliminate this back-and-forth, vector-based statistical functions can be pipelined, with the output of one function becoming the input to the next, whose output feeds a third function, and so on. Intermediate results stay in the pipeline inside CPU cache; only the final result is materialized at the end.
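To make the contrast concrete, here is a minimal sketch of the idea in Python, using generators as stand-ins for the vector functions; the steps are made up for illustration and are not any particular database's API:

```python
# Hypothetical illustration -- the steps are invented, not a database API.
data = range(1_000_000)

# Materializing style: each step builds a full intermediate result in main memory.
tmp1 = [x * 2 for x in data]
tmp2 = [x + 1 for x in tmp1]
total = sum(tmp2)

# Pipelined style: each stage's output feeds the next directly; no full-size
# intermediate is ever built, so the working data can stay in CPU cache.
step1 = (x * 2 for x in data)
step2 = (x + 1 for x in step1)
total = sum(step2)
```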
This technology is part of ExtremeDB, and they have a video that explains it well:
Time Series Calculations
Moving Averages Stock Price Example
This is what the actual code would look like to calculate the 5-day and 21-day moving averages for a stock and detect the points where the faster moving average (5-day) crosses over or under the slower moving average (21-day); a rough Python sketch of the same pipeline follows the list:
- Two invocations of ‘seq_window_agg_avg’ execute over the closing price sequence, ‘ClosePrice’, to obtain 5-day and 21-day moving averages.
- The function ‘seq_sub’ subtracts the 21-day moving average from the 5-day moving average.
- The result “feeds” a fourth function, ‘seq_cross’, to identify where the 5- and 21-day moving averages cross.
- Finally, the function ‘seq_map’ maps the crossovers to the original ‘ClosePrice’ sequence, returning closing prices where the moving averages crossed.
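Here is that pipeline sketched in plain Python. The function names mirror the ones above, but the bodies, signatures and data are illustrative stand-ins, not the actual ExtremeDB functions:

```python
# Illustrative stand-ins for the sequence functions described above -- not the real API.
import math
from collections import deque

def seq_window_agg_avg(values, n):
    """Moving average over the last n values (shorter during warm-up); one output
    per input element, so different window sizes stay aligned by day."""
    win = deque(maxlen=n)
    for x in values:
        win.append(x)
        yield sum(win) / len(win)

def seq_sub(a, b):
    """Element-wise difference of two sequences."""
    for x, y in zip(a, b):
        yield x - y

def seq_cross(diffs):
    """Yield (index, direction) wherever the sequence changes sign:
    +1 for a cross above zero, -1 for a cross below."""
    prev = None
    for i, d in enumerate(diffs):
        if prev is not None and (prev < 0) != (d < 0):
            yield i, (1 if d > 0 else -1)
        prev = d

def seq_map(crossings, values):
    """Map crossover positions back to the original sequence's values."""
    for i, direction in crossings:
        yield values[i], direction

# Synthetic daily closing prices, just to have something to run against.
ClosePrice = [100 + 5 * math.sin(i / 5) + 0.1 * i for i in range(120)]

fast = seq_window_agg_avg(ClosePrice, 5)    # 5-day moving average
slow = seq_window_agg_avg(ClosePrice, 21)   # 21-day moving average
diff = seq_sub(fast, slow)                  # 5-day minus 21-day
crossings = seq_cross(diff)                 # where the two averages cross
for price, direction in seq_map(crossings, ClosePrice):
    print(f"close {price:.2f}: 5-day crossed {'above' if direction > 0 else 'below'} 21-day")
```

Because every stage is lazy, values are only pulled through the chain on demand, which is the same property the database exploits to keep intermediate results in cache.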
This approach eliminates the need to create, populate and query temporary tables in main memory, outside CPU cache. One “tile” of input data is processed by each invocation of ‘mco_seq_window_agg_avg_double()’, producing one tile of output that is passed to ‘mco_seq_sub_double()’, which in turn produces one tile of output that is passed as input to ‘mco_seq_cross_double()’, which produces one tile of output that is passed to ‘mco_seq_map_double()’. When the last function, ‘mco_seq_map_double()’, has exhausted its input, the whole process repeats from the top to produce new tiles for further processing.
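The tile mechanics can be sketched the same way. In the toy pipeline below, each stage hands a fixed-size tile (rather than a single value) to the next stage, and pulling on the last stage drags one tile at a time through the whole chain; the tile size, the stages and the structure are assumptions for illustration, not ExtremeDB internals:

```python
# Illustration of tile-at-a-time pipelining -- hypothetical, not ExtremeDB internals.
TILE = 4  # elements per tile; a real engine sizes tiles to fit in CPU cache

def tiles(values):
    """Source stage: stream an in-memory sequence one tile at a time."""
    for i in range(0, len(values), TILE):
        yield values[i:i + TILE]

def scale(tile_stream, factor):
    """Middle stage: transforms one input tile into one output tile per pull."""
    for tile in tile_stream:
        yield [x * factor for x in tile]

def running_total(tile_stream):
    """Final stage: consumes tiles, keeping only a scalar, never a full intermediate."""
    total = 0.0
    for tile in tile_stream:
        total += sum(tile)
    return total

data = list(range(1, 11))
# Each pull on the last stage drags exactly one tile through the whole chain;
# when the source is exhausted, the pipeline ends -- no temporary tables anywhere.
print(running_total(scale(tiles(data), 0.5)))   # 27.5
```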
A very cool idea!
And yes, ExtremeDB are the same guys that posted the top STAC-M3 benchmark result for a while (in 2012/13, I think).