LRU Query Cache

Question: A cache is a data store housed in memory (RAM) that is used for quick data retrieval. Real-time data in kdb+ is usually cached, allowing for quick queries. Historical data, however, is always stored on disk which results in retrieval times that are magnitudes slower. Due to the cost of RAM and sheer volume of data, it is not practical to store historical data in its entirety in RAM. We can, however, cache bits of it based on usage. The cache we will build is a Least Recently Used (LRU) cache, whose eviction policy is to discard the least recently used item. The cache will be used to store a whole day's worth of data for a single date and sym. The cache size is the maximum number of items allowed in the cache. If the cache fills up past the cache size, then the eviction policy will need to be enforced.

Given the below definitions, override the function '.z.pg' to use and update the cache whenever a query is received. Each entry in the cache is a whole day's worth of data for a single sym (ex. all AAPL data for one date). The query can come in as a qSQL string or parse tree. For simplicity's sake, no more than one date and sym are allowed in the query. Date and sym should also be supplied as the first two filters in the where clause. If these conditions are not met, '.z.pg' should error with 'BadQuery'. We are only sticking to one date and sym to simplify the logic needed for this question, however it is easy to extend the cache for supporting multiple syms in each entry.

More Information:

https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)

Example

                                
                                cache:([]date:();sym:();data:());
CACHE_SIZE:2;

// After .z.pg defined, date partitioned and parted sym column

q)\t t1:.z.pg "select from quote where date=2018.01.03, sym=`AABA"
984
q)\t t2:.z.pg "select from quote where date=2018.01.03, sym=`AABA"
0
q)t1~t2
1b
q)cache
date       sym  data                                                         ..
-----------------------------------------------------------------------------..
2018.01.03 AABA +`date`sym`time`exchange`bid_price`bid_size`offer_price`offer..
q)\t t1:.z.pg "select max bid_price, min offer_price by `minute$time from quote where date = 2018.01.03, sym=`AAPL"
2158
q)\t t2:.z.pg "select max bid_price, min offer_price by `minute$time from quote where date = 2018.01.03, sym=`AAPL"
51
q)t1~t2
1b
q)\t .z.pg parse "select max bid_price, min offer_price by `minute$time from quote where date = 2018.01.03, sym=`AAPL"
52
q)cache
date       sym  data                                                         ..
-----------------------------------------------------------------------------..
2018.01.03 AAPL +`date`sym`time`exchange`bid_price`bid_size`offer_price`offer..
2018.01.03 AABA +`date`sym`time`exchange`bid_price`bid_size`offer_price`offer..
q)\t t1:.z.pg "select from quote where date=2018.01.03, sym=`A"
424
q)\t t2:.z.pg "select from quote where date=2018.01.03, sym=`A"
0
q)t1~t2
1b
q)cache
date       sym  data                                                         ..
-----------------------------------------------------------------------------..
2018.01.03 A    +`date`sym`time`exchange`bid_price`bid_size`offer_price`offer..
2018.01.03 AAPL +`date`sym`time`exchange`bid_price`bid_size`offer_price`offer..
                                
                            

Solution

Tags:
architecture ipc optimizations sql tables
Searchable Tags
algorithms api architecture asynchronous c csv data structures dictionaries disk feedhandler finance functions ingestion ipc iterators machine learning math multithreading optimizations realtime shared library sql statistics streaming strings tables temporal utility websockets