Optimized Table Sort

Question: The built-in table sort function 'xasc' can sort tables in memory or on disk. It's commonly used during end-of-day operations to sort tables on disk after write down. Define a function 'xasc2' that is similar to 'xasc', but leverages multithreading when sorting on disk. You can disregard keyed tables as inputs.

More Information:

https://code.kx.com/q/ref/asc/#xasc

Example

q).z.zd
17 2 6
q)count get `:trade
41303818
q)count cols `:trade
15
q)\s
12i
q)\t `sym`time xasc `:trade
145303
q)\t `sym`time xasc2 `:trade / 64% faster
51506

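Solution

######## solution.q ########

.q.xasc2:{[c;t]
    d:$[s:-11h~type t;get t;t];                          / s is true when t is a symbol (variable name or file path)
    if[(`s=attr d c) or not count c;:t];                 / already sorted (single column only) or nothing to sort on
    cls:cols d;
    d:@[;first c;`s#] @[d;cls;@[;iasc flip c!d c:(),c]]; / reorder every column, then set `s# on the first sort column
    $[not s;
        t:d;                                             / table passed by value: return the sorted table
        t~hsym t;
            {.Q.dd[x;y 0] set y 1;}[t] peach flip (cls;d cls); / file path: write one column file per task
            t set d                                      / plain symbol: overwrite the global variable
    ];
    t
 }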

Explanation: First, resolve the second argument (a table, a symbol naming a variable, or a file path) to a table value. Then check up front whether the column you are sorting on already carries the sorted attribute, or whether no sort columns were given; in either case return the input unchanged. The attribute check only works for single-column sorts, because indexing a table with several column names returns a list of columns, and that list carries no attribute of its own. Next, generate the sort indices with 'iasc', apply them to every column of the table, and set the sorted attribute on the first column passed into the function.

Afterwards, check the type of the second argument. If it is a table, assign the sorted table to the return variable. If it is a regular symbol, overwrite the variable that symbol names. Otherwise it is a file path, and the table is overwritten on disk.

This is where the optimization lies. Use 'peach' to hand each column name-value pair to a thread, and have that thread write the column to its file under the table directory. Elapsed time falls as more threads are enabled, up to the number of columns. Memory use rises as well, because several columns are being written at the same time. Finally, return the return variable.
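The two key steps described above, one shared row permutation and one file write per column, can be tried in isolation. The following is a minimal sketch rather than part of the original solution: the toy table, its column names, and the scratch path `:/tmp/xasc2demo/t` are assumptions made for illustration, and the table deliberately has no symbol columns so that no enumeration is needed before splaying.

```q
/ toy table; the columns and values are made up for illustration
t:([]id:3 1 2 1;px:30 10 20 5f;qty:7 8 9 6)
c:`id`px                          / sort columns, most significant first

/ step 1: one row permutation from the sort columns, applied to every column
i:iasc flip c!t c                 / ascending row order over the table of sort columns
d:@[t;cols t;@[;i]]               / reorder each column with the same indices
d:@[d;first c;`s#]                / set the sorted attribute on the first sort column

/ step 2: splay the unsorted table to a hypothetical scratch path, then
/ overwrite each column file independently. With secondary threads (q -s N)
/ the writes run in parallel; without them peach behaves like each.
dir:`:/tmp/xasc2demo/t
`:/tmp/xasc2demo/t/ set t;
{.Q.dd[x;y 0] set y 1;}[dir] peach flip (cols d;d cols d);

/ compare both results with the built-in sort
d~c xasc t
(get dir)~c xasc t
```

Because every column file is written by its own task, the number of useful threads is capped by the number of columns, which is the limit the explanation mentions.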

Code:

https://bitbucket.org/alvikabir/dailyq/src/master/Advanced/optimized-table-sort

Tags:

multithreading optimizations tables

