Optimized Table Sort
Question: The built-in table sort function 'xasc' can sort tables in memory or on disk. It's commonly used during end-of-day operations to sort tables on disk after write down. Define a function 'xasc2' that is similar to 'xasc', but leverages multithreading when sorting on disk. You can disregard keyed tables as inputs.
More Information:
https://code.kx.com/q/ref/asc/#xasc
Example:
q).z.zd
17 2 6
q)count get `:trade
41303818
q)count cols `:trade
15
q)\s
12i
q)\t `sym`time xasc `:trade
145303
q)\t `sym`time xasc2 `:trade / 64% faster
51506
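To try a comparison like the one above on a smaller scale, a splayed table is needed on disk. The following is a minimal sketch, assuming a scratch directory named `:demo_db and a made-up four-column trade table; neither the directory name, the schema, nor the row count comes from the original, and the timings will not match the figures shown.

```q
/ hypothetical setup: directory name, schema and row count are assumptions
n:1000000
t:([] sym:n?`AAPL`MSFT`GOOG; time:n?23:59:59.999; price:n?100f; size:n?1000)

/ enumerate the symbol column against `:demo_db/sym, then splay the table
`:demo_db/trade/ set .Q.en[`:demo_db;t]

/ time the built-in on-disk sort (milliseconds)
show system "t `sym`time xasc `:demo_db/trade"
```

The same timing call can be repeated with 'xasc2' once it is defined. For a fair comparison, rebuild the splayed table before each run so both functions start from unsorted data, and start q with secondary threads (for example with the -s command-line option) so that 'peach' has threads to use.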
Solution
######## solution.q ########
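.q.xasc2:{[c;t]
  / s is 1b when t is a symbol (variable name or file path); d holds the table itself
  d:$[s:-11h~type t;get t;t];
  / return early if the single sort column is already sorted or no columns were given
  if[(`s=attr d c) or not count c;:t];
  cls:cols d;
  / reorder every column by the sort permutation, then flag the first sort column as sorted
  d:@[;first c;`s#] @[d;cls;@[;iasc flip c!d c:(),c]];
  $[not s;
    t:d;                                                 / table passed by value
    t~hsym t;
    {.Q.dd[x;y 0] set y 1;}[t] peach flip (cls;d cls);   / file path: one column write per task
    t set d                                              / variable name: overwrite the global
  ];
  t
  }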
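The sorting step in the function above can be hard to read in one line, so here is a small in-memory sketch of the same idea broken into steps. The sample table and the names i and r are hypothetical and only serve the illustration.

```q
/ hypothetical sample data, not from the original
t:([] sym:`b`a`b`a; time:4 3 1 2; px:10 20 30 40f)
c:`sym`time

/ sort permutation over the key columns, compared left to right
i:iasc flip c!t c

/ apply the same permutation to every column of the table
r:@[t;cols t;@[;i]]

/ mark the leading sort column as sorted
r:@[r;first c;`s#]

show r
show attr r`sym
```

Building one permutation and reusing it for every column is what makes the later per-column write possible: each column can be reordered and saved without reference to the others.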
Explanation: First, resolve the second argument (a table, a symbol naming a table, or a file path) to a table and store it in a variable. Then check up front whether the sorted attribute is already set on the column you are sorting on, or whether no sort columns were given; in either case, return early. The attribute check only works for single-column sorts.

Next, generate the sort indices for the table using 'iasc', apply them to every column, and set the sorted attribute on the first column passed into the function.

Afterwards, check the type of the second argument. If it is a table, assign the sorted table to the return variable. If it is a file path (a symbol that 'hsym' leaves unchanged), overwrite the table on disk. Otherwise it is a regular symbol, so overwrite the variable of that name.

The file path case is where the optimization lies. Use 'peach' to hand each column name-value pair to a thread, and have that thread write the column to the correct location on disk. Elapsed time drops as more threads are enabled, up to the number of columns. Memory use rises as well, because multiple columns are in use simultaneously. Finally, return the return variable: the sorted table, or the symbol or file path that was passed in.
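The parallel write described above can be sketched on its own. This is a minimal illustration, assuming a scratch directory named `:demo and a small table with no symbol columns (so no enumeration is needed); the names d, r and w are hypothetical and not from the original.

```q
/ hypothetical table and directory, for illustration only
d:([] a:3 1 2; b:30 10 20; c:3.5 1.5 2.5)
`:demo/ set d                 / splay: one file per column plus a .d file

r:`a xasc d                   / sorted copy held in memory

/ write one column: kv is a (name;values) pair, dir is the table directory
w:{[dir;kv] .Q.dd[dir;kv 0] set kv 1;}

/ one task per column; runs serially unless q was started with secondary threads
w[`:demo] peach flip (cols r;r cols r)

show get `:demo/
```

Each task touches a different file, so the writes do not contend with one another. The trade-off is that the columns are replaced one file at a time, so a reader that maps the table mid-write may see some columns sorted and others not.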