Tuesday, May 16, 2023

Parsing Binary Files in q

I gave a presentation on parsing binary files in q at the March 16, 2023 KX Meetup; here are some relevant links:

Friday, December 30, 2016

Riddle 4 Answer

So apparently this is an annual series now…

Anyway, Dan Nugent and Akash were both on the right track, but neither had it exactly right: when a qsql query inside a function is executed, relative names used in it resolve against globals in the current namespace, not the namespace that was in effect when the function was created (i.e. the one returned by running {(get x). 3 0} on the function).

(Note that this actually applies to any type of global (e.g. an atom, a vector, etc.) referenced from a query inside a function, but for convenience, I’ll be writing this post assuming a function is what’s being referenced.)

I will admit to making this a bit of a trick question, as the way I constructed the example was designed around the most common case, which is entirely consistent with Dan’s and Akash’s answers. Here's a snippet showing the full behavior:
% q
KDB+ 3.3 2015.09.02 Copyright (C) 1993-2015 Kx Systems
m32/ 16()core 8192MB adavies aaron-daviess-mac-pro.local 192.168.1.151 NONEXPIRE  

q)f:{x+1}
q)\d .foo
q.foo)g:{select f a from x}
q.foo)\d .bar
q.bar).foo.g([]a:1 2 3)
{select f a from x}
'f
q.foo))\
q.bar)f:{x+2}
q.bar).foo.g([]a:1 2 3)
a
-
3
4
5
q.bar)
Compare to this snippet, where I reference the function from outside the qsql query:
% q
KDB+ 3.3 2015.09.02 Copyright (C) 1993-2015 Kx Systems
m32/ 16()core 8192MB adavies aaron-daviess-mac-pro.local 192.168.1.151 NONEXPIRE  

q)\d .foo
q.foo)f:{x+1}
q.foo)g:{f select a from x}
q.foo)\d .
q).foo.g([]a:1 2 3)
a
-
2
3
4
q)
This can become problematic if you try to create a group of functions in a namespace, some of which reference each other in queries, and then call those functions from outside that namespace. I’ve found two solutions for this, both unfortunately rather inelegant.
  1. You can “copy” the function you need from the global space to a local variable by referencing it from outside any qsql queries:
    % q
    KDB+ 3.3 2015.09.02 Copyright (C) 1993-2015 Kx Systems
    m32/ 16()core 8192MB adavies aaron-daviess-mac-pro.local 192.168.1.151 NONEXPIRE  
    
    q)\d .foo
    q.foo)f:{x+1}
    q.foo)g:{f0:f;select f0 a from x}
    q.foo)\d .
    q).foo.g([]a:1 2 3)
    a
    -
    2
    3
    4
    q)
  2. You can do the name resolution yourself, by leveraging the get results to automatically reference the correct namespace:
    % q
    KDB+ 3.3 2015.09.02 Copyright (C) 1993-2015 Kx Systems
    m32/ 16()core 8192MB adavies aaron-daviess-mac-pro.local 192.168.1.151 NONEXPIRE  
    
    q)\d .foo
    q.foo)f:{x+1}
    q.foo)g:{select((` sv`,((get .z.s). 3 0))`f)a from x}
    q.foo)\d .
    q).foo.g([]a:1 2 3)
    a
    -
    2
    3
    4
    q)
    This can be encapsulated in a utility function, but note that it must then itself be referenced by an absolute name, or the same problem will apply to it:
    % q
    KDB+ 3.3 2015.09.02 Copyright (C) 1993-2015 Kx Systems
    m32/ 16()core 8192MB adavies aaron-daviess-mac-pro.local 192.168.1.151 NONEXPIRE  
    
    q)\d .util
    q.util)r:{(` sv`,((get x). 3 0))y}
    q.util)\d .foo
    q.foo)f:{x+1}
    q.foo)g:{select((.util.r .z.s)`f)a from x}
    q.foo)\d .
    q).foo.g([]a:1 2 3)
    a
    -
    2
    3
    4
    q)

Labels: , ,

Friday, May 20, 2016

Riddle 4: What’s Going On Here?

What’s happening in this snippet, and why is it interesting?
% q
KDB+ 3.3 2016.03.14 Copyright (C) 1993-2016 Kx Systems
m32/ 2()core 2048MB adavies air.local 10.37.129.2 NONEXPIRE

q)f:{x+1}
q)\d .foo
q.foo)g:{select f a from x}
q.foo)\d .
q).foo.g([]a:1 2 3)
a
-
2
3
4
q)

Labels: ,

Riddle 3 Answer

And it looks like I did it again—Riddle 3 has been lacking an official answer for almost a year!
Ciaran Gorman’s answer was correct—strings with leading and/or trailing space are exactly what I was thinking of, and the serialization technique he showed is how to deal with them.
The following function should work as a general solution:
{0x01,($[.z.o like"s*";reverse;::]0x0 vs"i"$10+count x),0x000000f5,("x"$x),0x00}

Labels: , ,

Hello from Montauk!

I’m at KxCon 2016 in Montauk, and having lots of fun so far. Look me up if you’re here!

Labels:

Thursday, May 19, 2016

github.com/adavies42/qist

q-ist is now on GitHub! I’ve started a new repository, https://github.com/adavies42/qist. My old code from contrib is now available there, but more importantly, I received permission from work to release a collection of utilities and example code from my personal library. This includes a much fuller-featured version of my wtf function, my awq tool for using q as a text filter, dozens of miscellaneous utility functions, and more. I don’t have a README for the whole repository written yet, so to get you started, most of the interesting stuff is in lib, except for awq, which is in bin. Have fun exploring the repository, and feel free to comment or email me with any questions about the code.

Labels:

Friday, October 16, 2015

A Combinatoric Combinator

I wrote this while I was playing around with using q on a hobby project, and I thought I’d share it in case anyone else might find it useful.
It takes a number k, a function f, and a list or dictionary y of count n, and runs f once for each of the k-combinations of y. The result is returned as a dictionary with the function outputs as its values and its keys determined by the type of y: if y is a list, its keys are the subsets of y that produced the outputs; if y is a dictionary, its keys are the subsets of the keys of y that index the subsets of y that produced the outputs.

eachc:{
    c:(where reverse 0b vs)each c@:where((first x)=sum 0b vs)each c:til"j"$2 xexp count y;
    (last x)peach$[99h=type y;(key each y)!y:y{((key x)y)#x}/:c;y!y:y@/:c]}

Examples:
q)eachc[(3;sum)]til 5
0 1 2| 3
0 1 3| 4
0 2 3| 5
1 2 3| 6
0 1 4| 5
0 2 4| 6
1 2 4| 7
0 3 4| 7
1 3 4| 8
2 3 4| 9
q)eachc[(3;sum)]`a`b`c`d`e!til 5
a b c| 3
a b d| 4
a c d| 5
b c d| 6
a b e| 5
a c e| 6
b c e| 7
a d e| 7
b d e| 8
c d e| 9
q)
Notes and caveats:
On little-endian machines (i.e. Sparc), the reverse will probably need to be removed.
The size of the result set gets very big very quickly—an n of thirty is probably infeasible for most machines.
I’ve written it to execute f on the combinations with peach, rather than each; this may or may not be appropriate, depending on the nature of any given f and y.

Labels:

Monday, June 8, 2015

Riddle 3: not x~string`$x

Here’s an easy one:

When will {(10h=type x)&not x~string`$x} return true (1b)?

Slightly harder:

In cases where this is problematic, what can be done about it?

Labels: ,

Friday, April 24, 2015

Name! That! Function! (Pivot Table Edition)

What time is it, kids? That’s right, it’s time to play Name! That! Function!
Seriously though, I have a small, useful (IMAO) function I’ve been entering freehand in the console for something like two years now.
Normally, I’d put it in my personal library, but there’s a problem—I can’t think of a good name for it.
Here’s the function: {((union)over key each x)#/:x}.
And here it is in context, showing what it’s good for:
q)t:([]id:1 1 2 2 3;k:`a`b`b`a`b;v:1 2 3 4 5)
q)t
id k v
------
1  a 1
1  b 2
2  b 3
2  a 4
3  b 5
q)exec k!v by id:id from t
id|         
--| --------
1 | `a`b!1 2
2 | `b`a!3 4
3 | (,`b)!,5
q){((union)over key each x)#/:x}exec k!v by id:id from t
id| a b
--| ---
1 | 1 2
2 | 4 3
3 |   5
q)
So, there it is—an easy way to fix up ad-hoc pivots1 when your data doesn’t have all keys present (and in the same order) on all ids.
Anyone have any ideas what to call it?
  1. Note that for production pivots, particularly if they involve significant amounts of data, you should be using an optimized pivot function.

Labels:

Monday, March 9, 2015

Riddle 2 Answer

This is somewhat esoteric, and I wouldn’t be surprised if very few people had any idea what I was even asking. I discovered this largely by chance, while fiddling around with function bytecode, though I think it could be deduced from observation without that.

So, the answer:

:: is usually taught as a sui generis operator, called “global amend”, which has the specific behavior (when used as a verb inside a function) of setting a global variable (instead of the local one that : would set in the same place). No connection is typically drawn between it and any other operator (other than :).

However, I’m pretty sure this is inaccurate. While obviously I don’t know for certain, I strongly suspect that there is no code anywhere in the q binary saying that :: is defined as “global amend”. Rather, it is a specific case of the dyadic “f:” pattern, where f is some dyadic function—e.g. dyadic +:, -:, *:, etc.

These all have the same behavior—x f:y is defined as x:x f y.

Additionally, when used inside functions on variables that have not been identified by the compiler as locals, they modify (and if necessary, create), global variables.

q){a:1;a+:1;a}[]
2
q)a
'a
q){a+:1}[]
q)a
1
q)

It follows that if f is :, then the operation involved is assignment, and so x gets y assigned to it, as a global variable if not identified as a local variable.

In fact, this can be seen in the same way:

q){a:1;a::2;a}[]
2
q)a
'a
q){a::2}[]
q)a
2
q)

Thus arises “global amend”.

If anything, the “create view” sense of :: must be the special case, as ordinarily, dyadic f: verbs behave identically inside and outside functions.

Labels: , ,

Monday, January 19, 2015

Riddle 2: “:: is not a special case”

(I suppose it’s a good thing I only said I’d be posting riddles “occasionally”, as it’s now almost two years since I last posted one.)

Referring specifically to the “global amend” sense, I make the claim that :: is not a special case.

What do I mean by this?

Labels: ,

Wednesday, October 9, 2013

Riddle 1 Answer

I appear to have left Riddle 1 sitting out there without an official answer for almost seven months now. Sorry about that.

The answer given by Peter Byrne was valid, and essentially the one I was thinking of: while his example dealt with the untyped empty list (), I had the typed empty list `boolean$() in mind.

The insight here is that any and all are forms of min and max; and that min x,y, the min of the concatenation of two lists, is equal to min(min x;min y), the min of their separate mins (and mutatis mutandis for max). For this to work consistently for empty lists, the min of an empty list must be the maximum possible value for that data type (and mutatis mutandis for max).

Labels: , ,

Quick Tip: Intra-Statement Breakpoints

A statement referencing a non-existent variable is a common way to add a breakpoint to a q function.
q)f:{x:x+1;break;x+2}
While this is handy, it only lets you break between statements. A simple extension lets you break within statements, inspecting values at arbitrary points in the code, and continuing with execution once you’re done:
q)f:{x:x+1;{break;x}x+2}
Now, the break will occur after the portion of the statement to the right of the break function has executed, and x within the break function will have the value returned by that code. Since the break statement itself has no effects, and x contains the value returned by the code executed so far, typing : to continue will allow the rest of the break function to execute, returning x leftwards and allowing the rest of the statement to execute as if nothing had interrupted it.

Note that this is entirely legal inside qsql queries; I’ve often found it of particular utility there, since you can’t create new local variables inside a query. Unfortunately no specific examples come to mind at the moment; I’ll try to post one later to make it clearer how this technique works.

Labels: ,

Thursday, May 30, 2013

Parallel xasc

Sorting a table can be divided into two parts: determining the new order for the rows, and applying that ordering to the columns. While the former can’t be parallelized in q, the latter can. I don’t have any hard numbers handy at the moment, but with large tables and under the right conditions, I’ve seen noticeable speedups.

Note, BTW, that you can’t (and shouldn’t) write to disk from inside a peach, so this is only applicable to an ordinary in-memory table sort, not the on-disk variety (`c xasc`:t).

q)pxasc :{(count keys y)!flip{y x}[ iasc(raze x)#0!y]peach flip 0!y}
q)pxdesc:{(count keys y)!flip{y x}[idesc(raze x)#0!y]peach flip 0!y}

Labels:

Monday, March 11, 2013

My kdb+ User Meeting Presentation

Here are the slides from the presentation I gave today at the kdb+ user meeting at BAML, in keynote, pdf, and powerpoint formats:
UPDATE: and here's the code as a loadable .q file:

Labels: ,

Riddle 1: (all x)and not any x

When will {(all x)and not any x} return true (1b)?

Labels: ,

Tuesday, December 11, 2012

Riddle 0 Answer

As several people guessed in the comments, the answer I had in mind was what I think of as “compressed” matrices—general lists where some entries are lists of the same length and others are atoms.

q)(1f;`a`b)
1f
`a`b
q)flip flip(1f;`a`b)
1 1
a b
q)

This is, as far as I can tell, directly related to atomic extension (x f'y behaves identically for vector/vector, atom/vector, and vector/atom) and similar concepts, such as the ability to use an atom for a constant column in a table literal (([]x:1 2 3;y:4)).

(My original intention was to post these more or less weekly; hopefully I’ll be able to stick a bit closer to that in future.)

Labels: ,

Sunday, November 4, 2012

Riddle 0: not x~flip flip x

I’m going to start occasionally posting q riddles; this is the first.

When will {x~flip flip x} return false (0b)?

Labels: ,

Monday, October 15, 2012

Functional Query Functions

So, five months ago I said I’d cover functional query utility functions “next time”. Well, next time is finally here. Sorry for the wait; hope it was worth it.

Before I begin, though, two quick notes:
  1. Functional queries have no performance advantage. They are identical to qsql queries in speed and memory usage.
  2. Funcitonal queries are hard to maintain. Even using the techniques described here, functional queries are still harder to read and understand than qsql queries.
Therefore, functional queries should only be used when qsql is incapable of accomplishing the task at hand. (Typically, this involves dynamically choosing columns.)

As mentioned previously, getting the parse trees of the c, b, and a arguments to functional queries right is quite tricky. One technique I frequently use to make this easier is to use utility functions to abstract away as much of that complexity as possible. In their most basic form, they simply generate (composable) elements of the relevant structures:
q)c:{parse["select from t",$[count x;" where ",x;""]]. 2 0}
q)b:{parse["select",$[count x;" by ",x;""]," from t"]3}
q)a:{parse["select ",x," from t"]4}
For example:
q)t:([]x:1 2 3 4;y:1 1 2 3;z:7 8 9 10)
q)select sum x by y from t where x<>1
y| x
-| -
1| 2
2| 3
3| 4
q)?[t;c"x<>1";b"y";a"sum x"]
y| x
-| -
1| 2
2| 3
3| 4
q)
In practice, of course, these functions should be used to express in qsql whatever parts of a query are so expressible, while reserving the actual parse trees for the parts of the query that need them:
q)show each t{?[x;c"x<>1";b"y";(enlist y)!enlist(sum y)]}/:`x`z;
y| x
-| -
1| 2
2| 3
3| 4
y| z
-| --
1| 8
2| 9
3| 10
q)
More specialized utilities can also be written, once their need becomes apparent. One particularly difficult thing to express in functional form is a multi-column fby:
q)t2::update k:`a`a`b`c from t
q)select from t2 where i=(last;i)fby([]y;k)
x y z  k
--------
2 1 8  a
3 2 9  b
4 3 10 c
q)parse"select from t2 where i=(last;i)fby([]y;k)"
?
`t2
,,(=;`i;(k){@[(#y)#x[0]0#x 1;g;:;x[0]'x[1]g:.=y]};(enlist;last;`i);(+:;(!;,`y`k;(enlist;`y;`k)))))
0b
()
q)
The primary problem is that the parse of this query contains the instructions for assembling a table literal out of its individual pieces. Here’s the functional equivalent, written all the way out:
q)?[t2;enlist(=;`i;(fby;(enlist;last;`i);(flip;(!;enlist`y`k;(enlist;`y;`k)))));0b;()]
x y z  k
--------
2 1 8  a
3 2 9  b
4 3 10 c
q)
Using a utility function, it can instead be written much more simply:
q)fbyx:{.[parse["select from t where ",x,"fby c"]. 2 0 0;2 2;:;(flip;(!;enlist(),y;(enlist,y)))]}
q)?[t2;enlist fbyx["i=(last;i)";`y`k];0b;()]
x y z  k
--------
2 1 8  a
3 2 9  b
4 3 10 c
q)
Judicious application of these techniques (not to mention judicious application of functional querying in the first place) can make code much more readable.

Labels:

Thursday, July 12, 2012

Dictionaries and Vectors as Functions

IMAO this is one of the more interesting bits of theory embedded in q:

Considering a dictionary as a (partial) function from its key (domain) to its value (range), then two dictionaries f and g such that f's value and g's key are of the same type can be composed:

q)f:`a`b`c!1 2 3
q)g:1 2 3!("foo";"bar";"quux")
q)g f
a| "foo"
b| "bar"
c| "quux"
q)(g f)`b
"bar"
q)

Considering a vector v as a dictionary with a key of the vector of integers from 0 to count[v]-1, then v can be composed with a dictionary h of integer value:

q)v:42 137 23
q)h:`a`b`c!0 1 2
q)v h
a| 42
b| 137
c| 23
q)(v h)`b
137
q)

Labels: ,

Wednesday, June 13, 2012

Quick Tip: Statistics on Booleans

Due to q’s type promotion rules, it’s entirely legal to use statistical functions on boolean vectors. avg tends to be the most useful, but all of them should work as expected.

A typical use case: average nullness (handy while developing ETL code). We will simulate with a vector of floats mistakenly parsed as ints (due to mostly looking like ints):

q)v:@[string 1000?1000;-10?1000;,[;".123"]]
q)t:flip(enlist`v)!(enlist"I";" ")0:v
q)t
v  
---
468
959
221
694
934
865
344
997
314
580
45 
745
898
935
64 
177
238
361
850
241
..
q)avg null t
v| 0.01
q)

So 10% of t.v is null, which (unless this is expected) should cause us to ask ourselves whether "I" was really the right parse.

Labels:

Monday, June 4, 2012

What the Function?!?

q is notorious for its limited debugging features. One prominent aspect of this is the way runtime errors in the interactive shell are handled: you enter the debug shell and are presented with the body of the function that failed and the error that occurred. What’s missing? The name of the function!

So, to fill that gap, I’ve written a function wtf[] which will search the workspace for its argument, returning a fully-qualified name if it finds anything. wtf, along with brief documentation and a couple examples, can now be found in the lib subdirectory of my contrib (a section I hope to expand in the future).

Labels: ,

Monday, May 28, 2012

And Back

Well, I’m back from Ireland and Kx2012. I met a lot of interesting people, heard some great talks, had a lot of fun, saw a beautiful country, and came back with a bunch of ideas for more posts. All in all, a good time. Hope to see some of you more often!

Labels:

Tuesday, May 22, 2012

Off to Ireland!

Looking forward to meeting some of you at the conference!

Labels:

Monday, May 21, 2012

Principles of Parse Trees

One of the more frequent topics that comes up when people start moving beyond beginner's q is functional queries, which are required to parameterize a query by column name, and are often useful for various other reasons as well.

One of trickiest parts of functional queries is the parse trees that appear in their second, third, and fourth arguments (henceforth c, b, and a). Here are a few rules that should summarize most of what's important to know about them:

  • c is a general list, each element of which is a parse tree. Its degenerate form (no conditions) is (), the empty general list.
  • c appears in the results of parse with an extra level of enlist applied to it; this is only necessary when sending the parse tree back to eval (as opposed to using it to write a direct invocation of ?).
  • b and a are both dictionaries with symbol keys and parse tree values. The degenerate form of b (no grouping) is 0b; the degenerate form of a (select all columns as-is) is ().
  • Symbol literals are enlisted.
  • Names (parameters, local or global variables, table columns, functions, etc.) become symbols.
  • Functions become sexps, like in LISP.
  • Adverbs are (technically) monadic functions which take dyadic verbs as arguments and yield dyadic verbs as their return values.
  • Global constants can be referenced by name or by value.
  • Anything which can be computed outside the query framework, may be.

Some elucidation on the last few points:

Adverbs will show up in the output of parse as ((adverb;`verb);`x;`y):

q)unshow parse"select f'[a;b]from t"
(?;`t;();0b;(,`b)!,((';`f);`a;`b))

While this will work fine, simply including the modified verb directly in the tree by value is also legal:

q)?[t;();0b;(enlist`b)!enlist(f';`a;`b)]

Built-in functions from the .q namespace will be included by value in parse output:

q)unshow parse"select dev a from t"
(?;`t;();0b;(,`a)!,(k){sqrt var x};`a))

These should always be replaced by their names in code:

q)?[t;();0b;(enlist`a)!enlist(var;`a)]

Finally, fully parsed code may be mixed freely with expressions which generate results which are otherwise acceptable:

q)unshow parse"select from t1 uj t2 where a in(exec a from t3)"
(?;(k){$[()~x;y;98h=@x;x,(!+x:.Q.ff[x;y])#.Q.ff[y;x];lj[(?(!x),!y)#x]y]};`t1;`t2);,,(in;`a;(?;`t3;();();,`a));0b;())

can be written as

q)?[t1 uj t2;enlist(in;`a;exec a from t3);0b;()]

Next time, I'll discuss how to write utility functions to make advanced constructs like multi-column fby easy to write.

Labels:

JSON

I've just added a small JSON parser I wrote a while back to my contrib: http://code.kx.com/wsvn/code/contrib/adavies/json/.

Labels:

Thursday, April 12, 2012

contrib/adavies

My public q code can, for the moment, be found in my part of code.kx.com's contrib section.

So far, there's a basic P&L calculator and a C extension for retrieving thread ids, but more should be showing up in the near future.

Labels:

-1"Hello, world!";

So, I figured it was time I started a q blog. I intend this to be a repository of snippets, FAQs, and general advice, which will, I hope, be of some use to other q programmers.

(For anyone wondering, “q-ist” is a pun on “deist”—I believe in q.)

Labels: