Quick Tip: Statistics on Booleans
Due to q’s type promotion rules, it’s entirely legal to use statistical functions on boolean vectors. avg tends to be the most useful, but all of them should work as expected.
A typical use case: average nullness (handy while developing ETL code). We will simulate with a vector of floats mistakenly parsed as ints (due to mostly looking like ints):
q)v:@[string 1000?1000;-10?1000;,[;".123"]] q)t:flip(enlist`v)!(enlist"I";" ")0:v q)t v --- 468 959 221 694 934 865 344 997 314 580 45 745 898 935 64 177 238 361 850 241 .. q)avg null t v| 0.01 q)
So 10% of t.v is null, which (unless this is expected) should cause us to ask ourselves whether "I" was really the right parse.
Labels: tip


2 Comments:
Don't you mean 1% of t.v is null? But yeah, avg null is very useful (just used it to see how many nulls in a table's columns)
Oops, yeah, thanks, I'll fix that.
Another useful one is "sum null", which is the same thing as "count where null" but shorter, and much shorter, and simpler, when used in a query.
Post a Comment
Subscribe to Post Comments [Atom]
<< Home