Quick Tip: Statistics on Booleans
Due to q
’s type promotion rules, it’s entirely legal to use statistical functions on boolean vectors. avg
tends to be the most useful, but all of them should work as expected.
A typical use case: average null
ness (handy while developing ETL code). We will simulate with a vector of float
s mistakenly parsed as int
s (due to mostly looking like int
s):
q)v:@[string 1000?1000;-10?1000;,[;".123"]] q)t:flip(enlist`v)!(enlist"I";" ")0:v q)t v --- 468 959 221 694 934 865 344 997 314 580 45 745 898 935 64 177 238 361 850 241 .. q)avg null t v| 0.01 q)
So 10% of t.v
is null, which (unless this is expected) should cause us to ask ourselves whether "I"
was really the right parse.
Labels: tip
2 Comments:
Don't you mean 1% of t.v is null? But yeah, avg null is very useful (just used it to see how many nulls in a table's columns)
Oops, yeah, thanks, I'll fix that.
Another useful one is "sum null", which is the same thing as "count where null" but shorter, and much shorter, and simpler, when used in a query.
Post a Comment
Subscribe to Post Comments [Atom]
<< Home