awk question

Polytropon freebsd at edvax.de
Mon Oct 5 21:58:17 UTC 2015


On Mon, 05 Oct 2015 17:44:55 -0400, Quartz wrote:
> 
> > The form "input | step1 | step2 | step3 | step4 > result" usually
> > is more readable
> 
> That's what I meant by being easier to understand conceptually. I agree
> about being more readable; even though this format sometimes needs the
> 'useless cat' it's often my preferred coding style, especially in
> scripts where the input might change around.

And the "useless cat" method also makes it easy to test the
script with varying input (for example, pre-generated test
input) before it "goes live". It also makes it easier to
"extend" the pre- or post-processing commands with new ones.
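
A minimal sketch of that idea. The file name, the sample data and
the processing steps are made up for illustration; they are not from
the thread:

```shell
#!/bin/sh
# Hypothetical pre-generated test input (assumption, for this sketch only).
input=$(mktemp /tmp/awkdemo.XXXXXX)
printf 'alice 3\nbob 5\nalice 4\n' > "$input"

# "Useless cat" form: the input source is a stage of its own, so it can
# be swapped for another file or command without touching the later steps.
result=$(cat "$input" \
    | awk '{ sum[$1] += $2 } END { for (n in sum) print n, sum[n] }' \
    | sort)

echo "$result"
rm -f "$input"
```

To "go live", only the first stage changes, and a new step can be
added anywhere in the chain with one more "| command".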



> > Additionally, awk isn't that hard to learn. Reading "man awk" will
> > provide you with a good background. And if you're already a C
> > programmer, you'll see that many things you can do in C will also
> > work similarly in awk, which _might_ not even be a good thing. :-)
> 
> The problem with awk is the whole BEGIN/END/braces thing and how commas 
> interact with the operands.

It's not that hard:

BEGIN { ... } will be executed _before_ any input is processed,
END { ... } will be executed _after_ all input has been processed.
/pattern/ { ... } will be executed for each matching input line,
(condition) { ... } will be executed when the condition is true,
and { ... } will be executed for _every_ input line.
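
The five rule types above can be tried together in one small
program. This is only a sketch; the sample lines and the variable n
are made up for illustration:

```shell
#!/bin/sh
result=$(printf 'apple 1\nbanana 20\ncherry 3\n' | awk '
BEGIN    { print "start" }        # runs once, before any input is read
/an/     { print "match: " $1 }   # runs for lines matching the regex
($2 > 2) { print "big: " $1 }     # runs when the condition is true
         { n++ }                  # runs for every input line
END      { print "lines: " n }    # runs once, after all input
')

echo "$result"
```

Rules are checked in the order they are written, so a line that
satisfies several of them (here "banana 20") triggers each matching
action in turn.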

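Regarding commas: In a print statement, "print a b c" concatenates
the values, while "print a, b, c" separates them with the output
field separator (OFS, a space by default). There is also the more
sophisticated C-like printf("format string", a, b, c) form. For all
other functions, commas are argument separators just like in many
other programming languages.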

	% echo "a b c" | awk '{ print $3 $1 $2 }'
	cab
	% echo "a b c" | awk '{ print $3, $1, $2 }'
	c a b
	% echo "a b c" | awk '{ printf("%s-%s-%s\n", $3, $1, $2); }'
	c-a-b

Those are the three "main methods" of printing: concatenated,
separated by a space (the default output field separator), and
custom formatted string. And the semicolon is optional; it's just
my C-contamination. :-)
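
A small sketch of what the comma in print really does: it inserts
the output field separator, the built-in variable OFS, which is a
space unless changed. The shell variable names here are made up for
illustration:

```shell
#!/bin/sh
# With OFS changed, the comma form prints the new separator.
with_ofs=$(echo "a b c" | awk 'BEGIN { OFS = "-" } { print $3, $1, $2 }')

# Plain concatenation does not use OFS at all.
concat=$(echo "a b c" | awk 'BEGIN { OFS = "-" } { print $3 $1 $2 }')

echo "$with_ofs"
echo "$concat"
```

So for simple cases, setting OFS once in BEGIN can replace a longer
printf format string.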


> It's not very much like sh or C syntax (or 
> any other syntax) and new users tend to get really confused.

Hmmm... I don't know, could you provide an example where you
would say, like, "this is not intuitive" or even "this does
something totally strange"?



> Also, different versions of awk handle math (esp floating point) with 
> different rounding/precision/overflow, making calculations vary between 
> installations, only further adding to the confusion.

Yes, this is true, but keep in mind what awk is: a "pattern-directed
scanning and processing language". If you want higher precision
math, pipe the expression to dc and read the result back with
"<math stuff> | dc" | getline result (system() would only give you
the exit status, not the output); awk isn't really for math, but
integer math is usually fine. :-)
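
A minimal sketch of handing a calculation to dc from inside awk. It
assumes dc is installed, and the names cmd and third are made up for
illustration. The output is read with getline from a command pipe:

```shell
#!/bin/sh
result=$(awk 'BEGIN {
    # dc is reverse Polish: "20k" sets 20 decimal places, then 1 3 / p
    # divides 1 by 3 and prints the result.
    cmd = "echo \"20k 1 3 / p\" | dc"
    cmd | getline third     # read the first line of output from dc
    close(cmd)              # close the pipe so the command can be rerun
    print third
}')

echo "$result"
```

For comparison, a plain "print 1/3" in awk is formatted through
OFMT, which defaults to "%.6g", so it shows only six significant
digits.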



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

