Regex Wizards
Matthew Seaman
m.seaman at infracaninophile.co.uk
Tue Sep 27 06:38:48 UTC 2011
On 27/09/2011 03:02, grarpamp wrote:
> Under the ERE implementation in RELENG_8, I'm having
> trouble figuring out how to group and backreference this.
>
> Given a line, where:
> If AAA is present, CCC will be too, and B may appear in between.
> If AAA is not present, neither CCC nor B will be present.
> DDDD is always present.
> Junk may be present.
> Match good lines and output in chunks.
>
> echo junkAAAABCCCDDDDjunk | \
>
> This works as expected:
> sed -E -n 's,^.*(AAAB?CCC)(DDDD).*$,1 \1 2 \2,p'
> 1 AAABCCC 2 DDDD
>
> But making the leading bits optional per spec does not work:
> sed -E -n 's,^.*(AAAB?CCC)?(DDDD).*$,1 \1 2 \2,p'
> 1 2 DDDD
>
> Nor does adding the usual grouping parens:
> sed -E -n 's,^.*((AAAB?CCC)?)(DDDD).*$,1 \1 2 \2,p'
> 1 2
>
> How do I group off the leading bits?
> Or is this a limitation of ERE's?
> Or a bug?
Hmmmm.... works fine with perl REs, or with sed if you trim the 'match
any sequence of characters' bits at the beginning and end of the line:
% echo junkAAAABCCCDDDDjunk | perl -nle 'm/(AAAB?CCC)?(DDDD)/ && print "1 $1 2 $2";'
1 AAABCCC 2 DDDD
% echo junkAAAABCCCDDDDjunk | sed -E -n 's/(AAAAB?CCC)?(DDDD)/1 \1 2 \2/p'
junk1 AAAABCCC 2 DDDDjunk
Of course, the problem with sed is that you're using a *substitution*
command rather than just printing out what the RE matched. Suppressing
the leading and trailing junk from the output is what is screwing you up.
Trouble is, that '^.*' term in your RE is greedy, so it will match to the
end of AAABCCC, then the RE engine will say to itself 'I've found DDDD,
so I'm not going to backtrack and look for all the optional AAAB?CCC
stuff.' In fact, adding the bits to match the leading and trailing junk
makes the RE ambiguous -- there are two ways it could match your test
string, and the law of natural cussedness being what it is, it chooses
the wrong one.
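
One way around the ambiguity, as a rough sketch rather than the definitive
answer: try the stricter RE (with the AAAB?CCC group mandatory) first, and
only fall back to the DDDD-only form if that substitution did not happen.
The function name split_chunks is made up for illustration, and the 't'
command is given its own -e so that the label parsing behaves the same in
different sed implementations.

```shell
# Hypothetical helper (name is illustrative only); reads lines on stdin.
split_chunks() {
    # 1st -e: AAAB?CCC is mandatory here, so '^.*' cannot swallow it.
    # 2nd -e: 't' jumps to the end of the script if the 1st s/// succeeded.
    # 3rd -e: fallback for lines that contain DDDD but no AAA...CCC part;
    #         chunk 1 is left empty to keep the output format the same.
    sed -E -n \
        -e 's,^.*(AAAB?CCC)(DDDD).*$,1 \1 2 \2,p' \
        -e 't' \
        -e 's,^.*(DDDD).*$,1  2 \1,p'
}

echo junkAAAABCCCDDDDjunk | split_chunks   # prints: 1 AAABCCC 2 DDDD
echo junkDDDDjunk | split_chunks           # prints: 1  2 DDDD
```

Lines without DDDD match neither substitution and so print nothing.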
Cheers,
Matthew
--
Dr Matthew J Seaman MA, D.Phil. 7 Priory Courtyard
Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
JID: matthew at infracaninophile.co.uk Kent, CT11 9PW