Grepping a list of words
Chip Camden
sterling at camdensoftware.com
Thu Aug 12 17:56:19 UTC 2010
Quoth Anonymous on Thursday, 12 August 2010:
> Oliver Fromme <olli at lurza.secnetix.de> writes:
>
> > John Levine <johnl at iecc.com> wrote:
> > > > > % egrep 'word1|word2|word3|...|wordn' filename.txt
> > >
> > > > Thanks for the replies. This suggestion won't do the job as the list of
> > > > words is very long, maybe 50-60. This is why I asked how to place them all
> > > > in a file. One reply dealt with using a file with egrep. I'll try that.
> > >
> > > Gee, 50 words, that's about a 300 character pattern, that's not a problem
> > > for any shell or version of grep I know.
> > >
> > > But reading the words from a file is equivalent and as you note most
> > > likely easier to do.
> >
> > The question is what is more efficient. This might be
> > important if that kind of grep command is run very often
> > by a script, or if it's run on very large files.
> >
> > My guess is that one large regular expression is more
> > efficient than many small ones. But I haven't done real
> > benchmarks to prove this.
>
> BTW, not using regular expressions is even more efficient, e.g.
>
> $ fgrep -f /usr/share/dict/words /etc/group
>
> When using egrep(1) it takes considerably more time and memory.

Having written a regex engine myself, I can see why. Though I'm sure
egrep is highly optimized, even the most optimized DFA table is going
to take more cycles to navigate than a simple string comparison. Not
to mention the initial overhead of parsing the regex and building that
table.
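
For anyone who wants to try the three approaches from this thread side
by side, here is a minimal sketch. The word list and the sample file
are made up for illustration (they are not from the original poster's
data), and the temporary file names are hypothetical. It uses the
POSIX spellings grep -F and grep -E, which correspond to fgrep and
egrep.

```shell
#!/bin/sh
# Illustrative data only: a tiny word list and a group(5)-like sample file.
tmpdir=$(mktemp -d)
words="$tmpdir/words.txt"
sample="$tmpdir/sample.txt"

# One pattern per line, which is the layout grep -f expects.
printf '%s\n' wheel operator daemon > "$words"
printf '%s\n' 'wheel:*:0:root' 'staff:*:20:' 'operator:*:5:root' > "$sample"

# 1. Fixed-string matching, reading the list from a file (fgrep -f).
fixed_matches=$(grep -F -f "$words" "$sample")

# 2. The same list, with each line treated as an extended regex (egrep -f).
regex_matches=$(grep -E -f "$words" "$sample")

# 3. One large alternation built from the list: 'wheel|operator|daemon'.
pattern=$(paste -s -d '|' "$words")
joined_matches=$(grep -E "$pattern" "$sample")

printf '%s\n' "$fixed_matches"

rm -rf "$tmpdir"
```

All three forms select the same lines here only because none of the
words contains a regex metacharacter; a word such as "a.b" would match
more under -E than under -F. To compare cost on real data, each grep
invocation can be prefixed with time(1).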
--
Sterling (Chip) Camden | sterling at camdensoftware.com | 2048D/3A978E4F
http://camdensoftware.com | http://chipstips.com | http://chipsquips.com