Analyzing Log files of very large size

Paul Procacci pprocacci at gmail.com
Mon Jul 12 04:06:19 UTC 2021


This advice is sound.
I'd personally do the same, leaning on either awk or perl myself.
It naturally depends on what you're after in the long run.

> >> I am required to analyze large SonicWall firewall log files,
> >> around 50 GB, for a suspected attack. ...
>
> >But if this project is for an employer or client, I would recommend
> >starting with the commercial off-the-shelf (COTS) log analysis tool made
> >by the hardware vendor.  Train up on it.  Buy a support contract:
> >
> >https://www.sonicwall.com/wp-content/uploads/2019/01/sonicwall-analyzer.pdf
>
> This is reasonable advice if you plan to be doing these analyses on a
> regular basis, but it's overkill if you only expect to do it once.
>
> I have found that some of the text processing utilities that come with BSD
> are a lot faster than others.  The regex matching in perl is a lot faster
> than in python, sometimes by an order of magnitude.  My tool of choice is
> mawk, an implementation of the funky but very useful awk language that is
> amazingly fast.  grep is OK; sed is too slow for anything other than tiny
> jobs.
>
> I'd suggest first dividing up the logs into manageable chunks, perhaps
> using split or csplit, or (a good first project in mawk) using patterns
> to divide the files into chunks that each represent an hour or a day.
>
> Then you can start looking for interesting patterns, perhaps with grep if
> they are simple enough, or more likely with some short mawk scripts.
>
> R's,
> John
>
>
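A minimal Python sketch of John's two steps (split into hourly chunks, then look for patterns), assuming SonicWall-style key=value lines that carry `time="YYYY-MM-DD HH:MM:SS"` and `src=IP:port:interface` fields. Those field layouts, the function names `hour_key`, `split_by_hour` and `count_sources`, and the `no-timestamp` bucket are my assumptions, not anything from this thread, so check the regexes against your own logs first. It reads the file once, line by line, so memory use stays flat even at 50 GB. John found perl's regex matching much faster than python's, so for a one-off pass mawk or perl may well win on speed; this only shows the shape of the workflow.

```python
import os
import re
from collections import Counter

# ASSUMED line format (check against your own logs): SonicWall-style
# key=value syslog lines such as
#   id=firewall time="2021-07-12 04:06:19" msg="..." src=10.0.0.5:51234:X1
TIME_RE = re.compile(r'time="(\d{4}-\d{2}-\d{2}) (\d{2})')
SRC_RE = re.compile(r"\bsrc=(\d{1,3}(?:\.\d{1,3}){3})")


def hour_key(line):
    """Return 'YYYY-MM-DD_HH' for a line, or None if it has no timestamp."""
    m = TIME_RE.search(line)
    return f"{m.group(1)}_{m.group(2)}" if m else None


def split_by_hour(log_path, out_dir):
    """Read log_path once, appending each line to out_dir/<YYYY-MM-DD_HH>.log."""
    os.makedirs(out_dir, exist_ok=True)
    # latin-1 maps every byte to one character and newline="" disables newline
    # translation, so lines are written back byte-for-byte as they were read.
    opts = {"encoding": "latin-1", "newline": ""}
    current_key, out = None, None
    try:
        with open(log_path, **opts) as src:
            for line in src:
                key = hour_key(line) or "no-timestamp"
                if key != current_key:
                    # Firewall logs are normally in time order, so this switch
                    # is rare; append mode keeps out-of-order lines correct anyway.
                    if out is not None:
                        out.close()
                    out = open(os.path.join(out_dir, key + ".log"), "a", **opts)
                    current_key = key
                out.write(line)
    finally:
        if out is not None:
            out.close()


def count_sources(lines, needle):
    """Count source IPs on lines that contain the plain substring `needle`."""
    counts = Counter()
    for line in lines:
        if needle in line:
            m = SRC_RE.search(line)
            if m:
                counts[m.group(1)] += 1
    return counts
```

Run `split_by_hour("sonicwall.log", "chunks")` once into an empty directory (append mode means a rerun duplicates lines), then open one chunk with `encoding="latin-1"` and call `count_sources(f, "Port Scan").most_common(10)` on it; the file names and the "Port Scan" message text are hypothetical placeholders.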

One more note: I've done something similar before, where awk/perl simply
weren't enough for the 50+ TB of logs being consumed daily, so I had to roll
my own in C using qp-tries[1].
Again, if your volume is high and you also process this data often,
consider a more custom solution if one doesn't already exist.

Note: another poster in this thread also mentioned AVL trees.  Those are
fine too; I just prefer qp-tries.
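A simplified Python sketch of the nibble-indexed idea behind qp-tries, used here to count source IPv4 addresses so that any nibble-aligned prefix (/8, /16, /24, ...) can be totalled afterwards. It is not a full qp-trie: it keeps the 16-way branching on 4-bit nibbles and the bitmap-plus-popcount trick for storing only the children that actually exist, but it omits the path compression that lets a real qp-trie skip nibbles where keys don't differ, and it keeps a counter in every node. The names `Node`, `NibbleTrie` and `count_prefix` are hypothetical; a production version like the one described above would be written in C.

```python
import ipaddress


class Node:
    """A trie node; a 16-bit bitmap records which nibble values have children."""
    __slots__ = ("bitmap", "children", "count")

    def __init__(self):
        self.bitmap = 0      # bit n set => a child exists for nibble value n
        self.children = []   # only the existing children, ordered by nibble value
        self.count = 0       # number of inserted addresses at or below this node


def slot(bitmap, nib):
    """Index of nibble `nib` in the compact children list (the popcount trick)."""
    return bin(bitmap & ((1 << nib) - 1)).count("1")


def nibbles(value, prefix_len=32):
    """Yield the top prefix_len // 4 nibbles of a 32-bit integer, high to low."""
    for shift in range(28, 28 - prefix_len, -4):
        yield (value >> shift) & 0xF


class NibbleTrie:
    def __init__(self):
        self.root = Node()

    def add(self, addr):
        """Record one occurrence of an IPv4 address given as a string."""
        node = self.root
        node.count += 1
        for nib in nibbles(int(ipaddress.IPv4Address(addr))):
            idx = slot(node.bitmap, nib)
            if not node.bitmap & (1 << nib):
                # New branch: set its bit and insert the child in sorted position.
                node.bitmap |= 1 << nib
                node.children.insert(idx, Node())
            node = node.children[idx]
            node.count += 1

    def count_prefix(self, cidr):
        """Count recorded addresses inside a nibble-aligned CIDR block."""
        net = ipaddress.IPv4Network(cidr)
        if net.prefixlen % 4:
            raise ValueError("this sketch only supports prefix lengths divisible by 4")
        node = self.root
        for nib in nibbles(int(net.network_address), net.prefixlen):
            if not node.bitmap & (1 << nib):
                return 0
            node = node.children[slot(node.bitmap, nib)]
        return node.count
```

For example, after adding `10.0.0.1`, `10.0.0.2`, `10.1.5.9` and `192.168.1.7`, `count_prefix("10.0.0.0/8")` returns 3 and `count_prefix("10.0.0.0/24")` returns 2. For plain per-address counting a `dict` or `collections.Counter` is simpler; the trie only earns its keep when you want prefix rollups out of the same pass.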

[1] https://dotat.at/prog/qp/README.html
-- 
__________________

:(){ :|:& };:


More information about the freebsd-questions mailing list