Optimize shell
Olivier Nicole
on at cs.ait.ac.th
Tue Feb 7 19:26:58 PST 2006
Thanks for the suggestions.
> I am setting up a machine to work as a mail back-up. It receives copy
> of every email for every user. When the disk is almost full, I want to
> delete older messages up to a total size of 4000000000.
Moving to database storage was a good idea, but it is not an option as the
system is already running. Using the delete functions of other tools
could be a solution, though I doubt they would work across all the users.
Using bash could be a way to go, as could using locate (possible, but then
it would need a second command to get the file size, so I am not sure
that it would save much).
And my assumption was wrong: most of the time was spent in the sed,
not in the sort. In fact I did not need the sed at all, as I could split the
fields on the / for sort and pick up the correct fields in
awk. Using xargs also sped things up a little.
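
The field-splitting idea can be tried on its own with a few sample lines.
The following is only a minimal sketch, not the script that was timed:
select_oldest is a hypothetical name, -k7,7n sorts on the same field as
the older "-n +6" spelling, and it assumes the /home/sub_home/user/Maildir/cur
layout with no whitespace in the paths.

```shell
#!/bin/sh
# Hypothetical helper: read "find -ls" style lines on stdin and print the
# paths of the oldest messages while the running size stays under $1 bytes.
select_oldest() {
    limit=$1
    # With "/" as separator, field 7 is the message file name, which
    # starts with the Unix timestamp, so a numeric sort orders by date.
    sort -t/ -k7,7n |
    # awk splits on whitespace: $7 is the size in bytes, $11 the full path.
    awk -v limit="$limit" '{ sum += $7; if (sum < limit) print $11 }'
}
```

It would be fed the same way as the pipeline in the script, for example
find /home -mindepth 5 -ls | grep /Maildir/cur/ | select_oldest 200000000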
Here is the final solution:
mailback<root>66: cat func5
#!/bin/sh
/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sort -t/ -n +6 | /usr/bin/awk '{sum+=$7; if (sum < 200000000) print $11;}'|xargs cat >/dev/null
mailback<root>67: time ./func5
0.806u 3.086s 0:35.69 10.8% 67+405k 9864+21io 5pf+0w
And the original one:
mailback<root>68: cat func1
#!/bin/sh
for i in `/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | /usr/bin/awk '{sum+=$2; if (sum < 200000000) print $3;}'`; do
cat $i >/dev/null
done
mailback<root>69: time ./func1
223.665u 12.341s 4:53.42 80.4% 48+315k 9100+13io 0pf+0w
35 seconds is OK.
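
The timings above only read the files with cat; for the real clean-up the
cat would be replaced by the rm from the original question. A minimal
sketch of that last step, assuming the list of paths arrives one per line
on stdin and that no path contains whitespace or quotes (purge_listed is
a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: delete every file whose path is listed on stdin.
# xargs packs many paths into each rm call, which avoids starting one
# process per message as the for loop did.
purge_listed() {
    # "--" stops option parsing in case a path begins with a dash.
    xargs rm -f --
}
```

It may be worth running the pipeline with cat or echo first and checking
the list before switching to the version that deletes.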
Best regards,
Olivier
Original question:
> I am setting up a machine to work as a mail back-up. It receives copy
> of every email for every user. When the disk is almost full, I want to
> delete older messages up to a total size of 4000000000.
>
> Messages are stored in /home/sub_home/user/Maildir/cur in maildir
> format.
>
> Message name is of the form 1137993135.86962_0.machine.cs.ait.ac.th
> where the first number is a Unix time stamp.
>
> I came up with the following shell script to find the messages of all users,
> sort them by date and compute the total size up to 4 GB.
>
> for i in `/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | /usr/bin/awk '{sum+=$2; if (sum < 4000000000) print $3;}'`; do
> /bin/rm $i
> done
>
> find /home -mindepth 5 -ls makes a list of all files and directories at
> a depth of 5 and more, because my directory structure is such that
> messages are stored at level 6
>
> grep /Maildir/cur/ because courier-imap tends to put things in other
> directories it creates when it needs to
>
> These two commands give me a list of the form:
>
> 1397490 8 -rw------- 1 on staff 3124 Jan 27 15:23 /home/java/on/Maildir/cur/1138350182.1413_1.mackine.cs.ait.ac.th
>
> where 3124 is the size
>
> The sed command transforms the line into date, size, filename:
>
> 1137994623 2466 /home/java/on/Maildir/cur/1137994623.87673_0.mail.cs.ait.ac.th
>
> Then sort orders the lines on the date field and awk is used to sum the size
> field and print the filename until the total of 4 GB is reached.
>
> That works OK, but it is damn slow: for 200 users, 7800 messages and
> 302MB it takes something like 3+ minutes... For 25 GB of email it
> should take more than 4 hours, this is too much.
>
> It seems that the slow part is the sort:
>
> without sort
> time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | cat /dev/null
> 0.026u 0.035s 0:07.67 0.6% 51+979k 0+0io 0pf+0w
>
> with sort
> time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | cat /dev/null
> 0.281u 0.366s 3:44.75 0.2% 39+1042k 0+0io 0pf+0w
>
> Any idea how to speed up the things?
>
> Thanks in advance,
>
> Olivier
> _______________________________________________
> freebsd-questions at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe at freebsd.org"
>