Fast diff command for large files?

Kirk Strauser kirk at strauser.com
Mon Nov 7 17:29:41 GMT 2005


On Monday 07 November 2005 10:40, francisco at natserv.net wrote:

> I had the same setup a while back.
> A few suggestions.

Thanks for the tips; unfortunately, any fix that involves touching the 
FoxPro code is basically impossible.  It's not that we *can't*, but that 
the sole FoxPro programmer at our company is completely occupied with other 
projects.

> What type of system is this? In particular do any record can be modified
> or are only recent records changed?

Nope - every line in each table is subject to change.

Here's how our current system works:

1) Copy each FoxPro table file (and associated memo file if one exists) to a 
Unix server via Samba.
2) Run my modified version of the "xbase" program to convert each table to a 
tab-delimited file that can be loaded into PostgreSQL using the "copy 
table" command.  These files are named "foo.dump", "bar.dump", etc.
3) If "foo.dump-old" exists:
    a) Using Andrew's algorithm, get the difference between foo.dump-old and
       foo.dump.  Write these out as a set of "delete from ..." commands and
       a "copy table" command.  Pipe this relatively tiny file into the
       "psql" command to upload the modifications.
  Otherwise:
    b) Use the psql command to upload foo.dump
4) "mv foo.dump foo.dump-old"
5) Profit!

I've already cut the runtime in half.  The next big step is going to be 
getting our Windows admin to install rsync on the fileserver so that we can 
minimize the time spent in step one.  With the exception of the space 
required by keeping the old version of the dump files (step 4), this is 
exceeding all of our performance expectations by a wide margin.

Even better, step 3a cuts the time that the PostgreSQL server has to spend 
committing the new data by several orders of magnitude.  The net effect is 
that our web visitors don't see a noticeable slowdown during the import 
stage.   
-- 
Kirk Strauser
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 155 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-questions/attachments/20051107/1ee9f8ea/attachment.bin


More information about the freebsd-questions mailing list