Tuning PostgreSQL for bulk imports

Ivan Voras ivoras at fer.hr
Sun Jan 7 23:16:04 UTC 2007


Kirk Strauser wrote:
> On Sunday 07 January 2007 15:15, Ivan Voras wrote:

>> - What might help you is to keep the WAL (write-ahead-log, i.e. journal)
>> files on a completely separate (and fast) drive from the rest of the
>> database, to allow parallelism and speed. For best results, format it
>> with 32k blocks/8k fragments.
> 
> Thanks for the idea.  Assuming I actually get my wish of a matched set of 4 
> high speed drives, would I be better off setting one aside for the journal, 
> or striping them all together so everything benefits?

Everything is first written to the WAL, and then copied to the "normal"
database. So, speed is important, but maybe striping all 4 drives would
be an overkill, though. The logs are written sequentially so seek times
are not that important. What are your current IO rates? Since you have
only two drives you might be restricted by available disk bandwidth...

>> - If you don't specifically need the atomicity of transactions, you
>> might divide your import into many small transactions, for example one
>> for every 100,000 rows instead of doing 8 million at once.
> 
> Would that actually make a difference in total elapsed time spent importing?

It will certainly lower the final "commit" time and avoid copying large
chunks between the WAL and the database, but I don't have my own
measurements.

Do you have a SMP machine? If so, you can try importing NCPU of the big
transactions in parallel. This could help you almost linearly, if your
disks allow it. (see
http://www.postgresql.org/docs/8.1/interactive/runtime-config-wal.html :
you can increase commit_delay and decrease commit_siblings to achieve
sort of localized-async operation across several (parallel) transactions)

I've just remembered - do you run VACUUM ANALYZE after your big imports?
You should (to get performance on SELECTs afterwards, but it won't help
the imports themselves).

>> Also, what version of PostgreSQL are you using? As a general rule, the
>> newer the version, the faster it is. This is especially true if you're
>> using 7.x - go to 8.1.5 immediately (but don't use 8.2 until 8.2.1 gets
>> out).
> 
> We're already running 8.2 because it fixed some problems we were having with 
> 8.1.5.  Other than the excessively long import times, it's absolutely 
> screaming and we couldn't be more pleased.

Ok. Be careful not to use SELECT ... LIMIT ALL ;)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 250 bytes
Desc: OpenPGP digital signature
Url : http://lists.freebsd.org/pipermail/freebsd-questions/attachments/20070107/06691436/signature.pgp


More information about the freebsd-questions mailing list