SCSI tape data loss
Kern Sibbald
kern at sibbald.com
Fri Jun 6 07:38:41 PDT 2003
Hello,
I have now completed a fairly extensive series of tests
on my Linux machine with a DDS-4 drive and on Dan's FreeBSD
machine with a DDS-1 drive.
Bottom line: There is a significant data loss (500KB to 2MB)
at the EOM on Dan's drive. There is no data loss on my drive.
The amount of data lost seems to depend directly on how
compressible the data is (i.e., the more the data can be
compressed to fit in a fixed-size drive buffer, the more user
data is lost).
I ran three different kinds of tests and several variations of some
of those tests:
Tests:
1. Bacula saving a 1GB file containing random data.
2. Simulation of Bacula writing easily compressible, non-random data.
3. Raw write() of random data (same data each write except for
first 32 bits).
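For anyone who wants to try something along the lines of Test 3, here is a
minimal sketch in C. It is not the program used for these tests. The names
write_blocks() and stamp_block(), the fixed rand() seed, and the max_blocks
limit are all assumptions made for illustration. It generates one block of
random data, changes only the first 32 bits before each write(), and stops
at the first write() that fails or comes back short.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 64512  /* one of the two block sizes mentioned in this report */

/* Put a 32-bit sequence number in the first four bytes of the block. */
void stamp_block(unsigned char *block, uint32_t seq)
{
    memcpy(block, &seq, sizeof(seq));
}

/*
 * Write up to max_blocks blocks to fd (hypothetical helper).
 * Returns the number of complete blocks that write() accepted.
 * *last_errno receives errno from a failing write(), or 0 if none failed.
 */
long write_blocks(int fd, long max_blocks, int *last_errno)
{
    unsigned char *block = malloc(BLOCK_SIZE);
    long written = 0;
    size_t i;

    *last_errno = 0;
    if (block == NULL) {
        *last_errno = ENOMEM;
        return 0;
    }

    /* Random payload, generated once and reused for every write. */
    srand(12345);
    for (i = 0; i < BLOCK_SIZE; i++)
        block[i] = (unsigned char)(rand() & 0xff);

    while (written < max_blocks) {
        ssize_t n;

        stamp_block(block, (uint32_t)written);
        n = write(fd, block, BLOCK_SIZE);
        if (n < 0) {
            *last_errno = errno;   /* e.g. ENOSPC at the end of the medium */
            break;
        }
        if ((size_t)n != BLOCK_SIZE)
            break;                 /* zero or short write */
        written++;
    }

    free(block);
    return written;
}
```

To run it against a tape, one would open the device node, call
write_blocks() with a max_blocks larger than the tape can hold, and then
compare the returned count with the number of blocks that can be read back.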
Variations:
1. Bacula stops writing before EOM is reached.
2. Test 2 above without drive hardware compression.
3. Test 3 above without writing EOF but simply rewinding.
4. Tests with and without using ioctl(MTIOCLRERROR).
5. Various tests with block size at 64,512 bytes, others with
block size at 61,440 bytes.
Results:
1. All tests on my machine succeeded.
2. All tests (Test 1 Variation 1) not writing to EOM succeeded
on both machines. (Previously we indicated that there
was a loss when not writing to the EOM. I could not
reproduce this and believe we had a misunderstanding
somewhere).
3. All tests of all variations writing to EOM failed
on Dan's machine.
4. The number of buffers lost was quite consistent (1-2 buffer
difference) for any given variation.
5. There was not much difference in the number of buffers
lost with/without hardware compression when the data was
random.
6. The number of buffers lost was 4 times greater with
non-random data and drive compression enabled than
with random data or with no drive compression.
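As a rough cross-check, the byte figures for the loss can be turned into
block counts. The sketch below assumes that a "buffer" is one block of the
sizes listed under Variations and that KB and MB are powers of two. Neither
assumption is stated in this report, and nearest_blocks() is a hypothetical
helper name.

```c
/* Round a byte count to the nearest whole number of blocks. */
long nearest_blocks(long bytes, long block_size)
{
    return (bytes + block_size / 2) / block_size;
}
```

Under those assumptions, nearest_blocks(500L * 1024, 64512) gives 8 and
nearest_blocks(2L * 1024 * 1024, 64512) gives 33, so the two ends of the
500KB to 2MB range differ by roughly a factor of four.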
Conclusions:
1. On Dan's machine, data is always lost at EOM.
2. The amount of data lost appears to be closely
related to what is in the drive buffer (more buffers
are lost if the data is easily compressed).
Possible causes:
1. The hardware does not have an LEOM.
2. The driver is not signaling to the program when an LEOM
occurs, so the buffered data is lost at the PEOM. The
ONLY write() status I got in all the tests was -1 with
errno=ENOSPC (a return of zero bytes written never occurred).
3. Some miscommunication between the hardware and the driver.
3. Some miscommunication between the hardware and the driver.
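To make the distinction in cause 2 concrete, here is a small sketch in C of
how a program might sort write() results near the end of a tape. It assumes
that an early warning (LEOM) would show up as a zero-byte or short write,
while the physical end shows up as -1 with errno=ENOSPC. How a given driver
actually reports LEOM is exactly the open question here, so the mapping is
an assumption, and the enum and classify_write() are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

enum write_outcome {
    WRITE_OK,             /* the full block was accepted */
    WRITE_EARLY_WARNING,  /* zero or short count: assumed LEOM signal */
    WRITE_END_OF_MEDIUM,  /* -1 with ENOSPC: the only status seen in these tests */
    WRITE_OTHER_ERROR     /* -1 with any other errno */
};

/* Classify the return value and errno of one write() of `requested` bytes. */
enum write_outcome classify_write(ssize_t n, int err, size_t requested)
{
    if (n < 0)
        return (err == ENOSPC) ? WRITE_END_OF_MEDIUM : WRITE_OTHER_ERROR;
    if ((size_t)n < requested)
        return WRITE_EARLY_WARNING;   /* includes the zero-byte case */
    return WRITE_OK;
}
```

A program that only ever sees WRITE_END_OF_MEDIUM, as happened in all of
these tests, gets no chance to stop writing before the drive runs out of
tape.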
What next:
- Time for the SCSI guys to look at this. The problem is easily
repeatable on Dan's machine -- just do a whole bunch
of write()s, nothing else, and it is guaranteed
to happen.
Perhaps all the above is not clear enough, in which case,
please ask, but if I write it out with all the reasoning, it will
be a monster essay, so I've tried to give the important test
results so that you can draw your own conclusions and then
compare them to mine.
Best regards,
Kern