VIA Sata Problem maybe?
Paul-Kenji Cahier
deathwolf at gmail.com
Sat Sep 30 15:22:16 PDT 2006
Hello, thanks for the reply.
> You mentioned softupdates. Try disabling softupdates on the file
> system. I've had to do this in the past with a sata raid 1 setup or
> the system would reboot every few days.
I tried that earlier, and it didnt help at all, i got a crash in the minutes after rebooting during fsck.
> Also verify the "speed" of
> the drives. I had a problem with freebsd 5.3 where it would set my
> sata drives as udma33 which caused quite a few problems.
How do i check the speed?(although i think it's not linked to that)
You may
> want to check if there are any updates in stable for the driver you
> are using. 6.2 is close to release and while there are several
> outstanding issues, it may improve your situation.
where can i find a list of the drivers updates? I downloaded the latest HEAD kernel source, but from looking into the GENERIC configuration file, i didnt see anything for a VIA ATA chipset, or is there more than what is in the GENERIC file?
Best Regards,
Paul-Kenji Cahier
>> Hello,
>> I am currently experiencing some technical problems with one of my
>> remote servers running 6.1-RELEASE(#0: Sun May 7 04:32:43 UTC
>> 2006 root at opus.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
>> i386) with the default kernel.
>> Configuration is the following: http://83.149.102.206/lspci.txt
>> The problem is basically the following:
>> Every few days, the server crashes. Sometimes it reboots, sometimes
>> it doesnt. In all cases, there's not a single trace of anything in
>> logs(not even a single word in /var/log/messages).
>> The problem usually worsens after it auto-reboots itself as it runs
>> an fsck at the end of the boot time. That fsck runs, and most of
>> the time it crashes again, causing a reboot or a complete freeze,
>> then another reboot, etc. Once the system freezes, my only option
>> is(since i dont have local access to the server) to remote reboot
>> on a rescue os that allows me to set my rescue freebsd partition as
>> main, and get a rescue freebsd. Once in, i just fsck the file
>> system until it is marked clean and reboot. And everything goes
>> fine...
>> for a few days.
>> Now what i tried was, after an unwanted reboot, to monitor the logs
>> of fsck running to see what made it crash.
>> Here is what i got from tail -f /var/log/messages(ad4s2f is the /
>> usr partition that has most of the data):
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615093 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615094 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615095 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615096 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615097 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615098 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615099 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615100 (16 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=3674361 (12 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=14994399 (4 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=16016109 (6624 should be 320) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=16016110 (12 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=16016111 (4352 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=16016112 (8 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: UNREF FILE I=6
>> OWNER=root MODE=100400
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: SIZE=151445575864
>> MTIME=Sep 29 19:02 2006 (CLEARED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: ALLOCATED FRAGS
>> 431136-431167 MARKED FREE
>> Sep 29 19:31:54 arcueid fsck:
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: UNEXPECTED SOFT UPDATE
>> INCONSISTENCY; RUN fsck MANUALLY.
>> And then the server went more or less haywire and crashed(no other
>> input in /var/log/messages though)
>> I also monitored the fsck process to see how it was running:
>> <19:15:57> root at arcueid:~$ while (true); do echo `date` `ps aux |
>> grep fsck | grep ad4`; sleep 2; done
>> <...>
>> Fri Sep 29 19:29:36 CEST 2006 root 777 0.7 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.07 fsck_ufs: /dev/ad4s2f p2 92% (fsck_ufs)
>> Fri Sep 29 19:29:38 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.09 fsck_ufs: /dev/ad4s2f p2 92% (fsck_ufs)
>> Fri Sep 29 19:29:40 CEST 2006 root 777 0.6 1.8 19092 18612 ?? RN
>> 7:11PM 0:21.12 fsck_ufs: /dev/ad4s2f p2 92% (fsck_ufs)
>> Fri Sep 29 19:29:42 CEST 2006 root 777 0.6 1.8 19092 18612 ?? RN
>> 7:11PM 0:21.14 fsck_ufs: /dev/ad4s2f p2 94% (fsck_ufs)
>> Fri Sep 29 19:29:44 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.17 fsck_ufs: /dev/ad4s2f p2 94% (fsck_ufs)
>> Fri Sep 29 19:29:46 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.19 fsck_ufs: /dev/ad4s2f p2 95% (fsck_ufs)
>> Fri Sep 29 19:29:48 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.21 fsck_ufs: /dev/ad4s2f p2 95% (fsck_ufs)
>> Fri Sep 29 19:29:50 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.25 fsck_ufs: /dev/ad4s2f p2 95% (fsck_ufs)
>> Fri Sep 29 19:29:52 CEST 2006 root 777 0.5 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.29 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs)
>> Fri Sep 29 19:29:54 CEST 2006 root 777 0.8 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.35 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs)
>> Fri Sep 29 19:29:56 CEST 2006 root 777 0.8 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.38 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs)
>> Fri Sep 29 19:29:58 CEST 2006 root 777 0.8 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.40 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs)
>> Fri Sep 29 19:30:00 CEST 2006 root 777 0.7 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.43 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs)
>> Fri Sep 29 19:30:02 CEST 2006 root 777 0.7 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.47 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs)
>> Fri Sep 29 19:30:04 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.49 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs)
>> Fri Sep 29 19:30:06 CEST 2006 root 777 0.5 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.51 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs)
>> Fri Sep 29 19:30:08 CEST 2006 root 777 0.4 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.53 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs)
>> Fri Sep 29 19:30:10 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.57 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs)
>> Fri Sep 29 19:30:12 CEST 2006 root 777 0.7 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.59 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs)
>> Fri Sep 29 19:30:14 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.59 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs)
>> Fri Sep 29 19:30:16 CEST 2006 root 777 0.5 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.61 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs)
>> Fri Sep 29 19:30:18 CEST 2006 root 777 0.4 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.62 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs)
>> Fri Sep 29 19:30:20 CEST 2006 root 777 0.4 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.63 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs)
>> Fri Sep 29 19:30:23 CEST 2006 root 777 0.3 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.64 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs)
>> Fri Sep 29 19:30:25 CEST 2006 root 777 0.2 1.8 19092 18612 ?? DN
>> 7:11PM 0:21.66 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs)
>> Fri Sep 29 19:30:27 CEST 2006 root 777 0.1 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.68 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs)
>> Fri Sep 29 19:30:29 CEST 2006 root 777 0.0 1.8 19092 18612 ?? SN
>> 7:11PM 0:21.70 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs)
>> Fri Sep 29 19:31:12 CEST 2006 root 777 0.0 1.8 19092 18724 ?? RN
>> 7:11PM 0:22.17 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:14 CEST 2006 root 777 0.9 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.34 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:16 CEST 2006 root 777 0.9 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.35 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:18 CEST 2006 root 777 0.7 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.37 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:20 CEST 2006 root 777 0.7 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.40 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:22 CEST 2006 root 777 0.6 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.40 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:25 CEST 2006 root 777 0.5 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.42 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:27 CEST 2006 root 777 0.5 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.45 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:29 CEST 2006 root 777 0.4 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.46 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:31 CEST 2006 root 777 0.4 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.48 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:33 CEST 2006 root 777 0.4 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.50 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:35 CEST 2006 root 777 0.3 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.51 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:37 CEST 2006 root 777 0.2 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.53 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:39 CEST 2006 root 777 0.1 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.55 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:41 CEST 2006 root 777 0.2 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.57 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:43 CEST 2006 root 777 0.1 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.57 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:45 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.58 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:47 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.59 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:49 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.60 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:51 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.61 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:53 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN
>> 7:11PM 0:22.61 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:55 CEST 2006
>> Fri Sep 29 19:31:57 CEST 2006
>> Fri Sep 29 19:31:59 CEST 2006
>> Fri Sep 29 19:32:01 CEST 2006
>> Fri Sep 29 19:32:03 CEST 2006
>> Fri Sep 29 19:32:05 CEST 2006
>> Fri Sep 29 19:32:07 CEST 2006
>> Fri Sep 29 19:32:09 CEST 2006
>> Fri Sep 29 19:32:11 CEST 2006
>> Fri Sep 29 19:32:13 CEST 2006
>> Fri Sep 29 19:32:15 CEST 2006
>> Fri Sep 29 19:32:17 CEST 2006
>> Fri Sep 29 19:32:19 CEST 2006
>> Fri Sep 29 19:32:21 CEST 2006
>> Fri Sep 29 19:32:23 CEST 2006
>> Fri Sep 29 19:32:25 CEST 2006
>> <boom>
>> As you can see, at the same time as softupdate reported a problem,
>> the whole system wouldnt work anymore, process wouldnt display
>> properly anymore either.
>> RAM was extensively tested and is perfectly fine.
>> The power supply unit seems to be running really fine too, with no
>> particular events.
>> The hard drive itself was tested extensively using both smart
>> values and writing/reading from a linux setup and had not a single
>> fault or alarming value.
>> Another quite surprising event is the following:
>> After a crash, i went into the rescue freebsd and as usual started
>> an fsck. Result was:
>> arcueid-rescue# fsck -y /dev/ad4s2f
>> ** /dev/ad4s2f
>> ** Last Mounted on /usr
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> 1362084 files, 47944480 used, 23676944 free (1752392 frags, 2740569
>> blocks, 2.4% fragmentation)
>> ***** FILE SYSTEM MARKED CLEAN *****
>> Just for the fun i ran it a second time just after:
>> arcueid-rescue# fsck -y /dev/ad4s2f
>> ** /dev/ad4s2f
>> ** Last Mounted on /usr
>> ** Phase 1 - Check Blocks and Sizes
>> CANNOT READ BLK: 190083360
>> UNEXPECTED SOFT UPDATE INCONSISTENCY
>> CONTINUE? yes
>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 190083360, 190083361,
>> 190083362, 190083363, 190083364, 190083365, 190083366, 190083367,
>> 190083368, 190083369, 190083370, 190083371, 190083372, 190083373,
>> 190083374, 190083375, 190083376, 190083377, 190083378, 190083379,
>> 190083380, 190083381, 190083382, 190083383, 190083384, 190083385,
>> 190083386, 190083387, 190083388, 190083389, 190083390, 190083391,
>> INCORRECT BLOCK COUNT I=18375172 (640 should be 416)
>> CORRECT? yes
>> CANNOT READ BLK: 293557088
>> UNEXPECTED SOFT UPDATE INCONSISTENCY
>> CONTINUE? yes
>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 293557088, 293557089,
>> 293557090, 293557091, 293557092, 293557093, 293557094, 293557095,
>> 293557096, 293557097, 293557098, 293557099, 293557100, 293557101,
>> 293557102, 293557103, 293557104, 293557105, 293557106, 293557107,
>> 293557108, 293557109, 293557110, 293557111, 293557112, 293557113,
>> 293557114, 293557115, 293557116, 293557117, 293557118, 293557119,
>> CANNOT READ BLK: 190090784
>> UNEXPECTED SOFT UPDATE INCONSISTENCY
>> CONTINUE? yes
>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 190090784, 190090785,
>> 190090786, 190090787, 190090788, 190090789, 190090790, 190090791,
>> 190090792, 190090793, 190090794, 190090795, 190090796, 190090797,
>> 190090798, 190090799, 190090800, 190090801, 190090802, 190090803,
>> 190090804, 190090805, 190090806, 190090807, 190090808, 190090809,
>> 190090810, 190090811, 190090812, 190090813, 190090814, 190090815,
>> INCORRECT BLOCK COUNT I=18375174 (480 should be 416)
>> CORRECT? yes
>> CANNOT READ BLK: 190122624
>> UNEXPECTED SOFT UPDATE INCONSISTENCY
>> CONTINUE? yes
>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 190122624, 190122625,
>> 190122626, 190122627, 190122628, 190122629, 190122630, 190122631,
>> 190122632, 190122633, 190122634, 190122635, 190122636, 190122637,
>> 190122638, 190122639, 190122640, 190122641, 190122642, 190122643,
>> 190122644, 190122645, 190122646, 190122647, 190122648, 190122649,
>> 190122650, 190122651, 190122652, 190122653, 190122654, 190122655,
>> INCORRECT BLOCK COUNT I=18375178 (512 should be 416)
>> CORRECT? yes
>> CANNOT READ BLK: 190422528
>> UNEXPECTED SOFT UPDATE INCONSISTENCY
>> CONTINUE? yes
>> [etc]
>> And then it crashed....
>> Again i did the sata tests numerous times and the drive seems
>> perfectly healthy... So i have to think it might be driver related
>> rather than hardware related.
>> Any help would be appreciated,
>> --
>> Best regards,
>> Paul-Kenji
>> _______________________________________________
>> freebsd-hackers at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>> To unsubscribe, send any mail to "freebsd-hackers-
>> unsubscribe at freebsd.org"
> Lucas Holt
> Luke at FoolishGames.com
> ________________________________________________________
> FoolishGames.com (Jewel Fan Site)
> JustJournal.com (Free blogging)
> FoolishGames.net (Enemy Territory site)
More information about the freebsd-hackers
mailing list