amd64/153351: [zfs] locking directories/files in ZFS

Vladislav V. Prodan universite at ukr.net
Tue Dec 21 19:20:08 UTC 2010


>Number:         153351
>Category:       amd64
>Synopsis:       [zfs] locking directories/files in ZFS
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-amd64
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Dec 21 19:20:08 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Vladislav V. Prodan
>Release:        9.0-CURRENT  amd64
>Organization:
>Environment:
FreeBSD second.site.com 9.0-CURRENT FreeBSD 9.0-CURRENT #0: Fri Sep 17 02:33:22 EEST 2010     root at second.site.com:/usr/obj/usr/src/sys/second.4  amd64

>Description:

When a daemon or other process accesses certain files on the disk, in this case under /www/site/data/www/site.ua/project/www.site.ua/files/, the process hangs, and kill <pid> does not help.
It seems that ZFS should identify any bad blocks, but this does not happen.


# ls /www/site/data/www/site.ua/project/www.site.ua/files
and the session hangs...


# lsof | grep /www/site/data/www/site.ua/project/www.site.ua/
httpd 2720 www txt VREG 146,2806775880 18446742975789345120 1148769 /www/site/data/www/site.ua/project/www.site.ua/files/catalog.files/100-516070727.jpg
httpd 2720 www 57r VREG 146,2806775880 18446742975789345120 1148769 /www/site/data/www/site.ua/project/www.site.ua/files/catalog.files/100-516070727.jpg
mc 3067 root cwd VDIR 146,2806775880 18446742975782687200 1148593 /www/site/data/www/site.ua/project/www.site.ua/files
httpd 3068 www txt VREG 146,2806775880 18446742975782685024 1148707 /www/site/data/www/site.ua/project/www.site.ua/files/catalog.files/100-2d0e4312d8f09c72ede95fccdc012b02.jpg
httpd 3068 www 57r VREG 146,2806775880 18446742975782685024 1148707 /www/site/data/www/site.ua/project/www.site.ua/files/catalog.files/100-2d0e4312d8f09c72ede95fccdc012b02.jpg
csh 3069 root cwd VDIR 146,2806775880 18446742975782687200 1148593 /www/site/data/www/site.ua/project/www.site.ua/files
mc 8807 root cwd VDIR 146,2806775880 18446742975782622624 1148609 /www/site/data/www/site.ua/project/www.site.ua/files/catalog.files
csh 8808 root cwd VDIR 146,2806775880 18446742975782622624 1148609 /www/site/data/www/site.ua/project/www.site.ua/files/catalog.files

Sending SIGKILL does not help:
# kill -9 2720 3067 3068 3069

The only thing that helps is rebooting the machine with the reset button on the system unit.
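
Processes that survive kill -9 are usually blocked inside the kernel, so the wait channel and kernel stack of each one may show where they are stuck. The following is a minimal sketch, not output from this system: it assumes FreeBSD's ps(1) and procstat(1) are available, and collect_hang_info is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: gather state for processes that do not die after SIGKILL.
# Assumes FreeBSD ps(1) and procstat(1); collect_hang_info is a hypothetical name.

collect_hang_info() {
    for pid in "$@"; do
        echo "== pid $pid =="
        # A STAT containing 'D' means an uninterruptible wait; WCHAN names
        # the kernel wait channel the process is blocked on.
        ps -o pid,stat,wchan,command -p "$pid" 2>/dev/null ||
            echo "pid $pid: ps reported nothing"
        # Kernel stack of every thread; skipped where procstat is absent.
        if command -v procstat >/dev/null 2>&1; then
            procstat -kk "$pid"
        fi
    done
}

# Usage with the PIDs from the lsof output above:
#   collect_hang_info 2720 3067 3068 3069
```

If the stacks end in ZFS functions, that output would probably be the most useful thing to attach to this report.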

# df -h
Filesystem           Size    Used   Avail Capacity  Mounted on
tank                 1.0T    808M    1.0T     0%    /
devfs                1.0K    1.0K      0B   100%    /dev
tank/backup          1.2T    170G    1.0T    14%    /backup
tank/tmp             1.0T     26M    1.0T     0%    /tmp
tank/usr             1.0T    3.4G    1.0T     0%    /usr
tank/usr/home        1.0T     30K    1.0T     0%    /usr/home
tank/usr/ports       1.0T    2.2G    1.0T     0%    /usr/ports
tank/usr/src         1.0T    333M    1.0T     0%    /usr/src
tank/var             1.0T    1.8M    1.0T     0%    /var
tank/var/crash       1.0T     22K    1.0T     0%    /var/crash
tank/var/db          1.0T     61M    1.0T     0%    /var/db
tank/mysql           1.0T    294M    1.0T     0%    /var/db/mysql
tank/mysql/ibdata    1.0T    3.2G    1.0T     0%    /var/db/mysql/ibdata
tank/mysql/iblogs    1.0T     10M    1.0T     0%    /var/db/mysql/iblogs
tank/var/log         1.0T    100M    1.0T     0%    /var/log
tank/var/mail        1.0T    538K    1.0T     0%    /var/mail
tank/var/run         1.0T     83K    1.0T     0%    /var/run
tank/var/tmp         1.0T     13M    1.0T     0%    /var/tmp
tank/www             1.1T     21G    1.0T     2%    /www
devfs                1.0K    1.0K      0B   100%    /var/named/dev


ZFS filesystem version 4
ZFS pool version 15

tank
    version=15
    name='tank'
    state=0
    txg=2790
    pool_guid=15415411259239146062
    hostid=814717323
    hostname='second.site.com'
    vdev_tree
        type='root'
        id=0
        guid=15415411259239146062
        children[0]
                type='mirror'
                id=0
                guid=16020562126957161505
                whole_disk=0
                metaslab_array=23
                metaslab_shift=33
                ashift=9
                asize=1483117166592
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=11217068100198816386
                        path='/dev/gpt/disk0'
                        whole_disk=0
                        DTL=120
                children[1]
                        type='disk'
                        id=1
                        guid=4665162630340381592
                        path='/dev/gpt/disk1'
                        whole_disk=0
                        DTL=118
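
The configuration above is a two-way mirror, so a scrub is one way to check whether ZFS itself sees damaged blocks: it reads every allocated block and verifies its checksum against both disks. The following is a minimal sketch under the assumption that zpool(8) is installed and the pool can still be accessed; scrub_and_report is a hypothetical helper name, and no scrub results are available for this report.

```shell
#!/bin/sh
# Sketch only: ask ZFS to re-read and verify every block in a pool.
# Assumes zpool(8) is installed; scrub_and_report is a hypothetical name.

scrub_and_report() {
    pool=$1
    if ! command -v zpool >/dev/null 2>&1; then
        echo "zpool not found; nothing done for $pool"
        return 0
    fi
    # Start a scrub; it runs in the background inside the kernel.
    zpool scrub "$pool"
    # Show progress plus per-device READ/WRITE/CKSUM counters; -v also
    # lists files with unrecoverable errors, if there are any.
    zpool status -v "$pool"
}

# Usage for the pool in this report:
#   scrub_and_report tank
```

Running zpool status -v again after the scrub finishes shows the final error counters.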

# gpart show
=>        34  2930277101  ad4  GPT  (1.4T)
          34         128    1  freebsd-boot  (64K)
         162    33554432    2  freebsd-swap  (16G)
    33554594  2896722541    3  freebsd-zfs  (1.3T)

=>        34  2930277101  ad8  GPT  (1.4T)
          34         128    1  freebsd-boot  (64K)
         162    33554432    2  freebsd-swap  (16G)
    33554594  2896722541    3  freebsd-zfs  (1.3T)

# smartctl -a /dev/ad4
smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 9.0-CURRENT amd64] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11 family
Device Model:     ST31500341AS
Serial Number:    9VS2B140
Firmware Version: CC1H
User Capacity:    1 500 301 910 016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Dec 21 21:01:28 2010 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 609) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always       -       93576265
  3 Spin_Up_Time            0x0003   100   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       44
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       274870085
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       8230
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       60
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   090   000    Old_age   Always       -       50
189 High_Fly_Writes         0x003a   073   073   000    Old_age   Always       -       27
190 Airflow_Temperature_Cel 0x0022   070   046   045    Old_age   Always       -       30 (Lifetime Min/Max 28/44)
194 Temperature_Celsius     0x0022   030   054   000    Old_age   Always       -       30 (0 20 0 0)
195 Hardware_ECC_Recovered  0x001a   049   017   000    Old_age   Always       -       93576265
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       4359391810201
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       367084491
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1582503175

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


# smartctl -a /dev/ad8
smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 9.0-CURRENT amd64] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11 family
Device Model:     ST31500341AS
Serial Number:    9VS2B0HA
Firmware Version: CC1H
User Capacity:    1 500 301 910 016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Dec 21 21:01:07 2010 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 617) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   112   099   006    Pre-fail  Always       -       44718699
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       41
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       265723778
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       8197
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       58
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       7
189 High_Fly_Writes         0x003a   080   080   000    Old_age   Always       -       20
190 Airflow_Temperature_Cel 0x0022   070   047   045    Old_age   Always       -       30 (Lifetime Min/Max 27/44)
194 Temperature_Celsius     0x0022   030   053   000    Old_age   Always       -       30 (0 19 0 0)
195 Hardware_ECC_Recovered  0x001a   053   014   000    Old_age   Always       -       44718699
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       70901320127123
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3213332020
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1673372002

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
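
Neither disk has a self-test in its log, so the attributes above only reflect errors the drives happened to hit during normal use. An extended self-test reads the whole surface; according to the output above it takes about 255 minutes per drive. The following is a minimal sketch, assuming smartctl(8) from smartmontools; start_long_selftest is a hypothetical helper name, and no self-test results are available for this report.

```shell
#!/bin/sh
# Sketch only: start an extended SMART self-test and show the self-test log.
# Assumes smartctl(8) from smartmontools; start_long_selftest is a hypothetical name.

start_long_selftest() {
    if ! command -v smartctl >/dev/null 2>&1; then
        echo "smartctl not found; nothing done"
        return 0
    fi
    for dev in "$@"; do
        echo "== $dev =="
        # The test runs inside the drive; smartctl returns immediately.
        smartctl -t long "$dev"
        # Print the self-test log; the result appears once the test ends.
        smartctl -l selftest "$dev"
    done
}

# Usage for the two mirror members in this report:
#   start_long_selftest /dev/ad4 /dev/ad8
```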


>How-To-Repeat:

>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:

