ZFS replace "device is too small" even though drives are the same size?

Hello,
I have a ZFS pool with a single raidz1 vdev:

config:

        NAME        STATE     READ WRITE CKSUM
        zfs         ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdf1    ONLINE       0     0     0
            sdi1    ONLINE       0     0     0
            sdg1    ONLINE       0     0     0
            sdj1    ONLINE       0     0     0

Recently a couple of the drives in that vdev (sdg and sdj, same model) have been giving me some read seek errors. While I evaluate how much of an issue those errors are, I want to swap one of them out for a spare 8TB drive I have (sdh) from a different manufacturer, since my vdev can only tolerate one drive failure.

However, when I run the replace command, I get an error that the new drive is too small, despite also being an 8TB drive:

root@Tower:~# zpool replace zfs sdg1 /dev/sdh
cannot replace sdg1 with /dev/sdh: device is too small

I ran fdisk -l to check the exact byte size of each disk, and they are identical:

Disk /dev/sdg: 7.28 TiB, 8001563222016 bytes, 15628053168 sectors
Disk /dev/sdh: 7.28 TiB, 8001563222016 bytes, 1953506646 sectors

Where am I going wrong here?

What’s your ashift? If it is 9, you’ll have problems mixing drives with different sector sizes.
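If you’re not sure, zpool get ashift zfs (with your pool name) should print it.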

Also, sdh has a different sector size.
I’d guess the new disk has a 4096-byte sector size and the old ones have 512 bytes per sector.
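On Linux, something like this should confirm it (the LOG-SEC/PHY-SEC columns assume a reasonably recent util-linux lsblk):

lsblk -o NAME,SIZE,LOG-SEC,PHY-SEC /dev/sdg /dev/sdh

A 512e drive should report 512 logical / 4096 physical, while a 4Kn drive reports 4096 for both.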

This is why you should consider padding your drives, unless you’re an enterprise business that can stockpile or source the exact same model for years.

15628053168/1953506646 = 8 

I think you’re right on this one. The capacities are identical, so the sector-size difference is probably what ZFS is complaining about.

Usually this happens when you don’t use a padding partition, as @diizzy suggested, and mix manufacturers, where a 4 KiB difference can disqualify a disk because its capacity is not identical (==). But I’ve never mixed 512b and 4k drives at the same time, so I’ve never seen this problem myself.
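For reference, “padding” here just means partitioning the disk a bit short of full capacity yourself and giving ZFS the partition. A rough sketch with sgdisk (the size and the BF01 type code are only an example, and this is destructive, so double-check the device first):

sgdisk -n 1:0:-100M -t 1:BF01 /dev/sdX

That leaves ~100 MiB unused at the end, so a slightly smaller replacement disk still fits, and the vdev is built from /dev/sdX1 instead of the whole disk.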

Out of curiosity, does smartctl -a device-name-goes-here show the same data between two “identical” drives?

There’s also a nifty command in FreeBSD called diskinfo, which can show useful information:

# diskinfo -v ada0
ada0
        512             # sectorsize
        8001563222016   # mediasize in bytes (7.3T)
        15628053168     # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        15504021        # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        TOSHIBA HDWN180 # Disk descr.
        REDACTED        # Disk ident.
        ahcich6         # Attachment
        No              # TRIM/UNMAP support
        7200            # Rotation rate in RPM
        Not_Zoned       # Zone Mode
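If you’re on Linux instead, blockdev can pull out the same basics (run it as root against each disk; /dev/sdX is a placeholder):

blockdev --getss --getpbsz --getsize64 /dev/sdX

which prints the logical sector size, physical sector size, and size in bytes.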

Intel+Dell chipset RAID, for instance, did let me initialise the array, but after every reboot it would be broken. So I guess ZFS’s error message is unhelpful, but it will probably protect your data in the long run.

I guess, if you have the space in your case and the money, this is a good moment to upgrade to bigger disks?

@Exard3k This limitation is due to how parity is handled, not just in ZFS but also in hardware RAID controllers: you cannot have drives with different sector arrangements, because the distribution of data and the parity calculations will not work correctly. So you cannot mix 512b and 4k drives in the same pool/array/virtual disk.

ZFS handles parity information at the recordsize level as opposed to the sector-size level. As long as the ashift value allows it, mixing HDDs with different sector sizes within a pool can be done in ZFS; hardware RAID doesn’t allow this, and even in ZFS it probably isn’t best practice for maximum performance.

Likely what has happened to OP is that they have their ashift set to 9 (512b blocks), which is making sdh look like a 1TB drive, since only 1/8th of each 4096-byte sector can be used.
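Rough math on that: 1,953,506,646 sectors × 512 bytes ≈ 1.0 TB, whereas 1,953,506,646 × 4096 bytes ≈ 8.0 TB, so that would line up.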

An ashift of 12 (4 KiB) should allow 512b and 4 KiB sector drives to coexist in peace, but the pool will need to be recreated to change the ashift.

Pretty sure I’ve got mixed 512b, 4Kn, and 512e drives. Just use ashift=12 when creating the pool/vdev?

(I know each vdev in a pool can have a different ashift, but I try to keep to 12 whenever adding a vdev or creating a pool. Unless it’s an SSD, then 13. HDDs are okay at ashift=13 too… iirc.)
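For what it’s worth, forcing it at creation time looks like this (hypothetical pool and device names):

zpool create -o ashift=12 tank raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdd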

Ashift is 12 on this pool

root@Tower:~# zpool get ashift
NAME  PROPERTY  VALUE   SOURCE
zfs   ashift    12      local

The only difference I see (besides the scary SMART warnings) is that the disk currently in the pool (sdg) says it has 512-byte logical sectors and 4096-byte physical sectors, whereas the new drive is 4K for both.

sdg (drive currently in pool, possibly failing):

root@Tower:~# smartctl -a /dev/sdg
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.49-Unraid] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba N300/MN NAS HDD
Device Model:     TOSHIBA HDWG180
Serial Number:    X040A0CEFBEG
LU WWN Device Id: 5 000039 a78c888c3
Firmware Version: 0603
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5533
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 16 16:29:13 2023 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  73) The previous self-test completed having
                                        a test element that failed and the test
                                        element that failed is not known.
Total time to complete Offline 
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 842) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       8074
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       43
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   040   001   050    Pre-fail  Always   FAILING_NOW 0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   038   038   000    Old_age   Always       -       24904
 10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       42
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       7
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       19
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       506
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       36 (Min/Max 19/47)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       1441804
222 Loaded_Hours            0x0032   039   039   000    Old_age   Always       -       24509
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       538
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure    90%     24874         0
# 2  Extended offline    Completed: unknown failure    90%     24874         0
# 3  Extended offline    Completed: unknown failure    90%     24874         0
# 4  Short offline       Completed: unknown failure    90%     24863         0

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

sdh (new drive):

root@Tower:~# smartctl -a /dev/sdh
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.49-Unraid] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Enterprise Capacity 3.5 HDD
Device Model:     ST8000NM0045-1RL112
Serial Number:    ZA1JWGLZ
LU WWN Device Id: 5 000c50 0c6f99f96
Firmware Version: UG07
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Size:      4096 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5533
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 16 16:29:53 2023 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 762) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   076   064   044    Pre-fail  Always       -       4574308
  3 Spin_Up_Time            0x0003   090   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   092   061   045    Pre-fail  Always       -       1473853670
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       3007 (229 249 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       11
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   060   056   040    Old_age   Always       -       40 (Min/Max 39/43)
191 G-Sense_Error_Rate      0x0032   099   099   000    Old_age   Always       -       2428
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       83
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       130
194 Temperature_Celsius     0x0022   040   044   000    Old_age   Always       -       40 (0 26 0 0 0)
195 Hardware_ECC_Recovered  0x001a   006   001   000    Old_age   Always       -       4574308
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x00b3   100   100   099    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2993h+50m+59.202s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       5509984416
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       44512512145

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Unfortunately, I don’t have a super easy way of running that (Slackware Linux/unRAID). I could give it a go if we really need the info from it, though.

What’s puzzling to me is that I’m not sure how much I would actually need to pad the drives by. The drives are exactly the same size; ZFS isn’t telling me just how undersized it thinks the new disk is.

No hardware RAID here, just an LSI HBA flashed to IT mode for these particular drives.

From my understanding, ashift=12 means a block size of 4096, which would line up with the 4096-byte sectors on my new disk. Very strange. I’m still pretty new to ZFS, so here are the rest of my pool properties in case there’s something else I’ve messed up:

root@Tower:~# zpool get all
NAME  PROPERTY                       VALUE                          SOURCE
zfs   size                           29.1T                          -
zfs   capacity                       75%                            -
zfs   altroot                        -                              default
zfs   health                         ONLINE                         -
zfs   guid                           17062760427266059542           -
zfs   version                        -                              default
zfs   bootfs                         -                              default
zfs   delegation                     on                             default
zfs   autoreplace                    off                            default
zfs   cachefile                      -                              default
zfs   failmode                       wait                           default
zfs   listsnapshots                  off                            default
zfs   autoexpand                     on                             local
zfs   dedupratio                     1.00x                          -
zfs   free                           7.18T                          -
zfs   allocated                      21.9T                          -
zfs   readonly                       off                            -
zfs   ashift                         12                             local
zfs   comment                        -                              default
zfs   expandsize                     -                              -
zfs   freeing                        0                              -
zfs   fragmentation                  1%                             -
zfs   leaked                         0                              -
zfs   multihost                      off                            default
zfs   checkpoint                     -                              -
zfs   load_guid                      1792945199749377827            -
zfs   autotrim                       off                            default
zfs   compatibility                  off                            default
zfs   feature@async_destroy          enabled                        local
zfs   feature@empty_bpobj            active                         local
zfs   feature@lz4_compress           active                         local
zfs   feature@multi_vdev_crash_dump  enabled                        local
zfs   feature@spacemap_histogram     active                         local
zfs   feature@enabled_txg            active                         local
zfs   feature@hole_birth             active                         local
zfs   feature@extensible_dataset     active                         local
zfs   feature@embedded_data          active                         local
zfs   feature@bookmarks              enabled                        local
zfs   feature@filesystem_limits      enabled                        local
zfs   feature@large_blocks           enabled                        local
zfs   feature@large_dnode            active                         local
zfs   feature@sha512                 enabled                        local
zfs   feature@skein                  enabled                        local
zfs   feature@edonr                  enabled                        local
zfs   feature@userobj_accounting     active                         local
zfs   feature@encryption             enabled                        local
zfs   feature@project_quota          active                         local
zfs   feature@device_removal         enabled                        local
zfs   feature@obsolete_counts        enabled                        local
zfs   feature@zpool_checkpoint       enabled                        local
zfs   feature@spacemap_v2            active                         local
zfs   feature@allocation_classes     enabled                        local
zfs   feature@resilver_defer         enabled                        local
zfs   feature@bookmark_v2            enabled                        local
zfs   feature@redaction_bookmarks    enabled                        local
zfs   feature@redacted_datasets      enabled                        local
zfs   feature@bookmark_written       enabled                        local
zfs   feature@log_spacemap           active                         local
zfs   feature@livelist               enabled                        local
zfs   feature@device_rebuild         enabled                        local
zfs   feature@zstd_compress          enabled                        local
zfs   feature@draid                  enabled                        local

I was not expecting that. What does the command zdb | egrep 'ashift|vdev|type' | grep -v disk yield?

This is the output when I run that:

root@Tower:~# zdb | egrep 'ashift|vdev|type' | grep -v disk
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
            type: 'raidz'
            ashift: 12
            com.delphix:vdev_zap_top: 129
                com.delphix:vdev_zap_leaf: 130
                com.delphix:vdev_zap_leaf: 131
                com.delphix:vdev_zap_leaf: 132
                com.delphix:vdev_zap_leaf: 133

What if one did a dd copy of an existing member drive to the new disk, then swapped the drives?

The UUID would be copied across, along with all the partition labels, so the system should assemble the array with the clone of the drive?

Obviously, if both drives with the same UUID are connected, the system will grab one pretty much at random and use that, which might cause issues if running with both of them in.
Probably best done with the pool exported…
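Something along these lines, purely as a sketch (triple-check the device letters first, since dd will happily overwrite the wrong disk):

dd if=/dev/sdX of=/dev/sdY bs=1M conv=noerror,sync status=progress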

hmmm… ashift doesn’t seem to be the problem.

Perhaps the error message can be taken at face value: a 4Kn disk the same size as a 512e disk isn’t big enough for ZFS due to the “wasted” space that occurs on 4Kn disks when storing structures that are less than 3.75kb.
It’d take someone with a deeper understanding than me of how ZFS is structured on disk (maybe the DSL produces a lot of small distinct structures) to confirm that, though.

Was a solution to this found? I seem to have run into the same problem!

Execute diskinfo -v on the new replacement drive and a current one in the same array/mirror and post the output.

No solution so far. I still have these two disks occasionally kicking out seek errors, but I honestly haven’t had the time to dive deep into it with university taking up so much time.

As mentioned above, I’m running Linux (unRAID), so unfortunately I don’t have diskinfo available to me.

In that case I guess I would need to buy a new drive; would I have to get something with 512-byte logical sectors?

Ohh, I made a typo; that should be 3.75KB.
Yes, a drive with a 512-byte logical sector size if you want to replace the drive in the existing pool (assuming you stay at 8TB drives)… however, if you recreated the pool from scratch it should accept the 4Kn drive just fine.

So I found myself in this situation, and the reason is that zpool/zfs is stupid.

Basically, when given a whole disk, ZFS starts the first partition at sector 2048 and reserves a 16384-sector partition at the end. But it does not divide these values by 8 when given a 4K-native disk, so the actual ZFS partition ends up (2048+16384)*(4096-512) ≈ 63 MB smaller than it would be on a 512-byte-sector disk of the same capacity. If I manually hand it a partition instead of a whole disk, ZFS accepts that as a replacement for the failed 512-byte disk. I don’t have the whole_disk flag set on this disk as a result, but at least I have redundancy again.
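For anyone who hits this later, the workaround is roughly this shape (the exact sgdisk flags and the BF01 type code are just what I’d use, and the first command wipes the disk, so double-check the target):

sgdisk -Z /dev/sdh
sgdisk -n 1:0:0 -t 1:BF01 /dev/sdh
partprobe /dev/sdh
zpool replace zfs sdg1 /dev/sdh1

i.e. create a partition yourself that spans (nearly) the whole disk, then hand zpool replace the partition rather than the bare disk.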
