Sysadmin Mega Thread

I have a book, but honestly I don’t end up using it much. For either tool, you need a pretty decent networking background to understand what you’re looking at. It’s literally raw packets…

you mean frames? :troll:

I don’t…

I’m just being facetious. Plz ignore

I have this. She has some of it. I’m trying to explain some of the more advanced stuff.

She is familiar with tcpdump, like I am.

tcpdump is tits, I love it. Wireshark is cool too, but tcpdump is usually instrumental when I’m creating new firewall rules.

I mean, I’m open to any resources you might have.

Are you just troubleshooting the firewall? If so, then just watch the packets and note when they’re blocked (or not blocked). If you want to get really in the weeds with the firewall rules, I highly recommend adding one at a time and testing them as you go. If you write up a dozen rules and then things don’t work, it’s harder to figure out what’s wrong.

If you’re troubleshooting something in the application layer, then that’s more complicated and you’ll have to familiarize yourself with whatever protocol(s) are in play…

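Side note: if you’re serving FTP, the firewall rules get weird because the protocol is so ancient. It uses a separate data connection on a negotiated port, so a single port rule is not enough.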

I’m afraid this is the information I’m going to find out tonight during the screenshare via Zoom.

Engineering consultations often go this way

This just happens to be in an area I’m only starting to get deep into. I figured I’d help her with it though. Otherwise I’d have to show up on their corporate campus and go into their labs to do the work and understand what I’m working with. For most people that would require a TS/SCI clearance (given which facility it is), but fortunately I have one of those LOL

@Dynamic_Gravity @oO.o @ThatGuyB The small off-the-books consultation went well. Without revealing too much info, it was essentially a software-defined SATCOM unit, and their smart switch had iptables embedded. They fucked up the packet flow and a few other things, and didn’t understand how it was actually supposed to work. Not too difficult of a fix. Great way to establish some contacts. Not gonna go in deeper given its sensitivity.

Taught a good few of them how the packets flow. This particular thing was instrumental.

Man, aren’t diagrams everything? A picture says a thousand words. Good thing I brought that with me on the USB drive, because the lab has no internet, just an intranet.

It was a RHEL-based system too, so I was right at home.

It was a fun time though. They didn’t know you could see the counters and TRACE log output through iptables (once rules are set up for packet tracing). (firewalld is a lot better at this.) But it was fun.
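For anyone who wants to look at those per-rule counters outside the box itself, here is a minimal Python sketch. It assumes the rules were dumped with `iptables-save -c`, which prefixes each rule with `[packets:bytes]`. The function names and the sample ruleset are hypothetical, made up for illustration, and have nothing to do with the system discussed above.

```python
import re
from typing import List, NamedTuple


class RuleCounter(NamedTuple):
    chain: str
    packets: int
    byte_count: int
    rule: str


# Rule lines from `iptables-save -c` look like:
#   [120:9600] -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
RULE_RE = re.compile(r"^\[(\d+):(\d+)\]\s+(-A\s+(\S+).*)$")

# Hypothetical sample dump, for illustration only.
SAMPLE_DUMP = """\
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
[120:9600] -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
[0:0] -A INPUT -p tcp -m tcp --dport 443 -j ACCEPT
COMMIT
"""


def parse_rule_counters(dump: str) -> List[RuleCounter]:
    """Extract chain, packet count, byte count and rule text from a dump."""
    rules = []
    for line in dump.splitlines():
        match = RULE_RE.match(line.strip())
        if match:
            packets, byte_count, rule, chain = match.groups()
            rules.append(RuleCounter(chain, int(packets), int(byte_count), rule))
    return rules


def never_hit(rules: List[RuleCounter]) -> List[RuleCounter]:
    """Return rules whose packet counter is still zero."""
    return [r for r in rules if r.packets == 0]


if __name__ == "__main__":
    for r in never_hit(parse_rule_counters(SAMPLE_DUMP)):
        print("no traffic matched:", r.rule)
```

A rule that stays at zero packets while test traffic is flowing is a decent hint that the traffic is being matched earlier in the chain, or is not reaching that chain at all.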

Thanks for the help :+1: all the RHEL documentation helped too. Thanks for making me search for it

Find a neckbeard? I will admit that my iptables-fu is very dusty, but even back then, a lot of it was still magic to me.

Can you fine folks suggest something for power monitoring? I need something cheap like a Kill-a-Watt, but one that can connect over USB or LAN so a server can monitor it.

I already have a “PDU” which is just a glorified power strip, and while a PDU that can monitor power is nice to have, I’d rather not buy anything expensive right now (sub 100 USD budget, including shipping). Any suggestions? The power grid is 220V, in SE Asia.

Cheap smart plug

How about a washed-up old smart UPS?

Looks like this PDU with SNMP card is just within your budget, too:

WTH, brand new HDDs, a single power cycle. 10 TB IronWolf drives:
sda

#smartctl -a /dev/sda1
smartctl 7.1 2019-12-30 r5022 [aarch64-linux-4.9.277-83] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST10000VN0008-2PJ103
Serial Number:    
LU WWN Device Id: 
Firmware Version: SC61
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr  3 17:00:15 2022 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 921) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   044    Pre-fail  Always       -       2383388
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   045    Pre-fail  Always       -       274512
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       1
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   055   044   040    Old_age   Always       -       45 (Min/Max 22/46)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   045   046   000    Old_age   Always       -       45 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       0 (214 18 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2356358
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       27030

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -
# 2  Extended offline    Aborted by host               90%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



-----


#smartctl -a  -v 7,raw48:54 /dev/sda1
## I read somewhere that the above command shows the actual details of the drive
smartctl 7.1 2019-12-30 r5022 [aarch64-linux-4.9.277-83] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST10000VN0008-2PJ103
Serial Number:    
LU WWN Device Id: 
Firmware Version: SC61
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr  3 17:00:22 2022 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 921) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   044    Pre-fail  Always       -       2383388
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   045    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       1
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   055   044   040    Old_age   Always       -       45 (Min/Max 22/46)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   045   046   000    Old_age   Always       -       45 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       0 (241 51 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2356358
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       27030

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -
# 2  Extended offline    Aborted by host               90%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

sdb

#smartctl -a /dev/sdb1
smartctl 7.1 2019-12-30 r5022 [aarch64-linux-4.9.277-83] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST10000VN0008-2PJ103
Serial Number:    
LU WWN Device Id: 
Firmware Version: SC61
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr  3 17:04:53 2022 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 962) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   044    Pre-fail  Always       -       2379053
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   045    Pre-fail  Always       -       274168
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       1
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   055   050   040    Old_age   Always       -       45 (Min/Max 23/47)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   045   047   000    Old_age   Always       -       45 (0 23 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       0 (127 33 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2356350
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       22703

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -
# 2  Extended offline    Aborted by host               90%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl 7.1 2019-12-30 r5022 [aarch64-linux-4.9.277-83] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST10000VN0008-2PJ103
Serial Number:    
LU WWN Device Id: 
Firmware Version: SC61
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr  3 17:06:48 2022 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 962) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   044    Pre-fail  Always       -       2379053
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   045    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       1
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   055   050   040    Old_age   Always       -       45 (Min/Max 23/47)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   045   047   000    Old_age   Always       -       45 (0 23 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       0 (141 194 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2356350
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       22703

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -
# 2  Extended offline    Aborted by host               90%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The drives sound like they are scratching the platters with a dull knife when they spin up. I may not have touched a spinning rust drive in more than half a year, but I’m certain they shouldn’t sound like that. They remind me of an old, bad Maxtor 40 GB IDE HDD; that’s how bad these new drives sound.

I’ll run a long test on them again, but wow, those raw read errors are horrible for brand new drives with not even 1h of running time.

A long test will make me wait 16 hours for it to complete. I may not like this, but I may need to spend double on flash storage and end up with 2 TB less. I don’t mind some slower QVO drives or something. I do intend to run them 24/7, even though I won’t be writing to them too often. Also, with flash storage I would have to worry less about the reliability of the drives, and since they’d always be powered on, losing electrons is not something to worry about.

I don’t know what to think about this. Does anyone want to help me interpret the SMART data? My older 2 TB drives do have raw read errors in the 200M range, but they are more than 10 years old. These are brand new and have 2M read errors.

Should I return those and buy flash storage?

Where are you seeing that?

My bad, there’s a scroll bar…

Run badblocks on it?

You can’t go by the raw values for either the read or seek error rates on Seagate drives. The raw values are 48-bit numbers that encode both the error count and the total number of operations.

Values should be near zero if drive health is good with this command:

sudo smartctl -a -v 1,raw48:54 -v 7,raw48:54 /dev/sda
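To make the 48-bit packing a bit more concrete, here is a rough Python sketch. It assumes the commonly described layout for these Seagate attributes, with the error count in the upper 16 bits and the operation count in the lower 32 bits. That layout is an assumption on my part, not something Seagate documents publicly, and `split_seagate_raw` is a made-up helper name.

```python
from typing import Tuple


def split_seagate_raw(raw: int) -> Tuple[int, int]:
    """Split a 48-bit raw value into (errors, operations).

    Assumed layout (not vendor-documented): upper 16 bits are the error
    count, lower 32 bits are the number of operations performed.
    """
    raw &= (1 << 48) - 1          # keep only the 48 bits smartctl reports
    errors = raw >> 32            # upper 16 bits
    operations = raw & 0xFFFFFFFF  # lower 32 bits
    return errors, operations


if __name__ == "__main__":
    # Raw values for attributes 1 and 7 on sda, as posted earlier in the thread.
    for name, raw in (("Raw_Read_Error_Rate", 2383388), ("Seek_Error_Rate", 274512)):
        errors, operations = split_seagate_raw(raw)
        print(f"{name}: errors={errors} operations={operations}")
```

Under that assumption, both of the scary-looking numbers from sda decode to zero errors, which matches what the `raw48:54` view shows.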

I haven’t used any of the IronWolf NAS drives, but the enterprise ones are insanely loud. The first time I heard them I swore the whole batch was defective, but nope… all was perfectly fine.

I didn’t know I could pass -v twice in the same command. Nice!
sda

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   044    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   045    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       1
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   050   044   040    Old_age   Always       -       50 (Min/Max 22/50)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       5
194 Temperature_Celsius     0x0022   050   050   000    Old_age   Always       -       50 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2 (179 79 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2356358
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       36726

sdb

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   044    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   045    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       1
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   049   049   040    Old_age   Always       -       51 (Min/Max 23/51)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       4
194 Temperature_Celsius     0x0022   051   051   000    Old_age   Always       -       51 (0 23 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2 (92 186 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2356350
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       22703

This looks more like what I was looking for.

I have used both IronWolfs and Exos and they were a bit loud, but never sounded “crunchy” / “scratchy,” if you understand what I mean. It seems like after 1h, they got more stable somehow. Maybe the oil (or whatever lubricant is used) on the bearings was too thick before the HDDs warmed up a bit? I did leave them inside for 24h before plugging them in.

I’ll run a stress test on them after the long SMART test finishes, then another SMART test. It might take 4 days for this to finish, but I would rather make sure those things work properly than have them fail after 3-9 months, as has happened to me in the past with other HDDs. I didn’t know about badblocks, but it’s something I’ve been looking for. I was about to just write random data to the HDDs and look for any reallocated sectors, but I didn’t think of checking the data written to them as well.
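The write-then-read-back idea can be sketched in a few lines of Python. This is only an illustration of the pattern against an ordinary file, not a substitute for badblocks on a raw device: the function names are made up, the `0xaa` fill byte is just one pattern, and on a real system the reads may be served from the page cache rather than the disk.

```python
import os
from typing import List


def write_pattern(path: str, blocks: int, block_size: int, fill: int = 0xAA) -> None:
    """Write `blocks` blocks of a repeating fill byte and flush them to disk."""
    block = bytes([fill]) * block_size
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device


def find_bad_blocks(path: str, blocks: int, block_size: int, fill: int = 0xAA) -> List[int]:
    """Read the file back and return the indexes of blocks that do not match."""
    expected = bytes([fill]) * block_size
    bad = []
    with open(path, "rb") as f:
        for index in range(blocks):
            if f.read(block_size) != expected:
                bad.append(index)
    return bad
```

Checking the data that comes back is the part that catches silent corruption. Watching the reallocated sector count alone only tells you what the drive itself noticed.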

After that, I’ll need to move my 3 TB-ish of data over to these new drives. One funny note regarding my 10-year-old HDDs is that SMART doesn’t show anything weird: all tests pass, and the “value” and “worst” are higher than the “threshold,” but ZFS reports that my pool is degraded, with 26M failed checksums.

FAULTED      0    35 26.3M  too many errors

I trust ZFS more than I trust SMART. I was planning to move my data anyway, even without the small number of write errors, since the old drives have 5 years of Power_On_Hours. I may have been confusing the year those models were first made (2013) with their power-on time, so they’re not technically 10 years old. They’re probably more like 6-7, depending on how long they sat in warehouses, plus 1 year as standby drives on my shelf, so getting close to 8 years old.

Ok, I just realized that the disk with errors had this line:

199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       6

I knew ZFS wasn’t lying.

Now, transferring the data will take an excruciating amount of time. If I can get 10 Mbps, I will consider myself lucky, and I don’t want the disks shipped to me. It will probably take around 20 days to transfer if I’m lucky with the 10 Mbps speed. I wonder what the best method is for transferring a ZFS pool over a slow and unreliable network: zfs send or simply rsync. I have read somewhere that it may be worth piping zfs send into bzip2 and then sending the result via rsync, but for me, rsync wasn’t the most reliable when sending 1 large file over the internet.
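For a rough sanity check on the transfer time, the arithmetic can be sketched like this. It assumes decimal units (1 TB = 10^12 bytes), a link that actually sustains the stated rate, and no protocol overhead or compression, none of which is guaranteed on a flaky connection. `transfer_days` and its `efficiency` parameter are made up for this example.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_bits_per_second: float, efficiency: float = 1.0) -> float:
    """Estimate transfer time in days.

    `efficiency` is the assumed fraction of the link rate that ends up
    carrying payload (1.0 means no overhead at all).
    """
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 3 TB over a sustained 10 Mbps link, ideal case.
    print(f"{transfer_days(3e12, 10e6):.1f} days")
```

Under those assumptions, a full 3 TB at a steady 10 Mbps works out to roughly 28 days, so the 20-day figure only holds if the data set is closer to 2 TB or compresses well.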

Thoughts?

When my Toshiba drives spin up they also don’t sound healthy. I have to make a recording one of these days.

The first one is 120V and the other one doesn’t say.

The 14 TB Exos aren’t terribly loud… at least not compared to the server fans (4U, so not screamers).
