While keeping track of disk statistics, I suddenly saw a write time of 2 days.
Looking at this: https://www.kernel.org/doc/Documentation/ABI/testing/procfs-diskstats
That's field 11, which the documentation (https://www.kernel.org/doc/Documentation/admin-guide/iostats.rst) describes as:
Field 11 -- weighted # of milliseconds spent doing I/Os (unsigned int)
This field is incremented at each I/O start, I/O completion, I/O
merge, or read of these stats by the number of I/Os in progress
(field 9) times the number of milliseconds spent doing I/O since the
last update of this field. This can provide an easy measure of both
I/O completion time and the backlog that may be accumulating.
Now, the device in question is a RAID array of about 50 disks; I'm not sure of the exact number.
Could this peak be a cumulative value for all of those disks? Perhaps the array had something to flush, or hit an internal error.
Usually I don’t have to dive into stats like this, so I’m a little out of my depth.
Appreciate any help or pointers.
Could you give us more details on the RAID array, please?
Are the disks SAS or SATA, and what model?
" …by the number of I/Os in progress (field 9) times the number of milliseconds spent doing I/O…"
So if I’m reading that correctly the field is incremented by # of IOs * milliseconds doing IO
. So for a busy RAID, with an uptime of a month or more, I could easily see it being a large value. It sounds like an incrementing field designed for a monitoring tool to determine rate of change from.
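To put that in perspective, a single I/O in flight around the clock adds roughly 86.4 million ms to this counter per day, so an accumulated value equivalent to 2 days is unremarkable for a 50-disk array. Below is a minimal Python sketch, assuming the array appears as a single block device (the name sda here is just a placeholder), of how a monitoring tool might turn field 11 into something useful: the counter's rate of change over a sampling interval approximates the average number of I/Os in flight, roughly what iostat reports as the average queue size.

```python
#!/usr/bin/env python3
"""Minimal sketch: sample field 11 of /proc/diskstats twice and turn the
cumulative counter into a rate. The device name 'sda' is a placeholder."""
import time

DEVICE = "sda"  # placeholder; replace with your array's block device name

def read_weighted_io_ms(device):
    """Return field 11 (weighted ms spent doing I/O) for the given device."""
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            # column layout: major minor name field1 ... field11 ...
            if parts[2] == device:
                return int(parts[13])  # field 11 is the 14th column (index 13)
    raise ValueError(f"device {device!r} not found in /proc/diskstats")

INTERVAL = 5.0  # seconds between samples

before = read_weighted_io_ms(DEVICE)
time.sleep(INTERVAL)
after = read_weighted_io_ms(DEVICE)

delta_ms = after - before
# Dividing the delta by the wall-clock interval (in ms) gives the average
# number of I/Os in flight over that window.
avg_queue_depth = delta_ms / (INTERVAL * 1000.0)
print(f"field 11 grew by {delta_ms} ms over {INTERVAL}s "
      f"-> average queue depth ~ {avg_queue_depth:.2f}")
```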
If you're interested in disk latency, then run iostat -x 5 and look specifically at the await fields. Besides that, the bcc-tools package has the biolatency tool, which shows a histogram of block device latency (bcc-tools is a collection of handy eBPF scripts). There is also a biotop script which can be useful.
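For what it's worth, here is a second rough sketch in the same vein, again assuming Python and the same placeholder device name, showing how an await-style latency figure can be derived by hand from the /proc/diskstats counters: the time-spent fields divided by the completed-I/O counts over a sampling window. The column positions follow the field numbering in iostats.rst.

```python
#!/usr/bin/env python3
"""Rough sketch of deriving an await-style latency figure from
/proc/diskstats. Field numbers follow iostats.rst; 'sda' is a placeholder."""
import time

DEVICE = "sda"  # placeholder device name

def sample(device):
    """Return (reads completed, ms reading, writes completed, ms writing)."""
    with open("/proc/diskstats") as f:
        for line in f:
            p = line.split()
            if p[2] == device:
                # fields 1, 4, 5, 8 -> columns 3, 6, 7, 10 after splitting
                return int(p[3]), int(p[6]), int(p[7]), int(p[10])
    raise ValueError(f"device {device!r} not found")

INTERVAL = 5.0
r1, rt1, w1, wt1 = sample(DEVICE)
time.sleep(INTERVAL)
r2, rt2, w2, wt2 = sample(DEVICE)

ios = (r2 - r1) + (w2 - w1)
ms = (rt2 - rt1) + (wt2 - wt1)
# Average time per completed request over the window, similar in spirit
# to iostat's await columns (reported as 0 if the device was idle).
await_ms = ms / ios if ios else 0.0
print(f"{ios} I/Os completed in {INTERVAL}s, average latency ~ {await_ms:.2f} ms")
```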
A disk array that large, with write verification, will take a considerable amount of time, but reporting software is generally conservative about active memory depending on workload.
The actual time may be a bit shorter depending on how much system memory is allocated and the write-speed constraints of the drives.
I can't provide more details as I'm not savvy about the specifics of that system.
Sounds like there are things I can check, and a sudden spike is certainly possible.
Thanks for the help.