I was recently trying to see if
strace ... -e trace=open,close,read,write,openat -o ...
could be used to determine what write and read sizes (how large blocks) a given process does. Purpose would to try empirically determine what the optimal recordsize for a zfs dataset would be - for applications where this might not be known or deducible.
So I figured I’d try on an application that has a known optimal recordsize: mariadb (innoDB) that supposedly writes in 16K records according to the openZFS docs.
The problem is that when i run the above strace command on the mariadb process, absolutely not a single read or write is 16K nor are they one set size. I’m also getting confused about a very large amount being written to the /tmp folder, which is not on the mariadb “optimised” dataset(s)
(which are at /mnt/mariaDB/ or /mnt/mariadb/log)…
#Please note these are just some examples I picked, don't
#put much focus on the order or the file descriptor number
13246 openat(AT_FDCWD, "./blabla/blabla.frm", O_RDONLY|O_CLOEXEC) = 45
13246 read(45, "data"..., 64) = 64
13246 read(45, "data"..., 2076) = 2076
13245 openat(AT_FDCWD, "/tmp/#sql-temptable-2ab-1596-841.MAI", O_RDWR|O_CREAT|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0660) = 45
13245 write(45, "data"..., 215) = 215
13245 write(45, "data"..., 135) = 135
13245 write(45, "data", 20) = 20
13245 openat(AT_FDCWD, "/tmp/#sql-temptable-2ab-1596-84b.MAI", O_RDWR|O_CREAT|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0660) = 172
13245 write(172, "data"..., 239) = 239
13245 write(172, "data"..., 135) = 135
13245 write(172, "data", 18) = 18
13245 write(45, "data"..., 4096) = 409613238 openat(AT_FDCWD, "/tmp/#sql-temptable-2ab-15c1-86b.MAD", O_RDWR|O_CLOEXEC) = 172
13238 read(172, "data"..., 13212) = 13212
Some observations are:
largest writes are 4096 bytes, all to a “/tmp/#sql-temptable-something.MAI”. I could find no write larger.
Largest reads are 13212 bytes, also from a “/tmp/#sql-templatable…” file.
There are some write and read sizes that occur multiple times, but none are 16K.
Where are the supposed 16K innoDB reads/writes?
Am I using strace wrong or misunderstanding the output? Have I configured mariadb wrongly? Why are things being written to /tmp (which is not tmpfs and on slow hdds + a nvme zfs special device for those who are interested).
Please halp