MariaDB weird read/write sizes

I was recently trying to see if
strace ... -e trace=open,close,read,write,openat -o ...
could be used to determine what write and read sizes (how large blocks) a given process does. Purpose would to try empirically determine what the optimal recordsize for a zfs dataset would be - for applications where this might not be known or deducible.

So I figured I’d try on an application that has a known optimal recordsize: mariadb (innoDB) that supposedly writes in 16K records according to the openZFS docs.

The problem is that when i run the above strace command on the mariadb process, absolutely not a single read or write is 16K nor are they one set size. I’m also getting confused about a very large amount being written to the /tmp folder, which is not on the mariadb “optimised” dataset(s)
(which are at /mnt/mariaDB/ or /mnt/mariadb/log)…

#Please note these are just some examples I picked, don't
#put much focus on the order or the file descriptor number
13246 openat(AT_FDCWD, "./blabla/blabla.frm", O_RDONLY|O_CLOEXEC) = 45
13246 read(45, "data"..., 64) = 64
13246 read(45, "data"..., 2076) = 2076
13245 openat(AT_FDCWD, "/tmp/#sql-temptable-2ab-1596-841.MAI", O_RDWR|O_CREAT|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0660) = 45
13245 write(45, "data"..., 215) = 215
13245 write(45, "data"..., 135) = 135
13245 write(45, "data", 20) = 20
13245 openat(AT_FDCWD, "/tmp/#sql-temptable-2ab-1596-84b.MAI", O_RDWR|O_CREAT|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0660) = 172
13245 write(172, "data"..., 239) = 239
13245 write(172, "data"..., 135) = 135
13245 write(172, "data", 18) = 18
13245 write(45, "data"..., 4096) = 409613238 openat(AT_FDCWD, "/tmp/#sql-temptable-2ab-15c1-86b.MAD", O_RDWR|O_CLOEXEC) = 172
13238 read(172, "data"..., 13212) = 13212

Some observations are:

largest writes are 4096 bytes, all to a “/tmp/#sql-temptable-something.MAI”. I could find no write larger.

Largest reads are 13212 bytes, also from a “/tmp/#sql-templatable…” file.

There are some write and read sizes that occur multiple times, but none are 16K.

Where are the supposed 16K innoDB reads/writes?

Am I using strace wrong or misunderstanding the output? Have I configured mariadb wrongly? Why are things being written to /tmp (which is not tmpfs and on slow hdds + a nvme zfs special device for those who are interested).

Please halp

Not a MariaDB/MySQL user, but I notice InnoDB mentions AIO. I’m also not an AIO user :smiley: but it looks like one would be making a series of io_ syscalls, not the traditional read/write syscalls.

It might be of interest to cross reference your observed write sizes with bpftrace and friends.

1 Like