[SOLVED] Most of my filesystem became readonly

lI_Simo_Hayha_Il · January 24, 2022, 11:04am

I am working from home and suddenly my FS became read-only. I can still write to a very few folders, but the rest are read-only.

I can use /home/username but I cannot use /home/username/Pictures, or Documents, or any other folder. Also, I cannot use /.
I checked the journal for entries but I couldn’t find anything in particular related to a device. Also, I have multiple mounts (SSDs, M.2s, Samba drives) and none of them is accessible, so it shouldn’t be a single device, correct?

Please advise, as my “work PC” is a virtual machine and it fails to run because of this issue, and I need to work…

lI_Simo_Hayha_Il · January 24, 2022, 11:51am

Steps taken so far:
Checked NTFS mount points (not automounted, used only from certain VMs)
Checked with other Kernels (5.10, 5.12, 5.15, 5.16)
Checked folder user/group access (there was something strange there: the group owner was ‘videos’. I changed it to the correct one, but the folders are still inaccessible)
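
Since ownership was one of the suspects above, it may help to separate "permission denied" from "filesystem is mounted read-only", because the kernel reports them with different error codes (EACCES/EPERM versus EROFS). The following is only a minimal sketch: the function name `probe_write` is made up for illustration, and the folders to test are whatever is passed on the command line.

```python
import errno
import os
import sys
import tempfile


def probe_write(directory):
    """Try to create and delete a temporary file; report why it fails, if it does."""
    try:
        fd, path = tempfile.mkstemp(dir=directory)
    except OSError as exc:
        if exc.errno == errno.EROFS:
            # The filesystem itself is mounted read-only.
            return "read-only filesystem"
        if exc.errno in (errno.EACCES, errno.EPERM):
            # Ownership or mode bits are the problem, not the mount.
            return "permission denied"
        return "other error: " + str(exc)
    os.close(fd)
    os.remove(path)
    return "writable"


if __name__ == "__main__":
    # Hypothetical usage: python3 probe.py /home/username/Pictures /home/username/Data
    for directory in sys.argv[1:]:
        print(directory, "->", probe_write(directory))
```

If a folder reports "read-only filesystem", changing owner or group will not help, and the mount itself is what needs to be looked at.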

I can rwx on /home/username/Data which is mounted on a RAID (/dev/md127p1)
I cannot on /home/username/Pictures which is mounted on the system disk (/dev/nvme0n1p2)
I cannot on /home/username/VirtualMachines which is mounted on an M.2 disk (/dev/nvme2n1p2)

All of them are EXT4
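
To see which of these mounts the kernel currently considers read-only, one option is to look for the `ro` option in /proc/mounts. This is a rough sketch, not a definitive tool: `read_only_mounts` is a name chosen here for illustration, and some pseudo-filesystems are legitimately mounted read-only, so only the ext4 entries are interesting in this case.

```python
import os


def read_only_mounts(mounts_text):
    """Return (device, mountpoint, fstype) for every mount whose options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # Options are comma-separated; match the exact token 'ro', not e.g. 'errors=remount-ro'.
        if "ro" in options.split(","):
            result.append((device, mountpoint, fstype))
    return result


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for device, mountpoint, fstype in read_only_mounts(f.read()):
            print(f"{mountpoint} ({device}, {fstype}) is mounted read-only")
```

An ext4 mount that shows up here but is listed as `rw` in /etc/fstab was remounted read-only after boot, which narrows down where to look.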

get_off_my_lawn · January 24, 2022, 1:53pm

Checked with other Kernels? Does this mean you’ve rebooted? Did you actually check your logs, dmesg, SMART? Have you fsck’d the disks?

lI_Simo_Hayha_Il · January 24, 2022, 2:10pm

While I was checking the logs/SMART, it turned out that /dev/sda1, which was part of /dev/md127, had some errors.

I marked the disk as failed, removed the disk from the PC completely, booted up, but I am still unable to use the filesystem!

edit: yes, I used a live USB and ran fsck on all EXT4 disks. That was before removing the faulty HDD.

get_off_my_lawn · January 24, 2022, 2:18pm

Well, obviously you can’t use the FS if it’s a failed RAID array, unless you have a redundant disk for the failed one. You never said what RAID mode it was in. Unless the errors were serious, it would have been smarter to leave the drive in so you could at least READ the data to migrate to another disk or array. Now you can’t do jack.

I don’t know if you’re panicked or don’t understand your own setup, but I’m too tired for this vague discourse, heh. I’ve had enough years of “I have a problem!” OK, what’s wrong? “The thing, it doesn’t do the thing it’s supposed to!” Good luck.

regulareel · January 24, 2022, 2:18pm

Have a thread bump (for all it’s worth).

At least it is not that catastrophic. I mean you could have lost the ability to read files…

lI_Simo_Hayha_Il · January 24, 2022, 2:24pm

Well, you are completely wrong…

I do have backup of every file from yesterday.
In order to migrate to another drive you have to REMOVE the faulty drive first and then replace it.
I am using the RAID without problems. That is the essence of having a RAID in the first place; it is just a bit risky until you replace the faulty drive and re-sync.
My goal is to get the FS back to rwx mode, not save the data from there, and that was my original question.

lI_Simo_Hayha_Il · January 24, 2022, 2:25pm

Thankfully I haven’t lost my work/files, but KVM cannot access any file, so I cannot run my Work VM. That is what I need for now, not to protect my data.

lI_Simo_Hayha_Il · January 24, 2022, 9:10pm

Haven’t been more frustrated in my life…!!!

I removed the faulty HDD (if that was the actual issue).
Broke the RAID, deleted the other disk and created an EXT4 partition.

Still not working.

Since I need to work tomorrow and cannot afford another day off, I did a clean Manjaro installation (deleted only the primary disk).

Everything looked like it was working OK, until now, when I tried to edit a text file with some notes, which is not even local (it is on a NAS share), and I got the same error!!!

After that, my FS is locked again!!!

HELP!

regulareel · January 24, 2022, 11:04pm

Maybe the SATA connectors are having an issue? Have you tried replacing them?

xzpfzxds · January 24, 2022, 11:51pm

ext4’s ext4_handle_error function is the entry point for many types of detected errors, which remounts the fs read-only and outputs “Remounting filesystem read-only” to syslog at CRIT severity - do you see these messages in dmesg? If you do not, then it could be that something else is remounting the fs - you can use auditd to find that.
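
As a starting point for that search, here is a small sketch that filters a saved kernel log for the message quoted above. The extra keywords ("EXT4-fs error", "I/O error") are my own assumption about what usually accompanies it, not an exhaustive list, and the name `suspicious_lines` is just for illustration.

```python
import re
import sys

# "Remounting filesystem read-only" is the message mentioned above;
# the other two keywords are assumptions about related kernel messages.
PATTERNS = re.compile(
    r"Remounting filesystem read-only|EXT4-fs error|I/O error",
    re.IGNORECASE,
)


def suspicious_lines(log_text):
    """Return the log lines that match any of the keywords, in original order."""
    return [line for line in log_text.splitlines() if PATTERNS.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: dmesg > kernel.log && python3 filter.py kernel.log
    with open(sys.argv[1], errors="replace") as f:
        for line in suspicious_lines(f.read()):
            print(line)
```

If the machine has already been rebooted, dmesg only covers the current boot; `journalctl -k -b -1` should give the kernel messages from the previous one, provided the journal is persistent.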

Remember that a fsck’d fs can still have bad data, permissions, attrs, etc - it’s just that the metadata should be (hopefully) consistent.

Sounds like time for some memtest, ECC RAM and ZFS!

Mastic_Warrior · January 25, 2022, 1:41pm

Broseph, that is not the right attitude when you are asking for help but not providing much information.

Basic questions still stand. What are the syslogs and dmesg outputting? What does fsck say when you have it scan the disks?

What are the logs saying? If you are locking up after working on a networked file, it is possibly the main board or a memory issue but you need to read your journalctl or something.

lI_Simo_Hayha_Il · January 25, 2022, 1:51pm

You are right, but I replied with the same attitude I was approached with. The guy said “I’ve had enough years of…”, so I didn’t want him to spend any more on me.

Regardless, I posted before that there are no errors in the logs, dmesg looks OK, and fsck fixed a few things on one disk, yet it still wasn’t working. If somebody suspects a certain error in the logs, they should provide some keywords to filter by, so I know what to look for.

And since it was urgent, I had to reinstall the OS from scratch to be able to work. Therefore, no more troubleshooting, problem “solved”.