[SOLVED] Most of my filesystem became read-only

I am working from home and suddenly my FS became read-only. I have access to very few folders; the rest is read-only.

I can use /home/username but I cannot use /home/username/Pictures, or Documents, or any other folder. Also, I cannot use /.
I checked the journal for entries but I couldn’t find anything in particular related to a device. Also, I have multiple mounts (SSDs, M.2s, Samba drives) and none of them is accessible, therefore it shouldn’t be a single device, correct?

Please advise, as my “work PC” is a virtual machine and it fails to run because of this issue, and I need to work…

Steps taken so far:
Checked NTFS mount points (not automounted, used only from certain VMs)
Checked with other Kernels (5.10, 5.12, 5.15, 5.16)
Checked folder user/group access (there was something strange there: the group owner was ‘videos’; I changed it to the correct one, still inaccessible)
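
One more check that may help separate the two suspects above (ownership vs. a read-only mount): try to create a throwaway file in each folder and look at the error code. This is only a rough sketch; the function name probe_write is made up for illustration, and it assumes Python 3 on Linux.

```python
import errno
import sys
import tempfile


def probe_write(directory):
    """Try to create a temporary file in `directory` and classify the result."""
    try:
        # The file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return "writable"
    except OSError as exc:
        if exc.errno == errno.EROFS:
            # The kernel refused because the filesystem is mounted read-only.
            return "read-only filesystem"
        if exc.errno in (errno.EACCES, errno.EPERM):
            # Ownership or mode bits are the problem, not the mount.
            return "permission denied"
        return "other error: " + errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    # Hypothetical usage: python3 probe.py /home/username/Pictures /home/username/Data
    for path in sys.argv[1:]:
        print(path, "->", probe_write(path))
```

If the answer is "permission denied", the ownership oddity is the more likely culprit; "read-only filesystem" points at the mount itself.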

I can rwx on /home/username/Data which is mounted on a RAID (/dev/md127p1)
I cannot on /home/username/Pictures, which is mounted on the system disk (/dev/nvme0n1p2)
I cannot on /home/username/VirtualMachines, which is mounted on an M.2 disk (/dev/nvme2n1p2)

All of them are EXT4
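
To see which of these mounts the kernel itself considers read-only, the mount options in /proc/mounts can be inspected. A minimal sketch, assuming Linux and Python 3; read_only_mounts is an illustrative name, not an existing tool.

```python
import os


def read_only_mounts(mounts_text):
    """Return (device, mountpoint, fstype) for every mount whose options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        device, mountpoint, fstype, options = fields[:4]
        # Options are comma-separated; match 'ro' exactly, not e.g. 'errors=remount-ro'.
        if "ro" in options.split(","):
            result.append((device, mountpoint, fstype))
    return result


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for device, mountpoint, fstype in read_only_mounts(f.read()):
                print(f"{device} on {mountpoint} ({fstype}) is mounted read-only")
```

Some pseudo-filesystems are normally mounted read-only, so only entries for the real disks (for example /dev/nvme0n1p2) are of interest here.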

Checked with other Kernels? Does this mean you’ve rebooted? Did you actually check your logs, dmesg, SMART? Have you fsck’d the disks?


While I was checking the logs/SMART, it turned out that /dev/sda1, which was part of /dev/md127, had some errors.
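
For anyone wanting to confirm the state of an md array like /dev/md127, /proc/mdstat shows it as a member count and a flag string such as [2/1] [_U]. The sketch below parses that; it assumes the usual mdstat layout (array name line followed by a status line), and degraded_arrays is just an illustrative name.

```python
import os
import re

_ARRAY = re.compile(r"^(md\w+)\s*:")                     # e.g. "md127 : active raid1 ..."
_STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")   # e.g. "[2/1] [_U]"


def degraded_arrays(mdstat_text):
    """Map array name -> (expected members, active members, flags) for degraded arrays."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = _ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        status = _STATUS.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active, status.group(3))
    return degraded


if __name__ == "__main__":
    if os.path.exists("/proc/mdstat"):
        with open("/proc/mdstat") as f:
            for name, (expected, active, flags) in degraded_arrays(f.read()).items():
                print(f"{name}: {active}/{expected} members active [{flags}]")
```

An underscore in the flags marks a missing or failed member; an array with all U flags is not reported.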

I marked the disk as failed, removed the disk from the pc completely, booted up but I am still unable to use the filesystem!

edit: yes, I used a live USB and ran fsck on all EXT4 disks. That was before removing the faulty HDD.

Well, obviously you can’t use the FS if it’s a failed RAID array unless you have a disk redundant to the failed one, and you never said what RAID mode it was in. Unless the errors were serious, it would have been smarter to leave the drive in so you could at least READ the data to migrate to another disk or array. Now you can’t do jack.

I don’t know if you’re panicked or don’t understand your own set up but I’m too tired for this vague discourse heh. I’ve had enough years of “I have a problem!” OK what’s wrong? “The thing it doesn’t do the thing it’s supposed to!” Good luck.


Have a thread bump (for all it’s worth).

At least it is not that catastrophic. I mean you could have lost the ability to read files…

Well, you are completely wrong…

  1. I do have backup of every file from yesterday.
  2. In order to migrate to another drive you have to REMOVE the faulty drive first and then replace it.
  3. I am using the RAID without problems. That is the essence of having a RAID in the first place; it is just a bit risky until you replace the faulty drive and re-sync.
  4. My goal is to get the FS back to rwx mode, not save the data from there, and that was my original question.

Thankfully I haven’t lost my work/files, but KVM cannot access any file, so I cannot run my Work VM. That is what I need for now, not to protect my data.

Haven’t been more frustrated in my life…!!!

I removed the faulty HDD (if that was the actual issue).
Broke the RAID, deleted the other disk and created an EXT4 partition.

Still not working.

Since I need to work tomorrow, cannot afford another day off, I did a clean Manjaro installation (deleted only the primary disk).

Everything looked like it was working OK until now, when I tried to edit a text file with some notes, which is not even local (it is on a NAS share), and I got the same error!!!

After that, my FS is locked again!!!

HELP!

Maybe the SATA connectors are having an issue? Have you tried replacing them?

ext4’s ext4_handle_error function is the entry point for many types of detected errors, which remounts the fs read-only and outputs “Remounting filesystem read-only” to syslog at CRIT severity - do you see these messages in dmesg? If you do not, then it could be that something else is remounting the fs - you can use auditd to find that.
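
If scrolling through dmesg by eye is too much, a small filter can pull out the relevant lines. This is only a sketch: the first pattern is the message quoted above, while the other two ("EXT4-fs error" and "I/O error") are my assumptions about what usually accompanies it, and suspicious_lines is a made-up name.

```python
import re
import sys

PATTERNS = [
    r"Remounting filesystem read-only",  # the message quoted above
    r"EXT4-fs error",                    # assumption: usual prefix of ext4 error reports
    r"I/O error",                        # assumption: typical of failing disks or cables
]
_MATCHER = re.compile("|".join(PATTERNS), re.IGNORECASE)


def suspicious_lines(lines):
    """Return the log lines that match any of the patterns, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if _MATCHER.search(line)]


if __name__ == "__main__":
    # Usage: save the kernel log first, e.g. `dmesg > kernel.log`,
    # then run: python3 filter_log.py kernel.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for hit in suspicious_lines(f):
                print(hit)
```

An empty result does not prove the disks are fine; it only means none of these particular strings appeared in that log.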

Remember that a fsck’d fs can still have bad data, permissions, attrs, etc - it’s just that the metadata should be (hopefully) consistent.

Sounds like time for some memtest, ECC RAM and ZFS!

Broseph, that is not the right attitude when you are asking for help but not providing much information.

Basic questions still stand. What are the syslogs and dmesg outputting? What does fsck say when you have it scan the disks?

What are the logs saying? If you are locking up after working on a networked file, it is possibly the main board or a memory issue but you need to read your journalctl or something.

You are right, but I replied with the same attitude I was approached with. The guy said “I’ve had enough years of…”, so I didn’t want him to spend any more of them on me.

Regardless, I posted before that there are no errors in the logs, dmesg looks OK, and fsck fixed a few things on one disk, yet it still wasn’t working. If somebody suspects a certain error in the logs, they should provide some keywords to filter by, so I know what to look for.

And since it was urgent, I had to reinstall the OS from scratch to be able to work. Therefore, no more troubleshooting, problem “solved”.