2010-12-24

Fixing an unmountable file system in Ubuntu

I find the current version of Ubuntu Linux (10.10, Maverick Meerkat) pretty reliable, but it seems to fail to handle one special situation correctly. If you suddenly run out of disk space on the main partition (e.g., because an application writes out far more data than it should), you may find that you can't rescue the situation by removing some files: the file system suddenly appears as "mounted in read-only mode". On reboot, the main partition shows as unmountable. If you boot from an Ubuntu CD, an attempt to run e2fsck on that partition fails, because the partition shows as "busy".

Explanation?

I conjecture that one can't run "e2fsck /dev/sda1" from the Ubuntu Live CD because Ubuntu tries to mount the (now unmountable) partition during its start-up process, and the mounting process just sits there without giving up. This is why, if you run "sudo lsof | grep sda1", you get a report like this:
jbd2/sda1 296 root cwd DIR 0,17 300 2 /
jbd2/sda1 296 root rtd DIR 0,17 300 2 /
jbd2/sda1 296 root txt unknown /proc/296/exe
and then running "ps auxw | grep 296" shows that the device is kept busy by a kernel thread (ps puts the names of kernel threads in square brackets):
root 296 0.0 0.0 0 0 ? S 21:36 0:00 [jbd2/sda1-8]
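Another way to see whether the kernel has actually mounted the partition is /proc/mounts: field 1 of each line is the device and field 4 holds the mount options (such as "ro"). The jbd2/sda1-8 thread above is the journaling thread the kernel starts while mounting an ext4 file system, so its presence is consistent with the conjecture that a mount was at least started. Below is a minimal sketch in shell; is_mounted is a hypothetical helper name, and its optional second argument exists only so it can be tried on a sample file instead of /proc/mounts. Note that it can't distinguish a hung, half-finished mount from no mount at all, since an entry only appears in /proc/mounts once a mount has completed.

```shell
#!/bin/sh
# Minimal sketch: succeed (exit status 0) if DEVICE is the source of an entry
# in a mounts table, fail (1) otherwise.
# Usage: is_mounted DEVICE [MOUNTS_FILE]
# is_mounted is a hypothetical helper; MOUNTS_FILE defaults to /proc/mounts.
is_mounted() {
    dev="$1"
    table="${2:-/proc/mounts}"
    # Field 1 of each line is the mounted device, field 2 the mount point,
    # field 4 the mount options (e.g. "ro" for read-only).
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Example: report on /dev/sda1 (reading /proc/mounts needs no root privileges).
if is_mounted /dev/sda1; then
    echo "/dev/sda1 is mounted:"
    grep '^/dev/sda1 ' /proc/mounts
else
    echo "/dev/sda1 is not listed in /proc/mounts"
fi
```

If /dev/sda1 is listed, the options field shows whether it is mounted read-only, and a mounted device would be consistent with e2fsck reporting it as "busy".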
I tried to figure out how to prevent the Ubuntu Live CD from trying to mount /dev/sda1, but couldn't: adding options such as "sda=noprobe" (or should it have been "sda1=noprobe"?) to the boot command line seemed to have no effect.

Solutions

It seemed that other people with the same problem solved it by booting from a Slax live CD rather than an Ubuntu one. But since I did not have a working Slax CD (the CD writer I used was not quite compatible with the CD reader), that did not work for me. Sanjaya Karunasena proposes a working solution for recovery. It turns out that even though /dev/sda1 is not mountable and can't be fsck'd, it is still accessible to the bulk copy command, dd! So what he suggests is:
* copy the entire "bad" partition with dd to a file (an "image file") on some other device (such as a big enough external hard drive);
* run e2fsck on that file (yes, you can do that, if the file is an image of a partition);
* rewrite the original corrupted partition by copying the image file back to it with dd.
In between (after e2fsck), you can loop-mount the corrected file as a partition, so that you can cd into it and check whether your data is actually there. Something like this:
#-- all of these commands need root privileges (e.g., run "sudo -i" first)
#-- copy data from bad partition to an alternative drive
#-- (bs=4M makes dd copy in 4 MiB chunks instead of its slow 512-byte default)
dd if=/dev/sda1 of=/mnt/some-other-disk/sda1.img bs=4M
#-- file system repair (on an image file!)
e2fsck -f /mnt/some-other-disk/sda1.img

#-- mount the "fixed" file as a file system just to see if it's indeed fixed
mkdir -p /media/copy-of-sda1
mount -o loop /mnt/some-other-disk/sda1.img /media/copy-of-sda1
#-- here you can "cd /media/copy-of-sda1" and see what's there; maybe copy some files elsewhere
umount /media/copy-of-sda1

#-- copy the data back, then flush all buffered writes to disk
dd if=/mnt/some-other-disk/sda1.img of=/dev/sda1 bs=4M
sync
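If you want some assurance that each dd copy is complete, one option (not part of the recipe above) is to compare source and destination byte for byte with cmp right after copying. Here is a minimal sketch; copy_and_verify is a hypothetical helper name, and conv=fsync is a GNU dd option.

```shell
#!/bin/sh
# Minimal sketch: copy SRC to DST with dd, then check the copy byte for byte.
# Usage: copy_and_verify SRC DST   (copy_and_verify is a hypothetical name)
# Works for devices and regular files alike; reading or writing /dev/sda1
# needs root privileges.
copy_and_verify() {
    src="$1"
    dst="$2"
    # bs=4M copies in 4 MiB chunks; conv=fsync (GNU dd) makes dd flush the
    # written data to the destination before it exits.
    dd if="$src" of="$dst" bs=4M conv=fsync || return 1
    # cmp -s prints nothing and exits 0 only if both inputs are identical.
    if cmp -s "$src" "$dst"; then
        echo "verified: $dst matches $src"
    else
        echo "MISMATCH between $src and $dst" >&2
        return 1
    fi
}
```

With the paths from the example, this would be "copy_and_verify /dev/sda1 /mnt/some-other-disk/sda1.img" before running e2fsck, and "copy_and_verify /mnt/some-other-disk/sda1.img /dev/sda1" at the end. Each comparison only makes sense immediately after its own copy, since e2fsck deliberately changes the image.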
I first tried to copy the data with "dd" from sda1 to a USB device, but soon realized that all my USB devices were either too small to hold the entire sda1, or were already formatted with vfat and thus could not store files bigger than 4 GB. So I ended up unearthing an old internal hard disk drive, opening up my computer, and connecting the old drive (so it became sdb1). Then everything worked! Incidentally, it is useful to know that "dd" can read the unmountable device, and then write to it, even when that device appears as "busy" to e2fsck.
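Both of the problems with the USB devices described above (too little free space, and vfat's per-file size limit) can be checked before starting the first dd. Here is a minimal sketch; can_hold_image is a hypothetical helper name, and it relies on GNU stat's -f (file system status) mode, which reports FAT file systems as "msdos", and on FAT32's limit of 4 GiB minus one byte per file.

```shell
#!/bin/sh
# Minimal sketch: succeed if directory DIR looks able to store a single
# file of SIZE bytes. Usage: can_hold_image DIR SIZE
# (can_hold_image is a hypothetical name; relies on GNU coreutils stat.)
can_hold_image() {
    dir="$1"
    size="$2"
    # %T prints the file system type; GNU stat reports vfat as "msdos".
    fstype=$(stat -f -c %T "$dir") || return 1
    # FAT32 cannot hold a single file of 4 GiB (4294967296 bytes) or more.
    if [ "$fstype" = msdos ] && [ "$size" -ge 4294967296 ]; then
        echo "$dir is on a FAT file system: no single file can reach 4 GiB" >&2
        return 1
    fi
    # Free space for unprivileged users: %a free blocks times %S block size.
    avail=$(( $(stat -f -c '%a * %S' "$dir") ))
    if [ "$avail" -lt "$size" ]; then
        echo "$dir has $avail bytes free, but the image needs $size" >&2
        return 1
    fi
    echo "$dir can hold a $size-byte image ($fstype, $avail bytes free)"
}
```

For the real partition, the image size is the size of the device itself, which "blockdev --getsize64 /dev/sda1" prints in bytes (as root), so a check might look like: can_hold_image /mnt/some-other-disk "$(sudo blockdev --getsize64 /dev/sda1)"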

1 comment:

  1. P.S. If you have an Ubuntu 11.10 Live CD, you may not need this trick anymore. It seems that you can run "e2fsck /dev/sda1" successfully from an Ubuntu 11.10 Live CD. I suppose it's smart enough not to keep trying to mount an unmountable drive forever.
