Last Thursday the hard disk drive on a development machine died big time. First it started to behave erratically, and dmesg showed that it was having trouble with some bad blocks. It did not survive a reboot: ReiserFS would not mount it on boot, and reiserfsck running from an Accelerated Knoppix CD refused to bring it back to life. At first glance, the disk was beyond repair.
Of course, upon closer inspection, it turned out that the warranty had expired exactly two months earlier. Normally, after swearing my heart out, I would just replace the disk and turn the old one into a nice paperweight or some other piece of modern art (I’ve been looking forward to making one of those nice HDD clocks), but in the guts of that particular HDD were some uncommitted changes that I just wasn’t in the mood to rewrite. Besides, even though most of the data was expendable, the configuration hadn’t been backed up in quite a while (yes, there is a pattern here).
So here’s the recipe I usually apply in these situations, using Kurt Garloff’s dd_rescue. First get a brand-new HDD of approximately the same capacity and place both disks in a working Linux box (depending on your needs, booting from Knoppix might do). Let’s call the old, dying HDD /dev/hdg, and the spankin’ new disk will be /dev/hde. For the sake of simplicity, let’s assume that /dev/hdg was partitioned into /dev/hdg1 for swap and /dev/hdg2 for data.
First we’ll copy the entire data partition from /dev/hdg2 to /dev/hde2:
# dd_rescue /dev/hdg2 /dev/hde2
This will take a long, long time. dd_rescue starts with a reasonable block size, but whenever it encounters an error it retries a few times with a smaller block size before skipping the defective blocks and moving along. This is useful because it will copy all data in every readable block, instead of giving up at the first error like dd does. In my case, this took more than a day for a 248GB partition.
Once the data is on the new disk you can try to mount it directly, although it is a good idea to run reiserfsck first to make sure that the files you’ll copy are usable.
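The retry-and-skip idea described above can be sketched with plain dd on an image file. This is only an illustration of the principle, not how dd_rescue is actually implemented: the function name rescue_copy and both block sizes are assumptions made up for the example, and the large block size must be a multiple of the small one.

```shell
#!/bin/sh
# Illustration only: copy SRC to DST in large blocks; when a large block
# cannot be read, retry it in small pieces and skip the pieces that fail.
# rescue_copy and both default block sizes are invented for this sketch.
rescue_copy() {
    src=$1
    dst=$2
    big=${3:-65536}    # large block size in bytes (assumed default)
    small=${4:-512}    # small block size in bytes (assumed default)

    # wc -c is fine for an image file; a real device needs another way
    # to report its size (for example blockdev --getsize64 on Linux).
    size=$(( $(wc -c < "$src") ))
    nbig=$(( (size + big - 1) / big ))
    per_big=$(( big / small ))
    skipped=0

    : > "$dst"         # start from an empty output file

    i=0
    while [ "$i" -lt "$nbig" ]; do
        # Try the whole large block first.
        if ! dd if="$src" of="$dst" bs="$big" skip="$i" seek="$i" \
                count=1 conv=notrunc 2>/dev/null; then
            # Fall back to small blocks; unreadable ones are left as holes.
            j=0
            while [ "$j" -lt "$per_big" ]; do
                pos=$(( i * per_big + j ))
                dd if="$src" of="$dst" bs="$small" skip="$pos" seek="$pos" \
                    count=1 conv=notrunc 2>/dev/null \
                    || skipped=$(( skipped + 1 ))
                j=$(( j + 1 ))
            done
        fi
        i=$(( i + 1 ))
    done
    echo "skipped $skipped small blocks"
}

# Example (hypothetical file names):
#   rescue_copy old-disk.img copy.img
```

On a healthy file the fallback branch never runs and the copy is identical to the source; on a failing disk, use dd_rescue itself, which does this job properly.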
# reiserfsck /dev/hde2
Now here you might run into a small obstacle. Ideally I would buy the exact same model as the old drive for recovery purposes, because that means an exact bit-for-bit copy will work in most cases, partition maps and all. However, in this case I bought a different brand, which resulted in a slightly smaller drive and a completely different geometry. When this happens, reiserfsck will complain about the different partition size and suggest that you rebuild the superblock:
# reiserfsck --rebuild-sb /dev/hde2
Now you can do a normal reiserfsck.
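A size mismatch like this one can be spotted before you start copying. The helpers below are a minimal sketch: the names size_of and target_fits are made up for this post, and the only external tool assumed is blockdev from util-linux, whose --getsize64 option prints a block device’s size in bytes.

```shell
#!/bin/sh
# Hypothetical helpers (names invented here) that compare the size of the
# source and target before copying.
size_of() {
    [ -e "$1" ] || return 2
    if [ -b "$1" ]; then
        # Block device: util-linux prints its size in bytes.
        blockdev --getsize64 "$1"
    else
        # Regular file, e.g. a disk image.
        echo $(( $(wc -c < "$1") ))
    fi
}

target_fits() {
    src_bytes=$(size_of "$1") || return 2
    dst_bytes=$(size_of "$2") || return 2
    if [ "$dst_bytes" -ge "$src_bytes" ]; then
        echo "ok: target ($dst_bytes bytes) can hold source ($src_bytes bytes)"
    else
        echo "warning: target is $(( src_bytes - dst_bytes )) bytes smaller than source"
        return 1
    fi
}

# Example, as root, with the device names used in this post:
#   target_fits /dev/hdg2 /dev/hde2
```

The check only reports the difference; what to do about a smaller target is still up to you.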
When you’re done just mount the new partition and copy your data to a safe place:
# mount /dev/hde2 /mnt/tmp
# rsync -a --progress /mnt/tmp/etc /backup/dir/
# rsync -a --progress /mnt/tmp/home/arturo /another/backup/dir/
After this you can reformat the new drive for normal usage. Mine is being debootstrapped as I write this.
This little recipe has saved quite some data and a few disks, including most of mcleod’s late Xbox hard disk. As usual your mileage may vary, but with a little luck you just might get some of your files back.
Now about that crappy Maxtor HDD… I might just go for the wind chimes instead.
Greetings, namesake!
I found your blog a while ago, although unfortunately I haven’t had time to follow it. I’ve found it interesting for many reasons: TiVo, a mutual interest in technology, we’re from the same institution (Tec), your pictures of it were very useful to me…
Anyway, I hope to drop by more often.
javier.
It is good to see people sharing their experiences like this, which gives others some idea of what to do in case of data loss on their system. I have seen most people lose their data simply because they are unaware that there is something called data recovery, with which they can get most of their data back with almost all the files intact. I got my data recovered by a service provider, Disk Doctors Labs, and most of the people I sent to Disk Doctors afterwards gave me a good, positive response.
As always, my friend codehead to the rescue. I know this is the second time this has happened to me and the second time I bother you for help… You know it will also be to your benefit when we sync our multimedia collections ;)
A hug from the Land down Under!
Dude, the dd_rescue process has finished successfully:
dd_rescue: (info): /dev/sdb2 (480320047.5k): EOF
Summary for /dev/sdb2 -> /dev/sda1:
dd_rescue: (info): ipos: 480320047.5k, opos: 480320047.5k, xferd: 480320047.5k
errs: 168, errxfer: 84.0k, succxfer: 480319963.5k
+curr.rate: 832kB/s, avg.rate: 9812kB/s, avg.load: -2.9%
But now I am worried about the fsck, because it has hung at 40%, just like with the original hard drive:
Will rebuild the filesystem (/dev/sda1) tree
Will put log info to 'stdout'
Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal..
Reiserfs journal '/dev/sda1' in blocks [18..8211]: 0 transactions replayed
###########
reiserfsck --rebuild-tree started at Wed Aug 8 08:02:26 2007
###########
Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 103818443 blocks marked used
Skipping 11875 blocks (super block, journal, bitmaps) 103806568 blocks will be read
0%....20%....40%
It hasn’t finished yet, but I fear a bad outcome; if the problem is not physical, is there anything I can do to fix this?
Regards my friend…
Please forgive my total lack of patience, the process ended just fine:
327035 directory entries were hashed with "r5" hash.
"r5" hash is selected
Flushing..finished
Read blocks (but not data blocks) 103806568
Leaves among those 116376
Objectids found 324508
Pass 1 (will try to insert 116376 leaves):
####### Pass 1 #######
Looking for allocable blocks .. finished
0%....20%....40%....60%....80%....100%
Flushing..finished
116376 leaves read
116176 inserted
- pointers in indirect items pointing to metadata 7 (zeroed)
200 not inserted
non-unique pointers in indirect items (zeroed) 2984
####### Pass 2 #######
Pass 2:
0%....20%....40%....60%vpf-10260: The file we are inserting the new item (134320 14974 0xae532001 IND (1), len 4048, location 48 entry count 0, fsck need 0, format new) into has no StatData, insertion was skipped
....80%....100%
Flushing..finished
Leaves inserted item by item 200
Pass 3 (semantic):
####### Pass 3 #########
vpf-10680: The file [249620 343697] has the wrong block count in the StatData (1414608) - corrected to (1403248)
vpf-10680: The file [335046 335060] has the wrong block count in the StatData (23448) - corrected to (22904)
vpf-10680: The file [241839 243017] has the wrong block count in the StatData (64) - corrected to (56)
Flushing..finished
Files found: 285283
Directories found: 33617
Symlinks found: 4426
Others: 1167
Pass 3a (looking for lost dir/files):
####### Pass 3a (lost+found pass) #########
Looking for lost directories:
Flushing..finished
Pass 4 - finished
Deleted unreachable items 2
Flushing..finished
Syncing..finished
###########
reiserfsck finished at Wed Aug 8 13:18:34 2007
###########
Now I am on a “dd if=/dev/zero of=/dev/sda bs=1M” run against the original drive to see if it can be rescued ;)
When running dd_rescue – don’t forget the “-r” option! (reverse copy)
What you can do is this: try running dd_rescue the “normal” way and then, when it finishes (or hangs, or whatever) and you have bad blocks, try running the same command again with the “-r” option. It will take the same drive, the same output file, the same bad-block file, etc., and do the whole job again, filling in the places where it was able to get good data.
Example: a failing 20 gig partition on a 60 gig drive, copying to a file (so I can loop-mount and extract data before using the failed hard drive as a target for my 12-gauge shotgun!)
Viz: dd_rescue -v -l 60gigp1log.txt -o 60gigp1bb.txt /dev/sdg1 ./60gigp1.bin
-v = verbose output (tell me a LOT about what you’re doing)
-l = name of logfile to write output to (important!)
-o = name of bad-block file – where the locations of potentially bad blocks are found (*very* important!)
/dev/sdg1 (the drive that is failing)
./60gigp1.bin = the binary “image” being made by dd_rescue.
I ran the dd_rescue command shown above, and it went through the drive about 1/3 of the way and then just bogged down. I cancelled out with a Ctrl-C and re-ran the same command with “-r”.
Viz: dd_rescue -r -v -l 60gigp1log.txt -o 60gigp1bb.txt /dev/sdg1 ./60gigp1.bin
It then started attacking the drive from the “other side”.
A couple of repetitions of this and I had a complete binary with no bad-blocks. Of course, I have not yet loop-mounted the binary so I don’t know what kind of trouble the actual filesystem might be in – but once you have a “clean” (or nearly clean) binary image of the partition, you are in a position to let the file system’s own recovery tools (as noted in the previous post) do their job.
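The forward-then-reverse routine above can be wrapped in a small function. This is a rough sketch: two_way_rescue is a made-up name, the number of rounds is an assumed default, and the only dd_rescue options used are the ones from the commands above (-v, -l, -o and -r). It cannot replace the judgment call of hitting Ctrl-C when a pass bogs down.

```shell
#!/bin/sh
# Sketch of the forward-then-reverse routine described above.
# two_way_rescue is an invented name, not part of dd_rescue.
two_way_rescue() {
    src=$1           # failing partition, e.g. /dev/sdg1
    img=$2           # image file to build, e.g. ./60gigp1.bin
    log=$3           # log file passed to -l
    bad=$4           # bad-block file passed to -o
    rounds=${5:-2}   # forward+reverse rounds to run (assumed default)

    n=0
    while [ "$n" -lt "$rounds" ]; do
        # Forward pass; carry on even if dd_rescue exits with an error.
        dd_rescue -v -l "$log" -o "$bad" "$src" "$img" || true
        # Reverse pass over the same source, image, log and bad-block file.
        dd_rescue -r -v -l "$log" -o "$bad" "$src" "$img" || true
        n=$(( n + 1 ))
    done
}

# Example with the names used above:
#   two_way_rescue /dev/sdg1 ./60gigp1.bin 60gigp1log.txt 60gigp1bb.txt
```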
What say ye?
Jim
Data recovery is a very costly option, which is why you should always check your storage media for any signs of wear and tear.