Noob in need of some help figuring out what's going on with dd

I should start off by saying that I’m a complete noob when it comes to dd, so please forgive me if there’s an obvious answer to any of my questions.

I have two disks, we’ll call them HDD and SSD. HDD has a working Windows 10 install on it, whereas SSD used to have a working Windows 10 install on it but its partitions were either damaged or improperly deleted. I’d like to see if I can recover any data from the SSD (just for the hell of it) but I need to use the SSD for something else in the next few days (installing Manjaro). So I’ve been trying to create a dd image of the entire SSD to a .img.gz file, in a folder on the HDD. I should note that I don’t really care about any of the data on either the SSD or the HDD.

I’ve been using a Linux Live USB to boot the system and run the following dd commands.

sudo dd if=/dev/sdb conv=sync,noerror bs=64K | gzip -c > /media/ubuntu/Windows/Users/HanSolo/Image/SSDbackup.img.gz status=progress

The above failed to function with the “status=progress” option but I did get it to run without the option and it created a ~23GB image of what is a 240GB SSD. Thinking that the command might have been interrupted when the system turned the display off and locked the desktop, I ran the command again… twice in fact. But I made the mistake of using different destination disks for each image (more on this in a moment). Each time, I got a ~23GB file but all three files were slightly different in size and had md5sums that didn’t match.

I did have to reboot the system between making the last two images (due to an issue with stupid Windows and the fact it doesn’t actually fully shut down unless you hold the Shift key when you click the power icon). And in order to rule out the hashes being different because something had modified the SSD, I made sure that swap was disabled and double-checked that the SSD wasn’t and couldn’t be auto-mounted, before creating another two images, this time with the same name, to the same disk but in separate folders, so as to eliminate as many variables as I could think of (such as each destination disk having a different block size).
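
(In case it matters, the checks amounted to roughly the following from the live session, with the SSD showing up as /dev/sdb:)

sudo swapoff -a                    # turn off all swap so nothing writes to a swap partition
swapon --show                      # prints nothing once swap is fully disabled
lsblk -o NAME,MOUNTPOINT /dev/sdb  # every partition on the SSD should show an empty MOUNTPOINT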

Same results… similar but not identical file sizes and checksums that don’t match.

At this point, I did something that was kind of dumb… I modified the command without being 100% sure what I was doing, but neither the SSD nor the HDD contains data I care about in the slightest, so I figured what the hell.

sudo dd if=/dev/sdb of=/media/ubuntu/Windows/Users/HanSolo/Image/SSDbackup.img.gz status=progress

This resulted in a ~240GB image. I also ran this command twice, using two different destinations, but again, the hashes don’t match.

Now that I’ve got through the wall of text, here are my questions.

  1. Would the first command create a block for block copy of the entire disk?

  2. If it does, then how can the files be so much smaller (am I wrong thinking that a raw disk image wouldn’t be that compressible)?

  3. Why do none of the hashes match?

  4. Why am I wasting time dd’ing and checksumming 240GB image files, of a disk I don’t give a funk about when I could be doing something more productive?

Would the first command create a block for block copy of the entire disk?

IIRC, no. conv=noerror keeps dd going after a read error instead of aborting, and conv=sync pads every short or failed read out to the full block size with zeros, so the result isn’t strictly bit-for-bit. Someone might want to double-check me on that.
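
A quick way to see that padding for yourself, using a throwaway file instead of a disk (the file names here are just made up for the demo):

printf 'hello' > small.bin                      # 5 bytes of input
dd if=small.bin of=padded.bin bs=64K conv=sync  # the partial block gets padded with NULs
ls -l small.bin padded.bin                      # padded.bin comes out at 65536 bytes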

If it does, then how can the files be so much smaller (am I wrong thinking that a raw disk image wouldn’t be that compressible)?

Note the conv=sync behavior - it writes zeros to maintain the block size in the event of an I/O error. If you’re getting a lot of errors, you’re getting a lot of zeros. Long strings of zeros are very compressible.
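
To get a feel for just how compressible zeros are (a harmless test that only reads /dev/zero):

dd if=/dev/zero bs=1M count=100 2>/dev/null | gzip -c | wc -c   # 100 MiB of zeros in...
# ...prints a byte count on the order of 100 KB, i.e. roughly 1000:1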

Why do none of the hashes match?

Have you checked dmesg for read/write errors on either drive? My hunch is that you’re getting some, and even a single bit will change the checksum.
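
If it helps, something along these lines from the live session will surface most read errors (assuming the SSD is still /dev/sdb; smartctl comes from the smartmontools package, which the live USB may not have installed):

sudo dmesg | grep -iE 'sdb|I/O error'   # kernel messages mentioning the SSD or I/O errors
sudo smartctl -a /dev/sdb               # SMART attributes and the drive's own error log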

Why am I wasting time dd’ing and checksumming 240GB image files, of a disk I don’t give a funk about when I could be doing something more productive?

Curiosity and learning. Both good reasons. :slight_smile:

I’d forget trying to compress it and just use the conv=sparse option with dd.
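
A rough sketch of what that could look like (hedged: conv=sparse only pays off when dd writes straight to a file rather than through a gzip pipe, and the destination filesystem has to support sparse files; the path is just the one from the original post):

img=/media/ubuntu/Windows/Users/HanSolo/Image/SSDbackup.img
sudo dd if=/dev/sdb of="$img" bs=64K conv=sparse,sync,noerror status=progress
du -h --apparent-size "$img"   # apparent size: the full 240G
du -h "$img"                   # space actually allocated; all-zero runs are skipped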

[quote=“imhigh.today, post:2, topic:139103”]
IIRC, no. conv=noerror keeps dd going after a read error instead of aborting, and conv=sync pads every short or failed read out to the full block size with zeros, so the result isn’t strictly bit-for-bit. Someone might want to double-check me on that.[/quote]

I think you’re right, but I’ll see if I can find confirmation.

Now that you mention it, it’s pretty obvious.

No, I haven’t but I will when I get back home.

Assuming there are no read/write errors, is there anything else I should check?

It’s curiosity that’s making me question why I’m doing this. As I said, the SSD is earmarked for use with Manjaro. Which is part of a whole other project I’ve been planning, which I’m very excited and curious about.

Thank you for the suggestion, I’ll certainly give it a go, but file size isn’t the top priority for me at the moment. My main concern is being able to ensure that the image I create is an accurate, reliable copy of the SSD, to maximise my chances of being able to recover its data.

Curious intentions.
Regarding compression: You can always try it without the pipeline and compress afterwards.
Regarding checksums: If you are using sync and a fixed block size, you can run those checksums per block and figure out where you are losing the data (see the sketch below).
My suspicion falls on the nature of SSDs: they are not bit-by-bit storage devices, and some corruption might just lead to inconsistent readings from different cells. That would look like random read errors.
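
For what it’s worth, the per-block idea above could look something like this (a slow, first-1-GiB-only sketch, assuming the 64K block size and /dev/sdb from earlier):

# one md5 per 64K block of the first 1 GiB (16384 blocks); repeat into run2.md5
for i in $(seq 0 16383); do
  printf '%s ' "$i"
  sudo dd if=/dev/sdb bs=64K skip="$i" count=1 2>/dev/null | md5sum
done > run1.md5
# any line that shows up here is a block that changed between the two reads
diff run1.md5 run2.md5 | head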

Overall, the last dd command does create an image, and you can use it in combination with LVM to create snapshots of each read while wasting little space. After you are confident you’ve had enough, you can run a wipe and diagnostics on the SSD. :slight_smile:

If your drive is healthy and not mounted or written to, its contents shouldn’t change and its hashes should be the same.
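
An easy way to test that directly, without making an image at all (assuming /dev/sdb again): hash the raw device twice and compare.

sudo md5sum /dev/sdb   # run it twice; on a healthy, unmounted disk both runs print the same hash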

You may want to try ddrescue instead of plain dd. It’s like dd’s sync,noerror mode, but much faster and resumable, and you can have limited retries on reads if you want.

Don’t put any options unless you’re seeing issues.
dd if=/dev/sdb of=/some/path/sdb.raw bs=8M should be enough if your disk is ok

manuals (something to read while the copy is going):
https://manpages.debian.org/stretch/coreutils/dd.1.en.html
https://manpages.debian.org/stretch/gddrescue/ddrescue.1.en.html

ddrescue /dev/sdb /some/path/sdb.raw /some/path/sdb.log should be enough in the other case (you’ll see status on the command line)

Also, check this out: http://www.forensicswiki.org/wiki/Ddrescue

Create an image of the SSD that includes free space as well as what’s flagged as data.
Why can you not just use the HDD on the days you say you need the SSD?
If you do not care about the data on either drive, yet you’re planning on undeleting files from the SSD, you have me confused as to the path chosen.

This command failed because status=progress ended up after the pipe, so it was passed to gzip as an argument instead of being an option to dd.

sudo dd if=/dev/sdb conv=sync,noerror bs=64K status=progress | gzip -c > /media/ubuntu/Windows/Users/HanSolo/Image/SSDbackup.img.gz

Would be the correct way of doing it.
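
Once you have an image from that, one way to sanity-check it is to hash the decompressed stream against the disk itself. The two will only match if there were no read errors and the disk size happens to be an exact multiple of the 64K block size (otherwise conv=sync pads the final partial block), but a match is strong evidence the image is faithful.

sudo md5sum /dev/sdb
zcat /media/ubuntu/Windows/Users/HanSolo/Image/SSDbackup.img.gz | md5sum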

  1. No, the sync argument pads the blocks with 0s.
    "sync: pad every input block with NULs to ibs-size; when used with block or unblock, pad with spaces rather than NULs"

  2. When you’re compressing disk images, or data in general, with a lot of 0s (e.g. the empty space on your hard drive), the compression algorithm squashes those runs down to almost nothing. So if you only have ~24GB of actual data on the drive, then expect a ~24GB file.

  3. Data had changed?

  4. Your guess is as good as mine :wink: