giant file-rsync+dd+md5sum=no cry

I recently ran into a situation where:

  • I transferred a large file (in my case 32GB)
  • The md5 at the recipient didn’t match my source file (sadface)
  • The source or destination machine did not have rsync installed, but both had md5sum and dd.

Resending the entire 32GB file would be a waste of time. Why not just resend the chunks that failed?

The correct answer in this situation is usually “just use rsync, that’s what it’s for”. But I couldn’t, since the target system didn’t have rsync installed and I couldn’t install it. If you can install rsync at both ends, use it to fix the broken file. Here’s a great example.

You can’t do that? Then do this:


  1. Create a bash/cmd script at each end to break the file into pieces with dd.
  2. md5sum each piece at both ends and compare to figure out which chunks are bad.
  3. Transfer the bad chunks from source to target.
  4. dd the chunks back into the giant file.
  5. Recheck the md5sum of the file to make sure it matches.

Create a bash/cmd script at each end to break the file into pieces

Tip: rename the file to something which doesn’t require escape sequences, especially if your source/target are running different OSes. For example, spaces mean the name has to be enclosed in quotes on Windows and have a backslash prepended on Linux. So get those spaces out of there.

dd thinks in terms of blocks.

$\text{blocksize} \times \text{count} = \text{chunk size}$

I set the blocksize to 1 megabyte (1048576 bytes) to make the math easier. I want each chunk to be 128MB. The size of the chunk is up to you, but the trade-off is waiting for excess data to transfer versus dealing with more part files. Anyhow, we have bs=1048576 count=128.

To tell dd where to start when it’s copying data out of a file, supply the skip option. So the first chunk has skip=0, the second chunk has skip=128, the third has skip=256, and so on. Why?

dd thinks in terms of blocks. The skip value is counted in blocks of bs bytes, not in bytes, so each chunk starts 128 blocks after the previous one.

I usually create an Excel workbook and use fill-down to create the correct skip numbers and then CONCATENATE() to create the actual dd command lines. Copy and paste them into a text document. Send it to both ends with the correct extensions/permissions/shebang line/etc.

[Figure: Excel sheet showing the formulas used to build the dd batch/shell script.]
[Figure: the same sheet with the formulas filled down to produce each line of the batch/shell script.]

Run the batch/shell script at each end to create corresponding partXXXX files. If you follow my example, the value in the K column shows you where to stop copying; it changes to false at the line where you’ve passed the final dd required.
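For anyone who would rather skip the spreadsheet, a shell loop can produce the same dd calls. This is only a minimal sketch, assuming a POSIX shell on that end. The function name split_chunks and the part0000-style file names are made up for illustration, and the chunk count is derived from the file size rather than from a spreadsheet column.

```shell
#!/bin/sh
# split_chunks FILE BS COUNT
# Writes FILE's chunks to part0000, part0001, ... in the current directory.
# Each chunk holds COUNT blocks of BS bytes; the last chunk may be shorter.
# (Function name and part-file naming are illustrative assumptions.)
split_chunks() {
    file=$1 bs=$2 count=$3
    size=$(wc -c < "$file")
    chunk_bytes=$((bs * count))
    # Round up so a trailing partial chunk still gets its own part file.
    chunks=$(( (size + chunk_bytes - 1) / chunk_bytes ))
    i=0
    while [ "$i" -lt "$chunks" ]; do
        # skip is measured in blocks of BS bytes, not in bytes.
        dd if="$file" of="$(printf 'part%04d' "$i")" \
           bs="$bs" count="$count" skip=$((i * count)) 2>/dev/null
        i=$((i + 1))
    done
}

# With the values from the text (the file name is a placeholder):
# split_chunks bigfile.bin 1048576 128
```

Run the same function with the same bs and count at both ends, otherwise the part files will not line up.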

md5sum the pieces at each end and compare

Pretty easy; use md5sum on all of the partXXXX files at each end. Save the output into an md5 file and then get both files in the same place so you can compare.

The command-line diff tool will work, but a GUI diff tool should make it easier to see which files don’t match. Let’s hope there aren’t many.
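If the md5sum on the target is the GNU coreutils one, its check mode can do the comparison for you. A minimal sketch under that assumption; the function name list_bad_chunks and the file name source.md5 are placeholders, and ports of md5sum on other platforms may not support the same option.

```shell
#!/bin/sh
# On the source, in the directory holding the part files:
#   md5sum part* > source.md5
# Copy source.md5 next to the part files on the target, then call
# list_bad_chunks there. (Names here are illustrative assumptions.)
list_bad_chunks() {
    # md5sum -c prints "NAME: OK" or "NAME: FAILED" for each listed file;
    # keep only the names of the files that did not verify.
    md5sum -c "$1" 2>/dev/null | grep -v ': OK$' | cut -d: -f1
}

# list_bad_chunks source.md5
```

Whatever it prints is the list of chunks to resend. No output means every chunk matched.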

Transfer the bad chunks from source to target

This part should be easy; just send the good chunks from the source to the target to replace the bad chunks. To make sure you haven’t wasted your time, md5sum the replacement chunks once they reach the destination. Retransfer any that don’t match.

dd the chunks back into the giant file

We will use dd again. Instead of redoing the whole process in reverse, we only need to dd in the fixed chunks.

Either redo your Excel sheet or just find and replace in your target batch/shell script.

The key things here are that the if and of have been swapped, we must add conv=notrunc, and we use seek instead of skip. We swap the input and output files because we’re outputting to the big file. We use conv=notrunc because by default dd will truncate the destination file at the point where you start writing. We don’t want to destroy the file, so this is important. Finally, when we need to write to the destination file anywhere other than the start, we have to use seek instead of skip: skip positions the input, seek positions the output.

You only need the lines corresponding to the fixed chunks. So your final batch/shell script might end up looking like this:
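The original listing is not reproduced here. As a stand-in, here is a minimal sketch of the write-back step. It assumes the part files are named part0000, part0001 and so on; the function name patch_chunk, the file name, and the chunk numbers in the usage lines are all hypothetical.

```shell
#!/bin/sh
# patch_chunk BIGFILE INDEX BS COUNT
# Overwrites chunk number INDEX of BIGFILE with the matching part file.
# (Function name and part-file naming are illustrative assumptions.)
patch_chunk() {
    big=$1 index=$2 bs=$3 count=$4
    part=$(printf 'part%04d' "$index")
    # seek positions the output in blocks of BS bytes;
    # conv=notrunc keeps everything after the written chunk intact.
    dd if="$part" of="$big" bs="$bs" count="$count" \
       seek=$((index * count)) conv=notrunc 2>/dev/null
}

# If chunks 3 and 17 were the ones that had to be resent:
# patch_chunk bigfile.bin 3 1048576 128
# patch_chunk bigfile.bin 17 1048576 128
```

The seek value for each chunk is the same number that was used as skip when the chunk was cut out.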

Recheck the md5sum of the file to make sure it matches

You’re all done, assuming it matches. (Cue spooky music)

Hey nevermind, here’s my Excel Workbook. Just use that. That’s what I’m going to do from now on.

Set Internet Options via the registry

I ran into a situation where I needed to remotely set the values in a user’s Internet Options control panel. With all the problems with SSL 2.0 and SSL 3.0 lately, we’ve pushed out configurations to block them. Some of our users have reported problems connecting to business-critical websites which aren’t working with our settings. So I need to remotely check the SSL 3.0 box for them.

To get this done I had two problems:

  • What values do I need to store in the registry?
  • Where do I store them?

First, find the keys.

I fired up procmon from Sysinternals and opened up my Internet Options control panel. With some trial and error I was able to narrow down the settings I needed to change. The process is to change the settings in my UI (remember to click Apply!) and watch the registry changes in procmon. In case you’re looking for exactly the same thing I am, changing the SSL/TLS settings, here’s the registry value you need (SecureProtocols, stored under the Internet Settings key):

HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings\SecureProtocols

And here are the numbers for each protocol:

SSL/TLS Version   Decimal   Hexadecimal
SSL 2.0           8         0x8
SSL 3.0           32        0x20
TLS 1.0           128       0x80
TLS 1.1           512       0x200
TLS 1.2           2048      0x800

This is a bitfield. To get the correct value, you just add up the options you want and then store that value in the registry.

I needed to have SSL 3.0, TLS 1.0, and TLS 1.1 enabled.
$$\begin{array}{r} 32 \\ 128 \\ \underline{+\ 512} \\ 672 \end{array}$$
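The same sum can be checked in a shell. Because each protocol is its own bit, adding the numbers and OR-ing them give the same result. This is just a sketch; the variable names are made-up labels, not anything Windows defines.

```shell
#!/bin/sh
# Bit values from the table above (variable names are illustrative labels).
SSL3=32
TLS10=128
TLS11=512

# Combine the protocols that should be enabled.
value=$((SSL3 | TLS10 | TLS11))
printf '%d 0x%X\n' "$value" "$value"   # prints: 672 0x2A0

# Going the other way: test whether a given bit is set in a stored value.
if [ $((value & SSL3)) -ne 0 ]; then
    echo "SSL 3.0 is enabled"
fi
```

Having both the decimal and the hexadecimal form on hand makes it harder to type the number into the wrong input mode.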

When you store the result in the registry, make sure you enter it in the expected format.

[Figure: Regedit input dialog, with radio buttons for selecting Hexadecimal or Decimal input. Be sure the selection matches the number you are entering.]

Second, figure out where to store the values.

[Figure: Folder tree from regedit, connected to a remote PC, with no HKCU hive listed. Caption: Where is HKCU?]

Now, just open up the remote registry, find HKEY_CURRENT_USER, and rock and roll!

Okay, going to have to pull some teeth here. The issue is that there really isn’t a HKEY_CURRENT_USER hive. When a user logs on, Windows maps that user’s key under HKEY_USERS (named after their SID) onto HKCU. It makes things so much easier. Since we’re not logged on to this system as that user, we don’t get the easy version.

If your users generally have one PC each, you will probably see several short SIDs and a pair of long ones under HKEY_USERS. The long one without “_Classes” on the end is your user’s SID. But you can look up a user’s SID via PowerShell to be 100% sure.

So in my case, I’ll need to use HKEY_USERS\S-1-5-21-776511741-573735546-682002230-13423.

Put it all together.

Almost done, I swear. In regedit I connected to the remote computer, then browsed to the right user’s key under HKEY_USERS (that long SID we found earlier). From there I browsed to the location I found earlier, Software\Microsoft\Windows\CurrentVersion\Internet Settings, and opened the SecureProtocols value. Finally, I set it to the number I calculated, 672 (decimal).

Sites are fixing their SSL settings as fast as they can, so don’t just set something like this and forget it. Periodically test the sites your users require to see if they work with SSL 2.0 and SSL 3.0 disabled. Once they do, you can undo your changes.