Let’s face it. Backups are great, but backup systems are even better. And one great tool for cloning your critical servers is rsync. Whether you clone your file systems to second disks or to entirely different servers, rsync can help get the synchronization done cleanly and efficiently.
In fact, rsync is one of the best tools for replicating files, directories full of files, and entire file systems — and for keeping collections of files on multiple systems in sync. It’s both wonderfully efficient and extremely versatile. So let’s take a look at how this tool works and see just how easily you can achieve workable redundancy with this very clever tool.
From single files to file systems
While the rsync command might not come to mind as a tool for moving a single file from one place to another, it definitely will do this for you. And, if the file that you’re moving is large and you just happen to have an older copy of it sitting on the remote system, you might gain some advantage from rsync’s ability to update files by transmitting only the differences between the source and destination files. This feature is rsync’s primary claim to fame and it makes the tool very efficient, especially with respect to the network traffic that it generates. It does, however, mean that you must have rsync installed on both of the systems involved.
The primary advantage of rsync is that, under most conditions, it copies only what it needs to copy. Have a large file that you need to synchronize on a remote server and in which only a single byte is different? No problem. After coordinating with rsync on the remote system to determine what it needs to send, rsync transmits just the small block of data containing that byte rather than the whole file. Depending on the file you’re copying, this behavior can save you a lot of time and network bandwidth.
Copying single files
The simplest form of rsync command looks like this:
$ rsync helloworld.py /tmp
That’s basically just a copy-from-here-to-there command, and in such a simple example rsync isn’t likely to shine. Like scp, though, rsync can push files to a remote system or pull files from one; just reverse the order of the arguments.
$ rsync localfile remote-server:/tmp
$ rsync remote-server:/tmp/remotefile /tmp
Rsync also offers the generally useful advantage of creating a missing destination directory if you just end your destination argument with a / as shown in the example below (third line). You can, however, only go one level deep with this. If you need your copied file to be deeply nested in a new directory structure, try using mkdir -p with the full directory path first.
$ ls -l /tmp/uploads
ls: /tmp/uploads: No such file or directory
$ rsync helloworld.py /tmp/uploads/
$ ls -l /tmp/uploads
total 4
-rw-r--r-- 1 sbob staff 1237 Feb 29 16:54 helloworld.py
Here, we’re copying to a local directory and providing a subdirectory that we want to create. The first command was run just to show that the directory didn’t already exist.
And, yes, this technique works whether you’re copying files to a local or to a remote location.
$ rsync helloworld.py remote-server:/tmp/uploads/
Copying entire directories
Copying an entire directory takes almost no additional effort. In the example below, we’re copying a directory, rather than a single file.
$ rsync -av localdir remote-server:/home/sbob
building file list ... done
localdir/
localdir/phase1
localdir/phase2
localdir/phase3
localdir/completion/
localdir/completion/phase4
sent 32535 bytes received 120 bytes 178.00 bytes/sec
total size is 0 speedup is 0.00
Notice that I tossed in a couple of options with this command: -a and -v. The -a option is a little deceptive. It means “archive mode”, which is a shortcut that takes the place of a string of options: -r, -l, -p, -t, -g, -o, and -D. With this single option, the command runs recursively; copies symbolic links as symbolic links (i.e., rather than creating regular files); and preserves permissions, time stamps, group and owner settings, devices, and special files. In other words, you get the behavior that you’re likely to want when replicating a group of files: the copies match the originals in both content and metadata.
In the example below, we copy a single fairly large file. From the numbers shown, you can see that the file was compressed (see the “sent bytes” figure) and that a significant speedup in the transfer was achieved.
$ ls -l bigfile
-rwxr--r-- 1 sbob staff 7350358 Aug 3 2015 bigfile
$ rsync -avzh bigfile remote-server:/tmp
building file list ... done
bigfile
sent 1.21M bytes received 42 bytes 268.23K bytes/sec
total size is 7.35M speedup is 6.09
The -h and -z arguments that we’ve added above give us more human-readable output during the copying process and ensure that files are compressed during transmission. Some file types, such as files that are already compressed (mp3, mp4, and jpg files, for example), will not be compressed even with this option, presumably because little would be gained in the overall file size.
Keep in mind, however, that when you elect to compress your files during transmission, you’re trading CPU time (on both ends of the transfer) for network bandwidth. Unless your network is slow or very busy, you might not want to bother.
So far, we’ve used the arguments shown below, but type rsync --help and rsync will gladly provide you with a list of its several pages’ worth of options and short descriptions of what they all mean.
- -a = archive mode (a combination of arguments that works for replication)
- -v = verbose
- -z = compress the file during the transfer
- -h = show numbers in human-readable format
Another nice option that rsync offers is a feature called “dry run”. Using the -n option or --dry-run, rsync will show you what it would do if you ran the command for real. Just don’t get too excited about the speedup figure, as it will be grossly inflated since you’re not actually moving files.
$ rsync -av --dry-run bin remote-server:~sbob
building file list ... done
bin/
bin/checkStuff
bin/checkm
bin/chkBackups
...
bin/vpn-users
bin/warnings
sent 3450 bytes received 704 bytes 1661.60 bytes/sec
total size is 2243025 speedup is 539.97
With and without passwords
Rsync will typically require passwords but, like ssh, allows you to run without being prompted if you’re set up to run ssh commands in password-free fashion like I am (i.e., if you’ve set up your ssh keys and authorized_keys files). This allows you to run rsync commands in hands-off fashion. For example, you can set up cron jobs that can greatly simplify the job of keeping important file collections synchronized.
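As a rough sketch of what such a hands-off setup might look like, the snippet below assembles a crontab line for a nightly sync. The account, host, directories, schedule, and log file are all assumptions chosen for illustration, and the key setup commands appear only as comments because they need to be run once by hand. The ssh option BatchMode=yes is used so that ssh fails instead of waiting for a password that nobody is there to type.

```shell
#!/bin/sh
# Sketch: build a crontab entry for an unattended rsync over ssh.

# One-time setup, run by hand (key type and host are illustrative):
#   ssh-keygen -t ed25519
#   ssh-copy-id sbob@remote-server

# Hypothetical names for illustration.
remote="sbob@remote-server"
src="$HOME/bin/"          # trailing slash: copy the directory's contents
dest="bin/"

# BatchMode=yes makes ssh fail rather than prompt, which suits cron.
sync_cmd="rsync -az -e 'ssh -o BatchMode=yes' $src $remote:$dest"

# Crontab entry: run at 02:30 every night and keep a log.
cron_line="30 2 * * * $sync_cmd >> \$HOME/rsync-bin.log 2>&1"
printf '%s\n' "$cron_line"
```

The printed line can then be pasted into your crontab with crontab -e. It is worth running the rsync command once interactively first to confirm that it completes without asking for a password.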
Closing thoughts on rsync …
First, synchronization is definitely the best feature of this tool — ensuring that both copies of a file or groups of files remain the same. There are numerous options that add to its flexibility. For example, if you remove a file from one system, you might or might not want rsync to remove the file from the remote system. You get to choose.
In general, rsync is faster when the files already exist on the receiving side because it can transfer just the file differences. But, again, rsync has to exist on both of the systems involved in the synchronization.
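By default, rsync uses ssh for transfers to and from remote systems. This means that you can trust it not to send your data over the wire insecurely, as long as you aren’t connecting directly to an rsync daemon (the rsync:// or host::module forms), which does not encrypt traffic. It also means that you can set it up to run unattended (e.g., using cron).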
While the copy-as-little-as-possible behavior of rsync is one of its more appealing characteristics, it’s really only one of many things rsync can do for you. In my next post, I’ll provide some more complex rsync commands to demonstrate the many ways the command can be made to do just what you want.