Tue 23 Feb 2010
The best ways to copy large amounts of data between servers
Posted by Kris under Tech
As a sysadmin, I often have to copy large amounts of data between servers, and there are a few ways of doing this, each with its own pros and cons.
I’ll try to explain the most common methods here, and give examples and benchmarks.
1, Simple compression and encrypted transmission
This is probably the most common way to copy data: you simply tar and gzip the data, then send the archive to another machine over scp:
machine1$ tar -zcvf backup.tgz /home/backupdata/
machine1$ scp backup.tgz root@machine2:/home/backups/
machine2$ tar -zxvf backup.tgz
The problem with this method is that disk reads and writes can be intensive: you read the data from machine1's hard drive and write the archive back to the same drive during the gzip step, then read the archive again and write it to machine2's hard drive during the scp. This can slow down the process. You're also required to log in to machine2 and unpack the resulting file.
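The three steps above can be tried out locally before touching real servers. This is only a minimal sketch: the temporary directories are hypothetical stand-ins for machine1 and machine2, and a plain cp takes the place of scp. It also checks that the unpacked copy matches the source.

```shell
#!/bin/sh
# Local sketch: hypothetical temp directories stand in for the two machines.
set -e
work=$(mktemp -d)
mkdir -p "$work/machine1/backupdata" "$work/machine2"
printf 'line one\nline two\n' > "$work/machine1/backupdata/app.log"

# Step 1: archive and compress to an intermediate file (the extra disk write).
tar -C "$work/machine1" -zcf "$work/machine1/backup.tgz" backupdata

# Step 2: stand-in for scp; a real transfer would copy over the network.
cp "$work/machine1/backup.tgz" "$work/machine2/"

# Step 3: unpack on the destination and compare it against the source.
tar -C "$work/machine2" -zxf "$work/machine2/backup.tgz"
if diff -r "$work/machine1/backupdata" "$work/machine2/backupdata" >/dev/null; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "round trip: $roundtrip"
rm -rf "$work"
```

Note that the archive exists twice on disk (once per "machine") on top of the original data, which is the extra IO described above.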
2, rsync
rsync is good for syncing data between two machines, and for incremental copying (i.e. if a copy is interrupted, it can resume). I won’t go into the details here because there are many ways of using rsync and this article is mainly aimed at one-shot copying of data. It’s definitely worth checking ‘man rsync’ to see if it is relevant to your needs though.
I tend to use rsync for backups where I want to keep an exact copy of a directory on another server, and keep it up to date. rsync is useful here because any time I run the rsync command it will only copy changes over between the servers. An example of this usage is:
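machine1$ rsync -Ravr --delete --delete-after ./backupdata/ root@machine2:/home/backups/
The flags I’ve passed are: R – relative, r – recursive, a – archive mode (which already implies recursive), v – verbose. --delete removes files from the destination that no longer exist on the source, and --delete-after makes those deletions happen after the transfer rather than before it.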
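To see the effect of --delete without involving a second server, here is a rough local sketch. The temporary directories are hypothetical stand-ins for the two machines, and I've left out -R and the ssh destination to keep it short, so it is not the exact command above. It is skipped if rsync isn't installed.

```shell
#!/bin/sh
# Local sketch: hypothetical temp directories stand in for machine1 and machine2.
work=$(mktemp -d)
mkdir -p "$work/src/backupdata" "$work/dst"
echo "keep" > "$work/src/backupdata/keep.log"
echo "old"  > "$work/src/backupdata/old.log"

if command -v rsync >/dev/null 2>&1; then
    # First run copies everything.
    rsync -a --delete "$work/src/backupdata/" "$work/dst/backupdata/"
    # Remove a file at the source, then sync again.
    rm "$work/src/backupdata/old.log"
    rsync -a --delete "$work/src/backupdata/" "$work/dst/backupdata/"
    # Only files still present at the source should remain.
    remaining=$(ls "$work/dst/backupdata")
    echo "destination now holds: $remaining"
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

The second run only has to remove old.log from the destination; keep.log is unchanged, so it isn't transferred again.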
3, Gzip and SSH in one command
This will usually perform better than option 1 simply because you aren’t writing the gzipped data to the disk on machine1 before sending it to machine2. Basically, these are the commands from option 1 combined into one line: the data is gzipped and piped over ssh straight to tar on machine2, rather than being written to the disk on machine1 and then copied over.
machine1$ tar -zcvf - /home/backupdata/* | ssh root@machine2 "cd /home/backups/; tar -zxvf -"
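The key part is that tar writes the archive to stdout ("-f -") and the receiving tar reads it from stdin. A minimal local sketch of the same pipe, with hypothetical temporary directories standing in for the two machines and ssh left out of the middle:

```shell
#!/bin/sh
# Local sketch: hypothetical temp directories stand in for the two machines.
set -e
work=$(mktemp -d)
mkdir -p "$work/src/backupdata" "$work/dst"
printf 'entry 1\nentry 2\n' > "$work/src/backupdata/app.log"

# The sender writes the compressed archive to stdout ("-f -") and the
# receiver reads it from stdin. In the real command, ssh sits in this pipe.
tar -C "$work/src" -zcf - backupdata | tar -C "$work/dst" -zxf -

# No .tgz file was written to disk on either side.
archives=$(find "$work" -name '*.tgz' | wc -l | tr -d ' ')
copied=$(wc -l < "$work/dst/backupdata/app.log" | tr -d ' ')
echo "archives on disk: $archives, lines copied: $copied"
rm -rf "$work"
```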
4, Netcat
In theory, this should be the best solution because the data isn’t encrypted or decrypted as it is with ssh/scp (a little less CPU overhead), and there isn’t any needless IO activity as in option 1. First you will need to tell machine2 to ‘listen’ on a specific port (9876 in this example), and uncompress anything which arrives on that port:
machine2$ nc -l -p 9876 | tar -zxvf -
Then gzip and send the data from machine1 to the specified port on machine2:
machine1$ tar -zcvf - /home/backupdata/ | nc -q 1 machine2 9876
As we’re using the verbose (-v) option in tar, you will see the output on both machines. I find netcat more convenient if I need to send a lot of data from one server to another, as I can just leave netcat running on machine2 and send data to it multiple times from machine1 (or any other server).
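When picking a port for the listener, keep in mind that TCP port numbers are 16-bit, so the valid range is 1 to 65535 (and ports below 1024 normally need root). A small sketch of a check you could run before starting netcat; valid_port is a hypothetical helper of my own, not part of netcat:

```shell
#!/bin/sh
# valid_port is a hypothetical helper for illustration, not a netcat feature.
valid_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    # TCP ports are 16-bit values; 0 is reserved.
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

for port in 9876 65535 70000 abc; do
    if valid_port "$port"; then
        echo "$port: usable"
    else
        echo "$port: not a valid TCP port"
    fi
done
```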
5, SMB or NFS
These are also decent alternatives, and if you already have NFS or SMB shares set up then it may be worth just compressing the data and copying it to a mounted share using ‘cp’. An example, assuming the share name is ‘backups’:
machine1$ tar -zcvf backup.tgz /home/backupdata/
(SMB) machine1$ mount -t smbfs //machine2/backups /mnt/machine2
(NFS) machine1$ mount machine2:/home/backups /mnt/machine2
machine1$ cp backup.tgz /mnt/machine2/
Benchmarks
For my tests, I’m using two servers with the following basic specs:
Dual CPU Intel E5520
16GB RAM
Hardware RAID1.
I’ll be sending a few log files which total 7GB of data.
1 – The initial gzip took 114 seconds and scp took 6 seconds for a total of 120 seconds.
2 – This took a total of around 140 seconds, so it was the slowest in this test. You would notice the benefits of rsync when doing incremental backups, where only the differences are copied over rather than redoing the whole copy.
3 – Total of 119 seconds (so not really any faster but you don’t need to log in to machine2 to unzip).
4 – Total of 109 seconds, so this is the fastest option but not by a huge margin.
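To put those totals into perspective, they can be converted into an approximate end-to-end rate. This is just back-of-the-envelope arithmetic on the figures above: I'm assuming 7GB means 7 × 1024 MB, and the rates include compression time, so they are not raw network speeds. The throughput function is a hypothetical helper.

```shell
#!/bin/sh
# Assumption: 7GB is treated as 7 * 1024 MB.
size_mb=$((7 * 1024))

# Hypothetical helper: $1 = size in MB, $2 = elapsed seconds.
# Prints MB/s rounded to one decimal place.
throughput() {
    LC_ALL=C awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f", mb / s }'
}

# option:seconds pairs taken from the benchmark totals above.
for entry in "1:120" "2:140" "3:119" "4:109"; do
    option=${entry%%:*}
    seconds=${entry##*:}
    echo "option $option: $(throughput "$size_mb" "$seconds") MB/s"
done
```

On those assumptions the spread is roughly 51 MB/s for rsync up to about 66 MB/s for netcat, which matches the point that netcat wins but not by a huge margin.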







