Benchmark - What is the quickest way to copy files?

Benchmark - What is the quickest way to copy files?

Testing cp, rsync and tar

Purpose

In software development, there is sometimes a need to copy a large amount of file from one directory to another. For instance, when you have shared volumes in a Docker environment with different services, some services need to have access to the application source files, you put the source code in a volume which is shared between these services. On an initial boot up, or on an update of the app image, there is a need to update the files in the volume.

Test arrangement

For our testing scenario, we try to use a common use case. It is an application image with 140195 files. I used an installed Magento 2 within its all its dependencies.

As a benchmark tool for the CLI, I used hyperfine on macOS.

Test use cases

  1. On an initial boot up, the target directory is empty. We have to copy all files from one directory to another.
  2. The second use case to just update files where is a change.

CP

CP is a very basic functionality in Linux based shells. It’s made to copy files. It is possible to copy recursively. It keeps the directory structure. The target directory should be empty. Perfect! Let’s get do it!

parameterdescription
-R, -r, --recursivecopy directories recursively
-u, --updatecopy only when the SOURCE file is newer than the destination file or when the destination file is missing
-p, --preserve[=ATTR_LIST]preserve the specified attributes (default: mode, ownership, timestamps), if possible additional attributes: context, links, xattr, all

Test with cp

hyperfine \
    'cp -upr --sparse=never src tmp-cp' \
    'cp -upr --sparse=never src tmp-cp'

It copies file by file from source to target without checking if there are any changes in it.

hyperfine 'cp -upr --sparse=never src tmp-cp' 'cp -upr --sparse=never src tmp-cp'
Benchmark #1: cp -upr --sparse=never src tmp-cp
  Time (mean ± σ):     11.744 s ± 32.450 s    [User: 775.2 ms, System: 2396.0 ms]
  Range (min … max):    1.284 s … 104.094 s

Benchmark #2: cp -upr --sparse=never src tmp-cp
  Time (mean ± σ):      1.397 s ±  0.086 s    [User: 326.0 ms, System: 1033.2 ms]
  Range (min … max):    1.308 s …  1.511 s

rsync

Rsync copies files either to or from a remote host, or locally on the current host as like CP. It tries to improve the speed by using an index file. Rsync finds files that need to be transferred using a "quick check" algorithm that looks for files that have changed in size or in last-modified time. Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file’s data does not need to be updated.

So we assume that there is no improvement on the first copy, but on updating files.

parameterdescription
-u, --updateskip files that are newer on the receiver
-p, --permspreserve permissions
--del, --deletedelete extraneous files from dest dirs
--forceforce deletion of dirs even if not empty
--size-onlyskip files that match in size
-r, --recursiverecurse into directories
-q, --quietsuppress non-error messages
-z, --compresscompress file data during the transfer

Test with rsync

hyperfine \
    'rsync -a src tmp-rsync' \
    'rsync -a src tmp-rsync'

Benchmark #1: rsync -a src tmp-rsync
  Time (mean ± σ):      7.710 s ± 19.948 s    [User: 1.330 s, System: 2.346 s]
  Range (min … max):    1.254 s … 64.481 s

Benchmark #2: rsync -a src tmp-rsync
  Time (mean ± σ):      1.424 s ±  0.147 s    [User: 543.1 ms, System: 837.6 ms]
  Range (min … max):    1.281 s …  1.779 s

Tar

tar creates and manipulates streaming archive files.

ParameterDescription
-f fileRead the archive from or write the archive to the specified file.
-q (--fast-read)Extract, or list only the first archive entry that matches each pattern or filename operand. Exit as soon as each specified pattern or filename has been matched. By default, the archive is always read to the very end, since there can be multiple entries with the same name and, by convention, later entries overwrite earlier entries. This option is provided as a performance optimization.
-xExtract to disk from the archive. If a file with the same name appears more than once in the archive, each copy will be extracted, with later copies overwriting (replacing) earlier copies.
-C, --directory=DIRChange to DIR before performing any operations. This option is order-sensitive, i.e. it affects all options that follow.

Test with tar

hyperfine 'tar -qxf src.tar -C tmp-tar' 'tar -qxf src.tar -C tmp-tar'
Benchmark #1: tar -xf src.tar
  Time (mean ± σ):     53.214 s ±  4.403 s    [User: 3.303 s, System: 12.450 s]
  Range (min … max):   50.510 s … 65.249 s
Benchmark #1: tar -xf src.tar -C tmp-tar
  Time (mean ± σ):     54.455 s ±  0.434 s    [User: 3.551 s, System: 12.724 s]
  Range (min … max):   54.006 s … 55.305 s

Consumption

image.png

These tests have shown that CP is slightly faster than RSYNC when updating. However, CP takes significantly more time to initially copy the files. TAR is faster on the first copy as CP, but it has no advantages over RSYNC.