rdiff-backup FAQ

Table of contents

  1. What do the different verbosity levels mean?
  2. Does rdiff-backup run under Windows?
  3. Does rdiff-backup run under Mac OS X?
  4. My backup set contains some files that I just realized I don't want/need backed up. How do I remove them from the backup volume to save space?
  5. Does rdiff-backup work under Solaris?
  6. How fast is rdiff-backup? Can it be run on large data sets?
  7. What do the various fields mean in the session statistics and directory statistics files?
  8. Is there some way to limit rdiff-backup's bandwidth usage, as in rsync's --bwlimit option?
  9. How much memory should rdiff-backup use? Is there a memory leak?
  10. I use NFS and keep getting some error that includes "OSError: [Errno 39] Directory not empty"
  11. For some reason rdiff-backup failed while backing up. Now every time it runs it says "regressing destination" and then fails again. What should I do?
  12. Where does rdiff-backup need free space and how much is required? What is the problem if rdiff-backup says "ValueError: Incorrect length of data produced"?

Questions and Answers

  1. What do the different verbosity levels mean?

    There is no formal specification, but here is a rough description (settings are always cumulative, so 5 displays everything 4 does):

    0  No information given
    1  Fatal errors displayed
    2  Warnings
    3  Important messages, and maybe later some global statistics (default)
    4  Some global settings, miscellaneous messages
    5  Mentions which files were changed
    6  More information on each file processed
    7  More information on various things
    8  All logging is dated
    9  Details on which objects are moving across the connection
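
    Because the levels are cumulative, a message belonging to level k should appear whenever the chosen verbosity is k or higher. The tiny Python sketch below only restates that rule. The function name is_shown is hypothetical and is not part of rdiff-backup.

```python
DEFAULT_VERBOSITY = 3  # level 3 is the default listed above


def is_shown(message_level, verbosity=DEFAULT_VERBOSITY):
    """Return True if a message of level message_level appears at verbosity.

    Hypothetical helper. Levels are cumulative: verbosity 5 shows everything
    verbosity 4 does, and verbosity 0 shows nothing at all.
    """
    return 1 <= message_level <= verbosity
```
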
  2. Does rdiff-backup run under Windows?

    Yes, apparently it is possible. First, follow Jason Piterak's instructions:

    Subject: Cygwin rdiff-backup
    From: Jason  Piterak <Jason_Piterak@c-i-s.com>
    Date: Mon, 4 Feb 2002 16:54:24 -0500 (13:54 PST)
    To: rdiff-backup@keywest.Stanford.EDU
    
    Hello all,
      On a lark, I thought I would attempt to get rdiff-backup to work under
    Windows98 under Cygwin. We have a number of NT/Win2K servers in the field
    that I'd love to be backing up via rdiff-backup, and this was the start of
    getting that working.
    
    SUMMARY:
      o You can get all the pieces for rdiff-backup working under Cygwin.
      o The backup process works up to the point of writing any files with
    timestamps.
          ... This is because the ':' character is reserved for Alternate Data
    Stream (ADS) file designations under NTFS.
    
    HOW TO GET IT WORKING (to a point, anyway):
      o Install Cygwin
      o Download the Python 2.2 update through the Cygwin installer and install.
      o Download the librsync libraries from the usual place, but before
    compiling...
      o Cygwin does not use/provide glibc. Because of this, you have to repoint
    some header files in the Makefile:
    
       -- Make sure that you have /usr/include/inttypes.h
          redirected to /usr/include/sys/types.h. Do this by:
    
          create a file /usr/include/inttypes.h with the contents:
          #include <sys/types.h>
      o Put rdiff-backup in your PATH, as you normally would.
    
    

    Note that the above information is old, and the steps may no longer apply to current versions of Cygwin, Python, and librsync.

    Although some Windows filesystems lack features like FIFOs, case-sensitive filenames, or filenames containing colons (":"), rdiff-backup should autodetect these limitations and compensate for them.
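
    To give a rough idea of how this kind of autodetection can work, the Python sketch below probes a directory by creating throwaway files and checking how the filesystem treats them. The helper probe_filesystem is hypothetical and is not rdiff-backup's actual detection code. It assumes the directory is writable.

```python
import os
import tempfile


def probe_filesystem(directory):
    """Guess whether the filesystem holding `directory` is case sensitive
    and accepts ':' in filenames. Hypothetical helper for illustration."""
    probe_dir = tempfile.mkdtemp(dir=directory)  # scratch area, removed below
    results = {}
    try:
        # Case sensitivity: create "Probe", then see whether "probe" also exists.
        open(os.path.join(probe_dir, "Probe"), "w").close()
        results["case_sensitive"] = not os.path.exists(
            os.path.join(probe_dir, "probe"))

        # Colons: try "a:b" and check that a file with exactly that name appears
        # (on NTFS the colon would instead address an alternate data stream).
        try:
            open(os.path.join(probe_dir, "a:b"), "w").close()
            results["colons_ok"] = "a:b" in os.listdir(probe_dir)
        except OSError:
            results["colons_ok"] = False
    finally:
        # Remove whatever the probe created, then the scratch directory itself.
        for name in os.listdir(probe_dir):
            os.remove(os.path.join(probe_dir, name))
        os.rmdir(probe_dir)
    return results
```
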

  3. Does rdiff-backup run under Mac OS X?

    Yes, quite a few people seem to be using rdiff-backup under Mac OS X. rdiff-backup can also back up resource forks to a traditional Unix filesystem, which can be a handy feature for Mac users. The easiest option is probably to use Fink (http://fink.sourceforge.net/), which can install rdiff-backup automatically for you. If you want to build rdiff-backup yourself, see this message from Gerd Knops:

    From: Gerd Knops <gerti@bitart.com>
    Date: Thu, 3 Oct 2002 03:56:47 -0500 (01:56 PDT)
    
    [parts of original message deleted]
    these instructions build it fine with all tests running OK
    (librsync-0.9.5.1 on OS X 10.2.1):
    
    	aclocal
    	autoconf
    	automake --foreign --add-missing
    	env CFLAGS=-no-cpp-precomp ./configure
    	make
    	make install
  4. My backup set contains some files that I just realized I don't want/need backed up. How do I remove them from the backup volume to save space?

    The only official way to remove files from an rdiff-backup repository is by letting them expire using the --remove-older-than option. Deleting increments from the rdiff-backup-data directory will prevent you from recovering those files, but shouldn't prevent the rest of the repository from being restored.

  5. Does rdiff-backup work under Solaris?

    There may be a problem with rdiff-backup and Solaris' libthread. Running "ulimit -n unlimited" before starting rdiff-backup, on both the local and the remote machine, may fix the problem. Here is a post by Kevin Spicer on the subject:

    Subject: RE: Crash report....still not^H^H^H working
    From: "Spicer, Kevin" <kevin.spicer@bmrb.co.uk>
    Date: Sat, 11 May 2002 23:36:42 +0100
    To: rdiff-backup@keywest.Stanford.EDU
    
    Quick mail to follow up on this..
    My rdiff backup (on Solaris 2.6 if you remember) has now worked
    reliably for nearly two weeks after I added...
    
        ulimit -n unlimited
    
    to the start of my cron job and created a wrapper script on the remote
    machine which looked like this...
    
        ulimit -n unlimited
        rdiff-backup --server
        exit
    
    And changed the remote schema on the command line of rdiff-backup to
    call the wrapper script rather than rdiff-backup itself on the remote
    machine.  As for the /dev/zero thing I've done a bit of Googleing and
    it seems that /dev/zero is used internally by libthread on Solaris
    (which doesn't really explain why its opening more than 64 files - but
    at least I think I've now got round it).
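
    The same kind of change can be made from inside a Python process with the standard resource module (Unix only). This sketch raises the soft open-file limit up to the hard limit, which is as far as an unprivileged process is allowed to go. The helper name raise_open_file_limit is hypothetical. It illustrates what "ulimit -n" adjusts and does not describe anything rdiff-backup does on its own.

```python
import resource


def raise_open_file_limit():
    """Raise the soft RLIMIT_NOFILE toward the hard limit (hypothetical helper).

    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            # Unprivileged processes may raise the soft limit up to the hard one.
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # Some systems refuse (e.g. an "unlimited" hard limit); keep as is.
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```
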
    
  6. How fast is rdiff-backup? Can it be run on large data sets?

    rdiff-backup can be limited by the CPU, disk IO, or available bandwidth, and the length of a session can be affected by the amount of data, how much the data changed, and how many files are present. That said, in the typical case the number/size of changed files is relatively small compared to that of unchanged files, and rdiff-backup is often either CPU or bandwidth bound, and takes time proportional to the total number of files. Initial mirrorings will usually be bandwidth or disk bound, and will take much longer than subsequent updates.

    To give one arbitrary data point, when I back up my personal HD locally (about 36GB, 530000 files, maybe 500 MB turnover, Athlon 2000, 7200 RPM IDE disks, version 0.12.2) rdiff-backup takes about 15 minutes and is usually CPU bound.
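
    If the run time is proportional to the number of files, this single data point gives a crude estimate for other trees. The sketch assumes purely linear scaling on comparable hardware, so treat the result as an order-of-magnitude guess only. estimate_minutes is a hypothetical helper.

```python
def estimate_minutes(file_count, ref_files=530_000, ref_minutes=15.0):
    """Scale the reference run above linearly by the number of files.

    Real runs also depend on CPU, disk, bandwidth, and how much data changed.
    """
    return ref_minutes * file_count / ref_files


# The reference run works out to roughly 589 files per second.
per_second = 530_000 / (15 * 60)
```

    For example, estimate_minutes(1_060_000) gives 30.0 minutes.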

  7. What do the various fields mean in the session statistics and directory statistics files?

    Let's examine an example session statistics file:

    StartTime 1028200920.44 (Thu Aug  1 04:22:00 2002)
    EndTime 1028203082.77 (Thu Aug  1 04:58:02 2002)
    ElapsedTime 2162.33 (36 minutes 2.33 seconds)
    SourceFiles 494619
    SourceFileSize 8535991560 (7.95 GB)
    MirrorFiles 493797
    MirrorFileSize 8521756994 (7.94 GB)
    NewFiles 1053
    NewFileSize 23601632 (22.5 MB)
    DeletedFiles 231
    DeletedFileSize 10346238 (9.87 MB)
    ChangedFiles 572
    ChangedSourceSize 86207321 (82.2 MB)
    ChangedMirrorSize 85228149 (81.3 MB)
    IncrementFiles 1857
    IncrementFileSize 13799799 (13.2 MB)
    TotalDestinationSizeChange 28034365 (26.7 MB)
    Errors 0

    StartTime and EndTime are measured in seconds since the epoch. ElapsedTime is just EndTime - StartTime, the length of the rdiff-backup session.
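
    Here is a minimal Python sketch for reading such a file. It assumes that each line is a field name followed by a number and an optional parenthesised gloss, as in the example above. The function name parse_session_statistics is hypothetical. The final check confirms that ElapsedTime equals EndTime - StartTime for the first three lines of the sample.

```python
def parse_session_statistics(text):
    """Parse 'Name value (human-readable gloss)' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Field name, then the raw number; the parenthesised gloss is ignored.
            stats[fields[0]] = float(fields[1])
    return stats


SAMPLE = """\
StartTime 1028200920.44 (Thu Aug  1 04:22:00 2002)
EndTime 1028203082.77 (Thu Aug  1 04:58:02 2002)
ElapsedTime 2162.33 (36 minutes 2.33 seconds)
"""

stats = parse_session_statistics(SAMPLE)
# ElapsedTime should equal EndTime - StartTime, up to floating-point rounding.
assert abs((stats["EndTime"] - stats["StartTime"]) - stats["ElapsedTime"]) < 0.01
```
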

    SourceFiles is the number of files found in the source directory, and SourceFileSize is the total size of those files. MirrorFiles is the number of files found in the mirror directory (not including the rdiff-backup-data directory), and MirrorFileSize is the total size of those files. All sizes are in bytes. If the source directory hasn't changed since the last backup, MirrorFiles == SourceFiles and MirrorFileSize == SourceFileSize.

    NewFiles and NewFileSize are the total number and size of the files found in the source directory but not in the mirror directory. They are new since the last backup.

    DeletedFiles and DeletedFileSize are the total number and size of the files found in the mirror directory but not in the source directory. They have been deleted since the last backup.

    ChangedFiles is the number of files that exist in both the mirror and source directories and have changed since the previous backup. ChangedSourceSize is their total size in the source directory, and ChangedMirrorSize is their total size in the mirror directory.
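
    Taken together, these definitions imply that counts and sizes should balance. Each source file is either new or already mirrored, and each changed file appears on both sides with possibly different sizes. The identities below are inferred from the definitions above rather than documented guarantees, but the example file satisfies them exactly.

```python
# Figures copied from the example session statistics file above.
s = {
    "SourceFiles": 494619, "SourceFileSize": 8535991560,
    "MirrorFiles": 493797, "MirrorFileSize": 8521756994,
    "NewFiles": 1053, "NewFileSize": 23601632,
    "DeletedFiles": 231, "DeletedFileSize": 10346238,
    "ChangedSourceSize": 86207321, "ChangedMirrorSize": 85228149,
}

# source = mirror + new - deleted (changed files are counted on both sides)
file_delta = s["NewFiles"] - s["DeletedFiles"]
# sizes also shift by how much the changed files grew or shrank
size_delta = (s["NewFileSize"] - s["DeletedFileSize"]
              + s["ChangedSourceSize"] - s["ChangedMirrorSize"])

print(s["SourceFiles"] - s["MirrorFiles"], file_delta)        # 822 822
print(s["SourceFileSize"] - s["MirrorFileSize"], size_delta)  # 14234566 14234566
```
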

    IncrementFiles is the number of increment files written to the rdiff-backup-data directory, and IncrementFileSize is their total size. Generally one increment file will be written for every new, deleted, and changed file.

    TotalDestinationSizeChange is the number of bytes the destination directory as a whole (mirror portion and rdiff-backup-data directory) has grown during the given rdiff-backup session. This is usually close to IncrementFileSize + NewFileSize - DeletedFileSize + ChangedSourceSize - ChangedMirrorSize, but it also includes the space taken up by the hardlink_data file to record hard links.
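
    Using the example file above, the approximation can be checked directly. The sketch below (estimated_destination_change is a hypothetical name) leaves out the hardlink_data file. In this particular session the estimate happens to match TotalDestinationSizeChange to the byte.

```python
def estimated_destination_change(stats):
    """IncrementFileSize + NewFileSize - DeletedFileSize
    + ChangedSourceSize - ChangedMirrorSize (hardlink_data not included)."""
    return (stats["IncrementFileSize"] + stats["NewFileSize"]
            - stats["DeletedFileSize"] + stats["ChangedSourceSize"]
            - stats["ChangedMirrorSize"])


example = {
    "IncrementFileSize": 13799799, "NewFileSize": 23601632,
    "DeletedFileSize": 10346238, "ChangedSourceSize": 86207321,
    "ChangedMirrorSize": 85228149,
}
print(estimated_destination_change(example))  # 28034365, as reported above
```
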

  8. Is there some way to limit rdiff-backup's bandwidth usage, as in rsync's --bwlimit option?

    There is no internal rdiff-backup option to do this. However, external utilities such as cstream can be used to limit bandwidth explicitly. trevor@tecnopolis.ca writes:

    rdiff-backup --remote-schema \
      'cstream -v 1 -t 10000 | ssh %s '\''rdiff-backup --server'\'' | cstream -t 20000' \
      'netbak@foo.bar.com::/mnt/backup' localbakdir
    
    (must run from a bsh-type shell, not a csh type)
    
    That would apply a limit in both directions [10000 bytes/sec outgoing,
    20000 bytes/sec incoming].  I don't think you'd ever really want to do
    this though as really you just want to limit it in one direction.
    Also, note how I only -v 1 in one direction.  You probably don't want
    to output stats for both directions as it will confuse whatever script
    you have parsing the output.  I guess it wouldn't hurt for manual runs
    however.

    To limit bandwidth in only one direction, simply remove one of the cstream commands. Two cstream caveats may be worth mentioning:

    1. Because cstream is limiting the uncompressed data heading into or out of ssh, if ssh compression is turned on, cstream may be overly restrictive.
    2. cstream may be "bursty", limiting average bandwidth but allowing rdiff-backup to exceed it for significant periods.
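
    To make caveat 2 more concrete, here is a minimal Python pipe throttle that enforces only the average rate since it started. It is a hypothetical stand-in written for illustration and does not describe how cstream is implemented. Because only the average is enforced, a sender that stalls and then resumes can briefly run far above the limit while the average catches up.

```python
import time


def throttle(chunks, bytes_per_sec):
    """Yield byte chunks no faster than bytes_per_sec *on average*.

    Hypothetical illustration. If the producer stalls, later chunks pass
    through without delay until the average catches up: a burst.
    """
    start = time.monotonic()
    sent = 0
    for chunk in chunks:
        yield chunk
        sent += len(chunk)
        # How far ahead of the allowed schedule are we? Sleep off the excess.
        ahead = sent / bytes_per_sec - (time.monotonic() - start)
        if ahead > 0:
            time.sleep(ahead)


def throttled_copy(src, dst, bytes_per_sec, chunk_size=4096):
    """Copy binary file object src to dst through throttle(); return bytes copied."""
    total = 0
    for chunk in throttle(iter(lambda: src.read(chunk_size), b""), bytes_per_sec):
        dst.write(chunk)
        total += len(chunk)
    dst.flush()
    return total
```

    Called as throttled_copy(sys.stdin.buffer, sys.stdout.buffer, 10000), it could sit in a pipeline at the position cstream -t occupies in the example above.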

    Another option is to limit bandwidth at a lower (and perhaps more appropriate) level. Adam Lazur mentions The Wonder Shaper.

  9. How much memory should rdiff-backup use? Is there a memory leak?

    The amount of memory rdiff-backup uses should not depend much on the size of directories being processed. Keeping track of hard links may use up memory, so if you have, say, hundreds of thousands of files hard linked together, rdiff-backup may need tens of MB.

    If rdiff-backup seems to be leaking memory, it is probably because it is using an early version of librsync. librsync 0.9.5 leaks lots of memory. Later versions should not leak and are available from the librsync homepage.

  10. I use NFS and keep getting some error that includes "OSError: [Errno 39] Directory not empty"

    Several users have reported seeing errors that contain lines like this:

    File "/usr/lib/python2.2/site-packages/rdiff_backup/rpath.py",
        line 661, in rmdir
    OSError: [Errno 39] Directory not empty:
        '/nfs/backup/redfish/win/Program Files/Common Files/GMT/Banners/11132'
    Exception exceptions.TypeError: "'NoneType' object is not callable"
         in <bound method GzipFile.__del__ of

    All of these users were backing up onto NFS (Network File System). I think this is probably a bug in NFS, although tell me if you know how to make rdiff-backup more NFS-friendly. To avoid this problem, run rdiff-backup locally on both ends instead of over NFS. This should be faster anyway.

  11. For some reason rdiff-backup failed while backing up. Now every time it runs it says "regressing destination" and then fails again. What should I do?

    Firstly, this shouldn't happen. If it does, it indicates a corrupted destination directory, a bug in rdiff-backup, or some other serious recurring problem.

    However, here is a workaround that you might want to use, even though it probably won't solve the underlying problem: In the destination's rdiff-backup-data directory, there should be two "current_mirror" files, for instance:

    current_mirror.2003-09-07T16:43:00-07:00.data
    current_mirror.2003-09-08T04:22:01-07:00.data

    Delete the one with the earlier date. Also move the mirror_metadata file with the later date out of the way, because it was probably not written correctly when that session was aborted:

    mv mirror_metadata.2003-09-08T04:22:01-07:00.snapshot.gz aborted-metadata.2003-09-08T04:22:01-07:00.snapshot.gz

    The next time rdiff-backup runs it won't try regressing the destination. Metadata will be read from the file system, which may result in some extra files being backed up, but there shouldn't be any data loss.

  12. Where does rdiff-backup need free space and how much is required? What is the problem when rdiff-backup says "ValueError: Incorrect length of data produced"?

    When backing up, rdiff-backup needs free space in the mirror directory. The amount of free space required is usually a bit more than the size of the file being backed up, but can be as much as twice the size of the current file. For instance, suppose you ran rdiff-backup foo bar and the largest file, foo/largefile, was 1GB. Then rdiff-backup would need a bit more than 1GB of free space in the bar directory, and in the worst case up to 2GB.

    When restoring, rdiff-backup needs free space in the default temp directory. Under Unix systems this is usually the /tmp directory; see the entry for tempfile.tempdir in the Python tempfile docs for more information on the default temp directory. The amount of free space required can vary, but it is usually about the size of the largest file being restored.
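
    A rough pre-flight check along these lines can be scripted with Python's standard shutil and tempfile modules. The sketch below is hypothetical. It uses the largest file in the source tree as a stand-in for both the largest file to be backed up and the largest file likely to be restored, and it requires the mirror directory to exist already.

```python
import os
import shutil
import tempfile


def largest_file_size(top):
    """Size in bytes of the largest regular file under top (symlinks skipped)."""
    largest = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                largest = max(largest, os.path.getsize(path))
    return largest


def free_space_report(source, mirror):
    """Compare free space with the rules of thumb above (hypothetical helper)."""
    largest = largest_file_size(source)
    mirror_free = shutil.disk_usage(mirror).free
    temp_dir = tempfile.gettempdir()
    temp_free = shutil.disk_usage(temp_dir).free
    return {
        "largest_file": largest,
        # Backup: a bit more than the largest file, worst case about twice it.
        "backup_ok_typical": mirror_free > largest,
        "backup_ok_worst_case": mirror_free >= 2 * largest,
        # Restore: about the largest file, free in the default temp directory.
        "temp_dir": temp_dir,
        "restore_ok": temp_free >= largest,
    }
```
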

    Usually free space errors are intelligible, like IOError: [Errno 28] No space left on device or similar. However, due to a gzip quirk they may look like ValueError: Incorrect length of data produced.