Source code is, as the name already suggests, the truly valuable data in software development. A proper backup of this data is therefore absolutely required.
Backups should protect against hardware failures (hard disk crash, …) but ideally also against erroneous software (defects in the version control software itself, DB corruption, …). Problems usually creep in gradually and end in a corrupted DB. The danger of a slowly degrading system is that you might back up an already corrupted DB. So, if you are lucky, you will have a full crash and a healthy backup. If your backup is already corrupted as well, you will face the same problems after restoring it as with the original DB.
This article provides hints on possible backup solutions for Subversion repositories to help you choose an appropriate one.
A Subversion repository is concentrated in one directory (conf, dav, db, hooks, locks). Each repository holds the bare source in the db directory. The structure of this db directory varies slightly from one Subversion version to the next. The data is stored in the form of delta files containing the so-called changesets (consistent sets of changes across several files). In the versions I looked at, the properties are also stored in separate files. Next to db, the repository directory also contains extra data, such as the hook scripts (in the hooks directory) and the repository configuration (in the conf directory). Besides this, transactional data (valid during a commit, for example) is also stored in the repository until the action can be applied all at once (Subversion's atomic commit feature). The additive nature of a version control system makes it well suited to incremental backup procedures.
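To make the layout described above concrete, here is a minimal sketch that checks whether a directory contains the sub-directories a backup should cover. The function name `missing_repo_dirs` and the list of expected entries are assumptions for illustration; `dav` is deliberately left out on the assumption that not every Subversion version creates it.

```python
from pathlib import Path

# Sub-directories named in the text. Real repositories may contain
# additional files; this list is an assumption for illustration only.
EXPECTED_DIRS = ("conf", "db", "hooks", "locks")


def missing_repo_dirs(repo_path):
    """Return the expected sub-directories that are absent from repo_path."""
    root = Path(repo_path)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

An empty result only says that the directory looks like a repository; it says nothing about whether the data inside db is healthy.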
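For backing up the repositories, you have the following options (see the svn book for details on how to perform them): 1) a naive file copy made with a standard backup tool, 2) svnadmin dump, 3) svnadmin hotcopy, 4) svnsync and 5) git-svn.
The solutions differ in the following criteria: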
| Criterion | 1) naive | 2) dump | 3) hotcopy | 4) sync | 5) git-svn |
|---|---|---|---|---|---|
| c1) hook files | yes | - | yes | - | - |
| c2) config files | yes | - | yes | - | - |
| d) backup RO | yes | - | yes | yes* | yes (git) |
| e) restore time | File copy | Restore (long) | File copy | File copy* | dump → load |
| f) back-end independence | - | yes | - | - | yes |
* Take care of the UUID of the repository: it is stored in the db directory and can be changed manually to match the original one. If the UUIDs do not match, relocating a working copy to the backup repository will not succeed.
Another pitfall is that, in the case of svnsync and git-svn, the user performing the backup needs an account with full access to all paths in the source repository. Failing to provide that will result in an incomplete backup.
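Instead of editing the UUID by hand, it can also be transferred with the command line tools. The following is a minimal sketch that reads the UUID with `svnlook uuid` and applies it with `svnadmin setuuid`. The helper names and the injectable `run` parameter are assumptions for illustration, and both repositories are assumed to be reachable as local paths.

```python
import subprocess


def read_uuid_cmd(repo_path):
    """Command that prints the UUID of the repository at repo_path."""
    return ["svnlook", "uuid", str(repo_path)]


def set_uuid_cmd(repo_path, uuid):
    """Command that assigns the given UUID to the repository at repo_path."""
    return ["svnadmin", "setuuid", str(repo_path), uuid]


def copy_uuid(source_repo, backup_repo, run=subprocess.run):
    """Read the UUID of source_repo and apply it to backup_repo."""
    result = run(read_uuid_cmd(source_repo), check=True,
                 capture_output=True, text=True)
    uuid = result.stdout.strip()
    run(set_uuid_cmd(backup_repo, uuid), check=True)
    return uuid
```

A call such as `copy_uuid("/srv/svn/project", "/backup/svn/project")` (both paths hypothetical) would be run once after the backup repository has been created.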
Restoring a backup is quite time consuming in the case of dump files. These have to be loaded with “svnadmin load”, and rebuilding the repository can then take ages, depending on the amount of data to restore. Until it has finished you will not be able to work. The hotcopy can be re-used as it is, and so can the self-made copy created with a standard backup tool: you only need to copy the files back. The sync solution requires adapting the UUID and putting the hook files back in place (which you have to back up manually), but in principle it is also quite easy.
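For reference, restoring from a dump file boils down to creating an empty repository and feeding the dump to “svnadmin load” on standard input. The sketch below assumes an uncompressed dump file; the function name `restore_from_dump` and the injectable `run` parameter are illustrative only.

```python
import subprocess


def restore_from_dump(dump_file, new_repo, run=subprocess.run):
    """Create an empty repository and load an uncompressed dump into it."""
    # Step 1: create the empty target repository.
    run(["svnadmin", "create", str(new_repo)], check=True)
    # Step 2: replay every revision from the dump file via standard input.
    with open(dump_file, "rb") as dump:
        run(["svnadmin", "load", "--quiet", str(new_repo)],
            stdin=dump, check=True)
```

Hook scripts and configuration files are not part of a dump, so they still have to be copied back separately afterwards.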
Considering all these aspects, I recommend using svnsync to create a copy that can be updated incrementally at any time. Take care to copy the hook scripts and config files separately, as well as the UUID. (To operate svnsync you will need to enable the pre-revprop-change hook on the mirror repository, at least temporarily.)
I recommend against the hotcopy unless you have small repositories. With several GB of data, the inability of hotcopy to operate incrementally is a knock-out criterion (it scales badly with the repository size).
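The svnsync setup can be sketched roughly as follows, assuming a Unix-like backup host and a mirror repository that was already created with “svnadmin create”. All function names are hypothetical; the hook written here simply allows every revision property change, which is the simplest form that lets svnsync work.

```python
import shutil
import stat
from pathlib import Path


def enable_revprop_changes(mirror_path):
    """Install a pre-revprop-change hook on the mirror that allows all changes."""
    hook = Path(mirror_path) / "hooks" / "pre-revprop-change"
    hook.write_text("#!/bin/sh\nexit 0\n")
    # The hook must be executable, otherwise Subversion cannot run it.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook


def svnsync_cmds(mirror_path, source_url, first_run=False):
    """Build the svnsync commands; 'initialize' is only needed once."""
    mirror_url = Path(mirror_path).resolve().as_uri()
    cmds = []
    if first_run:
        cmds.append(["svnsync", "initialize", mirror_url, source_url])
    cmds.append(["svnsync", "synchronize", mirror_url])
    return cmds


def copy_hooks_and_conf(source_repo, target_dir):
    """Copy the hooks and conf directories, which svnsync does not transfer."""
    for name in ("hooks", "conf"):
        shutil.copytree(Path(source_repo) / name, Path(target_dir) / name,
                        dirs_exist_ok=True)
```

Each command list returned by `svnsync_cmds` can be passed to `subprocess.run(cmd, check=True)`. `copy_hooks_and_conf` needs file system access to the source repository, and `dirs_exist_ok` requires Python 3.8 or later.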
Besides this, a regular textual dump of the repositories with “svnadmin dump” is advised to cover the scenario where the normal backup is unusable. This may be quite paranoid, but it is there as a fallback for the worst case.
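Such a regular dump could look roughly like the sketch below, which streams the output of “svnadmin dump” into a gzip file. The helper names are assumptions for illustration; passing a revision range adds `--incremental`, so that only the changes of those revisions are written.

```python
import gzip
import shutil
import subprocess


def dump_cmd(repo_path, rev_range=None):
    """Build an 'svnadmin dump' command, optionally for a (lower, upper) range."""
    cmd = ["svnadmin", "dump", "--quiet", str(repo_path)]
    if rev_range is not None:
        lower, upper = rev_range
        cmd += ["--revision", f"{lower}:{upper}", "--incremental"]
    return cmd


def stream_to_gzip(cmd, target_file):
    """Run cmd and write its standard output, gzip-compressed, to target_file."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        with gzip.open(target_file, "wb") as out:
            shutil.copyfileobj(proc.stdout, out)
    # Leaving the Popen context waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
```

A full dump would then be `stream_to_gzip(dump_cmd("/srv/svn/project"), "project.dump.gz")`, where both paths are hypothetical. A compressed dump has to be decompressed again before it is fed to “svnadmin load”.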