Backup Files Index format:
* one line per file so that it can be parsed line by line
* it will contain one line per file in the directory, even if the file didn't
  change. This way we can restore a diff backup without needing previous diffs.
{"type": "python-delta-tar-index", "version": "1"}
{"type": "BEGIN-FILE-LIST"}
{"type": "directory", "path": value, "mode": value, "mtime": value, "ctime": value, "uid": value, "gid": value, "inode": value, "size": value, "volume": 0, "offset": 0}
{"type": "file", "path": value, "mode": value, "mtime": value, "ctime": value, "uid": value, "gid": value, "inode": value, "size": value, "volume": 0, "offset": 0}
{"type": "file", "path": value, "mode": value, "mtime": value, "ctime": value, "uid": value, "gid": value, "inode": value, "size": value, "volume": 0, "offset": 56464}
{"type": "file", "path": value, "mode": value, "mtime": value, "ctime": value, "uid": value, "gid": value, "inode": value, "size": value, "volume": 1, "offset": 0}
{"type": "END-FILE-LIST"}
{"type": "file-list-checksum", "checksum": "4327847432743278943278942"}
(future additional fields)
This is an extensible format. The first line identifies the file as a
python-delta-tar-index and gives its version. The file list follows, and then
the checksum of the file list. Nothing else is currently defined after that,
but new fields could be added in the future.
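Because every record sits on its own line, a reader can process the index as a stream. The following is a minimal sketch of such a reader, not the actual DeltaTar implementation: the function name read_index and the sample values in SAMPLE_INDEX are made up for illustration, and the checksum is only collected, not verified, since the checksum algorithm is not specified here.

```python
import json

# Hypothetical sample index; all field values are invented for illustration.
SAMPLE_INDEX = """\
{"type": "python-delta-tar-index", "version": "1"}
{"type": "BEGIN-FILE-LIST"}
{"type": "directory", "path": "mbox/m/marina", "mode": 16877, "mtime": 1374458400, "ctime": 1374458400, "uid": 1000, "gid": 1000, "inode": 11, "size": 4096, "volume": 0, "offset": 0}
{"type": "file", "path": "marina.dat", "mode": 33188, "mtime": 1374458400, "ctime": 1374458400, "uid": 1000, "gid": 1000, "inode": 12, "size": 56464, "volume": 0, "offset": 0}
{"type": "END-FILE-LIST"}
{"type": "file-list-checksum", "checksum": "4327847432743278943278942"}
"""


def read_index(lines):
    """Parse index lines and return (header, entries, checksum_record)."""
    header = None
    entries = []
    checksum = None
    in_list = False
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        kind = record.get("type")
        if header is None:
            # the first record must identify the format
            if kind != "python-delta-tar-index":
                raise ValueError("not a python-delta-tar-index file")
            header = record
        elif kind == "BEGIN-FILE-LIST":
            in_list = True
        elif kind == "END-FILE-LIST":
            in_list = False
        elif in_list:
            entries.append(record)
        elif kind == "file-list-checksum":
            checksum = record
        # any other record outside the list is a future field: ignore it
    return header, entries, checksum


header, entries, checksum = read_index(SAMPLE_INDEX.splitlines())
print(header["version"], len(entries), checksum["checksum"])
```

Ignoring unknown records outside the file list is one way to honour the "future additional fields" clause without breaking older readers.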
The items inside the file list are usually of type "directory" or "file":
* The "path" field of a directory is its path relative to the backup
  directory, for example "mbox/m/marina" if the backup dir is "/var/mail/".
  The complete restore path would be "/var/mail/mbox/m/marina".
* The "path" field of a file is the name of the file inside the directory
  given by the preceding directory marker. For example "marina.dat" could be
  inside "mbox/m/marina/" and the complete restore path could be
  "/var/mail/mbox/m/marina/marina.dat".
* When a file is going to be removed, its path will be prefixed with "del:/"
  and the entry will have no offset set.
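The path rules above can be sketched as a small resolver. This is only an illustration under stated assumptions: the function name restore_paths is hypothetical, the "del:/" prefix is assumed to sit at the start of the "path" field with the plain file name after it, and posixpath is used so that the result does not depend on the host platform.

```python
import posixpath

DELETE_PREFIX = "del:/"  # assumed to be prepended to the "path" field


def restore_paths(entries, backup_dir):
    """Yield (action, absolute_path) for each index entry, in index order."""
    current_dir = ""
    for entry in entries:
        path = entry["path"]
        if entry["type"] == "directory":
            # a directory marker sets the context for the files that follow
            current_dir = path
            yield "restore", posixpath.join(backup_dir, path)
        elif entry["type"] == "file":
            action = "restore"
            if path.startswith(DELETE_PREFIX):
                action = "delete"
                path = path[len(DELETE_PREFIX):]
            yield action, posixpath.join(backup_dir, current_dir, path)


example_entries = [
    {"type": "directory", "path": "mbox/m/marina"},
    {"type": "file", "path": "marina.dat"},
    {"type": "file", "path": "del:/old.dat"},  # made-up removed file
]
for action, path in restore_paths(example_entries, "/var/mail/"):
    print(action, path)
```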
DeltaTar's proposed backup directory structure is quite simple:

├── backup-2013-07-22-0200/
│   ├── bfull-2013-07-22-0200.index
│   ├── bfull-2013-07-22-0200-001.tar.gz.aes128
│   ├── bfull-2013-07-22-0200-002.tar.gz.aes128
│   └── bfull-2013-07-22-0200-003.tar.gz.aes128
├── backup-2013-07-22-1400/
│   ├── bdiff-2013-07-22-1400.index
│   └── bdiff-2013-07-22-1400-001.tar.gz.aes128
└── backup-2013-07-23-0200/
    ├── bdiff-2013-07-23-0200.index
    ├── bdiff-2013-07-23-0200-001.tar.gz.aes128
    ├── bdiff-2013-07-23-0200-002.tar.gz.aes128
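With a layout like the one above, the timestamps embedded in the directory names sort lexicographically, so the index files can be collected newest first. The sketch below is an illustration, not part of DeltaTar: find_index_chain is a hypothetical helper, it assumes the "backup-", "bfull-" and "bdiff-" naming shown in the tree, and it assumes the chain can stop at the most recent full backup.

```python
import glob
import os


def find_index_chain(backups_root):
    """Return index file paths under backups_root, newest backup first."""
    chain = []
    backup_dirs = sorted(
        glob.glob(os.path.join(backups_root, "backup-*")), reverse=True)
    for backup_dir in backup_dirs:
        indexes = sorted(glob.glob(os.path.join(backup_dir, "*.index")))
        chain.extend(indexes)
        # assumption: nothing older than the latest full backup is needed
        if any(os.path.basename(i).startswith("bfull-") for i in indexes):
            break
    return chain
```

Called as find_index_chain("/var/backups"), it would return a list in descending date order.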
class DeltaTar(object):
    '''
    Backup class used to create and restore backups
    '''

    def __init__(self, excluded_files=[], included_files=[],
                 filter_func=None, mode="tar", password=None, logger=None,
                 index_encrypted=True, index_name_func=None,
                 volume_name_func=None):
        '''
        Constructor. Configures the diff engine.

        - excluded_files: list of files to exclude from the index. It can
          contain python regular expressions.

        - included_files: list of files to include in the index. It can
          contain python regular expressions. If empty, all files in the
          source path will be backed up, but if the list is set then only the
          files included in the list will be backed up.

        - filter_func: custom filter of files to be backed up. Unused by
          default. The function receives a file path and must return a boolean.

        - mode: mode in which the delta will be created. Accepts the same
          modes as our tarfile python library.

        - password: used together with aes modes to encrypt and decrypt
          backups.

        - logger: python logger object. Not required.

        - index_encrypted: whether the index should be encrypted or not. Only
          makes sense to set it to True if mode includes aes128 or aes256.

        - index_name_func: function that sets a custom name for the index
          file. It receives the backup_path and whether it's a full backup as
          arguments and must return the name of the corresponding index file.
          Optional; DeltaTar gives index files a "backup.index" name by
          default.

        - volume_name_func: function that defines the name of tar volumes. It
          receives the backup_path, whether it's a full backup and the volume
          number, and must return the name of the corresponding volume.
          Optional; DeltaTar has default names for tar volumes.
        '''
    def create_full_backup(self, source_path, backup_path,
                           max_volume_size=None):
        '''
        Creates a full backup.

        - source_path: path to the directory to back up.
        - backup_path: path where the backup will be stored. It will be
          created if it doesn't exist.
        - max_volume_size: maximum volume size in megabytes (MB). Used to
          split the backup into volumes. Optional (won't split into volumes
          by default).
        '''
    def create_diff_backup(self, source_path, backup_path, previous_index_path,
                           max_volume_size=None):
        '''
        Creates a diff backup.

        - source_path: path to the directory to back up.
        - backup_path: path where the backup will be stored. It will be
          created if it doesn't exist.
        - previous_index_path: index of the previous backup, needed to know
          which files changed since then.
        - max_volume_size: maximum volume size in megabytes (MB). Used to
          split the backup into volumes. Optional (won't split into volumes
          by default).
        '''

    def restore_backup(self, target_path, backup_indexes_paths=[],
                       backup_tar_path=None, restore_callback=None):
        '''
        Restores a backup.

        - target_path: path to restore to.
        - backup_indexes_paths: paths to the backup indexes, in descending
          date order. The indexes indicate the location of their respective
          backup volumes, and multiple indexes are needed to be able to
          restore diff backups. Note that this is an optional parameter: if
          not supplied, it will try to restore directly from backup_tar_path.
        - restore_callback: callback function to be called during restore.
          This is passed to the helper and gets called for every file.
        - backup_tar_path: path to the backup tar file. Used as an alternative
          to backup_indexes_paths to restore directly from a tar file without
          using any file index. If it's a multivol tarfile, volume_name_func
        '''
import os
import unittest


class TestDeltaTar(unittest.TestCase):
    '''
    This is an example of how the DeltaTar class could be used
    '''
    def test_create(self):
        from deltatar import DeltaTar

        def index_name_func(backup_path, is_full):
            prefix = "bfull" if is_full else "bdiff"
            # get the directory name and strip the leading "backup-"
            basename = os.path.basename(backup_path)[7:]

            return "%s-%s.index" % (prefix, basename)

        def volume_name_func(backup_path, is_full, volume_number):
            '''
            Handles the new volumes
            '''
            prefix = "bfull" if is_full else "bdiff"
            # get the directory name and strip the leading "backup-"
            basename = os.path.basename(backup_path)[7:]

            return "%s-%s-%03d.tar.gz.aes128" % (prefix, basename, volume_number)

        # the constructor of the DeltaTar class sets the configuration
        deltatar = DeltaTar(
            # these options are the same as in tarfile:
            mode="tar#gz.aes128",
            index_name_func=index_name_func, # optional
            volume_name_func=volume_name_func # optional
        )

        # create first backup; max_volume_size belongs to the create_* calls,
        # not to the constructor
        deltatar.create_full_backup(
            source_path="/path/to/important/dir",
            backup_path="/var/backups/backup-2013-07-22-0200",
            max_volume_size=100) # 100MB

        # here: change some files

        # create second backup
        deltatar.create_diff_backup(
            source_path="/path/to/important/dir",
            backup_path="/var/backups/backup-2013-07-22-1400",
            previous_index_path="/var/backups/backup-2013-07-22-0200/bfull-2013-07-22-0200.index",
            max_volume_size=100) # 100MB

        # restore backup in another dir. it will restore the last version
        deltatar.restore_backup(target_path="/path/to/second/dir",
            backup_indexes_paths=[
                "/var/backups/backup-2013-07-22-1400/bdiff-2013-07-22-1400.index",
                "/var/backups/backup-2013-07-22-0200/bfull-2013-07-22-0200.index"
            ])
Each step will include a comprehensive list of unit tests for the developed
features, pydoc documentation and email updates/reviews.
1. Initial simple implementation of full backup (7 hours, already done)

* It must be able to create a full backup and restore it.
* It will create the file index but will only use backup_tar_path option to
* It will support the options: mode, password, index_encrypted,
  index_name_func, volume_name_func, max_volume_size. The other options will be
2. Restore a full backup from the file index (5 hours)

* It'll be able to read a file index and restore a backup from it.
* It'll also support the logger option.
3. Include and exclude filters (5 hours)

* It'll support the included_files, excluded_files and filter_func options
  for both creating and restoring full backups.
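
The filters of step 3 could be combined roughly as in the sketch below. It is an illustration under assumptions, not the planned implementation: make_path_filter is a hypothetical helper, patterns are matched from the start of the path with re.match, exclusion takes priority over inclusion, and filter_func is consulted last.

```python
import re


def make_path_filter(included_files=(), excluded_files=(), filter_func=None):
    """Build a predicate telling whether a path should be backed up."""
    included = [re.compile(p) for p in included_files]
    excluded = [re.compile(p) for p in excluded_files]

    def accept(path):
        # an empty include list means "everything is included"
        if included and not any(rx.match(path) for rx in included):
            return False
        # assumption: exclusion wins over inclusion
        if any(rx.match(path) for rx in excluded):
            return False
        # the custom filter, if any, has the final say
        if filter_func is not None and not filter_func(path):
            return False
        return True

    return accept


accept = make_path_filter(
    included_files=[r"mbox/"],
    excluded_files=[r".*\.tmp$"],
    filter_func=lambda path: not path.endswith("~"),
)
print(accept("mbox/m/marina/marina.dat"))
print(accept("mbox/m/marina/cache.tmp"))
```

Using the same predicate for both backup creation and restore would keep the two code paths consistent.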
4. Create diff backup (8 hours)

* It'll support creating a diff backup on top of an existing full backup. This
  will be implemented in a performant way; we'll take a look at duplicity for
  ideas.
* It'll be able to restore a diff backup without using the index, just applying
5. Restore diff backup (10 hours)

* It'll be able to restore a diff backup using the index, applying an efficient
6. Polishing and corner cases (12 hours)

* Review the existing features looking for possible bugs. Implement missing
  corner cases and unit tests, for example support for diff backup chains.
* Benchmark against other tools like duplicity in different scenarios to check
  that our performance is good.

Total estimation: 47 hours