python-delta-tar
8 years agoadjust versioned encryption parameters
Philipp Gesang [Fri, 24 Mar 2017 10:12:29 +0000]
adjust versioned encryption parameters

Prepare for revised versioning: Both the encryption mechanism and
the KDF may be specified in the versioning.

8 years agoextend open_index() API for info file handling
Philipp Gesang [Thu, 23 Mar 2017 10:48:59 +0000]
extend open_index() API for info file handling

In fact, backup_python’s “info file” is just another “index file”
to deltatar. Conceptually they’re quite different, though,
especially regarding encryption. To allow requesting an
info-flavored index file, add a parameter to communicate with the
crypto layer.

8 years agostart payload encryption counter at 2
Philipp Gesang [Thu, 23 Mar 2017 09:43:25 +0000]
start payload encryption counter at 2

As per the spec, a file counter of 1 is reserved for the info
file.
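
A minimal sketch of the counter allocation described above. The names
are hypothetical and not taken from the deltatar sources; only the
values 1 (info file) and 2 (first payload object) come from the commit
message.

```python
import itertools

INFO_FILE_COUNTER = 1      # reserved for the info file
FIRST_PAYLOAD_COUNTER = 2  # payload objects start here


def payload_counters():
    """Return an iterator of file counters for payload objects."""
    return itertools.count(FIRST_PAYLOAD_COUNTER)


counters = payload_counters()
first_three = [next(counters) for _ in range(3)]
```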

8 years agotrack encryption state
Philipp Gesang [Tue, 21 Mar 2017 14:32:32 +0000]
track encryption state

8 years agoreinit crypto for objects other than files
Philipp Gesang [Tue, 21 Mar 2017 14:20:44 +0000]
reinit crypto for objects other than files

Invoke the new-item handler to force a new crypto object for
directories and anything else as well.

8 years agotrack data handled in crypto
Philipp Gesang [Tue, 21 Mar 2017 14:11:32 +0000]
track data handled in crypto

8 years agofirst draft of revised encryption layer
Philipp Gesang [Tue, 21 Mar 2017 12:33:16 +0000]
first draft of revised encryption layer

WIP. Tested for encryption (no compression) only, and only for
TOC listings (-l).

Decryption is handled entirely by the stream, traversing the
archive entry-wise and relying only on the header information.

Encryption requires poking the _Stream thingy to initiate a new
crypto entry.

8 years agosimplify password save and retrieval
Philipp Gesang [Tue, 21 Mar 2017 12:13:00 +0000]
simplify password save and retrieval

The password must be available for the entire time of the
decryption since it might be necessary to recalculate the key on
account of different salt or parameters of some object.

8 years agoimplement null-kdf to speed up testing
Philipp Gesang [Mon, 20 Mar 2017 16:35:45 +0000]
implement null-kdf to speed up testing

With “parameter version” zero, the KDF consists only of a trivial
string derived from the password so as to reduce runtime. (SCRYPT
takes about 48 seconds here with our parameters.)
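
A rough sketch of how a “parameter version” could select between a
null KDF and scrypt. The function name, the padding scheme, and the
scrypt cost parameters are assumptions made for illustration; the
commit message does not state the values the project uses.

```python
import hashlib

KEY_LEN = 16  # AES-128 key size in bytes


def derive_key(password: bytes, salt: bytes, paramversion: int) -> bytes:
    """Derive a key; version 0 is a fast, insecure shortcut for tests."""
    if paramversion == 0:
        # Null KDF: pad or truncate the password to the key length.
        return password.ljust(KEY_LEN, b"\0")[:KEY_LEN]
    # Any other version: scrypt with placeholder cost parameters.
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=KEY_LEN)


test_key = derive_key(b"secret", b"ignored-salt", 0)
```

The null branch must never be reachable for real backups, since it
provides no protection against password guessing.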

8 years agoadapt tag handling in decryption
Philipp Gesang [Mon, 20 Mar 2017 15:42:51 +0000]
adapt tag handling in decryption

8 years agofix encrypted read logic for begin/end at entry boundaries
Philipp Gesang [Mon, 20 Mar 2017 14:02:03 +0000]
fix encrypted read logic for begin/end at entry boundaries

8 years agoretrieve and save GCM tag from object header
Philipp Gesang [Mon, 20 Mar 2017 11:07:08 +0000]
retrieve and save GCM tag from object header

8 years agodelay kdf until parameters are available from header
Philipp Gesang [Mon, 20 Mar 2017 10:48:38 +0000]
delay kdf until parameters are available from header

When decrypting, initialize the key immediately if parameters and
salt are being passed to the ctor. Otherwise, just save the
passphrase in the object and run the KDF when ``.next()`` is
passed the required bits as part of a PDTCRYPT header.
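
The deferred key derivation could look roughly like the following
sketch. The class, the dict-shaped header, and ``toy_kdf`` (a SHA-256
stand-in, not a real KDF) are hypothetical; only the idea of keeping
the passphrase and running the KDF in ``.next()`` comes from the
commit message.

```python
import hashlib


class Decryptor:
    """Keeps the passphrase and derives the key lazily."""

    def __init__(self, password, kdf, salt=None, params=None):
        self.password = password
        self.kdf = kdf
        self.key = None
        self.kdf_args = None
        # Derive immediately only if everything is known up front.
        if salt is not None and params is not None:
            self._derive(salt, params)

    def _derive(self, salt, params):
        # Re-run the KDF only when salt or parameters changed.
        if (salt, params) != self.kdf_args:
            self.key = self.kdf(self.password, salt, params)
            self.kdf_args = (salt, params)

    def next(self, header):
        self._derive(header["salt"], header["params"])
        return self.key


kdf_calls = []


def toy_kdf(password, salt, params):
    """Stand-in for a real KDF; records each invocation."""
    kdf_calls.append((salt, params))
    return hashlib.sha256(password + salt + bytes([params])).digest()[:16]


dec = Decryptor(b"pw", toy_kdf)
key_before = dec.key
key_a = dec.next({"salt": b"s1", "params": 1})
key_a_again = dec.next({"salt": b"s1", "params": 1})
key_b = dec.next({"salt": b"s2", "params": 1})
```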

8 years agomove ct length bookkeeping into encryptor
Philipp Gesang [Fri, 17 Mar 2017 16:19:10 +0000]
move ct length bookkeeping into encryptor

This saves us unreliable calculations over the _Stream progress
in tarfile.py.

8 years agomove tag back into the header
Philipp Gesang [Fri, 17 Mar 2017 15:02:31 +0000]
move tag back into the header

Since we seek back to write the final header it makes little
sense to append the tag to the ciphertext regardless.

8 years agoinitially write dummy header during encryption
Philipp Gesang [Fri, 17 Mar 2017 10:22:44 +0000]
initially write dummy header during encryption

8 years agocreate crypto header in .next()
Philipp Gesang [Thu, 16 Mar 2017 16:39:57 +0000]
create crypto header in .next()

Saves us from exposing the IV to the stream.

8 years agoredo transition between objects in crypto layer
Philipp Gesang [Thu, 16 Mar 2017 16:06:28 +0000]
redo transition between objects in crypto layer

When encrypting, the ciphertext size isn’t known beforehand.
Likewise, the file name isn’t available when initializing the
decryption of a file.

8 years agopass salt between index and archive
Philipp Gesang [Thu, 16 Mar 2017 15:40:31 +0000]
pass salt between index and archive

Memoization of the scrypt params requires static storage because
both the index file and the archive each have an encryption
context.

8 years agorevise crypto context interface
Philipp Gesang [Thu, 9 Mar 2017 15:41:57 +0000]
revise crypto context interface

Fold key handling and encryption into a common context “class”.
The context takes care of the counter, iv, keys etc. It has one
entry point (ctor) for each direction (read, write → decrypt,
encrypt) and provides hooks for transitioning to the next item.

Header and tag handling remain accessible independent of the
context since tarfile operates on the archive stream and file
objects.

8 years agoredo stream decryption
Philipp Gesang [Mon, 6 Mar 2017 15:51:13 +0000]
redo stream decryption

When decrypting, the size of the encrypted object is known, as is
the length of the appended authentication tag; there is no
ambiguity regarding the end of an object. Thus the old string
matching logic with its linear search behavior can go.
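
To illustrate why known sizes make the linear search unnecessary,
here is a sketch with a made-up framing: a 4-byte big-endian
ciphertext length, the ciphertext, then a 16-byte tag. This is not
the PDTCRYPT header layout, which the commit message does not
describe.

```python
import io
import struct

HDR = struct.Struct(">I")  # hypothetical header: ciphertext length only
TAG_LEN = 16               # length of the appended authentication tag


def pack_object(ciphertext, tag):
    return HDR.pack(len(ciphertext)) + ciphertext + tag


def iter_objects(stream):
    """Yield (ciphertext, tag) pairs using sizes, not pattern matching."""
    while True:
        raw = stream.read(HDR.size)
        if not raw:
            return  # clean end of archive
        if len(raw) < HDR.size:
            raise ValueError("truncated header")
        (ctsize,) = HDR.unpack(raw)
        ciphertext = stream.read(ctsize)
        tag = stream.read(TAG_LEN)
        if len(ciphertext) != ctsize or len(tag) != TAG_LEN:
            raise ValueError("truncated object")
        yield ciphertext, tag


archive = pack_object(b"abc", b"T" * 16) + pack_object(b"", b"U" * 16)
objects = list(iter_objects(io.BytesIO(archive)))
```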

8 years agosupersede encryption type by encryption parameters
Philipp Gesang [Fri, 3 Mar 2017 16:57:37 +0000]
supersede encryption type by encryption parameters

WIP

8 years agopolish up backup.py arguments
Philipp Gesang [Thu, 2 Mar 2017 14:41:42 +0000]
polish up backup.py arguments

These are about the only high-level clues regarding its
functionality so it may as well be presented right.

8 years agoinclude offending mode string in exception
Philipp Gesang [Thu, 2 Mar 2017 13:54:59 +0000]
include offending mode string in exception

Be a little more informative about the cause considering this
exception is passed on to the user as-is.

8 years agoensure octal format is fed an integer
Philipp Gesang [Thu, 2 Mar 2017 13:40:39 +0000]
ensure octal format is fed an integer

Fix a “type” error that seems to be enforced by Python 3:

      File "/src/python/python-delta-tar/deltatar/tarfile.py", line 220, in itn
        s = bytes("%0*o" % (digits - 1, n), "ascii") + NUL
    TypeError: %o format: an integer is required, not float
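
A reduced reproduction of the fix, assuming the float reached the
format string through an earlier division. ``itn_octal`` is a
simplified stand-in for the octal branch of ``itn()`` in tarfile.py,
not the full function.

```python
NUL = b"\0"


def itn_octal(n, digits=8):
    """Format n as a NUL-terminated, zero-padded octal field."""
    n = int(n)  # "%o" rejects floats under Python 3
    return bytes("%0*o" % (digits - 1, n), "ascii") + NUL


field = itn_octal(8.0)
```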

8 years agodisplay backup.py usage if no action was specified
Philipp Gesang [Thu, 2 Mar 2017 13:36:43 +0000]
display backup.py usage if no action was specified

Getting no feedback if the invocation had no effect is rather
unseemly. Print the help message instead.

8 years agohandle reading and formatting of tags
Philipp Gesang [Tue, 28 Feb 2017 16:45:54 +0000]
handle reading and formatting of tags

8 years agounit test scrypt wrapper
Philipp Gesang [Tue, 28 Feb 2017 16:01:07 +0000]
unit test scrypt wrapper

8 years agounit test bogus header data
Philipp Gesang [Tue, 28 Feb 2017 15:34:34 +0000]
unit test bogus header data

8 years agounit test auth tag handling
Philipp Gesang [Tue, 28 Feb 2017 14:58:20 +0000]
unit test auth tag handling

8 years agounit test crypto handling of data spanning multiple chunks
Philipp Gesang [Tue, 28 Feb 2017 14:45:16 +0000]
unit test crypto handling of data spanning multiple chunks

8 years agounit test header handling
Philipp Gesang [Tue, 28 Feb 2017 14:17:04 +0000]
unit test header handling

8 years agoadd basic wrapper for GCM handling with python-cryptography
Philipp Gesang [Tue, 28 Feb 2017 13:36:19 +0000]
add basic wrapper for GCM handling with python-cryptography

This currently requires our yet unmerged contribution:
https://github.com/pyca/cryptography/pull/3421

8 years agoadd unit test module for encryption layer
Philipp Gesang [Tue, 28 Feb 2017 13:34:59 +0000]
add unit test module for encryption layer

8 years agoremove key length parameter wherever feasible
Philipp Gesang [Fri, 24 Feb 2017 10:18:18 +0000]
remove key length parameter wherever feasible

Since we’re using fixed AES-128 everywhere, the revised version
no longer offers adjustable key length.

8 years agomake tarfile.py error out on invalid crypto modes and combos
Philipp Gesang [Fri, 24 Feb 2017 09:50:03 +0000]
make tarfile.py error out on invalid crypto modes and combos

The tarfile stream ctor will simply gloss over encryption
requested by the caller unless it happens to exactly match the
string (!) “aes”. Furthermore, with non-gzip compression the
encryption will be ignored altogether.

Instead of deceiving the user about the encryption being applied,
have the ctor fail immediately on invalid combinations.

8 years agoinit crypto support v2
Philipp Gesang [Thu, 23 Feb 2017 15:34:19 +0000]
init crypto support v2

Implements header reading and writing as well as PoC encryption
wrappers.

WIP

8 years agoignore all symlinks
Philipp Gesang [Mon, 7 Nov 2016 09:00:32 +0000]
ignore all symlinks

Don’t delay the creation of symlinks but suppress it entirely.

The rationale is that extraction with deltatar will only ever
operate on inputs whose symlinks are dereferenced upon archive
creation. Thus valid archives will not contain symlinks at all.

Also, it would appear that deltatar assumes paths of objects
inside a tarball are unique. If the tarball contains multiple
objects with the same path, it will extract only the first one it
encounters and ignore the rest. This means that it would take at
least two successive backups to perform a symlink attack, the
first one planting the link and the second writing over the
location. This is prevented by the current mitigation strategy
(and by the --unlink option of other tar utilities).

8 years agoadd unit test for overwriting symlinks
Philipp Gesang [Fri, 4 Nov 2016 16:00:59 +0000]
add unit test for overwriting symlinks

Currently, we implement the behavior of GNU Tar: Subsequent files
in an archive override previous ones, which is also true of
symlinks.

8 years agorectify delayed symlink restoration
Philipp Gesang [Fri, 4 Nov 2016 14:32:13 +0000]
rectify delayed symlink restoration

Again, GNU tar serves as the model for safe behavior: We now
check whether the placeholder file exists and if it is indeed the
one we created earlier.

Since deltatar does not allow including symlinks in the backup,
the unit tests invoke tarfile functionality directly to add some
symlinks to an existing backup.

8 years agoadd unit test tracking behavior wrt symlinks
Philipp Gesang [Fri, 4 Nov 2016 10:59:34 +0000]
add unit test tracking behavior wrt symlinks

8 years agofix calls to deprecated function in deltatar.py
Philipp Gesang [Thu, 3 Nov 2016 16:02:54 +0000]
fix calls to deprecated function in deltatar.py

Fixes warning mandating “.warning()” over “.warn()”:

    DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead

8 years agodelay only absolute symlinks and those pointing to parent dirs
Philipp Gesang [Thu, 3 Nov 2016 13:31:14 +0000]
delay only absolute symlinks and those pointing to parent dirs

Only apply the symlink hook on those with fishy targets. Internal
symlinks need not be contained so they can be applied as-is.
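
One way to express the “fishy target” test, as a sketch only: the
function name is hypothetical and the actual check in deltatar may
differ. It flags absolute targets and targets that normalize to a
path starting with a parent-directory component.

```python
import posixpath


def needs_delay(linkname):
    """Return True for absolute targets and those escaping upwards."""
    if posixpath.isabs(linkname):
        return True
    norm = posixpath.normpath(linkname)
    return norm == ".." or norm.startswith("../")


verdicts = {name: needs_delay(name)
            for name in ("/etc/passwd", "../x", "a/../../x",
                         "a/b", "a/../b", "..foo")}
```

The check is deliberately conservative: it ignores how deep the link
itself sits in the tree, so a harmless ``../sibling`` inside a
subdirectory is delayed as well.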

8 years agoimplement delayed symlink creation
Philipp Gesang [Thu, 3 Nov 2016 11:02:15 +0000]
implement delayed symlink creation

Introduce a hook in ``extract()`` to invoke a callback if a
symlink is encountered in the archive. The implementation is
modeled after GNU Tar.

This is a v2 attempt at the symlink extraction problem. The
first version simply ``unlink(2)``ed all files before extraction,
which is a less efficient albeit more robust strategy.

8 years agoavoid crash in test helper due to fp division
Philipp Gesang [Mon, 31 Oct 2016 16:44:42 +0000]
avoid crash in test helper due to fp division

As a matter of fact, ``randomint()`` accepts only int-ishly typed
values, not floats. Consequently, integer division is the way to
go.
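
A small illustration, assuming ``randomint()`` refers to
``random.randint()`` and that the float came from a ``/`` division.
The helper name is made up for this sketch.

```python
import random


def random_chunk_size(total):
    """Pick a size between 1 and half of total."""
    # "//" keeps the bound an int; "/" would yield a float in Python 3.
    return random.randint(1, total // 2)


size = random_chunk_size(10)
```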

8 years agosimplify control flow in RestoreHelper methods
Philipp Gesang [Wed, 2 Nov 2016 16:42:33 +0000]
simplify control flow in RestoreHelper methods

Make the control flow more obvious. The code in question was
introduced with commit ea6d3c3e… but did not make sense back then
either because cur_index which is the constant $1$ was compared
to the literal constant $0$:

+        cur_index = 1
+        while cur_index < len(self._data):
+            data = self._data[cur_index]

+                if cur_index == 0:

This bogus test was since removed but the convoluted ``while``
loop survived. Instead, access index 1 only once using an integer
literal.

8 years agoIncrease version to 1.5
Thomas Jarosch [Mon, 4 Jul 2016 10:13:39 +0000]
Increase version to 1.5

8 years agoCode review done, comment changes only
Thomas Jarosch [Mon, 4 Jul 2016 09:49:06 +0000]
Code review done, comment changes only

8 years agoRemove dead code
Thomas Jarosch [Mon, 4 Jul 2016 09:48:28 +0000]
Remove dead code

cur_index is always >= 1 in this code path

8 years agoRemove code duplication
Thomas Jarosch [Mon, 4 Jul 2016 09:48:16 +0000]
Remove code duplication

8 years agoDon't use exception handling for normal control flow
Thomas Jarosch [Thu, 30 Jun 2016 08:03:40 +0000]
Don't use exception handling for normal control flow

-> Replace buf.index() with buf.find().

Unwinding the stack is expensive and we were
even doing it for the default code path.
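
The difference in a nutshell: ``index()`` raises ``ValueError`` when
the marker is absent, whereas ``find()`` returns -1. The helper below
is a hypothetical example, not the code touched by this commit.

```python
def split_at_marker(buf, marker):
    """Split buf around the first marker; no exception on a miss."""
    pos = buf.find(marker)
    if pos == -1:
        return buf, b""  # common case handled without unwinding
    return buf[:pos], buf[pos + len(marker):]


hit = split_at_marker(b"head|tail", b"|")
miss = split_at_marker(b"no marker here", b"|")
```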

8 years agoadd file_crypt.py to scripts in setup.py
Christian Herdtweck [Fri, 24 Jun 2016 07:15:56 +0000]
add file_crypt.py to scripts in setup.py

8 years agocreated tool to encrypt/decrypt files using aes128 with compression
Christian Herdtweck [Thu, 23 Jun 2016 16:03:03 +0000]
created tool to encrypt/decrypt files using aes128 with compression

8 years agoIncrease version to 1.4
Thomas Jarosch [Thu, 23 Jun 2016 12:17:46 +0000]
Increase version to 1.4

Lots of little fixes and improvements.

8 years agoappease pylint
Christian Herdtweck [Thu, 23 Jun 2016 12:33:40 +0000]
appease pylint

8 years agofix error found by pylint
Christian Herdtweck [Wed, 22 Jun 2016 15:14:15 +0000]
fix error found by pylint

8 years agoRename design document so pylint3 doesn't pick it up
Thomas Jarosch [Thu, 23 Jun 2016 10:31:11 +0000]
Rename design document so pylint3 doesn't pick it up

8 years agoImplement cache for pwd.getpwuid() and grp.getgrgid()
Thomas Jarosch [Thu, 23 Jun 2016 08:08:16 +0000]
Implement cache for pwd.getpwuid() and grp.getgrgid()

Those functions always parse /etc/passwd and we
look up the owner for each file we backup.

This change is only relevant when creating full backups.
The speedup with ~1,000,000 emails is 11%.
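
A sketch of such a cache built on ``functools.lru_cache``. Whether
the commit uses this or a plain dict is not stated here, and
``slow_owner_lookup`` is a stub standing in for the real lookups.

```python
import functools


def cached(lookup):
    """Wrap a one-argument lookup so each key is resolved only once."""
    return functools.lru_cache(maxsize=None)(lookup)


lookups = []


def slow_owner_lookup(uid):
    """Stub for an expensive lookup; records each real call."""
    lookups.append(uid)
    return "user%d" % uid


owner_of = cached(slow_owner_lookup)
names = [owner_of(uid) for uid in (1000, 1000, 1001, 1000)]
```

On Unix the same wrapper could be applied as ``cached(pwd.getpwuid)``
and ``cached(grp.getgrgid)``. Cached entries go stale if the user
database changes during a run, which is usually acceptable for a
single backup.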

8 years agoFix 'directory' type when iterating tar archives without index
Thomas Jarosch [Tue, 21 Jun 2016 08:01:35 +0000]
Fix 'directory' type when iterating tar archives without index

'dir' is not used anywhere in the code base.

8 years agouse the "& 0xFFFfff" after all crc32 calculations
Christian Herdtweck [Mon, 20 Jun 2016 07:43:54 +0000]
use the "& 0xFFFfff" after all crc32 calculations
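
For reference, a sketch of the masking idiom, assuming the intended
mask is the full 32-bit ``0xFFFFFFFF`` (the title’s spelling is
shorter). The wrapper name is made up.

```python
import zlib


def crc32(data, crc=0):
    """CRC-32 normalized to an unsigned 32-bit value."""
    return zlib.crc32(data, crc) & 0xFFFFFFFF


whole = crc32(b"hello world")
running = crc32(b"world", crc32(b"hello "))
```

Under Python 3 ``zlib.crc32()`` already returns an unsigned value, so
the mask mainly guards against the signed results of Python 2.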

8 years agoIncrease release to 1.3
Thomas Jarosch [Fri, 17 Jun 2016 15:39:13 +0000]
Increase release to 1.3

Also switch group from Intranator to Intra2net

8 years agoimprove one more unittest: raise proper assertion instead of failing with non-existen...
Christian Herdtweck [Fri, 17 Jun 2016 14:02:27 +0000]
improve one more unittest: raise proper assertion instead of failing with non-existent variable

8 years agoadjust unittests in test_deltatar
Christian Herdtweck [Fri, 17 Jun 2016 14:01:37 +0000]
adjust unittests in test_deltatar

8 years agocorrect a comment, add more info to log message
Christian Herdtweck [Fri, 17 Jun 2016 13:29:19 +0000]
correct a comment, add more info to log message

8 years agoadjust filter_path: also remove trailing os.sep
Christian Herdtweck [Fri, 17 Jun 2016 13:28:34 +0000]
adjust filter_path: also remove trailing os.sep

8 years agofix strip_base_dir argument for DeltaTar._recursive_walk_dir: check for os.sep
Christian Herdtweck [Fri, 17 Jun 2016 13:28:09 +0000]
fix strip_base_dir argument for DeltaTar._recursive_walk_dir: check for os.sep

8 years agosimplify DeltaTar._recursive_walk_dir
Christian Herdtweck [Fri, 17 Jun 2016 09:59:27 +0000]
simplify DeltaTar._recursive_walk_dir

(had called os.path.isdir and filter_path twice on each file directly
 after another)

8 years agohad forgotten a few tarobj.close and os.unlink(temp_file) in new tests
Christian Herdtweck [Fri, 17 Jun 2016 09:56:24 +0000]
had forgotten a few tarobj.close and os.unlink(temp_file) in new tests

8 years agochange one output, make 2 variables to testing routine arguments
Christian Herdtweck [Fri, 17 Jun 2016 07:31:50 +0000]
change one output, make 2 variables to testing routine arguments

8 years agouse KiB, MiB (factor 1024) instead of KB, MB (factor 1000)
Christian Herdtweck [Fri, 17 Jun 2016 07:31:16 +0000]
use KiB, MiB (factor 1024) instead of KB, MB (factor 1000)
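
A possible formatter using binary prefixes, as a sketch; the function
name and output format are assumptions, not the code from this
commit.

```python
def format_size(n_bytes):
    """Render a byte count with factor-1024 units."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n_bytes < 1024 or unit == "GiB":
            return "%.1f %s" % (n_bytes, unit)
        n_bytes /= 1024


sizes = [format_size(n) for n in (500, 1536, 1024 * 1024)]
```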

8 years agofix search for file with impossible size (had forgotten that volume_size is in MB)
Christian Herdtweck [Wed, 15 Jun 2016 13:07:24 +0000]
fix search for file with impossible size (had forgotten that volume_size is in MB)

8 years agoIncrease version to 1.2
Thomas Jarosch [Wed, 15 Jun 2016 12:28:38 +0000]
Increase version to 1.2

8 years agoMerge branch 'fix-compression-size'
Thomas Jarosch [Wed, 15 Jun 2016 12:18:33 +0000]
Merge branch 'fix-compression-size'

The new code will give final tar file sizes
close to the volume size even when using compression.

8 years agoensure temp file is deleted; add some comments about results
Christian Herdtweck [Wed, 15 Jun 2016 09:39:41 +0000]
ensure temp file is deleted; add some comments about results

8 years agoadded performance test script
Christian Herdtweck [Wed, 15 Jun 2016 09:19:09 +0000]
added performance test script

8 years agoadded minimum file size arg to find_random_files
Christian Herdtweck [Wed, 15 Jun 2016 09:18:55 +0000]
added minimum file size arg to find_random_files

should make returned files more realistic

8 years agoreduce time wasted on _dbg output: format string only when it is actually printed
Christian Herdtweck [Wed, 15 Jun 2016 07:55:16 +0000]
reduce time wasted on _dbg output: format string only when it is actually printed
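
The idea can be sketched as follows: pass the format string and its
arguments separately and format only after the level check. The
classes are toy stand-ins and the ``_dbg()`` signature is an
assumption.

```python
class Stream:
    """Toy object with a debug level."""

    def __init__(self, debug=0):
        self.debug = debug
        self.printed = []  # collected instead of printed, for the demo

    def _dbg(self, level, fmt, *args):
        # Format only when the message passes the level check.
        if level <= self.debug:
            self.printed.append(fmt.format(*args))


class Costly:
    """Counts how often it is rendered to a string."""
    renders = 0

    def __str__(self):
        Costly.renders += 1
        return "costly"


quiet = Stream(debug=0)
quiet._dbg(3, "added {}", Costly())  # suppressed: never formatted
loud = Stream(debug=3)
loud._dbg(3, "added {}", Costly())   # emitted: formatted once
```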

8 years agoremove _dbg(str.format(args)) from performance-sensitive loop in addfile
Christian Herdtweck [Wed, 15 Jun 2016 07:54:02 +0000]
remove _dbg(str.format(args)) from performance-sensitive loop in addfile

8 years agoadded some more comments
Christian Herdtweck [Wed, 15 Jun 2016 07:53:48 +0000]
added some more comments

8 years agoadd unittest that runs one of the many multivolume compression size tests
Christian Herdtweck [Tue, 14 Jun 2016 10:28:32 +0000]
add unittest that runs one of the many multivolume compression size tests

8 years agocreated another test for multivolume compression size
Christian Herdtweck [Mon, 13 Jun 2016 11:07:08 +0000]
created another test for multivolume compression size

8 years agochanged debug output level of the debug output I added earlier
Christian Herdtweck [Tue, 14 Jun 2016 09:59:03 +0000]
changed debug output level of the debug output I added earlier

8 years agoremoved some debug output
Christian Herdtweck [Mon, 13 Jun 2016 11:06:38 +0000]
removed some debug output

8 years agofix ValueError message (otherwise '*' is interpreted as string repetition)
Christian Herdtweck [Mon, 13 Jun 2016 11:06:28 +0000]
fix ValueError message (otherwise '*' is interpreted as string repetition)

8 years agoensure max_volume_size is int or None
Christian Herdtweck [Mon, 13 Jun 2016 11:05:50 +0000]
ensure max_volume_size is int or None

8 years agoskip one test with known failure
Christian Herdtweck [Fri, 10 Jun 2016 12:34:05 +0000]
skip one test with known failure

8 years agocorrect number of compressed backup volumes in tests
Christian Herdtweck [Fri, 10 Jun 2016 12:32:38 +0000]
correct number of compressed backup volumes in tests

8 years agoadded class variable MODE_COMPRESS to test_deltatar unittests
Christian Herdtweck [Fri, 10 Jun 2016 12:18:33 +0000]
added class variable MODE_COMPRESS to test_deltatar unittests

8 years agodo a proper unittest.skip if test is not run
Christian Herdtweck [Fri, 10 Jun 2016 12:17:33 +0000]
do a proper unittest.skip if test is not run

8 years agoremove print()s in unittest test_multivol
Christian Herdtweck [Fri, 10 Jun 2016 12:15:30 +0000]
remove print()s in unittest test_multivol

8 years agoremove unused import
Christian Herdtweck [Fri, 10 Jun 2016 08:09:33 +0000]
remove unused import

8 years agoextend valid range of sizes for sample.tar.gz file
Christian Herdtweck [Fri, 10 Jun 2016 08:09:24 +0000]
extend valid range of sizes for sample.tar.gz file

8 years agoadded new unittest test_multivol_compress with bigger random-data file
Christian Herdtweck [Thu, 9 Jun 2016 16:01:37 +0000]
added new unittest test_multivol_compress with bigger random-data file

8 years agorenamed unittest test_multivol_compress to test_compress_single (compresses to single...
Christian Herdtweck [Thu, 9 Jun 2016 16:01:12 +0000]
renamed unittest test_multivol_compress to test_compress_single (compresses to single volume)

8 years agoadded a few debug messages to addfile and open_volume
Christian Herdtweck [Thu, 9 Jun 2016 15:58:39 +0000]
added a few debug messages to addfile and open_volume

8 years agochanged TarFile.addfile to get better sized volumes if compressing
Christian Herdtweck [Thu, 9 Jun 2016 15:58:04 +0000]
changed TarFile.addfile to get better sized volumes if compressing

8 years agocreated a 2nd TarFile._size_left: one for file and one for stream
Christian Herdtweck [Thu, 9 Jun 2016 15:56:00 +0000]
created a 2nd TarFile._size_left: one for file and one for stream

8 years agofix unittest's create_pseudo_random_file: do return file name as docu says
Christian Herdtweck [Thu, 9 Jun 2016 15:54:51 +0000]
fix unittest's create_pseudo_random_file: do return file name as docu says

author of that module seems to never have heard of the tempfile module in
python stdlib!

8 years agoimproved unittest's create file: do not gather GB in memory before writing
Christian Herdtweck [Thu, 9 Jun 2016 15:53:54 +0000]
improved unittest's create file: do not gather GB in memory before writing

8 years agoclean up in unittest: remove size_test_* files
Christian Herdtweck [Thu, 9 Jun 2016 15:52:58 +0000]
clean up in unittest: remove size_test_* files