Philipp Gesang [Mon, 14 Aug 2017 12:14:16 +0000]
fail with info message if recovery is asked with source path
Philipp Gesang [Mon, 14 Aug 2017 10:09:28 +0000]
allow for numbers of missing and failed files to differ in recovery test
Philipp Gesang [Mon, 14 Aug 2017 09:54:17 +0000]
adjust the expectations about checksum mismatches with non-authenticated recover modes
Philipp Gesang [Mon, 14 Aug 2017 09:35:18 +0000]
use index iterator to accommodate multivol extraction
For reasons unknown, the “tar path iterator” always terminates
after the last element of the first volume. It does so even for
multivolume archives in which the last object of the first volume
extends into the second volume: that object is extracted
completely, but extraction stops right after it.
Philipp Gesang [Fri, 11 Aug 2017 14:41:51 +0000]
use random data in multivol tests
Brute-force incompressibility to prevent gzip from invalidating
our multivolume tests.
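A minimal sketch of why random input helps, assuming the tests only need payloads that gzip cannot shrink below the intended volume size: bytes from `os.urandom()` are effectively incompressible, while repetitive data collapses to a tiny fraction of its size.

```python
import gzip
import os

SIZE = 64 * 1024

random_payload = os.urandom(SIZE)   # high entropy: deflate finds nothing to remove
repetitive_payload = b"A" * SIZE    # low entropy: compresses to a few hundred bytes

random_ratio = len(gzip.compress(random_payload)) / SIZE
repetitive_ratio = len(gzip.compress(repetitive_payload)) / SIZE

print(f"random: {random_ratio:.3f}, repetitive: {repetitive_ratio:.3f}")
```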
Philipp Gesang [Fri, 11 Aug 2017 13:45:50 +0000]
give each recovery test a multivol companion
This derives single- and multivolume versions of the tests.
Multiple volumes are generated by stretching the input file count
and size.
Philipp Gesang [Fri, 11 Aug 2017 12:16:56 +0000]
work around false positives in deltatar fs checks during rpmbuild
These only happen when running in rpmbuild, otherwise the tests
are fine. Of course, on RHBT the choir resoundeth “thou shalt not
run thine rpm build as root” but that’s not really an option
here.
Philipp Gesang [Fri, 11 Aug 2017 09:50:55 +0000]
catch incomplete trailing header in tolerant recovery
This makes decryption in recovery mode resistant against
malformed trailing data which would otherwise error out for the
entire buffered chunk on account of a decryption failure.
Philipp Gesang [Fri, 11 Aug 2017 09:39:42 +0000]
test recovery behavior with trailing data
Philipp Gesang [Fri, 11 Aug 2017 09:16:33 +0000]
track successful recovery of corrupted payload in tests
Gzip does CRC32, GCM has a MAC, but ordinary Tar only checksums
the header part, not the content. Thus recovery of a damaged
object will appear to succeed provided the object header is
intact. In order to detect the corruption, an external integrity
check is necessary.
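The effect is easy to reproduce with the standard `tarfile` module (not deltatar's fork): flipping a bit in the content of a plain USTAR member leaves the header checksum valid, so extraction succeeds silently. Only a digest recorded beforehand, here a SHA-256 chosen purely for illustration, reveals the damage.

```python
import hashlib
import io
import tarfile

def make_archive(name: str, payload: bytes) -> bytearray:
    buf = io.BytesIO()
    # USTAR keeps the layout simple: one 512-byte header, then the content.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return bytearray(buf.getvalue())

def read_member(archive: bytes, name: str) -> bytes:
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        return tar.extractfile(name).read()

payload = b"important data " * 100
digest = hashlib.sha256(payload).hexdigest()    # external integrity record

archive = make_archive("file.txt", payload)
archive[tarfile.BLOCKSIZE + 10] ^= 0x01          # flip one bit in the content

recovered = read_member(bytes(archive), "file.txt")   # no exception raised
intact = hashlib.sha256(recovered).hexdigest() == digest
print(intact)  # False: only the external check notices the damage
```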
Philipp Gesang [Fri, 11 Aug 2017 08:53:09 +0000]
add recover tests for completely damaged headers
Philipp Gesang [Fri, 11 Aug 2017 08:25:12 +0000]
sync tarfile stream diligently when writing new objects
Turns out all the offsets written to the index when neither
encrypting nor compressing were, well, … off. In fact they would
only be updated at tar block boundaries due to buffering. Since
“last_block_offset” record keeping blatantly violates layering
boundaries, it would only work reliably with the concat
compression and encryption modes that do the same.
Sync when adding a new object so we get the accurate offset
value. Voilà, recovery now works with uncompressed and
unencrypted archives as well.
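A hypothetical model of the offset problem: `BlockBufferedWriter` is not deltatar code, it merely mimics a stream that hands data to the file in whole 512-byte blocks. Reading the file position before syncing yields a stale offset; syncing first gives the true start of the next object.

```python
import io

BLOCKSIZE = 512

class BlockBufferedWriter:
    """Hypothetical writer that, like a buffered tar stream, emits whole blocks only."""

    def __init__(self, raw):
        self.raw = raw
        self.pending = bytearray()

    def write(self, data: bytes) -> None:
        self.pending += data
        while len(self.pending) >= BLOCKSIZE:
            self.raw.write(bytes(self.pending[:BLOCKSIZE]))
            del self.pending[:BLOCKSIZE]

    def sync(self) -> None:
        """Push out buffered bytes so raw.tell() matches the logical position."""
        self.raw.write(bytes(self.pending))
        self.pending.clear()

raw = io.BytesIO()
out = BlockBufferedWriter(raw)
out.write(b"a" * 700)   # first object
stale = raw.tell()      # 512: what an unsynced index entry would record
out.sync()
accurate = raw.tell()   # 700: where the next object really starts
```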
Philipp Gesang [Thu, 10 Aug 2017 15:01:42 +0000]
add header corruption tests
We hit them where it hurts:
* for compressed backups, flip a bit in the magic;
* for encrypted backups, flip a bit in the tag.
In either case, normal restore must fail, and disaster recovery
will be incomplete.
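For the compressed case, a one-bit change to the gzip magic is enough to make a strict reader give up. The sketch below shows this with the standard `gzip` module; deltatar's own restore path is not involved.

```python
import gzip

original = gzip.compress(b"payload " * 64)
damaged = bytearray(original)
damaged[1] ^= 0x01   # second magic byte: 0x8b becomes 0x8a

try:
    gzip.decompress(bytes(damaged))
    restore_failed = False
except OSError:      # gzip.BadGzipFile subclasses OSError (Python 3.8+)
    restore_failed = True
print(restore_failed)  # True
```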
Philipp Gesang [Thu, 10 Aug 2017 13:32:16 +0000]
add test for corruption of encrypted files
Philipp Gesang [Thu, 10 Aug 2017 12:39:40 +0000]
track irrecoverable files in test_recover
Philipp Gesang [Thu, 10 Aug 2017 11:06:30 +0000]
prefer index iterator for recovery
Philipp Gesang [Thu, 10 Aug 2017 09:38:39 +0000]
properly damage gzip files for recover test
Ensure we are flipping bits in the compressed payload, not in the
mostly useless header. Requires some extra parsing to determine
the header length.
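The header parsing follows the member layout of RFC 1952: ten fixed bytes, then optional fields selected by the flag byte. A minimal sketch, not the test suite's actual helper:

```python
import gzip
import io
import struct

FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10

def gzip_header_length(data: bytes) -> int:
    """Return the offset of the deflate payload in a gzip member (RFC 1952)."""
    if data[:2] != b"\x1f\x8b" or data[2] != 8:
        raise ValueError("not a deflate-compressed gzip member")
    flags = data[3]
    pos = 10                                # magic, CM, FLG, MTIME, XFL, OS
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen                     # length field plus extra data
    if flags & FNAME:
        pos = data.index(b"\x00", pos) + 1  # zero-terminated file name
    if flags & FCOMMENT:
        pos = data.index(b"\x00", pos) + 1  # zero-terminated comment
    if flags & FHCRC:
        pos += 2                            # CRC16 of the header
    return pos

buf = io.BytesIO()
with gzip.GzipFile(filename="data.txt", mode="wb", fileobj=buf) as gz:
    gz.write(b"hello world")
print(gzip_header_length(buf.getvalue()))  # 19 = 10 + len("data.txt\0")
```

Bits flipped at or after the returned offset land in the compressed payload rather than in the header.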
Philipp Gesang [Thu, 10 Aug 2017 08:34:15 +0000]
add bit flip helper for recover tests
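A possible shape for such a helper, assuming it works on files in place; the name `flip_bit` and its signature are illustrative, not the test suite's.

```python
import os
import tempfile

def flip_bit(path: str, offset: int, bit: int = 0) -> None:
    """Invert bit *bit* of the byte at *offset* in *path*, in place."""
    with open(path, "r+b") as f:
        f.seek(offset)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError(f"offset {offset} lies beyond the end of {path}")
        f.seek(offset)
        f.write(bytes([old[0] ^ (1 << bit)]))

# Usage: damage a scratch file and read it back.
fd, scratch = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00\xff")
flip_bit(scratch, 0, bit=3)
flip_bit(scratch, 1)
with open(scratch, "rb") as f:
    damaged = f.read()
os.remove(scratch)
print(damaged)  # b'\x08\xfe'
```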
Philipp Gesang [Thu, 10 Aug 2017 08:13:18 +0000]
fix misleading docstrings for index file hook
Philipp Gesang [Thu, 10 Aug 2017 07:37:08 +0000]
lay out skeleton for disaster recovery tests
New series of tests for corrupting backup sets and restoring them
incompletely (“tolerant” or “disaster recovery” mode).
Philipp Gesang [Tue, 8 Aug 2017 11:58:20 +0000]
draft disaster recovery mode for deltatar
The first-stage recovery assumes the index is intact and all
objects are at their expected position. In this scenario, an
attempt is made to extract each object, keeping track of those
that weren’t readable and why.
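A minimal sketch of that first stage using the standard `tarfile` module, assuming the index boils down to (name, offset) pairs; `recover()` is hypothetical and only illustrates the per-object try/record pattern.

```python
import io
import tarfile

def recover(archive: bytes, index):
    """Try to extract every (name, offset) pair listed in *index*.

    Returns the readable objects and, for the others, why they failed.
    """
    recovered, failed = {}, {}
    for name, offset in index:
        stream = io.BytesIO(archive)
        stream.seek(offset)                  # jump to the expected position
        try:
            with tarfile.open(fileobj=stream, mode="r|") as tar:
                member = tar.next()
                if member is None or member.name != name:
                    raise tarfile.ReadError(f"no {name!r} at offset {offset}")
                recovered[name] = tar.extractfile(member).read()
        except (tarfile.TarError, OSError) as exc:
            failed[name] = str(exc)
    return recovered, failed

# Usage: two objects, the second with a damaged header.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    for name, data in (("a.txt", b"alpha"), ("b.txt", b"beta")):
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
archive = bytearray(buf.getvalue())
with tarfile.open(fileobj=io.BytesIO(bytes(archive)), mode="r:") as tar:
    index = [(m.name, m.offset) for m in tar.getmembers()]
archive[index[1][1]] ^= 0x01                 # corrupt the second header

ok, bad = recover(bytes(archive), index)
print(sorted(ok), sorted(bad))  # ['a.txt'] ['b.txt']
```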
Philipp Gesang [Tue, 8 Aug 2017 10:03:01 +0000]
return valid decrypted data on decryption failure
Philipp Gesang [Tue, 8 Aug 2017 08:48:31 +0000]
force tarfile reopen after bad read in deltatar
Closing the tarfile after an unreadable object was encountered
causes the stream to be reopened for the next read. Otherwise,
the corrupt object is already buffered and tarfile would continue
to seek inside the bad data.
Philipp Gesang [Tue, 8 Aug 2017 07:44:56 +0000]
distinguish invalid files from parse errors in restore
Especially with index files, the parse error is misleading.
Indicate the prevalent cause of the problem, i. e. that the
file is compressed but compression was not requested during
restore.
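One way to produce such a hint, sketched under the assumption that a peek at the leading bytes is available before parsing: check for the gzip magic. `diagnose()` is a hypothetical name.

```python
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"

def diagnose(head: bytes, compression_requested: bool) -> Optional[str]:
    """Return a hint if *head* looks compressed although no compression was requested."""
    if not compression_requested and head.startswith(GZIP_MAGIC):
        return "file is gzip-compressed but compression was not requested"
    return None
```

A caller would report this hint instead of the generic parse error whenever it is not `None`.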
Philipp Gesang [Tue, 8 Aug 2017 07:14:12 +0000]
update help usage strings wrt. crypto in backup.py
Philipp Gesang [Mon, 7 Aug 2017 13:37:19 +0000]
extend crypto.py exception descriptions
Philipp Gesang [Tue, 27 Jun 2017 08:24:00 +0000]
actually default to i2n mode with crypto.py scrypt
And adapt the relevant unit test to explicitly request the full
parameters output.
Philipp Gesang [Fri, 23 Jun 2017 08:35:08 +0000]
add crypto.py option to output cnf-compatible scrypt object
Philipp Gesang [Wed, 31 May 2017 11:53:21 +0000]
support PDT encrypted archives with rescue_tar.py
Philipp Gesang [Tue, 30 May 2017 15:29:26 +0000]
adapt file_crypt.py for revised crypto
Philipp Gesang [Tue, 30 May 2017 15:10:59 +0000]
kill off old crypto implementation
The old aescrypto.py was only kept for reference but since
downstream integration is more or less complete wrt. encryption
we don’t need it any longer.
Good riddance.
Philipp Gesang [Tue, 30 May 2017 10:40:19 +0000]
allow passing salt to crypto.py on the command line
Nifty shortcut for hashing without a corresponding pdtcrypt file.
Philipp Gesang [Tue, 30 May 2017 09:23:57 +0000]
properly align usage message of crypto.py
Philipp Gesang [Tue, 23 May 2017 12:55:10 +0000]
improve bad CLI argument handling of crypto.py
Philipp Gesang [Mon, 22 May 2017 12:10:33 +0000]
include header version info in scrypt handler
Philipp Gesang [Fri, 19 May 2017 15:22:17 +0000]
accept crypto format version in deltatar ctor
Philipp Gesang [Fri, 19 May 2017 09:16:10 +0000]
add unit test for CLI scrypt hashing
Philipp Gesang [Thu, 18 May 2017 15:47:25 +0000]
allow passing keys directly to CLI crypto.py
Keys may now be passed as command line argument or environment
variable.
The only valid format is 16 bytes in hexadecimal.
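A sketch of the validation this implies: 16 bytes in hexadecimal means exactly 32 hex digits. `parse_key()` is hypothetical, and the name of the key environment variable is not given above, so it is not shown.

```python
KEY_BYTES = 16

def parse_key(text: str) -> bytes:
    """Decode a 16-byte key given as 32 hexadecimal digits."""
    try:
        key = bytes.fromhex(text)
    except ValueError as exc:
        raise ValueError(f"key is not valid hexadecimal: {exc}") from None
    if len(key) != KEY_BYTES:
        raise ValueError(f"key must be {KEY_BYTES} bytes, got {len(key)}")
    return key
```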
Philipp Gesang [Thu, 18 May 2017 11:44:07 +0000]
grab password from envp if not supplied on CLI
In order to avoid the password showing up in full in the process
table, pass it in the environment instead. Uses the environment
variable PDTCRYPT_PASSWORD with both crypto.py and backup.py.
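The lookup order can be sketched as follows, assuming an explicit command line value takes precedence; `get_password()` is illustrative only.

```python
import os
from typing import Optional

def get_password(cli_password: Optional[str]) -> Optional[str]:
    """Prefer an explicit CLI argument, else fall back to PDTCRYPT_PASSWORD."""
    if cli_password is not None:
        return cli_password
    return os.environ.get("PDTCRYPT_PASSWORD")
```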
Philipp Gesang [Tue, 16 May 2017 11:37:43 +0000]
default to index mode of deltatar object when choosing extension
For external use.
Philipp Gesang [Tue, 16 May 2017 08:57:01 +0000]
handle bad randomness during IV creation
Since IVs must be unique we rely on /dev/urandom to yield a
different sequence of bytes when requesting a new fixed part.
In the unlikely event that a new fixed part has already been
used earlier, retry up to a fixed number of times.
Abort if no unique IV could be generated this way since it
most likely indicates a faulty RNG.
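A minimal sketch of the retry loop. The fixed-part size and the retry budget are assumptions, not values taken from crypto.py; the RNG is injectable so a broken one can be simulated.

```python
import os

FIXED_PART_SIZE = 8   # assumption: bytes of the IV taken by the fixed part
MAX_ATTEMPTS = 10     # assumption: retry budget before giving up

class RandomnessError(Exception):
    """Raised when the RNG keeps returning fixed parts that were already used."""

def new_fixed_part(used: set, rng=os.urandom) -> bytes:
    for _ in range(MAX_ATTEMPTS):
        candidate = rng(FIXED_PART_SIZE)
        if candidate not in used:
            used.add(candidate)
            return candidate
    # A healthy RNG practically never repeats 64-bit values; assume it is broken.
    raise RandomnessError(f"no unique IV fixed part after {MAX_ATTEMPTS} attempts")
```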
Philipp Gesang [Mon, 15 May 2017 15:44:48 +0000]
extend crypto.py documentation
Philipp Gesang [Thu, 11 May 2017 15:40:21 +0000]
distinguish auxiliary file errors
Auxiliary files that grow larger than the maximum defined
encrypted file size cause an irrecoverable error because their
fixed IV is being reused. Add a new exception to distinguish this
specific case. Encrypted auxiliary files thus never consist of
more than one object: unlike ordinary files, they cannot be
continued on the fly.
Philipp Gesang [Thu, 11 May 2017 09:28:08 +0000]
adapt unit tests for crypto.py subcommands
Philipp Gesang [Thu, 11 May 2017 08:50:30 +0000]
export scrypt hashing functionality
Philipp Gesang [Thu, 11 May 2017 08:35:40 +0000]
add SCRYPT hashing mode to crypto.py
Add a subcommand “scrypt” to crypto.py in CLI mode. Example:
$ python3 ./deltatar/crypto.py scrypt foo -i - -o pwd \
    <backup_dir/bfull-2017-05-11-0919-001.tar.pdtcrypt
{"scrypt_params": {"p": 1, "N": 65536, "dkLen": 16, "r": 8},
 "salt": "b'fbdbaa9890ae243eb16391199c9243f6'",
 "hash": "b'1e7d7a78b9300d461779e9c80e4a15ac'"}
The output “hash” is calculated from the salt in the first
header found in the given archive and the password specified.
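The same parameters can be fed to the standard library's `hashlib.scrypt`. This is a sketch only: whether it reproduces the “hash” value above depends on how crypto.py encodes password and salt, which has not been verified here.

```python
import hashlib

def scrypt_hash(password: bytes, salt: bytes) -> bytes:
    # N=65536, r=8, p=1, dkLen=16 as in the example output above. These
    # parameters need about 64 MiB, more than OpenSSL's default 32 MiB limit,
    # hence the explicit maxmem.
    return hashlib.scrypt(password, salt=salt, n=65536, r=8, p=1, dklen=16,
                          maxmem=128 * 1024 * 1024)

digest = scrypt_hash(b"foo", bytes.fromhex("fbdbaa9890ae243eb16391199c9243f6"))
print(digest.hex())
```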
Philipp Gesang [Tue, 9 May 2017 13:42:17 +0000]
gracefully handle GCM data length limit
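For reference, NIST SP 800-38D limits the GCM plaintext to 2^39 − 256 bits per invocation. A sketch of a budget check against that limit; what deltatar does once the limit is reached is not described here, and `check_gcm_budget()` is hypothetical.

```python
# NIST SP 800-38D caps the GCM plaintext at 2**39 - 256 bits per invocation.
GCM_MAX_PLAINTEXT = (2**39 - 256) // 8   # 68719476704 bytes, just under 64 GiB

def check_gcm_budget(already_encrypted: int, chunk: int) -> bool:
    """Return True if *chunk* more bytes still fit under the GCM limit."""
    return already_encrypted + chunk <= GCM_MAX_PLAINTEXT
```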
Philipp Gesang [Tue, 9 May 2017 08:59:28 +0000]
unit test crypto file counter wraparound
After the file counter reaches UINT_MAX, it wraps around and a
new fixed part must be created.
The file counter is a 32-bit unsigned integer, so its maximum
must be lowered to make bounds testing feasible.
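A hypothetical IV source illustrating the wraparound, with the maximum made configurable so a test can exhaust it quickly. The 8-byte fixed part, the 4-byte big-endian counter and the start value 0 are assumptions, not crypto.py's actual layout.

```python
import os

class IVCounter:
    """Hypothetical IV source: fixed part plus a counter of bounded width."""

    def __init__(self, max_counter: int = 2**32 - 1, fixed_size: int = 8):
        self.max_counter = max_counter
        self.fixed_size = fixed_size
        self.fixed = os.urandom(fixed_size)
        self.counter = 0
        self.fixed_parts = 1

    def next_iv(self) -> bytes:
        if self.counter > self.max_counter:
            # Counter exhausted: wrap around and switch to a fresh fixed part.
            self.fixed = os.urandom(self.fixed_size)
            self.counter = 0
            self.fixed_parts += 1
        iv = self.fixed + self.counter.to_bytes(4, "big")
        self.counter += 1
        return iv

ivs = IVCounter(max_counter=3)       # lowered bound, as in the unit test
sample = [ivs.next_iv() for _ in range(5)]
```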
Philipp Gesang [Tue, 9 May 2017 08:22:43 +0000]
extend strict iv tracking to encryption
This is just an extra soundness check to prevent accidental reuse
of IVs when they are handled incorrectly (same initial counters
passed twice to the same context). In normal usage this case
cannot happen.
Philipp Gesang [Mon, 8 May 2017 14:27:13 +0000]
expand crypto api to accept precomputed key
Philipp Gesang [Mon, 8 May 2017 15:13:26 +0000]
reduce noise in test_multivol_compression_sizes.py
Philipp Gesang [Mon, 8 May 2017 13:33:29 +0000]
improve iv diagnostics when decrypting
Philipp Gesang [Mon, 8 May 2017 09:26:54 +0000]
test that seeking backwards is disallowed by _Stream
Re-extracting an already decrypted file will fail on account of
IV reuse. Currently, tarfile._Stream is not capable of performing
backward seeks, so we’re good. Should this limitation be removed
in a future version, this unit test will fail.
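The same restriction can be observed in the standard library's tarfile, whose stream mode also uses a private `_Stream` class. This sketch pokes at that private object, so it may break across Python versions, and it says nothing definitive about deltatar's fork.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("f")
    info.size = 3
    tar.addfile(info, io.BytesIO(b"abc"))
buf.seek(0)

with tarfile.open(fileobj=buf, mode="r|") as tar:
    stream = tar.fileobj          # CPython's private tarfile._Stream
    try:
        stream.seek(0)            # the first header has already been read
        backward_seek_allowed = True
    except tarfile.StreamError:
        backward_seek_allowed = False
print(backward_seek_allowed)  # False
```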
Philipp Gesang [Mon, 8 May 2017 07:58:33 +0000]
remove pytest dependency from test_crypto.py
Philipp Gesang [Fri, 5 May 2017 15:52:51 +0000]
add unit test for IV reuse
Philipp Gesang [Fri, 5 May 2017 15:18:11 +0000]
adapt crypto unit tests to run in main suite
Philipp Gesang [Fri, 5 May 2017 14:52:45 +0000]
remove IV validation step from RestoreHelper
Since the same decryption context is carried over between the Tar
volumes of one backup set, the built-in IV uniqueness checks
suffice. Between multiple backup sets, the salt and IV fixed
parts change, so there is no occasion for conflict. The IVs of
auxiliary files are unique anyway.
Philipp Gesang [Fri, 5 May 2017 12:27:29 +0000]
adjust acceptable size window for compressed unit test data
A lower bound of 330 causes the test to fail with version 1.2.3 of
zlib.
Earlier this did not occur because in concat mode, tarfile would
always write an empty zlib compressed chunk right at the
beginning of the archive and then immediately create a new one as
soon as actual input arrived. For this reason, the resulting
archive size remained within the bounds chosen in
test_multivol.py. Due to the removal of the redundancy, this is
no longer the case. The problem is masked on newer versions of
zlib (tested: 1.2.8 of fc25) which create larger compressed files
in general for the same inputs.
For the “test_compress_single” unit test, the input consists of
an archive of 61440 bytes. Compressed with level 9, window bits 31,
and a memlevel of 9, the output length is:
zlib version   size (B)
1.2.3          308
1.2.8          324
Add to that the file name in our custom header and the latter
passes 330 B whereas the former doesn’t.
A lower bound of 315 is justified.
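The measurement can be repeated with Python's `zlib` bindings using the parameters quoted above. The input below is an arbitrary stand-in for the test archive, so no specific size is claimed; results vary with the zlib version, which `zlib.ZLIB_RUNTIME_VERSION` reports.

```python
import zlib

def gzip_compress(data: bytes) -> bytes:
    # Level 9; window bits 31 = 15 + 16, which selects the gzip wrapper;
    # memLevel 9.
    comp = zlib.compressobj(9, zlib.DEFLATED, 31, 9)
    return comp.compress(data) + comp.flush()

blob = gzip_compress(bytes(61440))   # stand-in input of the same length
print(zlib.ZLIB_RUNTIME_VERSION, len(blob))
```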
Philipp Gesang [Fri, 5 May 2017 09:00:37 +0000]
reuse existing crypto context for subsequent volumes
Philipp Gesang [Fri, 5 May 2017 08:20:50 +0000]
validate exceptions being thrown from invalid tarfile.open() params
Philipp Gesang [Thu, 4 May 2017 16:06:04 +0000]
move final IV checks out of crypto context
Collect IVs while decrypting but postpone the final check for
duplicates. Reused IVs still trigger an exception during
decryption but since multiple different contexts may be active
(e. g. when handling a diff backup), the IVs they retrieved from
the headers must be compared afterwards. This test has its place
in a new function “validate” of the ``RestoreHelper`` and must be
called when decryption has been completed.
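The deferred check amounts to pooling the IVs collected by all contexts and looking for duplicates. A standalone sketch, not the actual `RestoreHelper.validate` method:

```python
from collections import Counter

class IVReuseError(Exception):
    pass

def validate(ivs_per_context) -> None:
    """Raise if any IV was seen more than once across all decryption contexts."""
    counts = Counter(iv for ivs in ivs_per_context for iv in ivs)
    reused = sorted(iv for iv, n in counts.items() if n > 1)
    if reused:
        raise IVReuseError(f"{len(reused)} IV(s) used more than once: "
                           + ", ".join(iv.hex() for iv in reused))
```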
Philipp Gesang [Thu, 4 May 2017 13:12:50 +0000]
write auxiliary files whilst processing the backup
Introduce a fixed value for the index file counter to allow
encryption on the fly.
Philipp Gesang [Thu, 4 May 2017 12:24:06 +0000]
use independent decryption contexts for backup files
When restoring individual files from a diff backup, Deltatar will
traverse both tarballs simultaneously. This leads to access
patterns where reads are interleaved between the two sources,
possibly corrupting the decryption state. Thus when restoring
from multiple “index files” (in practice only two are relevant),
use a separate decryptor context for each of them.
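The hazard can be illustrated with a stand-in: the sketch uses `zlib.decompressobj()` instead of deltatar's decryptor, since both are stateful stream contexts. Chunks from two sources arrive interleaved, as in a diff restore. Because each source owns its context, both outputs come out intact; routing both through one shared context would mix their states.

```python
import zlib

def interleaved_decompress(sources, chunk=16):
    """Decompress several zlib streams whose chunks arrive interleaved.

    Each source keeps its own decompressor, so reads from one source never
    disturb the state needed for another.
    """
    contexts = [zlib.decompressobj() for _ in sources]
    outputs = [bytearray() for _ in sources]
    offsets = [0] * len(sources)
    while any(off < len(src) for off, src in zip(offsets, sources)):
        for i, src in enumerate(sources):
            piece = src[offsets[i]:offsets[i] + chunk]
            offsets[i] += len(piece)
            outputs[i] += contexts[i].decompress(piece)
    for i, ctx in enumerate(contexts):
        outputs[i] += ctx.flush()
    return [bytes(o) for o in outputs]
```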
Philipp Gesang [Tue, 2 May 2017 14:14:44 +0000]
clean up multi-index handling
WIP
Philipp Gesang [Tue, 2 May 2017 15:59:19 +0000]
properly handle encryption and compression of empty archives
The old implementation always initialized in the ctor regardless
of whether contents would be written to the archive. For empty
archives this now has to be done in ``.close()`` if no data has
been added yet.
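A hypothetical sketch of that pattern: setup runs on the first write, and `close()` performs it if nothing was ever written, so an empty archive is still well formed. The byte strings merely stand in for the real encryption and compression setup.

```python
class LazyStream:
    """Hypothetical stream that sets up its encoding state on first use."""

    def __init__(self, sink: list):
        self.sink = sink
        self.initialized = False

    def _init(self) -> None:
        self.sink.append(b"<header>")   # stands in for crypto/compression setup
        self.initialized = True

    def write(self, data: bytes) -> None:
        if not self.initialized:
            self._init()
        self.sink.append(data)

    def close(self) -> None:
        # An empty archive never triggered write(), so initialize here.
        if not self.initialized:
            self._init()
        self.sink.append(b"<trailer>")
```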
Philipp Gesang [Tue, 2 May 2017 14:44:43 +0000]
adapt test_multivol_compression_sizes.py to revised crypto
Philipp Gesang [Tue, 2 May 2017 13:06:24 +0000]
remove redundant test
The first part of the condition always evaluates to True since
it’s the precondition to entering that branch.
Philipp Gesang [Fri, 28 Apr 2017 16:06:27 +0000]
encode operation modes
Introduce the “arcmode” to comprehensively switch modes to
supplant the pervasive ad-hoc string parsing and attribute
queries. Encodes the triple encryption, compression, concat.
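One way to encode such a triple is a bit set; `ArcMode` and its members below are hypothetical names, not the ones used in tarfile.py.

```python
import enum

class ArcMode(enum.IntFlag):
    """Hypothetical encoding of the (encryption, compression, concat) triple."""
    PLAIN = 0
    ENCRYPT = 1
    COMPRESS = 2
    CONCAT = 4

def arcmode(encrypt: bool, compress: bool, concat: bool) -> ArcMode:
    mode = ArcMode.PLAIN
    if encrypt:
        mode |= ArcMode.ENCRYPT
    if compress:
        mode |= ArcMode.COMPRESS
    if concat:
        mode |= ArcMode.CONCAT
    return mode

mode = arcmode(encrypt=True, compress=False, concat=True)
print(bool(mode & ArcMode.COMPRESS))  # False
```

A single flag test then replaces string parsing or attribute lookups at each decision point.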
Philipp Gesang [Fri, 28 Apr 2017 12:04:42 +0000]
cleanly perform block transition in non-concat mode
Philipp Gesang [Fri, 28 Apr 2017 08:41:27 +0000]
clarify exception-driven control flow
Distinguish the actual EOF when hit at the beginning from other
IO errors in _init_read_gz() and only catch this one where it’s
expected. Well formed archives do not end inside a header.
Philipp Gesang [Fri, 28 Apr 2017 08:28:29 +0000]
remove unused state variable
“internal_pos” which is only written to and never read was
introduced with this commit:
commit 85737f48c38a432f2429e9e3e4b81fed164c4b9a
Author: Eduardo Robles Elvira <edulix@wadobo.com>
Date: Fri Jul 5 11:50:43 2013 +0200
extracting files in r#gz mode now works too, includes unit tests
and lacks a raison d’être ever since.
Philipp Gesang [Thu, 27 Apr 2017 15:18:31 +0000]
use append mode in symlink unit test
These tests currently fail despite using the original GzipFile
compression path. The archives appear to overwrite the passed
archive instead of writing new objects.
Philipp Gesang [Thu, 27 Apr 2017 14:03:46 +0000]
fix multivol compression handling
Philipp Gesang [Tue, 25 Apr 2017 14:38:12 +0000]
handle uncompressed, encrypted archives with tarfile
Internally, tarfile.py uses “tar” to refer to uncompressed
archives, so just handle this accordingly at the API level.
Philipp Gesang [Tue, 25 Apr 2017 13:28:09 +0000]
sync on .close() for unencrypted archives
Philipp Gesang [Tue, 25 Apr 2017 12:03:36 +0000]
properly (re-) initialize compressor at archive / volume bounds
For unencrypted streams, the compressor still must be reset in
concat mode. For encrypted streams, the decompressor can be
initialized right at the start of the archive since no further
inputs are needed.
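Assuming “concat” mode means one independent gzip member per object (an interpretation, not confirmed above), the standard `gzip` module shows why restarting the compressor per object is useful: the concatenation is a valid gzip stream as a whole, and any member can also be decompressed on its own given its offset.

```python
import gzip

objects = [b"first object " * 20, b"second object " * 20]
members = [gzip.compress(obj) for obj in objects]   # one gzip member per object
archive = b"".join(members)

# A standard reader decompresses the concatenation as one stream ...
whole = gzip.decompress(archive)
# ... and each member can also be decompressed on its own, given its offset.
second = gzip.decompress(archive[len(members[0]):])
```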
Philipp Gesang [Tue, 25 Apr 2017 09:13:17 +0000]
keep separate encryptor and decryptor contexts in deltatar.py
The same Deltatar object appears to function as handle for
reading and writing files simultaneously. To support this,
introduce two different crypto contexts that are created
on demand.
Philipp Gesang [Mon, 24 Apr 2017 13:04:38 +0000]
properly restart compression when encrypting
Separate finalization of a zlib block from creation of a new one.
Otherwise, we end up with trailing data from the last object that
lingers in the write buffer and gets flushed to the archive after
the next encrypted object has been initialized.
Also get rid of the “new_compression_block” wrapper which
needlessly complicated things.
Special precautions must be taken for the PAX format. Due to its
requirement of a global archive header, TarFile will write to the
stream prior to initialization that is performed in addfile().
Thus, initialize compression before the PAX header is being
written and properly restart compression for the first object
committed to the archive or volume.
Philipp Gesang [Mon, 24 Apr 2017 10:06:46 +0000]
use crypto.py to split test archive in test_encryption.py
This again verifies individual decryptability of objects in the
PDT archive.
Philipp Gesang [Mon, 24 Apr 2017 09:37:23 +0000]
implement passthrough mode in crypto.py
When invoked with --no-decrypt, write object headers and
ciphertext to output. Combined with --split this allows
extracting encrypted objects from the archive.
Philipp Gesang [Mon, 24 Apr 2017 09:23:53 +0000]
implement split mode for CLI encryption
Philipp Gesang [Fri, 21 Apr 2017 16:03:17 +0000]
explicitly disable gz initialization for _Streams used in aux files
The process of writing an auxiliary (index, info) file differs
drastically from that of tar archives: Since files are not added
individually, the encryption must be initialized externally and
the compression layer cannot rely on being enabled in the ctor
because, obviously, the latter is executed before the manual
encryption setup can be performed.
Extend the API of the _Stream ctor with a parameter “noinit”
to request that all initialization be postponed until the
encryption has been set up. This seems to do the trick but is
quite ugly.
Philipp Gesang [Fri, 21 Apr 2017 13:23:49 +0000]
fix decompression error handling
This seems to be a copy&paste oversight from
commit be60ffd0598fec172eccb69f3c6a8433af6fb643
Author: Eduardo Robles Elvira <edulix@wadobo.com>
Date: Mon Nov 4 08:50:55 2013 +0100
initial port to python 3, not finished
which added the per-compression mode exceptions but not the
actual handling code.
Philipp Gesang [Fri, 21 Apr 2017 13:21:06 +0000]
fix index file encryption handling
Philipp Gesang [Fri, 21 Apr 2017 07:57:05 +0000]
convert test_deltatar to revised crypto
Philipp Gesang [Fri, 21 Apr 2017 07:39:02 +0000]
permit setting crypto.py parameter version via deltatar ctor
Introduce an optional argument to request a specific crypto
parameter version when invoking Deltatar. This isn’t of much
use ATM since only the one version is implemented, but it’s
handy for testing nonetheless.
Philipp Gesang [Fri, 21 Apr 2017 07:16:49 +0000]
eliminate the last traces of encryption “modes”
Since encryption handling has been moved outside of tarfile.py
these no longer apply. Thus remove all references so they don’t
obscure problems in the unit tests.
Philipp Gesang [Thu, 20 Apr 2017 15:53:49 +0000]
initialize compression globally for non-“concat” archives
Philipp Gesang [Thu, 20 Apr 2017 15:26:04 +0000]
remove obsolete unittests for 256 bit AES
Philipp Gesang [Thu, 20 Apr 2017 14:53:06 +0000]
pass encryption context to deltatar volume handlers
Philipp Gesang [Tue, 18 Apr 2017 13:04:07 +0000]
rework encryption unittests
Philipp Gesang [Thu, 20 Apr 2017 09:34:35 +0000]
fix compression handling on volume bounds
The old “concat compression” simply relied on the _Stream() ctor
to create a new zip block which is no longer possible since the
prerequisite encryption is only available when the first object
is committed to the archive.
Hence, reintroduce the new block initialization after
transitioning to the new volume.
Philipp Gesang [Tue, 18 Apr 2017 14:07:29 +0000]
add strict IV validation to decryption handler
Optionally (on CLI, with the “-s” flag) check for additional IV
properties:
- Accidental reuse: in GCM, the same IV used more than once
means that the plaintext is compromised.
- Unstructured archive: In the headers of a normal PDT
encrypted archive, the variable parts of the IVs are
consecutive unless the fixed part changes.
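Both properties can be checked incrementally while reading headers. A sketch assuming an IV is a fixed part followed by a 4-byte big-endian counter; that layout and `StrictIVTracker` are assumptions, not the handler's code.

```python
class IVError(Exception):
    pass

class StrictIVTracker:
    """Hypothetical checker for IVs read from consecutive object headers."""

    def __init__(self):
        self.seen = set()
        self.last = None   # (fixed part, counter) of the previous header

    def check(self, iv: bytes) -> None:
        if iv in self.seen:
            raise IVError(f"IV {iv.hex()} reused")
        self.seen.add(iv)
        fixed, counter = iv[:-4], int.from_bytes(iv[-4:], "big")
        if self.last is not None:
            last_fixed, last_counter = self.last
            # Within one fixed part, counters must advance by exactly one.
            if fixed == last_fixed and counter != last_counter + 1:
                raise IVError(f"counter jumps from {last_counter} to {counter}")
        self.last = (fixed, counter)
```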
Philipp Gesang [Tue, 18 Apr 2017 09:59:02 +0000]
allow selecting individual tests with runtests.py
If arguments are passed on the command line, interpret them as
test names and attempt to compose a suite comprising only the
tests specified.
The behavior remains the same if invoked without argument.
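With the standard `unittest` loader this selection can be expressed in a few lines. A sketch, not runtests.py itself; in a script, `names` would come from `sys.argv[1:]`.

```python
import types
import unittest

def build_suite(module: types.ModuleType, names) -> unittest.TestSuite:
    """Load only the named tests if any are given, otherwise the whole module."""
    loader = unittest.TestLoader()
    if names:
        return loader.loadTestsFromNames(names, module)
    return loader.loadTestsFromModule(module)

# Usage with a throwaway module standing in for the real test modules.
class ExampleTest(unittest.TestCase):
    def test_one(self):
        self.assertTrue(True)

    def test_two(self):
        self.assertEqual(1 + 1, 2)

demo = types.ModuleType("demo")
demo.ExampleTest = ExampleTest
selected = build_suite(demo, ["ExampleTest.test_two"])   # 1 test
everything = build_suite(demo, [])                       # 2 tests
```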
Philipp Gesang [Mon, 10 Apr 2017 15:37:33 +0000]
improve error handling in crypto handler
Since invalid tags are some of the most important bits of
information to be passed down, make the corresponding error
message human-readable.
Philipp Gesang [Mon, 10 Apr 2017 14:32:50 +0000]
remove obsolete block size check
Philipp Gesang [Mon, 10 Apr 2017 13:08:31 +0000]
fix fallout from EOF changes in CLI decryptor
Philipp Gesang [Mon, 10 Apr 2017 13:01:56 +0000]
throw error on partial header reading stream
Throw the EOF exception only if the stream ends exactly at an
object boundary. Otherwise, when less than sizeof(hdr) bytes
are returned from read(), throw InvalidHeader to indicate a
malformed file. This keeps EOF a “benign” exception.
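A minimal sketch of that distinction. The header size is a placeholder, not the real PDTCRYPT value, `EndOfFile` is a hypothetical stand-in for the EOF exception, and the code assumes `read()` only returns short at end of file, as `io.BytesIO` and regular files do.

```python
import io

HDR_SIZE = 64   # assumption: placeholder for sizeof(hdr)

class EndOfFile(Exception):
    """Benign: the stream ended exactly at an object boundary."""

class InvalidHeader(Exception):
    """The stream ended inside a header, so the file is malformed."""

def read_header(stream) -> bytes:
    raw = stream.read(HDR_SIZE)
    if len(raw) == 0:
        raise EndOfFile("end of archive reached at object boundary")
    if len(raw) < HDR_SIZE:
        raise InvalidHeader(f"truncated header: {len(raw)} of {HDR_SIZE} bytes")
    return raw

print(len(read_header(io.BytesIO(bytes(HDR_SIZE)))))  # 64
```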
Philipp Gesang [Mon, 10 Apr 2017 11:43:53 +0000]
communicate remainder to caller when hitting EOF from crypto
Philipp Gesang [Mon, 10 Apr 2017 09:43:06 +0000]
strip extraneous parameters from decryption handler ctor
Format and parameter version as well as the salt are supplied
from the headers. Decrypting should thus only require the
password and, depending on context, an explicit counter as well
as the list of valid IV fixed parts.