Philipp Gesang [Tue, 9 May 2017 13:42:17 +0000]
gracefully handle GCM data length limit
Philipp Gesang [Tue, 9 May 2017 08:59:28 +0000]
unit test crypto file counter wraparound
After the file counter reaches UINT_MAX, it wraps around and a
new fixed part must be created.
The file counter is a 32-bit unsigned integer, so its upper bound
needs to be lowered to make bounds testing feasible.
Philipp Gesang [Tue, 9 May 2017 08:22:43 +0000]
extend strict iv tracking to encryption
This is just an extra soundness check to prevent accidental reuse
of IVs when handled incorrectly (same initial counters passed
twice to the same context). In normal usage this case cannot
happen.
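The check described above can be sketched as a set of already used IVs that is consulted before every encryption. This is a minimal illustration, not the code from crypto.py; the names ``IVTracker`` and ``DuplicateIV`` and the 12-byte IV layout are assumptions made for the example.

```python
class DuplicateIV(Exception):
    """Raised when an IV is handed to the same context a second time."""


class IVTracker:
    """Remember every IV a context has used and refuse repeats."""

    def __init__(self):
        self.used = set()

    def check(self, iv: bytes) -> None:
        """Record ``iv``; raise DuplicateIV if it was recorded before."""
        if iv in self.used:
            raise DuplicateIV("IV %s already used" % iv.hex())
        self.used.add(iv)


tracker = IVTracker()
iv = bytes(8) + (2).to_bytes(4, "big")  # assumed layout: fixed part + counter
tracker.check(iv)                       # first use is accepted
try:
    tracker.check(iv)                   # same initial counter passed twice
except DuplicateIV as exn:
    print("rejected:", exn)
```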
Philipp Gesang [Mon, 8 May 2017 14:27:13 +0000]
expand crypto api to accept precomputed key
Philipp Gesang [Mon, 8 May 2017 15:13:26 +0000]
reduce noise in test_multivol_compression_sizes.py
Philipp Gesang [Mon, 8 May 2017 13:33:29 +0000]
improve iv diagnostics when decrypting
Philipp Gesang [Mon, 8 May 2017 09:26:54 +0000]
test that seeking backwards is disallowed by _Stream
Re-extracting an already decrypted file will fail on account of
IV reuse. Currently, tarfile._Stream is not capable of performing
backward seeks, so we’re good. Should this limitation be removed
in a future version, this unit test will fail.
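A test of this kind could look roughly like the sketch below. It runs against the standard library's tarfile, whose stream wrapper raises ``StreamError`` on a backward seek; that Deltatar's forked tarfile.py behaves identically is an assumption here, and the helper names are invented for the example.

```python
import io
import tarfile


def make_archive() -> bytes:
    """Build a tiny uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        payload = b"hello"
        info = tarfile.TarInfo(name="greeting.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def backward_seek_is_rejected() -> bool:
    """Return True if the stream wrapper refuses to seek backwards."""
    tar = tarfile.open(fileobj=io.BytesIO(make_archive()), mode="r|")
    try:
        tar.next()           # the stream is now positioned past the first header
        tar.fileobj.seek(0)  # in "r|" mode tar.fileobj is the _Stream wrapper
    except tarfile.StreamError:
        return True
    finally:
        tar.close()
    return False
```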
Philipp Gesang [Mon, 8 May 2017 07:58:33 +0000]
remove pytest dependency from test_crypto.py
Philipp Gesang [Fri, 5 May 2017 15:52:51 +0000]
add unit test for IV reuse
Philipp Gesang [Fri, 5 May 2017 15:18:11 +0000]
adapt crypto unit tests to run in main suite
Philipp Gesang [Fri, 5 May 2017 14:52:45 +0000]
remove IV validation step from RestoreHelper
Since the same decryption context is carried over between the Tar
volumes of one backup set, the built-in IV uniqueness checks
suffice. Between multiple backup sets, the salt and IV fixed
parts change, so there is no occasion for conflict. The IVs of
auxiliary files are unique anyways.
Philipp Gesang [Fri, 5 May 2017 12:27:29 +0000]
adjust acceptable size window for compressed unit test data
A lower bound of 330 causes the test to fail with version 1.2.3 of
zlib.
Earlier this did not occur because in concat mode, tarfile would
always write an empty zlib compressed chunk right at the
beginning of the archive and then immediately create a new one as
soon as actual input arrived. For this reason, the resulting
archive size remained within the bounds chosen in
test_multivol.py. Due to the removal of the redundancy, this is
no longer the case. The problem is masked on newer versions of
zlib (tested: 1.2.8 of fc25) which create larger compressed files
in general for the same inputs.
For the “test_compress_single” unit test, the input consists of
an archive of 61440 bytes. Compressed with level 9, window bits
31, and a memlevel of 9, the output length is:
    version   size (B)
    1.2.3     308
    1.2.8     324
Add to that the file name in our custom header and the latter
passes 330 B whereas the former doesn’t.
A lower bound of 315 is justified.
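To repeat the measurement against another zlib build, the compression parameters named above can be passed to ``zlib.compressobj`` directly. The input below is a placeholder of the same length (61440 zero bytes), not the archive used by the unit test, so the printed size will differ from the figures in the table.

```python
import zlib


def gzip_compress(data: bytes) -> bytes:
    """Compress with level 9, window bits 31 (gzip framing), memlevel 9."""
    comp = zlib.compressobj(9, zlib.DEFLATED, 31, 9)
    return comp.compress(data) + comp.flush()


# Placeholder input: same length as the test archive, different contents.
sample = bytes(61440)
packed = gzip_compress(sample)
print("zlib runtime:", zlib.ZLIB_RUNTIME_VERSION, "output bytes:", len(packed))
```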
Philipp Gesang [Fri, 5 May 2017 09:00:37 +0000]
reuse existing crypto context for subsequent volumes
Philipp Gesang [Fri, 5 May 2017 08:20:50 +0000]
validate exceptions being thrown from invalid tarfile.open() params
Philipp Gesang [Thu, 4 May 2017 16:06:04 +0000]
move final IV checks out of crypto context
Collect IVs while decrypting but postpone the final check for
duplicates. Reused IVs still trigger an exception during
decryption but since multiple different contexts may be active
(e. g. when handling a diff backup), the IVs they retrieved from
the headers must be compared afterwards. This test has its place
in a new function “validate” of the ``RestoreHelper`` and must be
called when decryption has been completed.
Philipp Gesang [Thu, 4 May 2017 13:12:50 +0000]
write auxiliary files whilst processing the backup
Introduce a fixed value for the index file counter to allow
encryption on the fly.
Philipp Gesang [Thu, 4 May 2017 12:24:06 +0000]
use independent decryption contexts for backup files
When restoring individual files from a diff backup, Deltatar will
traverse both tarballs simultaneously. This leads to access
patterns where reads are interleaved between the two sources,
possibly corrupting the decryption state. Thus when restoring
from multiple “index files” (in practice only two are relevant),
use a separate decryptor context for each of them.
Philipp Gesang [Tue, 2 May 2017 14:14:44 +0000]
clean up multi-index handling
WIP
Philipp Gesang [Tue, 2 May 2017 15:59:19 +0000]
properly handle encryption and compression of empty archives
The old implementation always initialized in the ctor regardless
of whether contents would be written to the archive. For empty
archives this now has to be done in ``.close()`` if no data has
been added yet.
Philipp Gesang [Tue, 2 May 2017 14:44:43 +0000]
adapt test_multivol_compression_sizes.py to revised crypto
Philipp Gesang [Tue, 2 May 2017 13:06:24 +0000]
remove redundant test
The first part of the condition always evaluates to True since
it’s the precondition to entering that branch.
Philipp Gesang [Fri, 28 Apr 2017 16:06:27 +0000]
encode operation modes
Introduce the “arcmode” to comprehensively switch modes to
supplant the pervasive ad-hoc string parsing and attribute
queries. Encodes the triple encryption, compression, concat.
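One way to encode such a triple is a set of bit flags. The sketch below uses ``enum.IntFlag``; the class name, the member names and the helper are hypothetical and need not match what tarfile.py actually defines.

```python
import enum


class ArcMode(enum.IntFlag):
    """Hypothetical bit flags for the three independent archive properties."""
    PLAIN = 0
    ENCRYPT = 1
    COMPRESS = 2
    CONCAT = 4


def describe(mode: ArcMode) -> str:
    """Render a mode as a readable list of the properties that are set."""
    flags = (ArcMode.ENCRYPT, ArcMode.COMPRESS, ArcMode.CONCAT)
    names = [flag.name for flag in flags if mode & flag]
    return "+".join(names) if names else "PLAIN"


mode = ArcMode.ENCRYPT | ArcMode.CONCAT
print(describe(mode), bool(mode & ArcMode.COMPRESS))
```

A single value of this kind can be tested with ``mode & ArcMode.COMPRESS`` wherever the code previously parsed a mode string.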
Philipp Gesang [Fri, 28 Apr 2017 12:04:42 +0000]
cleanly perform block transition in non-concat mode
Philipp Gesang [Fri, 28 Apr 2017 08:41:27 +0000]
clarify exception-driven control flow
Distinguish the actual EOF when hit at the beginning from other
IO errors in _init_read_gz() and only catch this one where it’s
expected. Well formed archives do not end inside a header.
Philipp Gesang [Fri, 28 Apr 2017 08:28:29 +0000]
remove unused state variable
“internal_pos”, which is only written to and never read, was
introduced with this commit:
    commit 85737f48c38a432f2429e9e3e4b81fed164c4b9a
    Author: Eduardo Robles Elvira <edulix@wadobo.com>
    Date: Fri Jul 5 11:50:43 2013 +0200
        extracting files in r#gz mode now works too, includes unit tests
and lacks a raison d’être ever since.
Philipp Gesang [Thu, 27 Apr 2017 15:18:31 +0000]
use append mode in symlink unit test
These tests currently fail despite using the original Gzipfile
compression path. They appear to overwrite the passed archive
instead of writing new objects.
Philipp Gesang [Thu, 27 Apr 2017 14:03:46 +0000]
fix multivol compression handling
Philipp Gesang [Tue, 25 Apr 2017 14:38:12 +0000]
handle uncompressed, encrypted archives with tarfile
Internally, tarfile.py uses “tar” to refer to uncompressed
archives, so just handle this accordingly at the API level.
Philipp Gesang [Tue, 25 Apr 2017 13:28:09 +0000]
sync on .close() for unencrypted archives
Philipp Gesang [Tue, 25 Apr 2017 12:03:36 +0000]
properly (re-) initialize compressor at archive / volume bounds
For unencrypted streams, the compressor still must be reset in
concat mode. For encrypted streams, the decompressor can be
initialized right at the start of the archive since no further
inputs are needed.
Philipp Gesang [Tue, 25 Apr 2017 09:13:17 +0000]
keep separate encryptor and decryptor contexts in deltatar.py
The same Deltatar object appears to function as handle for
reading and writing files simultaneously. To support this,
introduce two different crypto contexts that are created
on demand.
Philipp Gesang [Mon, 24 Apr 2017 13:04:38 +0000]
properly restart compression when encrypting
Separate finalization of a zlib block from creation of a new one.
Otherwise, we end up with trailing data from the last object that
lingers in the write buffer and gets flushed to the archive after
the next encrypted object has been initialized.
Also get rid of the “new_compression_block” wrapper which
needlessly complicated things.
Special precautions must be taken for the PAX format. Due to its
requirement of a global archive header, TarFile will write to the
stream prior to initialization that is performed in addfile().
Thus, initialize compression before the PAX header is being
written and properly restart compression for the first object
committed to the archive or volume.
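The separation of the two steps can be illustrated with plain zlib, leaving encryption and the PAX header aside. In this sketch every object becomes its own gzip member: finishing a block flushes all of its pending output before a new compressor can be created. Class and method names are invented for the example.

```python
import zlib


class BlockCompressor:
    """Write each object as its own gzip member, with explicit block bounds."""

    def __init__(self):
        self.out = bytearray()
        self.comp = None

    def start_block(self) -> None:
        """Create a fresh compressor; the previous block must be finished."""
        assert self.comp is None, "previous block was not finished"
        self.comp = zlib.compressobj(9, zlib.DEFLATED, 31)

    def write(self, data: bytes) -> None:
        """Compress data into the current block."""
        self.out += self.comp.compress(data)

    def finish_block(self) -> None:
        """Flush all pending output of the current block, then drop it."""
        self.out += self.comp.flush(zlib.Z_FINISH)
        self.comp = None


def read_blocks(raw: bytes) -> list:
    """Decompress a sequence of concatenated gzip members one by one."""
    result = []
    while raw:
        dec = zlib.decompressobj(31)
        result.append(dec.decompress(raw) + dec.flush())
        raw = dec.unused_data
    return result
```

Because ``finish_block()`` empties the compressor before ``start_block()`` may run again, no trailing data of one object can end up behind the start of the next one.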
Philipp Gesang [Mon, 24 Apr 2017 10:06:46 +0000]
use crypto.py to split test archive in test_encryption.py
This again verifies individual decryptability of objects in the
PDT archive.
Philipp Gesang [Mon, 24 Apr 2017 09:37:23 +0000]
implement passthrough mode in crypto.py
When invoked with --no-decrypt, write object headers and
ciphertext to output. Combined with --split this allows
extracting encrypted objects from the archive.
Philipp Gesang [Mon, 24 Apr 2017 09:23:53 +0000]
implement split mode for CLI encryption
Philipp Gesang [Fri, 21 Apr 2017 16:03:17 +0000]
explicitly disable gz initialization for _Streams used in aux files
The process of writing an auxiliary (index, info) file differs
drastically from that of tar archives: Since files are not added
individually, the encryption must be initialized externally and
the compression layer cannot rely on being enabled in the ctor
because, obviously, the latter is executed before the manual
encryption setup can be performed.
Extend the API of the _Stream ctor with a parameter “noinit”
to request that all initialization be postponed until the
encryption has been set up. This seems to do the trick but is
quite ugly.
Philipp Gesang [Fri, 21 Apr 2017 13:23:49 +0000]
fix decompression error handling
This seems to be a copy&paste oversight from
    commit be60ffd0598fec172eccb69f3c6a8433af6fb643
    Author: Eduardo Robles Elvira <edulix@wadobo.com>
    Date: Mon Nov 4 08:50:55 2013 +0100
        initial port to python 3, not finished
which added the per-compression mode exceptions but not the
actual handling code.
Philipp Gesang [Fri, 21 Apr 2017 13:21:06 +0000]
fix index file encryption handling
Philipp Gesang [Fri, 21 Apr 2017 07:57:05 +0000]
convert test_deltatar to revised crypto
Philipp Gesang [Fri, 21 Apr 2017 07:39:02 +0000]
permit setting crypto.py parameter version via deltatar ctor
Introduce an optional argument to request a specific crypto
parameter version when invoking Deltatar. This isn’t of much
use ATM since only the one version is implemented, but it’s
handy for testing nonetheless.
Philipp Gesang [Fri, 21 Apr 2017 07:16:49 +0000]
eliminate the last traces of encryption “modes”
Since encryption handling has been moved outside of tarfile.py
these no longer apply. Thus remove all references so they don’t
obscure problems in the unit tests.
Philipp Gesang [Thu, 20 Apr 2017 15:53:49 +0000]
initialize compression globally for non-“concat” archives
Philipp Gesang [Thu, 20 Apr 2017 15:26:04 +0000]
remove obsolete unittests for 256 bit AES
Philipp Gesang [Thu, 20 Apr 2017 14:53:06 +0000]
pass encryption context to deltatar volume handlers
Philipp Gesang [Tue, 18 Apr 2017 13:04:07 +0000]
rework encryption unittests
Philipp Gesang [Thu, 20 Apr 2017 09:34:35 +0000]
fix compression handling on volume bounds
The old “concat compression” simply relied on the _Stream() ctor
to create a new zip block which is no longer possible since the
prerequisite encryption is only available when the first object
is committed to the archive.
Hence, reintroduce the new block initialization after
transitioning to the new volume.
Philipp Gesang [Tue, 18 Apr 2017 14:07:29 +0000]
add strict IV validation to decryption handler
Optionally (on CLI, with the “-s” flag) check for additional IV
properties:
- Accidental reuse: in GCM, the same IV used more than once
means that the plaintext is compromised.
- Unstructured archive: In the headers of a normal PDT
encrypted archive, the variable parts of the IVs are
consecutive unless the fixed part changes.
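Both properties can be checked in a single pass over the IVs collected from the headers. The sketch below assumes a 12-byte IV made of an 8-byte fixed part followed by a 32-bit big-endian counter; that layout and all names are assumptions for illustration, not taken from crypto.py.

```python
import struct

FIXED_LEN = 8                  # assumed size of the fixed part in bytes
IV_FMT = ">%dsI" % FIXED_LEN   # fixed part followed by a 32-bit counter


class DuplicateIV(Exception):
    """The same IV occurred more than once."""


class NonConsecutiveIV(Exception):
    """The counter jumped although the fixed part did not change."""


def validate_ivs(ivs) -> None:
    """Raise if an IV repeats or the counters are not consecutive."""
    seen = set()
    last = None
    for iv in ivs:
        if iv in seen:
            raise DuplicateIV(iv.hex())
        seen.add(iv)
        fixed, counter = struct.unpack(IV_FMT, iv)
        if last is not None:
            last_fixed, last_counter = last
            # A new fixed part legitimately restarts the counter.
            if fixed == last_fixed and counter != last_counter + 1:
                raise NonConsecutiveIV(
                    "counter %d follows %d" % (counter, last_counter))
        last = (fixed, counter)
```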
Philipp Gesang [Tue, 18 Apr 2017 09:59:02 +0000]
allow selecting individual tests with runtests.py
If arguments are passed on the command line, interpret them as
test names and attempt to compose a suite comprising only the
tests specified.
The behavior remains the same if invoked without argument.
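With the standard unittest loader, such a selection could be composed as follows. This is a sketch of the idea only; the function name and the discovery fallback are assumptions and may differ from what runtests.py does.

```python
import unittest


def build_suite(names, start_dir="."):
    """Load only the named tests, or discover everything if none are given."""
    loader = unittest.TestLoader()
    if names:
        # Dotted names such as "testing.test_crypto.SomeCase.test_method".
        return loader.loadTestsFromNames(names)
    return loader.discover(start_dir)


def run(argv) -> bool:
    """Run the suite selected by argv and report overall success."""
    result = unittest.TextTestRunner(verbosity=2).run(build_suite(argv))
    return result.wasSuccessful()
```

A caller would pass ``sys.argv[1:]`` to ``run()``, so that an empty argument list keeps the old behavior of running everything.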
Philipp Gesang [Mon, 10 Apr 2017 15:37:33 +0000]
improve error handling in crypto handler
Since invalid tags are some of the most important bits of
information to be passed down, make the corresponding error
message human-readable.
Philipp Gesang [Mon, 10 Apr 2017 14:32:50 +0000]
remove obsolete block size check
Philipp Gesang [Mon, 10 Apr 2017 13:08:31 +0000]
fix fallout from EOF changes in CLI decryptor
Philipp Gesang [Mon, 10 Apr 2017 13:01:56 +0000]
throw error on partial header reading stream
Throw the EOF exception only if the stream ends exactly at an
object boundary. Otherwise, when fewer than sizeof(hdr) bytes
are returned from read(), throw InvalidHeader to indicate a
malformed file. This keeps EOF a “benign” exception.
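The distinction can be sketched as below. The header size is a placeholder and the exception names only mirror the ones mentioned in the message; the sketch also assumes a stream whose ``read()`` returns short only at the end of the input.

```python
import io

HDR_SIZE = 16  # placeholder; the real header length is defined in crypto.py


class EndOfFile(Exception):
    """Benign: the stream ended exactly at an object boundary."""


class InvalidHeader(Exception):
    """The stream ended in the middle of a header."""


def read_header(stream) -> bytes:
    """Return one full header, or raise depending on how the stream ended."""
    data = stream.read(HDR_SIZE)
    if len(data) == 0:
        raise EndOfFile()
    if len(data) < HDR_SIZE:
        raise InvalidHeader(
            "got %d of %d header bytes" % (len(data), HDR_SIZE))
    return data
```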
Philipp Gesang [Mon, 10 Apr 2017 11:43:53 +0000]
communicate remainder to caller when hitting EOF from crypto
Philipp Gesang [Mon, 10 Apr 2017 09:43:06 +0000]
strip extraneous parameters from decryption handler ctor
Format and parameter version as well as the salt are supplied
from the headers. Decrypting should thus only require the
password and, depending on context, an explicit counter as well
as the list of valid IV fixed parts.
Philipp Gesang [Mon, 10 Apr 2017 09:15:09 +0000]
add input checks at API boundaries
Verify conformance of user-supplied inputs on a very basic level,
communicating violations via InvalidParameter exception.
Of course, due to the limitations of the type system, these can’t
be made exhaustive. E. g. no effort is being made to inspect a
list or dict that passes the type test for well-formed contents.
Philipp Gesang [Mon, 10 Apr 2017 08:36:12 +0000]
document exceptions used in encryption handler
Prepare clear and rigorous communication of errors and other
unexpected conditions to the user. Eventually these will make
the foundation for messages propagating up the stack until they
reach the UI.
Philipp Gesang [Mon, 10 Apr 2017 08:27:47 +0000]
use exception to communicate tag mismatch
Philipp Gesang [Mon, 10 Apr 2017 08:13:02 +0000]
unify error and parameter handling in crypto.py
Three classes of errors:
- bad encryption (tag mismatch, bad IVs);
- bad user input (request info counter twice);
- internal error (state was reached that indicates a problem
with crypto.py).
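The three classes map naturally onto a small exception hierarchy with a common base, so that callers can catch either one class or everything raised by the handler. All names below are illustrative; the actual exceptions in crypto.py may be named and organized differently.

```python
class CryptoError(Exception):
    """Hypothetical common base for everything the handler raises."""


class EncryptionError(CryptoError):
    """Bad encryption: tag mismatch, bad IVs."""


class InvalidGCMTag(EncryptionError):
    """The authentication tag did not match the ciphertext."""


class UserInputError(CryptoError):
    """Bad user input, e.g. requesting the info counter twice."""


class InternalError(CryptoError):
    """A state was reached that indicates a bug in the handler itself."""


def error_class(exn: Exception) -> str:
    """Name the class of error an exception belongs to."""
    if isinstance(exn, EncryptionError):
        return "bad encryption"
    if isinstance(exn, UserInputError):
        return "bad user input"
    if isinstance(exn, InternalError):
        return "internal error"
    return "unrelated"
```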
Philipp Gesang [Mon, 10 Apr 2017 07:34:32 +0000]
remove obsolete tag handling functionality from crypto.py
The GCM tag no longer occurs independently of a PDT header, so
these are no longer relevant.
Philipp Gesang [Fri, 7 Apr 2017 15:49:35 +0000]
fix search string in tar volume generation
Philipp Gesang [Fri, 7 Apr 2017 14:59:19 +0000]
use OSError instead of IOError
IOError is a synonym for OSError, so the latter should be used
everywhere to avoid confusion, especially when throwing the one
but catching the other.
Cf. PEP 3151.
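In Python 3 the two names refer to the same class, which a short check makes visible. The helper below is only a demonstration and not part of the code base.

```python
import errno
import os
import tempfile

# Since PEP 3151 (Python 3.3), IOError is merely an alias of OSError.
assert IOError is OSError


def errno_of_missing_file() -> int:
    """Open a file that cannot exist and return the errno of the failure."""
    with tempfile.TemporaryDirectory() as tmp:
        try:
            with open(os.path.join(tmp, "does-not-exist"), "rb"):
                pass
        except OSError as exn:  # also catches anything raised as IOError
            return exn.errno
    return 0


print(errno_of_missing_file() == errno.ENOENT)
```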
Philipp Gesang [Fri, 7 Apr 2017 09:15:00 +0000]
allow test_compression_level.py as module
Philipp Gesang [Thu, 6 Apr 2017 15:56:09 +0000]
rework crypto.py unittests for revised encryption
Main changes:
- Adjust usage to revised encryption handler.
- Adapt to header format.
- Adjust to changes in error passing (above all ``hdr_read()``).
- Remove Scrypt or tag tests, these interfaces are no longer
available.
Philipp Gesang [Fri, 7 Apr 2017 07:29:27 +0000]
specify salt and version in ctor when encrypting
Simplify the signature of Encrypt.next() by removing the salt and
version arguments: This will make the encryptor reuse the values
it already has, which were either passed to or randomly generated
by the ctor. Currently there is no case where we’d need to change
the salt or version during encryption. When decrypting, the
values from the headers are used anyways so nothing changes over
there.
Philipp Gesang [Thu, 6 Apr 2017 15:54:35 +0000]
increment file counter after handling current object
Philipp Gesang [Thu, 6 Apr 2017 15:06:05 +0000]
fix IV fixed part validation on decryption
Philipp Gesang [Thu, 6 Apr 2017 14:54:33 +0000]
parse buffer as header if passed as arg to next()
Philipp Gesang [Thu, 6 Apr 2017 13:42:15 +0000]
adapt concat_compress unit tests to gzip block sequence
The unit tests assume that compression of three files requires
three distinct Gzip blocks. The first one of these is empty and
serves no purpose, differing from the others by containing the
more or less redundant archive name. This is no longer the case
after the revision of the header code: the first block will still
have the archive name in the metadata but also contain the first
file.
Thus, adapt the unit tests to no longer check for and then ignore
the empty initial gzip block.
Philipp Gesang [Thu, 6 Apr 2017 13:09:10 +0000]
prefer symbolic constants over literals referring to gzip header
The mixed use of hex and octal is pretty confusing to say the
least; use named constants instead that are defined only in
tarfile.
Philipp Gesang [Thu, 6 Apr 2017 12:36:26 +0000]
fix tarfile crypto parameter passing
Remove obsolete parameters like “password” that are no longer
meaningful after moving the creation of the crypto context
outside of tarfile.py.
Also, test for the presence of encryption attributes before
accessing them to avoid conflicts with zlib streams. (Kludgy, but
not avoidable without larger changes due to the possibility of
“fileobj” being anything, including things that don’t satisfy all
the interfaces that “_Stream” provides.)
Philipp Gesang [Thu, 6 Apr 2017 12:34:20 +0000]
accept external counter in crypto.py
Required when encrypting an auxiliary file of type info.
Philipp Gesang [Wed, 5 Apr 2017 06:49:50 +0000]
unify constant naming I2N_→PDTCRYPT_
Philipp Gesang [Tue, 4 Apr 2017 15:35:04 +0000]
unify file extension handling
The required extension no longer depends on the “[index]mode”
parameter since the encryption context is handled independently.
Add a function pick_extension() that reflects this circumstance
and appends the required suffixes depending on three inputs.
Philipp Gesang [Tue, 4 Apr 2017 08:00:17 +0000]
catch ESPIPE from ftell () on stream
The result is only used for status output so defaulting to -1 for
stdin doesn’t hurt. All functional uses of the current position
rely on the value of total_read.
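A wrapper along these lines keeps status output working on pipes. It is a sketch with an invented function name; treating ``io.UnsupportedOperation`` the same way as ESPIPE is an extra precaution that the commit message does not mention.

```python
import errno
import io
import os


def safe_tell(stream) -> int:
    """Return the current offset, or -1 if the stream cannot report one."""
    try:
        return stream.tell()
    except OSError as exn:
        unseekable = (exn.errno == errno.ESPIPE
                      or isinstance(exn, io.UnsupportedOperation))
        if not unseekable:
            raise
        return -1  # only used for status output, so a dummy value is fine
```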
Philipp Gesang [Tue, 4 Apr 2017 07:43:23 +0000]
improve parameter handling of crypto.py
Philipp Gesang [Tue, 4 Apr 2017 06:54:31 +0000]
allow decryption from std{in,out} via crypto.py
Make it possible to invoke the script as
$ ./crypto.py test1234 - - <bfull-2017-04-04-0856-001.tar.pdtcrypt >out.tar.gz
for extra convenience.
Philipp Gesang [Mon, 3 Apr 2017 15:32:44 +0000]
handle zlib correctly
Encryption was split from the “compression mode” and now depends
entirely on the crypto context. Also only finalize a tar stream
being “closed” if the underlying file is being closed.
Philipp Gesang [Mon, 3 Apr 2017 14:53:46 +0000]
delay index file write until backup is complete
Due to restrictions of the file counting in the encryption
module, files must be handled in a strictly sequential manner.
Thus, postpone the creation of and all writes to the index file
until after all other files have been processed.
Philipp Gesang [Mon, 3 Apr 2017 13:56:25 +0000]
fix file offset calculation
Move the seek-back code down into the “low level file” wrapper so
header writes aren’t counted. This way byte counters match again.
Philipp Gesang [Mon, 3 Apr 2017 11:54:54 +0000]
add simple decryption routine to crypto.py
Currently this allows decrypting (and only decrypting) a backup
volume without requiring the deltatar layer in between.
Philipp Gesang [Fri, 31 Mar 2017 14:37:43 +0000]
rename open_index to open_auxiliary_file
We need a more neutral name since the functionality is accessed
elsewhere to write the info file too.
Philipp Gesang [Fri, 31 Mar 2017 13:56:11 +0000]
prevent the empty string as password
backup.py would default to using the empty string as password
which would cause a crypto context to be created even without
encryption being required.
Use ``None`` instead to indicate absence of a user-supplied
password.
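The difference between “no password” and “empty password” can be made explicit as in the sketch below. The function name and the returned placeholder are hypothetical, and rejecting the empty string with an exception is an assumption about how the prevention might be enforced.

```python
def make_crypto_context(password=None):
    """Return a context placeholder only if a password was supplied."""
    if password is None:
        return None  # no encryption requested
    if password == "":
        # Assumption: an empty password is treated as a caller error.
        raise ValueError("refusing the empty string as password")
    return {"password": password}  # stand-in for a real crypto context


print(make_crypto_context() is None)
```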
Philipp Gesang [Fri, 31 Mar 2017 13:35:53 +0000]
actually output something in toc mode
Until now backup.py wouldn’t print anything when passed the -l
flag. Supply a trivial printer to the “list_func” argument to
output the contents similar to what GNU tar does.
Philipp Gesang [Fri, 31 Mar 2017 13:01:20 +0000]
pass encryption context to tarfile
This supersedes the individual parameters.
Philipp Gesang [Fri, 31 Mar 2017 12:31:19 +0000]
do not require parameter version with each encrypted object
When encrypting, stick to the parameter version specified in the
ctor. Despite the format allowing for on-the-fly adjustment of
encryption parameters, there is no real world scenario yet in
which this might be desired. Thus, remove this prerequisite as it
only encumbers the signature of ``.next()`` with cruft.
Philipp Gesang [Tue, 28 Mar 2017 15:44:31 +0000]
first draft for making the encryption layer independent
WIP.
In Deltatar, we cannot use the ctor itself to set up the
encryption because it is neutral wrt. reading / writing.
Only once one of the entry points
- ``.list_backup()``,
- ``.create_full_backup()``, ``.create_diff_backup()``, or
- ``.restore_backup()``
is invoked do we know what the object’s intended use is.
Thus we hook the encryption handler somewhere in there.
Philipp Gesang [Tue, 28 Mar 2017 13:12:04 +0000]
change extension for encrypted files
Philipp Gesang [Tue, 28 Mar 2017 12:28:06 +0000]
return collected fixed iv parts from .close() when encrypting
Provisional implementation for dumping the IVs in the info file
that will be superseded once the crypto context can be provided
to tarfile by the user.
Philipp Gesang [Mon, 27 Mar 2017 15:43:27 +0000]
automate iv fixed-part generation
The crypto context keeps track of the used IV fixed parts
so they can eventually be included in the info file. A new
fixed part is created in the ctor, then for every time the
counter wraps.
“Wrapping” resets the counter to 2 since 1 is globally reserved
for the info file.
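The bookkeeping described above can be sketched as follows. The class name, the 8-byte fixed part and the configurable ``counter_max`` are assumptions for illustration; a lowered maximum makes the wraparound reachable in a test without producing 2^32 objects.

```python
import os

INFO_COUNTER = 1   # reserved for the info file
FIRST_COUNTER = 2  # first value used for ordinary objects


class IVGenerator:
    """Hand out IVs and record every fixed part that was in use."""

    def __init__(self, counter_max=0xFFFFFFFF, fixed_len=8):
        self.counter_max = counter_max
        self.fixed_len = fixed_len
        self.fixed_parts = []
        self._new_fixed_part()
        self.counter = FIRST_COUNTER

    def _new_fixed_part(self) -> None:
        """Draw a random fixed part and remember it for the info file."""
        self.fixed = os.urandom(self.fixed_len)
        self.fixed_parts.append(self.fixed)

    def next_iv(self) -> bytes:
        """Return the next IV, wrapping the counter when it is exhausted."""
        if self.counter > self.counter_max:
            self._new_fixed_part()
            self.counter = FIRST_COUNTER
        iv = self.fixed + self.counter.to_bytes(4, "big")
        self.counter += 1
        return iv
```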
Philipp Gesang [Mon, 27 Mar 2017 15:22:13 +0000]
properly enter/leave encryption sections
Philipp Gesang [Mon, 27 Mar 2017 14:06:18 +0000]
explicitly construct zlib headers
Get rid of the unnecessary literal byte strings. Commit
5fdff89f4d9fa28e6b210d40d389680072651eb7
introduced headers for additional blocks, omitting the “original
file name” field that gzip set by default.
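For illustration, a gzip member header can be assembled from named fields as laid out in RFC 1952 instead of being spelled as a literal byte string. The constant and function names are invented for this sketch and do not reproduce the ones in tarfile.py.

```python
import gzip
import struct
import zlib

GZ_MAGIC = b"\x1f\x8b"
GZ_METHOD_DEFLATE = 0x08
GZ_FLAG_FNAME = 0x08            # "original file name" field is present
GZ_XFL_MAX_COMPRESSION = 0x02
GZ_OS_UNKNOWN = 0xFF


def gzip_header(name: bytes = b"", mtime: int = 0) -> bytes:
    """Build a gzip header, with the file name field only if requested."""
    flags = GZ_FLAG_FNAME if name else 0
    hdr = GZ_MAGIC + struct.pack("<BBIBB", GZ_METHOD_DEFLATE, flags, mtime,
                                 GZ_XFL_MAX_COMPRESSION, GZ_OS_UNKNOWN)
    if name:
        hdr += name + b"\x00"   # the name is NUL-terminated
    return hdr


def gzip_member(data: bytes, name: bytes = b"") -> bytes:
    """Wrap raw deflate output in a hand-built header and trailer."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw deflate, no framing
    body = comp.compress(data) + comp.flush()
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return gzip_header(name) + body + trailer


print(gzip.decompress(gzip_member(b"payload", b"archive.tar")))
```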
Philipp Gesang [Mon, 27 Mar 2017 12:10:51 +0000]
unify zlib initialization
Philipp Gesang [Mon, 27 Mar 2017 09:11:00 +0000]
apply compression if compressor is present
Philipp Gesang [Fri, 24 Mar 2017 13:46:59 +0000]
overhaul pre- and post-crypto sync
In order to handle ``_Stream.close()`` well later, we need to
write the header of the last object before the tar info is
injected. To allow the padding and zlib finalization in .close(),
this cannot be performed when the actual file contents are
written but has to be suspended until we are certain no data will
be written to the current crypto object.
Philipp Gesang [Fri, 24 Mar 2017 11:20:45 +0000]
implement encryption passthrough mode
Philipp Gesang [Fri, 24 Mar 2017 10:27:45 +0000]
unify common operations between encryption and decryption
Philipp Gesang [Fri, 24 Mar 2017 10:12:29 +0000]
adjust versioned encryption parameters
Prepare for revised versioning: Both the encryption mechanism and
the KDF may be specified in the versioning.
Philipp Gesang [Thu, 23 Mar 2017 10:48:59 +0000]
extend open_index() API for info file handling
In fact, backup_python’s “info file” is just another “index file”
to deltatar. Conceptually they’re quite different though
especially regarding encryption. To allow requesting an info
flavored index file, add a parameter to communicate with the
crypto layer.
Philipp Gesang [Thu, 23 Mar 2017 09:43:25 +0000]
start payload encryption counter at 2
As per the spec, a file counter of 1 is reserved for the info
file.
Philipp Gesang [Tue, 21 Mar 2017 14:32:32 +0000]
track encryption state