Joffrey F [Tue, 27 Feb 2018 00:02:21 +0000]
bpo-32713: Fix tarfile.itn for large/negative float values. (GH-5434)
Import the tarfile changes from commit:
commit 72d9b2be36f091793ae7ffc5ad751f040c6e6ad3
Author: Joffrey F <f.joffrey@gmail.com>
Date: Mon Feb 26 16:02:21 2018 -0800
bpo-32713: Fix tarfile.itn for large/negative float values. (GH-5434)
which add a safeguard against the type-level volatility of
os.stat_result, which will return a float instead of an int if,
e.g., st_mtime becomes negative due to the timezone.
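A minimal sketch of the idea behind the fix, loosely modeled on CPython's `tarfile.itn()`. It is not a verbatim copy of either tarfile.py. The essential change is normalizing the value with `int(n)` before the range checks, so a float such as a negative `st_mtime` is encoded instead of breaking the bit operations.

```python
from tarfile import GNU_FORMAT, USTAR_FORMAT

NUL = b"\0"

def itn(n, digits=8, format=GNU_FORMAT):
    """Encode a number as a tar header number field (sketch)."""
    # os.stat_result may hand us floats; normalize before range checks.
    n = int(n)
    if 0 <= n < 8 ** (digits - 1):
        # Fits as octal: zero-padded digits plus a NUL terminator.
        return ("%0*o" % (digits - 1, n)).encode("ascii") + NUL
    if format == GNU_FORMAT and -256 ** (digits - 1) <= n < 256 ** (digits - 1):
        # GNU base-256 extension: marker byte 0o200 (positive) or 0o377
        # (negative), followed by big-endian two's complement bytes.
        if n >= 0:
            s = bytearray([0o200])
        else:
            s = bytearray([0o377])
            n = 256 ** digits + n
        for _ in range(digits - 1):
            s.insert(1, n & 0o377)
            n >>= 8
        return bytes(s)
    raise ValueError("overflow in number field")
```

For example, `itn(-1.0)` yields eight `0xff` bytes in GNU format, while the same call with `USTAR_FORMAT` raises `ValueError`.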
Thomas Jarosch [Tue, 28 Apr 2020 10:02:54 +0000]
Increase version to 2.2
Thomas Jarosch [Tue, 5 May 2020 09:06:34 +0000]
Merge branch 'crypto-review'
Crypto auditor is happy with our changes.
Philipp Gesang [Thu, 23 Apr 2020 11:53:55 +0000]
add support for PDT encryption to run_benchmark.py
The script was still expecting the encryption to be handled in
the pre-crypto.py fashion.
Paths also need some preparation so that we can inject a deltatar
from an arbitrary location; the $DELTATAR environment variable
is used for this.
Philipp Gesang [Thu, 23 Apr 2020 07:48:31 +0000]
fix usage info in run_benchmark
Philipp Gesang [Wed, 22 Apr 2020 10:01:38 +0000]
add utility for profiling memory usage of encryption
Philipp Gesang [Tue, 21 Apr 2020 14:23:39 +0000]
document future transition to AES-GCM-SIV as possibility
Address Deltatar audit item 2.6/1: miscellaneous suggestions
It would be great if we could add support for SIV but there are
no immediate plans for doing so.
Philipp Gesang [Mon, 20 Apr 2020 13:10:54 +0000]
state the consequences of disabling integrity checks in rescue and recovery mode
Address Deltatar audit item 2.4: exploitability of disaster
recovery
Clarify that recovery mode can enable an attacker to recover
plaintext of encrypted backups under certain conditions and
should only be used as a last resort.
Philipp Gesang [Mon, 20 Apr 2020 10:21:16 +0000]
allow checking PDTCRYPT archives for IV integrity with crypto.py
This allows checking the IVs of objects in the given file for
uniqueness and consecutiveness without decrypting them.
Example:
$ crypto.py ivcheck -i - <bfull-2020-04-20-1133-001.tar.gz.pdtcrypt
PDT: Successfully traversed 411 encrypted objects in input.
PDT:
PDT: All IVs consecutive and unique.
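The two properties reported by `ivcheck` can be sketched as follows. This is not the crypto.py implementation. It assumes each 12-byte IV is an 8-byte fixed part followed by a 4-byte little-endian counter, which is inferred from the scan output further down (`7cae 452a a05b 5182 0300 0000`, then `... 0400 0000`). `check_ivs` is a hypothetical name.

```python
import struct

def check_ivs(ivs):
    """Return (all_unique, all_consecutive) for a sequence of 12-byte IVs.

    Assumption: IV = 8-byte fixed part + 4-byte little-endian counter.
    """
    ivs = list(ivs)
    unique = len(set(ivs)) == len(ivs)
    consecutive = True
    prev = None
    for iv in ivs:
        fixed = iv[:8]
        (counter,) = struct.unpack("<I", iv[8:12])
        # Same fixed part and counter exactly one above the previous IV.
        if prev is not None and (fixed != prev[0] or counter != prev[1] + 1):
            consecutive = False
        prev = (fixed, counter)  # remember *this* IV for the next step
    return unique, consecutive
```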
Philipp Gesang [Fri, 17 Apr 2020 09:27:48 +0000]
enable strict IV checking by default during decryption
Address Deltatar audit item 2.3: Ciphertexts can be re-ordered or
dropped when decryption is non-linear
Set *decryption* contexts to validate IVs unless the user opts out.
This requires sprinkling options for non-strict behavior wherever
a decryption context might be created, and disabling the strict
checking for APIs intended for use with possibly corrupted inputs.
During a normal *encryption*, when only a single Encrypt handle is
used, the IVs are guaranteed to be unique. Strict tracking of IVs is
only necessary in testing and when working with multiple Encrypt
handles whose used IVs have to be compared across handles
afterwards. Thus we need not enable the ``strict_ivs'' option for
Encrypt handles.
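A minimal sketch of opt-out strict IV tracking on the decryption side. `StrictIVTracker` and `register` are hypothetical names; only the ``strict_ivs`` option name comes from the log. Every IV seen during decryption is recorded, and a repeat raises unless the caller disabled strict checking, as the log describes for APIs that handle possibly corrupted input.

```python
class StrictIVTracker:
    """Hypothetical helper: reject any IV that was already used."""

    def __init__(self, strict_ivs=True):
        self.strict_ivs = strict_ivs
        self.seen = set()

    def register(self, iv):
        if not self.strict_ivs:
            return  # opt-out, e.g. for tolerant / recovery code paths
        if iv in self.seen:
            raise ValueError("IV reuse detected: %s" % iv.hex())
        self.seen.add(iv)
```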
Philipp Gesang [Thu, 16 Apr 2020 15:40:08 +0000]
clarify possible IV reuse with multiple Encrypt handles
Address Deltatar audit item 2.2: Unsafe to create more than 2^32
instances of Encrypt using the same key and salt
Extend the documentation section on handling of IVs with a
paragraph concerning multiple Encrypt handles. Also add a method
to obtain the list of IVs for checks performed by the caller.
Philipp Gesang [Thu, 16 Apr 2020 13:43:36 +0000]
deny insecure parameters by default
Address Deltatar audit item 2.1: Violating data integrity using
passthrough mode.
The ciphertext is assumed to be attacker-controlled. This allows a
downgrade to plaintext by setting the parameter version to
plaintext passthrough on an encrypted object, bypassing integrity
protection. Reduce the susceptibility to this attack by refusing
to continue if such a header is encountered in the input.
Passthrough mode is still available if the *insecure* flag is
passed to the constructors of the encryption / decryption
handles.
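The refusal can be sketched as a header check. `PARAMVERSION_PASSTHROUGH`, `InsecureHeader` and `check_paramversion` are hypothetical names. The log does not give the numeric parameter version used for plaintext passthrough, so the value below is a placeholder.

```python
PARAMVERSION_PASSTHROUGH = 0  # placeholder value, for illustration only

class InsecureHeader(Exception):
    """Raised when an object header requests an insecure mode."""

def check_paramversion(paramversion, insecure=False):
    # An attacker who controls the ciphertext could rewrite the header to
    # "passthrough" and strip integrity protection; refuse unless the
    # caller explicitly asked for insecure operation.
    if paramversion == PARAMVERSION_PASSTHROUGH and not insecure:
        raise InsecureHeader("refusing plaintext passthrough object; "
                             "pass insecure=True to override")
    return paramversion
```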
Philipp Gesang [Thu, 16 Apr 2020 08:10:01 +0000]
skip exclusive branch
Of the two conditions only one can ever apply, so skip evaluating
the second one if the first branch was taken.
Thomas Jarosch [Tue, 4 Feb 2020 13:19:16 +0000]
Increase version to 2.1
Philipp Gesang [Wed, 29 Jan 2020 09:02:00 +0000]
python 3.7 support: Track cpython removal of re._pattern_type
Another sneaky backward incompatibility introduced by Python 3.7.
https://github.com/python/cpython/pull/1646
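One portable way to get at the compiled-pattern type without the removed private name. `re.Pattern` exists from Python 3.7 on, and `type(re.compile(""))` works on older releases as well. This is not necessarily how deltatar resolved it.

```python
import re

try:
    PatternType = re.Pattern            # public name since Python 3.7
except AttributeError:
    PatternType = type(re.compile(""))  # older releases, no private API
```

Code that previously did `isinstance(x, re._pattern_type)` can then test `isinstance(x, PatternType)`.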
Thomas Jarosch [Tue, 4 Feb 2020 13:15:33 +0000]
Merge branch 'tarfile-unlink'
Philipp Gesang [Mon, 3 Feb 2020 10:03:48 +0000]
unit test unlink protection for tarfile
Add a separate test series that uses Tarfile directly, bypassing
DeltaTar because the latter does not allow fine-grained control
over what mitigations are active.
Philipp Gesang [Mon, 3 Feb 2020 09:09:48 +0000]
fix check for symlink path in unittest
os.path.exists() will return False when passed a broken symlink,
which defeats the point of testing whether the link itself exists.
Use os.path.lexists() instead, which uses lstat(2) and thus does
not follow the link.
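The difference is easy to demonstrate with a dangling link (POSIX only, since it needs `os.symlink`):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "dangling")
    os.symlink(os.path.join(tmp, "missing-target"), link)
    follows = os.path.exists(link)     # False: follows the link to nothing
    no_follow = os.path.lexists(link)  # True: the link itself is there
```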
Philipp Gesang [Fri, 28 Oct 2016 15:02:31 +0000]
add unlink-before-extract behavior for tarfile
Implement optional removal of existing files analogous to the -U
option of GNU tar and bsdtar. This is an effective measure
against symlink attacks which tarfile.py is not capable of
mitigating.
Signed-off-by: Philipp Gesang <philipp.gesang@intra2net.com>
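The core of unlink-before-extract can be sketched as below. It is an illustration, not the tarfile.py patch, and `unlink_before_extract` is a hypothetical name. Whatever currently occupies the target path is removed with `lstat`/`unlink`, which act on a symlink itself rather than its target. The file is then created fresh, so a planted link cannot redirect the write. Directories are deliberately left alone here; how the real implementation treats them is not stated in the log.

```python
import os
import stat

def unlink_before_extract(path):
    """Remove a non-directory entry at *path* without following symlinks."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return  # nothing there, nothing to remove
    if not stat.S_ISDIR(st.st_mode):
        os.unlink(path)  # removes the link itself, never its target
```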
Thomas Jarosch [Sat, 1 Feb 2020 14:38:05 +0000]
Increase version to 2.0
Major feature: New AES-GCM based crypto layer.
Thomas Jarosch [Sat, 1 Feb 2020 14:26:48 +0000]
Merge branch 'bug-6440_open-race'
Philipp Gesang [Thu, 5 Jul 2018 09:57:27 +0000]
fix incorrect error handling in deltatar
Account for the fact that the os module converts error return
values into exceptions (OSError). It is not documented whether
os.open() can ever return a negative value at all.
Philipp Gesang [Thu, 5 Jul 2018 09:16:28 +0000]
extend fs access test to compress and crypto variant
Philipp Gesang [Thu, 5 Jul 2018 08:54:31 +0000]
guard call to stat() against ENOENT
Move the implicit call to stat(2) into the ENOENT-safe block we
just introduced.
Found by unit testing.
Philipp Gesang [Thu, 5 Jul 2018 08:50:23 +0000]
add unit test for mishandling access(3)
Replace os.access() with a dummy to check deltatar behavior
against racy file system checks.
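A sketch of the testing technique using `unittest.mock`. `collect_readable` is a hypothetical stand-in for the code under test, not a deltatar function. Patching `os.access` to always answer "yes" simulates a file that vanishes between the `access()` check and the later `open()`.

```python
import os
import tempfile
from unittest import mock

def collect_readable(paths):
    """Hypothetical code under test: must survive a lying access()."""
    result = []
    for p in paths:
        if not os.access(p, os.R_OK):
            continue
        try:
            with open(p, "rb"):
                result.append(p)
        except FileNotFoundError:
            pass  # the access() verdict no longer holds; skip the file
    return result

with tempfile.TemporaryDirectory() as tmp:
    ghost = os.path.join(tmp, "vanished")
    with mock.patch("os.access", return_value=True):
        found = collect_readable([ghost])
```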
Philipp Gesang [Thu, 5 Jul 2018 06:53:06 +0000]
drop os.path.exists() before makedirs()
os.makedirs() has proper EEXIST handling built in. Checking with
``os.path.exists()`` is both redundant and inherently racy.
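In current Python this is expressed with `exist_ok=True`. The log does not say which form deltatar uses, so this is an illustration of the pattern rather than the changed code.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "a", "b", "c")
    # No prior existence check: it would only open a race window.
    os.makedirs(target, exist_ok=True)
    os.makedirs(target, exist_ok=True)  # second call is a no-op
    created = os.path.isdir(target)
```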
Philipp Gesang [Wed, 4 Jul 2018 08:29:09 +0000]
fix access race when traversing the filesystem
Related to issue #6440.
Yet another race caused by mishandling of access(3); fix it by
acquiring an fd to the directory and using that for iteration.
Handle ENOENT around open(2) and throw out the call to
os.path.exists().
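A minimal sketch of fd-based directory traversal on a POSIX system (`walk_entries` is a hypothetical name). The directory is opened once, entries are listed and stat'ed relative to that fd, and ENOENT at either step is treated as "gone" rather than as a fatal error.

```python
import os

def walk_entries(path):
    """Yield (name, stat_result) for entries of *path* via a directory fd."""
    try:
        dfd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    except FileNotFoundError:
        return  # directory vanished before we could open it
    try:
        for name in os.listdir(dfd):
            try:
                st = os.stat(name, dir_fd=dfd, follow_symlinks=False)
            except FileNotFoundError:
                continue  # entry removed between listdir() and stat()
            yield name, st
    finally:
        os.close(dfd)
```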
Philipp Gesang [Wed, 4 Jul 2018 08:05:57 +0000]
fix crash on inaccessible input files
Fix i2n bug #6440: race condition between access(3) and open(2).
This flaw exists in both the full and diff backup code. Deltatar
must not assume that the verdict returned by access() will hold
true later. Emit a warning if we receive ENOENT on later calls to
open(), but continue regardless.
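The "warn and continue" behavior can be sketched like this (`open_for_backup` is a hypothetical name). `os.open()` reports failure by raising `OSError`; `FileNotFoundError` is the ENOENT subclass. It does not report failure through a negative return value.

```python
import logging
import os

log = logging.getLogger("deltatar-sketch")

def open_for_backup(path):
    """Open *path* read-only; return None if it vanished after access()."""
    try:
        return os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        log.warning("%s disappeared before it could be opened; skipping",
                    path)
        return None
```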
Thomas Jarosch [Sat, 1 Feb 2020 14:10:48 +0000]
Merge branch 'crypto'. Yes!
Philipp Gesang [Thu, 30 Jan 2020 15:36:14 +0000]
treat binary data as binary data
The script ``file_crypt.py'' was opening both the source and the
sink in ``text mode'' which already messes with the input, then
encoding the result as UTF-8 encoded strings before writing
it back. Needless to say this was corrupting data all over the
place as soon as the payload contained non-ASCII bytes.
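The fix amounts to never decoding the payload. A minimal sketch of a byte-exact copy loop (the helper name is hypothetical):

```python
def copy_binary(src_path, dst_path, bufsize=64 * 1024):
    """Copy bytes verbatim: binary mode, no decoding, no re-encoding."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(bufsize)
            if not chunk:
                break
            dst.write(chunk)
```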
Philipp Gesang [Thu, 30 Jan 2020 15:00:33 +0000]
remove unused argument
Different encryption modes vanished with the revised crypto that
only implements AES-GCM.
Philipp Gesang [Thu, 30 Jan 2020 13:28:06 +0000]
unit test detection of non-consecutive ivs
The detection works as expected except for a bug in the exception
formatting.
Philipp Gesang [Thu, 30 Jan 2020 09:41:31 +0000]
guard invocations of tar from interactive mode
On failure, tests may cause GNU tar to prompt for user input, for
example for volume names, which may interrupt the test run.
Prevent that by passing /dev/null wherever we shell out to tar.
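Assuming the redirect in question is the child's standard input, `subprocess.DEVNULL` is the standard-library way to do this. A prompt then sees EOF immediately instead of blocking. `run_noninteractive` is a hypothetical helper name.

```python
import subprocess

def run_noninteractive(cmd):
    """Run *cmd* with stdin connected to /dev/null."""
    return subprocess.run(cmd, stdin=subprocess.DEVNULL,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
```

Usage would look like `run_noninteractive(["tar", "-tf", "backup.tar"])`.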
Philipp Gesang [Wed, 29 Jan 2020 15:57:57 +0000]
fix resource leaks building recovery index
Python 3.7 now emits warnings about possible resource leaks, of
which deltatar provokes plenty. The main culprit here is manual
resource management of file handles in the face of early returns
via exceptions.
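One standard pattern for this class of bug is `contextlib.ExitStack`. It closes every handle opened so far when an exception aborts the function midway. The log does not say which mechanism the fix used; `read_all` is a hypothetical illustration.

```python
import contextlib

def read_all(paths):
    """Read several files; all handles are closed even if one open() fails."""
    with contextlib.ExitStack() as stack:
        handles = [stack.enter_context(open(p, "rb")) for p in paths]
        return [h.read() for h in handles]
```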
Philipp Gesang [Wed, 29 Jan 2020 14:27:17 +0000]
validate data lengths against value in header
When decrypting we need to ensure that the decryptor will not
blindly accept data past the size that was specified by the
current object's header. Note that processing trailing data would
fail eventually when the GCM tag is checked; this check just
catches that case earlier.
Philipp Gesang [Wed, 29 Jan 2020 10:53:15 +0000]
fix incorrect unit test description
The process() method signals the caller that the maximum data length
has been reached by returning a ciphertext that is shorter than
the plaintext. The caller must then continue by finalizing the
current object and starting a new one.
The behavior in this case was changed in cb7a3911f8 to not
propagate the exception from Cryptography, but the unit test
retained the erroneous description.
Philipp Gesang [Wed, 29 Jan 2020 10:35:56 +0000]
improve unit test SNR
Philipp Gesang [Wed, 29 Jan 2020 10:03:54 +0000]
turn API-mandated no-op into assertion
Make it explicit that there cannot actually be any remaining data
when finalizing an encrypted object. The Cryptography API mandates
that the caller handle the remainder of the data on finalization.
Since AES-GCM operates as a stream cipher, the encryptor always
returns exactly as many bytes as it was given, so technically the
remainder is meaningless.
Philipp Gesang [Wed, 29 Jan 2020 09:28:34 +0000]
account for one-tuple return
For some reason, feeding ``(0,)'' into os.read() as the length
argument fails with newer versions of Python. Fix this by
unpacking the single-element tuple before using it.
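One-tuples like this typically come from `struct.unpack()`, which always returns a tuple even for a single field. A sketch, assuming a little-endian 64-bit length field like the `ctsize` header field shown in the scan output further down. `read_sized` is a hypothetical name.

```python
import os
import struct

def read_sized(fd, size_field):
    """Read a payload whose length is stored as a little-endian u64."""
    (length,) = struct.unpack("<Q", size_field)  # unpack() returns a tuple
    return os.read(fd, length)
```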
Thomas Jarosch [Mon, 27 Jan 2020 17:21:50 +0000]
Document source of the AES GCM size limit
Also verified our bit left shift operations match the numbers.
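For reference, NIST SP 800-38D limits the plaintext of a single GCM invocation to 2**39 - 256 bits. The shift arithmetic below checks that this equals (1 << 36) - 32 bytes. The constant names are illustrative, not necessarily those used in crypto.py.

```python
GCM_MAX_BITS = (1 << 39) - 256   # SP 800-38D bound on len(P), in bits
GCM_MAX_BYTES = (1 << 36) - 32   # the same bound expressed in bytes

assert GCM_MAX_BITS % 8 == 0
assert GCM_MAX_BITS // 8 == GCM_MAX_BYTES
print(GCM_MAX_BYTES)  # 68719476704, i.e. 32 bytes short of 64 GiB
```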
Thomas Jarosch [Mon, 27 Jan 2020 16:51:43 +0000]
Mention sibling UT for the test_crypto_aes_gcm_enc_length_cap() test
Thomas Jarosch [Mon, 27 Jan 2020 16:27:35 +0000]
Fix up docstring of test_create_backup_index_max_file_length()
Thomas Jarosch [Mon, 27 Jan 2020 16:05:18 +0000]
Extend unit test for larger than GCM encryption size
Add a few corner cases in addition to the existing test:
* MAX_SIZE-1
* MAX_SIZE
* 2*MAX_SIZE + 1
Thomas Jarosch [Mon, 27 Jan 2020 15:56:53 +0000]
Fix indentation of 'else' block
Right now this affects the debug output only.
Thomas Jarosch [Mon, 27 Jan 2020 15:21:27 +0000]
Fix up check_consecutive_iv()
It was using the wrong variable to store the previously used IV.
Thomas Jarosch [Mon, 27 Jan 2020 15:20:24 +0000]
Clarify two functions are meant to be used by disaster recovery
Thomas Jarosch [Mon, 27 Jan 2020 15:09:03 +0000]
Remove upstream GCM issue that was resolved
Thomas Jarosch [Mon, 27 Jan 2020 15:07:25 +0000]
Remove outdated author information
Thomas Jarosch [Mon, 27 Jan 2020 15:04:16 +0000]
Change unit tests to check for expected exception
It would be an error if the expected exception were not thrown,
e.g. for reuse of IVs.
Thomas Jarosch [Sat, 25 Jan 2020 18:38:50 +0000]
Fix position of [-c] handler in usage
It won't work the other way around.
Thomas Jarosch [Sat, 25 Jan 2020 18:25:06 +0000]
Fix wrong script name in usage help
Thomas Jarosch [Sat, 25 Jan 2020 18:08:49 +0000]
Increase version to 1.6
Thomas Jarosch [Sat, 25 Jan 2020 17:48:29 +0000]
Fix use of wrong 'i' variable name
pylint complained:
tarfile.py:3797:38: E0602: Undefined variable 'i' (undefined-variable)
Code was introduced like this in
commit 27ee4dd4df48340541317123f5df348056c235ca
Author: Philipp Gesang <philipp.gesang@intra2net.com>
AuthorDate: Tue Aug 29 12:00:54 2017 +0200
implement volume handling for rescue mode
When reconstructing the index, traverse backup volumes and set
the “volume” member on the objects appropriately.
-> I guess 'i' was renamed to nvol for better readability.
Thomas Jarosch [Sat, 25 Jan 2020 17:40:53 +0000]
Fix errno.ESPIPE error handler
pylint complained:
crypto.py:1632:28: E1101: Module 'os' has no 'errno' member (no-member)
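The error constants live in the `errno` module; the undocumented `os.errno` alias is gone in Python 3.7. A sketch of the corrected handler pattern, using an unseekable pipe to trigger ESPIPE (`try_seek` is a hypothetical name):

```python
import errno
import os

def try_seek(fd, offset):
    """Seek if possible; return False for unseekable fds such as pipes."""
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return True
    except OSError as exn:
        if exn.errno == errno.ESPIPE:  # errno module, not os.errno
            return False
        raise
```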
Thomas Jarosch [Sat, 25 Jan 2020 17:37:59 +0000]
Add missing iv fixed part in error output
pylint complained:
crypto.py:1357:36: E1306: Not enough arguments for format string (too-few-format-args)
Thomas Jarosch [Sat, 25 Jan 2020 17:30:58 +0000]
Fix 'pw' variable name in error handling
pylint complained:
crypto.py:840:40: E0602: Undefined variable 'password' (undefined-variable)
Thomas Jarosch [Tue, 3 Apr 2018 05:50:48 +0000]
Documentation improvements
Philipp Gesang [Thu, 31 Aug 2017 12:30:03 +0000]
remove development tweak for test runner
This essentially reverts
commit 406e0fa86d97f912b50689d6b080c2aee69eef86
Author: Philipp Gesang <philipp.gesang@intra2net.com>
Date: Tue Apr 18 11:59:02 2017 +0200
allow selecting individual tests with runtests.py
Tests and files for all imported test classes may still take some
moments but we don’t need to skip them any longer.
Philipp Gesang [Thu, 31 Aug 2017 11:37:10 +0000]
handle problems with incomplete gzip headers
Throw the appropriate exception to signal EOF or malformed data
conditions when tentatively parsing GZip headers.
Philipp Gesang [Thu, 31 Aug 2017 09:41:50 +0000]
add tests for truncated files
Philipp Gesang [Thu, 31 Aug 2017 08:57:17 +0000]
describe corruption mechanisms and their function in testing
Philipp Gesang [Wed, 30 Aug 2017 07:47:16 +0000]
skip some unittests on older python versions
Philipp Gesang [Tue, 29 Aug 2017 15:19:04 +0000]
test multivol index reconstruct with hole and header corruption
Philipp Gesang [Tue, 29 Aug 2017 14:59:14 +0000]
extend index reconstruct tests for multivol backups
Philipp Gesang [Tue, 29 Aug 2017 12:45:31 +0000]
lift block alignment requirement for tar archive rescue
It is unlikely that damaged archives have correctly aligned tar
headers. Thus we need to check every header-like section for the
right magic and a matching checksum. Objects without a correct
checksum (which spans the better part of the header) are
discarded, similar to what file(1) does.
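An unaligned header scan can be sketched using the standard tar layout: magic starting with `ustar` at offset 257, and an octal checksum at 148–155 computed with that field counted as spaces. This is an illustration of the technique, not the tarfile.py code. It slices a block at every byte offset, which is simple but slow on large inputs.

```python
TAR_BLOCKSIZE = 512

def looks_like_tar_header(buf, off):
    """Check magic and checksum of a 512-byte candidate at any offset."""
    block = buf[off:off + TAR_BLOCKSIZE]
    if len(block) < TAR_BLOCKSIZE or block[257:262] != b"ustar":
        return False
    try:
        stored = int(block[148:156].strip(b" \0"), 8)
    except ValueError:
        return False
    # The checksum is computed with its own field read as 8 spaces.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:])
    return stored == computed

def scan_tar_headers(buf):
    """Offsets of all plausible headers, aligned to 512 bytes or not."""
    return [off for off in range(len(buf) - TAR_BLOCKSIZE + 1)
            if looks_like_tar_header(buf, off)]
```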
Philipp Gesang [Tue, 29 Aug 2017 10:00:54 +0000]
implement volume handling for rescue mode
When reconstructing the index, traverse backup volumes and set
the “volume” member on the objects appropriately.
Philipp Gesang [Tue, 29 Aug 2017 08:39:22 +0000]
implement leading garbage test
Implemented a rescue test since offsets won’t match in this
scenario.
Philipp Gesang [Tue, 29 Aug 2017 07:53:46 +0000]
include description of object validation with crypto.py scan mode
Example output for a second object with a corrupt byte in the
size field:
PDT: obj 1: read payload @64
PDT: · version = 1 : 0100
PDT: · paramversion = 1 : 0100
PDT: · nacl : 1dc1 154a 5405 ef5e df81 173f 2821 7a0c
PDT: · iv : 7cae 452a a05b 5182 0300 0000
PDT: · ctsize = 230 : e600 0000 0000 0000
PDT: · tag : 42c0 8774 3309 88eb 0e1a 71dc 8fd9 80c1
PDT: 0 → ✓ valid object 64–294
PDT: 294 → EOF inside object (358≤5312627≤1095216701872); adjusting size to 5312269
PDT: obj 2: read payload @358
PDT: · version = 1 : 0100
PDT: · paramversion = 1 : 0100
PDT: · nacl : 1dc1 154a 5405 ef5e df81 173f 2821 7a0c
PDT: · iv : 7cae 452a a05b 5182 0400 0000
PDT: · ctsize = 5312269 : 0d0f 5100 0000 0000
PDT: · tag : 5946 dbcf 41b9 ac7e 4729 9e09 46c7 3388
PDT: GCM tag mismatch for object 358–5312627
PDT: 294 → × fishy object 358–5312627, corrupt header
Philipp Gesang [Mon, 28 Aug 2017 15:59:42 +0000]
add unit test for borked ciphertext size
Philipp Gesang [Mon, 28 Aug 2017 15:56:33 +0000]
allow for detecting overlapping objects with tarfile
Philipp Gesang [Mon, 28 Aug 2017 13:29:41 +0000]
detect overlapping objects
The CLI will run one additional pass to determine whether objects
overlap one another. Overlap might indicate bad headers or gaps
in the file (object offsets shifted).
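The overlap pass can be sketched as a sweep over objects sorted by start offset. Each object is compared against the one reaching furthest so far, using half-open ranges like the `64–294` / `358–5312627` pairs in the scan output above. `find_overlaps` is a hypothetical name.

```python
def find_overlaps(objects):
    """Return (earlier, later) range pairs whose byte ranges overlap.

    *objects* holds half-open (start, end) byte ranges.
    """
    overlaps = []
    prev = None  # range with the furthest end seen so far
    for start, end in sorted(objects):
        if prev is not None and start < prev[1]:
            overlaps.append((prev, (start, end)))
        if prev is None or end > prev[1]:
            prev = (start, end)
    return overlaps
```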
Philipp Gesang [Mon, 28 Aug 2017 07:45:44 +0000]
adjust post-conditions for GZ[,AES]/rescue unit test
Philipp Gesang [Mon, 28 Aug 2017 07:17:41 +0000]
make crypto.py CLI accept hex-encoded keys again
Also handle decoding of those keys at the same level as base64
encoded ones.
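Decoding both encodings at one level might look like this. The detection heuristic is an assumption: try hex when the string is all hex digits of even length, otherwise strict base64. The log does not say how crypto.py tells the two apart. A base64 key that happens to consist only of hex digits would be misread as hex under this heuristic.

```python
import base64
import binascii
import string

def decode_key(text):
    """Decode a key given as hex or base64 (heuristic, hypothetical)."""
    text = text.strip()
    if len(text) % 2 == 0 and all(c in string.hexdigits for c in text):
        return bytes.fromhex(text)
    try:
        return base64.b64decode(text, validate=True)
    except binascii.Error as exn:
        raise ValueError("key is neither hex nor base64") from exn
```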
Philipp Gesang [Fri, 25 Aug 2017 15:41:21 +0000]
use real new volume handler during rescue
With the dummy we end up with a nil object instead of a tarinfo
at the end of the volume. Reinstating the actual handler is
harmless and produces a valid info object again.
Philipp Gesang [Fri, 25 Aug 2017 15:03:37 +0000]
prevent tarobject iteration in disaster mode
The tarfile iterator relies on the header data to determine the
next object offset which may be wrong for corrupt files. Instead,
skip that iteration step and completely rely on the object
offsets determined during index rebuild.
Philipp Gesang [Fri, 25 Aug 2017 12:23:17 +0000]
implement tolerant GNU tar header parser
When skimming a file for tar objects, only consider the GNU
header magic and whether the blocks are aligned.
Philipp Gesang [Fri, 25 Aug 2017 09:12:39 +0000]
add restore helper handling for reconstructed indices
Philipp Gesang [Fri, 25 Aug 2017 08:27:39 +0000]
add iterator mode for reconstructed index
Philipp Gesang [Fri, 25 Aug 2017 07:33:49 +0000]
unify construction of secret values
Philipp Gesang [Thu, 24 Aug 2017 15:24:36 +0000]
implement tolerant gz header parser
Since they assume a stream object, we cannot rely on the original
tarfile GZ handling. Add a “tolerant” one according to the format
spec that notices malformed or unexpected (in Deltatar context)
values, but glosses over them if they do not necessarily impact
the readability of the object.
Also use the new symbolic constants in the existing GZ reader
instead of magic numbers.
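A tolerant gzip member header parser can be sketched from RFC 1952 (magic `1f 8b`, method 8, flag bits FHCRC/FEXTRA/FNAME/FCOMMENT). It raises `EOFError` on truncation and `ValueError` on data that cannot be gzip, matching the EOF-versus-malformed distinction above. Reserved flag bits are only recorded as a warning. The constant and function names are illustrative, not deltatar's.

```python
import struct

GZ_MAGIC = b"\x1f\x8b"
GZ_METHOD_DEFLATE = 8
GZ_FLAG_FHCRC, GZ_FLAG_FEXTRA, GZ_FLAG_FNAME, GZ_FLAG_FCOMMENT = 2, 4, 8, 16
GZ_FLAG_RESERVED = 0xE0

def parse_gz_header(buf):
    """Return (header_length, fields, warnings) for a gzip member header."""
    if len(buf) < 10:
        raise EOFError("truncated gzip header")
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", buf[:10])
    if magic != GZ_MAGIC or method != GZ_METHOD_DEFLATE:
        raise ValueError("not a gzip member")
    warnings = []
    if flags & GZ_FLAG_RESERVED:
        warnings.append("reserved flag bits set")  # noted, not fatal
    pos = 10
    if flags & GZ_FLAG_FEXTRA:
        if len(buf) < pos + 2:
            raise EOFError("truncated FEXTRA length")
        (xlen,) = struct.unpack("<H", buf[pos:pos + 2])
        pos += 2 + xlen
    for flag in (GZ_FLAG_FNAME, GZ_FLAG_FCOMMENT):
        if flags & flag:
            end = buf.find(b"\0", pos)
            if end < 0:
                raise EOFError("unterminated string field")
            pos = end + 1
    if flags & GZ_FLAG_FHCRC:
        pos += 2
    if pos > len(buf):
        raise EOFError("header extends past end of data")
    return pos, {"flags": flags, "mtime": mtime, "xfl": xfl, "os": os_id}, warnings
```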
Philipp Gesang [Thu, 24 Aug 2017 14:48:32 +0000]
ignore GCM tag mismatch in scan mode
Header info is assumed unreliable during rescue so a tag mismatch
must not result in a bad object.
Philipp Gesang [Thu, 24 Aug 2017 11:21:30 +0000]
convert TarInfo to index format
Philipp Gesang [Thu, 24 Aug 2017 09:57:53 +0000]
read tar objects at predetermined offsets for rescue index
Leverage the tarobj to read the object headers at the determined
offsets. Currently only implemented for encrypted archives whose
offsets are located with *crypto.py*.
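With the stdlib `tarfile` module, reading one header per known offset instead of following the archive's own chain can be sketched with `TarInfo.frombuf()`. Damaged headers become placeholders rather than aborting the pass. `read_headers_at` is a hypothetical name, and deltatar's tarfile fork may differ.

```python
import io
import tarfile

def read_headers_at(fileobj, offsets):
    """Parse one tar header per known offset; None for damaged ones."""
    infos = []
    for off in offsets:
        fileobj.seek(off)
        buf = fileobj.read(tarfile.BLOCKSIZE)
        try:
            infos.append(tarfile.TarInfo.frombuf(buf, tarfile.ENCODING,
                                                 "surrogateescape"))
        except tarfile.HeaderError:
            infos.append(None)  # keep a placeholder and move on
    return infos
```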
Philipp Gesang [Thu, 24 Aug 2017 09:56:14 +0000]
add test skeleton for corrupt index reconstruction
Starting with an intact backup set.
Philipp Gesang [Wed, 23 Aug 2017 08:49:36 +0000]
draft rescue mode through all layers
The strategy is for rescue mode to reconstruct the relevant [*]
information from the index by inspecting the passed tar object,
then continue from there. On the crypto side, this boils down to
a streamlined (and silent) version of the “scan” mode. The
tarfile side is still WIP.
[*] Omitting the useless parts like inode number.
Philipp Gesang [Tue, 22 Aug 2017 15:17:04 +0000]
derive test skeleton for disaster rescue mode
Philipp Gesang [Tue, 22 Aug 2017 15:06:41 +0000]
implement dump mode for tolerant decryption
Utilize the safe dirfd-based implementation from split mode to
write extracted objects to a target directory.
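The general dirfd technique looks roughly like this (POSIX only; `write_at` is a hypothetical name, not the split-mode code). Files are created relative to an already opened directory fd. `O_EXCL` refuses any existing entry, including a planted symlink, and names containing `/` or dot components are rejected up front.

```python
import os

def write_at(dirfd, name, data):
    """Create *name* inside the directory referred to by *dirfd*."""
    if "/" in name or name in ("", ".", ".."):
        raise ValueError("refusing unsafe file name %r" % name)
    fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_NOFOLLOW,
                 0o600, dir_fd=dirfd)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
```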
Philipp Gesang [Tue, 22 Aug 2017 13:30:15 +0000]
extend tarfile API for rescue mode
Philipp Gesang [Tue, 22 Aug 2017 11:29:45 +0000]
implement decryption for tolerant mode
Not possible to reuse the existing CLI decryption since we’re
operating with fds in scan mode.
Philipp Gesang [Tue, 22 Aug 2017 09:59:35 +0000]
attempt to process candidate objects in scan mode
Philipp Gesang [Tue, 22 Aug 2017 08:25:21 +0000]
print list of header candidates
Philipp Gesang [Tue, 15 Aug 2017 15:37:12 +0000]
implement PDTCRYPT header scanning
First phase: collect all possible header start locations.
Adds a CLI subcommand “scan” to crypto.py for analyzing files.
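A sketch of the first scanning phase plus header decoding. The 64-byte layout (version, paramversion, nacl, iv, ctsize, tag, little-endian) is inferred from the `scan` output further below, where a 230-byte object starting at 0 ends at 294. The 8-byte magic value `b"PDTCRYPT"` is an assumption made for illustration.

```python
import struct

PDTCRYPT_MAGIC = b"PDTCRYPT"  # assumed 8-byte magic, for illustration
# magic, version, paramversion, nacl, iv, ctsize, tag: 64 bytes in total
HDR_FMT = "<8sHH16s12sQ16s"
HDR_SIZE = struct.calcsize(HDR_FMT)

def find_header_candidates(buf):
    """Return every offset at which the header magic occurs."""
    offsets, pos = [], buf.find(PDTCRYPT_MAGIC)
    while pos >= 0:
        offsets.append(pos)
        pos = buf.find(PDTCRYPT_MAGIC, pos + 1)
    return offsets

def parse_header(buf, off):
    """Decode the header fields of a candidate at *off*."""
    if len(buf) - off < HDR_SIZE:
        raise EOFError("truncated header at %d" % off)
    _magic, version, paramversion, nacl, iv, ctsize, tag = \
        struct.unpack_from(HDR_FMT, buf, off)
    return {"version": version, "paramversion": paramversion,
            "nacl": nacl, "iv": iv, "ctsize": ctsize, "tag": tag}
```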
Philipp Gesang [Tue, 15 Aug 2017 14:54:30 +0000]
test corruption by tearing a hole in a volume
Philipp Gesang [Tue, 15 Aug 2017 13:38:17 +0000]
add test corrupting an entire volume
Zero out the first volume: None of the content can be restored.
This includes the file extending from the first into the second
volume.
Philipp Gesang [Tue, 15 Aug 2017 12:42:35 +0000]
use symbolic constant for errno
Philipp Gesang [Tue, 15 Aug 2017 09:14:01 +0000]
clarify index read failure
Instead of erroring out with an exception, make --restore emit an
error message indicating that something is wrong with the index.
Philipp Gesang [Tue, 15 Aug 2017 08:31:15 +0000]
do not discard valid data in buffers when in tolerant mode
Both decryption and decompression will fail on the first error
and ignore any results of earlier passes. In normal operation,
the hard failures are desirable to indicate a bad backup set.
However, in tolerant / recovery mode the error handling is closer
to the opposite extreme: we want to retrieve every last byte that
made it through the various layers and only skip over the parts
that cannot be interpreted at all.
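For the decompression layer, the tolerant behavior can be sketched with `zlib.decompressobj`. Input is fed in chunks and the output collected so far is kept when an error occurs, instead of being discarded. This is an illustration with a hypothetical helper, not deltatar's code; `wbits=MAX_WBITS | 16` selects the gzip container.

```python
import zlib

def tolerant_decompress(data, chunk_size=512, wbits=zlib.MAX_WBITS | 16):
    """Decompress gzip data; on error return (recovered_bytes, error)."""
    d = zlib.decompressobj(wbits)
    out = []
    for i in range(0, len(data), chunk_size):
        try:
            out.append(d.decompress(data[i:i + chunk_size]))
        except zlib.error as exn:
            return b"".join(out), exn  # keep what earlier chunks produced
    return b"".join(out) + d.flush(), None
```

Smaller chunks recover more data before a fault, at the cost of more calls into zlib.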
Philipp Gesang [Mon, 14 Aug 2017 15:24:56 +0000]
catch bad parameter version in header
Philipp Gesang [Mon, 14 Aug 2017 14:04:53 +0000]
reject bad index files with a meaningful error
Philipp Gesang [Mon, 14 Aug 2017 13:10:44 +0000]
add brief description of disaster recovery