6 ===============================================================================
7 crypto -- Encryption Layer for the Deltatar Backup
8 ===============================================================================
12 - AES-GCM for the symmetric encryption;
17 - NIST Recommendation for Block Cipher Modes of Operation: Galois/Counter
19 http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf
22 https://cryptome.org/2014/01/aes-gcm-v1.pdf
24 - Authentication weaknesses in GCM
25 http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf
27 Trouble with python-cryptography packages: authentication tags can only be
28 passed in advance: https://github.com/pyca/cryptography/pull/3421
31 -------------------------------------------------------------------------------
33 Errors fall into roughly three categories:
35 - Cryptographical errors or invalid data.
37 - ``InvalidGCMTag`` (decryption failed on account of an invalid GCM
39 - ``InvalidIVFixedPart`` (IV fixed part of object not found in list),
40 - ``DuplicateIV`` (the IV of an encrypted object already occurred),
41 - ``DecryptionError`` (used in CLI decryption for presenting error
42 conditions to the user).
44 - Incorrect usage of the library.
46 - ``InvalidParameter`` (non-conforming user supplied parameter),
47 - ``InvalidHeader`` (data passed for reading not parsable into header),
48 - ``FormatError`` (cannot handle header or parameter version),
51 - Bad internal state. If one of these is encountered it means that a state
52 was reached that shouldn’t occur during normal processing.
57 Also, ``EndOfFile`` is used as a sentinel to communicate that a stream supplied
58 for reading is exhausted.
60 Initialization Vectors
61 -------------------------------------------------------------------------------
63 Initialization vectors are checked reuse during the lifetime of a decryptor.
64 The fixed counters for metadata files cannot be reused and attempts to do so
65 will cause a DuplicateIV error. This means the length of objects encrypted with
66 a metadata counter is capped at 63 GB.
68 For ordinary, non-metadata payload, there is an optional mode with strict IV
69 checking that causes a crypto context to fail if an IV encountered or created
70 was already used for decrypting or encrypting, respectively, an earlier object.
71 Note that this mode can trigger false positives when decrypting non-linearly,
72 e. g. when traversing the same object multiple times. Since the crypto context
73 has no notion of a position in a PDT encrypted archive, this condition must be
74 sorted out downstream.
77 -------------------------------------------------------------------------------
79 ``crypto.py`` may be invoked as a script for decrypting, validating, and
80 splitting PDT encrypted files. Consult the usage message for details.
84 Decrypt from stdin using the password ‘foo’: ::
86 $ crypto.py process foo -i - -o - <some-file.tar.gz.pdtcrypt >some-file.tar.gz
88 Output verbose information about the encrypted objects in the archive: ::
90 $ crypto.py process foo -v -i some-file.tar.gz.pdtcrypt -o /dev/null
91 PDT: decrypt from some-file.tar.gz.pdtcrypt
92 PDT: decrypt to /dev/null
93 PDT: source: file some-file.tar.gz.pdtcrypt
94 PDT: sink: file /dev/null
96 PDT: · version = 1 : 0100
97 PDT: · paramversion = 1 : 0100
98 PDT: · nacl : d270 b031 00d1 87e2 c946 610d 7b7f 7e5f
99 PDT: · iv : 02ee 3dd7 a963 1eb1 0100 0000
100 PDT: · ctsize = 591 : 4f02 0000 0000 0000
101 PDT: · tag : 5b2d 6d8b 8f82 4842 12fd 0b10 b6e3 369b
102 PDT: 64 decrypt obj no. 1, 591 B
103 PDT: · [64] 0% done, read block (591 B of 591 B remaining)
104 PDT: · decrypt ciphertext 591 B
105 PDT: · decrypt plaintext 591 B
109 Also, the mode *scrypt* allows deriving encryption keys. To calculate the
110 encryption key from the password ‘foo’ and the salt of the first object in a
111 PDT encrypted file: ::
113 $ crypto.py scrypt foo -i some-file.pdtcrypt
114 {"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", "key": "JH9EkMwaM4x9F5aim5gK/Q=="}
116 The computed 16 byte key is given in hexadecimal notation in the value to
117 ``hash`` and can be fed into Python’s ``binascii.unhexlify()`` to obtain the
118 corresponding binary representation.
120 Note that in Scrypt hashing mode, no data integrity checks are being performed.
121 If the wrong password is given, a wrong key will be derived. Whether the password
122 was indeed correct can only be determined by decrypting. Note that since PDT
123 archives essentially consist of a stream of independent objects, the salt and
124 other parameters may change. Thus a key derived using above method from the
125 first object doesn’t necessarily apply to any of the subsequent objects.
134 from functools import reduce, partial
144 except ImportError as exn:
147 if __name__ == "__main__": ## Work around the import mechanism lest Python’s
148 pwd = os.getcwd() ## preference for local imports causes a cyclical
149 ## import (crypto → pylibscrypt → […] → ./tarfile → crypto).
150 sys.path = [ p for p in sys.path if p.find ("deltatar") < 0 ]
153 from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
154 from cryptography.hazmat.backends import default_backend
158 __all__ = [ "hdr_make", "hdr_read", "hdr_fmt", "hdr_fmt_pretty"
160 , "PDTCRYPT_HDR_SIZE", "AES_GCM_IV_CNT_DATA"
161 , "AES_GCM_IV_CNT_INFOFILE", "AES_GCM_IV_CNT_INDEX"
165 ###############################################################################
167 ###############################################################################
169 class EndOfFile (Exception):
173 def __init__ (self, n=None, msg=None):
179 class InvalidParameter (Exception):
180 """Inputs not valid for PDT encryption."""
184 class InvalidHeader (Exception):
185 """Header not valid."""
189 class InvalidGCMTag (Exception):
191 The GCM tag calculated during decryption differs from that in the object
197 class InvalidIVFixedPart (Exception):
199 IV fixed part not in supplied list: either the backup is corrupt or the
200 current object does not belong to it.
205 class IVFixedPartError (Exception):
207 Error creating a unique IV fixed part: repeated calls to system RNG yielded
208 the same sequence of bytes as the last IV used.
213 class InvalidFileCounter (Exception):
215 When encrypting, an attempted reuse of a dedicated counter (info file,
216 index file) was caught.
221 class DuplicateIV (Exception):
223 During encryption, the current IV fixed part is identical to an already
224 existing IV (same prefix and file counter). This indicates tampering or
225 programmer error and cannot be recovered from.
230 class NonConsecutiveIV (Exception):
232 IVs not numbered consecutively. This is a hard error with strict IV
233 checking. Precludes random access to the encrypted objects.
238 class FormatError (Exception):
239 """Unusable parameters in header."""
243 class DecryptionError (Exception):
244 """Error during decryption with ``crypto.py`` on the command line."""
248 class Unreachable (Exception):
250 Makeshift __builtin_unreachable(); always a programmer error if
256 class InternalError (Exception):
257 """Errors not ascribable to bad user inputs or cryptography."""
261 ###############################################################################
262 ## crypto layer version
263 ###############################################################################
265 ENCRYPTION_PARAMETERS = \
267 { "kdf": ("dummy", 16)
268 , "enc": "passthrough" }
276 , "enc": "aes-gcm" } }
278 ###############################################################################
280 ###############################################################################
282 PDTCRYPT_HDR_MAGIC = b"PDTCRYPT"
284 PDTCRYPT_HDR_SIZE_MAGIC = 8 # 8
285 PDTCRYPT_HDR_SIZE_VERSION = 2 # 10
286 PDTCRYPT_HDR_SIZE_PARAMVERSION = 2 # 12
287 PDTCRYPT_HDR_SIZE_NACL = 16 # 28
288 PDTCRYPT_HDR_SIZE_IV = 12 # 40
289 PDTCRYPT_HDR_SIZE_CTSIZE = 8 # 48
290 PDTCRYPT_HDR_SIZE_TAG = 16 # 64 GCM auth tag
292 PDTCRYPT_HDR_SIZE = PDTCRYPT_HDR_SIZE_MAGIC + PDTCRYPT_HDR_SIZE_VERSION \
293 + PDTCRYPT_HDR_SIZE_PARAMVERSION + PDTCRYPT_HDR_SIZE_NACL \
294 + PDTCRYPT_HDR_SIZE_IV + PDTCRYPT_HDR_SIZE_CTSIZE \
295 + PDTCRYPT_HDR_SIZE_TAG # = 64
297 # precalculate offsets since Python can’t do constant folding over names
298 HDR_OFF_VERSION = PDTCRYPT_HDR_SIZE_MAGIC
299 HDR_OFF_PARAMVERSION = HDR_OFF_VERSION + PDTCRYPT_HDR_SIZE_VERSION
300 HDR_OFF_NACL = HDR_OFF_PARAMVERSION + PDTCRYPT_HDR_SIZE_PARAMVERSION
301 HDR_OFF_IV = HDR_OFF_NACL + PDTCRYPT_HDR_SIZE_NACL
302 HDR_OFF_CTSIZE = HDR_OFF_IV + PDTCRYPT_HDR_SIZE_IV
303 HDR_OFF_TAG = HDR_OFF_CTSIZE + PDTCRYPT_HDR_SIZE_CTSIZE
307 FMT_I2N_IV = "<8sL" # 8 random bytes ‖ 32 bit counter
308 FMT_I2N_HDR = ("<" # host byte order
312 "16s" # sodium chloride
318 AES_KEY_SIZE = 16 # b"0123456789abcdef"
319 AES_KEY_SIZE_B64 = 24 # b'MDEyMzQ1Njc4OWFiY2RlZg=='
320 AES_GCM_MAX_SIZE = (1 << 36) - (1 << 5) # 2^39 - 2^8 b ≅ 64 GB
321 PDTCRYPT_MAX_OBJ_SIZE_DEFAULT = 63 * (1 << 30) # 63 GB
322 PDTCRYPT_MAX_OBJ_SIZE = PDTCRYPT_MAX_OBJ_SIZE_DEFAULT
324 # index and info files are written on-the fly while encrypting so their
325 # counters must be available inadvance
326 AES_GCM_IV_CNT_INFOFILE = 1 # constant
327 AES_GCM_IV_CNT_INDEX = AES_GCM_IV_CNT_INFOFILE + 1
328 AES_GCM_IV_CNT_DATA = AES_GCM_IV_CNT_INDEX + 1 # also for multivolume
329 AES_GCM_IV_CNT_MAX_DEFAULT = 0xffFFffFF
330 AES_GCM_IV_CNT_MAX = AES_GCM_IV_CNT_MAX_DEFAULT
332 # IV structure and generation
333 PDTCRYPT_IV_GEN_MAX_RETRIES = 10 # ×
334 PDTCRYPT_IV_FIXEDPART_SIZE = 8 # B
335 PDTCRYPT_IV_COUNTER_SIZE = 4 # B
337 # secret type: PW of string | KEY of char [16]
338 PDTCRYPT_SECRET_PW = 0
339 PDTCRYPT_SECRET_KEY = 1
341 ###############################################################################
343 ###############################################################################
349 # , paramversion : u16
355 # fn hdr_read (f : handle) -> hdrinfo;
356 # fn hdr_make (f : handle, h : hdrinfo) -> IOResult<usize>;
357 # fn hdr_fmt (h : hdrinfo) -> String;
362 Read bytes as header structure.
364 If the input could not be interpreted as a header, fail with
369 mag, version, paramversion, nacl, iv, ctsize, tag = \
370 struct.unpack (FMT_I2N_HDR, data)
371 except Exception as exn:
372 raise InvalidHeader ("error unpacking header from [%r]: %s"
373 % (binascii.hexlify (data), str (exn)))
375 if mag != PDTCRYPT_HDR_MAGIC:
376 raise InvalidHeader ("bad magic in header: expected [%s], got [%s]"
377 % (PDTCRYPT_HDR_MAGIC, mag))
380 { "version" : version
381 , "paramversion" : paramversion
389 def hdr_read_stream (instr):
391 Read header from stream at the current position.
393 Fail with ``InvalidHeader`` if insufficient bytes were read from the
394 stream, or if the content could not be interpreted as a header.
396 data = instr.read(PDTCRYPT_HDR_SIZE)
400 elif ldata != PDTCRYPT_HDR_SIZE:
401 raise InvalidHeader ("hdr_read_stream: expected %d B, received %d B"
402 % (PDTCRYPT_HDR_SIZE, ldata))
403 return hdr_read (data)
406 def hdr_from_params (version, paramversion, nacl, iv, ctsize, tag):
408 Assemble the necessary values into a PDTCRYPT header.
410 :type version: int to fit uint16_t
411 :type paramversion: int to fit uint16_t
412 :type nacl: bytes to fit uint8_t[16]
413 :type iv: bytes to fit uint8_t[12]
414 :type size: int to fit uint64_t
415 :type tag: bytes to fit uint8_t[16]
417 buf = bytearray (PDTCRYPT_HDR_SIZE)
418 bufv = memoryview (buf)
421 struct.pack_into (FMT_I2N_HDR, bufv, 0,
423 version, paramversion, nacl, iv, ctsize, tag)
424 except Exception as exn:
425 return False, "error assembling header: %s" % str (exn)
427 return True, bytes (buf)
430 def hdr_make_dummy (s):
432 Create a header sized block of bytes initialized to a value derived from a
433 string. Used to verify we’ve jumped back correctly to the actual position
434 of the object header.
436 c = reduce (lambda a, c: a + ord(c), s, 0) % 0xFF
437 return bytes (bytearray (struct.pack ("B", c)) * PDTCRYPT_HDR_SIZE)
442 Assemble a header from the given header structure.
444 return hdr_from_params (version=hdr.get("version"),
445 paramversion=hdr.get("paramversion"),
446 nacl=hdr.get("nacl"), iv=hdr.get("iv"),
447 ctsize=hdr.get("ctsize"), tag=hdr.get("tag"))
450 HDR_FMT = "I2n_header { version: %d, paramversion: %d, nacl: %s[%d]," \
451 " iv: %s[%d], ctsize: %d, tag: %s[%d] }"
454 """Format a header structure into readable output."""
455 return HDR_FMT % (h["version"], h["paramversion"],
456 binascii.hexlify (h["nacl"]), len(h["nacl"]),
457 binascii.hexlify (h["iv"]), len(h["iv"]),
459 binascii.hexlify (h["tag"]), len(h["tag"]))
462 def hex_spaced_of_bytes (b):
463 """Format bytes object, hexdump style."""
464 return " ".join ([ "%.2x%.2x" % (c1, c2)
465 for c1, c2 in zip (b[0::2], b[1::2]) ]) \
466 + (len (b) | 1 == len (b) and " %.2x" % b[-1] or "") # odd lengths
469 def hdr_iv_counter (h):
470 """Extract the variable part of the IV of the given header."""
471 _fixed, cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
475 def hdr_iv_fixed (h):
476 """Extract the fixed part of the IV of the given header."""
477 fixed, _cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
481 hdr_dump = hex_spaced_of_bytes
485 """version = %-4d : %s
486 paramversion = %-4d : %s
493 def hdr_fmt_pretty (h):
495 Format header structure into multi-line representation of its contents and
496 their raw representation. (Omit the implicit “PDTCRYPT” magic bytes that
497 precede every header.)
499 return HDR_FMT_PRETTY \
501 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["version"])),
503 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["paramversion"])),
504 hex_spaced_of_bytes (h["nacl"]),
505 hex_spaced_of_bytes (h["iv"]),
507 hex_spaced_of_bytes (struct.pack (FMT_UINT64_LE, h["ctsize"])),
508 hex_spaced_of_bytes (h["tag"]))
510 IV_FMT = "((f %s) (c %d))"
513 """Format the two components of an IV in a readable fashion."""
514 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
515 return IV_FMT % (binascii.hexlify (fixed), cnt)
518 ###############################################################################
520 ###############################################################################
522 class Location (object):
526 def restore_loc_fmt (loc):
528 % (loc.n, loc.offset)
530 def locate_hdr_candidates (fd):
532 Walk over instances of the magic string in the payload, collecting their
533 positions. If the offset of the first found instance is not zero, the file
534 begins with leading garbage.
536 :return: The list of offsets in the file.
540 mm = mmap.mmap(fd, 0, mmap.MAP_SHARED, mmap.PROT_READ)
543 pos = mm.find (PDTCRYPT_HDR_MAGIC, pos)
552 HDR_CAND_GOOD = 0 # header marks begin of valid object
553 HDR_CAND_FISHY = 1 # inconclusive (tag mismatch, obj overlap etc.)
554 HDR_CAND_JUNK = 2 # not a header / object unreadable
557 { HDR_CAND_GOOD : "valid"
558 , HDR_CAND_FISHY : "fishy"
559 , HDR_CAND_JUNK : "junk"
563 def verdict_fmt (vdt):
564 return HDR_VERDICT_NAME [vdt]
567 def inspect_hdr (fd, off):
569 Attempt to parse a header in *fd* at position *off*.
571 Returns a verdict about the quality of that header plus the parsed header
575 _ = os.lseek (fd, off, os.SEEK_SET)
577 if os.lseek (fd, 0, os.SEEK_CUR) != off:
578 if PDTCRYPT_VERBOSE is True:
579 noise ("PDT: %d → dismissed (lseek() past EOF)" % off)
580 return HDR_CAND_JUNK, None
582 raw = os.read (fd, PDTCRYPT_HDR_SIZE)
583 if len (raw) != PDTCRYPT_HDR_SIZE:
584 if PDTCRYPT_VERBOSE is True:
585 noise ("PDT: %d → dismissed (EOF inside header)" % off)
586 return HDR_CAND_JUNK, None
590 except InvalidHeader as exn:
591 if PDTCRYPT_VERBOSE is True:
592 noise ("PDT: %d → dismissed (invalid: [%s])" % (off, str (exn)))
593 return HDR_CAND_JUNK, None
595 obj0 = off + PDTCRYPT_HDR_SIZE
596 objX = obj0 + hdr ["ctsize"]
598 eof = os.lseek (fd, 0, os.SEEK_END)
600 if PDTCRYPT_VERBOSE is True:
601 noise ("PDT: %d → EOF inside object (%d≤%d≤%d); adjusting size to "
602 "%d" % (off, obj0, eof, objX, (eof - obj0)))
603 # try reading up to the end
604 hdr ["ctsize"] = eof - obj0
605 return HDR_CAND_FISHY, hdr
607 return HDR_CAND_GOOD, hdr
610 def try_decrypt (ifd, off, hdr, secret, ofd=-1):
612 Attempt to decrypt the object in the (seekable) descriptor *ifd* starting
613 at *off* using the metadata in *hdr* and *secret*. An output fd can be
614 specified with *ofd*; if it is *-1* – the default –, the decrypted payload
617 Always creates a fresh decryptor, so validation steps across objects don’t
620 Errors during GCM tag validation are ignored.
622 ctleft = hdr ["ctsize"]
626 if ks == PDTCRYPT_SECRET_PW:
627 decr = Decrypt (password=secret [1])
628 elif ks == PDTCRYPT_SECRET_KEY:
630 decr = Decrypt (key=key)
637 os.lseek (ifd, pos, os.SEEK_SET)
640 cnksiz = min (ctleft, PDTCRYPT_BLOCKSIZE)
641 cnk = os.read (ifd, cnksiz)
644 pt = decr.process (cnk)
649 except InvalidGCMTag:
650 noise ("PDT: GCM tag mismatch for object %d–%d"
651 % (off, off + hdr ["ctsize"]))
652 if len (pt) > 0 and ofd != -1:
655 except Exception as exn:
656 noise ("PDT: error decrypting object %d–%d@%d, %d B remaining [%s]"
657 % (off, off + hdr ["ctsize"], pos, ctleft, exn))
663 def readable_objects_offsets (ifd, secret, cands):
665 From a list of candidates, locate the ones that mark the start of actual
666 readable PDTCRYPT objects.
670 for i, cand in enumerate (cands):
671 vdt, hdr = inspect_hdr (ifd, cand)
672 if vdt == HDR_CAND_JUNK:
673 pass # ignore unreadable ones
674 elif vdt in [HDR_CAND_GOOD, HDR_CAND_FISHY]:
675 ctsize = hdr ["ctsize"]
676 off0 = cand + PDTCRYPT_HDR_SIZE
677 ok = try_decrypt (ifd, off0, hdr, secret) == ctsize
679 good.append ((cand, off0 + ctsize))
681 overlap = find_overlaps (good)
683 return [ g [0] for g in good ]
686 def reconstruct_offsets (fname, secret):
687 ifd = os.open (fname, os.O_RDONLY)
690 cands = locate_hdr_candidates (ifd)
691 return readable_objects_offsets (ifd, secret, cands)
696 ###############################################################################
698 ###############################################################################
700 def make_secret (password=None, key=None):
702 Safely create a “secret” value that consists either of a key or a password.
703 Inputs are validated: the password is accepted as (UTF-8 encoded) bytes or
704 string; for the key only a bytes object of the proper size or a base64
705 encoded string thereof is accepted.
707 If both are provided, the key is preferred over the password; no checks are
708 performed whether the key is derived from the password.
710 :returns: secret value if inputs were acceptable | None otherwise.
713 if isinstance (key, str) is True:
714 key = key.encode ("utf-8")
715 if isinstance (key, bytes) is True:
716 if len (key) == AES_KEY_SIZE:
717 return (PDTCRYPT_SECRET_KEY, key)
718 if len (key) == AES_KEY_SIZE * 2:
720 key = binascii.unhexlify (key)
721 return (PDTCRYPT_SECRET_KEY, key)
722 except binascii.Error: # garbage in string
724 if len (key) == AES_KEY_SIZE_B64:
726 key = base64.b64decode (key)
727 # the base64 processor is very tolerant and allows for
728 # arbitrary trailing and leading data thus the data obtained
729 # must be checked for the proper length
730 if len (key) == AES_KEY_SIZE:
731 return (PDTCRYPT_SECRET_KEY, key)
732 except binascii.Error: # “incorrect padding”
734 elif password is not None:
735 if isinstance (password, str) is True:
736 return (PDTCRYPT_SECRET_PW, password)
737 elif isinstance (password, bytes) is True:
739 password = password.decode ("utf-8")
740 return (PDTCRYPT_SECRET_PW, password)
741 except UnicodeDecodeError:
747 ###############################################################################
748 ## passthrough / null encryption
749 ###############################################################################
751 class PassthroughCipher (object):
753 tag = struct.pack ("<QQ", 0, 0)
755 def __init__ (self) : pass
757 def update (self, b) : return b
759 def finalize (self) : return b""
761 def finalize_with_tag (self, _) : return b""
763 ###############################################################################
764 ## convenience wrapper
765 ###############################################################################
768 def kdf_dummy (klen, password, _nacl):
770 Fake KDF for testing purposes that is called when parameter version zero is
773 q, r = divmod (klen, len (password))
774 if isinstance (password, bytes) is False:
775 password = password.encode ()
776 return password * q + password [:r], b""
779 SCRYPT_KEY_MEMO = { } # static because needed for both the info file and the archive
782 def kdf_scrypt (params, password, nacl):
784 Wrapper for the Scrypt KDF, corresponds to parameter version one. The
785 computation result is memoized based on the inputs to facilitate spawning
786 multiple encryption contexts.
791 dkLen = params["dkLen"]
794 nacl = os.urandom (params["NaCl_LEN"])
796 key_parms = (password, nacl, N, r, p, dkLen)
797 global SCRYPT_KEY_MEMO
798 if key_parms not in SCRYPT_KEY_MEMO:
799 SCRYPT_KEY_MEMO [key_parms] = \
800 pylibscrypt.scrypt (password, nacl, N, r, p, dkLen)
801 return SCRYPT_KEY_MEMO [key_parms], nacl
804 def kdf_by_version (paramversion=None, defs=None):
806 Pick the KDF handler corresponding to the parameter version or the
809 :rtype: function (password : str, nacl : str) -> str
811 if paramversion is not None:
812 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
814 raise InvalidParameter ("no encryption parameters for version %r"
816 (kdf, params) = defs["kdf"]
818 if kdf == "scrypt" : fn = kdf_scrypt
819 if kdf == "dummy" : fn = kdf_dummy
821 raise ValueError ("key derivation method %r unknown" % kdf)
822 return partial (fn, params)
825 ###############################################################################
827 ###############################################################################
829 def scrypt_hashsource (pw, ins):
831 Calculate the SCRYPT hash from the password and the information contained
832 in the first header found in ``ins``.
834 This does not validate whether the first object is encrypted correctly.
836 if isinstance (pw, str) is True:
838 elif isinstance (pw, bytes) is False:
839 raise InvalidParameter ("password must be a string, not %s"
841 if isinstance (ins, io.BufferedReader) is False and \
842 isinstance (ins, io.FileIO) is False:
843 raise InvalidParameter ("file to hash must be opened in “binary” mode")
846 hdr = hdr_read_stream (ins)
847 except EndOfFile as exn:
848 noise ("PDT: malformed input: end of file reading first object header")
853 pver = hdr ["paramversion"]
854 if PDTCRYPT_VERBOSE is True:
855 noise ("PDT: salt of first object : %s" % binascii.hexlify (nacl))
856 noise ("PDT: parameter version of archive : %d" % pver)
859 defs = ENCRYPTION_PARAMETERS.get(pver, None)
860 kdfname, params = defs ["kdf"]
861 if kdfname != "scrypt":
862 noise ("PDT: input is not an SCRYPT archive")
865 kdf = kdf_by_version (None, defs)
866 except ValueError as exn:
867 noise ("PDT: object has unknown parameter version %d" % pver)
869 hsh, _void = kdf (pw, nacl)
871 return hsh, nacl, hdr ["version"], pver
874 def scrypt_hashfile (pw, fname):
876 Calculate the SCRYPT hash from the password and the information contained
877 in the first header found in the given file. The header is read only at
880 with deptdcrypt_mk_stream (PDTCRYPT_SOURCE, fname or "-") as ins:
881 hsh, _void, _void, _void = scrypt_hashsource (pw, ins)
885 ###############################################################################
887 ###############################################################################
889 class Crypto (object):
891 Encryption context to remain alive throughout an entire tarfile pass.
896 cnt = None # file counter (uint32_t != 0)
897 iv = None # current IV
898 fixed = None # accu for 64 bit fixed parts of IV
899 used_ivs = None # tracks IVs
900 strict_ivs = False # if True, panic on duplicate object IV
909 info_counter_used = False
910 index_counter_used = False
912 def __init__ (self, *al, **akv):
913 self.used_ivs = set ()
914 self.set_parameters (*al, **akv)
917 def next_fixed (self):
922 def set_object_counter (self, cnt=None):
924 Safely set the internal counter of encrypted objects. Numerous
927 The same counter may not be reused in combination with one IV fixed
928 part. This is validated elsewhere in the IV handling.
930 Counter zero is invalid. The first two counters are reserved for
931 metadata. The implementation does not allow for splitting metadata
932 files over multiple encrypted objects. (This would be possible by
933 assigning new fixed parts.) Thus in a Deltatar backup there is at most
934 one object with a counter value of one and two. On creation of a
935 context, the initial counter may be chosen. The globals
936 ``AES_GCM_IV_CNT_INFOFILE`` and ``AES_GCM_IV_CNT_INDEX`` can be used to
937 request one of the reserved values. If one of these values has been
938 used, any further attempt of setting the counter to that value will
939 be rejected with an ``InvalidFileCounter`` exception.
941 Out of bounds values (i. e. below one and more than the maximum of 2³²)
942 cause an ``InvalidParameter`` exception to be thrown.
945 self.cnt = AES_GCM_IV_CNT_DATA
947 if cnt == 0 or cnt > AES_GCM_IV_CNT_MAX + 1:
948 raise InvalidParameter ("invalid counter value %d requested: "
949 "acceptable values are from 1 to %d"
950 % (cnt, AES_GCM_IV_CNT_MAX))
951 if cnt == AES_GCM_IV_CNT_INFOFILE:
952 if self.info_counter_used is True:
953 raise InvalidFileCounter ("attempted to reuse info file "
954 "counter %d: must be unique" % cnt)
955 self.info_counter_used = True
956 elif cnt == AES_GCM_IV_CNT_INDEX:
957 if self.index_counter_used is True:
958 raise InvalidFileCounter ("attempted to reuse index file "
959 " counter %d: must be unique" % cnt)
960 self.index_counter_used = True
961 if cnt <= AES_GCM_IV_CNT_MAX:
964 # cnt == AES_GCM_IV_CNT_MAX + 1 → wrap
965 self.cnt = AES_GCM_IV_CNT_DATA
969 def set_parameters (self, password=None, key=None, paramversion=None,
970 nacl=None, counter=None, strict_ivs=False):
972 Configure the internal state of a crypto context. Not intended for
976 self.set_object_counter (counter)
977 self.strict_ivs = strict_ivs
979 if paramversion is not None:
980 self.paramversion = paramversion
983 self.key, self.nacl = key, nacl
986 if password is not None:
987 if isinstance (password, bytes) is False:
988 password = str.encode (password)
989 self.password = password
990 if paramversion is None and nacl is None:
991 # postpone key setup until first header is available
993 kdf = kdf_by_version (paramversion)
995 self.key, self.nacl = kdf (password, nacl)
998 def process (self, buf):
1000 Encrypt / decrypt a buffer. Invokes the ``.update()`` method on the
1001 wrapped encryptor or decryptor, respectively.
1003 The Cryptography exception ``AlreadyFinalized`` is translated to an
1004 ``InternalError`` at this point. It may occur in sound code when the GC
1005 closes an encrypting stream after an error. Everywhere else it must be
1008 if self.enc is None:
1009 raise RuntimeError ("process: context not initialized")
1010 self.stats ["in"] += len (buf)
1012 out = self.enc.update (buf)
1013 except cryptography.exceptions.AlreadyFinalized as exn:
1014 raise InternalError (exn)
1015 self.stats ["out"] += len (out)
1019 def next (self, password, paramversion, nacl, iv):
1021 Prepare for encrypting another object: Reset the data counters and
1022 change the configuration in case one of the variable parameters differs
1023 from the last object. Also check the IV for duplicates and error out
1024 if strict checking was requested.
1028 self.stats ["obj"] += 1
1030 self.check_duplicate_iv (iv)
1032 if ( self.paramversion != paramversion
1033 or self.password != password
1034 or self.nacl != nacl):
1035 self.set_parameters (password=password, paramversion=paramversion,
1036 nacl=nacl, strict_ivs=self.strict_ivs)
1039 def check_duplicate_iv (self, iv):
1041 Add an IV (the 12 byte representation as in the header) to the list. With
1042 strict checking enabled, this will throw a ``DuplicateIV``. Depending on
1043 the context, this may indicate a serious error (IV reuse).
1045 if self.strict_ivs is True and iv in self.used_ivs:
1046 raise DuplicateIV ("iv %s was reused" % iv_fmt (iv))
1047 # vi has not been used before; add to collection
1048 self.used_ivs.add (iv)
1051 def counters (self):
1053 Access the data counters.
1055 return self.stats ["obj"], self.stats ["in"], self.stats ["out"]
1060 Clear the current context regardless of its finalization state. The
1061 next operation must be ``.next()``.
1066 class Encrypt (Crypto):
1072 def __init__ (self, version, paramversion, password=None, key=None, nacl=None,
1073 counter=AES_GCM_IV_CNT_DATA, strict_ivs=True):
1075 The ctor will throw immediately if one of the parameters does not conform
1076 to our expectations.
1078 counter=AES_GCM_IV_CNT_DATA, strict_ivs=True):
1079 :type version: int to fit uint16_t
1080 :type paramversion: int to fit uint16_t
1081 :param password: mutually exclusive with ``key``
1082 :type password: bytes
1083 :param key: mutually exclusive with ``password``
1086 :type counter: initial object counter the values
1087 ``AES_GCM_IV_CNT_INFOFILE`` and
1088 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1089 and cannot be reused even with different fixed parts.
1090 :type strict_ivs: bool
1092 if password is None and key is None \
1093 or password is not None and key is not None :
1094 raise InvalidParameter ("__init__: need either key or password")
1097 if isinstance (key, bytes) is False:
1098 raise InvalidParameter ("__init__: key must be provided as "
1099 "bytes, not %s" % type (key))
1101 raise InvalidParameter ("__init__: salt must be provided along "
1102 "with encryption key")
1103 else: # password, no key
1104 if isinstance (password, str) is False:
1105 raise InvalidParameter ("__init__: password must be a string, not %s"
1107 if len (password) == 0:
1108 raise InvalidParameter ("__init__: supplied empty password but not "
1109 "permitted for PDT encrypted files")
1111 if isinstance (version, int) is False:
1112 raise InvalidParameter ("__init__: version number must be an "
1113 "integer, not %s" % type (version))
1115 raise InvalidParameter ("__init__: version number must be a "
1116 "nonnegative integer, not %d" % version)
1118 if isinstance (paramversion, int) is False:
1119 raise InvalidParameter ("__init__: crypto parameter version number "
1120 "must be an integer, not %s"
1121 % type (paramversion))
1122 if paramversion < 0:
1123 raise InvalidParameter ("__init__: crypto parameter version number "
1124 "must be a nonnegative integer, not %d"
1127 if nacl is not None:
1128 if isinstance (nacl, bytes) is False:
1129 raise InvalidParameter ("__init__: salt given, but of type %s "
1130 "instead of bytes" % type (nacl))
1131 # salt length would depend on the actual encryption so it can’t be
1132 # validated at this point
1134 self.version = version
1135 self.paramenc = ENCRYPTION_PARAMETERS.get (paramversion) ["enc"]
1137 super().__init__ (password, key, paramversion, nacl, counter=counter,
1138 strict_ivs=strict_ivs)
1141 def next_fixed (self, retries=PDTCRYPT_IV_GEN_MAX_RETRIES):
1143 Generate the next IV fixed part by reading eight bytes from
1144 ``/dev/urandom``. The buffer so obtained is tested against the fixed
1145 parts used so far to prevent accidental reuse of IVs. After a
1146 configurable number of attempts to create a unique fixed part, it will
1147 refuse to continue with an ``IVFixedPartError``. This is unlikely to
1148 ever happen on a normal system but may detect an issue with the random
1151 The list of fixed parts that were used by the context at hand can be
1152 accessed through the ``.fixed`` list. Its last element is the fixed
1153 part currently in use.
1157 fp = os.urandom (PDTCRYPT_IV_FIXEDPART_SIZE)
1158 if fp not in self.fixed:
1159 self.fixed.append (fp)
1162 raise IVFixedPartError ("error obtaining a unique IV fixed part from "
1163 "/dev/urandom; giving up after %d tries" % i)
1168 Construct a 12-bytes IV from the current fixed part and the object
1171 return struct.pack(FMT_I2N_IV, self.fixed [-1], self.cnt)
1174 def next (self, filename=None, counter=None):
1176 Prepare for encrypting the next incoming object. Update the counter
1177 and put together the IV, possibly changing prefixes. Then create the
1180 The argument ``counter`` can be used to specify a file counter for this
1181 object. Unless it is one of the reserved values, the counter of
1182 subsequent objects will be computed from this one.
1184 If this is the first object in a series, ``filename`` is required,
1185 otherwise it is reused if not present. The value is used to derive a
1186 header sized placeholder to use until after encryption when all the
1187 inputs to construct the final header are available. This is then
1188 matched in ``.done()`` against the value found at the position of the
1189 header. The motivation for this extra check is primarily to assist
1190 format debugging: It makes stray headers easy to spot in malformed
1193 if filename is None:
1194 if self.lastinfo is None:
1195 raise InvalidParameter ("next: filename is mandatory for "
1197 filename, _dummy = self.lastinfo
1199 if isinstance (filename, str) is False:
1200 raise InvalidParameter ("next: filename must be a string, no %s"
1202 if counter is not None:
1203 if isinstance (counter, int) is False:
1204 raise InvalidParameter ("next: the supplied counter is of "
1205 "invalid type %s; please pass an "
1206 "integer instead" % type (counter))
1207 self.set_object_counter (counter)
1209 self.iv = self.iv_make ()
1210 if self.paramenc == "aes-gcm":
1212 ( algorithms.AES (self.key)
1213 , modes.GCM (self.iv)
1214 , backend = default_backend ()) \
1216 elif self.paramenc == "passthrough":
1217 self.enc = PassthroughCipher ()
1219 raise InvalidParameter ("next: parameter version %d not known"
1220 % self.paramversion)
1221 hdrdum = hdr_make_dummy (filename)
1222 self.lastinfo = (filename, hdrdum)
1223 super().next (self.password, self.paramversion, self.nacl, self.iv)
1225 self.set_object_counter (self.cnt + 1)
1229 def done (self, cmpdata):
1231 Complete encryption of an object. After this has been called, attempts
1232 of encrypting further data will cause an error until ``.next()`` is
1235 Returns a 64 bytes buffer containing the object header including all
1236 values including the “late” ones e. g. the ciphertext size and the
1239 if isinstance (cmpdata, bytes) is False:
1240 raise InvalidParameter ("done: comparison input expected as bytes, "
1241 "not %s" % type (cmpdata))
1242 if self.lastinfo is None:
1243 raise RuntimeError ("done: encryption context not initialized")
1244 filename, hdrdum = self.lastinfo
1245 if cmpdata != hdrdum:
1246 raise RuntimeError ("done: bad sync of header for object %d: "
1247 "preliminary data does not match; this likely "
1248 "indicates a wrongly repositioned stream"
1250 data = self.enc.finalize ()
1251 self.stats ["out"] += len (data)
1252 self.ctsize += len (data)
1253 ok, hdr = hdr_from_params (self.version, self.paramversion, self.nacl,
1254 self.iv, self.ctsize, self.enc.tag)
1256 raise InternalError ("error constructing header: %r" % hdr)
1257 return data, hdr, self.fixed
1260 def process (self, buf):
1262 Encrypt a chunk of plaintext with the active encryptor. Returns the
1263 size of the input consumed. This **must** be checked downstream. If the
1264 maximum possible object size has been reached, the current context must
1265 be finalized and a new one established before any further data can be
1266 encrypted. The second argument is the remainder of the plaintext that
1267 was not encrypted for the caller to use immediately after the new
1270 if isinstance (buf, bytes) is False:
1271 raise InvalidParameter ("process: expected byte buffer, not %s"
1274 newptsize = self.ptsize + bsize
1275 diff = newptsize - PDTCRYPT_MAX_OBJ_SIZE
1278 newptsize = PDTCRYPT_MAX_OBJ_SIZE
1279 self.ptsize = newptsize
1280 data = super().process (buf [:bsize])
1281 self.ctsize += len (data)
1285 class Decrypt (Crypto):
1287 tag = None # GCM tag, part of header
1288 last_iv = None # check consecutive ivs in strict mode
1290 def __init__ (self, password=None, key=None, counter=None, fixedparts=None,
1293 Sanitizing ctor for the decryption context. ``fixedparts`` specifies a
1294 list of IV fixed parts accepted during decryption. If a fixed part is
1295 encountered that is not in the list, decryption will fail.
1297 :param password: mutually exclusive with ``key``
1298 :type password: bytes
1299 :param key: mutually exclusive with ``password``
1301 :type counter: initial object counter the values
1302 ``AES_GCM_IV_CNT_INFOFILE`` and
1303 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1304 and cannot be reused even with different fixed parts.
1305 :type fixedparts: bytes list
1307 if password is None and key is None \
1308 or password is not None and key is not None :
1309 raise InvalidParameter ("__init__: need either key or password")
1312 if isinstance (key, bytes) is False:
1313 raise InvalidParameter ("__init__: key must be provided as "
1314 "bytes, not %s" % type (key))
1315 else: # password, no key
1316 if isinstance (password, str) is False:
1317 raise InvalidParameter ("__init__: password must be a string, not %s"
1319 if len (password) == 0:
1320 raise InvalidParameter ("__init__: supplied empty password but not "
1321 "permitted for PDT encrypted files")
1323 if fixedparts is not None:
1324 if isinstance (fixedparts, list) is False:
1325 raise InvalidParameter ("__init__: IV fixed parts must be "
1326 "supplied as list, not %s"
1327 % type (fixedparts))
1328 self.fixed = fixedparts
1331 super().__init__ (password=password, key=key, counter=counter,
1332 strict_ivs=strict_ivs)
1335 def valid_fixed_part (self, iv):
1337 Check if a fixed part was already seen.
1339 # check if fixed part is known
1340 fixed, _cnt = struct.unpack (FMT_I2N_IV, iv)
1341 i = bisect.bisect_left (self.fixed, fixed)
1342 return i != len (self.fixed) and self.fixed [i] == fixed
1345 def check_consecutive_iv (self, iv):
1347 Check whether the counter part of the given IV is indeed the successor
1348 of the currently present counter. This should always be the case for
1349 the objects in a well formed PDT archive but should not be enforced
1350 when decrypting out-of-order.
1352 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
1353 if self.strict_ivs is True \
1354 and self.last_iv is not None \
1355 and self.last_iv [0] == fixed \
1356 and self.last_iv [1] != cnt - 1:
1357 raise NonConsecutiveIV ("iv %s counter not successor of "
1358 "last object (expected %d, found %d)"
1359 % (iv_fmt (self.last_iv [1]), cnt))
1360 self.last_iv = (iv, cnt)
1363 def next (self, hdr):
1365 Start decrypting the next object. The PDTCRYPT header for the object
1366 can be given either as already parsed object or as bytes.
1368 if isinstance (hdr, bytes) is True:
1369 hdr = hdr_read (hdr)
1370 elif isinstance (hdr, dict) is False:
1371 # this won’t catch malformed specs though
1372 raise InvalidParameter ("next: wrong type of parameter hdr: "
1373 "expected bytes or spec, got %s"
1376 paramversion = hdr ["paramversion"]
1381 raise InvalidHeader ("next: not a header %r" % hdr)
1383 super().next (self.password, paramversion, nacl, iv)
1384 if self.fixed is not None and self.valid_fixed_part (iv) is False:
1385 raise InvalidIVFixedPart ("iv %s has invalid fixed part"
1387 self.check_consecutive_iv (iv)
1390 defs = ENCRYPTION_PARAMETERS.get (paramversion, None)
1392 raise FormatError ("header contains unknown parameter version %d; "
1393 "maybe the file was created by a more recent "
1394 "version of Deltatar" % paramversion)
1396 if enc == "aes-gcm":
1398 ( algorithms.AES (self.key)
1399 , modes.GCM (iv, tag=self.tag)
1400 , backend = default_backend ()) \
1402 elif enc == "passthrough":
1403 self.enc = PassthroughCipher ()
1405 raise InternalError ("encryption parameter set %d refers to unknown "
1406 "mode %r" % (paramversion, enc))
1407 self.set_object_counter (self.cnt + 1)
1410 def done (self, tag=None):
1412 Stop decryption of the current object and finalize it with the active
1413 context. This will throw an *InvalidGCMTag* exception to indicate that
1414 the authentication tag does not match the data. If the tag is correct,
1415 the rest of the plaintext is returned.
1420 data = self.enc.finalize ()
1422 if isinstance (tag, bytes) is False:
1423 raise InvalidParameter ("done: wrong type of parameter "
1424 "tag: expected bytes, got %s"
1426 data = self.enc.finalize_with_tag (self.tag)
1427 except cryptography.exceptions.InvalidTag:
1428 raise InvalidGCMTag ("done: tag mismatch of object %d: %s "
1429 "rejected by finalize ()"
1430 % (self.cnt, binascii.hexlify (self.tag)))
1431 self.ctsize += len (data)
1432 self.stats ["out"] += len (data)
1436 def process (self, buf):
1438 Decrypt the bytes object *buf* with the active decryptor.
1440 if isinstance (buf, bytes) is False:
1441 raise InvalidParameter ("process: expected byte buffer, not %s"
1443 self.ctsize += len (buf)
1444 data = super().process (buf)
1445 self.ptsize += len (data)
1449 ###############################################################################
1451 ###############################################################################
1453 def _patch_global (glob, vow, n=None):
1455 Adapt upper file counter bound for testing IV logic. Completely unsafe.
1457 assert vow == "I am fully aware that this will void my warranty."
1458 r = globals () [glob]
1460 n = globals () [glob + "_DEFAULT"]
1461 globals () [glob] = n
1464 _testing_set_AES_GCM_IV_CNT_MAX = \
1465 partial (_patch_global, "AES_GCM_IV_CNT_MAX")
1467 _testing_set_PDTCRYPT_MAX_OBJ_SIZE = \
1468 partial (_patch_global, "PDTCRYPT_MAX_OBJ_SIZE")
1470 def open2_dump_file (fname, dir_fd, force=False):
1473 oflags = os.O_CREAT | os.O_WRONLY
1475 oflags |= os.O_TRUNC
1480 outfd = os.open (fname, oflags,
1481 stat.S_IRUSR | stat.S_IWUSR, dir_fd=dir_fd)
1482 except FileExistsError as exn:
1483 noise ("PDT: refusing to overwrite existing file %s" % fname)
1485 raise RuntimeError ("destination file %s already exists" % fname)
1486 if PDTCRYPT_VERBOSE is True:
1487 noise ("PDT: new output file %s (fd=%d)" % (fname, outfd))
1491 ###############################################################################
1492 ## freestanding invocation
1493 ###############################################################################
1495 PDTCRYPT_SUB_PROCESS = 0
1496 PDTCRYPT_SUB_SCRYPT = 1
1497 PDTCRYPT_SUB_SCAN = 2
1500 { "process" : PDTCRYPT_SUB_PROCESS
1501 , "scrypt" : PDTCRYPT_SUB_SCRYPT
1502 , "scan" : PDTCRYPT_SUB_SCAN }
1504 PDTCRYPT_DECRYPT = 1 << 0 # decrypt archive with password
1505 PDTCRYPT_SPLIT = 1 << 1 # split archive into individual objects
1506 PDTCRYPT_HASH = 1 << 2 # output scrypt hash for file and given password
1508 PDTCRYPT_SPLITNAME = "pdtcrypt-object-%d.bin"
1509 PDTCRYPT_RESCUENAME = "pdtcrypt-rescue-object-%0.5d.bin"
1511 PDTCRYPT_VERBOSE = False
1512 PDTCRYPT_STRICTIVS = False
1513 PDTCRYPT_OVERWRITE = False
1514 PDTCRYPT_BLOCKSIZE = 1 << 12
1519 PDTCRYPT_DEFAULT_VER = 1
1520 PDTCRYPT_DEFAULT_PVER = 1
1522 # scrypt hashing output control
1523 PDTCRYPT_SCRYPT_INTRANATOR = 0
1524 PDTCRYPT_SCRYPT_PARAMETERS = 1
1525 PDTCRYPT_SCRYPT_DEFAULT = PDTCRYPT_SCRYPT_INTRANATOR
1527 PDTCRYPT_SCRYPT_FORMAT = \
1528 { "i2n" : PDTCRYPT_SCRYPT_INTRANATOR
1529 , "params" : PDTCRYPT_SCRYPT_PARAMETERS }
1531 PDTCRYPT_TT_COLUMNS = 80 # assume standard terminal
1533 class PDTDecryptionError (Exception):
1534 """Decryption failed."""
1536 class PDTSplitError (Exception):
1537 """Decryption failed."""
1540 def noise (*a, **b):
1541 print (file=sys.stderr, *a, **b)
1544 class PassthroughDecryptor (object):
1546 curhdr = None # write current header on first data write
1548 def __init__ (self):
1549 if PDTCRYPT_VERBOSE is True:
1550 noise ("PDT: no encryption; data passthrough")
1552 def next (self, hdr):
1553 ok, curhdr = hdr_make (hdr)
1555 raise PDTDecryptionError ("bad header %r" % hdr)
1556 self.curhdr = curhdr
1559 if self.curhdr is not None:
1563 def process (self, d):
1564 if self.curhdr is not None:
1570 def depdtcrypt (mode, secret, ins, outs):
1572 Remove PDTCRYPT layer from all objects encrypted with the secret. Used on a
1573 Deltatar backup this will yield a (possibly Gzip compressed) tarball.
1575 ctleft = -1 # length of ciphertext to consume
1576 ctcurrent = 0 # total ciphertext of current object
1577 total_obj = 0 # total number of objects read
1578 total_pt = 0 # total plaintext bytes
1579 total_ct = 0 # total ciphertext bytes
1580 total_read = 0 # total bytes read
1581 outfile = None # Python file object for output
1583 if mode & PDTCRYPT_DECRYPT: # decryptor
1585 if ks == PDTCRYPT_SECRET_PW:
1586 decr = Decrypt (password=secret [1], strict_ivs=PDTCRYPT_STRICTIVS)
1587 elif ks == PDTCRYPT_SECRET_KEY:
1589 decr = Decrypt (key=key, strict_ivs=PDTCRYPT_STRICTIVS)
1591 raise InternalError ("‘%d’ does not specify a valid kind of secret"
1594 decr = PassthroughDecryptor ()
1597 """Dummy for non-split mode: output file does not vary."""
1600 if mode & PDTCRYPT_SPLIT:
1601 def nextout (outfile):
1603 We were passed an fd as outs for accessing the destination
1604 directory where extracted archive components are supposed
1609 if PDTCRYPT_VERBOSE is True:
1610 noise ("PDT: no output file to close at this point")
1612 if PDTCRYPT_VERBOSE is True:
1613 noise ("PDT: release output file %r" % outfile)
1614 # cleanup happens automatically by the GC; the next
1615 # line will error out on account of an invalid fd
1618 assert total_obj > 0
1619 fname = PDTCRYPT_SPLITNAME % total_obj
1621 outfd = open2_dump_file (fname, outs, force=PDTCRYPT_OVERWRITE)
1622 except RuntimeError as exn:
1623 raise PDTSplitError (exn)
1624 return os.fdopen (outfd, "wb", closefd=True)
1628 """ESPIPE is normal on non-seekable stdio stream."""
1631 except OSError as exn:
1632 if exn.errno == os.errno.ESPIPE:
1635 def out (pt, outfile):
1639 if PDTCRYPT_VERBOSE is True:
1640 noise ("PDT:\t· decrypt plaintext %d B" % (npt))
1642 nn = outfile.write (pt)
1643 except OSError as exn: # probably ENOSPC
1644 raise DecryptionError ("error (%s)" % exn)
1646 raise DecryptionError ("write aborted after %d of %d B" % (nn, npt))
1650 # current object completed; in a valid archive this marks either
1651 # the start of a new header or the end of the input
1652 if ctleft == 0: # current object requires finalization
1653 if PDTCRYPT_VERBOSE is True:
1654 noise ("PDT: %d finalize" % tell (ins))
1657 except InvalidGCMTag as exn:
1658 raise DecryptionError ("error finalizing object %d (%d B): "
1659 "%r" % (total_obj, len (pt), exn)) \
1662 if PDTCRYPT_VERBOSE is True:
1663 noise ("PDT:\t· object validated")
1665 if PDTCRYPT_VERBOSE is True:
1666 noise ("PDT: %d hdr" % tell (ins))
1668 hdr = hdr_read_stream (ins)
1669 total_read += PDTCRYPT_HDR_SIZE
1670 except EndOfFile as exn:
1671 total_read += exn.remainder
1672 if total_ct + total_obj * PDTCRYPT_HDR_SIZE != total_read:
1673 raise PDTDecryptionError ("ciphertext processed (%d B) plus "
1674 "overhead (%d × %d B) does not match "
1675 "the number of bytes read (%d )"
1676 % (total_ct, total_obj, PDTCRYPT_HDR_SIZE,
1678 # the single good exit
1679 return total_read, total_obj, total_ct, total_pt
1680 except InvalidHeader as exn:
1681 raise PDTDecryptionError ("invalid header at position %d in %r "
1682 "(%s)" % (tell (ins), exn, ins))
1683 if PDTCRYPT_VERBOSE is True:
1684 pretty = hdr_fmt_pretty (hdr)
1685 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1686 pretty.splitlines (), ""))
1687 ctcurrent = ctleft = hdr ["ctsize"]
1691 total_obj += 1 # used in file counter with split mode
1693 # finalization complete or skipped in case of first object in
1694 # stream; create a new output file if necessary
1695 outfile = nextout (outfile)
1697 if PDTCRYPT_VERBOSE is True:
1698 noise ("PDT: %d decrypt obj no. %d, %d B"
1699 % (tell (ins), total_obj, ctleft))
1701 # always allocate a new buffer since python-cryptography doesn’t allow
1702 # passing a bytearray :/
1703 nexpect = min (ctleft, PDTCRYPT_BLOCKSIZE)
1704 if PDTCRYPT_VERBOSE is True:
1705 noise ("PDT:\t· [%d] %d%% done, read block (%d B of %d B remaining)"
1707 100 - ctleft * 100 / (ctcurrent > 0 and ctcurrent or 1),
1709 ct = ins.read (nexpect)
1713 raise EndOfFile (nct,
1714 "hit EOF after %d of %d B in block [%d:%d); "
1715 "%d B ciphertext remaining for object no %d"
1716 % (nct, nexpect, off, off + nexpect, ctleft,
1722 if PDTCRYPT_VERBOSE is True:
1723 noise ("PDT:\t· decrypt ciphertext %d B" % (nct))
1724 pt = decr.process (ct)
1728 def deptdcrypt_mk_stream (kind, path):
1729 """Create stream from file or stdio descriptor."""
1730 if kind == PDTCRYPT_SINK:
1732 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: stdout")
1733 return sys.stdout.buffer
1735 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: file %s" % path)
1736 return io.FileIO (path, "w")
1737 if kind == PDTCRYPT_SOURCE:
1739 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: stdin")
1740 return sys.stdin.buffer
1742 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: file %s" % path)
1743 return io.FileIO (path, "r")
1745 raise ValueError ("bogus stream “%s” / %s" % (kind, path))
1748 def mode_depdtcrypt (mode, secret, ins, outs):
1750 total_read, total_obj, total_ct, total_pt = \
1751 depdtcrypt (mode, secret, ins, outs)
1752 except DecryptionError as exn:
1753 noise ("PDT: Decryption failed:")
1755 noise ("PDT: “%s”" % exn)
1757 noise ("PDT: Did you specify the correct key / password?")
1760 except PDTSplitError as exn:
1761 noise ("PDT: Split operation failed:")
1763 noise ("PDT: “%s”" % exn)
1765 noise ("PDT: Hint: target directory should be empty.")
1769 if PDTCRYPT_VERBOSE is True:
1770 noise ("PDT: decryption successful" )
1771 noise ("PDT: %.10d bytes read" % total_read)
1772 noise ("PDT: %.10d objects decrypted" % total_obj )
1773 noise ("PDT: %.10d bytes ciphertext" % total_ct )
1774 noise ("PDT: %.10d bytes plaintext" % total_pt )
1780 def mode_scrypt (pw, ins=None, nacl=None, fmt=PDTCRYPT_SCRYPT_INTRANATOR):
1782 paramversion = PDTCRYPT_DEFAULT_PVER
1784 hsh, nacl, version, paramversion = scrypt_hashsource (pw, ins)
1785 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1787 nacl = binascii.unhexlify (nacl)
1788 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1789 version = PDTCRYPT_DEFAULT_VER
1791 kdfname, params = defs ["kdf"]
1793 kdf = kdf_by_version (None, defs)
1794 hsh, _void = kdf (pw, nacl)
1798 if fmt == PDTCRYPT_SCRYPT_INTRANATOR:
1799 out = json.dumps ({ "salt" : base64.b64encode (nacl).decode ()
1800 , "key" : base64.b64encode (hsh) .decode ()
1801 , "paramversion" : paramversion })
1802 elif fmt == PDTCRYPT_SCRYPT_PARAMETERS:
1803 out = json.dumps ({ "salt" : binascii.hexlify (nacl).decode ()
1804 , "key" : binascii.hexlify (hsh) .decode ()
1805 , "version" : version
1806 , "scrypt_params" : { "N" : params ["N"]
1807 , "r" : params ["r"]
1808 , "p" : params ["p"]
1809 , "dkLen" : params ["dkLen"] } })
1811 raise RuntimeError ("bad scrypt output scheme %r" % fmt)
1816 def noise_output_candidates (cands, indent=8, cols=PDTCRYPT_TT_COLUMNS):
1818 Print a list of offsets without garbling the terminal too much.
1820 The indent is counted from column zero; if it is wide enough, the “PDT: ”
1821 marker will be prepended, considered part of the indentation.
1825 idt = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
1830 init = True # prevent leading separator
1833 raise ValueError ("the requested indentation exceeds the line "
1834 "width by %d" % (indent - wd))
1844 if lpos > wd: # line break
1860 SLICE_START = 1 # ordering is important to have starts of intervals
1861 SLICE_END = 0 # sorted before equal ends
1863 def find_overlaps (slices):
1865 Find overlapping slices: iterate open/close points of intervals, tracking
1866 the ones open at any time.
1869 inside = set () # of indices into bounds
1870 ovrlp = set () # of indices into bounds
1872 for i, s in enumerate (slices):
1873 bounds.append ((s [0], SLICE_START, i))
1874 bounds.append ((s [1], SLICE_END , i))
1875 bounds = sorted (bounds)
1879 if val [1] == SLICE_START:
1882 if len (inside) > 1: # closing one that overlapped
1886 return [ slices [i] for i in ovrlp ]
1889 def mode_scan (secret, fname, outs=None, nacl=None):
1891 Dissect a binary file, looking for PDTCRYPT headers and objects.
1893 If *outs* is supplied, recoverable data will be dumped into the specified
1897 ifd = os.open (fname, os.O_RDONLY)
1898 except FileNotFoundError:
1899 noise ("PDT: failed to open %s readonly" % fname)
1904 if PDTCRYPT_VERBOSE is True:
1905 noise ("PDT: scan for potential sync points")
1906 cands = locate_hdr_candidates (ifd)
1907 if len (cands) == 0:
1908 noise ("PDT: scan complete: input does not contain potential PDT "
1909 "headers; giving up.")
1911 if PDTCRYPT_VERBOSE is True:
1912 noise ("PDT: scan complete: found %d candidates:" % len (cands))
1913 noise_output_candidates (cands)
1918 junk, todo, slices = [], [], []
1923 vdt, hdr = inspect_hdr (ifd, cand)
1925 vdts = verdict_fmt (vdt)
1927 if vdt == HDR_CAND_JUNK:
1928 noise ("PDT: obj %d: %s object: bad header, skipping" % vdts)
1931 off0 = cand + PDTCRYPT_HDR_SIZE
1932 if PDTCRYPT_VERBOSE is True:
1933 noise ("PDT: obj %d: read payload @%d" % (nobj, off0))
1934 pretty = hdr_fmt_pretty (hdr)
1935 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1936 pretty.splitlines (), ""))
1939 if outs is not None:
1940 ofname = PDTCRYPT_RESCUENAME % nobj
1941 ofd = open2_dump_file (ofname, outs, force=PDTCRYPT_OVERWRITE)
1943 ctsize = hdr ["ctsize"]
1945 l = try_decrypt (ifd, off0, hdr, secret, ofd=ofd)
1947 slices.append ((off0, off0 + l))
1951 if vdt == HDR_CAND_GOOD and ok is True:
1952 noise ("PDT: %d → ✓ %s object %d–%d"
1953 % (cand, vdts, off0, off0 + ctsize))
1954 elif vdt == HDR_CAND_FISHY and ok is True:
1955 noise ("PDT: %d → × %s object %d–%d, corrupt header"
1956 % (cand, vdts, off0, off0 + ctsize))
1957 elif vdt == HDR_CAND_GOOD and ok is False:
1958 noise ("PDT: %d → × %s object %d–%d, problematic payload"
1959 % (cand, vdts, off0, off0 + ctsize))
1960 elif vdt == HDR_CAND_FISHY and ok is False:
1961 noise ("PDT: %d → × %s object %d–%d, corrupt header, problematic "
1962 "ciphertext" % (cand, vdts, off0, off0 + ctsize))
1969 noise ("PDT: all headers ok")
1971 noise ("PDT: %d candidates not parseable as headers:" % len (junk))
1972 noise_output_candidates (junk)
1974 overlap = find_overlaps (slices)
1975 if len (overlap) > 0:
1976 noise ("PDT: %d objects overlapping others" % len (overlap))
1977 for slice in overlap:
1978 noise ("PDT: × %d→%d" % (slice [0], slice [1]))
1980 def usage (err=False):
1984 indent = ' ' * len (SELF)
1985 out ("usage: %s SUBCOMMAND { --help" % SELF)
1986 out (" %s | [ -v ] { -p PASSWORD | -k KEY }" % indent)
1987 out (" %s [ { -i | --in } { - | SOURCE } ]" % indent)
1988 out (" %s [ { -n | --nacl } { SALT } ]" % indent)
1989 out (" %s [ { -o | --out } { - | DESTINATION } ]" % indent)
1990 out (" %s [ -D | --no-decrypt ] [ -S | --split ]" % indent)
1991 out (" %s [ -f | --format ]" % indent)
1994 out ("\t\tSUBCOMMAND main mode: { process | scrypt }")
1996 out ("\t\t process: extract objects from PDT archive")
1997 out ("\t\t scrypt: calculate hash from password and first object")
1998 out ("\t\t-p PASSWORD password to derive the encryption key from")
1999 out ("\t\t-k KEY encryption key as 16 bytes in hexadecimal notation")
2000 out ("\t\t-s enforce strict handling of initialization vectors")
2001 out ("\t\t-i SOURCE file name to read from")
2002 out ("\t\t-o DESTINATION file to write output to")
2003 out ("\t\t-n SALT provide salt for scrypt mode in hex encoding")
2004 out ("\t\t-v print extra info")
2005 out ("\t\t-S split into files at object boundaries; this")
2006 out ("\t\t requires DESTINATION to refer to directory")
2007 out ("\t\t-D PDT header and ciphertext passthrough")
2008 out ("\t\t-f format of SCRYPT hash output (“default” or “parameters”)")
2010 out ("\tinstead of filenames, “-” may used to specify stdin / stdout")
2012 sys.exit ((err is True) and 42 or 0)
2022 def parse_argv (argv):
2023 global PDTCRYPT_OVERWRITE
2025 mode = PDTCRYPT_DECRYPT
2031 scrypt_format = PDTCRYPT_SCRYPT_DEFAULT
2034 SELF = os.path.basename (next (argvi))
2037 rawsubcmd = next (argvi)
2038 subcommand = PDTCRYPT_SUB [rawsubcmd]
2039 except StopIteration:
2040 bail ("ERROR: subcommand required")
2042 bail ("ERROR: invalid subcommand “%s” specified" % rawsubcmd)
2048 except StopIteration:
2049 bail ("ERROR: argument list incomplete")
2051 def checked_secret (s):
2056 bail ("ERROR: encountered “%s” but secret already given" % arg)
2059 if arg in [ "-h", "--help" ]:
2062 elif arg in [ "-v", "--verbose", "--wtf" ]:
2063 global PDTCRYPT_VERBOSE
2064 PDTCRYPT_VERBOSE = True
2065 elif arg in [ "-i", "--in", "--source" ]:
2066 insspec = checked_arg ()
2067 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt from %s" % insspec)
2068 elif arg in [ "-p", "--password" ]:
2069 arg = checked_arg ()
2070 checked_secret (make_secret (password=arg))
2071 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with password")
2073 if subcommand == PDTCRYPT_SUB_PROCESS:
2074 if arg in [ "-s", "--strict-ivs" ]:
2075 global PDTCRYPT_STRICTIVS
2076 PDTCRYPT_STRICTIVS = True
2077 elif arg in [ "-o", "--out", "--dest", "--sink" ]:
2078 outsspec = checked_arg ()
2079 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2080 elif arg in [ "-f", "--force" ]:
2081 PDTCRYPT_OVERWRITE = True
2082 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2083 elif arg in [ "-S", "--split" ]:
2084 mode |= PDTCRYPT_SPLIT
2085 if PDTCRYPT_VERBOSE is True: noise ("PDT: split files")
2086 elif arg in [ "-D", "--no-decrypt" ]:
2087 mode &= ~PDTCRYPT_DECRYPT
2088 if PDTCRYPT_VERBOSE is True: noise ("PDT: not decrypting")
2089 elif arg in [ "-k", "--key" ]:
2090 arg = checked_arg ()
2091 checked_secret (make_secret (key=arg))
2092 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with key")
2094 bail ("ERROR: unexpected positional argument “%s”" % arg)
2095 elif subcommand == PDTCRYPT_SUB_SCRYPT:
2096 if arg in [ "-n", "--nacl", "--salt" ]:
2097 nacl = checked_arg ()
2098 if PDTCRYPT_VERBOSE is True: noise ("PDT: salt key with %s" % nacl)
2099 elif arg in [ "-f", "--format" ]:
2100 arg = checked_arg ()
2102 scrypt_format = PDTCRYPT_SCRYPT_FORMAT [arg]
2104 bail ("ERROR: invalid scrypt output format %s" % arg)
2105 if PDTCRYPT_VERBOSE is True:
2106 noise ("PDT: scrypt output format “%s”" % scrypt_format)
2108 bail ("ERROR: unexpected positional argument “%s”" % arg)
2109 elif subcommand == PDTCRYPT_SUB_SCAN:
2110 if arg in [ "-o", "--out", "--dest", "--sink" ]:
2111 outsspec = checked_arg ()
2112 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2113 elif arg in [ "-f", "--force" ]:
2114 PDTCRYPT_OVERWRITE = True
2115 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2117 bail ("ERROR: unexpected positional argument “%s”" % arg)
2120 if PDTCRYPT_VERBOSE is True:
2121 noise ("ERROR: no password or key specified, trying $PDTCRYPT_PASSWORD")
2122 epw = os.getenv ("PDTCRYPT_PASSWORD")
2124 checked_secret (make_secret (password=epw.strip ()))
2127 if PDTCRYPT_VERBOSE is True:
2128 noise ("ERROR: no password or key specified, trying $PDTCRYPT_KEY")
2129 ek = os.getenv ("PDTCRYPT_KEY")
2131 checked_secret (make_secret (key=ek.strip ()))
2134 if subcommand == PDTCRYPT_SUB_SCRYPT:
2135 bail ("ERROR: scrypt hash mode requested but no password given")
2136 elif mode & PDTCRYPT_DECRYPT:
2137 bail ("ERROR: decryption requested but no password given")
2139 if mode & PDTCRYPT_SPLIT and outsspec is None:
2140 bail ("ERROR: split mode is incompatible with stdout sink "
2143 if subcommand == PDTCRYPT_SUB_SCAN and outsspec is None:
2144 pass # no output by default in scan mode
2145 elif mode & PDTCRYPT_SPLIT or subcommand == PDTCRYPT_SUB_SCAN:
2146 # destination must be directory
2148 bail ("ERROR: mode is incompatible with stdout sink")
2151 os.makedirs (outsspec, 0o700)
2152 except FileExistsError:
2153 # if it’s a directory with appropriate perms, everything is
2154 # good; otherwise, below invocation of open(2) will fail
2156 outs = os.open (outsspec, os.O_DIRECTORY, 0o600)
2157 except FileNotFoundError as exn:
2158 bail ("ERROR: cannot create target directory “%s”" % outsspec)
2159 except NotADirectoryError as exn:
2160 bail ("ERROR: target path “%s” is not a directory" % outsspec)
2162 outs = deptdcrypt_mk_stream (PDTCRYPT_SINK, outsspec or "-")
2164 if subcommand == PDTCRYPT_SUB_SCAN:
2166 bail ("ERROR: please supply an input file for scanning")
2168 bail ("ERROR: input must be seekable; please specify a file")
2169 return True, partial (mode_scan, secret, insspec, outs, nacl=nacl)
2171 if subcommand == PDTCRYPT_SUB_SCRYPT:
2172 if secret [0] == PDTCRYPT_SECRET_KEY:
2173 bail ("ERROR: scrypt mode requires a password")
2174 if insspec is not None and nacl is not None \
2175 or insspec is None and nacl is None :
2176 bail ("ERROR: please supply either an input file or "
2181 if insspec is not None or subcommand != PDTCRYPT_SUB_SCRYPT:
2182 ins = deptdcrypt_mk_stream (PDTCRYPT_SOURCE, insspec or "-")
2184 if subcommand == PDTCRYPT_SUB_SCRYPT:
2185 return True, partial (mode_scrypt, secret [1].encode (), ins, nacl,
2188 return True, partial (mode_depdtcrypt, mode, secret, ins, outs)
2192 ok, runner = parse_argv (argv)
2194 if ok is True: return runner ()
2199 if __name__ == "__main__":
2200 sys.exit (main (sys.argv))