6 ===============================================================================
7 crypto -- Encryption Layer for the Deltatar Backup
8 ===============================================================================
12 - AES-GCM for the symmetric encryption;
17 - NIST Recommendation for Block Cipher Modes of Operation: Galois/Counter
19 http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf
22 https://cryptome.org/2014/01/aes-gcm-v1.pdf
24 - Authentication weaknesses in GCM
25 http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf
27 Trouble with python-cryptography packages: authentication tags can only be
28 passed in advance: https://github.com/pyca/cryptography/pull/3421
31 -------------------------------------------------------------------------------
33 Errors fall into roughly three categories:
35 - Cryptographical errors or invalid data.
37 - ``InvalidGCMTag`` (decryption failed on account of an invalid GCM
39 - ``InvalidIVFixedPart`` (IV fixed part of object not found in list),
40 - ``DuplicateIV`` (the IV of an encrypted object already occurred),
41 - ``DecryptionError`` (used in CLI decryption for presenting error
42 conditions to the user).
44 - Incorrect usage of the library.
46 - ``InvalidParameter`` (non-conforming user supplied parameter),
47 - ``InvalidHeader`` (data passed for reading not parsable into header),
48 - ``FormatError`` (cannot handle header or parameter version),
51 - Bad internal state. If one of these is encountered it means that a state
52 was reached that shouldn’t occur during normal processing.
57 Also, ``EndOfFile`` is used as a sentinel to communicate that a stream supplied
58 for reading is exhausted.
60 Initialization Vectors
61 -------------------------------------------------------------------------------
63 Initialization vectors are checked reuse during the lifetime of a decryptor.
64 The fixed counters for metadata files cannot be reused and attempts to do so
65 will cause a DuplicateIV error. This means the length of objects encrypted with
66 a metadata counter is capped at 63 GB.
68 For ordinary, non-metadata payload, there is an optional mode with strict IV
69 checking that causes a crypto context to fail if an IV encountered or created
70 was already used for decrypting or encrypting, respectively, an earlier object.
71 Note that this mode can trigger false positives when decrypting non-linearly,
72 e. g. when traversing the same object multiple times. Since the crypto context
73 has no notion of a position in a PDT encrypted archive, this condition must be
74 sorted out downstream.
77 -------------------------------------------------------------------------------
79 ``crypto.py`` may be invoked as a script for decrypting, validating, and
80 splitting PDT encrypted files. Consult the usage message for details.
84 Decrypt from stdin using the password ‘foo’: ::
86 $ crypto.py process foo -i - -o - <some-file.tar.gz.pdtcrypt >some-file.tar.gz
88 Output verbose information about the encrypted objects in the archive: ::
90 $ crypto.py process foo -v -i some-file.tar.gz.pdtcrypt -o /dev/null
91 PDT: decrypt from some-file.tar.gz.pdtcrypt
92 PDT: decrypt to /dev/null
93 PDT: source: file some-file.tar.gz.pdtcrypt
94 PDT: sink: file /dev/null
96 PDT: · version = 1 : 0100
97 PDT: · paramversion = 1 : 0100
98 PDT: · nacl : d270 b031 00d1 87e2 c946 610d 7b7f 7e5f
99 PDT: · iv : 02ee 3dd7 a963 1eb1 0100 0000
100 PDT: · ctsize = 591 : 4f02 0000 0000 0000
101 PDT: · tag : 5b2d 6d8b 8f82 4842 12fd 0b10 b6e3 369b
102 PDT: 64 decrypt obj no. 1, 591 B
103 PDT: · [64] 0% done, read block (591 B of 591 B remaining)
104 PDT: · decrypt ciphertext 591 B
105 PDT: · decrypt plaintext 591 B
109 Also, the mode *scrypt* allows deriving encryption keys. To calculate the
110 encryption key from the password ‘foo’ and the salt of the first object in a
111 PDT encrypted file: ::
113 $ crypto.py scrypt foo -i some-file.pdtcrypt
114 {"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", "key": "JH9EkMwaM4x9F5aim5gK/Q=="}
116 The computed 16 byte key is given in hexadecimal notation in the value to
117 ``hash`` and can be fed into Python’s ``binascii.unhexlify()`` to obtain the
118 corresponding binary representation.
120 Note that in Scrypt hashing mode, no data integrity checks are being performed.
121 If the wrong password is given, a wrong key will be derived. Whether the password
122 was indeed correct can only be determined by decrypting. Note that since PDT
123 archives essentially consist of a stream of independent objects, the salt and
124 other parameters may change. Thus a key derived using above method from the
125 first object doesn’t necessarily apply to any of the subsequent objects.
134 from functools import reduce, partial
144 except ImportError as exn:
147 if __name__ == "__main__": ## Work around the import mechanism lest Python’s
148 pwd = os.getcwd() ## preference for local imports causes a cyclical
149 ## import (crypto → pylibscrypt → […] → ./tarfile → crypto).
150 sys.path = [ p for p in sys.path if p.find ("deltatar") < 0 ]
153 from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
154 from cryptography.hazmat.backends import default_backend
158 __all__ = [ "hdr_make", "hdr_read", "hdr_fmt", "hdr_fmt_pretty"
160 , "PDTCRYPT_HDR_SIZE", "AES_GCM_IV_CNT_DATA"
161 , "AES_GCM_IV_CNT_INFOFILE", "AES_GCM_IV_CNT_INDEX"
165 ###############################################################################
167 ###############################################################################
169 class EndOfFile (Exception):
173 def __init__ (self, n=None, msg=None):
179 class InvalidParameter (Exception):
180 """Inputs not valid for PDT encryption."""
184 class InvalidHeader (Exception):
185 """Header not valid."""
189 class InvalidGCMTag (Exception):
191 The GCM tag calculated during decryption differs from that in the object
197 class InvalidIVFixedPart (Exception):
199 IV fixed part not in supplied list: either the backup is corrupt or the
200 current object does not belong to it.
205 class IVFixedPartError (Exception):
207 Error creating a unique IV fixed part: repeated calls to system RNG yielded
208 the same sequence of bytes as the last IV used.
213 class InvalidFileCounter (Exception):
215 When encrypting, an attempted reuse of a dedicated counter (info file,
216 index file) was caught.
221 class DuplicateIV (Exception):
223 During encryption, the current IV fixed part is identical to an already
224 existing IV (same prefix and file counter). This indicates tampering or
225 programmer error and cannot be recovered from.
230 class NonConsecutiveIV (Exception):
232 IVs not numbered consecutively. This is a hard error with strict IV
233 checking. Precludes random access to the encrypted objects.
238 class FormatError (Exception):
239 """Unusable parameters in header."""
243 class DecryptionError (Exception):
244 """Error during decryption with ``crypto.py`` on the command line."""
248 class Unreachable (Exception):
250 Makeshift __builtin_unreachable(); always a programmer error if
256 class InternalError (Exception):
257 """Errors not ascribable to bad user inputs or cryptography."""
261 ###############################################################################
262 ## crypto layer version
263 ###############################################################################
265 ENCRYPTION_PARAMETERS = \
267 { "kdf": ("dummy", 16)
268 , "enc": "passthrough" }
276 , "enc": "aes-gcm" } }
278 ###############################################################################
280 ###############################################################################
282 PDTCRYPT_HDR_MAGIC = b"PDTCRYPT"
284 PDTCRYPT_HDR_SIZE_MAGIC = 8 # 8
285 PDTCRYPT_HDR_SIZE_VERSION = 2 # 10
286 PDTCRYPT_HDR_SIZE_PARAMVERSION = 2 # 12
287 PDTCRYPT_HDR_SIZE_NACL = 16 # 28
288 PDTCRYPT_HDR_SIZE_IV = 12 # 40
289 PDTCRYPT_HDR_SIZE_CTSIZE = 8 # 48
290 PDTCRYPT_HDR_SIZE_TAG = 16 # 64 GCM auth tag
292 PDTCRYPT_HDR_SIZE = PDTCRYPT_HDR_SIZE_MAGIC + PDTCRYPT_HDR_SIZE_VERSION \
293 + PDTCRYPT_HDR_SIZE_PARAMVERSION + PDTCRYPT_HDR_SIZE_NACL \
294 + PDTCRYPT_HDR_SIZE_IV + PDTCRYPT_HDR_SIZE_CTSIZE \
295 + PDTCRYPT_HDR_SIZE_TAG # = 64
297 # precalculate offsets since Python can’t do constant folding over names
298 HDR_OFF_VERSION = PDTCRYPT_HDR_SIZE_MAGIC
299 HDR_OFF_PARAMVERSION = HDR_OFF_VERSION + PDTCRYPT_HDR_SIZE_VERSION
300 HDR_OFF_NACL = HDR_OFF_PARAMVERSION + PDTCRYPT_HDR_SIZE_PARAMVERSION
301 HDR_OFF_IV = HDR_OFF_NACL + PDTCRYPT_HDR_SIZE_NACL
302 HDR_OFF_CTSIZE = HDR_OFF_IV + PDTCRYPT_HDR_SIZE_IV
303 HDR_OFF_TAG = HDR_OFF_CTSIZE + PDTCRYPT_HDR_SIZE_CTSIZE
307 FMT_I2N_IV = "<8sL" # 8 random bytes ‖ 32 bit counter
308 FMT_I2N_HDR = ("<" # host byte order
312 "16s" # sodium chloride
318 AES_KEY_SIZE = 16 # b"0123456789abcdef"
319 AES_KEY_SIZE_B64 = 24 # b'MDEyMzQ1Njc4OWFiY2RlZg=='
320 AES_GCM_MAX_SIZE = (1 << 36) - (1 << 5) # 2^39 - 2^8 b ≅ 64 GB
321 PDTCRYPT_MAX_OBJ_SIZE_DEFAULT = 63 * (1 << 30) # 63 GB
322 PDTCRYPT_MAX_OBJ_SIZE = PDTCRYPT_MAX_OBJ_SIZE_DEFAULT
324 # index and info files are written on-the fly while encrypting so their
325 # counters must be available inadvance
326 AES_GCM_IV_CNT_INFOFILE = 1 # constant
327 AES_GCM_IV_CNT_INDEX = AES_GCM_IV_CNT_INFOFILE + 1
328 AES_GCM_IV_CNT_DATA = AES_GCM_IV_CNT_INDEX + 1 # also for multivolume
329 AES_GCM_IV_CNT_MAX_DEFAULT = 0xffFFffFF
330 AES_GCM_IV_CNT_MAX = AES_GCM_IV_CNT_MAX_DEFAULT
332 # IV structure and generation
333 PDTCRYPT_IV_GEN_MAX_RETRIES = 10 # ×
334 PDTCRYPT_IV_FIXEDPART_SIZE = 8 # B
335 PDTCRYPT_IV_COUNTER_SIZE = 4 # B
337 # secret type: PW of string | KEY of char [16]
338 PDTCRYPT_SECRET_PW = 0
339 PDTCRYPT_SECRET_KEY = 1
341 ###############################################################################
343 ###############################################################################
349 # , paramversion : u16
355 # fn hdr_read (f : handle) -> hdrinfo;
356 # fn hdr_make (f : handle, h : hdrinfo) -> IOResult<usize>;
357 # fn hdr_fmt (h : hdrinfo) -> String;
362 Read bytes as header structure.
364 If the input could not be interpreted as a header, fail with
369 mag, version, paramversion, nacl, iv, ctsize, tag = \
370 struct.unpack (FMT_I2N_HDR, data)
371 except Exception as exn:
372 raise InvalidHeader ("error unpacking header from [%r]: %s"
373 % (binascii.hexlify (data), str (exn)))
375 if mag != PDTCRYPT_HDR_MAGIC:
376 raise InvalidHeader ("bad magic in header: expected [%s], got [%s]"
377 % (PDTCRYPT_HDR_MAGIC, mag))
380 { "version" : version
381 , "paramversion" : paramversion
389 def hdr_read_stream (instr):
391 Read header from stream at the current position.
393 Fail with ``InvalidHeader`` if insufficient bytes were read from the
394 stream, or if the content could not be interpreted as a header.
396 data = instr.read(PDTCRYPT_HDR_SIZE)
400 elif ldata != PDTCRYPT_HDR_SIZE:
401 raise InvalidHeader ("hdr_read_stream: expected %d B, received %d B"
402 % (PDTCRYPT_HDR_SIZE, ldata))
403 return hdr_read (data)
406 def hdr_from_params (version, paramversion, nacl, iv, ctsize, tag):
408 Assemble the necessary values into a PDTCRYPT header.
410 :type version: int to fit uint16_t
411 :type paramversion: int to fit uint16_t
412 :type nacl: bytes to fit uint8_t[16]
413 :type iv: bytes to fit uint8_t[12]
414 :type size: int to fit uint64_t
415 :type tag: bytes to fit uint8_t[16]
417 buf = bytearray (PDTCRYPT_HDR_SIZE)
418 bufv = memoryview (buf)
421 struct.pack_into (FMT_I2N_HDR, bufv, 0,
423 version, paramversion, nacl, iv, ctsize, tag)
424 except Exception as exn:
425 return False, "error assembling header: %s" % str (exn)
427 return True, bytes (buf)
430 def hdr_make_dummy (s):
432 Create a header sized block of bytes initialized to a value derived from a
433 string. Used to verify we’ve jumped back correctly to the actual position
434 of the object header.
436 c = reduce (lambda a, c: a + ord(c), s, 0) % 0xFF
437 return bytes (bytearray (struct.pack ("B", c)) * PDTCRYPT_HDR_SIZE)
442 Assemble a header from the given header structure.
444 return hdr_from_params (version=hdr.get("version"),
445 paramversion=hdr.get("paramversion"),
446 nacl=hdr.get("nacl"), iv=hdr.get("iv"),
447 ctsize=hdr.get("ctsize"), tag=hdr.get("tag"))
450 HDR_FMT = "I2n_header { version: %d, paramversion: %d, nacl: %s[%d]," \
451 " iv: %s[%d], ctsize: %d, tag: %s[%d] }"
454 """Format a header structure into readable output."""
455 return HDR_FMT % (h["version"], h["paramversion"],
456 binascii.hexlify (h["nacl"]), len(h["nacl"]),
457 binascii.hexlify (h["iv"]), len(h["iv"]),
459 binascii.hexlify (h["tag"]), len(h["tag"]))
462 def hex_spaced_of_bytes (b):
463 """Format bytes object, hexdump style."""
464 return " ".join ([ "%.2x%.2x" % (c1, c2)
465 for c1, c2 in zip (b[0::2], b[1::2]) ]) \
466 + (len (b) | 1 == len (b) and " %.2x" % b[-1] or "") # odd lengths
469 def hdr_iv_counter (h):
470 """Extract the variable part of the IV of the given header."""
471 _fixed, cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
475 def hdr_iv_fixed (h):
476 """Extract the fixed part of the IV of the given header."""
477 fixed, _cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
481 hdr_dump = hex_spaced_of_bytes
485 """version = %-4d : %s
486 paramversion = %-4d : %s
493 def hdr_fmt_pretty (h):
495 Format header structure into multi-line representation of its contents and
496 their raw representation. (Omit the implicit “PDTCRYPT” magic bytes that
497 precede every header.)
499 return HDR_FMT_PRETTY \
501 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["version"])),
503 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["paramversion"])),
504 hex_spaced_of_bytes (h["nacl"]),
505 hex_spaced_of_bytes (h["iv"]),
507 hex_spaced_of_bytes (struct.pack (FMT_UINT64_LE, h["ctsize"])),
508 hex_spaced_of_bytes (h["tag"]))
510 IV_FMT = "((f %s) (c %d))"
513 """Format the two components of an IV in a readable fashion."""
514 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
515 return IV_FMT % (binascii.hexlify (fixed), cnt)
518 ###############################################################################
520 ###############################################################################
522 class Location (object):
526 def restore_loc_fmt (loc):
528 % (loc.n, loc.offset)
530 def locate_hdr_candidates (fd):
532 Walk over instances of the magic string in the payload, collecting their
533 positions. If the offset of the first found instance is not zero, the file
534 begins with leading garbage.
536 :return: The list of offsets in the file.
540 mm = mmap.mmap(fd, 0, mmap.MAP_SHARED, mmap.PROT_READ)
543 pos = mm.find (PDTCRYPT_HDR_MAGIC, pos)
552 HDR_CAND_GOOD = 0 # header marks begin of valid object
553 HDR_CAND_FISHY = 1 # inconclusive (tag mismatch, obj overlap etc.)
554 HDR_CAND_JUNK = 2 # not a header / object unreadable
557 { HDR_CAND_GOOD : "valid"
558 , HDR_CAND_FISHY : "fishy"
559 , HDR_CAND_JUNK : "junk"
563 def verdict_fmt (vdt):
564 return HDR_VERDICT_NAME [vdt]
567 def inspect_hdr (fd, off):
569 Attempt to parse a header in *fd* at position *off*.
571 Returns a verdict about the quality of that header plus the parsed header
575 _ = os.lseek (fd, off, os.SEEK_SET)
577 if os.lseek (fd, 0, os.SEEK_CUR) != off:
578 if PDTCRYPT_VERBOSE is True:
579 noise ("PDT: %d → dismissed (lseek() past EOF)" % off)
580 return HDR_CAND_JUNK, None
582 raw = os.read (fd, PDTCRYPT_HDR_SIZE)
583 if len (raw) != PDTCRYPT_HDR_SIZE:
584 if PDTCRYPT_VERBOSE is True:
585 noise ("PDT: %d → dismissed (EOF inside header)" % off)
586 return HDR_CAND_JUNK, None
590 except InvalidHeader as exn:
591 if PDTCRYPT_VERBOSE is True:
592 noise ("PDT: %d → dismissed (invalid: [%s])" % (off, str (exn)))
593 return HDR_CAND_JUNK, None
595 obj0 = off + PDTCRYPT_HDR_SIZE
596 objX = obj0 + hdr ["ctsize"]
598 eof = os.lseek (fd, 0, os.SEEK_END)
600 if PDTCRYPT_VERBOSE is True:
601 noise ("PDT: %d → EOF inside object (%d≤%d≤%d); adjusting size to "
602 "%d" % (off, obj0, eof, objX, (eof - obj0)))
603 # try reading up to the end
604 hdr ["ctsize"] = eof - obj0
605 return HDR_CAND_FISHY, hdr
607 return HDR_CAND_GOOD, hdr
610 def try_decrypt (ifd, off, hdr, secret, ofd=-1):
612 Attempt to decrypt the object in the (seekable) descriptor *ifd* starting
613 at *off* using the metadata in *hdr* and *secret*. An output fd can be
614 specified with *ofd*; if it is *-1* – the default –, the decrypted payload
617 Always creates a fresh decryptor, so validation steps across objects don’t
620 Errors during GCM tag validation are ignored.
622 ctleft = hdr ["ctsize"]
626 if ks == PDTCRYPT_SECRET_PW:
627 decr = Decrypt (password=secret [1])
628 elif ks == PDTCRYPT_SECRET_KEY:
630 decr = Decrypt (key=key)
637 os.lseek (ifd, pos, os.SEEK_SET)
639 cnksiz = min (ctleft, PDTCRYPT_BLOCKSIZE)
640 cnk = os.read (ifd, cnksiz)
643 pt = decr.process (cnk)
648 except InvalidGCMTag:
649 noise ("PDT: GCM tag mismatch for object %d–%d"
650 % (off, off + hdr ["ctsize"]))
651 if len (pt) > 0 and ofd != -1:
654 except Exception as exn:
655 noise ("PDT: error decrypting object %d–%d@%d, %d B remaining [%s]"
656 % (off, off + hdr ["ctsize"], pos, ctleft, exn))
662 def readable_objects_offsets (ifd, secret, cands):
664 From a list of candidates, locate the ones that mark the start of actual
665 readable PDTCRYPT objects.
669 for i, cand in enumerate (cands):
670 vdt, hdr = inspect_hdr (ifd, cand)
671 if vdt == HDR_CAND_JUNK:
672 pass # ignore unreadable ones
673 elif vdt in [HDR_CAND_GOOD, HDR_CAND_FISHY]:
674 ctsize = hdr ["ctsize"]
675 off0 = cand + PDTCRYPT_HDR_SIZE
676 ok = try_decrypt (ifd, off0, hdr, secret) == ctsize
678 good.append ((cand, off0 + ctsize))
680 overlap = find_overlaps (good)
682 return [ g [0] for g in good ]
685 def reconstruct_offsets (fname, secret):
686 ifd = os.open (fname, os.O_RDONLY)
689 cands = locate_hdr_candidates (ifd)
690 return readable_objects_offsets (ifd, secret, cands)
695 ###############################################################################
697 ###############################################################################
699 def make_secret (password=None, key=None):
701 Safely create a “secret” value that consists either of a key or a password.
702 Inputs are validated: the password is accepted as (UTF-8 encoded) bytes or
703 string; for the key only a bytes object of the proper size or a base64
704 encoded string thereof is accepted.
706 If both are provided, the key is preferred over the password; no checks are
707 performed whether the key is derived from the password.
709 :returns: secret value if inputs were acceptable | None otherwise.
712 if isinstance (key, str) is True:
713 key = key.encode ("utf-8")
714 if isinstance (key, bytes) is True:
715 if len (key) == AES_KEY_SIZE:
716 return (PDTCRYPT_SECRET_KEY, key)
717 if len (key) == AES_KEY_SIZE * 2:
719 key = binascii.unhexlify (key)
720 return (PDTCRYPT_SECRET_KEY, key)
721 except binascii.Error: # garbage in string
723 if len (key) == AES_KEY_SIZE_B64:
725 key = base64.b64decode (key)
726 # the base64 processor is very tolerant and allows for
727 # arbitrary trailing and leading data thus the data obtained
728 # must be checked for the proper length
729 if len (key) == AES_KEY_SIZE:
730 return (PDTCRYPT_SECRET_KEY, key)
731 except binascii.Error: # “incorrect padding”
733 elif password is not None:
734 if isinstance (password, str) is True:
735 return (PDTCRYPT_SECRET_PW, password)
736 elif isinstance (password, bytes) is True:
738 password = password.decode ("utf-8")
739 return (PDTCRYPT_SECRET_PW, password)
740 except UnicodeDecodeError:
746 ###############################################################################
747 ## passthrough / null encryption
748 ###############################################################################
750 class PassthroughCipher (object):
752 tag = struct.pack ("<QQ", 0, 0)
754 def __init__ (self) : pass
756 def update (self, b) : return b
758 def finalize (self) : return b""
760 def finalize_with_tag (self, _) : return b""
762 ###############################################################################
763 ## convenience wrapper
764 ###############################################################################
767 def kdf_dummy (klen, password, _nacl):
769 Fake KDF for testing purposes that is called when parameter version zero is
772 q, r = divmod (klen, len (password))
773 if isinstance (password, bytes) is False:
774 password = password.encode ()
775 return password * q + password [:r], b""
778 SCRYPT_KEY_MEMO = { } # static because needed for both the info file and the archive
781 def kdf_scrypt (params, password, nacl):
783 Wrapper for the Scrypt KDF, corresponds to parameter version one. The
784 computation result is memoized based on the inputs to facilitate spawning
785 multiple encryption contexts.
790 dkLen = params["dkLen"]
793 nacl = os.urandom (params["NaCl_LEN"])
795 key_parms = (password, nacl, N, r, p, dkLen)
796 global SCRYPT_KEY_MEMO
797 if key_parms not in SCRYPT_KEY_MEMO:
798 SCRYPT_KEY_MEMO [key_parms] = \
799 pylibscrypt.scrypt (password, nacl, N, r, p, dkLen)
800 return SCRYPT_KEY_MEMO [key_parms], nacl
803 def kdf_by_version (paramversion=None, defs=None):
805 Pick the KDF handler corresponding to the parameter version or the
808 :rtype: function (password : str, nacl : str) -> str
810 if paramversion is not None:
811 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
813 raise InvalidParameter ("no encryption parameters for version %r"
815 (kdf, params) = defs["kdf"]
817 if kdf == "scrypt" : fn = kdf_scrypt
818 if kdf == "dummy" : fn = kdf_dummy
820 raise ValueError ("key derivation method %r unknown" % kdf)
821 return partial (fn, params)
824 ###############################################################################
826 ###############################################################################
828 def scrypt_hashsource (pw, ins):
830 Calculate the SCRYPT hash from the password and the information contained
831 in the first header found in ``ins``.
833 This does not validate whether the first object is encrypted correctly.
835 if isinstance (pw, str) is True:
837 elif isinstance (pw, bytes) is False:
838 raise InvalidParameter ("password must be a string, not %s"
840 if isinstance (ins, io.BufferedReader) is False and \
841 isinstance (ins, io.FileIO) is False:
842 raise InvalidParameter ("file to hash must be opened in “binary” mode")
845 hdr = hdr_read_stream (ins)
846 except EndOfFile as exn:
847 noise ("PDT: malformed input: end of file reading first object header")
852 pver = hdr ["paramversion"]
853 if PDTCRYPT_VERBOSE is True:
854 noise ("PDT: salt of first object : %s" % binascii.hexlify (nacl))
855 noise ("PDT: parameter version of archive : %d" % pver)
858 defs = ENCRYPTION_PARAMETERS.get(pver, None)
859 kdfname, params = defs ["kdf"]
860 if kdfname != "scrypt":
861 noise ("PDT: input is not an SCRYPT archive")
864 kdf = kdf_by_version (None, defs)
865 except ValueError as exn:
866 noise ("PDT: object has unknown parameter version %d" % pver)
868 hsh, _void = kdf (pw, nacl)
870 return hsh, nacl, hdr ["version"], pver
873 def scrypt_hashfile (pw, fname):
875 Calculate the SCRYPT hash from the password and the information contained
876 in the first header found in the given file. The header is read only at
879 with deptdcrypt_mk_stream (PDTCRYPT_SOURCE, fname or "-") as ins:
880 hsh, _void, _void, _void = scrypt_hashsource (pw, ins)
884 ###############################################################################
886 ###############################################################################
888 class Crypto (object):
890 Encryption context to remain alive throughout an entire tarfile pass.
895 cnt = None # file counter (uint32_t != 0)
896 iv = None # current IV
897 fixed = None # accu for 64 bit fixed parts of IV
898 used_ivs = None # tracks IVs
899 strict_ivs = False # if True, panic on duplicate object IV
908 info_counter_used = False
909 index_counter_used = False
911 def __init__ (self, *al, **akv):
912 self.used_ivs = set ()
913 self.set_parameters (*al, **akv)
916 def next_fixed (self):
921 def set_object_counter (self, cnt=None):
923 Safely set the internal counter of encrypted objects. Numerous
926 The same counter may not be reused in combination with one IV fixed
927 part. This is validated elsewhere in the IV handling.
929 Counter zero is invalid. The first two counters are reserved for
930 metadata. The implementation does not allow for splitting metadata
931 files over multiple encrypted objects. (This would be possible by
932 assigning new fixed parts.) Thus in a Deltatar backup there is at most
933 one object with a counter value of one and two. On creation of a
934 context, the initial counter may be chosen. The globals
935 ``AES_GCM_IV_CNT_INFOFILE`` and ``AES_GCM_IV_CNT_INDEX`` can be used to
936 request one of the reserved values. If one of these values has been
937 used, any further attempt of setting the counter to that value will
938 be rejected with an ``InvalidFileCounter`` exception.
940 Out of bounds values (i. e. below one and more than the maximum of 2³²)
941 cause an ``InvalidParameter`` exception to be thrown.
944 self.cnt = AES_GCM_IV_CNT_DATA
946 if cnt == 0 or cnt > AES_GCM_IV_CNT_MAX + 1:
947 raise InvalidParameter ("invalid counter value %d requested: "
948 "acceptable values are from 1 to %d"
949 % (cnt, AES_GCM_IV_CNT_MAX))
950 if cnt == AES_GCM_IV_CNT_INFOFILE:
951 if self.info_counter_used is True:
952 raise InvalidFileCounter ("attempted to reuse info file "
953 "counter %d: must be unique" % cnt)
954 self.info_counter_used = True
955 elif cnt == AES_GCM_IV_CNT_INDEX:
956 if self.index_counter_used is True:
957 raise InvalidFileCounter ("attempted to reuse index file "
958 " counter %d: must be unique" % cnt)
959 self.index_counter_used = True
960 if cnt <= AES_GCM_IV_CNT_MAX:
963 # cnt == AES_GCM_IV_CNT_MAX + 1 → wrap
964 self.cnt = AES_GCM_IV_CNT_DATA
968 def set_parameters (self, password=None, key=None, paramversion=None,
969 nacl=None, counter=None, strict_ivs=False):
971 Configure the internal state of a crypto context. Not intended for
975 self.set_object_counter (counter)
976 self.strict_ivs = strict_ivs
978 if paramversion is not None:
979 self.paramversion = paramversion
982 self.key, self.nacl = key, nacl
985 if password is not None:
986 if isinstance (password, bytes) is False:
987 password = str.encode (password)
988 self.password = password
989 if paramversion is None and nacl is None:
990 # postpone key setup until first header is available
992 kdf = kdf_by_version (paramversion)
994 self.key, self.nacl = kdf (password, nacl)
997 def process (self, buf):
999 Encrypt / decrypt a buffer. Invokes the ``.update()`` method on the
1000 wrapped encryptor or decryptor, respectively.
1002 The Cryptography exception ``AlreadyFinalized`` is translated to an
1003 ``InternalError`` at this point. It may occur in sound code when the GC
1004 closes an encrypting stream after an error. Everywhere else it must be
1007 if self.enc is None:
1008 raise RuntimeError ("process: context not initialized")
1009 self.stats ["in"] += len (buf)
1011 out = self.enc.update (buf)
1012 except cryptography.exceptions.AlreadyFinalized as exn:
1013 raise InternalError (exn)
1014 self.stats ["out"] += len (out)
1018 def next (self, password, paramversion, nacl, iv):
1020 Prepare for encrypting another object: Reset the data counters and
1021 change the configuration in case one of the variable parameters differs
1022 from the last object. Also check the IV for duplicates and error out
1023 if strict checking was requested.
1027 self.stats ["obj"] += 1
1029 self.check_duplicate_iv (iv)
1031 if ( self.paramversion != paramversion
1032 or self.password != password
1033 or self.nacl != nacl):
1034 self.set_parameters (password=password, paramversion=paramversion,
1035 nacl=nacl, strict_ivs=self.strict_ivs)
1038 def check_duplicate_iv (self, iv):
1040 Add an IV (the 12 byte representation as in the header) to the list. With
1041 strict checking enabled, this will throw a ``DuplicateIV``. Depending on
1042 the context, this may indicate a serious error (IV reuse).
1044 if self.strict_ivs is True and iv in self.used_ivs:
1045 raise DuplicateIV ("iv %s was reused" % iv_fmt (iv))
1046 # vi has not been used before; add to collection
1047 self.used_ivs.add (iv)
1050 def counters (self):
1052 Access the data counters.
1054 return self.stats ["obj"], self.stats ["in"], self.stats ["out"]
1059 Clear the current context regardless of its finalization state. The
1060 next operation must be ``.next()``.
1065 class Encrypt (Crypto):
1071 def __init__ (self, version, paramversion, password=None, key=None, nacl=None,
1072 counter=AES_GCM_IV_CNT_DATA, strict_ivs=True):
1074 The ctor will throw immediately if one of the parameters does not conform
1075 to our expectations.
1077 counter=AES_GCM_IV_CNT_DATA, strict_ivs=True):
1078 :type version: int to fit uint16_t
1079 :type paramversion: int to fit uint16_t
1080 :param password: mutually exclusive with ``key``
1081 :type password: bytes
1082 :param key: mutually exclusive with ``password``
1085 :type counter: initial object counter the values
1086 ``AES_GCM_IV_CNT_INFOFILE`` and
1087 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1088 and cannot be reused even with different fixed parts.
1089 :type strict_ivs: bool
1091 if password is None and key is None \
1092 or password is not None and key is not None :
1093 raise InvalidParameter ("__init__: need either key or password")
1096 if isinstance (key, bytes) is False:
1097 raise InvalidParameter ("__init__: key must be provided as "
1098 "bytes, not %s" % type (key))
1100 raise InvalidParameter ("__init__: salt must be provided along "
1101 "with encryption key")
1102 else: # password, no key
1103 if isinstance (password, str) is False:
1104 raise InvalidParameter ("__init__: password must be a string, not %s"
1106 if len (password) == 0:
1107 raise InvalidParameter ("__init__: supplied empty password but not "
1108 "permitted for PDT encrypted files")
1110 if isinstance (version, int) is False:
1111 raise InvalidParameter ("__init__: version number must be an "
1112 "integer, not %s" % type (version))
1114 raise InvalidParameter ("__init__: version number must be a "
1115 "nonnegative integer, not %d" % version)
1117 if isinstance (paramversion, int) is False:
1118 raise InvalidParameter ("__init__: crypto parameter version number "
1119 "must be an integer, not %s"
1120 % type (paramversion))
1121 if paramversion < 0:
1122 raise InvalidParameter ("__init__: crypto parameter version number "
1123 "must be a nonnegative integer, not %d"
1126 if nacl is not None:
1127 if isinstance (nacl, bytes) is False:
1128 raise InvalidParameter ("__init__: salt given, but of type %s "
1129 "instead of bytes" % type (nacl))
1130 # salt length would depend on the actual encryption so it can’t be
1131 # validated at this point
1133 self.version = version
1134 self.paramenc = ENCRYPTION_PARAMETERS.get (paramversion) ["enc"]
1136 super().__init__ (password, key, paramversion, nacl, counter=counter,
1137 strict_ivs=strict_ivs)
1140 def next_fixed (self, retries=PDTCRYPT_IV_GEN_MAX_RETRIES):
1142 Generate the next IV fixed part by reading eight bytes from
1143 ``/dev/urandom``. The buffer so obtained is tested against the fixed
1144 parts used so far to prevent accidental reuse of IVs. After a
1145 configurable number of attempts to create a unique fixed part, it will
1146 refuse to continue with an ``IVFixedPartError``. This is unlikely to
1147 ever happen on a normal system but may detect an issue with the random
1150 The list of fixed parts that were used by the context at hand can be
1151 accessed through the ``.fixed`` list. Its last element is the fixed
1152 part currently in use.
1156 fp = os.urandom (PDTCRYPT_IV_FIXEDPART_SIZE)
1157 if fp not in self.fixed:
1158 self.fixed.append (fp)
1161 raise IVFixedPartError ("error obtaining a unique IV fixed part from "
1162 "/dev/urandom; giving up after %d tries" % i)
1167 Construct a 12-bytes IV from the current fixed part and the object
1170 return struct.pack(FMT_I2N_IV, self.fixed [-1], self.cnt)
1173 def next (self, filename=None, counter=None):
1175 Prepare for encrypting the next incoming object. Update the counter
1176 and put together the IV, possibly changing prefixes. Then create the
1179 The argument ``counter`` can be used to specify a file counter for this
1180 object. Unless it is one of the reserved values, the counter of
1181 subsequent objects will be computed from this one.
1183 If this is the first object in a series, ``filename`` is required,
1184 otherwise it is reused if not present. The value is used to derive a
1185 header sized placeholder to use until after encryption when all the
1186 inputs to construct the final header are available. This is then
1187 matched in ``.done()`` against the value found at the position of the
1188 header. The motivation for this extra check is primarily to assist
1189 format debugging: It makes stray headers easy to spot in malformed
1192 if filename is None:
1193 if self.lastinfo is None:
1194 raise InvalidParameter ("next: filename is mandatory for "
1196 filename, _dummy = self.lastinfo
1198 if isinstance (filename, str) is False:
1199 raise InvalidParameter ("next: filename must be a string, no %s"
1201 if counter is not None:
1202 if isinstance (counter, int) is False:
1203 raise InvalidParameter ("next: the supplied counter is of "
1204 "invalid type %s; please pass an "
1205 "integer instead" % type (counter))
1206 self.set_object_counter (counter)
1208 self.iv = self.iv_make ()
1209 if self.paramenc == "aes-gcm":
1211 ( algorithms.AES (self.key)
1212 , modes.GCM (self.iv)
1213 , backend = default_backend ()) \
1215 elif self.paramenc == "passthrough":
1216 self.enc = PassthroughCipher ()
1218 raise InvalidParameter ("next: parameter version %d not known"
1219 % self.paramversion)
1220 hdrdum = hdr_make_dummy (filename)
1221 self.lastinfo = (filename, hdrdum)
1222 super().next (self.password, self.paramversion, self.nacl, self.iv)
1224 self.set_object_counter (self.cnt + 1)
1228 def done (self, cmpdata):
1230 Complete encryption of an object. After this has been called, attempts
1231 of encrypting further data will cause an error until ``.next()`` is
1234 Returns a 64 bytes buffer containing the object header including all
1235 values including the “late” ones e. g. the ciphertext size and the
1238 if isinstance (cmpdata, bytes) is False:
1239 raise InvalidParameter ("done: comparison input expected as bytes, "
1240 "not %s" % type (cmpdata))
1241 if self.lastinfo is None:
1242 raise RuntimeError ("done: encryption context not initialized")
1243 filename, hdrdum = self.lastinfo
1244 if cmpdata != hdrdum:
1245 raise RuntimeError ("done: bad sync of header for object %d: "
1246 "preliminary data does not match; this likely "
1247 "indicates a wrongly repositioned stream"
1249 data = self.enc.finalize ()
1250 self.stats ["out"] += len (data)
1251 self.ctsize += len (data)
1252 ok, hdr = hdr_from_params (self.version, self.paramversion, self.nacl,
1253 self.iv, self.ctsize, self.enc.tag)
1255 raise InternalError ("error constructing header: %r" % hdr)
1256 return data, hdr, self.fixed
1259 def process (self, buf):
1261 Encrypt a chunk of plaintext with the active encryptor. Returns the
1262 size of the input consumed. This **must** be checked downstream. If the
1263 maximum possible object size has been reached, the current context must
1264 be finalized and a new one established before any further data can be
1265 encrypted. The second argument is the remainder of the plaintext that
1266 was not encrypted for the caller to use immediately after the new
1269 if isinstance (buf, bytes) is False:
1270 raise InvalidParameter ("process: expected byte buffer, not %s"
1273 newptsize = self.ptsize + bsize
1274 diff = newptsize - PDTCRYPT_MAX_OBJ_SIZE
1277 newptsize = PDTCRYPT_MAX_OBJ_SIZE
1278 self.ptsize = newptsize
1279 data = super().process (buf [:bsize])
1280 self.ctsize += len (data)
1284 class Decrypt (Crypto):
1286 tag = None # GCM tag, part of header
1287 last_iv = None # check consecutive ivs in strict mode
1289 def __init__ (self, password=None, key=None, counter=None, fixedparts=None,
1292 Sanitizing ctor for the decryption context. ``fixedparts`` specifies a
1293 list of IV fixed parts accepted during decryption. If a fixed part is
1294 encountered that is not in the list, decryption will fail.
1296 :param password: mutually exclusive with ``key``
1297 :type password: bytes
1298 :param key: mutually exclusive with ``password``
1300 :type counter: initial object counter the values
1301 ``AES_GCM_IV_CNT_INFOFILE`` and
1302 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1303 and cannot be reused even with different fixed parts.
1304 :type fixedparts: bytes list
1306 if password is None and key is None \
1307 or password is not None and key is not None :
1308 raise InvalidParameter ("__init__: need either key or password")
1311 if isinstance (key, bytes) is False:
1312 raise InvalidParameter ("__init__: key must be provided as "
1313 "bytes, not %s" % type (key))
1314 else: # password, no key
1315 if isinstance (password, str) is False:
1316 raise InvalidParameter ("__init__: password must be a string, not %s"
1318 if len (password) == 0:
1319 raise InvalidParameter ("__init__: supplied empty password but not "
1320 "permitted for PDT encrypted files")
1322 if fixedparts is not None:
1323 if isinstance (fixedparts, list) is False:
1324 raise InvalidParameter ("__init__: IV fixed parts must be "
1325 "supplied as list, not %s"
1326 % type (fixedparts))
1327 self.fixed = fixedparts
1330 super().__init__ (password=password, key=key, counter=counter,
1331 strict_ivs=strict_ivs)
1334 def valid_fixed_part (self, iv):
1336 Check if a fixed part was already seen.
1338 # check if fixed part is known
1339 fixed, _cnt = struct.unpack (FMT_I2N_IV, iv)
1340 i = bisect.bisect_left (self.fixed, fixed)
1341 return i != len (self.fixed) and self.fixed [i] == fixed
1344 def check_consecutive_iv (self, iv):
1346 Check whether the counter part of the given IV is indeed the successor
1347 of the currently present counter. This should always be the case for
1348 the objects in a well formed PDT archive but should not be enforced
1349 when decrypting out-of-order.
1351 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
1352 if self.strict_ivs is True \
1353 and self.last_iv is not None \
1354 and self.last_iv [0] == fixed \
1355 and self.last_iv [1] != cnt - 1:
1356 raise NonConsecutiveIV ("iv %s counter not successor of "
1357 "last object (expected %d, found %d)"
1358 % (iv_fmt (self.last_iv [1]), cnt))
1359 self.last_iv = (iv, cnt)
1362 def next (self, hdr):
1364 Start decrypting the next object. The PDTCRYPT header for the object
1365 can be given either as already parsed object or as bytes.
1367 if isinstance (hdr, bytes) is True:
1368 hdr = hdr_read (hdr)
1369 elif isinstance (hdr, dict) is False:
1370 # this won’t catch malformed specs though
1371 raise InvalidParameter ("next: wrong type of parameter hdr: "
1372 "expected bytes or spec, got %s"
1375 paramversion = hdr ["paramversion"]
1380 raise InvalidHeader ("next: not a header %r" % hdr)
1382 super().next (self.password, paramversion, nacl, iv)
1383 if self.fixed is not None and self.valid_fixed_part (iv) is False:
1384 raise InvalidIVFixedPart ("iv %s has invalid fixed part"
1386 self.check_consecutive_iv (iv)
1389 defs = ENCRYPTION_PARAMETERS.get (paramversion, None)
1391 raise FormatError ("header contains unknown parameter version %d; "
1392 "maybe the file was created by a more recent "
1393 "version of Deltatar" % paramversion)
1395 if enc == "aes-gcm":
1397 ( algorithms.AES (self.key)
1398 , modes.GCM (iv, tag=self.tag)
1399 , backend = default_backend ()) \
1401 elif enc == "passthrough":
1402 self.enc = PassthroughCipher ()
1404 raise InternalError ("encryption parameter set %d refers to unknown "
1405 "mode %r" % (paramversion, enc))
1406 self.set_object_counter (self.cnt + 1)
1409 def done (self, tag=None):
1411 Stop decryption of the current object and finalize it with the active
1412 context. This will throw an *InvalidGCMTag* exception to indicate that
1413 the authentication tag does not match the data. If the tag is correct,
1414 the rest of the plaintext is returned.
1419 data = self.enc.finalize ()
1421 if isinstance (tag, bytes) is False:
1422 raise InvalidParameter ("done: wrong type of parameter "
1423 "tag: expected bytes, got %s"
1425 data = self.enc.finalize_with_tag (self.tag)
1426 except cryptography.exceptions.InvalidTag:
1427 raise InvalidGCMTag ("done: tag mismatch of object %d: %s "
1428 "rejected by finalize ()"
1429 % (self.cnt, binascii.hexlify (self.tag)))
1430 self.ctsize += len (data)
1431 self.stats ["out"] += len (data)
1435 def process (self, buf):
1437 Decrypt the bytes object *buf* with the active decryptor.
1439 if isinstance (buf, bytes) is False:
1440 raise InvalidParameter ("process: expected byte buffer, not %s"
1442 self.ctsize += len (buf)
1443 data = super().process (buf)
1444 self.ptsize += len (data)
1448 ###############################################################################
1450 ###############################################################################
1452 def _patch_global (glob, vow, n=None):
1454 Adapt upper file counter bound for testing IV logic. Completely unsafe.
1456 assert vow == "I am fully aware that this will void my warranty."
1457 r = globals () [glob]
1459 n = globals () [glob + "_DEFAULT"]
1460 globals () [glob] = n
1463 _testing_set_AES_GCM_IV_CNT_MAX = \
1464 partial (_patch_global, "AES_GCM_IV_CNT_MAX")
1466 _testing_set_PDTCRYPT_MAX_OBJ_SIZE = \
1467 partial (_patch_global, "PDTCRYPT_MAX_OBJ_SIZE")
1469 def open2_dump_file (fname, dir_fd, force=False):
1472 oflags = os.O_CREAT | os.O_WRONLY
1474 oflags |= os.O_TRUNC
1479 outfd = os.open (fname, oflags,
1480 stat.S_IRUSR | stat.S_IWUSR, dir_fd=dir_fd)
1481 except FileExistsError as exn:
1482 noise ("PDT: refusing to overwrite existing file %s" % fname)
1484 raise RuntimeError ("destination file %s already exists" % fname)
1485 if PDTCRYPT_VERBOSE is True:
1486 noise ("PDT: new output file %s (fd=%d)" % (fname, outfd))
1490 ###############################################################################
1491 ## freestanding invocation
1492 ###############################################################################
1494 PDTCRYPT_SUB_PROCESS = 0
1495 PDTCRYPT_SUB_SCRYPT = 1
1496 PDTCRYPT_SUB_SCAN = 2
1499 { "process" : PDTCRYPT_SUB_PROCESS
1500 , "scrypt" : PDTCRYPT_SUB_SCRYPT
1501 , "scan" : PDTCRYPT_SUB_SCAN }
1503 PDTCRYPT_DECRYPT = 1 << 0 # decrypt archive with password
1504 PDTCRYPT_SPLIT = 1 << 1 # split archive into individual objects
1505 PDTCRYPT_HASH = 1 << 2 # output scrypt hash for file and given password
1507 PDTCRYPT_SPLITNAME = "pdtcrypt-object-%d.bin"
1508 PDTCRYPT_RESCUENAME = "pdtcrypt-rescue-object-%0.5d.bin"
1510 PDTCRYPT_VERBOSE = False
1511 PDTCRYPT_STRICTIVS = False
1512 PDTCRYPT_OVERWRITE = False
1513 PDTCRYPT_BLOCKSIZE = 1 << 12
1518 PDTCRYPT_DEFAULT_VER = 1
1519 PDTCRYPT_DEFAULT_PVER = 1
1521 # scrypt hashing output control
1522 PDTCRYPT_SCRYPT_INTRANATOR = 0
1523 PDTCRYPT_SCRYPT_PARAMETERS = 1
1524 PDTCRYPT_SCRYPT_DEFAULT = PDTCRYPT_SCRYPT_INTRANATOR
1526 PDTCRYPT_SCRYPT_FORMAT = \
1527 { "i2n" : PDTCRYPT_SCRYPT_INTRANATOR
1528 , "params" : PDTCRYPT_SCRYPT_PARAMETERS }
1530 PDTCRYPT_TT_COLUMNS = 80 # assume standard terminal
1532 class PDTDecryptionError (Exception):
1533 """Decryption failed."""
1535 class PDTSplitError (Exception):
1536 """Decryption failed."""
1539 def noise (*a, **b):
1540 print (file=sys.stderr, *a, **b)
1543 class PassthroughDecryptor (object):
1545 curhdr = None # write current header on first data write
1547 def __init__ (self):
1548 if PDTCRYPT_VERBOSE is True:
1549 noise ("PDT: no encryption; data passthrough")
1551 def next (self, hdr):
1552 ok, curhdr = hdr_make (hdr)
1554 raise PDTDecryptionError ("bad header %r" % hdr)
1555 self.curhdr = curhdr
1558 if self.curhdr is not None:
1562 def process (self, d):
1563 if self.curhdr is not None:
1569 def depdtcrypt (mode, secret, ins, outs):
1571 Remove PDTCRYPT layer from all objects encrypted with the secret. Used on a
1572 Deltatar backup this will yield a (possibly Gzip compressed) tarball.
1574 ctleft = -1 # length of ciphertext to consume
1575 ctcurrent = 0 # total ciphertext of current object
1576 total_obj = 0 # total number of objects read
1577 total_pt = 0 # total plaintext bytes
1578 total_ct = 0 # total ciphertext bytes
1579 total_read = 0 # total bytes read
1580 outfile = None # Python file object for output
1582 if mode & PDTCRYPT_DECRYPT: # decryptor
1584 if ks == PDTCRYPT_SECRET_PW:
1585 decr = Decrypt (password=secret [1], strict_ivs=PDTCRYPT_STRICTIVS)
1586 elif ks == PDTCRYPT_SECRET_KEY:
1588 decr = Decrypt (key=key, strict_ivs=PDTCRYPT_STRICTIVS)
1590 raise InternalError ("‘%d’ does not specify a valid kind of secret"
1593 decr = PassthroughDecryptor ()
1596 """Dummy for non-split mode: output file does not vary."""
1599 if mode & PDTCRYPT_SPLIT:
1600 def nextout (outfile):
1602 We were passed an fd as outs for accessing the destination
1603 directory where extracted archive components are supposed
1608 if PDTCRYPT_VERBOSE is True:
1609 noise ("PDT: no output file to close at this point")
1611 if PDTCRYPT_VERBOSE is True:
1612 noise ("PDT: release output file %r" % outfile)
1613 # cleanup happens automatically by the GC; the next
1614 # line will error out on account of an invalid fd
1617 assert total_obj > 0
1618 fname = PDTCRYPT_SPLITNAME % total_obj
1620 outfd = open2_dump_file (fname, outs, force=PDTCRYPT_OVERWRITE)
1621 except RuntimeError as exn:
1622 raise PDTSplitError (exn)
1623 return os.fdopen (outfd, "wb", closefd=True)
1627 """ESPIPE is normal on non-seekable stdio stream."""
1630 except OSError as exn:
1631 if exn.errno == os.errno.ESPIPE:
1634 def out (pt, outfile):
1638 if PDTCRYPT_VERBOSE is True:
1639 noise ("PDT:\t· decrypt plaintext %d B" % (npt))
1641 nn = outfile.write (pt)
1642 except OSError as exn: # probably ENOSPC
1643 raise DecryptionError ("error (%s)" % exn)
1645 raise DecryptionError ("write aborted after %d of %d B" % (nn, npt))
1649 # current object completed; in a valid archive this marks either
1650 # the start of a new header or the end of the input
1651 if ctleft == 0: # current object requires finalization
1652 if PDTCRYPT_VERBOSE is True:
1653 noise ("PDT: %d finalize" % tell (ins))
1656 except InvalidGCMTag as exn:
1657 raise DecryptionError ("error finalizing object %d (%d B): "
1658 "%r" % (total_obj, len (pt), exn)) \
1661 if PDTCRYPT_VERBOSE is True:
1662 noise ("PDT:\t· object validated")
1664 if PDTCRYPT_VERBOSE is True:
1665 noise ("PDT: %d hdr" % tell (ins))
1667 hdr = hdr_read_stream (ins)
1668 total_read += PDTCRYPT_HDR_SIZE
1669 except EndOfFile as exn:
1670 total_read += exn.remainder
1671 if total_ct + total_obj * PDTCRYPT_HDR_SIZE != total_read:
1672 raise PDTDecryptionError ("ciphertext processed (%d B) plus "
1673 "overhead (%d × %d B) does not match "
1674 "the number of bytes read (%d )"
1675 % (total_ct, total_obj, PDTCRYPT_HDR_SIZE,
1677 # the single good exit
1678 return total_read, total_obj, total_ct, total_pt
1679 except InvalidHeader as exn:
1680 raise PDTDecryptionError ("invalid header at position %d in %r "
1681 "(%s)" % (tell (ins), exn, ins))
1682 if PDTCRYPT_VERBOSE is True:
1683 pretty = hdr_fmt_pretty (hdr)
1684 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1685 pretty.splitlines (), ""))
1686 ctcurrent = ctleft = hdr ["ctsize"]
1690 total_obj += 1 # used in file counter with split mode
1692 # finalization complete or skipped in case of first object in
1693 # stream; create a new output file if necessary
1694 outfile = nextout (outfile)
1696 if PDTCRYPT_VERBOSE is True:
1697 noise ("PDT: %d decrypt obj no. %d, %d B"
1698 % (tell (ins), total_obj, ctleft))
1700 # always allocate a new buffer since python-cryptography doesn’t allow
1701 # passing a bytearray :/
1702 nexpect = min (ctleft, PDTCRYPT_BLOCKSIZE)
1703 if PDTCRYPT_VERBOSE is True:
1704 noise ("PDT:\t· [%d] %d%% done, read block (%d B of %d B remaining)"
1706 100 - ctleft * 100 / (ctcurrent > 0 and ctcurrent or 1),
1708 ct = ins.read (nexpect)
1712 raise EndOfFile (nct,
1713 "hit EOF after %d of %d B in block [%d:%d); "
1714 "%d B ciphertext remaining for object no %d"
1715 % (nct, nexpect, off, off + nexpect, ctleft,
1721 if PDTCRYPT_VERBOSE is True:
1722 noise ("PDT:\t· decrypt ciphertext %d B" % (nct))
1723 pt = decr.process (ct)
1727 def deptdcrypt_mk_stream (kind, path):
1728 """Create stream from file or stdio descriptor."""
1729 if kind == PDTCRYPT_SINK:
1731 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: stdout")
1732 return sys.stdout.buffer
1734 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: file %s" % path)
1735 return io.FileIO (path, "w")
1736 if kind == PDTCRYPT_SOURCE:
1738 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: stdin")
1739 return sys.stdin.buffer
1741 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: file %s" % path)
1742 return io.FileIO (path, "r")
1744 raise ValueError ("bogus stream “%s” / %s" % (kind, path))
1747 def mode_depdtcrypt (mode, secret, ins, outs):
1749 total_read, total_obj, total_ct, total_pt = \
1750 depdtcrypt (mode, secret, ins, outs)
1751 except DecryptionError as exn:
1752 noise ("PDT: Decryption failed:")
1754 noise ("PDT: “%s”" % exn)
1756 noise ("PDT: Did you specify the correct key / password?")
1759 except PDTSplitError as exn:
1760 noise ("PDT: Split operation failed:")
1762 noise ("PDT: “%s”" % exn)
1764 noise ("PDT: Hint: target directory should be empty.")
1768 if PDTCRYPT_VERBOSE is True:
1769 noise ("PDT: decryption successful" )
1770 noise ("PDT: %.10d bytes read" % total_read)
1771 noise ("PDT: %.10d objects decrypted" % total_obj )
1772 noise ("PDT: %.10d bytes ciphertext" % total_ct )
1773 noise ("PDT: %.10d bytes plaintext" % total_pt )
1779 def mode_scrypt (pw, ins=None, nacl=None, fmt=PDTCRYPT_SCRYPT_INTRANATOR):
1781 paramversion = PDTCRYPT_DEFAULT_PVER
1783 hsh, nacl, version, paramversion = scrypt_hashsource (pw, ins)
1784 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1786 nacl = binascii.unhexlify (nacl)
1787 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1788 version = PDTCRYPT_DEFAULT_VER
1790 kdfname, params = defs ["kdf"]
1792 kdf = kdf_by_version (None, defs)
1793 hsh, _void = kdf (pw, nacl)
1797 if fmt == PDTCRYPT_SCRYPT_INTRANATOR:
1798 out = json.dumps ({ "salt" : base64.b64encode (nacl).decode ()
1799 , "key" : base64.b64encode (hsh) .decode ()
1800 , "paramversion" : paramversion })
1801 elif fmt == PDTCRYPT_SCRYPT_PARAMETERS:
1802 out = json.dumps ({ "salt" : binascii.hexlify (nacl).decode ()
1803 , "key" : binascii.hexlify (hsh) .decode ()
1804 , "version" : version
1805 , "scrypt_params" : { "N" : params ["N"]
1806 , "r" : params ["r"]
1807 , "p" : params ["p"]
1808 , "dkLen" : params ["dkLen"] } })
1810 raise RuntimeError ("bad scrypt output scheme %r" % fmt)
1815 def noise_output_candidates (cands, indent=8, cols=PDTCRYPT_TT_COLUMNS):
1817 Print a list of offsets without garbling the terminal too much.
1819 The indent is counted from column zero; if it is wide enough, the “PDT: ”
1820 marker will be prepended, considered part of the indentation.
1824 idt = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
1829 init = True # prevent leading separator
1832 raise ValueError ("the requested indentation exceeds the line "
1833 "width by %d" % (indent - wd))
1843 if lpos > wd: # line break
1859 SLICE_START = 1 # ordering is important to have starts of intervals
1860 SLICE_END = 0 # sorted before equal ends
1862 def find_overlaps (slices):
1864 Find overlapping slices: iterate open/close points of intervals, tracking
1865 the ones open at any time.
1868 inside = set () # of indices into bounds
1869 ovrlp = set () # of indices into bounds
1871 for i, s in enumerate (slices):
1872 bounds.append ((s [0], SLICE_START, i))
1873 bounds.append ((s [1], SLICE_END , i))
1874 bounds = sorted (bounds)
1878 if val [1] == SLICE_START:
1881 if len (inside) > 1: # closing one that overlapped
1885 return [ slices [i] for i in ovrlp ]
1888 def mode_scan (secret, fname, outs=None, nacl=None):
1890 Dissect a binary file, looking for PDTCRYPT headers and objects.
1892 If *outs* is supplied, recoverable data will be dumped into the specified
1896 ifd = os.open (fname, os.O_RDONLY)
1897 except FileNotFoundError:
1898 noise ("PDT: failed to open %s readonly" % fname)
1903 if PDTCRYPT_VERBOSE is True:
1904 noise ("PDT: scan for potential sync points")
1905 cands = locate_hdr_candidates (ifd)
1906 if len (cands) == 0:
1907 noise ("PDT: scan complete: input does not contain potential PDT "
1908 "headers; giving up.")
1910 if PDTCRYPT_VERBOSE is True:
1911 noise ("PDT: scan complete: found %d candidates:" % len (cands))
1912 noise_output_candidates (cands)
1917 junk, todo, slices = [], [], []
1922 vdt, hdr = inspect_hdr (ifd, cand)
1924 vdts = verdict_fmt (vdt)
1926 if vdt == HDR_CAND_JUNK:
1927 noise ("PDT: obj %d: %s object: bad header, skipping" % vdts)
1930 off0 = cand + PDTCRYPT_HDR_SIZE
1931 if PDTCRYPT_VERBOSE is True:
1932 noise ("PDT: obj %d: read payload @%d" % (nobj, off0))
1933 pretty = hdr_fmt_pretty (hdr)
1934 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1935 pretty.splitlines (), ""))
1938 if outs is not None:
1939 ofname = PDTCRYPT_RESCUENAME % nobj
1940 ofd = open2_dump_file (ofname, outs, force=PDTCRYPT_OVERWRITE)
1942 ctsize = hdr ["ctsize"]
1944 l = try_decrypt (ifd, off0, hdr, secret, ofd=ofd)
1946 slices.append ((off0, off0 + l))
1950 if vdt == HDR_CAND_GOOD and ok is True:
1951 noise ("PDT: %d → ✓ %s object %d–%d"
1952 % (cand, vdts, off0, off0 + ctsize))
1953 elif vdt == HDR_CAND_FISHY and ok is True:
1954 noise ("PDT: %d → × %s object %d–%d, corrupt header"
1955 % (cand, vdts, off0, off0 + ctsize))
1956 elif vdt == HDR_CAND_GOOD and ok is False:
1957 noise ("PDT: %d → × %s object %d–%d, problematic payload"
1958 % (cand, vdts, off0, off0 + ctsize))
1959 elif vdt == HDR_CAND_FISHY and ok is False:
1960 noise ("PDT: %d → × %s object %d–%d, corrupt header, problematic "
1961 "ciphertext" % (cand, vdts, off0, off0 + ctsize))
1968 noise ("PDT: all headers ok")
1970 noise ("PDT: %d candidates not parseable as headers:" % len (junk))
1971 noise_output_candidates (junk)
1973 overlap = find_overlaps (slices)
1974 if len (overlap) > 0:
1975 noise ("PDT: %d objects overlapping others" % len (overlap))
1976 for slice in overlap:
1977 noise ("PDT: × %d→%d" % (slice [0], slice [1]))
1979 def usage (err=False):
1983 indent = ' ' * len (SELF)
1984 out ("usage: %s SUBCOMMAND { --help" % SELF)
1985 out (" %s | [ -v ] { -p PASSWORD | -k KEY }" % indent)
1986 out (" %s [ { -i | --in } { - | SOURCE } ]" % indent)
1987 out (" %s [ { -n | --nacl } { SALT } ]" % indent)
1988 out (" %s [ { -o | --out } { - | DESTINATION } ]" % indent)
1989 out (" %s [ -D | --no-decrypt ] [ -S | --split ]" % indent)
1990 out (" %s [ -f | --format ]" % indent)
1993 out ("\t\tSUBCOMMAND main mode: { process | scrypt }")
1995 out ("\t\t process: extract objects from PDT archive")
1996 out ("\t\t scrypt: calculate hash from password and first object")
1997 out ("\t\t-p PASSWORD password to derive the encryption key from")
1998 out ("\t\t-k KEY encryption key as 16 bytes in hexadecimal notation")
1999 out ("\t\t-s enforce strict handling of initialization vectors")
2000 out ("\t\t-i SOURCE file name to read from")
2001 out ("\t\t-o DESTINATION file to write output to")
2002 out ("\t\t-n SALT provide salt for scrypt mode in hex encoding")
2003 out ("\t\t-v print extra info")
2004 out ("\t\t-S split into files at object boundaries; this")
2005 out ("\t\t requires DESTINATION to refer to directory")
2006 out ("\t\t-D PDT header and ciphertext passthrough")
2007 out ("\t\t-f format of SCRYPT hash output (“default” or “parameters”)")
2009 out ("\tinstead of filenames, “-” may used to specify stdin / stdout")
2011 sys.exit ((err is True) and 42 or 0)
2021 def parse_argv (argv):
2022 global PDTCRYPT_OVERWRITE
2024 mode = PDTCRYPT_DECRYPT
2030 scrypt_format = PDTCRYPT_SCRYPT_DEFAULT
2033 SELF = os.path.basename (next (argvi))
2036 rawsubcmd = next (argvi)
2037 subcommand = PDTCRYPT_SUB [rawsubcmd]
2038 except StopIteration:
2039 bail ("ERROR: subcommand required")
2041 bail ("ERROR: invalid subcommand “%s” specified" % rawsubcmd)
2047 except StopIteration:
2048 bail ("ERROR: argument list incomplete")
2050 def checked_secret (s):
2055 bail ("ERROR: encountered “%s” but secret already given" % arg)
2058 if arg in [ "-h", "--help" ]:
2061 elif arg in [ "-v", "--verbose", "--wtf" ]:
2062 global PDTCRYPT_VERBOSE
2063 PDTCRYPT_VERBOSE = True
2064 elif arg in [ "-i", "--in", "--source" ]:
2065 insspec = checked_arg ()
2066 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt from %s" % insspec)
2067 elif arg in [ "-p", "--password" ]:
2068 arg = checked_arg ()
2069 checked_secret (make_secret (password=arg))
2070 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with password")
2072 if subcommand == PDTCRYPT_SUB_PROCESS:
2073 if arg in [ "-s", "--strict-ivs" ]:
2074 global PDTCRYPT_STRICTIVS
2075 PDTCRYPT_STRICTIVS = True
2076 elif arg in [ "-o", "--out", "--dest", "--sink" ]:
2077 outsspec = checked_arg ()
2078 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2079 elif arg in [ "-f", "--force" ]:
2080 PDTCRYPT_OVERWRITE = True
2081 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2082 elif arg in [ "-S", "--split" ]:
2083 mode |= PDTCRYPT_SPLIT
2084 if PDTCRYPT_VERBOSE is True: noise ("PDT: split files")
2085 elif arg in [ "-D", "--no-decrypt" ]:
2086 mode &= ~PDTCRYPT_DECRYPT
2087 if PDTCRYPT_VERBOSE is True: noise ("PDT: not decrypting")
2088 elif arg in [ "-k", "--key" ]:
2089 arg = checked_arg ()
2090 checked_secret (make_secret (key=arg))
2091 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with key")
2093 bail ("ERROR: unexpected positional argument “%s”" % arg)
2094 elif subcommand == PDTCRYPT_SUB_SCRYPT:
2095 if arg in [ "-n", "--nacl", "--salt" ]:
2096 nacl = checked_arg ()
2097 if PDTCRYPT_VERBOSE is True: noise ("PDT: salt key with %s" % nacl)
2098 elif arg in [ "-f", "--format" ]:
2099 arg = checked_arg ()
2101 scrypt_format = PDTCRYPT_SCRYPT_FORMAT [arg]
2103 bail ("ERROR: invalid scrypt output format %s" % arg)
2104 if PDTCRYPT_VERBOSE is True:
2105 noise ("PDT: scrypt output format “%s”" % scrypt_format)
2107 bail ("ERROR: unexpected positional argument “%s”" % arg)
2108 elif subcommand == PDTCRYPT_SUB_SCAN:
2109 if arg in [ "-o", "--out", "--dest", "--sink" ]:
2110 outsspec = checked_arg ()
2111 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2112 elif arg in [ "-f", "--force" ]:
2113 PDTCRYPT_OVERWRITE = True
2114 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2116 bail ("ERROR: unexpected positional argument “%s”" % arg)
2119 if PDTCRYPT_VERBOSE is True:
2120 noise ("ERROR: no password or key specified, trying $PDTCRYPT_PASSWORD")
2121 epw = os.getenv ("PDTCRYPT_PASSWORD")
2123 checked_secret (make_secret (password=epw.strip ()))
2126 if PDTCRYPT_VERBOSE is True:
2127 noise ("ERROR: no password or key specified, trying $PDTCRYPT_KEY")
2128 ek = os.getenv ("PDTCRYPT_KEY")
2130 checked_secret (make_secret (key=ek.strip ()))
2133 if subcommand == PDTCRYPT_SUB_SCRYPT:
2134 bail ("ERROR: scrypt hash mode requested but no password given")
2135 elif mode & PDTCRYPT_DECRYPT:
2136 bail ("ERROR: decryption requested but no password given")
2138 if mode & PDTCRYPT_SPLIT and outsspec is None:
2139 bail ("ERROR: split mode is incompatible with stdout sink "
2142 if subcommand == PDTCRYPT_SUB_SCAN and outsspec is None:
2143 pass # no output by default in scan mode
2144 elif mode & PDTCRYPT_SPLIT or subcommand == PDTCRYPT_SUB_SCAN:
2145 # destination must be directory
2147 bail ("ERROR: mode is incompatible with stdout sink")
2150 os.makedirs (outsspec, 0o700)
2151 except FileExistsError:
2152 # if it’s a directory with appropriate perms, everything is
2153 # good; otherwise, below invocation of open(2) will fail
2155 outs = os.open (outsspec, os.O_DIRECTORY, 0o600)
2156 except FileNotFoundError as exn:
2157 bail ("ERROR: cannot create target directory “%s”" % outsspec)
2158 except NotADirectoryError as exn:
2159 bail ("ERROR: target path “%s” is not a directory" % outsspec)
2161 outs = deptdcrypt_mk_stream (PDTCRYPT_SINK, outsspec or "-")
2163 if subcommand == PDTCRYPT_SUB_SCAN:
2165 bail ("ERROR: please supply an input file for scanning")
2167 bail ("ERROR: input must be seekable; please specify a file")
2168 return True, partial (mode_scan, secret, insspec, outs, nacl=nacl)
2170 if subcommand == PDTCRYPT_SUB_SCRYPT:
2171 if secret [0] == PDTCRYPT_SECRET_KEY:
2172 bail ("ERROR: scrypt mode requires a password")
2173 if insspec is not None and nacl is not None \
2174 or insspec is None and nacl is None :
2175 bail ("ERROR: please supply either an input file or "
2180 if insspec is not None or subcommand != PDTCRYPT_SUB_SCRYPT:
2181 ins = deptdcrypt_mk_stream (PDTCRYPT_SOURCE, insspec or "-")
2183 if subcommand == PDTCRYPT_SUB_SCRYPT:
2184 return True, partial (mode_scrypt, secret [1].encode (), ins, nacl,
2187 return True, partial (mode_depdtcrypt, mode, secret, ins, outs)
2191 ok, runner = parse_argv (argv)
2193 if ok is True: return runner ()
2198 if __name__ == "__main__":
2199 sys.exit (main (sys.argv))