6 ===============================================================================
7 crypto -- Encryption Layer for the Deltatar Backup
8 ===============================================================================
12 - AES-GCM for the symmetric encryption;
17 - NIST Recommendation for Block Cipher Modes of Operation: Galois/Counter
19 http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf
22 https://cryptome.org/2014/01/aes-gcm-v1.pdf
24 - Authentication weaknesses in GCM
25 http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf
27 Trouble with python-cryptography packages: authentication tags can only be
28 passed in advance: https://github.com/pyca/cryptography/pull/3421
31 -------------------------------------------------------------------------------
Errors fall into roughly three categories:

    - Cryptographic errors or invalid data.

        - ``InvalidGCMTag`` (decryption failed on account of an invalid GCM
          tag),
        - ``InvalidIVFixedPart`` (IV fixed part of object not found in list),
        - ``DuplicateIV`` (the IV of an encrypted object already occurred),
        - ``DecryptionError`` (used in CLI decryption for presenting error
          conditions to the user).

    - Incorrect usage of the library.

        - ``InvalidParameter`` (non-conforming user supplied parameter),
        - ``InvalidHeader`` (data passed for reading not parsable into header),
        - ``FormatError`` (cannot handle header or parameter version).

    - Bad internal state. If one of these is encountered it means that a state
      was reached that shouldn’t occur during normal processing.

Also, ``EndOfFile`` is used as a sentinel to communicate that a stream supplied
for reading is exhausted.
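
The difference between the ``EndOfFile`` sentinel and a genuine
``InvalidHeader`` error can be sketched as follows. This is a simplified
illustration, not the module's own reader: the two exception classes are
stand-ins, ``read_header_bytes()`` and ``count_headers()`` are hypothetical
helpers, and the stream is assumed to contain nothing but 64 byte headers.

```python
import io

HDR_SIZE = 64  # size of one header in bytes


class EndOfFile(Exception):
    """Stand-in: the stream ended cleanly before another header."""


class InvalidHeader(Exception):
    """Stand-in: some bytes arrived, but not a complete header."""


def read_header_bytes(stream):
    """Hypothetical helper: return one header's worth of raw bytes."""
    data = stream.read(HDR_SIZE)
    if len(data) == 0:
        # Nothing left at all: not an error, just the end of the input.
        raise EndOfFile("no more objects")
    if len(data) != HDR_SIZE:
        raise InvalidHeader("expected %d B, received %d B"
                            % (HDR_SIZE, len(data)))
    return data


def count_headers(stream):
    """Read headers until the sentinel signals exhaustion."""
    n = 0
    while True:
        try:
            read_header_bytes(stream)
        except EndOfFile:
            return n
        n += 1
```

A caller thus treats ``EndOfFile`` as the normal way out of its read loop and
lets the other exceptions propagate.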
60 Initialization Vectors
61 -------------------------------------------------------------------------------
Initialization vectors are checked for reuse during the lifetime of a
decryptor. The fixed counters for metadata files cannot be reused and attempts
to do so will cause a ``DuplicateIV`` error. This means the length of objects
encrypted with a metadata counter is capped at 63 GB.
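
The 63 GB cap can be related to the limit GCM places on the amount of data
processed under a single IV. The following check is only a sketch: it assumes
the bound of 2^39 - 256 bits per invocation from NIST SP 800-38D, and the
constant names are local to this example.

```python
# GCM permits at most 2^39 - 256 bits of plaintext per (key, IV) pair.
GCM_MAX_BITS = (1 << 39) - (1 << 8)
GCM_MAX_BYTES = GCM_MAX_BITS // 8        # the same bound expressed in bytes
CAP_BYTES = 63 * (1 << 30)               # per-object cap of 63 GB

# The byte bound equals 2^36 - 32, i.e. just short of 64 GB.
assert GCM_MAX_BYTES == (1 << 36) - (1 << 5)

# The cap stays below the bound, leaving some headroom.
headroom = GCM_MAX_BYTES - CAP_BYTES
assert headroom > 0
```

Since a metadata file cannot be split over several objects, this per-object
limit is also the limit for the whole file.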
For ordinary, non-metadata payload, there is an optional mode with strict IV
checking that causes a crypto context to fail if an IV it encounters (when
decrypting) or creates (when encrypting) was already used for an earlier
object. Note that this mode can trigger false positives when decrypting
non-linearly, e.g. when traversing the same object multiple times. Since the
crypto context has no notion of a position in a PDT encrypted archive, this
condition must be sorted out downstream.
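
How strict checking leads to a false positive on a second pass can be sketched
with a small model. ``IVTracker`` is a hypothetical helper and ``DuplicateIV``
a stand-in class; the IV layout (8 byte fixed part followed by a 32 bit little
endian counter) is assumed to match the ``"<8sL"`` format used by this module.

```python
import os
import struct

FMT_IV = "<8sL"  # 8 byte fixed part, then a 32 bit counter, little endian


class DuplicateIV(Exception):
    """Stand-in for the exception of the same name."""


class IVTracker:
    """Hypothetical helper: remembers every IV seen by one context."""

    def __init__(self, strict=False):
        self.strict = strict
        self.seen = set()

    def check(self, iv):
        # Only strict mode turns a repeated IV into an error.
        if self.strict and iv in self.seen:
            raise DuplicateIV("iv %s was reused" % iv.hex())
        self.seen.add(iv)


fixed = os.urandom(8)
first = struct.pack(FMT_IV, fixed, 3)
second = struct.pack(FMT_IV, fixed, 4)

tracker = IVTracker(strict=True)
tracker.check(first)        # first data object
tracker.check(second)       # next object, next counter
try:
    tracker.check(first)    # the same object visited a second time
    reused = False
except DuplicateIV:
    reused = True
```

The tracker cannot tell a legitimate second visit from actual IV reuse, which
is why the decision has to be made by the code that knows the archive layout.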
77 -------------------------------------------------------------------------------
79 ``crypto.py`` may be invoked as a script for decrypting, validating, and
80 splitting PDT encrypted files. Consult the usage message for details.
Decrypt from stdin using the password ‘foo’: ::

    $ crypto.py process foo -i - -o - <some-file.tar.gz.pdtcrypt >some-file.tar.gz

Output verbose information about the encrypted objects in the archive: ::

    $ crypto.py process foo -v -i some-file.tar.gz.pdtcrypt -o /dev/null
    PDT: decrypt from some-file.tar.gz.pdtcrypt
    PDT: decrypt to /dev/null
    PDT: source: file some-file.tar.gz.pdtcrypt
    PDT: sink: file /dev/null
    PDT:    · version      = 1   : 0100
    PDT:    · paramversion = 1   : 0100
    PDT:    · nacl               : d270 b031 00d1 87e2 c946 610d 7b7f 7e5f
    PDT:    · iv                 : 02ee 3dd7 a963 1eb1 0100 0000
    PDT:    · ctsize       = 591 : 4f02 0000 0000 0000
    PDT:    · tag                : 5b2d 6d8b 8f82 4842 12fd 0b10 b6e3 369b
    PDT: 64 decrypt obj no. 1, 591 B
    PDT:    · [64] 0% done, read block (591 B of 591 B remaining)
    PDT:    · decrypt ciphertext 591 B
    PDT:    · decrypt plaintext 591 B
Also, the mode *scrypt* allows deriving encryption keys. To calculate the
encryption key from the password ‘foo’ and the salt of the first object in a
PDT encrypted file: ::

    $ crypto.py scrypt foo -i some-file.pdtcrypt
    {"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", "key": "JH9EkMwaM4x9F5aim5gK/Q=="}

The computed 16 byte key is given in base64 notation in the value of ``key``
and can be fed into Python’s ``base64.b64decode()`` to obtain the
corresponding binary representation.
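
Judging by the sample output above, the ``salt`` and ``key`` fields look
base64 encoded (note the ``/`` and the trailing ``==``). A minimal sketch of
turning that line back into binary values, assuming the output is a single
JSON object as shown:

```python
import base64
import json

# Output line copied verbatim from the example above.
line = ('{"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", '
        '"key": "JH9EkMwaM4x9F5aim5gK/Q=="}')

info = json.loads(line)
key = base64.b64decode(info["key"])     # raw key bytes
salt = base64.b64decode(info["salt"])   # raw salt bytes

print(len(key), len(salt))
```

Both values come out as 16 bytes, matching the key size and the salt field of
the header.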
Note that in Scrypt hashing mode, no data integrity checks are performed. If
the wrong password is given, a wrong key will be derived. Whether the password
was indeed correct can only be determined by decrypting. Note also that since
PDT archives essentially consist of a stream of independent objects, the salt
and other parameters may change. Thus a key derived using the above method
from the first object doesn’t necessarily apply to any of the subsequent
objects.
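
That a wrong password goes unnoticed at derivation time can be illustrated
with the standard library's ``hashlib.scrypt()``. This is a sketch only: the
cost parameters are toy values picked for speed and are not claimed to be the
ones PDT uses, ``derive()`` is a hypothetical helper, and ``hashlib.scrypt()``
is only available when Python was built against a suitable OpenSSL.

```python
import hashlib
import os


def derive(password, salt):
    """Hypothetical helper: derive a 16 byte key with toy cost parameters."""
    # n, r, p are deliberately small here; NOT the parameters of any real
    # PDT parameter version.
    return hashlib.scrypt(password, salt=salt, n=1 << 10, r=8, p=1, dklen=16)


salt = os.urandom(16)
right = derive(b"foo", salt)
wrong = derive(b"fop", salt)    # typo in the password

# Both calls succeed and both results look equally plausible; only an
# attempt to decrypt (and verify the GCM tag) would reveal the mistake.
print(right != wrong)
```

The same holds when the salt changes between objects: a different salt
produces a different key from the same password.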
134 from functools import reduce, partial
144 except ImportError as exn:
147 if __name__ == "__main__": ## Work around the import mechanism lest Python’s
148 pwd = os.getcwd() ## preference for local imports causes a cyclical
149 ## import (crypto → pylibscrypt → […] → ./tarfile → crypto).
150 sys.path = [ p for p in sys.path if p.find ("deltatar") < 0 ]
153 from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
154 from cryptography.hazmat.backends import default_backend
158 __all__ = [ "hdr_make", "hdr_read", "hdr_fmt", "hdr_fmt_pretty"
160 , "PDTCRYPT_HDR_SIZE", "AES_GCM_IV_CNT_DATA"
161 , "AES_GCM_IV_CNT_INFOFILE", "AES_GCM_IV_CNT_INDEX"
165 ###############################################################################
167 ###############################################################################
169 class EndOfFile (Exception):
173 def __init__ (self, n=None, msg=None):
179 class InvalidParameter (Exception):
180 """Inputs not valid for PDT encryption."""
184 class InvalidHeader (Exception):
185 """Header not valid."""
189 class InvalidGCMTag (Exception):
191 The GCM tag calculated during decryption differs from that in the object
197 class InvalidIVFixedPart (Exception):
199 IV fixed part not in supplied list: either the backup is corrupt or the
200 current object does not belong to it.
205 class IVFixedPartError (Exception):
207 Error creating a unique IV fixed part: repeated calls to system RNG yielded
208 the same sequence of bytes as the last IV used.
213 class InvalidFileCounter (Exception):
215 When encrypting, an attempted reuse of a dedicated counter (info file,
216 index file) was caught.
221 class DuplicateIV (Exception):
    During encryption, the current IV (fixed part and file counter) is
    identical to one that was already used. This indicates tampering or
    programmer error and cannot be recovered from.
230 class NonConsecutiveIV (Exception):
232 IVs not numbered consecutively. This is a hard error with strict IV
233 checking. Precludes random access to the encrypted objects.
238 class FormatError (Exception):
239 """Unusable parameters in header."""
243 class DecryptionError (Exception):
244 """Error during decryption with ``crypto.py`` on the command line."""
248 class Unreachable (Exception):
250 Makeshift __builtin_unreachable(); always a programmer error if
256 class InternalError (Exception):
257 """Errors not ascribable to bad user inputs or cryptography."""
261 ###############################################################################
262 ## crypto layer version
263 ###############################################################################
265 ENCRYPTION_PARAMETERS = \
267 { "kdf": ("dummy", 16)
268 , "enc": "passthrough" }
276 , "enc": "aes-gcm" } }
278 ###############################################################################
280 ###############################################################################
282 PDTCRYPT_HDR_MAGIC = b"PDTCRYPT"
284 PDTCRYPT_HDR_SIZE_MAGIC = 8 # 8
285 PDTCRYPT_HDR_SIZE_VERSION = 2 # 10
286 PDTCRYPT_HDR_SIZE_PARAMVERSION = 2 # 12
287 PDTCRYPT_HDR_SIZE_NACL = 16 # 28
288 PDTCRYPT_HDR_SIZE_IV = 12 # 40
289 PDTCRYPT_HDR_SIZE_CTSIZE = 8 # 48
290 PDTCRYPT_HDR_SIZE_TAG = 16 # 64 GCM auth tag
292 PDTCRYPT_HDR_SIZE = PDTCRYPT_HDR_SIZE_MAGIC + PDTCRYPT_HDR_SIZE_VERSION \
293 + PDTCRYPT_HDR_SIZE_PARAMVERSION + PDTCRYPT_HDR_SIZE_NACL \
294 + PDTCRYPT_HDR_SIZE_IV + PDTCRYPT_HDR_SIZE_CTSIZE \
295 + PDTCRYPT_HDR_SIZE_TAG # = 64
297 # precalculate offsets since Python can’t do constant folding over names
298 HDR_OFF_VERSION = PDTCRYPT_HDR_SIZE_MAGIC
299 HDR_OFF_PARAMVERSION = HDR_OFF_VERSION + PDTCRYPT_HDR_SIZE_VERSION
300 HDR_OFF_NACL = HDR_OFF_PARAMVERSION + PDTCRYPT_HDR_SIZE_PARAMVERSION
301 HDR_OFF_IV = HDR_OFF_NACL + PDTCRYPT_HDR_SIZE_NACL
302 HDR_OFF_CTSIZE = HDR_OFF_IV + PDTCRYPT_HDR_SIZE_IV
303 HDR_OFF_TAG = HDR_OFF_CTSIZE + PDTCRYPT_HDR_SIZE_CTSIZE
307 FMT_I2N_IV = "<8sL" # 8 random bytes ‖ 32 bit counter
FMT_I2N_HDR = ("<" # little endian, standard sizes
312 "16s" # sodium chloride
318 AES_KEY_SIZE = 16 # b"0123456789abcdef"
319 AES_KEY_SIZE_B64 = 24 # b'MDEyMzQ1Njc4OWFiY2RlZg=='
AES_GCM_MAX_SIZE = (1 << 36) - (1 << 5) # 2^39 - 2^8 bits ≅ 64 GB
321 PDTCRYPT_MAX_OBJ_SIZE_DEFAULT = 63 * (1 << 30) # 63 GB
322 PDTCRYPT_MAX_OBJ_SIZE = PDTCRYPT_MAX_OBJ_SIZE_DEFAULT
# index and info files are written on the fly while encrypting so their
# counters must be available in advance
326 AES_GCM_IV_CNT_INFOFILE = 1 # constant
327 AES_GCM_IV_CNT_INDEX = AES_GCM_IV_CNT_INFOFILE + 1
328 AES_GCM_IV_CNT_DATA = AES_GCM_IV_CNT_INDEX + 1 # also for multivolume
329 AES_GCM_IV_CNT_MAX_DEFAULT = 0xffFFffFF
330 AES_GCM_IV_CNT_MAX = AES_GCM_IV_CNT_MAX_DEFAULT
332 # IV structure and generation
333 PDTCRYPT_IV_GEN_MAX_RETRIES = 10 # ×
334 PDTCRYPT_IV_FIXEDPART_SIZE = 8 # B
335 PDTCRYPT_IV_COUNTER_SIZE = 4 # B
337 # secret type: PW of string | KEY of char [16]
338 PDTCRYPT_SECRET_PW = 0
339 PDTCRYPT_SECRET_KEY = 1
341 ###############################################################################
343 ###############################################################################
349 # , paramversion : u16
355 # fn hdr_read (f : handle) -> hdrinfo;
356 # fn hdr_make (f : handle, h : hdrinfo) -> IOResult<usize>;
357 # fn hdr_fmt (h : hdrinfo) -> String;
362 Read bytes as header structure.
364 If the input could not be interpreted as a header, fail with
369 mag, version, paramversion, nacl, iv, ctsize, tag = \
370 struct.unpack (FMT_I2N_HDR, data)
371 except Exception as exn:
372 raise InvalidHeader ("error unpacking header from [%r]: %s"
373 % (binascii.hexlify (data), str (exn)))
375 if mag != PDTCRYPT_HDR_MAGIC:
376 raise InvalidHeader ("bad magic in header: expected [%s], got [%s]"
377 % (PDTCRYPT_HDR_MAGIC, mag))
380 { "version" : version
381 , "paramversion" : paramversion
389 def hdr_read_stream (instr):
391 Read header from stream at the current position.
393 Fail with ``InvalidHeader`` if insufficient bytes were read from the
394 stream, or if the content could not be interpreted as a header.
396 data = instr.read(PDTCRYPT_HDR_SIZE)
400 elif ldata != PDTCRYPT_HDR_SIZE:
401 raise InvalidHeader ("hdr_read_stream: expected %d B, received %d B"
402 % (PDTCRYPT_HDR_SIZE, ldata))
403 return hdr_read (data)
406 def hdr_from_params (version, paramversion, nacl, iv, ctsize, tag):
408 Assemble the necessary values into a PDTCRYPT header.
410 :type version: int to fit uint16_t
411 :type paramversion: int to fit uint16_t
412 :type nacl: bytes to fit uint8_t[16]
413 :type iv: bytes to fit uint8_t[12]
    :type ctsize: int to fit uint64_t
415 :type tag: bytes to fit uint8_t[16]
417 buf = bytearray (PDTCRYPT_HDR_SIZE)
418 bufv = memoryview (buf)
421 struct.pack_into (FMT_I2N_HDR, bufv, 0,
423 version, paramversion, nacl, iv, ctsize, tag)
424 except Exception as exn:
425 return False, "error assembling header: %s" % str (exn)
427 return True, bytes (buf)
430 def hdr_make_dummy (s):
432 Create a header sized block of bytes initialized to a value derived from a
433 string. Used to verify we’ve jumped back correctly to the actual position
434 of the object header.
436 c = reduce (lambda a, c: a + ord(c), s, 0) % 0xFF
437 return bytes (bytearray (struct.pack ("B", c)) * PDTCRYPT_HDR_SIZE)
442 Assemble a header from the given header structure.
444 return hdr_from_params (version=hdr.get("version"),
445 paramversion=hdr.get("paramversion"),
446 nacl=hdr.get("nacl"), iv=hdr.get("iv"),
447 ctsize=hdr.get("ctsize"), tag=hdr.get("tag"))
450 HDR_FMT = "I2n_header { version: %d, paramversion: %d, nacl: %s[%d]," \
451 " iv: %s[%d], ctsize: %d, tag: %s[%d] }"
454 """Format a header structure into readable output."""
455 return HDR_FMT % (h["version"], h["paramversion"],
456 binascii.hexlify (h["nacl"]), len(h["nacl"]),
457 binascii.hexlify (h["iv"]), len(h["iv"]),
459 binascii.hexlify (h["tag"]), len(h["tag"]))
462 def hex_spaced_of_bytes (b):
463 """Format bytes object, hexdump style."""
    return " ".join ([ "%.2x%.2x" % (c1, c2)
                       for c1, c2 in zip (b[0::2], b[1::2]) ]) \
         + (" %.2x" % b[-1] if len (b) % 2 == 1 else "") # odd lengths
469 def hdr_iv_counter (h):
470 """Extract the variable part of the IV of the given header."""
471 _fixed, cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
475 def hdr_iv_fixed (h):
476 """Extract the fixed part of the IV of the given header."""
477 fixed, _cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
481 hdr_dump = hex_spaced_of_bytes
485 """version = %-4d : %s
486 paramversion = %-4d : %s
493 def hdr_fmt_pretty (h):
495 Format header structure into multi-line representation of its contents and
496 their raw representation. (Omit the implicit “PDTCRYPT” magic bytes that
497 precede every header.)
499 return HDR_FMT_PRETTY \
501 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["version"])),
503 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["paramversion"])),
504 hex_spaced_of_bytes (h["nacl"]),
505 hex_spaced_of_bytes (h["iv"]),
507 hex_spaced_of_bytes (struct.pack (FMT_UINT64_LE, h["ctsize"])),
508 hex_spaced_of_bytes (h["tag"]))
510 IV_FMT = "((f %s) (c %d))"
513 """Format the two components of an IV in a readable fashion."""
514 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
515 return IV_FMT % (binascii.hexlify (fixed), cnt)
518 ###############################################################################
520 ###############################################################################
522 class Location (object):
526 def restore_loc_fmt (loc):
528 % (loc.n, loc.offset)
530 def locate_hdr_candidates (fd):
532 Walk over instances of the magic string in the payload, collecting their
533 positions. If the offset of the first found instance is not zero, the file
534 begins with leading garbage.
536 :return: The list of offsets in the file.
540 mm = mmap.mmap(fd, 0, mmap.MAP_SHARED, mmap.PROT_READ)
543 pos = mm.find (PDTCRYPT_HDR_MAGIC, pos)
552 HDR_CAND_GOOD = 0 # header marks begin of valid object
553 HDR_CAND_FISHY = 1 # inconclusive (tag mismatch, obj overlap etc.)
554 HDR_CAND_JUNK = 2 # not a header / object unreadable
557 def inspect_hdr (fd, off):
559 Attempt to parse a header in *fd* at position *off*.
561 Returns a verdict about the quality of that header plus the parsed header
565 _ = os.lseek (fd, off, os.SEEK_SET)
567 if os.lseek (fd, 0, os.SEEK_CUR) != off:
568 if PDTCRYPT_VERBOSE is True:
569 noise ("PDT: %d → dismissed (lseek() past EOF)" % off)
570 return HDR_CAND_JUNK, None
572 raw = os.read (fd, PDTCRYPT_HDR_SIZE)
573 if len (raw) != PDTCRYPT_HDR_SIZE:
574 if PDTCRYPT_VERBOSE is True:
575 noise ("PDT: %d → dismissed (EOF inside header)" % off)
576 return HDR_CAND_JUNK, None
580 except InvalidHeader as exn:
581 if PDTCRYPT_VERBOSE is True:
582 noise ("PDT: %d → dismissed (invalid: [%s])" % (off, str (exn)))
583 return HDR_CAND_JUNK, None
585 obj0 = off + PDTCRYPT_HDR_SIZE
586 objX = obj0 + hdr ["ctsize"]
588 eof = os.lseek (fd, 0, os.SEEK_END)
590 if PDTCRYPT_VERBOSE is True:
591 noise ("PDT: %d → EOF inside object (%d≤%d≤%d); adjusting size to "
592 "%d" % (off, obj0, eof, objX, (eof - obj0)))
593 # try reading up to the end
594 hdr ["ctsize"] = eof - obj0
595 return HDR_CAND_FISHY, hdr
597 return HDR_CAND_GOOD, hdr
600 def try_decrypt (ifd, off, hdr, secret, ofd=-1):
602 Attempt to decrypt the object in the (seekable) descriptor *ifd* starting
603 at *off* using the metadata in *hdr* and *secret*. An output fd can be
604 specified with *ofd*; if it is *-1* – the default –, the decrypted payload
607 Always creates a fresh decryptor, so validation steps across objects don’t
610 Errors during GCM tag validation are ignored.
612 ctleft = hdr ["ctsize"]
616 if ks == PDTCRYPT_SECRET_PW:
617 decr = Decrypt (password=secret [1])
618 elif ks == PDTCRYPT_SECRET_KEY:
620 decr = Decrypt (key=key)
627 os.lseek (ifd, pos, os.SEEK_SET)
629 cnksiz = min (ctleft, PDTCRYPT_BLOCKSIZE)
630 cnk = os.read (ifd, cnksiz)
633 pt = decr.process (cnk)
638 except InvalidGCMTag:
639 noise ("PDT: GCM tag mismatch for object %d–%d"
640 % (off, off + hdr ["ctsize"]))
641 if len (pt) > 0 and ofd != -1:
644 except Exception as exn:
645 noise ("PDT: error decrypting object %d–%d@%d, %d B remaining [%s]"
646 % (off, off + hdr ["ctsize"], pos, ctleft, exn))
652 def readable_objects_offsets (ifd, secret, cands):
654 From a list of candidates, locate the ones that mark the start of actual
655 readable PDTCRYPT objects.
659 for i, cand in enumerate (cands):
660 vdt, hdr = inspect_hdr (ifd, cand)
661 if vdt == HDR_CAND_JUNK:
662 pass # ignore unreadable ones
663 elif vdt in [HDR_CAND_GOOD, HDR_CAND_FISHY]:
664 ctsize = hdr ["ctsize"]
665 off0 = cand + PDTCRYPT_HDR_SIZE
666 ok = try_decrypt (ifd, off0, hdr, secret) == ctsize
668 good.append ((cand, off0 + ctsize))
670 overlap = find_overlaps (good)
672 return [ g [0] for g in good ]
675 def reconstruct_offsets (fname, secret):
676 ifd = os.open (fname, os.O_RDONLY)
679 cands = locate_hdr_candidates (ifd)
680 return readable_objects_offsets (ifd, secret, cands)
685 ###############################################################################
687 ###############################################################################
689 def make_secret (password=None, key=None):
    Safely create a “secret” value that consists either of a key or a password.
    Inputs are validated: the password is accepted as (UTF-8 encoded) bytes or
    string; for the key only a bytes object of the proper size or a hex or
    base64 encoded string thereof is accepted.
696 If both are provided, the key is preferred over the password; no checks are
697 performed whether the key is derived from the password.
699 :returns: secret value if inputs were acceptable | None otherwise.
702 if isinstance (key, str) is True:
703 key = key.encode ("utf-8")
704 if isinstance (key, bytes) is True:
705 if len (key) == AES_KEY_SIZE:
706 return (PDTCRYPT_SECRET_KEY, key)
707 if len (key) == AES_KEY_SIZE * 2:
709 key = binascii.unhexlify (key)
710 return (PDTCRYPT_SECRET_KEY, key)
711 except binascii.Error: # garbage in string
713 if len (key) == AES_KEY_SIZE_B64:
715 key = base64.b64decode (key)
716 # the base64 processor is very tolerant and allows for
717 # arbitrary trailing and leading data thus the data obtained
718 # must be checked for the proper length
719 if len (key) == AES_KEY_SIZE:
720 return (PDTCRYPT_SECRET_KEY, key)
721 except binascii.Error: # “incorrect padding”
723 elif password is not None:
724 if isinstance (password, str) is True:
725 return (PDTCRYPT_SECRET_PW, password)
726 elif isinstance (password, bytes) is True:
728 password = password.decode ("utf-8")
729 return (PDTCRYPT_SECRET_PW, password)
730 except UnicodeDecodeError:
736 ###############################################################################
737 ## passthrough / null encryption
738 ###############################################################################
740 class PassthroughCipher (object):
742 tag = struct.pack ("<QQ", 0, 0)
744 def __init__ (self) : pass
746 def update (self, b) : return b
748 def finalize (self) : return b""
750 def finalize_with_tag (self, _) : return b""
752 ###############################################################################
753 ## convenience wrapper
754 ###############################################################################
757 def kdf_dummy (klen, password, _nacl):
759 Fake KDF for testing purposes that is called when parameter version zero is
    # encode first so the length is counted in bytes, not in characters
    if isinstance (password, bytes) is False:
        password = password.encode ()
    q, r = divmod (klen, len (password))
    return password * q + password [:r], b""
768 SCRYPT_KEY_MEMO = { } # static because needed for both the info file and the archive
771 def kdf_scrypt (params, password, nacl):
773 Wrapper for the Scrypt KDF, corresponds to parameter version one. The
774 computation result is memoized based on the inputs to facilitate spawning
775 multiple encryption contexts.
780 dkLen = params["dkLen"]
783 nacl = os.urandom (params["NaCl_LEN"])
785 key_parms = (password, nacl, N, r, p, dkLen)
786 global SCRYPT_KEY_MEMO
787 if key_parms not in SCRYPT_KEY_MEMO:
788 SCRYPT_KEY_MEMO [key_parms] = \
789 pylibscrypt.scrypt (password, nacl, N, r, p, dkLen)
790 return SCRYPT_KEY_MEMO [key_parms], nacl
793 def kdf_by_version (paramversion=None, defs=None):
795 Pick the KDF handler corresponding to the parameter version or the
    :rtype: function (password, nacl) -> (key : bytes, nacl : bytes)
800 if paramversion is not None:
801 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
803 raise InvalidParameter ("no encryption parameters for version %r"
805 (kdf, params) = defs["kdf"]
807 if kdf == "scrypt" : fn = kdf_scrypt
808 if kdf == "dummy" : fn = kdf_dummy
810 raise ValueError ("key derivation method %r unknown" % kdf)
811 return partial (fn, params)
814 ###############################################################################
816 ###############################################################################
818 def scrypt_hashsource (pw, ins):
820 Calculate the SCRYPT hash from the password and the information contained
821 in the first header found in ``ins``.
823 This does not validate whether the first object is encrypted correctly.
825 if isinstance (pw, str) is True:
827 elif isinstance (pw, bytes) is False:
828 raise InvalidParameter ("password must be a string, not %s"
830 if isinstance (ins, io.BufferedReader) is False and \
831 isinstance (ins, io.FileIO) is False:
832 raise InvalidParameter ("file to hash must be opened in “binary” mode")
835 hdr = hdr_read_stream (ins)
836 except EndOfFile as exn:
837 noise ("PDT: malformed input: end of file reading first object header")
842 pver = hdr ["paramversion"]
843 if PDTCRYPT_VERBOSE is True:
844 noise ("PDT: salt of first object : %s" % binascii.hexlify (nacl))
845 noise ("PDT: parameter version of archive : %d" % pver)
848 defs = ENCRYPTION_PARAMETERS.get(pver, None)
849 kdfname, params = defs ["kdf"]
850 if kdfname != "scrypt":
851 noise ("PDT: input is not an SCRYPT archive")
854 kdf = kdf_by_version (None, defs)
855 except ValueError as exn:
856 noise ("PDT: object has unknown parameter version %d" % pver)
858 hsh, _void = kdf (pw, nacl)
860 return hsh, nacl, hdr ["version"], pver
863 def scrypt_hashfile (pw, fname):
865 Calculate the SCRYPT hash from the password and the information contained
866 in the first header found in the given file. The header is read only at
869 with deptdcrypt_mk_stream (PDTCRYPT_SOURCE, fname or "-") as ins:
870 hsh, _void, _void, _void = scrypt_hashsource (pw, ins)
874 ###############################################################################
876 ###############################################################################
878 class Crypto (object):
880 Encryption context to remain alive throughout an entire tarfile pass.
885 cnt = None # file counter (uint32_t != 0)
886 iv = None # current IV
887 fixed = None # accu for 64 bit fixed parts of IV
888 used_ivs = None # tracks IVs
889 strict_ivs = False # if True, panic on duplicate object IV
898 info_counter_used = False
899 index_counter_used = False
901 def __init__ (self, *al, **akv):
902 self.used_ivs = set ()
903 self.set_parameters (*al, **akv)
906 def next_fixed (self):
911 def set_object_counter (self, cnt=None):
913 Safely set the internal counter of encrypted objects. Numerous
916 The same counter may not be reused in combination with one IV fixed
917 part. This is validated elsewhere in the IV handling.
919 Counter zero is invalid. The first two counters are reserved for
920 metadata. The implementation does not allow for splitting metadata
921 files over multiple encrypted objects. (This would be possible by
        assigning new fixed parts.) Thus in a Deltatar backup there is at most
        one object each with the counter values one and two. On creation of a
        context, the initial counter may be chosen. The globals
925 ``AES_GCM_IV_CNT_INFOFILE`` and ``AES_GCM_IV_CNT_INDEX`` can be used to
926 request one of the reserved values. If one of these values has been
927 used, any further attempt of setting the counter to that value will
928 be rejected with an ``InvalidFileCounter`` exception.
        Out of bounds values (i.e. below one or above the maximum of 2³²)
        cause an ``InvalidParameter`` exception to be thrown.
934 self.cnt = AES_GCM_IV_CNT_DATA
        # reject zero and negative values as well as anything past the wrap
        if cnt < 1 or cnt > AES_GCM_IV_CNT_MAX + 1:
            raise InvalidParameter ("invalid counter value %d requested: "
                                    "acceptable values are from 1 to %d"
                                    % (cnt, AES_GCM_IV_CNT_MAX))
940 if cnt == AES_GCM_IV_CNT_INFOFILE:
941 if self.info_counter_used is True:
942 raise InvalidFileCounter ("attempted to reuse info file "
943 "counter %d: must be unique" % cnt)
944 self.info_counter_used = True
945 elif cnt == AES_GCM_IV_CNT_INDEX:
946 if self.index_counter_used is True:
                raise InvalidFileCounter ("attempted to reuse index file "
                                          "counter %d: must be unique" % cnt)
949 self.index_counter_used = True
950 if cnt <= AES_GCM_IV_CNT_MAX:
953 # cnt == AES_GCM_IV_CNT_MAX + 1 → wrap
954 self.cnt = AES_GCM_IV_CNT_DATA
958 def set_parameters (self, password=None, key=None, paramversion=None,
959 nacl=None, counter=None, strict_ivs=False):
961 Configure the internal state of a crypto context. Not intended for
965 self.set_object_counter (counter)
966 self.strict_ivs = strict_ivs
968 if paramversion is not None:
969 self.paramversion = paramversion
972 self.key, self.nacl = key, nacl
975 if password is not None:
976 if isinstance (password, bytes) is False:
977 password = str.encode (password)
978 self.password = password
979 if paramversion is None and nacl is None:
980 # postpone key setup until first header is available
982 kdf = kdf_by_version (paramversion)
984 self.key, self.nacl = kdf (password, nacl)
987 def process (self, buf):
989 Encrypt / decrypt a buffer. Invokes the ``.update()`` method on the
990 wrapped encryptor or decryptor, respectively.
992 The Cryptography exception ``AlreadyFinalized`` is translated to an
993 ``InternalError`` at this point. It may occur in sound code when the GC
994 closes an encrypting stream after an error. Everywhere else it must be
998 raise RuntimeError ("process: context not initialized")
999 self.stats ["in"] += len (buf)
1001 out = self.enc.update (buf)
1002 except cryptography.exceptions.AlreadyFinalized as exn:
1003 raise InternalError (exn)
1004 self.stats ["out"] += len (out)
1008 def next (self, password, paramversion, nacl, iv):
1010 Prepare for encrypting another object: Reset the data counters and
1011 change the configuration in case one of the variable parameters differs
1012 from the last object. Also check the IV for duplicates and error out
1013 if strict checking was requested.
1017 self.stats ["obj"] += 1
1019 self.check_duplicate_iv (iv)
1021 if ( self.paramversion != paramversion
1022 or self.password != password
1023 or self.nacl != nacl):
1024 self.set_parameters (password=password, paramversion=paramversion,
1025 nacl=nacl, strict_ivs=self.strict_ivs)
1028 def check_duplicate_iv (self, iv):
        Record an IV (the 12 byte representation as in the header) in the set
        of IVs seen so far. With strict checking enabled, an IV that was
        already recorded raises ``DuplicateIV``. Depending on the context,
        this may indicate a serious error (IV reuse).
1034 if self.strict_ivs is True and iv in self.used_ivs:
1035 raise DuplicateIV ("iv %s was reused" % iv_fmt (iv))
        # iv has not been used before; add to collection
1037 self.used_ivs.add (iv)
1040 def counters (self):
1042 Access the data counters.
1044 return self.stats ["obj"], self.stats ["in"], self.stats ["out"]
1049 Clear the current context regardless of its finalization state. The
1050 next operation must be ``.next()``.
1055 class Encrypt (Crypto):
1061 def __init__ (self, version, paramversion, password=None, key=None, nacl=None,
1062 counter=AES_GCM_IV_CNT_DATA, strict_ivs=True):
1064 The ctor will throw immediately if one of the parameters does not conform
1065 to our expectations.
1068 :type version: int to fit uint16_t
1069 :type paramversion: int to fit uint16_t
1070 :param password: mutually exclusive with ``key``
1071 :type password: bytes
1072 :param key: mutually exclusive with ``password``
        :param counter: initial object counter; the values
                        ``AES_GCM_IV_CNT_INFOFILE`` and
                        ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
                        and cannot be reused even with different fixed parts.
1079 :type strict_ivs: bool
1081 if password is None and key is None \
1082 or password is not None and key is not None :
1083 raise InvalidParameter ("__init__: need either key or password")
1086 if isinstance (key, bytes) is False:
1087 raise InvalidParameter ("__init__: key must be provided as "
1088 "bytes, not %s" % type (key))
1090 raise InvalidParameter ("__init__: salt must be provided along "
1091 "with encryption key")
1092 else: # password, no key
1093 if isinstance (password, str) is False:
1094 raise InvalidParameter ("__init__: password must be a string, not %s"
1096 if len (password) == 0:
                raise InvalidParameter ("__init__: empty passwords are not "
                                        "permitted for PDT encrypted files")
1100 if isinstance (version, int) is False:
1101 raise InvalidParameter ("__init__: version number must be an "
1102 "integer, not %s" % type (version))
1104 raise InvalidParameter ("__init__: version number must be a "
1105 "nonnegative integer, not %d" % version)
1107 if isinstance (paramversion, int) is False:
1108 raise InvalidParameter ("__init__: crypto parameter version number "
1109 "must be an integer, not %s"
1110 % type (paramversion))
1111 if paramversion < 0:
1112 raise InvalidParameter ("__init__: crypto parameter version number "
1113 "must be a nonnegative integer, not %d"
1116 if nacl is not None:
1117 if isinstance (nacl, bytes) is False:
1118 raise InvalidParameter ("__init__: salt given, but of type %s "
1119 "instead of bytes" % type (nacl))
1120 # salt length would depend on the actual encryption so it can’t be
1121 # validated at this point
1123 self.version = version
1124 self.paramenc = ENCRYPTION_PARAMETERS.get (paramversion) ["enc"]
1126 super().__init__ (password, key, paramversion, nacl, counter=counter,
1127 strict_ivs=strict_ivs)
1130 def next_fixed (self, retries=PDTCRYPT_IV_GEN_MAX_RETRIES):
1132 Generate the next IV fixed part by reading eight bytes from
1133 ``/dev/urandom``. The buffer so obtained is tested against the fixed
1134 parts used so far to prevent accidental reuse of IVs. After a
1135 configurable number of attempts to create a unique fixed part, it will
1136 refuse to continue with an ``IVFixedPartError``. This is unlikely to
1137 ever happen on a normal system but may detect an issue with the random
1140 The list of fixed parts that were used by the context at hand can be
1141 accessed through the ``.fixed`` list. Its last element is the fixed
1142 part currently in use.
1146 fp = os.urandom (PDTCRYPT_IV_FIXEDPART_SIZE)
1147 if fp not in self.fixed:
1148 self.fixed.append (fp)
1151 raise IVFixedPartError ("error obtaining a unique IV fixed part from "
1152 "/dev/urandom; giving up after %d tries" % i)
        Construct a 12-byte IV from the current fixed part and the object
1160 return struct.pack(FMT_I2N_IV, self.fixed [-1], self.cnt)
    def next (self, filename=None, counter=None):
        """
        Prepare for encrypting the next incoming object. Update the counter
        and put together the IV, possibly changing prefixes. Then create the
        new encryptor.

        The argument ``counter`` can be used to specify a file counter for this
        object. Unless it is one of the reserved values, the counter of
        subsequent objects will be computed from this one.

        If this is the first object in a series, ``filename`` is required,
        otherwise it is reused if not present. The value is used to derive a
        header sized placeholder to use until after encryption when all the
        inputs to construct the final header are available. This is then
        matched in ``.done()`` against the value found at the position of the
        header. The motivation for this extra check is primarily to assist
        format debugging: it makes stray headers easy to spot in malformed
        archives.
        """
        if filename is None:
            if self.lastinfo is None:
                raise InvalidParameter ("next: filename is mandatory for "
                                        "the first object")
            filename, _dummy = self.lastinfo
        else:
            if isinstance (filename, str) is False:
                raise InvalidParameter ("next: filename must be a string, not %s"
                                        % type (filename))
1191 if counter is not None:
1192 if isinstance (counter, int) is False:
1193 raise InvalidParameter ("next: the supplied counter is of "
1194 "invalid type %s; please pass an "
1195 "integer instead" % type (counter))
1196 self.set_object_counter (counter)
1198 self.iv = self.iv_make ()
1199 if self.paramenc == "aes-gcm":
1201 ( algorithms.AES (self.key)
1202 , modes.GCM (self.iv)
1203 , backend = default_backend ()) \
1205 elif self.paramenc == "passthrough":
1206 self.enc = PassthroughCipher ()
1208 raise InvalidParameter ("next: parameter version %d not known"
1209 % self.paramversion)
1210 hdrdum = hdr_make_dummy (filename)
1211 self.lastinfo = (filename, hdrdum)
1212 super().next (self.password, self.paramversion, self.nacl, self.iv)
1214 self.set_object_counter (self.cnt + 1)
    def done (self, cmpdata):
        """
        Complete encryption of an object. After this has been called, attempts
        to encrypt further data will cause an error until ``.next()`` is
        invoked again.

        Returns the remaining ciphertext, a 64-byte buffer containing the
        object header with all values filled in, including the “late” ones
        such as the ciphertext size and the GCM tag, and the list of IV fixed
        parts used so far.
        """
1228 if isinstance (cmpdata, bytes) is False:
1229 raise InvalidParameter ("done: comparison input expected as bytes, "
1230 "not %s" % type (cmpdata))
1231 if self.lastinfo is None:
1232 raise RuntimeError ("done: encryption context not initialized")
1233 filename, hdrdum = self.lastinfo
        if cmpdata != hdrdum:
            raise RuntimeError ("done: bad sync of header for object %d: "
                                "preliminary data does not match; this likely "
                                "indicates a wrongly repositioned stream"
                                % self.cnt)
1239 data = self.enc.finalize ()
1240 self.stats ["out"] += len (data)
1241 self.ctsize += len (data)
1242 ok, hdr = hdr_from_params (self.version, self.paramversion, self.nacl,
1243 self.iv, self.ctsize, self.enc.tag)
1245 raise InternalError ("error constructing header: %r" % hdr)
1246 return data, hdr, self.fixed
1249 def process (self, buf):
1251 Encrypt a chunk of plaintext with the active encryptor. Returns the
1252 size of the input consumed. This **must** be checked downstream. If the
1253 maximum possible object size has been reached, the current context must
1254 be finalized and a new one established before any further data can be
1255 encrypted. The second argument is the remainder of the plaintext that
1256 was not encrypted for the caller to use immediately after the new
1259 if isinstance (buf, bytes) is False:
1260 raise InvalidParameter ("process: expected byte buffer, not %s"
1263 newptsize = self.ptsize + bsize
1264 diff = newptsize - PDTCRYPT_MAX_OBJ_SIZE
1267 newptsize = PDTCRYPT_MAX_OBJ_SIZE
1268 self.ptsize = newptsize
1269 data = super().process (buf [:bsize])
1270 self.ctsize += len (data)
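# A minimal, self-contained sketch of the IV handling used by the encryption
# context: draw a fresh eight-byte fixed part, reject it if it was used
# before, and pack it together with the object counter into a 12-byte IV.
# Everything named ``sketch_*`` / ``SKETCH_*`` is hypothetical, and the struct
# format ">8sL" is an assumption about the IV layout, not a value taken from
# this module.

```python
import os
import struct

# Assumed layout: 8-byte fixed part followed by a 4-byte big-endian counter.
SKETCH_FMT_IV = ">8sL"
SKETCH_FIXEDPART_SIZE = 8
SKETCH_MAX_RETRIES = 5


def sketch_next_fixed(used, retries=SKETCH_MAX_RETRIES, rng=os.urandom):
    """Return a fixed part that is not in *used* and record it there."""
    for _ in range(retries):
        fp = rng(SKETCH_FIXEDPART_SIZE)
        if fp not in used:
            used.append(fp)
            return fp
    raise RuntimeError("no unique IV fixed part after %d tries" % retries)


def sketch_iv_make(fixed, cnt):
    """Pack the fixed part and the object counter into one IV."""
    return struct.pack(SKETCH_FMT_IV, fixed, cnt)
```

# The ``rng`` parameter only exists so the retry limit can be exercised with a
# deterministic source; by default the sketch reads from ``os.urandom``.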
1274 class Decrypt (Crypto):
1276 tag = None # GCM tag, part of header
1277 last_iv = None # check consecutive ivs in strict mode
1279 def __init__ (self, password=None, key=None, counter=None, fixedparts=None,
1282 Sanitizing ctor for the decryption context. ``fixedparts`` specifies a
1283 list of IV fixed parts accepted during decryption. If a fixed part is
1284 encountered that is not in the list, decryption will fail.
        :param      password:   mutually exclusive with ``key``
        :type       password:   str
        :param           key:   mutually exclusive with ``password``
        :type            key:   bytes
        :param       counter:   initial object counter; the values
                                ``AES_GCM_IV_CNT_INFOFILE`` and
                                ``AES_GCM_IV_CNT_INDEX`` are unique in each
                                backup set and cannot be reused even with
                                different fixed parts.
        :type     fixedparts:   bytes list
1296 if password is None and key is None \
1297 or password is not None and key is not None :
1298 raise InvalidParameter ("__init__: need either key or password")
1301 if isinstance (key, bytes) is False:
1302 raise InvalidParameter ("__init__: key must be provided as "
1303 "bytes, not %s" % type (key))
1304 else: # password, no key
1305 if isinstance (password, str) is False:
1306 raise InvalidParameter ("__init__: password must be a string, not %s"
1308 if len (password) == 0:
                raise InvalidParameter ("__init__: empty passwords are not "
                                        "permitted for PDT encrypted files")
1312 if fixedparts is not None:
1313 if isinstance (fixedparts, list) is False:
1314 raise InvalidParameter ("__init__: IV fixed parts must be "
1315 "supplied as list, not %s"
1316 % type (fixedparts))
1317 self.fixed = fixedparts
1320 super().__init__ (password=password, key=key, counter=counter,
1321 strict_ivs=strict_ivs)
1324 def valid_fixed_part (self, iv):
1326 Check if a fixed part was already seen.
1328 # check if fixed part is known
1329 fixed, _cnt = struct.unpack (FMT_I2N_IV, iv)
1330 i = bisect.bisect_left (self.fixed, fixed)
1331 return i != len (self.fixed) and self.fixed [i] == fixed
    def check_consecutive_iv (self, iv):
        """
        Check whether the counter part of the given IV is indeed the successor
        of the currently present counter. This should always be the case for
        the objects in a well formed PDT archive but should not be enforced
        when decrypting out-of-order.
        """
        fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
        if self.strict_ivs is True \
                and self.last_iv is not None \
                and self.last_iv [0] == fixed \
                and self.last_iv [1] != cnt - 1:
            # three arguments for three format specifiers: the offending IV,
            # the counter that was expected, and the counter actually found
            raise NonConsecutiveIV ("iv %s counter not successor of "
                                    "last object (expected %d, found %d)"
                                    % (iv_fmt (iv), self.last_iv [1] + 1, cnt))
        # remember the fixed part rather than the complete IV so that the
        # comparison against ``fixed`` above can match at all
        self.last_iv = (fixed, cnt)
1352 def next (self, hdr):
1354 Start decrypting the next object. The PDTCRYPT header for the object
1355 can be given either as already parsed object or as bytes.
1357 if isinstance (hdr, bytes) is True:
1358 hdr = hdr_read (hdr)
1359 elif isinstance (hdr, dict) is False:
1360 # this won’t catch malformed specs though
1361 raise InvalidParameter ("next: wrong type of parameter hdr: "
1362 "expected bytes or spec, got %s"
1365 paramversion = hdr ["paramversion"]
1370 raise InvalidHeader ("next: not a header %r" % hdr)
1372 super().next (self.password, paramversion, nacl, iv)
1373 if self.fixed is not None and self.valid_fixed_part (iv) is False:
1374 raise InvalidIVFixedPart ("iv %s has invalid fixed part"
1376 self.check_consecutive_iv (iv)
1379 defs = ENCRYPTION_PARAMETERS.get (paramversion, None)
1381 raise FormatError ("header contains unknown parameter version %d; "
1382 "maybe the file was created by a more recent "
1383 "version of Deltatar" % paramversion)
1385 if enc == "aes-gcm":
1387 ( algorithms.AES (self.key)
1388 , modes.GCM (iv, tag=self.tag)
1389 , backend = default_backend ()) \
1391 elif enc == "passthrough":
1392 self.enc = PassthroughCipher ()
1394 raise InternalError ("encryption parameter set %d refers to unknown "
1395 "mode %r" % (paramversion, enc))
1396 self.set_object_counter (self.cnt + 1)
1399 def done (self, tag=None):
1401 Stop decryption of the current object and finalize it with the active
1402 context. This will throw an *InvalidGCMTag* exception to indicate that
1403 the authentication tag does not match the data. If the tag is correct,
1404 the rest of the plaintext is returned.
1409 data = self.enc.finalize ()
1411 if isinstance (tag, bytes) is False:
1412 raise InvalidParameter ("done: wrong type of parameter "
1413 "tag: expected bytes, got %s"
1415 data = self.enc.finalize_with_tag (self.tag)
1416 except cryptography.exceptions.InvalidTag:
1417 raise InvalidGCMTag ("done: tag mismatch of object %d: %s "
1418 "rejected by finalize ()"
1419 % (self.cnt, binascii.hexlify (self.tag)))
1420 self.ctsize += len (data)
1421 self.stats ["out"] += len (data)
1425 def process (self, buf):
1427 Decrypt the bytes object *buf* with the active decryptor.
1429 if isinstance (buf, bytes) is False:
1430 raise InvalidParameter ("process: expected byte buffer, not %s"
1432 self.ctsize += len (buf)
1433 data = super().process (buf)
1434 self.ptsize += len (data)
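# A hedged sketch of the strict IV check performed during decryption: within
# one fixed part, the counter of each object is expected to be the successor
# of the previous one, while a change of fixed part resets the expectation.
# The names ``sketch_check_consecutive`` and ``SketchNonConsecutiveIV`` are
# hypothetical, and the struct format ">8sL" is an assumed IV layout.

```python
import struct

SKETCH_FMT_IV = ">8sL"  # assumption: 8-byte fixed part, 4-byte counter


class SketchNonConsecutiveIV(Exception):
    """Counter of an IV does not follow the previous one."""


def sketch_check_consecutive(last, iv):
    """
    Validate *iv* against *last*, the (fixed part, counter) pair of the
    previous object or None for the first one. Returns the new pair.
    """
    fixed, cnt = struct.unpack(SKETCH_FMT_IV, iv)
    if last is not None and last[0] == fixed and last[1] != cnt - 1:
        raise SketchNonConsecutiveIV(
            "expected counter %d, found %d" % (last[1] + 1, cnt))
    return (fixed, cnt)
```

# The caller threads the returned pair through successive calls, in the same
# way the decryption context keeps the last IV between objects.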
1438 ###############################################################################
1440 ###############################################################################
1442 def _patch_global (glob, vow, n=None):
1444 Adapt upper file counter bound for testing IV logic. Completely unsafe.
1446 assert vow == "I am fully aware that this will void my warranty."
1447 r = globals () [glob]
1449 n = globals () [glob + "_DEFAULT"]
1450 globals () [glob] = n
1453 _testing_set_AES_GCM_IV_CNT_MAX = \
1454 partial (_patch_global, "AES_GCM_IV_CNT_MAX")
1456 _testing_set_PDTCRYPT_MAX_OBJ_SIZE = \
1457 partial (_patch_global, "PDTCRYPT_MAX_OBJ_SIZE")
1459 def open2_dump_file (fname, dir_fd, force=False):
1462 oflags = os.O_CREAT | os.O_WRONLY
1464 oflags |= os.O_TRUNC
1469 outfd = os.open (fname, oflags,
1470 stat.S_IRUSR | stat.S_IWUSR, dir_fd=dir_fd)
1471 except FileExistsError as exn:
1472 noise ("PDT: refusing to overwrite existing file %s" % fname)
1474 raise RuntimeError ("destination file %s already exists" % fname)
1475 if PDTCRYPT_VERBOSE is True:
1476 noise ("PDT: new output file %s (fd=%d)" % (fname, outfd))
1480 ###############################################################################
1481 ## freestanding invocation
1482 ###############################################################################
1484 PDTCRYPT_SUB_PROCESS = 0
1485 PDTCRYPT_SUB_SCRYPT = 1
1486 PDTCRYPT_SUB_SCAN = 2
1489 { "process" : PDTCRYPT_SUB_PROCESS
1490 , "scrypt" : PDTCRYPT_SUB_SCRYPT
1491 , "scan" : PDTCRYPT_SUB_SCAN }
1493 PDTCRYPT_DECRYPT = 1 << 0 # decrypt archive with password
1494 PDTCRYPT_SPLIT = 1 << 1 # split archive into individual objects
1495 PDTCRYPT_HASH = 1 << 2 # output scrypt hash for file and given password
1497 PDTCRYPT_SPLITNAME = "pdtcrypt-object-%d.bin"
1498 PDTCRYPT_RESCUENAME = "pdtcrypt-rescue-object-%0.5d.bin"
1500 PDTCRYPT_VERBOSE = False
1501 PDTCRYPT_STRICTIVS = False
1502 PDTCRYPT_OVERWRITE = False
1503 PDTCRYPT_BLOCKSIZE = 1 << 12
1508 PDTCRYPT_DEFAULT_VER = 1
1509 PDTCRYPT_DEFAULT_PVER = 1
1511 # scrypt hashing output control
1512 PDTCRYPT_SCRYPT_INTRANATOR = 0
1513 PDTCRYPT_SCRYPT_PARAMETERS = 1
1514 PDTCRYPT_SCRYPT_DEFAULT = PDTCRYPT_SCRYPT_INTRANATOR
1516 PDTCRYPT_SCRYPT_FORMAT = \
1517 { "i2n" : PDTCRYPT_SCRYPT_INTRANATOR
1518 , "params" : PDTCRYPT_SCRYPT_PARAMETERS }
1520 PDTCRYPT_TT_COLUMNS = 80 # assume standard terminal
1522 class PDTDecryptionError (Exception):
1523 """Decryption failed."""
class PDTSplitError (Exception):
    """Split operation failed."""
1529 def noise (*a, **b):
1530 print (file=sys.stderr, *a, **b)
1533 class PassthroughDecryptor (object):
1535 curhdr = None # write current header on first data write
1537 def __init__ (self):
1538 if PDTCRYPT_VERBOSE is True:
1539 noise ("PDT: no encryption; data passthrough")
1541 def next (self, hdr):
1542 ok, curhdr = hdr_make (hdr)
1544 raise PDTDecryptionError ("bad header %r" % hdr)
1545 self.curhdr = curhdr
1548 if self.curhdr is not None:
1552 def process (self, d):
1553 if self.curhdr is not None:
1559 def depdtcrypt (mode, secret, ins, outs):
1561 Remove PDTCRYPT layer from all objects encrypted with the secret. Used on a
1562 Deltatar backup this will yield a (possibly Gzip compressed) tarball.
1564 ctleft = -1 # length of ciphertext to consume
1565 ctcurrent = 0 # total ciphertext of current object
1566 total_obj = 0 # total number of objects read
1567 total_pt = 0 # total plaintext bytes
1568 total_ct = 0 # total ciphertext bytes
1569 total_read = 0 # total bytes read
1570 outfile = None # Python file object for output
1572 if mode & PDTCRYPT_DECRYPT: # decryptor
1574 if ks == PDTCRYPT_SECRET_PW:
1575 decr = Decrypt (password=secret [1], strict_ivs=PDTCRYPT_STRICTIVS)
1576 elif ks == PDTCRYPT_SECRET_KEY:
1578 decr = Decrypt (key=key, strict_ivs=PDTCRYPT_STRICTIVS)
1580 raise InternalError ("‘%d’ does not specify a valid kind of secret"
1583 decr = PassthroughDecryptor ()
1586 """Dummy for non-split mode: output file does not vary."""
1589 if mode & PDTCRYPT_SPLIT:
1590 def nextout (outfile):
1592 We were passed an fd as outs for accessing the destination
1593 directory where extracted archive components are supposed
1598 if PDTCRYPT_VERBOSE is True:
1599 noise ("PDT: no output file to close at this point")
1601 if PDTCRYPT_VERBOSE is True:
1602 noise ("PDT: release output file %r" % outfile)
1603 # cleanup happens automatically by the GC; the next
1604 # line will error out on account of an invalid fd
1607 assert total_obj > 0
1608 fname = PDTCRYPT_SPLITNAME % total_obj
1610 outfd = open2_dump_file (fname, outs, force=PDTCRYPT_OVERWRITE)
1611 except RuntimeError as exn:
1612 raise PDTSplitError (exn)
1613 return os.fdopen (outfd, "wb", closefd=True)
1617 """ESPIPE is normal on non-seekable stdio stream."""
        except OSError as exn:
            # ``os.errno`` was removed in Python 3.7; use the module proper
            import errno
            if exn.errno == errno.ESPIPE:
1624 def out (pt, outfile):
1628 if PDTCRYPT_VERBOSE is True:
1629 noise ("PDT:\t· decrypt plaintext %d B" % (npt))
1631 nn = outfile.write (pt)
1632 except OSError as exn: # probably ENOSPC
1633 raise DecryptionError ("error (%s)" % exn)
1635 raise DecryptionError ("write aborted after %d of %d B" % (nn, npt))
1639 # current object completed; in a valid archive this marks either
1640 # the start of a new header or the end of the input
1641 if ctleft == 0: # current object requires finalization
1642 if PDTCRYPT_VERBOSE is True:
1643 noise ("PDT: %d finalize" % tell (ins))
1646 except InvalidGCMTag as exn:
1647 raise DecryptionError ("error finalizing object %d (%d B): "
1648 "%r" % (total_obj, len (pt), exn)) \
1651 if PDTCRYPT_VERBOSE is True:
1652 noise ("PDT:\t· object validated")
1654 if PDTCRYPT_VERBOSE is True:
1655 noise ("PDT: %d hdr" % tell (ins))
1657 hdr = hdr_read_stream (ins)
1658 total_read += PDTCRYPT_HDR_SIZE
1659 except EndOfFile as exn:
1660 total_read += exn.remainder
                if total_ct + total_obj * PDTCRYPT_HDR_SIZE != total_read:
                    raise PDTDecryptionError ("ciphertext processed (%d B) plus "
                                              "overhead (%d × %d B) does not match "
                                              "the number of bytes read (%d)"
                                              % (total_ct, total_obj, PDTCRYPT_HDR_SIZE,
                                                 total_read))
1667 # the single good exit
1668 return total_read, total_obj, total_ct, total_pt
            except InvalidHeader as exn:
                # argument order follows the message: position, stream, error
                raise PDTDecryptionError ("invalid header at position %d in %r "
                                          "(%s)" % (tell (ins), ins, exn))
1672 if PDTCRYPT_VERBOSE is True:
1673 pretty = hdr_fmt_pretty (hdr)
1674 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1675 pretty.splitlines (), ""))
1676 ctcurrent = ctleft = hdr ["ctsize"]
1680 total_obj += 1 # used in file counter with split mode
1682 # finalization complete or skipped in case of first object in
1683 # stream; create a new output file if necessary
1684 outfile = nextout (outfile)
1686 if PDTCRYPT_VERBOSE is True:
1687 noise ("PDT: %d decrypt obj no. %d, %d B"
1688 % (tell (ins), total_obj, ctleft))
1690 # always allocate a new buffer since python-cryptography doesn’t allow
1691 # passing a bytearray :/
1692 nexpect = min (ctleft, PDTCRYPT_BLOCKSIZE)
1693 if PDTCRYPT_VERBOSE is True:
1694 noise ("PDT:\t· [%d] %d%% done, read block (%d B of %d B remaining)"
1696 100 - ctleft * 100 / (ctcurrent > 0 and ctcurrent or 1),
1698 ct = ins.read (nexpect)
1702 raise EndOfFile (nct,
1703 "hit EOF after %d of %d B in block [%d:%d); "
1704 "%d B ciphertext remaining for object no %d"
1705 % (nct, nexpect, off, off + nexpect, ctleft,
1711 if PDTCRYPT_VERBOSE is True:
1712 noise ("PDT:\t· decrypt ciphertext %d B" % (nct))
1713 pt = decr.process (ct)
1717 def deptdcrypt_mk_stream (kind, path):
1718 """Create stream from file or stdio descriptor."""
1719 if kind == PDTCRYPT_SINK:
1721 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: stdout")
1722 return sys.stdout.buffer
1724 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: file %s" % path)
1725 return io.FileIO (path, "w")
1726 if kind == PDTCRYPT_SOURCE:
1728 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: stdin")
1729 return sys.stdin.buffer
1731 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: file %s" % path)
1732 return io.FileIO (path, "r")
1734 raise ValueError ("bogus stream “%s” / %s" % (kind, path))
1737 def mode_depdtcrypt (mode, secret, ins, outs):
1739 total_read, total_obj, total_ct, total_pt = \
1740 depdtcrypt (mode, secret, ins, outs)
1741 except DecryptionError as exn:
1742 noise ("PDT: Decryption failed:")
1744 noise ("PDT: “%s”" % exn)
1746 noise ("PDT: Did you specify the correct key / password?")
1749 except PDTSplitError as exn:
1750 noise ("PDT: Split operation failed:")
1752 noise ("PDT: “%s”" % exn)
1754 noise ("PDT: Hint: target directory should be empty.")
1758 if PDTCRYPT_VERBOSE is True:
1759 noise ("PDT: decryption successful" )
1760 noise ("PDT: %.10d bytes read" % total_read)
1761 noise ("PDT: %.10d objects decrypted" % total_obj )
1762 noise ("PDT: %.10d bytes ciphertext" % total_ct )
1763 noise ("PDT: %.10d bytes plaintext" % total_pt )
1769 def mode_scrypt (pw, ins=None, nacl=None, fmt=PDTCRYPT_SCRYPT_INTRANATOR):
1771 paramversion = PDTCRYPT_DEFAULT_PVER
1773 hsh, nacl, version, paramversion = scrypt_hashsource (pw, ins)
1774 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1776 nacl = binascii.unhexlify (nacl)
1777 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1778 version = PDTCRYPT_DEFAULT_VER
1780 kdfname, params = defs ["kdf"]
1782 kdf = kdf_by_version (None, defs)
1783 hsh, _void = kdf (pw, nacl)
1787 if fmt == PDTCRYPT_SCRYPT_INTRANATOR:
1788 out = json.dumps ({ "salt" : base64.b64encode (nacl).decode ()
1789 , "key" : base64.b64encode (hsh) .decode ()
1790 , "paramversion" : paramversion })
1791 elif fmt == PDTCRYPT_SCRYPT_PARAMETERS:
1792 out = json.dumps ({ "salt" : binascii.hexlify (nacl).decode ()
1793 , "key" : binascii.hexlify (hsh) .decode ()
1794 , "version" : version
1795 , "scrypt_params" : { "N" : params ["N"]
1796 , "r" : params ["r"]
1797 , "p" : params ["p"]
1798 , "dkLen" : params ["dkLen"] } })
1800 raise RuntimeError ("bad scrypt output scheme %r" % fmt)
1805 def noise_output_candidates (cands, indent=8, cols=PDTCRYPT_TT_COLUMNS):
1807 Print a list of offsets without garbling the terminal too much.
1809 The indent is counted from column zero; if it is wide enough, the “PDT: ”
1810 marker will be prepended, considered part of the indentation.
1814 idt = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
1819 init = True # prevent leading separator
1822 raise ValueError ("the requested indentation exceeds the line "
1823 "width by %d" % (indent - wd))
1833 if lpos > wd: # line break
SLICE_START = 1 # ordering is important: at equal offsets the end of one
SLICE_END   = 0 # interval sorts before the start of the next
1852 def find_overlaps (slices):
1854 Find overlapping slices: iterate open/close points of intervals, tracking
1855 the ones open at any time.
    inside = set () # of indices into slices
    ovrlp  = set () # of indices into slices
1861 for i, s in enumerate (slices):
1862 bounds.append ((s [0], SLICE_START, i))
1863 bounds.append ((s [1], SLICE_END , i))
1864 bounds = sorted (bounds)
1868 if val [1] == SLICE_START:
1871 if len (inside) > 1: # closing one that overlapped
1875 return [ slices [i] for i in ovrlp ]
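# The sweep over interval boundaries can be tried in isolation with the sketch
# below. It follows the same idea (sort open/close points, track the set of
# intervals currently open) but is a standalone illustration: the names
# ``sketch_find_overlaps``, ``SKETCH_START`` and ``SKETCH_END`` are
# hypothetical, and the result is sorted by input index so it is
# deterministic. Slices are assumed to be (start, end) pairs with
# start < end.

```python
SKETCH_END = 0    # at equal offsets, ends sort before starts ...
SKETCH_START = 1  # ... so slices that merely touch do not count as overlap


def sketch_find_overlaps(slices):
    """Return the slices that overlap at least one other slice."""
    bounds = []
    for i, (lo, hi) in enumerate(slices):
        bounds.append((lo, SKETCH_START, i))
        bounds.append((hi, SKETCH_END, i))

    inside = set()       # indices of slices currently open
    overlapping = set()  # indices of slices found to overlap
    for _pos, kind, i in sorted(bounds):
        if kind == SKETCH_START:
            inside.add(i)
        else:
            if len(inside) > 1:  # closing one that overlapped
                overlapping |= inside
            inside.remove(i)

    return [slices[i] for i in sorted(overlapping)]
```

# For example, with the slices (0, 10), (5, 15) and (15, 20) only the first
# two are reported; the third starts exactly where the second one ends.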
def mode_scan (secret, fname, outs=None, nacl=None):
    """
    Dissect a binary file, looking for PDTCRYPT headers and objects.

    If *outs* is supplied, recoverable data will be dumped into the specified
    directory.
    """
1886 ifd = os.open (fname, os.O_RDONLY)
1887 except FileNotFoundError:
1888 noise ("PDT: failed to open %s readonly" % fname)
1893 if PDTCRYPT_VERBOSE is True:
1894 noise ("PDT: scan for potential sync points")
1895 cands = locate_hdr_candidates (ifd)
1896 if len (cands) == 0:
1897 noise ("PDT: scan complete: input does not contain potential PDT "
1898 "headers; giving up.")
1900 if PDTCRYPT_VERBOSE is True:
1901 noise ("PDT: scan complete: found %d candidates:" % len (cands))
1902 noise_output_candidates (cands)
1907 junk, todo, slices = [], [], []
1912 vdt, hdr = inspect_hdr (ifd, cand)
1914 if vdt == HDR_CAND_JUNK:
1917 off0 = cand + PDTCRYPT_HDR_SIZE
1918 if PDTCRYPT_VERBOSE is True:
1919 noise ("PDT: obj %d: read payload @%d" % (nobj, off0))
1920 pretty = hdr_fmt_pretty (hdr)
1921 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1922 pretty.splitlines (), ""))
1925 if outs is not None:
1926 ofname = PDTCRYPT_RESCUENAME % nobj
1927 ofd = open2_dump_file (ofname, outs, force=PDTCRYPT_OVERWRITE)
1929 ctsize = hdr ["ctsize"]
1931 l = try_decrypt (ifd, off0, hdr, secret, ofd=ofd)
1933 slices.append ((off0, off0 + l))
1937 if vdt == HDR_CAND_GOOD and ok is True:
1938 noise ("PDT: %d → ✓ valid object %d–%d"
1939 % (cand, off0, off0 + ctsize))
1940 elif vdt == HDR_CAND_FISHY and ok is True:
1941 noise ("PDT: %d → × object %d–%d, corrupt header"
1942 % (cand, off0, off0 + ctsize))
1943 elif vdt == HDR_CAND_GOOD and ok is False:
1944 noise ("PDT: %d → × object %d–%d, problematic payload"
1945 % (cand, off0, off0 + ctsize))
1946 elif vdt == HDR_CAND_FISHY and ok is False:
1947 noise ("PDT: %d → × object %d–%d, corrupt header, problematic "
1948 "ciphertext" % (cand, off0, off0 + ctsize))
1955 noise ("PDT: all headers ok")
1957 noise ("PDT: %d candidates not parseable as headers:" % len (junk))
1958 noise_output_candidates (junk)
1960 overlap = find_overlaps (slices)
1961 if len (overlap) > 0:
1962 noise ("PDT: %d objects overlapping others" % len (overlap))
1963 for slice in overlap:
1964 noise ("PDT: × %d→%d" % (slice [0], slice [1]))
1966 def usage (err=False):
1970 indent = ' ' * len (SELF)
1971 out ("usage: %s SUBCOMMAND { --help" % SELF)
1972 out (" %s | [ -v ] { -p PASSWORD | -k KEY }" % indent)
1973 out (" %s [ { -i | --in } { - | SOURCE } ]" % indent)
1974 out (" %s [ { -n | --nacl } { SALT } ]" % indent)
1975 out (" %s [ { -o | --out } { - | DESTINATION } ]" % indent)
1976 out (" %s [ -D | --no-decrypt ] [ -S | --split ]" % indent)
1977 out (" %s [ -f | --format ]" % indent)
    out ("\t\tSUBCOMMAND main mode: { process | scrypt | scan }")
    out ("\t\t process: extract objects from PDT archive")
    out ("\t\t scrypt: calculate hash from password and first object")
    out ("\t\t scan: dissect a file, looking for PDT headers and objects")
1984 out ("\t\t-p PASSWORD password to derive the encryption key from")
1985 out ("\t\t-k KEY encryption key as 16 bytes in hexadecimal notation")
1986 out ("\t\t-s enforce strict handling of initialization vectors")
1987 out ("\t\t-i SOURCE file name to read from")
1988 out ("\t\t-o DESTINATION file to write output to")
1989 out ("\t\t-n SALT provide salt for scrypt mode in hex encoding")
1990 out ("\t\t-v print extra info")
1991 out ("\t\t-S split into files at object boundaries; this")
1992 out ("\t\t requires DESTINATION to refer to directory")
1993 out ("\t\t-D PDT header and ciphertext passthrough")
    out ("\t\t-f format of SCRYPT hash output (“i2n” or “params”)")
    out ("\tinstead of filenames, “-” may be used to specify stdin / stdout")
1998 sys.exit ((err is True) and 42 or 0)
2008 def parse_argv (argv):
2009 global PDTCRYPT_OVERWRITE
2011 mode = PDTCRYPT_DECRYPT
2017 scrypt_format = PDTCRYPT_SCRYPT_DEFAULT
2020 SELF = os.path.basename (next (argvi))
2023 rawsubcmd = next (argvi)
2024 subcommand = PDTCRYPT_SUB [rawsubcmd]
2025 except StopIteration:
2026 bail ("ERROR: subcommand required")
2028 bail ("ERROR: invalid subcommand “%s” specified" % rawsubcmd)
2034 except StopIteration:
2035 bail ("ERROR: argument list incomplete")
2037 def checked_secret (s):
2042 bail ("ERROR: encountered “%s” but secret already given" % arg)
2045 if arg in [ "-h", "--help" ]:
2048 elif arg in [ "-v", "--verbose", "--wtf" ]:
2049 global PDTCRYPT_VERBOSE
2050 PDTCRYPT_VERBOSE = True
2051 elif arg in [ "-i", "--in", "--source" ]:
2052 insspec = checked_arg ()
2053 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt from %s" % insspec)
2054 elif arg in [ "-p", "--password" ]:
2055 arg = checked_arg ()
2056 checked_secret (make_secret (password=arg))
2057 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with password")
2059 if subcommand == PDTCRYPT_SUB_PROCESS:
2060 if arg in [ "-s", "--strict-ivs" ]:
2061 global PDTCRYPT_STRICTIVS
2062 PDTCRYPT_STRICTIVS = True
2063 elif arg in [ "-o", "--out", "--dest", "--sink" ]:
2064 outsspec = checked_arg ()
2065 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2066 elif arg in [ "-f", "--force" ]:
2067 PDTCRYPT_OVERWRITE = True
2068 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2069 elif arg in [ "-S", "--split" ]:
2070 mode |= PDTCRYPT_SPLIT
2071 if PDTCRYPT_VERBOSE is True: noise ("PDT: split files")
2072 elif arg in [ "-D", "--no-decrypt" ]:
2073 mode &= ~PDTCRYPT_DECRYPT
2074 if PDTCRYPT_VERBOSE is True: noise ("PDT: not decrypting")
2075 elif arg in [ "-k", "--key" ]:
2076 arg = checked_arg ()
2077 checked_secret (make_secret (key=arg))
2078 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with key")
2080 bail ("ERROR: unexpected positional argument “%s”" % arg)
2081 elif subcommand == PDTCRYPT_SUB_SCRYPT:
2082 if arg in [ "-n", "--nacl", "--salt" ]:
2083 nacl = checked_arg ()
2084 if PDTCRYPT_VERBOSE is True: noise ("PDT: salt key with %s" % nacl)
2085 elif arg in [ "-f", "--format" ]:
2086 arg = checked_arg ()
2088 scrypt_format = PDTCRYPT_SCRYPT_FORMAT [arg]
2090 bail ("ERROR: invalid scrypt output format %s" % arg)
2091 if PDTCRYPT_VERBOSE is True:
2092 noise ("PDT: scrypt output format “%s”" % scrypt_format)
2094 bail ("ERROR: unexpected positional argument “%s”" % arg)
2095 elif subcommand == PDTCRYPT_SUB_SCAN:
2096 if arg in [ "-o", "--out", "--dest", "--sink" ]:
2097 outsspec = checked_arg ()
2098 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2099 elif arg in [ "-f", "--force" ]:
2100 PDTCRYPT_OVERWRITE = True
2101 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2103 bail ("ERROR: unexpected positional argument “%s”" % arg)
2106 if PDTCRYPT_VERBOSE is True:
2107 noise ("ERROR: no password or key specified, trying $PDTCRYPT_PASSWORD")
2108 epw = os.getenv ("PDTCRYPT_PASSWORD")
2110 checked_secret (make_secret (password=epw.strip ()))
2113 if PDTCRYPT_VERBOSE is True:
2114 noise ("ERROR: no password or key specified, trying $PDTCRYPT_KEY")
2115 ek = os.getenv ("PDTCRYPT_KEY")
2117 checked_secret (make_secret (key=ek.strip ()))
2120 if subcommand == PDTCRYPT_SUB_SCRYPT:
2121 bail ("ERROR: scrypt hash mode requested but no password given")
2122 elif mode & PDTCRYPT_DECRYPT:
            bail ("ERROR: decryption requested but no password or key given")
2125 if mode & PDTCRYPT_SPLIT and outsspec is None:
2126 bail ("ERROR: split mode is incompatible with stdout sink "
2129 if subcommand == PDTCRYPT_SUB_SCAN and outsspec is None:
2130 pass # no output by default in scan mode
2131 elif mode & PDTCRYPT_SPLIT or subcommand == PDTCRYPT_SUB_SCAN:
2132 # destination must be directory
2134 bail ("ERROR: mode is incompatible with stdout sink")
2137 os.makedirs (outsspec, 0o700)
2138 except FileExistsError:
2139 # if it’s a directory with appropriate perms, everything is
2140 # good; otherwise, below invocation of open(2) will fail
2142 outs = os.open (outsspec, os.O_DIRECTORY, 0o600)
2143 except FileNotFoundError as exn:
2144 bail ("ERROR: cannot create target directory “%s”" % outsspec)
2145 except NotADirectoryError as exn:
2146 bail ("ERROR: target path “%s” is not a directory" % outsspec)
2148 outs = deptdcrypt_mk_stream (PDTCRYPT_SINK, outsspec or "-")
2150 if subcommand == PDTCRYPT_SUB_SCAN:
2152 bail ("ERROR: please supply an input file for scanning")
2154 bail ("ERROR: input must be seekable; please specify a file")
2155 return True, partial (mode_scan, secret, insspec, outs, nacl=nacl)
2157 if subcommand == PDTCRYPT_SUB_SCRYPT:
2158 if secret [0] == PDTCRYPT_SECRET_KEY:
2159 bail ("ERROR: scrypt mode requires a password")
2160 if insspec is not None and nacl is not None \
2161 or insspec is None and nacl is None :
2162 bail ("ERROR: please supply either an input file or "
2167 if insspec is not None or subcommand != PDTCRYPT_SUB_SCRYPT:
2168 ins = deptdcrypt_mk_stream (PDTCRYPT_SOURCE, insspec or "-")
2170 if subcommand == PDTCRYPT_SUB_SCRYPT:
2171 return True, partial (mode_scrypt, secret [1].encode (), ins, nacl,
2174 return True, partial (mode_depdtcrypt, mode, secret, ins, outs)
2178 ok, runner = parse_argv (argv)
2180 if ok is True: return runner ()
2185 if __name__ == "__main__":
2186 sys.exit (main (sys.argv))