===============================================================================
crypto -- Encryption Layer for the Deltatar Backup
===============================================================================
- AES-GCM for the symmetric encryption;
- NIST Recommendation for Block Cipher Modes of Operation: Galois/Counter
  Mode (GCM) and GMAC
  http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf
- https://cryptome.org/2014/01/aes-gcm-v1.pdf
- Authentication weaknesses in GCM
  http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf

Trouble with python-cryptography packages: authentication tags can only be
passed in advance: https://github.com/pyca/cryptography/pull/3421
-------------------------------------------------------------------------------

Errors fall into roughly three categories:

- Cryptographic errors or invalid data.

  - ``InvalidGCMTag`` (decryption failed on account of an invalid GCM
    tag),
  - ``InvalidIVFixedPart`` (IV fixed part of object not found in list),
  - ``DuplicateIV`` (the IV of an encrypted object already occurred),
  - ``DecryptionError`` (used in CLI decryption for presenting error
    conditions to the user).

- Incorrect usage of the library.

  - ``InvalidParameter`` (non-conforming user supplied parameter),
  - ``InvalidHeader`` (data passed for reading not parsable into header),
  - ``FormatError`` (cannot handle header or parameter version),

- Bad internal state. If one of these is encountered, it means that a state
  was reached that shouldn’t occur during normal processing.

Also, ``EndOfFile`` is used as a sentinel to communicate that a stream supplied
for reading is exhausted.
Initialization Vectors
-------------------------------------------------------------------------------

Initialization vectors are checked for reuse during the lifetime of a decryptor.
The fixed counters for metadata files cannot be reused and attempts to do so
will cause a ``DuplicateIV`` error. This means the length of objects encrypted
with a metadata counter is capped at 63 GB.
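
The 63 GB figure sits just below the per-IV plaintext limit of GCM, which NIST
SP 800-38D gives as 2^39 − 256 bits. The arithmetic can be checked with a short
sketch; the constant names mirror those defined further down in this module,
and "GB" is read here as 2^30 bytes, which is an assumption about the unit:

```python
# Per-IV plaintext limit of AES-GCM according to NIST SP 800-38D, in bits.
GCM_MAX_BITS = (1 << 39) - (1 << 8)

# The same limit expressed in bytes.
AES_GCM_MAX_SIZE = GCM_MAX_BITS // 8
assert AES_GCM_MAX_SIZE == (1 << 36) - (1 << 5)

# The default object size cap stays below that limit.
PDTCRYPT_MAX_OBJ_SIZE_DEFAULT = 63 * (1 << 30)
assert PDTCRYPT_MAX_OBJ_SIZE_DEFAULT < AES_GCM_MAX_SIZE

# Remaining headroom in bytes: 2^30 - 32.
headroom = AES_GCM_MAX_SIZE - PDTCRYPT_MAX_OBJ_SIZE_DEFAULT
print(headroom)
```
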

For ordinary, non-metadata payload, there is an optional mode with strict IV
checking that causes a crypto context to fail if an IV encountered or created
was already used for decrypting or encrypting, respectively, an earlier object.
Note that this mode can trigger false positives when decrypting non-linearly,
e.g. when traversing the same object multiple times. Since the crypto context
has no notion of a position in a PDT encrypted archive, this condition must be
sorted out downstream.
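
The strict mode described above can be pictured with a small sketch. It packs
the 12 byte IV from an 8 byte fixed part and a 32 bit little-endian counter and
tracks used IVs in a set. ``iv_make``, ``check_iv`` and ``DuplicateIVSketch``
are hypothetical stand-ins written for this illustration, not the module's own
classes:

```python
import os
import struct

FMT_IV = "<8sL"   # 8 byte fixed part, 32 bit little-endian counter


class DuplicateIVSketch(Exception):
    # Hypothetical stand-in for the module's DuplicateIV exception.
    pass


def iv_make(fixed, cnt):
    # Assemble the 12 byte IV from the fixed part and the object counter.
    return struct.pack(FMT_IV, fixed, cnt)


def check_iv(used_ivs, iv, strict=True):
    # With strict checking, reject an IV that was seen before;
    # in any other case record it.
    if strict and iv in used_ivs:
        raise DuplicateIVSketch("iv %s was reused" % iv.hex())
    used_ivs.add(iv)


used = set()
fixed = os.urandom(8)
first = iv_make(fixed, 3)
second = iv_make(fixed, 4)
check_iv(used, first)
check_iv(used, second)

# Traversing the first object a second time trips the strict check even
# though nothing is wrong with the archive: the false positive noted above.
try:
    check_iv(used, first)
    replayed = False
except DuplicateIVSketch:
    replayed = True
```

A caller that reads non-linearly would either disable strict checking or use a
fresh context per pass.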
-------------------------------------------------------------------------------

``crypto.py`` may be invoked as a script for decrypting, validating, and
splitting PDT encrypted files. Consult the usage message for details.

Decrypt from stdin using the password ‘foo’: ::

    $ crypto.py process foo -i - -o - <some-file.tar.gz.pdtcrypt >some-file.tar.gz

Output verbose information about the encrypted objects in the archive: ::

    $ crypto.py process foo -v -i some-file.tar.gz.pdtcrypt -o /dev/null
    PDT: decrypt from some-file.tar.gz.pdtcrypt
    PDT: decrypt to /dev/null
    PDT: source: file some-file.tar.gz.pdtcrypt
    PDT: sink: file /dev/null
    PDT: · version = 1 : 0100
    PDT: · paramversion = 1 : 0100
    PDT: · nacl : d270 b031 00d1 87e2 c946 610d 7b7f 7e5f
    PDT: · iv : 02ee 3dd7 a963 1eb1 0100 0000
    PDT: · ctsize = 591 : 4f02 0000 0000 0000
    PDT: · tag : 5b2d 6d8b 8f82 4842 12fd 0b10 b6e3 369b
    PDT: 64 decrypt obj no. 1, 591 B
    PDT: · [64] 0% done, read block (591 B of 591 B remaining)
    PDT: · decrypt ciphertext 591 B
    PDT: · decrypt plaintext 591 B
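
The fields in the verbose output map onto the 64 byte object header. The sketch
below rebuilds a header from the sample values and parses it again. The format
string ``<8sHH16s12sQ16s`` is an assumption reconstructed from the field sizes
defined in this module (8, 2, 2, 16, 12, 8 and 16 bytes); ``unhex`` is a helper
made up for this illustration:

```python
import binascii
import struct

# Assumed layout, little endian: magic, version, paramversion, salt, IV,
# ciphertext size, GCM tag.
HDR_FMT = "<8sHH16s12sQ16s"
assert struct.calcsize(HDR_FMT) == 64


def unhex(s):
    # Turn the spaced hexdump notation of the verbose output into bytes.
    return binascii.unhexlify(s.replace(" ", ""))


nacl = unhex("d270 b031 00d1 87e2 c946 610d 7b7f 7e5f")
iv = unhex("02ee 3dd7 a963 1eb1 0100 0000")
tag = unhex("5b2d 6d8b 8f82 4842 12fd 0b10 b6e3 369b")

raw = struct.pack(HDR_FMT, b"PDTCRYPT", 1, 1, nacl, iv, 591, tag)

# The size field serializes to the bytes shown in the ctsize line.
assert raw[40:48] == unhex("4f02 0000 0000 0000")

magic, version, paramversion, salt, iv2, ctsize, tag2 = \
    struct.unpack(HDR_FMT, raw)

# The IV splits into an 8 byte fixed part and a 32 bit counter.
fixed, counter = struct.unpack("<8sL", iv2)
print(version, paramversion, ctsize, counter)
```
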

Also, the mode *scrypt* allows deriving encryption keys. To calculate the
encryption key from the password ‘foo’ and the salt of the first object in a
PDT encrypted file: ::

    $ crypto.py scrypt foo -i some-file.pdtcrypt
    {"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", "key": "JH9EkMwaM4x9F5aim5gK/Q=="}

The computed 16 byte key is given base64 encoded in the value of ``key`` and
can be fed into Python’s ``base64.b64decode()`` to obtain the corresponding
binary representation.
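
The values in the sample line end in ``==``, which suggests base64 encoding.
Under that assumption, a minimal sketch for turning the printed JSON into raw
key and salt bytes looks like this:

```python
import base64
import json

# Sample line as printed by the scrypt mode.
line = ('{"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", '
        '"key": "JH9EkMwaM4x9F5aim5gK/Q=="}')

info = json.loads(line)
key = base64.b64decode(info["key"])
salt = base64.b64decode(info["salt"])

# Both decode to 16 bytes, the AES key size and the salt size.
assert len(key) == 16
assert len(salt) == 16
```
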

Note that in Scrypt hashing mode, no data integrity checks are performed.
If the wrong password is given, a wrong key will be derived. Whether the password
was indeed correct can only be determined by decrypting. Note that since PDT
archives essentially consist of a stream of independent objects, the salt and
other parameters may change. Thus a key derived using the above method from the
first object doesn’t necessarily apply to any of the subsequent objects.
134 from functools import reduce, partial
145 except ImportError as exn:
148 if __name__ == "__main__": ## Work around the import mechanism lest Python’s
149 pwd = os.getcwd() ## preference for local imports causes a cyclical
150 ## import (crypto → pylibscrypt → […] → ./tarfile → crypto).
151 sys.path = [ p for p in sys.path if p.find ("deltatar") < 0 ]
154 from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
155 from cryptography.hazmat.backends import default_backend
159 __all__ = [ "hdr_make", "hdr_read", "hdr_fmt", "hdr_fmt_pretty"
161 , "PDTCRYPT_HDR_SIZE", "AES_GCM_IV_CNT_DATA"
162 , "AES_GCM_IV_CNT_INFOFILE", "AES_GCM_IV_CNT_INDEX"
166 ###############################################################################
168 ###############################################################################
170 class EndOfFile (Exception):
174 def __init__ (self, n=None, msg=None):
180 class InvalidParameter (Exception):
181 """Inputs not valid for PDT encryption."""
185 class InvalidHeader (Exception):
186 """Header not valid."""
190 class InvalidGCMTag (Exception):
192 The GCM tag calculated during decryption differs from that in the object
198 class InvalidIVFixedPart (Exception):
200 IV fixed part not in supplied list: either the backup is corrupt or the
201 current object does not belong to it.
206 class IVFixedPartError (Exception):
208 Error creating a unique IV fixed part: repeated calls to system RNG yielded
209 the same sequence of bytes as the last IV used.
214 class InvalidFileCounter (Exception):
216 When encrypting, an attempted reuse of a dedicated counter (info file,
217 index file) was caught.
222 class DuplicateIV (Exception):
    During encryption, the current IV (fixed part and file counter) is
    identical to one that was already used. This indicates tampering or
    programmer error and cannot be recovered from.
231 class NonConsecutiveIV (Exception):
233 IVs not numbered consecutively. This is a hard error with strict IV
234 checking. Precludes random access to the encrypted objects.
239 class FormatError (Exception):
240 """Unusable parameters in header."""
244 class DecryptionError (Exception):
245 """Error during decryption with ``crypto.py`` on the command line."""
249 class Unreachable (Exception):
251 Makeshift __builtin_unreachable(); always a programmer error if
257 class InternalError (Exception):
258 """Errors not ascribable to bad user inputs or cryptography."""
262 ###############################################################################
263 ## crypto layer version
264 ###############################################################################
266 ENCRYPTION_PARAMETERS = \
268 { "kdf": ("dummy", 16)
269 , "enc": "passthrough" }
277 , "enc": "aes-gcm" } }
279 ###############################################################################
281 ###############################################################################
283 PDTCRYPT_HDR_MAGIC = b"PDTCRYPT"
285 PDTCRYPT_HDR_SIZE_MAGIC = 8 # 8
286 PDTCRYPT_HDR_SIZE_VERSION = 2 # 10
287 PDTCRYPT_HDR_SIZE_PARAMVERSION = 2 # 12
288 PDTCRYPT_HDR_SIZE_NACL = 16 # 28
289 PDTCRYPT_HDR_SIZE_IV = 12 # 40
290 PDTCRYPT_HDR_SIZE_CTSIZE = 8 # 48
291 PDTCRYPT_HDR_SIZE_TAG = 16 # 64 GCM auth tag
293 PDTCRYPT_HDR_SIZE = PDTCRYPT_HDR_SIZE_MAGIC + PDTCRYPT_HDR_SIZE_VERSION \
294 + PDTCRYPT_HDR_SIZE_PARAMVERSION + PDTCRYPT_HDR_SIZE_NACL \
295 + PDTCRYPT_HDR_SIZE_IV + PDTCRYPT_HDR_SIZE_CTSIZE \
296 + PDTCRYPT_HDR_SIZE_TAG # = 64
298 # precalculate offsets since Python can’t do constant folding over names
299 HDR_OFF_VERSION = PDTCRYPT_HDR_SIZE_MAGIC
300 HDR_OFF_PARAMVERSION = HDR_OFF_VERSION + PDTCRYPT_HDR_SIZE_VERSION
301 HDR_OFF_NACL = HDR_OFF_PARAMVERSION + PDTCRYPT_HDR_SIZE_PARAMVERSION
302 HDR_OFF_IV = HDR_OFF_NACL + PDTCRYPT_HDR_SIZE_NACL
303 HDR_OFF_CTSIZE = HDR_OFF_IV + PDTCRYPT_HDR_SIZE_IV
304 HDR_OFF_TAG = HDR_OFF_CTSIZE + PDTCRYPT_HDR_SIZE_CTSIZE
308 FMT_I2N_IV = "<8sL" # 8 random bytes ‖ 32 bit counter
FMT_I2N_HDR = ("<"     # little endian
313 "16s" # sodium chloride
319 AES_KEY_SIZE = 16 # b"0123456789abcdef"
320 AES_KEY_SIZE_B64 = 24 # b'MDEyMzQ1Njc4OWFiY2RlZg=='
AES_GCM_MAX_SIZE = (1 << 36) - (1 << 5) # 2^39 - 2^8 bits ≅ 64 GB
322 PDTCRYPT_MAX_OBJ_SIZE_DEFAULT = 63 * (1 << 30) # 63 GB
323 PDTCRYPT_MAX_OBJ_SIZE = PDTCRYPT_MAX_OBJ_SIZE_DEFAULT
# index and info files are written on the fly while encrypting so their
# counters must be available in advance
327 AES_GCM_IV_CNT_INFOFILE = 1 # constant
328 AES_GCM_IV_CNT_INDEX = AES_GCM_IV_CNT_INFOFILE + 1
329 AES_GCM_IV_CNT_DATA = AES_GCM_IV_CNT_INDEX + 1 # also for multivolume
330 AES_GCM_IV_CNT_MAX_DEFAULT = 0xffFFffFF
331 AES_GCM_IV_CNT_MAX = AES_GCM_IV_CNT_MAX_DEFAULT
333 # IV structure and generation
334 PDTCRYPT_IV_GEN_MAX_RETRIES = 10 # ×
335 PDTCRYPT_IV_FIXEDPART_SIZE = 8 # B
336 PDTCRYPT_IV_COUNTER_SIZE = 4 # B
338 # secret type: PW of string | KEY of char [16]
339 PDTCRYPT_SECRET_PW = 0
340 PDTCRYPT_SECRET_KEY = 1
342 ###############################################################################
344 ###############################################################################
350 # , paramversion : u16
356 # fn hdr_read (f : handle) -> hdrinfo;
357 # fn hdr_make (f : handle, h : hdrinfo) -> IOResult<usize>;
358 # fn hdr_fmt (h : hdrinfo) -> String;
363 Read bytes as header structure.
365 If the input could not be interpreted as a header, fail with
370 mag, version, paramversion, nacl, iv, ctsize, tag = \
371 struct.unpack (FMT_I2N_HDR, data)
372 except Exception as exn:
373 raise InvalidHeader ("error unpacking header from [%r]: %s"
374 % (binascii.hexlify (data), str (exn)))
376 if mag != PDTCRYPT_HDR_MAGIC:
377 raise InvalidHeader ("bad magic in header: expected [%s], got [%s]"
378 % (PDTCRYPT_HDR_MAGIC, mag))
381 { "version" : version
382 , "paramversion" : paramversion
390 def hdr_read_stream (instr):
392 Read header from stream at the current position.
394 Fail with ``InvalidHeader`` if insufficient bytes were read from the
395 stream, or if the content could not be interpreted as a header.
397 data = instr.read(PDTCRYPT_HDR_SIZE)
401 elif ldata != PDTCRYPT_HDR_SIZE:
402 raise InvalidHeader ("hdr_read_stream: expected %d B, received %d B"
403 % (PDTCRYPT_HDR_SIZE, ldata))
404 return hdr_read (data)
407 def hdr_from_params (version, paramversion, nacl, iv, ctsize, tag):
409 Assemble the necessary values into a PDTCRYPT header.
411 :type version: int to fit uint16_t
412 :type paramversion: int to fit uint16_t
413 :type nacl: bytes to fit uint8_t[16]
414 :type iv: bytes to fit uint8_t[12]
    :type ctsize: int to fit uint64_t
416 :type tag: bytes to fit uint8_t[16]
418 buf = bytearray (PDTCRYPT_HDR_SIZE)
419 bufv = memoryview (buf)
422 struct.pack_into (FMT_I2N_HDR, bufv, 0,
424 version, paramversion, nacl, iv, ctsize, tag)
425 except Exception as exn:
426 return False, "error assembling header: %s" % str (exn)
428 return True, bytes (buf)
431 def hdr_make_dummy (s):
433 Create a header sized block of bytes initialized to a value derived from a
434 string. Used to verify we’ve jumped back correctly to the actual position
435 of the object header.
437 c = reduce (lambda a, c: a + ord(c), s, 0) % 0xFF
438 return bytes (bytearray (struct.pack ("B", c)) * PDTCRYPT_HDR_SIZE)
443 Assemble a header from the given header structure.
445 return hdr_from_params (version=hdr.get("version"),
446 paramversion=hdr.get("paramversion"),
447 nacl=hdr.get("nacl"), iv=hdr.get("iv"),
448 ctsize=hdr.get("ctsize"), tag=hdr.get("tag"))
451 HDR_FMT = "I2n_header { version: %d, paramversion: %d, nacl: %s[%d]," \
452 " iv: %s[%d], ctsize: %d, tag: %s[%d] }"
455 """Format a header structure into readable output."""
456 return HDR_FMT % (h["version"], h["paramversion"],
457 binascii.hexlify (h["nacl"]), len(h["nacl"]),
458 binascii.hexlify (h["iv"]), len(h["iv"]),
460 binascii.hexlify (h["tag"]), len(h["tag"]))
463 def hex_spaced_of_bytes (b):
464 """Format bytes object, hexdump style."""
465 return " ".join ([ "%.2x%.2x" % (c1, c2)
466 for c1, c2 in zip (b[0::2], b[1::2]) ]) \
467 + (len (b) | 1 == len (b) and " %.2x" % b[-1] or "") # odd lengths
470 def hdr_iv_counter (h):
471 """Extract the variable part of the IV of the given header."""
472 _fixed, cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
476 def hdr_iv_fixed (h):
477 """Extract the fixed part of the IV of the given header."""
478 fixed, _cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
482 hdr_dump = hex_spaced_of_bytes
486 """version = %-4d : %s
487 paramversion = %-4d : %s
494 def hdr_fmt_pretty (h):
496 Format header structure into multi-line representation of its contents and
497 their raw representation. (Omit the implicit “PDTCRYPT” magic bytes that
498 precede every header.)
500 return HDR_FMT_PRETTY \
502 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["version"])),
504 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["paramversion"])),
505 hex_spaced_of_bytes (h["nacl"]),
506 hex_spaced_of_bytes (h["iv"]),
508 hex_spaced_of_bytes (struct.pack (FMT_UINT64_LE, h["ctsize"])),
509 hex_spaced_of_bytes (h["tag"]))
511 IV_FMT = "((f %s) (c %d))"
514 """Format the two components of an IV in a readable fashion."""
515 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
516 return IV_FMT % (binascii.hexlify (fixed), cnt)
519 ###############################################################################
521 ###############################################################################
523 class Location (object):
527 def restore_loc_fmt (loc):
529 % (loc.n, loc.offset)
531 def locate_hdr_candidates (fd):
533 Walk over instances of the magic string in the payload, collecting their
534 positions. If the offset of the first found instance is not zero, the file
535 begins with leading garbage.
537 :return: The list of offsets in the file.
541 mm = mmap.mmap(fd, 0, mmap.MAP_SHARED, mmap.PROT_READ)
544 pos = mm.find (PDTCRYPT_HDR_MAGIC, pos)
553 HDR_CAND_GOOD = 0 # header marks begin of valid object
554 HDR_CAND_FISHY = 1 # inconclusive (tag mismatch, obj overlap etc.)
555 HDR_CAND_JUNK = 2 # not a header / object unreadable
558 { HDR_CAND_GOOD : "valid"
559 , HDR_CAND_FISHY : "fishy"
560 , HDR_CAND_JUNK : "junk"
564 def verdict_fmt (vdt):
565 return HDR_VERDICT_NAME [vdt]
568 def inspect_hdr (fd, off):
570 Attempt to parse a header in *fd* at position *off*.
572 Returns a verdict about the quality of that header plus the parsed header
576 _ = os.lseek (fd, off, os.SEEK_SET)
578 if os.lseek (fd, 0, os.SEEK_CUR) != off:
579 if PDTCRYPT_VERBOSE is True:
580 noise ("PDT: %d → dismissed (lseek() past EOF)" % off)
581 return HDR_CAND_JUNK, None
583 raw = os.read (fd, PDTCRYPT_HDR_SIZE)
584 if len (raw) != PDTCRYPT_HDR_SIZE:
585 if PDTCRYPT_VERBOSE is True:
586 noise ("PDT: %d → dismissed (EOF inside header)" % off)
587 return HDR_CAND_JUNK, None
591 except InvalidHeader as exn:
592 if PDTCRYPT_VERBOSE is True:
593 noise ("PDT: %d → dismissed (invalid: [%s])" % (off, str (exn)))
594 return HDR_CAND_JUNK, None
596 obj0 = off + PDTCRYPT_HDR_SIZE
597 objX = obj0 + hdr ["ctsize"]
599 eof = os.lseek (fd, 0, os.SEEK_END)
601 if PDTCRYPT_VERBOSE is True:
602 noise ("PDT: %d → EOF inside object (%d≤%d≤%d); adjusting size to "
603 "%d" % (off, obj0, eof, objX, (eof - obj0)))
604 # try reading up to the end
605 hdr ["ctsize"] = eof - obj0
606 return HDR_CAND_FISHY, hdr
608 return HDR_CAND_GOOD, hdr
611 def try_decrypt (ifd, off, hdr, secret, ofd=-1):
613 Attempt to decrypt the object in the (seekable) descriptor *ifd* starting
614 at *off* using the metadata in *hdr* and *secret*. An output fd can be
615 specified with *ofd*; if it is *-1* – the default –, the decrypted payload
618 Always creates a fresh decryptor, so validation steps across objects don’t
621 Errors during GCM tag validation are ignored.
623 ctleft = hdr ["ctsize"]
627 if ks == PDTCRYPT_SECRET_PW:
628 decr = Decrypt (password=secret [1])
629 elif ks == PDTCRYPT_SECRET_KEY:
631 decr = Decrypt (key=key)
638 os.lseek (ifd, pos, os.SEEK_SET)
641 cnksiz = min (ctleft, PDTCRYPT_BLOCKSIZE)
642 cnk = os.read (ifd, cnksiz)
645 pt = decr.process (cnk)
650 except InvalidGCMTag:
651 noise ("PDT: GCM tag mismatch for object %d–%d"
652 % (off, off + hdr ["ctsize"]))
653 if len (pt) > 0 and ofd != -1:
656 except Exception as exn:
657 noise ("PDT: error decrypting object %d–%d@%d, %d B remaining [%s]"
658 % (off, off + hdr ["ctsize"], pos, ctleft, exn))
664 def readable_objects_offsets (ifd, secret, cands):
666 From a list of candidates, locate the ones that mark the start of actual
667 readable PDTCRYPT objects.
671 for i, cand in enumerate (cands):
672 vdt, hdr = inspect_hdr (ifd, cand)
673 if vdt == HDR_CAND_JUNK:
674 pass # ignore unreadable ones
675 elif vdt in [HDR_CAND_GOOD, HDR_CAND_FISHY]:
676 ctsize = hdr ["ctsize"]
677 off0 = cand + PDTCRYPT_HDR_SIZE
678 ok = try_decrypt (ifd, off0, hdr, secret) == ctsize
680 good.append ((cand, off0 + ctsize))
682 overlap = find_overlaps (good)
684 return [ g [0] for g in good ]
687 def reconstruct_offsets (fname, secret):
688 ifd = os.open (fname, os.O_RDONLY)
691 cands = locate_hdr_candidates (ifd)
692 return readable_objects_offsets (ifd, secret, cands)
697 ###############################################################################
699 ###############################################################################
701 def make_secret (password=None, key=None):
703 Safely create a “secret” value that consists either of a key or a password.
704 Inputs are validated: the password is accepted as (UTF-8 encoded) bytes or
705 string; for the key only a bytes object of the proper size or a base64
706 encoded string thereof is accepted.
708 If both are provided, the key is preferred over the password; no checks are
709 performed whether the key is derived from the password.
711 :returns: secret value if inputs were acceptable | None otherwise.
714 if isinstance (key, str) is True:
715 key = key.encode ("utf-8")
716 if isinstance (key, bytes) is True:
717 if len (key) == AES_KEY_SIZE:
718 return (PDTCRYPT_SECRET_KEY, key)
719 if len (key) == AES_KEY_SIZE * 2:
721 key = binascii.unhexlify (key)
722 return (PDTCRYPT_SECRET_KEY, key)
723 except binascii.Error: # garbage in string
725 if len (key) == AES_KEY_SIZE_B64:
727 key = base64.b64decode (key)
728 # the base64 processor is very tolerant and allows for
729 # arbitrary trailing and leading data thus the data obtained
730 # must be checked for the proper length
731 if len (key) == AES_KEY_SIZE:
732 return (PDTCRYPT_SECRET_KEY, key)
733 except binascii.Error: # “incorrect padding”
735 elif password is not None:
736 if isinstance (password, str) is True:
737 return (PDTCRYPT_SECRET_PW, password)
738 elif isinstance (password, bytes) is True:
740 password = password.decode ("utf-8")
741 return (PDTCRYPT_SECRET_PW, password)
742 except UnicodeDecodeError:
748 ###############################################################################
749 ## passthrough / null encryption
750 ###############################################################################
752 class PassthroughCipher (object):
754 tag = struct.pack ("<QQ", 0, 0)
756 def __init__ (self) : pass
758 def update (self, b) : return b
760 def finalize (self) : return b""
762 def finalize_with_tag (self, _) : return b""
764 ###############################################################################
765 ## convenience wrapper
766 ###############################################################################
769 def kdf_dummy (klen, password, _nacl):
771 Fake KDF for testing purposes that is called when parameter version zero is
    # encode first so the length used below is the length in bytes
    if isinstance (password, bytes) is False:
        password = password.encode ()
    q, r = divmod (klen, len (password))
    return password * q + password [:r], b""
780 SCRYPT_KEY_MEMO = { } # static because needed for both the info file and the archive
783 def kdf_scrypt (params, password, nacl):
785 Wrapper for the Scrypt KDF, corresponds to parameter version one. The
786 computation result is memoized based on the inputs to facilitate spawning
787 multiple encryption contexts.
792 dkLen = params["dkLen"]
795 nacl = os.urandom (params["NaCl_LEN"])
797 key_parms = (password, nacl, N, r, p, dkLen)
798 global SCRYPT_KEY_MEMO
799 if key_parms not in SCRYPT_KEY_MEMO:
800 SCRYPT_KEY_MEMO [key_parms] = \
801 pylibscrypt.scrypt (password, nacl, N, r, p, dkLen)
802 return SCRYPT_KEY_MEMO [key_parms], nacl
805 def kdf_by_version (paramversion=None, defs=None):
807 Pick the KDF handler corresponding to the parameter version or the
810 :rtype: function (password : str, nacl : str) -> str
812 if paramversion is not None:
813 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
815 raise InvalidParameter ("no encryption parameters for version %r"
817 (kdf, params) = defs["kdf"]
819 if kdf == "scrypt" : fn = kdf_scrypt
820 if kdf == "dummy" : fn = kdf_dummy
822 raise ValueError ("key derivation method %r unknown" % kdf)
823 return partial (fn, params)
826 ###############################################################################
828 ###############################################################################
830 def scrypt_hashsource (pw, ins):
832 Calculate the SCRYPT hash from the password and the information contained
833 in the first header found in ``ins``.
835 This does not validate whether the first object is encrypted correctly.
837 if isinstance (pw, str) is True:
839 elif isinstance (pw, bytes) is False:
840 raise InvalidParameter ("password must be a string, not %s"
842 if isinstance (ins, io.BufferedReader) is False and \
843 isinstance (ins, io.FileIO) is False:
844 raise InvalidParameter ("file to hash must be opened in “binary” mode")
847 hdr = hdr_read_stream (ins)
848 except EndOfFile as exn:
849 noise ("PDT: malformed input: end of file reading first object header")
854 pver = hdr ["paramversion"]
855 if PDTCRYPT_VERBOSE is True:
856 noise ("PDT: salt of first object : %s" % binascii.hexlify (nacl))
857 noise ("PDT: parameter version of archive : %d" % pver)
860 defs = ENCRYPTION_PARAMETERS.get(pver, None)
861 kdfname, params = defs ["kdf"]
862 if kdfname != "scrypt":
863 noise ("PDT: input is not an SCRYPT archive")
866 kdf = kdf_by_version (None, defs)
867 except ValueError as exn:
868 noise ("PDT: object has unknown parameter version %d" % pver)
870 hsh, _void = kdf (pw, nacl)
872 return hsh, nacl, hdr ["version"], pver
875 def scrypt_hashfile (pw, fname):
877 Calculate the SCRYPT hash from the password and the information contained
878 in the first header found in the given file. The header is read only at
881 with deptdcrypt_mk_stream (PDTCRYPT_SOURCE, fname or "-") as ins:
882 hsh, _void, _void, _void = scrypt_hashsource (pw, ins)
886 ###############################################################################
888 ###############################################################################
890 class Crypto (object):
892 Encryption context to remain alive throughout an entire tarfile pass.
897 cnt = None # file counter (uint32_t != 0)
898 iv = None # current IV
899 fixed = None # accu for 64 bit fixed parts of IV
900 used_ivs = None # tracks IVs
901 strict_ivs = False # if True, panic on duplicate object IV
910 info_counter_used = False
911 index_counter_used = False
913 def __init__ (self, *al, **akv):
914 self.used_ivs = set ()
915 self.set_parameters (*al, **akv)
918 def next_fixed (self):
923 def set_object_counter (self, cnt=None):
925 Safely set the internal counter of encrypted objects. Numerous
928 The same counter may not be reused in combination with one IV fixed
929 part. This is validated elsewhere in the IV handling.
931 Counter zero is invalid. The first two counters are reserved for
932 metadata. The implementation does not allow for splitting metadata
933 files over multiple encrypted objects. (This would be possible by
        assigning new fixed parts.) Thus in a Deltatar backup there is at most
        one object with a counter value of one and at most one with a counter
        value of two. On creation of a
936 context, the initial counter may be chosen. The globals
937 ``AES_GCM_IV_CNT_INFOFILE`` and ``AES_GCM_IV_CNT_INDEX`` can be used to
938 request one of the reserved values. If one of these values has been
939 used, any further attempt of setting the counter to that value will
940 be rejected with an ``InvalidFileCounter`` exception.
942 Out of bounds values (i. e. below one and more than the maximum of 2³²)
943 cause an ``InvalidParameter`` exception to be thrown.
946 self.cnt = AES_GCM_IV_CNT_DATA
948 if cnt == 0 or cnt > AES_GCM_IV_CNT_MAX + 1:
949 raise InvalidParameter ("invalid counter value %d requested: "
950 "acceptable values are from 1 to %d"
951 % (cnt, AES_GCM_IV_CNT_MAX))
952 if cnt == AES_GCM_IV_CNT_INFOFILE:
953 if self.info_counter_used is True:
954 raise InvalidFileCounter ("attempted to reuse info file "
955 "counter %d: must be unique" % cnt)
956 self.info_counter_used = True
957 elif cnt == AES_GCM_IV_CNT_INDEX:
958 if self.index_counter_used is True:
959 raise InvalidFileCounter ("attempted to reuse index file "
960 " counter %d: must be unique" % cnt)
961 self.index_counter_used = True
962 if cnt <= AES_GCM_IV_CNT_MAX:
965 # cnt == AES_GCM_IV_CNT_MAX + 1 → wrap
966 self.cnt = AES_GCM_IV_CNT_DATA
970 def set_parameters (self, password=None, key=None, paramversion=None,
971 nacl=None, counter=None, strict_ivs=False):
973 Configure the internal state of a crypto context. Not intended for
977 self.set_object_counter (counter)
978 self.strict_ivs = strict_ivs
980 if paramversion is not None:
981 self.paramversion = paramversion
984 self.key, self.nacl = key, nacl
987 if password is not None:
988 if isinstance (password, bytes) is False:
989 password = str.encode (password)
990 self.password = password
991 if paramversion is None and nacl is None:
992 # postpone key setup until first header is available
994 kdf = kdf_by_version (paramversion)
996 self.key, self.nacl = kdf (password, nacl)
999 def process (self, buf):
1001 Encrypt / decrypt a buffer. Invokes the ``.update()`` method on the
1002 wrapped encryptor or decryptor, respectively.
1004 The Cryptography exception ``AlreadyFinalized`` is translated to an
1005 ``InternalError`` at this point. It may occur in sound code when the GC
1006 closes an encrypting stream after an error. Everywhere else it must be
1009 if self.enc is None:
1010 raise RuntimeError ("process: context not initialized")
1011 self.stats ["in"] += len (buf)
1013 out = self.enc.update (buf)
1014 except cryptography.exceptions.AlreadyFinalized as exn:
1015 raise InternalError (exn)
1016 self.stats ["out"] += len (out)
1020 def next (self, password, paramversion, nacl, iv):
1022 Prepare for encrypting another object: Reset the data counters and
1023 change the configuration in case one of the variable parameters differs
1024 from the last object. Also check the IV for duplicates and error out
1025 if strict checking was requested.
1029 self.stats ["obj"] += 1
1031 self.check_duplicate_iv (iv)
1033 if ( self.paramversion != paramversion
1034 or self.password != password
1035 or self.nacl != nacl):
1036 self.set_parameters (password=password, paramversion=paramversion,
1037 nacl=nacl, strict_ivs=self.strict_ivs)
1040 def check_duplicate_iv (self, iv):
        Add an IV (the 12 byte representation as in the header) to the set of
        used IVs. With strict checking enabled, an IV that is already in the
        set causes a ``DuplicateIV`` to be thrown. Depending on the context,
        this may indicate a serious error (IV reuse).
1046 if self.strict_ivs is True and iv in self.used_ivs:
1047 raise DuplicateIV ("iv %s was reused" % iv_fmt (iv))
        # iv has not been used before; add to collection
1049 self.used_ivs.add (iv)
1052 def counters (self):
1054 Access the data counters.
1056 return self.stats ["obj"], self.stats ["in"], self.stats ["out"]
1061 Clear the current context regardless of its finalization state. The
1062 next operation must be ``.next()``.
1067 class Encrypt (Crypto):
1073 def __init__ (self, version, paramversion, password=None, key=None, nacl=None,
1074 counter=AES_GCM_IV_CNT_DATA, strict_ivs=True):
1076 The ctor will throw immediately if one of the parameters does not conform
1077 to our expectations.
1080 :type version: int to fit uint16_t
1081 :type paramversion: int to fit uint16_t
1082 :param password: mutually exclusive with ``key``
1083 :type password: bytes
1084 :param key: mutually exclusive with ``password``
        :param counter: initial object counter; the values
                        ``AES_GCM_IV_CNT_INFOFILE`` and
                        ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
                        and cannot be reused even with different fixed parts.
1091 :type strict_ivs: bool
1093 if password is None and key is None \
1094 or password is not None and key is not None :
1095 raise InvalidParameter ("__init__: need either key or password")
1098 if isinstance (key, bytes) is False:
1099 raise InvalidParameter ("__init__: key must be provided as "
1100 "bytes, not %s" % type (key))
1102 raise InvalidParameter ("__init__: salt must be provided along "
1103 "with encryption key")
1104 else: # password, no key
1105 if isinstance (password, str) is False:
1106 raise InvalidParameter ("__init__: password must be a string, not %s"
1108 if len (password) == 0:
1109 raise InvalidParameter ("__init__: supplied empty password but not "
1110 "permitted for PDT encrypted files")
1112 if isinstance (version, int) is False:
1113 raise InvalidParameter ("__init__: version number must be an "
1114 "integer, not %s" % type (version))
1116 raise InvalidParameter ("__init__: version number must be a "
1117 "nonnegative integer, not %d" % version)
1119 if isinstance (paramversion, int) is False:
1120 raise InvalidParameter ("__init__: crypto parameter version number "
1121 "must be an integer, not %s"
1122 % type (paramversion))
1123 if paramversion < 0:
1124 raise InvalidParameter ("__init__: crypto parameter version number "
1125 "must be a nonnegative integer, not %d"
1128 if nacl is not None:
1129 if isinstance (nacl, bytes) is False:
1130 raise InvalidParameter ("__init__: salt given, but of type %s "
1131 "instead of bytes" % type (nacl))
1132 # salt length would depend on the actual encryption so it can’t be
1133 # validated at this point
1135 self.version = version
1136 self.paramenc = ENCRYPTION_PARAMETERS.get (paramversion) ["enc"]
1138 super().__init__ (password, key, paramversion, nacl, counter=counter,
1139 strict_ivs=strict_ivs)
1142 def next_fixed (self, retries=PDTCRYPT_IV_GEN_MAX_RETRIES):
1144 Generate the next IV fixed part by reading eight bytes from
1145 ``/dev/urandom``. The buffer so obtained is tested against the fixed
1146 parts used so far to prevent accidental reuse of IVs. After a
1147 configurable number of attempts to create a unique fixed part, it will
1148 refuse to continue with an ``IVFixedPartError``. This is unlikely to
1149 ever happen on a normal system but may detect an issue with the random
1152 The list of fixed parts that were used by the context at hand can be
1153 accessed through the ``.fixed`` list. Its last element is the fixed
1154 part currently in use.
1158 fp = os.urandom (PDTCRYPT_IV_FIXEDPART_SIZE)
1159 if fp not in self.fixed:
1160 self.fixed.append (fp)
1163 raise IVFixedPartError ("error obtaining a unique IV fixed part from "
1164 "/dev/urandom; giving up after %d tries" % i)
1169 Construct a 12-byte IV from the current fixed part and the object
1172 return struct.pack(FMT_I2N_IV, self.fixed [-1], self.cnt)
1175 def next (self, filename=None, counter=None):
1177 Prepare for encrypting the next incoming object. Update the counter
1178 and put together the IV, possibly changing prefixes. Then create the
1181 The argument ``counter`` can be used to specify a file counter for this
1182 object. Unless it is one of the reserved values, the counter of
1183 subsequent objects will be computed from this one.
1185 If this is the first object in a series, ``filename`` is required,
1186 otherwise it is reused if not present. The value is used to derive a
1187 header sized placeholder to use until after encryption when all the
1188 inputs to construct the final header are available. This is then
1189 matched in ``.done()`` against the value found at the position of the
1190 header. The motivation for this extra check is primarily to assist
1191 format debugging: It makes stray headers easy to spot in malformed
1194 if filename is None:
1195 if self.lastinfo is None:
1196 raise InvalidParameter ("next: filename is mandatory for "
1198 filename, _dummy = self.lastinfo
1200 if isinstance (filename, str) is False:
1201 raise InvalidParameter ("next: filename must be a string, not %s"
1203 if counter is not None:
1204 if isinstance (counter, int) is False:
1205 raise InvalidParameter ("next: the supplied counter is of "
1206 "invalid type %s; please pass an "
1207 "integer instead" % type (counter))
1208 self.set_object_counter (counter)
1210 self.iv = self.iv_make ()
1211 if self.paramenc == "aes-gcm":
1213 ( algorithms.AES (self.key)
1214 , modes.GCM (self.iv)
1215 , backend = default_backend ()) \
1217 elif self.paramenc == "passthrough":
1218 self.enc = PassthroughCipher ()
1220 raise InvalidParameter ("next: parameter version %d not known"
1221 % self.paramversion)
1222 hdrdum = hdr_make_dummy (filename)
1223 self.lastinfo = (filename, hdrdum)
1224 super().next (self.password, self.paramversion, self.nacl, self.iv)
1226 self.set_object_counter (self.cnt + 1)
1230 def done (self, cmpdata):
1232 Complete encryption of an object. After this has been called, attempts
1233 to encrypt further data will cause an error until ``.next()`` is
1236 Returns the remaining ciphertext, the object header, and the list of IV fixed parts. The header is a 64-byte
1237 buffer holding all values including the “late” ones, e. g. the ciphertext size and the
1240 if isinstance (cmpdata, bytes) is False:
1241 raise InvalidParameter ("done: comparison input expected as bytes, "
1242 "not %s" % type (cmpdata))
1243 if self.lastinfo is None:
1244 raise RuntimeError ("done: encryption context not initialized")
1245 filename, hdrdum = self.lastinfo
1246 if cmpdata != hdrdum:
1247 raise RuntimeError ("done: bad sync of header for object %d: "
1248 "preliminary data does not match; this likely "
1249 "indicates a wrongly repositioned stream"
1251 data = self.enc.finalize ()
1252 self.stats ["out"] += len (data)
1253 self.ctsize += len (data)
1254 ok, hdr = hdr_from_params (self.version, self.paramversion, self.nacl,
1255 self.iv, self.ctsize, self.enc.tag)
1257 raise InternalError ("error constructing header: %r" % hdr)
1258 return data, hdr, self.fixed
1261 def process (self, buf):
1263 Encrypt a chunk of plaintext with the active encryptor. Returns the
1264 size of the input consumed. This **must** be checked downstream. If the
1265 maximum possible object size has been reached, the current context must
1266 be finalized and a new one established before any further data can be
1267 encrypted. The second argument is the remainder of the plaintext that
1268 was not encrypted for the caller to use immediately after the new
1271 if isinstance (buf, bytes) is False:
1272 raise InvalidParameter ("process: expected byte buffer, not %s"
1275 newptsize = self.ptsize + bsize
1276 diff = newptsize - PDTCRYPT_MAX_OBJ_SIZE
1279 newptsize = PDTCRYPT_MAX_OBJ_SIZE
1280 self.ptsize = newptsize
1281 data = super().process (buf [:bsize])
1282 self.ctsize += len (data)
1286 class Decrypt (Crypto):
1288 tag = None # GCM tag, part of header
1289 last_iv = None # check consecutive ivs in strict mode
1291 def __init__ (self, password=None, key=None, counter=None, fixedparts=None,
1294 Sanitizing ctor for the decryption context. ``fixedparts`` specifies a
1295 list of IV fixed parts accepted during decryption. If a fixed part is
1296 encountered that is not in the list, decryption will fail.
1298 :param password: mutually exclusive with ``key``
1299 :type password: str
1300 :param key: mutually exclusive with ``password``
1302 :param counter: initial object counter; the values
1303 ``AES_GCM_IV_CNT_INFOFILE`` and
1304 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1305 and cannot be reused even with different fixed parts.
1306 :type fixedparts: bytes list
1308 if password is None and key is None \
1309 or password is not None and key is not None :
1310 raise InvalidParameter ("__init__: need either key or password")
1313 if isinstance (key, bytes) is False:
1314 raise InvalidParameter ("__init__: key must be provided as "
1315 "bytes, not %s" % type (key))
1316 else: # password, no key
1317 if isinstance (password, str) is False:
1318 raise InvalidParameter ("__init__: password must be a string, not %s"
1320 if len (password) == 0:
1321 raise InvalidParameter ("__init__: supplied empty password but not "
1322 "permitted for PDT encrypted files")
1324 if fixedparts is not None:
1325 if isinstance (fixedparts, list) is False:
1326 raise InvalidParameter ("__init__: IV fixed parts must be "
1327 "supplied as list, not %s"
1328 % type (fixedparts))
1329 self.fixed = fixedparts
1332 super().__init__ (password=password, key=key, counter=counter,
1333 strict_ivs=strict_ivs)
1336 def valid_fixed_part (self, iv):
1338 Check if a fixed part was already seen.
1340 # check if fixed part is known
1341 fixed, _cnt = struct.unpack (FMT_I2N_IV, iv)
1342 i = bisect.bisect_left (self.fixed, fixed)
1343 return i != len (self.fixed) and self.fixed [i] == fixed
1346 def check_consecutive_iv (self, iv):
1348 Check whether the counter part of the given IV is indeed the successor
1349 of the currently present counter. This should always be the case for
1350 the objects in a well formed PDT archive but should not be enforced
1351 when decrypting out-of-order.
1353 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
1354 if self.strict_ivs is True \
1355 and self.last_iv is not None \
1356 and self.last_iv [0] == fixed \
1357 and self.last_iv [1] != cnt - 1:
1358 raise NonConsecutiveIV ("iv %s counter not successor of "
1359 "last object (expected %d, found %d)"
1360 % (fixed, self.last_iv [1] + 1, cnt))
1361 self.last_iv = (fixed, cnt) # keep the fixed part so the comparison above can match
1364 def next (self, hdr):
1366 Start decrypting the next object. The PDTCRYPT header for the object
1367 can be given either as already parsed object or as bytes.
1369 if isinstance (hdr, bytes) is True:
1370 hdr = hdr_read (hdr)
1371 elif isinstance (hdr, dict) is False:
1372 # this won’t catch malformed specs though
1373 raise InvalidParameter ("next: wrong type of parameter hdr: "
1374 "expected bytes or spec, got %s"
1377 paramversion = hdr ["paramversion"]
1382 raise InvalidHeader ("next: not a header %r" % hdr)
1384 super().next (self.password, paramversion, nacl, iv)
1385 if self.fixed is not None and self.valid_fixed_part (iv) is False:
1386 raise InvalidIVFixedPart ("iv %s has invalid fixed part"
1388 self.check_consecutive_iv (iv)
1391 defs = ENCRYPTION_PARAMETERS.get (paramversion, None)
1393 raise FormatError ("header contains unknown parameter version %d; "
1394 "maybe the file was created by a more recent "
1395 "version of Deltatar" % paramversion)
1397 if enc == "aes-gcm":
1399 ( algorithms.AES (self.key)
1400 , modes.GCM (iv, tag=self.tag)
1401 , backend = default_backend ()) \
1403 elif enc == "passthrough":
1404 self.enc = PassthroughCipher ()
1406 raise InternalError ("encryption parameter set %d refers to unknown "
1407 "mode %r" % (paramversion, enc))
1408 self.set_object_counter (self.cnt + 1)
1411 def done (self, tag=None):
1413 Stop decryption of the current object and finalize it with the active
1414 context. This will throw an *InvalidGCMTag* exception to indicate that
1415 the authentication tag does not match the data. If the tag is correct,
1416 the rest of the plaintext is returned.
1421 data = self.enc.finalize ()
1423 if isinstance (tag, bytes) is False:
1424 raise InvalidParameter ("done: wrong type of parameter "
1425 "tag: expected bytes, got %s"
1427 data = self.enc.finalize_with_tag (tag) # use the validated argument
1428 except cryptography.exceptions.InvalidTag:
1429 raise InvalidGCMTag ("done: tag mismatch of object %d: %s "
1430 "rejected by finalize ()"
1431 % (self.cnt, binascii.hexlify (self.tag)))
1432 self.ctsize += len (data)
1433 self.stats ["out"] += len (data)
1437 def process (self, buf):
1439 Decrypt the bytes object *buf* with the active decryptor.
1441 if isinstance (buf, bytes) is False:
1442 raise InvalidParameter ("process: expected byte buffer, not %s"
1444 self.ctsize += len (buf)
1445 data = super().process (buf)
1446 self.ptsize += len (data)
1450 ###############################################################################
1452 ###############################################################################
1454 def _patch_global (glob, vow, n=None):
1456 Adapt upper file counter bound for testing IV logic. Completely unsafe.
1458 assert vow == "I am fully aware that this will void my warranty."
1459 r = globals () [glob]
1461 n = globals () [glob + "_DEFAULT"]
1462 globals () [glob] = n
1465 _testing_set_AES_GCM_IV_CNT_MAX = \
1466 partial (_patch_global, "AES_GCM_IV_CNT_MAX")
1468 _testing_set_PDTCRYPT_MAX_OBJ_SIZE = \
1469 partial (_patch_global, "PDTCRYPT_MAX_OBJ_SIZE")
1471 def open2_dump_file (fname, dir_fd, force=False):
1474 oflags = os.O_CREAT | os.O_WRONLY
1476 oflags |= os.O_TRUNC
1481 outfd = os.open (fname, oflags,
1482 stat.S_IRUSR | stat.S_IWUSR, dir_fd=dir_fd)
1483 except FileExistsError as exn:
1484 noise ("PDT: refusing to overwrite existing file %s" % fname)
1486 raise RuntimeError ("destination file %s already exists" % fname)
1487 if PDTCRYPT_VERBOSE is True:
1488 noise ("PDT: new output file %s (fd=%d)" % (fname, outfd))
1492 ###############################################################################
1493 ## freestanding invocation
1494 ###############################################################################
1496 PDTCRYPT_SUB_PROCESS = 0
1497 PDTCRYPT_SUB_SCRYPT = 1
1498 PDTCRYPT_SUB_SCAN = 2
1501 { "process" : PDTCRYPT_SUB_PROCESS
1502 , "scrypt" : PDTCRYPT_SUB_SCRYPT
1503 , "scan" : PDTCRYPT_SUB_SCAN }
1505 PDTCRYPT_DECRYPT = 1 << 0 # decrypt archive with password
1506 PDTCRYPT_SPLIT = 1 << 1 # split archive into individual objects
1507 PDTCRYPT_HASH = 1 << 2 # output scrypt hash for file and given password
1509 PDTCRYPT_SPLITNAME = "pdtcrypt-object-%d.bin"
1510 PDTCRYPT_RESCUENAME = "pdtcrypt-rescue-object-%0.5d.bin"
1512 PDTCRYPT_VERBOSE = False
1513 PDTCRYPT_STRICTIVS = False
1514 PDTCRYPT_OVERWRITE = False
1515 PDTCRYPT_BLOCKSIZE = 1 << 12
1520 PDTCRYPT_DEFAULT_VER = 1
1521 PDTCRYPT_DEFAULT_PVER = 1
1523 # scrypt hashing output control
1524 PDTCRYPT_SCRYPT_INTRANATOR = 0
1525 PDTCRYPT_SCRYPT_PARAMETERS = 1
1526 PDTCRYPT_SCRYPT_DEFAULT = PDTCRYPT_SCRYPT_INTRANATOR
1528 PDTCRYPT_SCRYPT_FORMAT = \
1529 { "i2n" : PDTCRYPT_SCRYPT_INTRANATOR
1530 , "params" : PDTCRYPT_SCRYPT_PARAMETERS }
1532 PDTCRYPT_TT_COLUMNS = 80 # assume standard terminal
1534 class PDTDecryptionError (Exception):
1535 """Decryption failed."""
1537 class PDTSplitError (Exception):
1538 """Splitting the archive into objects failed."""
1541 def noise (*a, **b):
1542 print (file=sys.stderr, *a, **b)
1545 class PassthroughDecryptor (object):
1547 curhdr = None # write current header on first data write
1549 def __init__ (self):
1550 if PDTCRYPT_VERBOSE is True:
1551 noise ("PDT: no encryption; data passthrough")
1553 def next (self, hdr):
1554 ok, curhdr = hdr_make (hdr)
1556 raise PDTDecryptionError ("bad header %r" % hdr)
1557 self.curhdr = curhdr
1560 if self.curhdr is not None:
1564 def process (self, d):
1565 if self.curhdr is not None:
1571 def depdtcrypt (mode, secret, ins, outs):
1573 Remove PDTCRYPT layer from all objects encrypted with the secret. Used on a
1574 Deltatar backup this will yield a (possibly Gzip compressed) tarball.
1576 ctleft = -1 # length of ciphertext to consume
1577 ctcurrent = 0 # total ciphertext of current object
1578 total_obj = 0 # total number of objects read
1579 total_pt = 0 # total plaintext bytes
1580 total_ct = 0 # total ciphertext bytes
1581 total_read = 0 # total bytes read
1582 outfile = None # Python file object for output
1584 if mode & PDTCRYPT_DECRYPT: # decryptor
1586 if ks == PDTCRYPT_SECRET_PW:
1587 decr = Decrypt (password=secret [1], strict_ivs=PDTCRYPT_STRICTIVS)
1588 elif ks == PDTCRYPT_SECRET_KEY:
1590 decr = Decrypt (key=key, strict_ivs=PDTCRYPT_STRICTIVS)
1592 raise InternalError ("‘%d’ does not specify a valid kind of secret"
1595 decr = PassthroughDecryptor ()
1598 """Dummy for non-split mode: output file does not vary."""
1601 if mode & PDTCRYPT_SPLIT:
1602 def nextout (outfile):
1604 We were passed an fd as outs for accessing the destination
1605 directory where extracted archive components are supposed
1610 if PDTCRYPT_VERBOSE is True:
1611 noise ("PDT: no output file to close at this point")
1613 if PDTCRYPT_VERBOSE is True:
1614 noise ("PDT: release output file %r" % outfile)
1615 # cleanup happens automatically by the GC; the next
1616 # line will error out on account of an invalid fd
1619 assert total_obj > 0
1620 fname = PDTCRYPT_SPLITNAME % total_obj
1622 outfd = open2_dump_file (fname, outs, force=PDTCRYPT_OVERWRITE)
1623 except RuntimeError as exn:
1624 raise PDTSplitError (exn)
1625 return os.fdopen (outfd, "wb", closefd=True)
1629 """ESPIPE is normal on non-seekable stdio stream."""
1632 except OSError as exn:
1633 if exn.errno == errno.ESPIPE:
1636 def out (pt, outfile):
1640 if PDTCRYPT_VERBOSE is True:
1641 noise ("PDT:\t· decrypt plaintext %d B" % (npt))
1643 nn = outfile.write (pt)
1644 except OSError as exn: # probably ENOSPC
1645 raise DecryptionError ("error (%s)" % exn)
1647 raise DecryptionError ("write aborted after %d of %d B" % (nn, npt))
1651 # current object completed; in a valid archive this marks either
1652 # the start of a new header or the end of the input
1653 if ctleft == 0: # current object requires finalization
1654 if PDTCRYPT_VERBOSE is True:
1655 noise ("PDT: %d finalize" % tell (ins))
1658 except InvalidGCMTag as exn:
1659 raise DecryptionError ("error finalizing object %d (%d B): "
1660 "%r" % (total_obj, len (pt), exn)) \
1663 if PDTCRYPT_VERBOSE is True:
1664 noise ("PDT:\t· object validated")
1666 if PDTCRYPT_VERBOSE is True:
1667 noise ("PDT: %d hdr" % tell (ins))
1669 hdr = hdr_read_stream (ins)
1670 total_read += PDTCRYPT_HDR_SIZE
1671 except EndOfFile as exn:
1672 total_read += exn.remainder
1673 if total_ct + total_obj * PDTCRYPT_HDR_SIZE != total_read:
1674 raise PDTDecryptionError ("ciphertext processed (%d B) plus "
1675 "overhead (%d × %d B) does not match "
1676 "the number of bytes read (%d B)"
1677 % (total_ct, total_obj, PDTCRYPT_HDR_SIZE,
1679 # the single good exit
1680 return total_read, total_obj, total_ct, total_pt
1681 except InvalidHeader as exn:
1682 raise PDTDecryptionError ("invalid header at position %d in %r "
1683 "(%s)" % (tell (ins), ins, exn))
1684 if PDTCRYPT_VERBOSE is True:
1685 pretty = hdr_fmt_pretty (hdr)
1686 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1687 pretty.splitlines (), ""))
1688 ctcurrent = ctleft = hdr ["ctsize"]
1692 total_obj += 1 # used in file counter with split mode
1694 # finalization complete or skipped in case of first object in
1695 # stream; create a new output file if necessary
1696 outfile = nextout (outfile)
1698 if PDTCRYPT_VERBOSE is True:
1699 noise ("PDT: %d decrypt obj no. %d, %d B"
1700 % (tell (ins), total_obj, ctleft))
1702 # always allocate a new buffer since python-cryptography doesn’t allow
1703 # passing a bytearray :/
1704 nexpect = min (ctleft, PDTCRYPT_BLOCKSIZE)
1705 if PDTCRYPT_VERBOSE is True:
1706 noise ("PDT:\t· [%d] %d%% done, read block (%d B of %d B remaining)"
1708 100 - ctleft * 100 / (ctcurrent > 0 and ctcurrent or 1),
1710 ct = ins.read (nexpect)
1714 raise EndOfFile (nct,
1715 "hit EOF after %d of %d B in block [%d:%d); "
1716 "%d B ciphertext remaining for object no %d"
1717 % (nct, nexpect, off, off + nexpect, ctleft,
1723 if PDTCRYPT_VERBOSE is True:
1724 noise ("PDT:\t· decrypt ciphertext %d B" % (nct))
1725 pt = decr.process (ct)
1729 def deptdcrypt_mk_stream (kind, path):
1730 """Create stream from file or stdio descriptor."""
1731 if kind == PDTCRYPT_SINK:
1733 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: stdout")
1734 return sys.stdout.buffer
1736 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: file %s" % path)
1737 return io.FileIO (path, "w")
1738 if kind == PDTCRYPT_SOURCE:
1740 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: stdin")
1741 return sys.stdin.buffer
1743 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: file %s" % path)
1744 return io.FileIO (path, "r")
1746 raise ValueError ("bogus stream “%s” / %s" % (kind, path))
1749 def mode_depdtcrypt (mode, secret, ins, outs):
1751 total_read, total_obj, total_ct, total_pt = \
1752 depdtcrypt (mode, secret, ins, outs)
1753 except DecryptionError as exn:
1754 noise ("PDT: Decryption failed:")
1756 noise ("PDT: “%s”" % exn)
1758 noise ("PDT: Did you specify the correct key / password?")
1761 except PDTSplitError as exn:
1762 noise ("PDT: Split operation failed:")
1764 noise ("PDT: “%s”" % exn)
1766 noise ("PDT: Hint: target directory should be empty.")
1770 if PDTCRYPT_VERBOSE is True:
1771 noise ("PDT: decryption successful" )
1772 noise ("PDT: %.10d bytes read" % total_read)
1773 noise ("PDT: %.10d objects decrypted" % total_obj )
1774 noise ("PDT: %.10d bytes ciphertext" % total_ct )
1775 noise ("PDT: %.10d bytes plaintext" % total_pt )
1781 def mode_scrypt (pw, ins=None, nacl=None, fmt=PDTCRYPT_SCRYPT_INTRANATOR):
1783 paramversion = PDTCRYPT_DEFAULT_PVER
1785 hsh, nacl, version, paramversion = scrypt_hashsource (pw, ins)
1786 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1788 nacl = binascii.unhexlify (nacl)
1789 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1790 version = PDTCRYPT_DEFAULT_VER
1792 kdfname, params = defs ["kdf"]
1794 kdf = kdf_by_version (None, defs)
1795 hsh, _void = kdf (pw, nacl)
1799 if fmt == PDTCRYPT_SCRYPT_INTRANATOR:
1800 out = json.dumps ({ "salt" : base64.b64encode (nacl).decode ()
1801 , "key" : base64.b64encode (hsh) .decode ()
1802 , "paramversion" : paramversion })
1803 elif fmt == PDTCRYPT_SCRYPT_PARAMETERS:
1804 out = json.dumps ({ "salt" : binascii.hexlify (nacl).decode ()
1805 , "key" : binascii.hexlify (hsh) .decode ()
1806 , "version" : version
1807 , "scrypt_params" : { "N" : params ["N"]
1808 , "r" : params ["r"]
1809 , "p" : params ["p"]
1810 , "dkLen" : params ["dkLen"] } })
1812 raise RuntimeError ("bad scrypt output scheme %r" % fmt)
1817 def noise_output_candidates (cands, indent=8, cols=PDTCRYPT_TT_COLUMNS):
1819 Print a list of offsets without garbling the terminal too much.
1821 The indent is counted from column zero; if it is wide enough, the “PDT: ”
1822 marker will be prepended, considered part of the indentation.
1826 idt = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
1831 init = True # prevent leading separator
1834 raise ValueError ("the requested indentation exceeds the line "
1835 "width by %d" % (indent - wd))
1845 if lpos > wd: # line break
1861 SLICE_START = 1 # ordering is important to have starts of intervals
1862 SLICE_END = 0 # sorted before equal ends
1864 def find_overlaps (slices):
1866 Find overlapping slices: iterate open/close points of intervals, tracking
1867 the ones open at any time.
1870 inside = set () # of indices into bounds
1871 ovrlp = set () # of indices into bounds
1873 for i, s in enumerate (slices):
1874 bounds.append ((s [0], SLICE_START, i))
1875 bounds.append ((s [1], SLICE_END , i))
1876 bounds = sorted (bounds)
1880 if val [1] == SLICE_START:
1883 if len (inside) > 1: # closing one that overlapped
1887 return [ slices [i] for i in ovrlp ]
1890 def mode_scan (secret, fname, outs=None, nacl=None):
1892 Dissect a binary file, looking for PDTCRYPT headers and objects.
1894 If *outs* is supplied, recoverable data will be dumped into the specified
1898 ifd = os.open (fname, os.O_RDONLY)
1899 except FileNotFoundError:
1900 noise ("PDT: failed to open %s readonly" % fname)
1905 if PDTCRYPT_VERBOSE is True:
1906 noise ("PDT: scan for potential sync points")
1907 cands = locate_hdr_candidates (ifd)
1908 if len (cands) == 0:
1909 noise ("PDT: scan complete: input does not contain potential PDT "
1910 "headers; giving up.")
1912 if PDTCRYPT_VERBOSE is True:
1913 noise ("PDT: scan complete: found %d candidates:" % len (cands))
1914 noise_output_candidates (cands)
1919 junk, todo, slices = [], [], []
1924 vdt, hdr = inspect_hdr (ifd, cand)
1926 vdts = verdict_fmt (vdt)
1928 if vdt == HDR_CAND_JUNK:
1929 noise ("PDT: obj %d: %s object: bad header, skipping" % (nobj, vdts))
1932 off0 = cand + PDTCRYPT_HDR_SIZE
1933 if PDTCRYPT_VERBOSE is True:
1934 noise ("PDT: obj %d: read payload @%d" % (nobj, off0))
1935 pretty = hdr_fmt_pretty (hdr)
1936 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1937 pretty.splitlines (), ""))
1940 if outs is not None:
1941 ofname = PDTCRYPT_RESCUENAME % nobj
1942 ofd = open2_dump_file (ofname, outs, force=PDTCRYPT_OVERWRITE)
1944 ctsize = hdr ["ctsize"]
1946 l = try_decrypt (ifd, off0, hdr, secret, ofd=ofd)
1948 slices.append ((off0, off0 + l))
1952 if vdt == HDR_CAND_GOOD and ok is True:
1953 noise ("PDT: %d → ✓ %s object %d–%d"
1954 % (cand, vdts, off0, off0 + ctsize))
1955 elif vdt == HDR_CAND_FISHY and ok is True:
1956 noise ("PDT: %d → × %s object %d–%d, corrupt header"
1957 % (cand, vdts, off0, off0 + ctsize))
1958 elif vdt == HDR_CAND_GOOD and ok is False:
1959 noise ("PDT: %d → × %s object %d–%d, problematic payload"
1960 % (cand, vdts, off0, off0 + ctsize))
1961 elif vdt == HDR_CAND_FISHY and ok is False:
1962 noise ("PDT: %d → × %s object %d–%d, corrupt header, problematic "
1963 "ciphertext" % (cand, vdts, off0, off0 + ctsize))
1970 noise ("PDT: all headers ok")
1972 noise ("PDT: %d candidates not parseable as headers:" % len (junk))
1973 noise_output_candidates (junk)
1975 overlap = find_overlaps (slices)
1976 if len (overlap) > 0:
1977 noise ("PDT: %d objects overlapping others" % len (overlap))
1978 for slice in overlap:
1979 noise ("PDT: × %d→%d" % (slice [0], slice [1]))
1981 def usage (err=False):
1985 indent = ' ' * len (SELF)
1986 out ("usage: %s SUBCOMMAND { --help" % SELF)
1987 out (" %s | [ -v ] { -p PASSWORD | -k KEY }" % indent)
1988 out (" %s [ { -i | --in } { - | SOURCE } ]" % indent)
1989 out (" %s [ { -n | --nacl } { SALT } ]" % indent)
1990 out (" %s [ { -o | --out } { - | DESTINATION } ]" % indent)
1991 out (" %s [ -D | --no-decrypt ] [ -S | --split ]" % indent)
1992 out (" %s [ -f | --format ]" % indent)
1995 out ("\t\tSUBCOMMAND main mode: { process | scrypt | scan }")
1997 out ("\t\t process: extract objects from PDT archive")
1998 out ("\t\t scrypt: calculate hash from password and first object")
1999 out ("\t\t-p PASSWORD password to derive the encryption key from")
2000 out ("\t\t-k KEY encryption key as 16 bytes in hexadecimal notation")
2001 out ("\t\t-s enforce strict handling of initialization vectors")
2002 out ("\t\t-i SOURCE file name to read from")
2003 out ("\t\t-o DESTINATION file to write output to")
2004 out ("\t\t-n SALT provide salt for scrypt mode in hex encoding")
2005 out ("\t\t-v print extra info")
2006 out ("\t\t-S split into files at object boundaries; this")
2007 out ("\t\t requires DESTINATION to refer to a directory")
2008 out ("\t\t-D PDT header and ciphertext passthrough")
2009 out ("\t\t-f format of SCRYPT hash output (“i2n” or “params”)")
2011 out ("\tinstead of filenames, “-” may be used to specify stdin / stdout")
2013 sys.exit (42 if err is True else 0)
2023 def parse_argv (argv):
2024 global PDTCRYPT_OVERWRITE
2026 mode = PDTCRYPT_DECRYPT
2032 scrypt_format = PDTCRYPT_SCRYPT_DEFAULT
2035 SELF = os.path.basename (next (argvi))
2038 rawsubcmd = next (argvi)
2039 subcommand = PDTCRYPT_SUB [rawsubcmd]
2040 except StopIteration:
2041 bail ("ERROR: subcommand required")
2043 bail ("ERROR: invalid subcommand “%s” specified" % rawsubcmd)
2049 except StopIteration:
2050 bail ("ERROR: argument list incomplete")
2052 def checked_secret (s):
2057 bail ("ERROR: encountered “%s” but secret already given" % arg)
2060 if arg in [ "-h", "--help" ]:
2063 elif arg in [ "-v", "--verbose", "--wtf" ]:
2064 global PDTCRYPT_VERBOSE
2065 PDTCRYPT_VERBOSE = True
2066 elif arg in [ "-i", "--in", "--source" ]:
2067 insspec = checked_arg ()
2068 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt from %s" % insspec)
2069 elif arg in [ "-p", "--password" ]:
2070 arg = checked_arg ()
2071 checked_secret (make_secret (password=arg))
2072 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with password")
2074 if subcommand == PDTCRYPT_SUB_PROCESS:
2075 if arg in [ "-s", "--strict-ivs" ]:
2076 global PDTCRYPT_STRICTIVS
2077 PDTCRYPT_STRICTIVS = True
2078 elif arg in [ "-o", "--out", "--dest", "--sink" ]:
2079 outsspec = checked_arg ()
2080 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2081 elif arg in [ "-f", "--force" ]:
2082 PDTCRYPT_OVERWRITE = True
2083 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2084 elif arg in [ "-S", "--split" ]:
2085 mode |= PDTCRYPT_SPLIT
2086 if PDTCRYPT_VERBOSE is True: noise ("PDT: split files")
2087 elif arg in [ "-D", "--no-decrypt" ]:
2088 mode &= ~PDTCRYPT_DECRYPT
2089 if PDTCRYPT_VERBOSE is True: noise ("PDT: not decrypting")
2090 elif arg in [ "-k", "--key" ]:
2091 arg = checked_arg ()
2092 checked_secret (make_secret (key=arg))
2093 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with key")
2095 bail ("ERROR: unexpected positional argument “%s”" % arg)
2096 elif subcommand == PDTCRYPT_SUB_SCRYPT:
2097 if arg in [ "-n", "--nacl", "--salt" ]:
2098 nacl = checked_arg ()
2099 if PDTCRYPT_VERBOSE is True: noise ("PDT: salt key with %s" % nacl)
2100 elif arg in [ "-f", "--format" ]:
2101 arg = checked_arg ()
2103 scrypt_format = PDTCRYPT_SCRYPT_FORMAT [arg]
2105 bail ("ERROR: invalid scrypt output format %s" % arg)
2106 if PDTCRYPT_VERBOSE is True:
2107 noise ("PDT: scrypt output format “%s”" % scrypt_format)
2109 bail ("ERROR: unexpected positional argument “%s”" % arg)
2110 elif subcommand == PDTCRYPT_SUB_SCAN:
2111 if arg in [ "-o", "--out", "--dest", "--sink" ]:
2112 outsspec = checked_arg ()
2113 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2114 elif arg in [ "-f", "--force" ]:
2115 PDTCRYPT_OVERWRITE = True
2116 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2118 bail ("ERROR: unexpected positional argument “%s”" % arg)
2121 if PDTCRYPT_VERBOSE is True:
2122 noise ("PDT: no password or key specified, trying $PDTCRYPT_PASSWORD")
2123 epw = os.getenv ("PDTCRYPT_PASSWORD")
2125 checked_secret (make_secret (password=epw.strip ()))
2128 if PDTCRYPT_VERBOSE is True:
2129 noise ("PDT: no password or key specified, trying $PDTCRYPT_KEY")
2130 ek = os.getenv ("PDTCRYPT_KEY")
2132 checked_secret (make_secret (key=ek.strip ()))
2135 if subcommand == PDTCRYPT_SUB_SCRYPT:
2136 bail ("ERROR: scrypt hash mode requested but no password given")
2137 elif mode & PDTCRYPT_DECRYPT:
2138 bail ("ERROR: decryption requested but no password given")
2140 if mode & PDTCRYPT_SPLIT and outsspec is None:
2141 bail ("ERROR: split mode is incompatible with stdout sink "
2144 if subcommand == PDTCRYPT_SUB_SCAN and outsspec is None:
2145 pass # no output by default in scan mode
2146 elif mode & PDTCRYPT_SPLIT or subcommand == PDTCRYPT_SUB_SCAN:
2147 # destination must be directory
2149 bail ("ERROR: mode is incompatible with stdout sink")
2152 os.makedirs (outsspec, 0o700)
2153 except FileExistsError:
2154 # if it’s a directory with appropriate perms, everything is
2155 # good; otherwise, below invocation of open(2) will fail
2157 outs = os.open (outsspec, os.O_DIRECTORY, 0o600)
2158 except FileNotFoundError as exn:
2159 bail ("ERROR: cannot create target directory “%s”" % outsspec)
2160 except NotADirectoryError as exn:
2161 bail ("ERROR: target path “%s” is not a directory" % outsspec)
2163 outs = deptdcrypt_mk_stream (PDTCRYPT_SINK, outsspec or "-")
2165 if subcommand == PDTCRYPT_SUB_SCAN:
2167 bail ("ERROR: please supply an input file for scanning")
2169 bail ("ERROR: input must be seekable; please specify a file")
2170 return True, partial (mode_scan, secret, insspec, outs, nacl=nacl)
2172 if subcommand == PDTCRYPT_SUB_SCRYPT:
2173 if secret [0] == PDTCRYPT_SECRET_KEY:
2174 bail ("ERROR: scrypt mode requires a password")
2175 if insspec is not None and nacl is not None \
2176 or insspec is None and nacl is None :
2177 bail ("ERROR: please supply either an input file or "
2182 if insspec is not None or subcommand != PDTCRYPT_SUB_SCRYPT:
2183 ins = deptdcrypt_mk_stream (PDTCRYPT_SOURCE, insspec or "-")
2185 if subcommand == PDTCRYPT_SUB_SCRYPT:
2186 return True, partial (mode_scrypt, secret [1].encode (), ins, nacl,
2189 return True, partial (mode_depdtcrypt, mode, secret, ins, outs)
2193 ok, runner = parse_argv (argv)
2195 if ok is True: return runner ()
2200 if __name__ == "__main__":
2201 sys.exit (main (sys.argv))