6 ===============================================================================
7 crypto -- Encryption Layer for the Deltatar Backup
8 ===============================================================================
12 - AES-GCM for the symmetric encryption;
17 - NIST Recommendation for Block Cipher Modes of Operation: Galois/Counter
19 http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf
22 https://cryptome.org/2014/01/aes-gcm-v1.pdf
24 - Authentication weaknesses in GCM
25 http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf
28 -------------------------------------------------------------------------------
30 Errors fall into roughly three categories:
32 - Cryptographical errors or invalid data.
34 - ``InvalidGCMTag`` (decryption failed on account of an invalid GCM
36 - ``InvalidIVFixedPart`` (IV fixed part of object not found in list),
37 - ``DuplicateIV`` (the IV of an encrypted object already occurred),
38 - ``DecryptionError`` (used in CLI decryption for presenting error
39 conditions to the user).
41 - Incorrect usage of the library.
43 - ``InvalidParameter`` (non-conforming user supplied parameter),
44 - ``InvalidHeader`` (data passed for reading not parsable into header),
45 - ``FormatError`` (cannot handle header or parameter version),
48 - Bad internal state. If one of these is encountered it means that a state
49 was reached that shouldn’t occur during normal processing.
54 Also, ``EndOfFile`` is used as a sentinel to communicate that a stream supplied
55 for reading is exhausted.
57 Initialization Vectors
58 -------------------------------------------------------------------------------
60 Initialization vectors are checked for reuse during the lifetime of a decryptor.
61 The fixed counters for metadata files cannot be reused and attempts to do so
62 will cause a DuplicateIV error. This means the length of objects encrypted with
63 a metadata counter is capped at 63 GB.
65 For ordinary, non-metadata payload, there is an optional mode with strict IV
66 checking that causes a crypto context to fail if an IV encountered or created
67 was already used for decrypting or encrypting, respectively, an earlier object.
68 Note that this mode can trigger false positives when decrypting non-linearly,
69 e. g. when traversing the same object multiple times. Since the crypto context
70 has no notion of a position in a PDT encrypted archive, this condition must be
71 sorted out downstream.
73 When encrypting with more than one Encrypt context special care must be taken
74 to prevent accidental reuse of IVs. The builtin protection against reuse is
75 only effective for objects encrypted with the same Encrypt handle. If multiple
76 Encrypt handles are used to encrypt with the same combination of password and
77 salt, the encryption becomes susceptible to birthday attacks (bound = 2^32 due
78 to the 64-bit random iv). Thus the use of multiple handles is discouraged.
82 -------------------------------------------------------------------------------
84 ``crypto.py`` may be invoked as a script for decrypting, validating, and
85 splitting PDT encrypted files. Consult the usage message for details.
89 Decrypt from stdin using the password ‘foo’: ::
91 $ crypto.py process foo -i - -o - <some-file.tar.gz.pdtcrypt >some-file.tar.gz
93 Output verbose information about the encrypted objects in the archive: ::
95 $ crypto.py process foo -v -i some-file.tar.gz.pdtcrypt -o /dev/null
96 PDT: decrypt from some-file.tar.gz.pdtcrypt
97 PDT: decrypt to /dev/null
98 PDT: source: file some-file.tar.gz.pdtcrypt
99 PDT: sink: file /dev/null
101 PDT: · version = 1 : 0100
102 PDT: · paramversion = 1 : 0100
103 PDT: · nacl : d270 b031 00d1 87e2 c946 610d 7b7f 7e5f
104 PDT: · iv : 02ee 3dd7 a963 1eb1 0100 0000
105 PDT: · ctsize = 591 : 4f02 0000 0000 0000
106 PDT: · tag : 5b2d 6d8b 8f82 4842 12fd 0b10 b6e3 369b
107 PDT: 64 decrypt obj no. 1, 591 B
108 PDT: · [64] 0% done, read block (591 B of 591 B remaining)
109 PDT: · decrypt ciphertext 591 B
110 PDT: · decrypt plaintext 591 B
114 Also, the mode *scrypt* allows deriving encryption keys. To calculate the
115 encryption key from the password ‘foo’ and the salt of the first object in a
116 PDT encrypted file: ::
118 $ crypto.py scrypt foo -i some-file.pdtcrypt
119 {"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", "key": "JH9EkMwaM4x9F5aim5gK/Q=="}
121 The computed 16 byte key is given in hexadecimal notation in the value to
122 ``hash`` and can be fed into Python’s ``binascii.unhexlify()`` to obtain the
123 corresponding binary representation.
125 Note that in Scrypt hashing mode, no data integrity checks are being performed.
126 If the wrong password is given, a wrong key will be derived. Whether the password
127 was indeed correct can only be determined by decrypting. Note that since PDT
128 archives essentially consist of a stream of independent objects, the salt and
129 other parameters may change. Thus a key derived using above method from the
130 first object doesn’t necessarily apply to any of the subsequent objects.
139 from functools import reduce, partial
150 except ImportError as exn:
153 if __name__ == "__main__": ## Work around the import mechanism lest Python’s
154 pwd = os.getcwd() ## preference for local imports causes a cyclical
155 ## import (crypto → pylibscrypt → […] → ./tarfile → crypto).
156 sys.path = [ p for p in sys.path if p.find ("deltatar") < 0 ]
159 from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
160 from cryptography.hazmat.backends import default_backend
164 __all__ = [ "hdr_make", "hdr_read", "hdr_fmt", "hdr_fmt_pretty"
166 , "PDTCRYPT_HDR_SIZE", "AES_GCM_IV_CNT_DATA"
167 , "AES_GCM_IV_CNT_INFOFILE", "AES_GCM_IV_CNT_INDEX"
171 ###############################################################################
173 ###############################################################################
175 class EndOfFile (Exception):
179 def __init__ (self, n=None, msg=None):
185 class InvalidParameter (Exception):
186 """Inputs not valid for PDT encryption."""
190 class InvalidHeader (Exception):
191 """Header not valid."""
195 class InvalidGCMTag (Exception):
197 The GCM tag calculated during decryption differs from that in the object
203 class InvalidIVFixedPart (Exception):
205 IV fixed part not in supplied list: either the backup is corrupt or the
206 current object does not belong to it.
211 class IVFixedPartError (Exception):
213 Error creating a unique IV fixed part: repeated calls to system RNG yielded
214 the same sequence of bytes as the last IV used.
219 class InvalidFileCounter (Exception):
221 When encrypting, an attempted reuse of a dedicated counter (info file,
222 index file) was caught.
227 class DuplicateIV (Exception):
229 During encryption, the current IV fixed part is identical to an already
230 existing IV (same prefix and file counter). This indicates tampering or
231 programmer error and cannot be recovered from.
236 class NonConsecutiveIV (Exception):
238 IVs not numbered consecutively. This is a hard error with strict IV
239 checking. Precludes random access to the encrypted objects.
244 class CiphertextTooLong (Exception):
246 An attempt was made to decrypt more data than the ciphertext size declared
247 in the object header.
252 class FormatError (Exception):
253 """Unusable parameters in header."""
257 class DecryptionError (Exception):
258 """Error during decryption with ``crypto.py`` on the command line."""
262 class Unreachable (Exception):
264 Makeshift __builtin_unreachable(); always a programmer error if
270 class InternalError (Exception):
271 """Errors not ascribable to bad user inputs or cryptography."""
275 ###############################################################################
276 ## crypto layer version
277 ###############################################################################
279 ENCRYPTION_PARAMETERS = \
281 { "kdf": ("dummy", 16)
282 , "enc": "passthrough" }
290 , "enc": "aes-gcm" } }
292 # Mode zero is unencrypted and only provided for testing purposes. nless
293 # the encryptor / decryptor are explicitly instructed to do so.
294 MIN_SECURE_PARAMETERS = 1
296 ###############################################################################
298 ###############################################################################
300 PDTCRYPT_HDR_MAGIC = b"PDTCRYPT"
302 PDTCRYPT_HDR_SIZE_MAGIC = 8 # 8
303 PDTCRYPT_HDR_SIZE_VERSION = 2 # 10
304 PDTCRYPT_HDR_SIZE_PARAMVERSION = 2 # 12
305 PDTCRYPT_HDR_SIZE_NACL = 16 # 28
306 PDTCRYPT_HDR_SIZE_IV = 12 # 40
307 PDTCRYPT_HDR_SIZE_CTSIZE = 8 # 48
308 PDTCRYPT_HDR_SIZE_TAG = 16 # 64 GCM auth tag
310 PDTCRYPT_HDR_SIZE = PDTCRYPT_HDR_SIZE_MAGIC + PDTCRYPT_HDR_SIZE_VERSION \
311 + PDTCRYPT_HDR_SIZE_PARAMVERSION + PDTCRYPT_HDR_SIZE_NACL \
312 + PDTCRYPT_HDR_SIZE_IV + PDTCRYPT_HDR_SIZE_CTSIZE \
313 + PDTCRYPT_HDR_SIZE_TAG # = 64
315 # precalculate offsets since Python can’t do constant folding over names
316 HDR_OFF_VERSION = PDTCRYPT_HDR_SIZE_MAGIC
317 HDR_OFF_PARAMVERSION = HDR_OFF_VERSION + PDTCRYPT_HDR_SIZE_VERSION
318 HDR_OFF_NACL = HDR_OFF_PARAMVERSION + PDTCRYPT_HDR_SIZE_PARAMVERSION
319 HDR_OFF_IV = HDR_OFF_NACL + PDTCRYPT_HDR_SIZE_NACL
320 HDR_OFF_CTSIZE = HDR_OFF_IV + PDTCRYPT_HDR_SIZE_IV
321 HDR_OFF_TAG = HDR_OFF_CTSIZE + PDTCRYPT_HDR_SIZE_CTSIZE
325 FMT_I2N_IV = "<8sL" # 8 random bytes ‖ 32 bit counter
326 FMT_I2N_HDR = ("<" # host byte order
330 "16s" # sodium chloride
336 AES_KEY_SIZE = 16 # b"0123456789abcdef"
337 AES_KEY_SIZE_B64 = 24 # b'MDEyMzQ1Njc4OWFiY2RlZg=='
339 AES_GCM_MAX_SIZE = (1 << 36) - (1 << 5) # 2^39 - 2^8 b ≅ 64 GB.
340 # Source: NIST SP 800-38D section 5.2.1.1
341 # https://crypto.stackexchange.com/questions/31793/plain-text-size-limits-for-aes-gcm-mode-just-64gb
343 PDTCRYPT_MAX_OBJ_SIZE_DEFAULT = 63 * (1 << 30) # 63 GB
344 PDTCRYPT_MAX_OBJ_SIZE = PDTCRYPT_MAX_OBJ_SIZE_DEFAULT
346 # index and info files are written on-the fly while encrypting so their
347 # counters must be available in advance
348 AES_GCM_IV_CNT_INFOFILE = 1 # constant
349 AES_GCM_IV_CNT_INDEX = AES_GCM_IV_CNT_INFOFILE + 1
350 AES_GCM_IV_CNT_DATA = AES_GCM_IV_CNT_INDEX + 1 # also for multivolume
351 AES_GCM_IV_CNT_MAX_DEFAULT = 0xffFFffFF
352 AES_GCM_IV_CNT_MAX = AES_GCM_IV_CNT_MAX_DEFAULT
354 # IV structure and generation
355 PDTCRYPT_IV_GEN_MAX_RETRIES = 10 # ×
356 PDTCRYPT_IV_FIXEDPART_SIZE = 8 # B
357 PDTCRYPT_IV_COUNTER_SIZE = 4 # B
359 # secret type: PW of string | KEY of char [16]
360 PDTCRYPT_SECRET_PW = 0
361 PDTCRYPT_SECRET_KEY = 1
363 ###############################################################################
365 ###############################################################################
371 # , paramversion : u16
377 # fn hdr_read (f : handle) -> hdrinfo;
378 # fn hdr_make (f : handle, h : hdrinfo) -> IOResult<usize>;
379 # fn hdr_fmt (h : hdrinfo) -> String;
384 Read bytes as header structure.
386 If the input could not be interpreted as a header, fail with
391 mag, version, paramversion, nacl, iv, ctsize, tag = \
392 struct.unpack (FMT_I2N_HDR, data)
393 except Exception as exn:
394 raise InvalidHeader ("error unpacking header from [%r]: %s"
395 % (binascii.hexlify (data), str (exn)))
397 if mag != PDTCRYPT_HDR_MAGIC:
398 raise InvalidHeader ("bad magic in header: expected [%s], got [%s]"
399 % (PDTCRYPT_HDR_MAGIC, mag))
402 { "version" : version
403 , "paramversion" : paramversion
411 def hdr_read_stream (instr):
413 Read header from stream at the current position.
415 Fail with ``InvalidHeader`` if insufficient bytes were read from the
416 stream, or if the content could not be interpreted as a header.
418 data = instr.read(PDTCRYPT_HDR_SIZE)
422 elif ldata != PDTCRYPT_HDR_SIZE:
423 raise InvalidHeader ("hdr_read_stream: expected %d B, received %d B"
424 % (PDTCRYPT_HDR_SIZE, ldata))
425 return hdr_read (data)
428 def hdr_from_params (version, paramversion, nacl, iv, ctsize, tag):
430 Assemble the necessary values into a PDTCRYPT header.
432 :type version: int to fit uint16_t
433 :type paramversion: int to fit uint16_t
434 :type nacl: bytes to fit uint8_t[16]
435 :type iv: bytes to fit uint8_t[12]
436 :type size: int to fit uint64_t
437 :type tag: bytes to fit uint8_t[16]
439 buf = bytearray (PDTCRYPT_HDR_SIZE)
440 bufv = memoryview (buf)
443 struct.pack_into (FMT_I2N_HDR, bufv, 0,
445 version, paramversion, nacl, iv, ctsize, tag)
446 except Exception as exn:
447 return False, "error assembling header: %s" % str (exn)
449 return True, bytes (buf)
452 def hdr_make_dummy (s):
454 Create a header sized block of bytes initialized to a value derived from a
455 string. Used to verify we’ve jumped back correctly to the actual position
456 of the object header.
458 c = reduce (lambda a, c: a + ord(c), s, 0) % 0xFF
459 return bytes (bytearray (struct.pack ("B", c)) * PDTCRYPT_HDR_SIZE)
464 Assemble a header from the given header structure.
466 return hdr_from_params (version=hdr.get("version"),
467 paramversion=hdr.get("paramversion"),
468 nacl=hdr.get("nacl"), iv=hdr.get("iv"),
469 ctsize=hdr.get("ctsize"), tag=hdr.get("tag"))
472 HDR_FMT = "I2n_header { version: %d, paramversion: %d, nacl: %s[%d]," \
473 " iv: %s[%d], ctsize: %d, tag: %s[%d] }"
476 """Format a header structure into readable output."""
477 return HDR_FMT % (h["version"], h["paramversion"],
478 binascii.hexlify (h["nacl"]), len(h["nacl"]),
479 binascii.hexlify (h["iv"]), len(h["iv"]),
481 binascii.hexlify (h["tag"]), len(h["tag"]))
484 def hex_spaced_of_bytes (b):
485 """Format bytes object, hexdump style."""
486 return " ".join ([ "%.2x%.2x" % (c1, c2)
487 for c1, c2 in zip (b[0::2], b[1::2]) ]) \
488 + (len (b) | 1 == len (b) and " %.2x" % b[-1] or "") # odd lengths
491 def hdr_iv_counter (h):
492 """Extract the variable part of the IV of the given header."""
493 _fixed, cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
497 def hdr_iv_fixed (h):
498 """Extract the fixed part of the IV of the given header."""
499 fixed, _cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
503 hdr_dump = hex_spaced_of_bytes
507 """version = %-4d : %s
508 paramversion = %-4d : %s
515 def hdr_fmt_pretty (h):
517 Format header structure into multi-line representation of its contents and
518 their raw representation. (Omit the implicit “PDTCRYPT” magic bytes that
519 precede every header.)
521 return HDR_FMT_PRETTY \
523 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["version"])),
525 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["paramversion"])),
526 hex_spaced_of_bytes (h["nacl"]),
527 hex_spaced_of_bytes (h["iv"]),
529 hex_spaced_of_bytes (struct.pack (FMT_UINT64_LE, h["ctsize"])),
530 hex_spaced_of_bytes (h["tag"]))
532 IV_FMT = "((f %s) (c %d))"
535 """Format the two components of an IV in a readable fashion."""
536 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
537 return IV_FMT % (binascii.hexlify (fixed), cnt)
540 ###############################################################################
542 ###############################################################################
544 class Location (object):
548 def restore_loc_fmt (loc):
550 % (loc.n, loc.offset)
552 def locate_hdr_candidates (fd):
554 Walk over instances of the magic string in the payload, collecting their
555 positions. If the offset of the first found instance is not zero, the file
556 begins with leading garbage. Used by desaster recovery.
558 :return: The list of offsets in the file.
562 mm = mmap.mmap(fd, 0, mmap.MAP_SHARED, mmap.PROT_READ)
565 pos = mm.find (PDTCRYPT_HDR_MAGIC, pos)
574 HDR_CAND_GOOD = 0 # header marks begin of valid object
575 HDR_CAND_FISHY = 1 # inconclusive (tag mismatch, obj overlap etc.)
576 HDR_CAND_JUNK = 2 # not a header / object unreadable
579 { HDR_CAND_GOOD : "valid"
580 , HDR_CAND_FISHY : "fishy"
581 , HDR_CAND_JUNK : "junk"
585 def verdict_fmt (vdt):
586 return HDR_VERDICT_NAME [vdt]
589 def inspect_hdr (fd, off):
591 Attempt to parse a header in *fd* at position *off*.
593 Returns a verdict about the quality of that header plus the parsed header
597 _ = os.lseek (fd, off, os.SEEK_SET)
599 if os.lseek (fd, 0, os.SEEK_CUR) != off:
600 if PDTCRYPT_VERBOSE is True:
601 noise ("PDT: %d → dismissed (lseek() past EOF)" % off)
602 return HDR_CAND_JUNK, None
604 raw = os.read (fd, PDTCRYPT_HDR_SIZE)
605 if len (raw) != PDTCRYPT_HDR_SIZE:
606 if PDTCRYPT_VERBOSE is True:
607 noise ("PDT: %d → dismissed (EOF inside header)" % off)
608 return HDR_CAND_JUNK, None
612 except InvalidHeader as exn:
613 if PDTCRYPT_VERBOSE is True:
614 noise ("PDT: %d → dismissed (invalid: [%s])" % (off, str (exn)))
615 return HDR_CAND_JUNK, None
617 obj0 = off + PDTCRYPT_HDR_SIZE
618 objX = obj0 + hdr ["ctsize"]
620 eof = os.lseek (fd, 0, os.SEEK_END)
622 if PDTCRYPT_VERBOSE is True:
623 noise ("PDT: %d → EOF inside object (%d≤%d≤%d); adjusting size to "
624 "%d" % (off, obj0, eof, objX, (eof - obj0)))
625 # try reading up to the end
626 hdr ["ctsize"] = eof - obj0
627 return HDR_CAND_FISHY, hdr
629 return HDR_CAND_GOOD, hdr
632 def try_decrypt (ifd, off, hdr, secret, ofd=-1):
634 Attempt to decrypt the object in the (seekable) descriptor *ifd* starting
635 at *off* using the metadata in *hdr* and *secret*. An output fd can be
636 specified with *ofd*; if it is *-1* – the default –, the decrypted payload
639 Always creates a fresh decryptor, so validation steps across objects don’t
642 Errors during GCM tag validation are ignored. Used by desaster recovery.
644 ctleft = hdr ["ctsize"]
648 if ks == PDTCRYPT_SECRET_PW:
649 decr = Decrypt (password=secret [1])
650 elif ks == PDTCRYPT_SECRET_KEY:
652 decr = Decrypt (key=key)
659 os.lseek (ifd, pos, os.SEEK_SET)
662 cnksiz = min (ctleft, PDTCRYPT_BLOCKSIZE)
663 cnk = os.read (ifd, cnksiz)
666 pt = decr.process (cnk)
671 except InvalidGCMTag:
672 noise ("PDT: GCM tag mismatch for object %d–%d"
673 % (off, off + hdr ["ctsize"]))
674 if len (pt) > 0 and ofd != -1:
677 except Exception as exn:
678 noise ("PDT: error decrypting object %d–%d@%d, %d B remaining [%s]"
679 % (off, off + hdr ["ctsize"], pos, ctleft, exn))
685 def readable_objects_offsets (ifd, secret, cands):
687 From a list of candidates, locate the ones that mark the start of actual
688 readable PDTCRYPT objects.
692 for i, cand in enumerate (cands):
693 vdt, hdr = inspect_hdr (ifd, cand)
694 if vdt == HDR_CAND_JUNK:
695 pass # ignore unreadable ones
696 elif vdt in [HDR_CAND_GOOD, HDR_CAND_FISHY]:
697 ctsize = hdr ["ctsize"]
698 off0 = cand + PDTCRYPT_HDR_SIZE
699 ok = try_decrypt (ifd, off0, hdr, secret) == ctsize
701 good.append ((cand, off0 + ctsize))
703 overlap = find_overlaps (good)
705 return [ g [0] for g in good ]
708 def reconstruct_offsets (fname, secret):
709 ifd = os.open (fname, os.O_RDONLY)
712 cands = locate_hdr_candidates (ifd)
713 return readable_objects_offsets (ifd, secret, cands)
718 ###############################################################################
720 ###############################################################################
722 def make_secret (password=None, key=None):
724 Safely create a “secret” value that consists either of a key or a password.
725 Inputs are validated: the password is accepted as (UTF-8 encoded) bytes or
726 string; for the key only a bytes object of the proper size or a base64
727 encoded string thereof is accepted.
729 If both are provided, the key is preferred over the password; no checks are
730 performed whether the key is derived from the password.
732 :returns: secret value if inputs were acceptable | None otherwise.
735 if isinstance (key, str) is True:
736 key = key.encode ("utf-8")
737 if isinstance (key, bytes) is True:
738 if len (key) == AES_KEY_SIZE:
739 return (PDTCRYPT_SECRET_KEY, key)
740 if len (key) == AES_KEY_SIZE * 2:
742 key = binascii.unhexlify (key)
743 return (PDTCRYPT_SECRET_KEY, key)
744 except binascii.Error: # garbage in string
746 if len (key) == AES_KEY_SIZE_B64:
748 key = base64.b64decode (key)
749 # the base64 processor is very tolerant and allows for
750 # arbitrary trailing and leading data thus the data obtained
751 # must be checked for the proper length
752 if len (key) == AES_KEY_SIZE:
753 return (PDTCRYPT_SECRET_KEY, key)
754 except binascii.Error: # “incorrect padding”
756 elif password is not None:
757 if isinstance (password, str) is True:
758 return (PDTCRYPT_SECRET_PW, password)
759 elif isinstance (password, bytes) is True:
761 password = password.decode ("utf-8")
762 return (PDTCRYPT_SECRET_PW, password)
763 except UnicodeDecodeError:
769 ###############################################################################
770 ## passthrough / null encryption
771 ###############################################################################
773 class PassthroughCipher (object):
775 tag = struct.pack ("<QQ", 0, 0)
777 def __init__ (self) : pass
779 def update (self, b) : return b
781 def finalize (self) : return b""
783 def finalize_with_tag (self, _) : return b""
785 ###############################################################################
786 ## convenience wrapper
787 ###############################################################################
790 def kdf_dummy (klen, password, _nacl):
792 Fake KDF for testing purposes that is called when parameter version zero is
795 q, r = divmod (klen, len (password))
796 if isinstance (password, bytes) is False:
797 password = password.encode ()
798 return password * q + password [:r], b""
801 SCRYPT_KEY_MEMO = { } # static because needed for both the info file and the archive
804 def kdf_scrypt (params, password, nacl):
806 Wrapper for the Scrypt KDF, corresponds to parameter version one. The
807 computation result is memoized based on the inputs to facilitate spawning
808 multiple encryption contexts.
813 dkLen = params["dkLen"]
816 nacl = os.urandom (params["NaCl_LEN"])
818 key_parms = (password, nacl, N, r, p, dkLen)
819 global SCRYPT_KEY_MEMO
820 if key_parms not in SCRYPT_KEY_MEMO:
821 SCRYPT_KEY_MEMO [key_parms] = \
822 pylibscrypt.scrypt (password, nacl, N, r, p, dkLen)
823 return SCRYPT_KEY_MEMO [key_parms], nacl
826 def kdf_by_version (paramversion=None, defs=None):
828 Pick the KDF handler corresponding to the parameter version or the
831 :rtype: function (password : str, nacl : str) -> str
833 if paramversion is not None:
834 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
836 raise InvalidParameter ("no encryption parameters for version %r"
838 (kdf, params) = defs["kdf"]
840 if kdf == "scrypt" : fn = kdf_scrypt
841 elif kdf == "dummy" : fn = kdf_dummy
843 raise ValueError ("key derivation method %r unknown" % kdf)
844 return partial (fn, params)
847 ###############################################################################
849 ###############################################################################
851 def scrypt_hashsource (pw, ins):
853 Calculate the SCRYPT hash from the password and the information contained
854 in the first header found in ``ins``.
856 This does not validate whether the first object is encrypted correctly.
858 if isinstance (pw, str) is True:
860 elif isinstance (pw, bytes) is False:
861 raise InvalidParameter ("password must be a string, not %s"
863 if isinstance (ins, io.BufferedReader) is False and \
864 isinstance (ins, io.FileIO) is False:
865 raise InvalidParameter ("file to hash must be opened in “binary” mode")
868 hdr = hdr_read_stream (ins)
869 except EndOfFile as exn:
870 noise ("PDT: malformed input: end of file reading first object header")
875 pver = hdr ["paramversion"]
876 if PDTCRYPT_VERBOSE is True:
877 noise ("PDT: salt of first object : %s" % binascii.hexlify (nacl))
878 noise ("PDT: parameter version of archive : %d" % pver)
881 defs = ENCRYPTION_PARAMETERS.get(pver, None)
882 kdfname, params = defs ["kdf"]
883 if kdfname != "scrypt":
884 noise ("PDT: input is not an SCRYPT archive")
887 kdf = kdf_by_version (None, defs)
888 except ValueError as exn:
889 noise ("PDT: object has unknown parameter version %d" % pver)
891 hsh, _void = kdf (pw, nacl)
893 return hsh, nacl, hdr ["version"], pver
896 def scrypt_hashfile (pw, fname):
898 Calculate the SCRYPT hash from the password and the information contained
899 in the first header found in the given file. The header is read only at
902 with deptdcrypt_mk_stream (PDTCRYPT_SOURCE, fname or "-") as ins:
903 hsh, _void, _void, _void = scrypt_hashsource (pw, ins)
907 ###############################################################################
909 ###############################################################################
911 class Crypto (object):
913 Encryption context to remain alive throughout an entire tarfile pass.
918 cnt = None # file counter (uint32_t != 0)
919 iv = None # current IV
920 fixed = None # accu for 64 bit fixed parts of IV
921 used_ivs = None # tracks IVs
922 strict_ivs = False # if True, panic on duplicate object IV
925 insecure = False # allow plaintext parameters
932 info_counter_used = False
933 index_counter_used = False
935 def __init__ (self, *al, **akv):
936 self.used_ivs = set ()
937 self.set_parameters (*al, **akv)
940 def next_fixed (self):
945 def set_object_counter (self, cnt=None):
947 Safely set the internal counter of encrypted objects. Numerous
950 The same counter may not be reused in combination with one IV fixed
951 part. This is validated elsewhere in the IV handling.
953 Counter zero is invalid. The first two counters are reserved for
954 metadata. The implementation does not allow for splitting metadata
955 files over multiple encrypted objects. (This would be possible by
956 assigning new fixed parts.) Thus in a Deltatar backup there is at most
957 one object with a counter value of one and two. On creation of a
958 context, the initial counter may be chosen. The globals
959 ``AES_GCM_IV_CNT_INFOFILE`` and ``AES_GCM_IV_CNT_INDEX`` can be used to
960 request one of the reserved values. If one of these values has been
961 used, any further attempt of setting the counter to that value will
962 be rejected with an ``InvalidFileCounter`` exception.
964 Out of bounds values (i. e. below one and more than the maximum of 2³²)
965 cause an ``InvalidParameter`` exception to be thrown.
968 self.cnt = AES_GCM_IV_CNT_DATA
970 if cnt == 0 or cnt > AES_GCM_IV_CNT_MAX + 1:
971 raise InvalidParameter ("invalid counter value %d requested: "
972 "acceptable values are from 1 to %d"
973 % (cnt, AES_GCM_IV_CNT_MAX))
974 if cnt == AES_GCM_IV_CNT_INFOFILE:
975 if self.info_counter_used is True:
976 raise InvalidFileCounter ("attempted to reuse info file "
977 "counter %d: must be unique" % cnt)
978 self.info_counter_used = True
979 elif cnt == AES_GCM_IV_CNT_INDEX:
980 if self.index_counter_used is True:
981 raise InvalidFileCounter ("attempted to reuse index file "
982 "counter %d: must be unique" % cnt)
983 self.index_counter_used = True
984 if cnt <= AES_GCM_IV_CNT_MAX:
987 # cnt == AES_GCM_IV_CNT_MAX + 1 → wrap
988 self.cnt = AES_GCM_IV_CNT_DATA
992 def set_parameters (self, password=None, key=None, paramversion=None,
993 nacl=None, counter=None, strict_ivs=False,
996 Configure the internal state of a crypto context. Not intended for
999 A parameter version indicating passthrough (plaintext) mode is rejected
1000 with an ``InvalidParameter`` unless ``insecure`` is set.
1003 self.set_object_counter (counter)
1004 self.strict_ivs = strict_ivs
1006 self.insecure = insecure
1008 if paramversion is not None:
1009 if self.insecure is False \
1010 and paramversion < MIN_SECURE_PARAMETERS:
1011 raise InvalidParameter \
1012 ("set_parameters: requested parameter version %d but "
1013 "plaintext encryption disallowed in secure context!"
1015 self.paramversion = paramversion
1018 self.key, self.nacl = key, nacl
1021 if password is not None:
1022 if isinstance (password, bytes) is False:
1023 password = str.encode (password)
1024 self.password = password
1025 if paramversion is None and nacl is None:
1026 # postpone key setup until first header is available
1028 kdf = kdf_by_version (paramversion)
1030 self.key, self.nacl = kdf (password, nacl)
1033 def process (self, buf):
1035 Encrypt / decrypt a buffer. Invokes the ``.update()`` method on the
1036 wrapped encryptor or decryptor, respectively.
1038 The Cryptography exception ``AlreadyFinalized`` is translated to an
1039 ``InternalError`` at this point. It may occur in sound code when the GC
1040 closes an encrypting stream after an error. Everywhere else it must be
1043 if self.enc is None:
1044 raise RuntimeError ("process: context not initialized")
1045 self.stats ["in"] += len (buf)
1047 out = self.enc.update (buf)
1048 except cryptography.exceptions.AlreadyFinalized as exn:
1049 raise InternalError (exn)
1050 self.stats ["out"] += len (out)
1054 def next (self, password, paramversion, nacl, iv):
1056 Prepare for encrypting another object: Reset the data counters and
1057 change the configuration in case one of the variable parameters differs
1058 from the last object. Also check the IV for duplicates and error out
1059 if strict checking was requested.
1063 self.stats ["obj"] += 1
1065 self.check_duplicate_iv (iv)
1067 if ( self.paramversion != paramversion
1068 or self.password != password
1069 or self.nacl != nacl):
1070 self.set_parameters (password=password, paramversion=paramversion,
1071 nacl=nacl, strict_ivs=self.strict_ivs,
1072 insecure=self.insecure)
1075 def check_duplicate_iv (self, iv):
1077 Add an IV (the 12 byte representation as in the header) to the list. With
1078 strict checking enabled, this will throw a ``DuplicateIV``. Depending on
1079 the context, this may indicate a serious error (IV reuse).
1081 if self.strict_ivs is True and iv in self.used_ivs:
1082 raise DuplicateIV ("iv %s was reused" % iv_fmt (iv))
1083 # vi has not been used before; add to collection
1084 self.used_ivs.add (iv)
1087 def counters (self):
1089 Access the data counters.
1091 return self.stats ["obj"], self.stats ["in"], self.stats ["out"]
1096 Clear the current context regardless of its finalization state. The
1097 next operation must be ``.next()``.
1102 def get_used_ivs (self):
1104 Get the set of IVs that were used so far during the lifetime of
1105 this context. Useful to check for IV reuse if multiple encryption
1106 contexts were used independently.
1108 return self.used_ivs
1111 class Encrypt (Crypto):
1117 def __init__ (self, version, paramversion, password=None, key=None, nacl=None,
1118 counter=AES_GCM_IV_CNT_DATA, strict_ivs=True, insecure=False):
1120 The ctor will throw immediately if one of the parameters does not conform
1121 to our expectations.
1123 :type version: int to fit uint16_t
1124 :type paramversion: int to fit uint16_t
1125 :param password: mutually exclusive with ``key``
1126 :type password: bytes
1127 :param key: mutually exclusive with ``password``
1130 :type counter: initial object counter the values
1131 ``AES_GCM_IV_CNT_INFOFILE`` and
1132 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1133 and cannot be reused even with different fixed parts.
1134 :type strict_ivs: bool
1135 :type insecure: bool, whether to permit passthrough mode
1137 *Security considerations*: The ``class Encrypt`` handle guarantees that
1138 all random parts (first eight bytes) of the IVs used for encrypting
1139 objects are unique. This guarantee does *not* apply across handles if
1140 multiple handles are used with the same combination of password and
1141 salt. Thus, use of multiple handles with the same combination of password
1142 and salt is subject to birthday attacks with a bound of 2^32. To avoid
1143 collisions, the application should keep the number of handles as low
1144 as possible and check for reuse by comparing the set of IVs used of all
1145 handles that were created (accessible using the ``get_used_ivs`` method).
1147 if password is None and key is None \
1148 or password is not None and key is not None :
1149 raise InvalidParameter ("__init__: need either key or password")
1152 if isinstance (key, bytes) is False:
1153 raise InvalidParameter ("__init__: key must be provided as "
1154 "bytes, not %s" % type (key))
1156 raise InvalidParameter ("__init__: salt must be provided along "
1157 "with encryption key")
1158 else: # password, no key
1159 if isinstance (password, str) is False:
1160 raise InvalidParameter ("__init__: password must be a string, not %s"
1162 if len (password) == 0:
1163 raise InvalidParameter ("__init__: supplied empty password but not "
1164 "permitted for PDT encrypted files")
1166 if isinstance (version, int) is False:
1167 raise InvalidParameter ("__init__: version number must be an "
1168 "integer, not %s" % type (version))
1170 raise InvalidParameter ("__init__: version number must be a "
1171 "nonnegative integer, not %d" % version)
1173 if isinstance (paramversion, int) is False:
1174 raise InvalidParameter ("__init__: crypto parameter version number "
1175 "must be an integer, not %s"
1176 % type (paramversion))
1177 if paramversion < 0:
1178 raise InvalidParameter ("__init__: crypto parameter version number "
1179 "must be a nonnegative integer, not %d"
1182 if nacl is not None:
1183 if isinstance (nacl, bytes) is False:
1184 raise InvalidParameter ("__init__: salt given, but of type %s "
1185 "instead of bytes" % type (nacl))
1186 # salt length would depend on the actual encryption so it can’t be
1187 # validated at this point
1189 self.version = version
1190 self.paramenc = ENCRYPTION_PARAMETERS.get (paramversion) ["enc"]
1192 super().__init__ (password, key, paramversion, nacl, counter=counter,
1193 strict_ivs=strict_ivs, insecure=insecure)
1196 def next_fixed (self, retries=PDTCRYPT_IV_GEN_MAX_RETRIES):
1198 Generate the next IV fixed part by reading eight bytes from
1199 ``/dev/urandom``. The buffer so obtained is tested against the fixed
1200 parts used so far to prevent accidental reuse of IVs. After a
1201 configurable number of attempts to create a unique fixed part, it will
1202 refuse to continue with an ``IVFixedPartError``. This is unlikely to
1203 ever happen on a normal system but may detect an issue with the random
1206 The list of fixed parts that were used by the context at hand can be
1207 accessed through the ``.fixed`` list. Its last element is the fixed
1208 part currently in use.
1212 fp = os.urandom (PDTCRYPT_IV_FIXEDPART_SIZE)
1213 if fp not in self.fixed:
1214 self.fixed.append (fp)
1217 raise IVFixedPartError ("error obtaining a unique IV fixed part from "
1218 "/dev/urandom; giving up after %d tries" % i)
1223 Construct a 12-bytes IV from the current fixed part and the object
1226 return struct.pack(FMT_I2N_IV, self.fixed [-1], self.cnt)
1229 def next (self, filename=None, counter=None):
1231 Prepare for encrypting the next incoming object. Update the counter
1232 and put together the IV, possibly changing prefixes. Then create the
1235 The argument ``counter`` can be used to specify a file counter for this
1236 object. Unless it is one of the reserved values, the counter of
1237 subsequent objects will be computed from this one.
1239 If this is the first object in a series, ``filename`` is required,
1240 otherwise it is reused if not present. The value is used to derive a
1241 header sized placeholder to use until after encryption when all the
1242 inputs to construct the final header are available. This is then
1243 matched in ``.done()`` against the value found at the position of the
1244 header. The motivation for this extra check is primarily to assist
1245 format debugging: It makes stray headers easy to spot in malformed
1248 if filename is None:
1249 if self.lastinfo is None:
1250 raise InvalidParameter ("next: filename is mandatory for "
1252 filename, _dummy = self.lastinfo
1254 if isinstance (filename, str) is False:
1255 raise InvalidParameter ("next: filename must be a string, no %s"
1257 if counter is not None:
1258 if isinstance (counter, int) is False:
1259 raise InvalidParameter ("next: the supplied counter is of "
1260 "invalid type %s; please pass an "
1261 "integer instead" % type (counter))
1262 self.set_object_counter (counter)
1264 self.iv = self.iv_make ()
1265 if self.paramenc == "aes-gcm":
1267 ( algorithms.AES (self.key)
1268 , modes.GCM (self.iv)
1269 , backend = default_backend ()) \
1271 elif self.paramenc == "passthrough":
1272 self.enc = PassthroughCipher ()
1274 raise InvalidParameter ("next: parameter version %d not known"
1275 % self.paramversion)
1276 hdrdum = hdr_make_dummy (filename)
1277 self.lastinfo = (filename, hdrdum)
1278 super().next (self.password, self.paramversion, self.nacl, self.iv)
1280 self.set_object_counter (self.cnt + 1)
1284 def done (self, cmpdata):
1286 Complete encryption of an object. After this has been called, attempts
1287 of encrypting further data will cause an error until ``.next()`` is
1290 Returns a 64 bytes buffer containing the object header including all
1291 values including the “late” ones e. g. the ciphertext size and the
1294 if isinstance (cmpdata, bytes) is False:
1295 raise InvalidParameter ("done: comparison input expected as bytes, "
1296 "not %s" % type (cmpdata))
1297 if self.lastinfo is None:
1298 raise RuntimeError ("done: encryption context not initialized")
1299 filename, hdrdum = self.lastinfo
1300 if cmpdata != hdrdum:
1301 raise RuntimeError ("done: bad sync of header for object %d: "
1302 "preliminary data does not match; this likely "
1303 "indicates a wrongly repositioned stream"
1305 data = self.enc.finalize ()
1306 self.stats ["out"] += len (data)
1307 self.ctsize += len (data)
1308 ok, hdr = hdr_from_params (self.version, self.paramversion, self.nacl,
1309 self.iv, self.ctsize, self.enc.tag)
1311 raise InternalError ("error constructing header: %r" % hdr)
1312 return data, hdr, self.fixed
1315 def process (self, buf):
1317 Encrypt a chunk of plaintext with the active encryptor. Returns the
1318 size of the input consumed. This **must** be checked downstream. If the
1319 maximum possible object size has been reached, the current context must
1320 be finalized and a new one established before any further data can be
1321 encrypted. The second argument is the remainder of the plaintext that
1322 was not encrypted for the caller to use immediately after the new
1325 if isinstance (buf, bytes) is False:
1326 raise InvalidParameter ("process: expected byte buffer, not %s"
1329 newptsize = self.ptsize + bsize
1330 diff = newptsize - PDTCRYPT_MAX_OBJ_SIZE
1333 newptsize = PDTCRYPT_MAX_OBJ_SIZE
1334 self.ptsize = newptsize
1335 data = super().process (buf [:bsize])
1336 self.ctsize += len (data)
1340 class Decrypt (Crypto):
1342 tag = None # GCM tag, part of header
1343 last_iv = None # check consecutive ivs in strict mode
1346 def __init__ (self, password=None, key=None, counter=None, fixedparts=None,
1347 strict_ivs=False, insecure=False):
1349 Sanitizing ctor for the decryption context. ``fixedparts`` specifies a
1350 list of IV fixed parts accepted during decryption. If a fixed part is
1351 encountered that is not in the list, decryption will fail.
1353 :param password: mutually exclusive with ``key``
1354 :type password: bytes
1355 :param key: mutually exclusive with ``password``
1357 :type counter: initial object counter the values
1358 ``AES_GCM_IV_CNT_INFOFILE`` and
1359 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1360 and cannot be reused even with different fixed parts.
1361 :type fixedparts: bytes list
1362 :type insecure: bool
1363 :param insecure: whether to process objects encrypted in
1364 passthrough mode (*``paramversion`` < 1*)
1366 if password is None and key is None \
1367 or password is not None and key is not None :
1368 raise InvalidParameter ("__init__: need either key or password")
1371 if isinstance (key, bytes) is False:
1372 raise InvalidParameter ("__init__: key must be provided as "
1373 "bytes, not %s" % type (key))
1374 else: # password, no key
1375 if isinstance (password, str) is False:
1376 raise InvalidParameter ("__init__: password must be a string, not %s"
1378 if len (password) == 0:
1379 raise InvalidParameter ("__init__: supplied empty password but not "
1380 "permitted for PDT encrypted files")
1382 if fixedparts is not None:
1383 if isinstance (fixedparts, list) is False:
1384 raise InvalidParameter ("__init__: IV fixed parts must be "
1385 "supplied as list, not %s"
1386 % type (fixedparts))
1387 self.fixed = fixedparts
1390 super().__init__ (password=password, key=key, counter=counter,
1391 strict_ivs=strict_ivs, insecure=insecure)
1394 def valid_fixed_part (self, iv):
1396 Check if a fixed part was already seen.
1398 # check if fixed part is known
1399 fixed, _cnt = struct.unpack (FMT_I2N_IV, iv)
1400 i = bisect.bisect_left (self.fixed, fixed)
1401 return i != len (self.fixed) and self.fixed [i] == fixed
1404 def check_consecutive_iv (self, iv):
1406 Check whether the counter part of the given IV is indeed the successor
1407 of the currently present counter. This should always be the case for
1408 the objects in a well formed PDT archive but should not be enforced
1409 when decrypting out-of-order.
1411 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
1412 if self.strict_ivs is True \
1413 and self.last_iv is not None \
1414 and self.last_iv [0] == fixed \
1415 and self.last_iv [1] != cnt - 1:
1416 raise NonConsecutiveIV ("iv %s counter not successor of "
1417 "last object (expected %d, found %d)"
1418 % (iv_fmt (iv), self.last_iv [1], cnt))
1419 self.last_iv = (fixed, cnt)
1422 def next (self, hdr):
1424 Start decrypting the next object. The PDTCRYPT header for the object
1425 can be given either as already parsed object or as bytes.
1427 if isinstance (hdr, bytes) is True:
1428 hdr = hdr_read (hdr)
1429 elif isinstance (hdr, dict) is False:
1430 # this won’t catch malformed specs though
1431 raise InvalidParameter ("next: wrong type of parameter hdr: "
1432 "expected bytes or spec, got %s"
1435 paramversion = hdr ["paramversion"]
1439 ctsize = hdr ["ctsize"]
1441 raise InvalidHeader ("next: not a header %r" % hdr)
1443 if ctsize > PDTCRYPT_MAX_OBJ_SIZE:
1444 raise InvalidHeader ("next: ciphertext size %d exceeds maximum "
1446 % (ctsize, PDTCRYPT_MAX_OBJ_SIZE))
1448 self.hdr_ctsize = ctsize
1450 super().next (self.password, paramversion, nacl, iv)
1451 if self.fixed is not None and self.valid_fixed_part (iv) is False:
1452 raise InvalidIVFixedPart ("iv %s has invalid fixed part"
1454 self.check_consecutive_iv (iv)
1457 defs = ENCRYPTION_PARAMETERS.get (paramversion, None)
1459 raise FormatError ("header contains unknown parameter version %d; "
1460 "maybe the file was created by a more recent "
1461 "version of Deltatar" % paramversion)
1463 if enc == "aes-gcm":
1465 ( algorithms.AES (self.key)
1466 , modes.GCM (iv, tag=self.tag)
1467 , backend = default_backend ()) \
1469 elif enc == "passthrough":
1470 self.enc = PassthroughCipher ()
1472 raise InternalError ("encryption parameter set %d refers to unknown "
1473 "mode %r" % (paramversion, enc))
1474 self.set_object_counter (self.cnt + 1)
1477 def done (self, tag=None):
1479 Stop decryption of the current object and finalize it with the active
1480 context. This will throw an *InvalidGCMTag* exception to indicate that
1481 the authentication tag does not match the data. If the tag is correct,
1482 the rest of the plaintext is returned.
1487 data = self.enc.finalize ()
1489 if isinstance (tag, bytes) is False:
1490 raise InvalidParameter ("done: wrong type of parameter "
1491 "tag: expected bytes, got %s"
1493 data = self.enc.finalize_with_tag (self.tag)
1494 except cryptography.exceptions.InvalidTag:
1495 raise InvalidGCMTag ("done: tag mismatch of object %d: %s "
1496 "rejected by finalize ()"
1497 % (self.cnt, binascii.hexlify (self.tag)))
1498 self.ptsize += len (data)
1499 self.stats ["out"] += len (data)
1501 assert self.ctsize == self.ptsize == self.hdr_ctsize
1506 def process (self, buf):
1508 Decrypt the bytes object *buf* with the active decryptor.
1510 if isinstance (buf, bytes) is False:
1511 raise InvalidParameter ("process: expected byte buffer, not %s"
1513 self.ctsize += len (buf)
1514 if self.ctsize > self.hdr_ctsize:
1515 raise CiphertextTooLong ("process: object length exceeded: got "
1516 "%d B but header specfiies %d B"
1517 % (self.ctsize, self.hdr_ctsize))
1519 data = super().process (buf)
1520 self.ptsize += len (data)
1524 ###############################################################################
1526 ###############################################################################
1528 def _patch_global (glob, vow, n=None):
1530 Adapt upper file counter bound for testing IV logic. Completely unsafe.
1532 assert vow == "I am fully aware that this will void my warranty."
1533 r = globals () [glob]
1535 n = globals () [glob + "_DEFAULT"]
1536 globals () [glob] = n
1539 _testing_set_AES_GCM_IV_CNT_MAX = \
1540 partial (_patch_global, "AES_GCM_IV_CNT_MAX")
1542 _testing_set_PDTCRYPT_MAX_OBJ_SIZE = \
1543 partial (_patch_global, "PDTCRYPT_MAX_OBJ_SIZE")
1545 def open2_dump_file (fname, dir_fd, force=False):
1548 oflags = os.O_CREAT | os.O_WRONLY
1550 oflags |= os.O_TRUNC
1555 outfd = os.open (fname, oflags,
1556 stat.S_IRUSR | stat.S_IWUSR, dir_fd=dir_fd)
1557 except FileExistsError as exn:
1558 noise ("PDT: refusing to overwrite existing file %s" % fname)
1560 raise RuntimeError ("destination file %s already exists" % fname)
1561 if PDTCRYPT_VERBOSE is True:
1562 noise ("PDT: new output file %s (fd=%d)" % (fname, outfd))
1566 ###############################################################################
1567 ## freestanding invocation
1568 ###############################################################################
1570 PDTCRYPT_SUB_PROCESS = 0
1571 PDTCRYPT_SUB_SCRYPT = 1
1572 PDTCRYPT_SUB_SCAN = 2
1575 { "process" : PDTCRYPT_SUB_PROCESS
1576 , "scrypt" : PDTCRYPT_SUB_SCRYPT
1577 , "scan" : PDTCRYPT_SUB_SCAN }
1579 PDTCRYPT_DECRYPT = 1 << 0 # decrypt archive with password
1580 PDTCRYPT_SPLIT = 1 << 1 # split archive into individual objects
1581 PDTCRYPT_HASH = 1 << 2 # output scrypt hash for file and given password
1583 PDTCRYPT_SPLITNAME = "pdtcrypt-object-%d.bin"
1584 PDTCRYPT_RESCUENAME = "pdtcrypt-rescue-object-%0.5d.bin"
1586 PDTCRYPT_VERBOSE = False
1587 PDTCRYPT_STRICTIVS = False
1588 PDTCRYPT_OVERWRITE = False
1589 PDTCRYPT_BLOCKSIZE = 1 << 12
1594 PDTCRYPT_DEFAULT_VER = 1
1595 PDTCRYPT_DEFAULT_PVER = 1
1597 # scrypt hashing output control
1598 PDTCRYPT_SCRYPT_INTRANATOR = 0
1599 PDTCRYPT_SCRYPT_PARAMETERS = 1
1600 PDTCRYPT_SCRYPT_DEFAULT = PDTCRYPT_SCRYPT_INTRANATOR
1602 PDTCRYPT_SCRYPT_FORMAT = \
1603 { "i2n" : PDTCRYPT_SCRYPT_INTRANATOR
1604 , "params" : PDTCRYPT_SCRYPT_PARAMETERS }
1606 PDTCRYPT_TT_COLUMNS = 80 # assume standard terminal
1608 class PDTDecryptionError (Exception):
1609 """Decryption failed."""
1611 class PDTSplitError (Exception):
1612 """Decryption failed."""
1615 def noise (*a, **b):
1616 print (file=sys.stderr, *a, **b)
1619 class PassthroughDecryptor (object):
1621 curhdr = None # write current header on first data write
1623 def __init__ (self):
1624 if PDTCRYPT_VERBOSE is True:
1625 noise ("PDT: no encryption; data passthrough")
1627 def next (self, hdr):
1628 ok, curhdr = hdr_make (hdr)
1630 raise PDTDecryptionError ("bad header %r" % hdr)
1631 self.curhdr = curhdr
1634 if self.curhdr is not None:
1638 def process (self, d):
1639 if self.curhdr is not None:
1645 def depdtcrypt (mode, secret, ins, outs):
1647 Remove PDTCRYPT layer from all objects encrypted with the secret. Used on a
1648 Deltatar backup this will yield a (possibly Gzip compressed) tarball.
1650 ctleft = -1 # length of ciphertext to consume
1651 ctcurrent = 0 # total ciphertext of current object
1652 total_obj = 0 # total number of objects read
1653 total_pt = 0 # total plaintext bytes
1654 total_ct = 0 # total ciphertext bytes
1655 total_read = 0 # total bytes read
1656 outfile = None # Python file object for output
1658 if mode & PDTCRYPT_DECRYPT: # decryptor
1660 if ks == PDTCRYPT_SECRET_PW:
1661 decr = Decrypt (password=secret [1], strict_ivs=PDTCRYPT_STRICTIVS)
1662 elif ks == PDTCRYPT_SECRET_KEY:
1664 decr = Decrypt (key=key, strict_ivs=PDTCRYPT_STRICTIVS)
1666 raise InternalError ("‘%d’ does not specify a valid kind of secret"
1669 decr = PassthroughDecryptor ()
1672 """Dummy for non-split mode: output file does not vary."""
1675 if mode & PDTCRYPT_SPLIT:
1676 def nextout (outfile):
1678 We were passed an fd as outs for accessing the destination
1679 directory where extracted archive components are supposed
1684 if PDTCRYPT_VERBOSE is True:
1685 noise ("PDT: no output file to close at this point")
1687 if PDTCRYPT_VERBOSE is True:
1688 noise ("PDT: release output file %r" % outfile)
1689 # cleanup happens automatically by the GC; the next
1690 # line will error out on account of an invalid fd
1693 assert total_obj > 0
1694 fname = PDTCRYPT_SPLITNAME % total_obj
1696 outfd = open2_dump_file (fname, outs, force=PDTCRYPT_OVERWRITE)
1697 except RuntimeError as exn:
1698 raise PDTSplitError (exn)
1699 return os.fdopen (outfd, "wb", closefd=True)
1703 """ESPIPE is normal on non-seekable stdio stream."""
1706 except OSError as exn:
1707 if exn.errno == errno.ESPIPE:
1710 def out (pt, outfile):
1714 if PDTCRYPT_VERBOSE is True:
1715 noise ("PDT:\t· decrypt plaintext %d B" % (npt))
1717 nn = outfile.write (pt)
1718 except OSError as exn: # probably ENOSPC
1719 raise DecryptionError ("error (%s)" % exn)
1721 raise DecryptionError ("write aborted after %d of %d B" % (nn, npt))
1725 # current object completed; in a valid archive this marks either
1726 # the start of a new header or the end of the input
1727 if ctleft == 0: # current object requires finalization
1728 if PDTCRYPT_VERBOSE is True:
1729 noise ("PDT: %d finalize" % tell (ins))
1732 except InvalidGCMTag as exn:
1733 raise DecryptionError ("error finalizing object %d (%d B): "
1734 "%r" % (total_obj, len (pt), exn)) \
1737 if PDTCRYPT_VERBOSE is True:
1738 noise ("PDT:\t· object validated")
1740 if PDTCRYPT_VERBOSE is True:
1741 noise ("PDT: %d hdr" % tell (ins))
1743 hdr = hdr_read_stream (ins)
1744 total_read += PDTCRYPT_HDR_SIZE
1745 except EndOfFile as exn:
1746 total_read += exn.remainder
1747 if total_ct + total_obj * PDTCRYPT_HDR_SIZE != total_read:
1748 raise PDTDecryptionError ("ciphertext processed (%d B) plus "
1749 "overhead (%d × %d B) does not match "
1750 "the number of bytes read (%d )"
1751 % (total_ct, total_obj, PDTCRYPT_HDR_SIZE,
1753 # the single good exit
1754 return total_read, total_obj, total_ct, total_pt
1755 except InvalidHeader as exn:
1756 raise PDTDecryptionError ("invalid header at position %d in %r "
1757 "(%s)" % (tell (ins), exn, ins))
1758 if PDTCRYPT_VERBOSE is True:
1759 pretty = hdr_fmt_pretty (hdr)
1760 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1761 pretty.splitlines (), ""))
1762 ctcurrent = ctleft = hdr ["ctsize"]
1766 total_obj += 1 # used in file counter with split mode
1768 # finalization complete or skipped in case of first object in
1769 # stream; create a new output file if necessary
1770 outfile = nextout (outfile)
1772 if PDTCRYPT_VERBOSE is True:
1773 noise ("PDT: %d decrypt obj no. %d, %d B"
1774 % (tell (ins), total_obj, ctleft))
1776 # always allocate a new buffer since python-cryptography doesn’t allow
1777 # passing a bytearray :/
1778 nexpect = min (ctleft, PDTCRYPT_BLOCKSIZE)
1779 if PDTCRYPT_VERBOSE is True:
1780 noise ("PDT:\t· [%d] %d%% done, read block (%d B of %d B remaining)"
1782 100 - ctleft * 100 / (ctcurrent > 0 and ctcurrent or 1),
1784 ct = ins.read (nexpect)
1788 raise EndOfFile (nct,
1789 "hit EOF after %d of %d B in block [%d:%d); "
1790 "%d B ciphertext remaining for object no %d"
1791 % (nct, nexpect, off, off + nexpect, ctleft,
1797 if PDTCRYPT_VERBOSE is True:
1798 noise ("PDT:\t· decrypt ciphertext %d B" % (nct))
1799 pt = decr.process (ct)
1803 def deptdcrypt_mk_stream (kind, path):
1804 """Create stream from file or stdio descriptor."""
1805 if kind == PDTCRYPT_SINK:
1807 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: stdout")
1808 return sys.stdout.buffer
1810 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: file %s" % path)
1811 return io.FileIO (path, "w")
1812 if kind == PDTCRYPT_SOURCE:
1814 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: stdin")
1815 return sys.stdin.buffer
1817 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: file %s" % path)
1818 return io.FileIO (path, "r")
1820 raise ValueError ("bogus stream “%s” / %s" % (kind, path))
1823 def mode_depdtcrypt (mode, secret, ins, outs):
1825 total_read, total_obj, total_ct, total_pt = \
1826 depdtcrypt (mode, secret, ins, outs)
1827 except DecryptionError as exn:
1828 noise ("PDT: Decryption failed:")
1830 noise ("PDT: “%s”" % exn)
1832 noise ("PDT: Did you specify the correct key / password?")
1835 except PDTSplitError as exn:
1836 noise ("PDT: Split operation failed:")
1838 noise ("PDT: “%s”" % exn)
1840 noise ("PDT: Hint: target directory should be empty.")
1844 if PDTCRYPT_VERBOSE is True:
1845 noise ("PDT: decryption successful" )
1846 noise ("PDT: %.10d bytes read" % total_read)
1847 noise ("PDT: %.10d objects decrypted" % total_obj )
1848 noise ("PDT: %.10d bytes ciphertext" % total_ct )
1849 noise ("PDT: %.10d bytes plaintext" % total_pt )
1855 def mode_scrypt (pw, ins=None, nacl=None, fmt=PDTCRYPT_SCRYPT_INTRANATOR):
1857 paramversion = PDTCRYPT_DEFAULT_PVER
1859 hsh, nacl, version, paramversion = scrypt_hashsource (pw, ins)
1860 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1862 nacl = binascii.unhexlify (nacl)
1863 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1864 version = PDTCRYPT_DEFAULT_VER
1866 kdfname, params = defs ["kdf"]
1868 kdf = kdf_by_version (None, defs)
1869 hsh, _void = kdf (pw, nacl)
1873 if fmt == PDTCRYPT_SCRYPT_INTRANATOR:
1874 out = json.dumps ({ "salt" : base64.b64encode (nacl).decode ()
1875 , "key" : base64.b64encode (hsh) .decode ()
1876 , "paramversion" : paramversion })
1877 elif fmt == PDTCRYPT_SCRYPT_PARAMETERS:
1878 out = json.dumps ({ "salt" : binascii.hexlify (nacl).decode ()
1879 , "key" : binascii.hexlify (hsh) .decode ()
1880 , "version" : version
1881 , "scrypt_params" : { "N" : params ["N"]
1882 , "r" : params ["r"]
1883 , "p" : params ["p"]
1884 , "dkLen" : params ["dkLen"] } })
1886 raise RuntimeError ("bad scrypt output scheme %r" % fmt)
1891 def noise_output_candidates (cands, indent=8, cols=PDTCRYPT_TT_COLUMNS):
1893 Print a list of offsets without garbling the terminal too much.
1895 The indent is counted from column zero; if it is wide enough, the “PDT: ”
1896 marker will be prepended, considered part of the indentation.
1900 idt = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
1905 init = True # prevent leading separator
1908 raise ValueError ("the requested indentation exceeds the line "
1909 "width by %d" % (indent - wd))
1919 if lpos > wd: # line break
1935 SLICE_START = 1 # ordering is important to have starts of intervals
1936 SLICE_END = 0 # sorted before equal ends
1938 def find_overlaps (slices):
1940 Find overlapping slices: iterate open/close points of intervals, tracking
1941 the ones open at any time.
1944 inside = set () # of indices into bounds
1945 ovrlp = set () # of indices into bounds
1947 for i, s in enumerate (slices):
1948 bounds.append ((s [0], SLICE_START, i))
1949 bounds.append ((s [1], SLICE_END , i))
1950 bounds = sorted (bounds)
1954 if val [1] == SLICE_START:
1957 if len (inside) > 1: # closing one that overlapped
1961 return [ slices [i] for i in ovrlp ]
1964 def mode_scan (secret, fname, outs=None, nacl=None):
1966 Dissect a binary file, looking for PDTCRYPT headers and objects.
1968 If *outs* is supplied, recoverable data will be dumped into the specified
1972 ifd = os.open (fname, os.O_RDONLY)
1973 except FileNotFoundError:
1974 noise ("PDT: failed to open %s readonly" % fname)
1979 if PDTCRYPT_VERBOSE is True:
1980 noise ("PDT: scan for potential sync points")
1981 cands = locate_hdr_candidates (ifd)
1982 if len (cands) == 0:
1983 noise ("PDT: scan complete: input does not contain potential PDT "
1984 "headers; giving up.")
1986 if PDTCRYPT_VERBOSE is True:
1987 noise ("PDT: scan complete: found %d candidates:" % len (cands))
1988 noise_output_candidates (cands)
1993 junk, todo, slices = [], [], []
1998 vdt, hdr = inspect_hdr (ifd, cand)
2000 vdts = verdict_fmt (vdt)
2002 if vdt == HDR_CAND_JUNK:
2003 noise ("PDT: obj %d: %s object: bad header, skipping" % vdts)
2006 off0 = cand + PDTCRYPT_HDR_SIZE
2007 if PDTCRYPT_VERBOSE is True:
2008 noise ("PDT: obj %d: read payload @%d" % (nobj, off0))
2009 pretty = hdr_fmt_pretty (hdr)
2010 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
2011 pretty.splitlines (), ""))
2014 if outs is not None:
2015 ofname = PDTCRYPT_RESCUENAME % nobj
2016 ofd = open2_dump_file (ofname, outs, force=PDTCRYPT_OVERWRITE)
2018 ctsize = hdr ["ctsize"]
2020 l = try_decrypt (ifd, off0, hdr, secret, ofd=ofd)
2022 slices.append ((off0, off0 + l))
2026 if vdt == HDR_CAND_GOOD and ok is True:
2027 noise ("PDT: %d → ✓ %s object %d–%d"
2028 % (cand, vdts, off0, off0 + ctsize))
2029 elif vdt == HDR_CAND_FISHY and ok is True:
2030 noise ("PDT: %d → × %s object %d–%d, corrupt header"
2031 % (cand, vdts, off0, off0 + ctsize))
2032 elif vdt == HDR_CAND_GOOD and ok is False:
2033 noise ("PDT: %d → × %s object %d–%d, problematic payload"
2034 % (cand, vdts, off0, off0 + ctsize))
2035 elif vdt == HDR_CAND_FISHY and ok is False:
2036 noise ("PDT: %d → × %s object %d–%d, corrupt header, problematic "
2037 "ciphertext" % (cand, vdts, off0, off0 + ctsize))
2044 noise ("PDT: all headers ok")
2046 noise ("PDT: %d candidates not parseable as headers:" % len (junk))
2047 noise_output_candidates (junk)
2049 overlap = find_overlaps (slices)
2050 if len (overlap) > 0:
2051 noise ("PDT: %d objects overlapping others" % len (overlap))
2052 for slice in overlap:
2053 noise ("PDT: × %d→%d" % (slice [0], slice [1]))
2055 def usage (err=False):
2059 indent = ' ' * len (SELF)
2060 out ("usage: %s SUBCOMMAND { --help" % SELF)
2061 out (" %s | [ -v ] { -p PASSWORD | -k KEY }" % indent)
2062 out (" %s [ { -i | --in } { - | SOURCE } ]" % indent)
2063 out (" %s [ { -n | --nacl } { SALT } ]" % indent)
2064 out (" %s [ { -o | --out } { - | DESTINATION } ]" % indent)
2065 out (" %s [ -D | --no-decrypt ] [ -S | --split ]" % indent)
2066 out (" %s [ -f | --format ]" % indent)
2069 out ("\t\tSUBCOMMAND main mode: { process | scrypt }")
2071 out ("\t\t process: extract objects from PDT archive")
2072 out ("\t\t scrypt: calculate hash from password and first object")
2073 out ("\t\t-p PASSWORD password to derive the encryption key from")
2074 out ("\t\t-k KEY encryption key as 16 bytes in hexadecimal notation")
2075 out ("\t\t-s enforce strict handling of initialization vectors")
2076 out ("\t\t-i SOURCE file name to read from")
2077 out ("\t\t-o DESTINATION file to write output to")
2078 out ("\t\t-n SALT provide salt for scrypt mode in hex encoding")
2079 out ("\t\t-v print extra info")
2080 out ("\t\t-S split into files at object boundaries; this")
2081 out ("\t\t requires DESTINATION to refer to directory")
2082 out ("\t\t-D PDT header and ciphertext passthrough")
2083 out ("\t\t-f format of SCRYPT hash output (“default” or “parameters”)")
2085 out ("\tinstead of filenames, “-” may used to specify stdin / stdout")
2087 sys.exit ((err is True) and 42 or 0)
2097 def parse_argv (argv):
2098 global PDTCRYPT_OVERWRITE
2100 mode = PDTCRYPT_DECRYPT
2106 scrypt_format = PDTCRYPT_SCRYPT_DEFAULT
2109 SELF = os.path.basename (next (argvi))
2112 rawsubcmd = next (argvi)
2113 subcommand = PDTCRYPT_SUB [rawsubcmd]
2114 except StopIteration:
2115 bail ("ERROR: subcommand required")
2117 bail ("ERROR: invalid subcommand “%s” specified" % rawsubcmd)
2123 except StopIteration:
2124 bail ("ERROR: argument list incomplete")
2126 def checked_secret (s):
2131 bail ("ERROR: encountered “%s” but secret already given" % arg)
2134 if arg in [ "-h", "--help" ]:
2137 elif arg in [ "-v", "--verbose", "--wtf" ]:
2138 global PDTCRYPT_VERBOSE
2139 PDTCRYPT_VERBOSE = True
2140 elif arg in [ "-i", "--in", "--source" ]:
2141 insspec = checked_arg ()
2142 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt from %s" % insspec)
2143 elif arg in [ "-p", "--password" ]:
2144 arg = checked_arg ()
2145 checked_secret (make_secret (password=arg))
2146 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with password")
2148 if subcommand == PDTCRYPT_SUB_PROCESS:
2149 if arg in [ "-s", "--strict-ivs" ]:
2150 global PDTCRYPT_STRICTIVS
2151 PDTCRYPT_STRICTIVS = True
2152 elif arg in [ "-o", "--out", "--dest", "--sink" ]:
2153 outsspec = checked_arg ()
2154 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2155 elif arg in [ "-f", "--force" ]:
2156 PDTCRYPT_OVERWRITE = True
2157 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2158 elif arg in [ "-S", "--split" ]:
2159 mode |= PDTCRYPT_SPLIT
2160 if PDTCRYPT_VERBOSE is True: noise ("PDT: split files")
2161 elif arg in [ "-D", "--no-decrypt" ]:
2162 mode &= ~PDTCRYPT_DECRYPT
2163 if PDTCRYPT_VERBOSE is True: noise ("PDT: not decrypting")
2164 elif arg in [ "-k", "--key" ]:
2165 arg = checked_arg ()
2166 checked_secret (make_secret (key=arg))
2167 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with key")
2169 bail ("ERROR: unexpected positional argument “%s”" % arg)
2170 elif subcommand == PDTCRYPT_SUB_SCRYPT:
2171 if arg in [ "-n", "--nacl", "--salt" ]:
2172 nacl = checked_arg ()
2173 if PDTCRYPT_VERBOSE is True: noise ("PDT: salt key with %s" % nacl)
2174 elif arg in [ "-f", "--format" ]:
2175 arg = checked_arg ()
2177 scrypt_format = PDTCRYPT_SCRYPT_FORMAT [arg]
2179 bail ("ERROR: invalid scrypt output format %s" % arg)
2180 if PDTCRYPT_VERBOSE is True:
2181 noise ("PDT: scrypt output format “%s”" % scrypt_format)
2183 bail ("ERROR: unexpected positional argument “%s”" % arg)
2184 elif subcommand == PDTCRYPT_SUB_SCAN:
2185 if arg in [ "-o", "--out", "--dest", "--sink" ]:
2186 outsspec = checked_arg ()
2187 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2188 elif arg in [ "-f", "--force" ]:
2189 PDTCRYPT_OVERWRITE = True
2190 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2192 bail ("ERROR: unexpected positional argument “%s”" % arg)
2195 if PDTCRYPT_VERBOSE is True:
2196 noise ("ERROR: no password or key specified, trying $PDTCRYPT_PASSWORD")
2197 epw = os.getenv ("PDTCRYPT_PASSWORD")
2199 checked_secret (make_secret (password=epw.strip ()))
2202 if PDTCRYPT_VERBOSE is True:
2203 noise ("ERROR: no password or key specified, trying $PDTCRYPT_KEY")
2204 ek = os.getenv ("PDTCRYPT_KEY")
2206 checked_secret (make_secret (key=ek.strip ()))
2209 if subcommand == PDTCRYPT_SUB_SCRYPT:
2210 bail ("ERROR: scrypt hash mode requested but no password given")
2211 elif mode & PDTCRYPT_DECRYPT:
2212 bail ("ERROR: decryption requested but no password given")
2214 if mode & PDTCRYPT_SPLIT and outsspec is None:
2215 bail ("ERROR: split mode is incompatible with stdout sink "
2218 if subcommand == PDTCRYPT_SUB_SCAN and outsspec is None:
2219 pass # no output by default in scan mode
2220 elif mode & PDTCRYPT_SPLIT or subcommand == PDTCRYPT_SUB_SCAN:
2221 # destination must be directory
2223 bail ("ERROR: mode is incompatible with stdout sink")
2226 os.makedirs (outsspec, 0o700)
2227 except FileExistsError:
2228 # if it’s a directory with appropriate perms, everything is
2229 # good; otherwise, below invocation of open(2) will fail
2231 outs = os.open (outsspec, os.O_DIRECTORY, 0o600)
2232 except FileNotFoundError as exn:
2233 bail ("ERROR: cannot create target directory “%s”" % outsspec)
2234 except NotADirectoryError as exn:
2235 bail ("ERROR: target path “%s” is not a directory" % outsspec)
2237 outs = deptdcrypt_mk_stream (PDTCRYPT_SINK, outsspec or "-")
2239 if subcommand == PDTCRYPT_SUB_SCAN:
2241 bail ("ERROR: please supply an input file for scanning")
2243 bail ("ERROR: input must be seekable; please specify a file")
2244 return True, partial (mode_scan, secret, insspec, outs, nacl=nacl)
2246 if subcommand == PDTCRYPT_SUB_SCRYPT:
2247 if secret [0] == PDTCRYPT_SECRET_KEY:
2248 bail ("ERROR: scrypt mode requires a password")
2249 if insspec is not None and nacl is not None \
2250 or insspec is None and nacl is None :
2251 bail ("ERROR: please supply either an input file or "
2256 if insspec is not None or subcommand != PDTCRYPT_SUB_SCRYPT:
2257 ins = deptdcrypt_mk_stream (PDTCRYPT_SOURCE, insspec or "-")
2259 if subcommand == PDTCRYPT_SUB_SCRYPT:
2260 return True, partial (mode_scrypt, secret [1].encode (), ins, nacl,
2263 return True, partial (mode_depdtcrypt, mode, secret, ins, outs)
2267 ok, runner = parse_argv (argv)
2269 if ok is True: return runner ()
2274 if __name__ == "__main__":
2275 sys.exit (main (sys.argv))