[python-delta-tar] / deltatar / crypto.py
#!/usr/bin/env python3

"""
Intra2net 2017

===============================================================================
              crypto -- Encryption Layer for the Deltatar Backup
===============================================================================

Crypto stack:

    - AES-GCM for the symmetric encryption;
    - Scrypt as KDF.

References:

    - NIST Recommendation for Block Cipher Modes of Operation: Galois/Counter
      Mode (GCM) and GMAC
      http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf

    - AES-GCM v1:
      https://cryptome.org/2014/01/aes-gcm-v1.pdf

    - Authentication weaknesses in GCM
      http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf

Trouble with python-cryptography packages: authentication tags can only be
passed in advance: https://github.com/pyca/cryptography/pull/3421

Errors
-------------------------------------------------------------------------------

Errors fall into roughly three categories:

    - Cryptographical errors or invalid data.

        - ``InvalidGCMTag`` (decryption failed on account of an invalid GCM
          tag),
        - ``InvalidIVFixedPart`` (IV fixed part of object not found in list),
        - ``DuplicateIV`` (the IV of an encrypted object already occurred),
        - ``DecryptionError`` (used in CLI decryption for presenting error
          conditions to the user).

    - Incorrect usage of the library.

        - ``InvalidParameter`` (non-conforming user supplied parameter),
        - ``InvalidHeader`` (data passed for reading not parsable into header),
        - ``FormatError`` (cannot handle header or parameter version),
        - ``RuntimeError``.

    - Bad internal state. If one of these is encountered it means that a state
      was reached that shouldn’t occur during normal processing.

        - ``InternalError``,
        - ``Unreachable``.

Also, ``EndOfFile`` is used as a sentinel to communicate that a stream supplied
for reading is exhausted.

Initialization Vectors
-------------------------------------------------------------------------------

Initialization vectors are checked for reuse during the lifetime of a
decryptor. The fixed counters for metadata files cannot be reused and attempts
to do so will cause a ``DuplicateIV`` error. This means the length of objects
encrypted with a metadata counter is capped at 63 GB.

For ordinary, non-metadata payload, there is an optional mode with strict IV
checking that causes a crypto context to fail if an IV encountered or created
was already used for decrypting or encrypting, respectively, an earlier object.
Note that this mode can trigger false positives when decrypting non-linearly,
e. g. when traversing the same object multiple times. Since the crypto context
has no notion of a position in a PDT encrypted archive, this condition must be
sorted out downstream.

Command Line Utility
-------------------------------------------------------------------------------

``crypto.py`` may be invoked as a script for decrypting, validating, and
splitting PDT encrypted files. Consult the usage message for details.

Usage examples:

Decrypt from stdin using the password ‘foo’: ::

    $ crypto.py process foo -i - -o - <some-file.tar.gz.pdtcrypt >some-file.tar.gz

Output verbose information about the encrypted objects in the archive: ::

    $ crypto.py process foo -v -i some-file.tar.gz.pdtcrypt -o /dev/null
    PDT: decrypt from some-file.tar.gz.pdtcrypt
    PDT: decrypt to /dev/null
    PDT: source: file some-file.tar.gz.pdtcrypt
    PDT: sink: file /dev/null
    PDT: 0 hdr
    PDT: · version = 1 : 0100
    PDT: · paramversion = 1 : 0100
    PDT: · nacl : d270 b031 00d1 87e2 c946 610d 7b7f 7e5f
    PDT: · iv : 02ee 3dd7 a963 1eb1 0100 0000
    PDT: · ctsize = 591 : 4f02 0000 0000 0000
    PDT: · tag : 5b2d 6d8b 8f82 4842 12fd 0b10 b6e3 369b
    PDT: 64 decrypt obj no. 1, 591 B
    PDT: · [64] 0% done, read block (591 B of 591 B remaining)
    PDT: · decrypt ciphertext 591 B
    PDT: · decrypt plaintext 591 B
    PDT: 655 finalize


Also, the mode *scrypt* allows deriving encryption keys. To calculate the
encryption key from the password ‘foo’ and the salt of the first object in a
PDT encrypted file: ::

    $ crypto.py scrypt foo -i some-file.pdtcrypt
    {"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", "key": "JH9EkMwaM4x9F5aim5gK/Q=="}

The computed 16 byte key is given in base64 notation in the value of ``key``
and can be fed into Python’s ``base64.b64decode()`` to obtain the corresponding
binary representation.

Note that in Scrypt hashing mode, no data integrity checks are performed. If
the wrong password is given, a wrong key will be derived. Whether the password
was indeed correct can only be determined by decrypting. Note that since PDT
archives essentially consist of a stream of independent objects, the salt and
other parameters may change. Thus a key derived using the above method from the
first object doesn’t necessarily apply to any of the subsequent objects.

"""
128
import base64
import binascii
import bisect
import ctypes
import io
from functools import reduce, partial
import mmap
import os
import struct
import stat
import sys
import time
import types
try:
    import enum34
except ImportError as exn:
    pass
146
if __name__ == "__main__": ## Work around the import mechanism lest Python’s
    pwd = os.getcwd()      ## preference for local imports causes a cyclical
                           ## import (crypto → pylibscrypt → […] → ./tarfile → crypto).
    sys.path = [ p for p in sys.path if p.find ("deltatar") < 0 ]

import pylibscrypt
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
import cryptography
00b3cd10
PG
156
157
a64085a8 158__all__ = [ "hdr_make", "hdr_read", "hdr_fmt", "hdr_fmt_pretty"
b360b772 159 , "scrypt_hashfile"
3031b7ae
PG
160 , "PDTCRYPT_HDR_SIZE", "AES_GCM_IV_CNT_DATA"
161 , "AES_GCM_IV_CNT_INFOFILE", "AES_GCM_IV_CNT_INDEX"
2d6fd8c8 162 ]
00b3cd10 163
a393d9cb
PG
164
165###############################################################################
15d3eefd
PG
166## exceptions
167###############################################################################
168
169class EndOfFile (Exception):
170 """Reached EOF."""
ae3d0f2a
PG
171 remainder = 0
172 msg = 0
8a8ac469 173 def __init__ (self, n=None, msg=None):
5d394c0d
PG
174 if n is not None:
175 self.remainder = n
176 self.msg = msg
15d3eefd 177
b0078f26 178
b12110dd
PG
179class InvalidParameter (Exception):
180 """Inputs not valid for PDT encryption."""
181 pass
182
b0078f26 183
15d3eefd
PG
184class InvalidHeader (Exception):
185 """Header not valid."""
186 pass
187
b0078f26
PG
188
189class InvalidGCMTag (Exception):
190 """
191 The GCM tag calculated during decryption differs from that in the object
192 header.
193 """
194 pass
195
196
26b42ad4 197class InvalidIVFixedPart (Exception):
89ec6e2f
PG
198 """
199 IV fixed part not in supplied list: either the backup is corrupt or the
200 current object does not belong to it.
201 """
26b42ad4
PG
202 pass
203
b0078f26 204
be124bca 205class IVFixedPartError (Exception):
89ec6e2f
PG
206 """
207 Error creating a unique IV fixed part: repeated calls to system RNG yielded
208 the same sequence of bytes as the last IV used.
209 """
be124bca
PG
210 pass
211
212
fac2cfe1 213class InvalidFileCounter (Exception):
89ec6e2f
PG
214 """
215 When encrypting, an attempted reuse of a dedicated counter (info file,
216 index file) was caught.
217 """
fac2cfe1
PG
218 pass
219
220
ee6aa239 221class DuplicateIV (Exception):
89ec6e2f
PG
222 """
223 During encryption, the current IV fixed part is identical to an already
224 existing IV (same prefix and file counter). This indicates tampering or
225 programmer error and cannot be recovered from.
226 """
ee6aa239
PG
227 pass
228
229
230class NonConsecutiveIV (Exception):
89ec6e2f
PG
231 """
232 IVs not numbered consecutively. This is a hard error with strict IV
233 checking. Precludes random access to the encrypted objects.
234 """
ee6aa239
PG
235 pass
236
237
b12110dd
PG
238class FormatError (Exception):
239 """Unusable parameters in header."""
240 pass
241
b0078f26 242
15d3eefd 243class DecryptionError (Exception):
89ec6e2f 244 """Error during decryption with ``crypto.py`` on the command line."""
15d3eefd
PG
245 pass
246
b0078f26 247
70ad9458 248class Unreachable (Exception):
89ec6e2f
PG
249 """
250 Makeshift __builtin_unreachable(); always a programmer error if
251 thrown.
252 """
70ad9458
PG
253 pass
254
b0078f26 255
b12110dd
PG
256class InternalError (Exception):
257 """Errors not ascribable to bad user inputs or cryptography."""
258 pass
259
15d3eefd
PG
260
261###############################################################################
a393d9cb
PG
262## crypto layer version
263###############################################################################
264
ENCRYPTION_PARAMETERS = \
    { 0: \
        { "kdf": ("dummy", 16)
        , "enc": "passthrough" }
    , 1: \
        { "kdf": ( "scrypt"
                 , { "dkLen"    : 16
                   , "N"        : 1 << 16
                   , "r"        : 8
                   , "p"        : 1
                   , "NaCl_LEN" : 16 })
        , "enc": "aes-gcm" } }
a393d9cb 277
00b3cd10
PG
278###############################################################################
279## constants
280###############################################################################
281
PDTCRYPT_HDR_MAGIC = b"PDTCRYPT"

PDTCRYPT_HDR_SIZE_MAGIC        = 8     #  8
PDTCRYPT_HDR_SIZE_VERSION      = 2     # 10
PDTCRYPT_HDR_SIZE_PARAMVERSION = 2     # 12
PDTCRYPT_HDR_SIZE_NACL         = 16    # 28
PDTCRYPT_HDR_SIZE_IV           = 12    # 40
PDTCRYPT_HDR_SIZE_CTSIZE       = 8     # 48
PDTCRYPT_HDR_SIZE_TAG          = 16    # 64 GCM auth tag

PDTCRYPT_HDR_SIZE = PDTCRYPT_HDR_SIZE_MAGIC        + PDTCRYPT_HDR_SIZE_VERSION \
                  + PDTCRYPT_HDR_SIZE_PARAMVERSION + PDTCRYPT_HDR_SIZE_NACL    \
                  + PDTCRYPT_HDR_SIZE_IV           + PDTCRYPT_HDR_SIZE_CTSIZE  \
                  + PDTCRYPT_HDR_SIZE_TAG # = 64
296
297# precalculate offsets since Python can’t do constant folding over names
dd47d6a2
PG
298HDR_OFF_VERSION = PDTCRYPT_HDR_SIZE_MAGIC
299HDR_OFF_PARAMVERSION = HDR_OFF_VERSION + PDTCRYPT_HDR_SIZE_VERSION
300HDR_OFF_NACL = HDR_OFF_PARAMVERSION + PDTCRYPT_HDR_SIZE_PARAMVERSION
301HDR_OFF_IV = HDR_OFF_NACL + PDTCRYPT_HDR_SIZE_NACL
302HDR_OFF_CTSIZE = HDR_OFF_IV + PDTCRYPT_HDR_SIZE_IV
303HDR_OFF_TAG = HDR_OFF_CTSIZE + PDTCRYPT_HDR_SIZE_CTSIZE
00b3cd10
PG
304
FMT_UINT16_LE = "<H"
FMT_UINT64_LE = "<Q"
FMT_I2N_IV    = "<8sL"   # 8 random bytes ‖ 32 bit counter
FMT_I2N_HDR   = ("<"     # little endian, standard sizes, no padding
                 "8s"    # magic
                 "H"     # version
                 "H"     # paramversion
                 "16s"   # sodium chloride (salt)
                 "12s"   # iv
                 "Q"     # size
                 "16s")  # GCM tag
316
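# A minimal sketch of the 64 byte header layout, assuming the format string
# of ``FMT_I2N_HDR`` above. The field values are the ones printed in the
# verbose example of the module docstring; ``HDR_FORMAT`` and
# ``pack_example_header`` are hypothetical names used for illustration only.

```python
import struct

# Same layout as FMT_I2N_HDR: little endian, no alignment padding.
HDR_FORMAT = "<8sHH16s12sQ16s"

def pack_example_header():
    """Pack the sample header values shown in the module docstring."""
    return struct.pack(
        HDR_FORMAT,
        b"PDTCRYPT",                                        # magic
        1,                                                  # version
        1,                                                  # paramversion
        bytes.fromhex("d270b03100d187e2c946610d7b7f7e5f"),  # salt
        bytes.fromhex("02ee3dd7a9631eb101000000"),          # iv
        591,                                                # ctsize
        bytes.fromhex("5b2d6d8b8f82484212fd0b10b6e3369b"),  # GCM tag
    )

raw = pack_example_header()
fields = struct.unpack(HDR_FORMAT, raw)

# The ciphertext size sits at offset 40 (8 + 2 + 2 + 16 + 12) and is stored
# as a little endian 64 bit integer.
print(struct.calcsize(HDR_FORMAT), raw[40:48].hex())
```

# Because the format starts with ``<``, ``struct`` uses fixed standard sizes,
# so the packed size does not depend on the platform.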
317# aes+gcm
addcec42
PG
318AES_KEY_SIZE = 16 # b"0123456789abcdef"
319AES_KEY_SIZE_B64 = 24 # b'MDEyMzQ1Njc4OWFiY2RlZg=='
cb7a3911
PG
320AES_GCM_MAX_SIZE = (1 << 36) - (1 << 5) # 2^39 - 2^8 b ≅ 64 GB
321PDTCRYPT_MAX_OBJ_SIZE_DEFAULT = 63 * (1 << 30) # 63 GB
322PDTCRYPT_MAX_OBJ_SIZE = PDTCRYPT_MAX_OBJ_SIZE_DEFAULT
00b3cd10 323
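# Worked arithmetic for the size limits above, as a sketch: the byte value of
# ``AES_GCM_MAX_SIZE`` corresponds to the GCM plaintext limit of 2^39 - 2^8
# bits, and the 63 GB default object size stays below it. The names used here
# are local to the example.

```python
AES_GCM_MAX_BYTES = (1 << 36) - (1 << 5)   # value of AES_GCM_MAX_SIZE
AES_GCM_MAX_BITS  = (1 << 39) - (1 << 8)   # the same limit expressed in bits
OBJ_LIMIT_BYTES   = 63 * (1 << 30)         # PDTCRYPT_MAX_OBJ_SIZE_DEFAULT

# Eight bits per byte: both expressions describe the same limit.
assert AES_GCM_MAX_BYTES * 8 == AES_GCM_MAX_BITS

# Margin left between the default object cap and the hard GCM limit.
headroom = AES_GCM_MAX_BYTES - OBJ_LIMIT_BYTES
print(headroom)
```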
# index and info files are written on the fly while encrypting so their
# counters must be available in advance
AES_GCM_IV_CNT_INFOFILE    = 1 # constant
AES_GCM_IV_CNT_INDEX       = AES_GCM_IV_CNT_INFOFILE + 1
AES_GCM_IV_CNT_DATA        = AES_GCM_IV_CNT_INDEX + 1 # also for multivolume
AES_GCM_IV_CNT_MAX_DEFAULT = 0xffFFffFF
AES_GCM_IV_CNT_MAX         = AES_GCM_IV_CNT_MAX_DEFAULT
2d6fd8c8 331
be124bca
PG
332# IV structure and generation
333PDTCRYPT_IV_GEN_MAX_RETRIES = 10 # ×
334PDTCRYPT_IV_FIXEDPART_SIZE = 8 # B
335PDTCRYPT_IV_COUNTER_SIZE = 4 # B
39accaaa 336
addcec42
PG
337# secret type: PW of string | KEY of char [16]
338PDTCRYPT_SECRET_PW = 0
339PDTCRYPT_SECRET_KEY = 1
340
00b3cd10 341###############################################################################
39accaaa 342## header, trailer
00b3cd10
PG
343###############################################################################
344#
345# Interface:
346#
347# struct hdrinfo
348# { version : u16
349# , paramversion : u16
350# , nacl : [u8; 16]
351# , iv : [u8; 12]
704ceaa5
PG
352# , ctsize : usize
353# , tag : [u8; 16] }
83f2d71e 354#
00b3cd10 355# fn hdr_read (f : handle) -> hdrinfo;
c2d1c3ec 356# fn hdr_make (f : handle, h : hdrinfo) -> IOResult<usize>;
00b3cd10
PG
357# fn hdr_fmt (h : hdrinfo) -> String;
358#
359
83f2d71e 360def hdr_read (data):
704ceaa5
PG
361 """
362 Read bytes as header structure.
363
364 If the input could not be interpreted as a header, fail with
365 ``InvalidHeader``.
366 """
83f2d71e 367
00b3cd10 368 try:
3b53fb98 369 mag, version, paramversion, nacl, iv, ctsize, tag = \
83f2d71e
PG
370 struct.unpack (FMT_I2N_HDR, data)
371 except Exception as exn:
15d3eefd
PG
372 raise InvalidHeader ("error unpacking header from [%r]: %s"
373 % (binascii.hexlify (data), str (exn)))
00b3cd10 374
dd47d6a2 375 if mag != PDTCRYPT_HDR_MAGIC:
15d3eefd 376 raise InvalidHeader ("bad magic in header: expected [%s], got [%s]"
dd47d6a2 377 % (PDTCRYPT_HDR_MAGIC, mag))
00b3cd10 378
15d3eefd 379 return \
00b3cd10
PG
380 { "version" : version
381 , "paramversion" : paramversion
382 , "nacl" : nacl
383 , "iv" : iv
384 , "ctsize" : ctsize
3b53fb98 385 , "tag" : tag
00b3cd10
PG
386 }
387
388
39accaaa 389def hdr_read_stream (instr):
704ceaa5
PG
390 """
391 Read header from stream at the current position.
392
393 Fail with ``InvalidHeader`` if insufficient bytes were read from the
394 stream, or if the content could not be interpreted as a header.
395 """
dd47d6a2 396 data = instr.read(PDTCRYPT_HDR_SIZE)
ae3d0f2a 397 ldata = len (data)
8a8ac469
PG
398 if ldata == 0:
399 raise EndOfFile
400 elif ldata != PDTCRYPT_HDR_SIZE:
401 raise InvalidHeader ("hdr_read_stream: expected %d B, received %d B"
402 % (PDTCRYPT_HDR_SIZE, ldata))
47e27926 403 return hdr_read (data)
39accaaa
PG
404
405
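# The distinction ``hdr_read_stream`` draws between a clean end of input and a
# truncated header can be sketched as follows. ``classify_header_read`` is a
# hypothetical helper that returns a label where the real function raises;
# it only assumes the 64 byte header size.

```python
import io

HDR_SIZE = 64  # PDTCRYPT_HDR_SIZE

def classify_header_read(stream):
    """Return "eof", "truncated" or "complete" for the next header-sized read."""
    data = stream.read(HDR_SIZE)
    if len(data) == 0:
        return "eof"          # hdr_read_stream raises EndOfFile here
    if len(data) != HDR_SIZE:
        return "truncated"    # hdr_read_stream raises InvalidHeader here
    return "complete"         # content still has to pass hdr_read()

print(classify_header_read(io.BytesIO(b"")))
print(classify_header_read(io.BytesIO(b"PDTCRYPT")))
print(classify_header_read(io.BytesIO(bytes(HDR_SIZE))))
```

# A zero-length read is treated as the regular end of an archive, whereas a
# short read means the stream ended inside a header.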
def hdr_from_params (version, paramversion, nacl, iv, ctsize, tag):
    """
    Assemble the necessary values into a PDTCRYPT header.

    :type      version: int to fit   uint16_t
    :type paramversion: int to fit   uint16_t
    :type         nacl: bytes to fit uint8_t[16]
    :type           iv: bytes to fit uint8_t[12]
    :type       ctsize: int to fit   uint64_t
    :type          tag: bytes to fit uint8_t[16]

    :returns: ``(True, header bytes)`` on success, ``(False, error message)``
              if the values could not be packed.
    """
    buf  = bytearray (PDTCRYPT_HDR_SIZE)
    bufv = memoryview (buf)

    try:
        struct.pack_into (FMT_I2N_HDR, bufv, 0,
                          PDTCRYPT_HDR_MAGIC,
                          version, paramversion, nacl, iv, ctsize, tag)
    except Exception as exn:
        return False, "error assembling header: %s" % str (exn)

    return True, bytes (buf)
00b3cd10 428
00b3cd10 429
8a990744
PG
430def hdr_make_dummy (s):
431 """
432 Create a header sized block of bytes initialized to a value derived from a
433 string. Used to verify we’ve jumped back correctly to the actual position
434 of the object header.
435 """
436 c = reduce (lambda a, c: a + ord(c), s, 0) % 0xFF
dd47d6a2 437 return bytes (bytearray (struct.pack ("B", c)) * PDTCRYPT_HDR_SIZE)
8a990744
PG
438
439
a393d9cb 440def hdr_make (hdr):
704ceaa5
PG
441 """
442 Assemble a header from the given header structure.
443 """
a393d9cb
PG
444 return hdr_from_params (version=hdr.get("version"),
445 paramversion=hdr.get("paramversion"),
446 nacl=hdr.get("nacl"), iv=hdr.get("iv"),
3b53fb98 447 ctsize=hdr.get("ctsize"), tag=hdr.get("tag"))
a393d9cb
PG
448
449
83f2d71e 450HDR_FMT = "I2n_header { version: %d, paramversion: %d, nacl: %s[%d]," \
89131745 451 " iv: %s[%d], ctsize: %d, tag: %s[%d] }"
00b3cd10 452
83f2d71e 453def hdr_fmt (h):
704ceaa5 454 """Format a header structure into readable output."""
83f2d71e
PG
455 return HDR_FMT % (h["version"], h["paramversion"],
456 binascii.hexlify (h["nacl"]), len(h["nacl"]),
457 binascii.hexlify (h["iv"]), len(h["iv"]),
db1f3ac7
PG
458 h["ctsize"],
459 binascii.hexlify (h["tag"]), len(h["tag"]))
00b3cd10 460
00b3cd10 461
83f2d71e 462def hex_spaced_of_bytes (b):
704ceaa5 463 """Format bytes object, hexdump style."""
83f2d71e
PG
464 return " ".join ([ "%.2x%.2x" % (c1, c2)
465 for c1, c2 in zip (b[0::2], b[1::2]) ]) \
466 + (len (b) | 1 == len (b) and " %.2x" % b[-1] or "") # odd lengths
00b3cd10 467
591a722f 468
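# For illustration, the hexdump grouping of ``hex_spaced_of_bytes`` can also
# be written with slices. This is a hypothetical alternative, not the
# module's implementation; unlike the original it emits no leading space for
# a single-byte input.

```python
def hex_spaced(b):
    """Group bytes into two-byte hex words; a trailing odd byte stands alone."""
    words = [b[i:i + 2].hex() for i in range(0, len(b), 2)]
    return " ".join(words)

# Little endian encoding of ctsize = 591 from the docstring example.
print(hex_spaced(bytes.fromhex("4f02000000000000")))
# Odd length input: the last byte forms a word of its own.
print(hex_spaced(b"\x01\x02\x03"))
```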
3031b7ae
PG
469def hdr_iv_counter (h):
470 """Extract the variable part of the IV of the given header."""
471 _fixed, cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
472 return cnt
473
474
475def hdr_iv_fixed (h):
476 """Extract the fixed part of the IV of the given header."""
477 fixed, _cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
478 return fixed
479
480
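# A short sketch of how a 12 byte IV splits into its fixed part and its
# counter, assuming the ``FMT_I2N_IV`` layout ("<8sL"). The IV is the one from
# the verbose example in the module docstring; ``split_iv`` is a hypothetical
# name.

```python
import struct

def split_iv(iv):
    """Split an IV into (8 byte fixed part, 32 bit little endian counter)."""
    fixed, counter = struct.unpack("<8sL", iv)
    return fixed, counter

iv = bytes.fromhex("02ee3dd7a9631eb101000000")
fixed, counter = split_iv(iv)
print(fixed.hex(), counter)
```

# The counter of this sample object equals ``AES_GCM_IV_CNT_INFOFILE``, one of
# the two values reserved for metadata.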
83f2d71e 481hdr_dump = hex_spaced_of_bytes
00b3cd10 482
00b3cd10 483
15d3eefd
PG
484HDR_FMT_PRETTY = \
485"""version = %-4d : %s
486paramversion = %-4d : %s
487nacl : %s
488iv : %s
489ctsize = %-20d : %s
490tag : %s
83f2d71e 491"""
00b3cd10 492
83f2d71e 493def hdr_fmt_pretty (h):
704ceaa5
PG
494 """
495 Format header structure into multi-line representation of its contents and
496 their raw representation. (Omit the implicit “PDTCRYPT” magic bytes that
497 precede every header.)
498 """
83f2d71e
PG
499 return HDR_FMT_PRETTY \
500 % (h["version"],
501 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["version"])),
502 h["paramversion"],
503 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["paramversion"])),
504 hex_spaced_of_bytes (h["nacl"]),
505 hex_spaced_of_bytes (h["iv"]),
506 h["ctsize"],
15d3eefd
PG
507 hex_spaced_of_bytes (struct.pack (FMT_UINT64_LE, h["ctsize"])),
508 hex_spaced_of_bytes (h["tag"]))
00b3cd10 509
f6cd676f
PG
510IV_FMT = "((f %s) (c %d))"
511
512def iv_fmt (iv):
704ceaa5 513 """Format the two components of an IV in a readable fashion."""
f6cd676f
PG
514 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
515 return IV_FMT % (binascii.hexlify (fixed), cnt)
516
00b3cd10 517
00b3cd10 518###############################################################################
f41973a6
PG
519## restoration
520###############################################################################
521
522class Location (object):
523 n = 0
524 offset = 0
525
526def restore_loc_fmt (loc):
527 return "%d off:%d" \
528 % (loc.n, loc.offset)
529
530def locate_hdr_candidates (fd):
531 """
532 Walk over instances of the magic string in the payload, collecting their
533 positions. If the offset of the first found instance is not zero, the file
534 begins with leading garbage.
535
536 :return: The list of offsets in the file.
537 """
538 cands = []
539
540 mm = mmap.mmap(fd, 0, mmap.MAP_SHARED, mmap.PROT_READ)
541 pos = 0
542 while True:
543 pos = mm.find (PDTCRYPT_HDR_MAGIC, pos)
544 if pos == -1:
545 break
546 cands.append (pos)
547 pos += 1
548
549 return cands
550
551
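# The candidate search of ``locate_hdr_candidates`` boils down to repeated
# substring searches. A minimal sketch on an in-memory buffer instead of an
# ``mmap``; ``find_magic_offsets`` is a hypothetical name and the buffer
# contents are made up for the example.

```python
MAGIC = b"PDTCRYPT"

def find_magic_offsets(buf):
    """Collect the offsets of every occurrence of the magic string in *buf*."""
    offsets = []
    pos = buf.find(MAGIC)
    while pos != -1:
        offsets.append(pos)
        # Resume one byte further so each occurrence is reported once.
        pos = buf.find(MAGIC, pos + 1)
    return offsets

# Four bytes of leading garbage, then two magic strings ten bytes apart.
sample = b"junk" + MAGIC + bytes(10) + MAGIC
print(find_magic_offsets(sample))
```

# A first offset other than zero indicates leading garbage, as the docstring
# of ``locate_hdr_candidates`` notes.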
6c8073ab
PG
552HDR_CAND_GOOD = 0 # header marks begin of valid object
553HDR_CAND_FISHY = 1 # inconclusive (tag mismatch, obj overlap etc.)
554HDR_CAND_JUNK = 2 # not a header / object unreadable
555
556
557def inspect_hdr (fd, off):
558 """
559 Attempt to parse a header in *fd* at position *off*.
560
561 Returns a verdict about the quality of that header plus the parsed header
562 when readable.
563 """
564
565 _ = os.lseek (fd, off, os.SEEK_SET)
566
567 if os.lseek (fd, 0, os.SEEK_CUR) != off:
568 if PDTCRYPT_VERBOSE is True:
569 noise ("PDT: %d → dismissed (lseek() past EOF)" % off)
570 return HDR_CAND_JUNK, None
571
572 raw = os.read (fd, PDTCRYPT_HDR_SIZE)
573 if len (raw) != PDTCRYPT_HDR_SIZE:
574 if PDTCRYPT_VERBOSE is True:
575 noise ("PDT: %d → dismissed (EOF inside header)" % off)
576 return HDR_CAND_JUNK, None
577
578 try:
579 hdr = hdr_read (raw)
580 except InvalidHeader as exn:
581 if PDTCRYPT_VERBOSE is True:
582 noise ("PDT: %d → dismissed (invalid: [%s])" % (off, str (exn)))
583 return HDR_CAND_JUNK, None
584
585 obj0 = off + PDTCRYPT_HDR_SIZE
586 objX = obj0 + hdr ["ctsize"]
587
588 eof = os.lseek (fd, 0, os.SEEK_END)
589 if eof < objX:
590 if PDTCRYPT_VERBOSE is True:
591 noise ("PDT: %d → EOF inside object (%d≤%d≤%d); adjusting size to "
592 "%d" % (off, obj0, eof, objX, (eof - obj0)))
593 # try reading up to the end
594 hdr ["ctsize"] = eof - obj0
595 return HDR_CAND_FISHY, hdr
596
597 return HDR_CAND_GOOD, hdr
598
599
a808459e 600def try_decrypt (ifd, off, hdr, secret, ofd=-1):
6c8073ab 601 """
a808459e
PG
602 Attempt to decrypt the object in the (seekable) descriptor *ifd* starting
603 at *off* using the metadata in *hdr* and *secret*. An output fd can be
604 specified with *ofd*; if it is *-1* – the default –, the decrypted payload
605 will be discarded.
70a33834
PG
606
607 Always creates a fresh decryptor, so validation steps across objects don’t
608 apply.
202104ed
PG
609
610 Errors during GCM tag validation are ignored.
6c8073ab 611 """
70a33834
PG
612 ctleft = hdr ["ctsize"]
613 pos = off
614
615 ks = secret [0]
616 if ks == PDTCRYPT_SECRET_PW:
617 decr = Decrypt (password=secret [1])
618 elif ks == PDTCRYPT_SECRET_KEY:
6257d5b3 619 key = secret [1]
70a33834
PG
620 decr = Decrypt (key=key)
621 else:
622 raise RuntimeError
623
70a33834
PG
624 decr.next (hdr)
625
626 try:
a808459e 627 os.lseek (ifd, pos, os.SEEK_SET)
70a33834
PG
628 while ctleft > 0:
629 cnksiz = min (ctleft, PDTCRYPT_BLOCKSIZE)
a808459e 630 cnk = os.read (ifd, cnksiz)
70a33834
PG
631 ctleft -= cnksiz
632 pos += cnksiz
a808459e
PG
633 pt = decr.process (cnk)
634 if ofd != -1:
635 os.write (ofd, pt)
202104ed
PG
636 try:
637 pt = decr.done ()
638 except InvalidGCMTag:
639 noise ("PDT: GCM tag mismatch for object %d–%d"
640 % (off, off + hdr ["ctsize"]))
a808459e
PG
641 if len (pt) > 0 and ofd != -1:
642 os.write (ofd, pt)
70a33834 643
70a33834
PG
644 except Exception as exn:
645 noise ("PDT: error decrypting object %d–%d@%d, %d B remaining [%s]"
646 % (off, off + hdr ["ctsize"], pos, ctleft, exn))
647 raise
6c8073ab 648
70a33834 649 return pos - off
6c8073ab
PG
650
651
6690f5e0
PG
652def readable_objects_offsets (ifd, secret, cands):
653 """
654 From a list of candidates, locate the ones that mark the start of actual
655 readable PDTCRYPT objects.
656 """
657 good = []
658 nobj = 0
659 for cand in cands:
660 nobj += 1
661 vdt, hdr = inspect_hdr (ifd, cand)
662 if vdt == HDR_CAND_JUNK:
663 pass # ignore unreadable ones
664 elif vdt in [HDR_CAND_GOOD, HDR_CAND_FISHY]:
665 off0 = cand + PDTCRYPT_HDR_SIZE
666 ok = try_decrypt (ifd, off0, hdr, secret) == hdr ["ctsize"]
667 if ok is True:
668 good.append (cand)
669 return good
670
671
672def reconstruct_offsets (fname, secret):
673 ifd = os.open (fname, os.O_RDONLY)
674
675 try:
676 cands = locate_hdr_candidates (ifd)
677 return readable_objects_offsets (ifd, secret, cands)
678 finally:
679 os.close (ifd)
680
681
f41973a6 682###############################################################################
addcec42
PG
683## helpers
684###############################################################################
685
def make_secret (password=None, key=None):
    """
    Safely create a “secret” value that consists either of a key or a password.
    Inputs are validated: the password is accepted as (UTF-8 encoded) bytes or
    string; for the key only a bytes object of the proper size, or a
    hexadecimal or base64 encoded string thereof, is accepted.

    If both are provided, the key is preferred over the password; no checks are
    performed whether the key is derived from the password.

    :returns: secret value if inputs were acceptable | None otherwise.
    """
698 if key is not None:
699 if isinstance (key, str) is True:
700 key = key.encode ("utf-8")
701 if isinstance (key, bytes) is True:
702 if len (key) == AES_KEY_SIZE:
703 return (PDTCRYPT_SECRET_KEY, key)
6257d5b3
PG
704 if len (key) == AES_KEY_SIZE * 2:
705 try:
706 key = binascii.unhexlify (key)
707 return (PDTCRYPT_SECRET_KEY, key)
708 except binascii.Error: # garbage in string
709 pass
addcec42
PG
710 if len (key) == AES_KEY_SIZE_B64:
711 try:
712 key = base64.b64decode (key)
713 # the base64 processor is very tolerant and allows for
6257d5b3 714 # arbitrary trailing and leading data thus the data obtained
addcec42
PG
715 # must be checked for the proper length
716 if len (key) == AES_KEY_SIZE:
717 return (PDTCRYPT_SECRET_KEY, key)
718 except binascii.Error: # “incorrect padding”
719 pass
720 elif password is not None:
721 if isinstance (password, str) is True:
722 return (PDTCRYPT_SECRET_PW, password)
723 elif isinstance (password, bytes) is True:
724 try:
725 password = password.decode ("utf-8")
726 return (PDTCRYPT_SECRET_PW, password)
727 except UnicodeDecodeError:
728 pass
729
730 return None
731
732
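# The three key encodings accepted by ``make_secret`` can be told apart by
# length alone. The sketch below mirrors that ladder with a hypothetical
# ``decode_key`` helper operating on bytes; the sample key and its base64 form
# are the ones given in the comments next to ``AES_KEY_SIZE``.

```python
import base64
import binascii

KEY_SIZE = 16       # AES_KEY_SIZE
KEY_SIZE_B64 = 24   # AES_KEY_SIZE_B64

def decode_key(key):
    """Return 16 raw key bytes, or None if *key* fits none of the three forms."""
    if len(key) == KEY_SIZE:                 # raw key
        return key
    if len(key) == KEY_SIZE * 2:             # hexadecimal
        try:
            return binascii.unhexlify(key)
        except binascii.Error:
            return None
    if len(key) == KEY_SIZE_B64:             # base64
        try:
            raw = base64.b64decode(key)
        except binascii.Error:
            return None
        # b64decode is lenient, so the decoded length must be checked.
        return raw if len(raw) == KEY_SIZE else None
    return None

raw_key = b"0123456789abcdef"
print(decode_key(binascii.hexlify(raw_key)) == raw_key)
print(decode_key(b"MDEyMzQ1Njc4OWFiY2RlZg==") == raw_key)
```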
733###############################################################################
6178061e
PG
734## passthrough / null encryption
735###############################################################################
736
737class PassthroughCipher (object):
738
739 tag = struct.pack ("<QQ", 0, 0)
740
741 def __init__ (self) : pass
742
743 def update (self, b) : return b
744
50710d86 745 def finalize (self) : return b""
6178061e
PG
746
747 def finalize_with_tag (self, _) : return b""
748
749###############################################################################
a393d9cb 750## convenience wrapper
00b3cd10
PG
751###############################################################################
752
c46c8670
PG
753
754def kdf_dummy (klen, password, _nacl):
704ceaa5
PG
755 """
756 Fake KDF for testing purposes that is called when parameter version zero is
757 encountered.
758 """
c46c8670
PG
759 q, r = divmod (klen, len (password))
760 if isinstance (password, bytes) is False:
761 password = password.encode ()
762 return password * q + password [:r], b""
763
764
765SCRYPT_KEY_MEMO = { } # static because needed for both the info file and the archive
766
767
768def kdf_scrypt (params, password, nacl):
704ceaa5
PG
769 """
770 Wrapper for the Scrypt KDF, corresponds to parameter version one. The
771 computation result is memoized based on the inputs to facilitate spawning
772 multiple encryption contexts.
773 """
c46c8670
PG
774 N = params["N"]
775 r = params["r"]
776 p = params["p"]
777 dkLen = params["dkLen"]
778
779 if nacl is None:
780 nacl = os.urandom (params["NaCl_LEN"])
781
782 key_parms = (password, nacl, N, r, p, dkLen)
783 global SCRYPT_KEY_MEMO
784 if key_parms not in SCRYPT_KEY_MEMO:
785 SCRYPT_KEY_MEMO [key_parms] = \
786 pylibscrypt.scrypt (password, nacl, N, r, p, dkLen)
787 return SCRYPT_KEY_MEMO [key_parms], nacl
a64085a8
PG
788
789
da82bc58 790def kdf_by_version (paramversion=None, defs=None):
704ceaa5
PG
791 """
792 Pick the KDF handler corresponding to the parameter version or the
793 definition set.
794
795 :rtype: function (password : str, nacl : str) -> str
796 """
da82bc58
PG
797 if paramversion is not None:
798 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
a64085a8 799 if defs is None:
1ed44e7b
PG
800 raise InvalidParameter ("no encryption parameters for version %r"
801 % paramversion)
a64085a8 802 (kdf, params) = defs["kdf"]
c46c8670
PG
803 fn = None
804 if kdf == "scrypt" : fn = kdf_scrypt
805 if kdf == "dummy" : fn = kdf_dummy
806 if fn is None:
a64085a8 807 raise ValueError ("key derivation method %r unknown" % kdf)
c46c8670 808 return partial (fn, params)
a64085a8
PG
809
810
b360b772
PG
811###############################################################################
812## SCRYPT hashing
813###############################################################################
814
815def scrypt_hashsource (pw, ins):
816 """
817 Calculate the SCRYPT hash from the password and the information contained
818 in the first header found in ``ins``.
819
820 This does not validate whether the first object is encrypted correctly.
821 """
    if isinstance (pw, str) is True:
        pw = str.encode (pw)
    elif isinstance (pw, bytes) is False:
        raise InvalidParameter ("password must be a string, not %s"
                                % type (pw))
    if isinstance (ins, io.BufferedReader) is False and \
            isinstance (ins, io.FileIO) is False:
        raise InvalidParameter ("file to hash must be opened in “binary” mode")
b360b772
PG
830 hdr = None
831 try:
832 hdr = hdr_read_stream (ins)
833 except EndOfFile as exn:
834 noise ("PDT: malformed input: end of file reading first object header")
835 noise ("PDT:")
836 return 1
837
838 nacl = hdr ["nacl"]
839 pver = hdr ["paramversion"]
840 if PDTCRYPT_VERBOSE is True:
841 noise ("PDT: salt of first object : %s" % binascii.hexlify (nacl))
842 noise ("PDT: parameter version of archive : %d" % pver)
843
844 try:
845 defs = ENCRYPTION_PARAMETERS.get(pver, None)
846 kdfname, params = defs ["kdf"]
847 if kdfname != "scrypt":
848 noise ("PDT: input is not an SCRYPT archive")
849 noise ("")
850 return 1
851 kdf = kdf_by_version (None, defs)
852 except ValueError as exn:
853 noise ("PDT: object has unknown parameter version %d" % pver)
854
855 hsh, _void = kdf (pw, nacl)
856
c1ecc2e2 857 return hsh, nacl, hdr ["version"], pver
b360b772
PG
858
859
860def scrypt_hashfile (pw, fname):
704ceaa5
PG
861 """
862 Calculate the SCRYPT hash from the password and the information contained
863 in the first header found in the given file. The header is read only at
864 offset zero.
865 """
b360b772 866 with deptdcrypt_mk_stream (PDTCRYPT_SOURCE, fname or "-") as ins:
c1ecc2e2 867 hsh, _void, _void, _void = scrypt_hashsource (pw, ins)
b360b772
PG
868 return hsh
869
870
871###############################################################################
872## AES-GCM context
873###############################################################################
874
a393d9cb
PG
875class Crypto (object):
876 """
877 Encryption context to remain alive throughout an entire tarfile pass.
878 """
6178061e 879 enc = None
a393d9cb
PG
880 nacl = None
881 key = None
50710d86
PG
882 cnt = None # file counter (uint32_t != 0)
883 iv = None # current IV
30019abf
PG
884 fixed = None # accu for 64 bit fixed parts of IV
885 used_ivs = None # tracks IVs
886 strict_ivs = False # if True, panic on duplicate object IV
48db09ba
PG
887 password = None
888 paramversion = None
633b18a9
PG
889 stats = { "in" : 0
890 , "out" : 0
891 , "obj" : 0 }
fa47412e 892
fa47412e
PG
893 ctsize = -1
894 ptsize = -1
3031b7ae
PG
895 info_counter_used = False
896 index_counter_used = False
a393d9cb 897
a64085a8 898 def __init__ (self, *al, **akv):
30019abf 899 self.used_ivs = set ()
a64085a8 900 self.set_parameters (*al, **akv)
39accaaa
PG
901
902
704ceaa5 903 def next_fixed (self):
be124bca 904 # NOP for decryption
50710d86
PG
905 pass
906
907
908 def set_object_counter (self, cnt=None):
704ceaa5
PG
909 """
910 Safely set the internal counter of encrypted objects. Numerous
911 constraints apply:
912
913 The same counter may not be reused in combination with one IV fixed
914 part. This is validated elsewhere in the IV handling.
915
916 Counter zero is invalid. The first two counters are reserved for
917 metadata. The implementation does not allow for splitting metadata
918 files over multiple encrypted objects. (This would be possible by
919 assigning new fixed parts.) Thus in a Deltatar backup there is at most
920 one object with a counter value of one and two. On creation of a
921 context, the initial counter may be chosen. The globals
922 ``AES_GCM_IV_CNT_INFOFILE`` and ``AES_GCM_IV_CNT_INDEX`` can be used to
923 request one of the reserved values. If one of these values has been
924 used, any further attempt of setting the counter to that value will
925 be rejected with an ``InvalidFileCounter`` exception.
926
927 Out of bounds values (i. e. below one and more than the maximum of 2³²)
928 cause an ``InvalidParameter`` exception to be thrown.
929 """
50710d86
PG
930 if cnt is None:
931 self.cnt = AES_GCM_IV_CNT_DATA
932 return
933 if cnt == 0 or cnt > AES_GCM_IV_CNT_MAX + 1:
b12110dd
PG
934 raise InvalidParameter ("invalid counter value %d requested: "
935 "acceptable values are from 1 to %d"
936 % (cnt, AES_GCM_IV_CNT_MAX))
50710d86
PG
937 if cnt == AES_GCM_IV_CNT_INFOFILE:
938 if self.info_counter_used is True:
fac2cfe1
PG
939 raise InvalidFileCounter ("attempted to reuse info file "
940 "counter %d: must be unique" % cnt)
50710d86 941 self.info_counter_used = True
3031b7ae
PG
942 elif cnt == AES_GCM_IV_CNT_INDEX:
943 if self.index_counter_used is True:
fac2cfe1
PG
944 raise InvalidFileCounter ("attempted to reuse index file "
945 "counter %d: must be unique" % cnt)
3031b7ae 946 self.index_counter_used = True
50710d86
PG
947 if cnt <= AES_GCM_IV_CNT_MAX:
948 self.cnt = cnt
949 return
950 # cnt == AES_GCM_IV_CNT_MAX + 1 → wrap
951 self.cnt = AES_GCM_IV_CNT_DATA
704ceaa5 952 self.next_fixed ()
50710d86
PG
953
954
1f3fd7b0 955 def set_parameters (self, password=None, key=None, paramversion=None,
be124bca 956 nacl=None, counter=None, strict_ivs=False):
704ceaa5
PG
957 """
958 Configure the internal state of a crypto context. Not intended for
959 external use.
960 """
be124bca 961 self.next_fixed ()
50710d86 962 self.set_object_counter (counter)
30019abf
PG
963 self.strict_ivs = strict_ivs
964
a83fa4ed
PG
965 if paramversion is not None:
966 self.paramversion = paramversion
967
1f3fd7b0
PG
968 if key is not None:
969 self.key, self.nacl = key, nacl
970 return
971
a83fa4ed
PG
972 if password is not None:
973 if isinstance (password, bytes) is False:
974 password = str.encode (password)
975 self.password = password
976 if paramversion is None and nacl is None:
977 # postpone key setup until first header is available
978 return
979 kdf = kdf_by_version (paramversion)
980 if kdf is not None:
981 self.key, self.nacl = kdf (password, nacl)
fa47412e 982
39accaaa 983
39accaaa 984 def process (self, buf):
704ceaa5
PG
985 """
986 Encrypt / decrypt a buffer. Invokes the ``.update()`` method on the
987 wrapped encryptor or decryptor, respectively.
988
989 The Cryptography exception ``AlreadyFinalized`` is translated to an
990 ``InternalError`` at this point. It may occur in sound code when the GC
991 closes an encrypting stream after an error. Everywhere else it must be
992 treated as a bug.
993 """
cb7a3911
PG
994 if self.enc is None:
995 raise RuntimeError ("process: context not initialized")
996 self.stats ["in"] += len (buf)
fac2cfe1
PG
997 try:
998 out = self.enc.update (buf)
999 except cryptography.exceptions.AlreadyFinalized as exn:
1000 raise InternalError (exn)
cb7a3911
PG
1001 self.stats ["out"] += len (out)
1002 return out
39accaaa
PG
1003
1004
30019abf 1005 def next (self, password, paramversion, nacl, iv):
704ceaa5
PG
1006 """
1007 Prepare for encrypting another object: Reset the data counters and
1008 change the configuration in case one of the variable parameters differs
1009 from the last object. Also check the IV for duplicates and error out
1010 if strict checking was requested.
1011 """
fa47412e
PG
1012 self.ctsize = 0
1013 self.ptsize = 0
1014 self.stats ["obj"] += 1
30019abf
PG
1015
1016 self.check_duplicate_iv (iv)
1017
6178061e
PG
1018 if ( self.paramversion != paramversion
1019 or self.password != password
1020 or self.nacl != nacl):
1f3fd7b0 1021 self.set_parameters (password=password, paramversion=paramversion,
30019abf
PG
1022 nacl=nacl, strict_ivs=self.strict_ivs)
1023
1024
1025 def check_duplicate_iv (self, iv):
704ceaa5
PG
1026 """
1027 Add an IV (the 12 byte representation as in the header) to the set of
1028 used IVs. With strict checking enabled, an IV that was seen before raises
1029 ``DuplicateIV``. Depending on the context, this may indicate a serious error (IV reuse).
1030 """
30019abf
PG
1031 if self.strict_ivs is True and iv in self.used_ivs:
1032 raise DuplicateIV ("iv %s was reused" % iv_fmt (iv))
1033 # iv has not been used before; add to collection
1034 self.used_ivs.add (iv)
fa47412e
PG
1035
1036
633b18a9 1037 def counters (self):
704ceaa5
PG
1038 """
1039 Access the data counters.
1040 """
633b18a9
PG
1041 return self.stats ["obj"], self.stats ["in"], self.stats ["out"]
1042
1043
8de91f4f
PG
1044 def drop (self):
1045 """
1046 Clear the current context regardless of its finalization state. The
1047 next operation must be ``.next()``.
1048 """
1049 self.enc = None
1050
1051
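# Illustration only, not part of crypto.py: a minimal sketch of how the object
# counter advances and wraps as described for ``set_object_counter()``. The
# constants are hypothetical stand-ins for the module globals (their real
# values are defined elsewhere and are assumed here), ``advance_counter`` is a
# made-up helper, and the bookkeeping for the reserved counters is left out.

```python
# Assumed values; the real module defines its own constants.
CNT_INFOFILE = 1            # reserved: info file
CNT_INDEX    = 2            # reserved: index file
CNT_DATA     = 3            # first counter for ordinary data objects
CNT_MAX      = 0xffffffff   # largest value that fits the 32 bit counter

def advance_counter (cnt, cnt_max=CNT_MAX):
    """
    Return ``(next_cnt, wrapped)`` for the counter following *cnt*. When the
    counter would exceed *cnt_max* it restarts at ``CNT_DATA`` and *wrapped*
    is True, which tells the caller to switch to a fresh IV fixed part.
    """
    if not 1 <= cnt <= cnt_max:
        raise ValueError ("invalid counter value %d: acceptable values are "
                          "from 1 to %d" % (cnt, cnt_max))
    if cnt == cnt_max:
        return CNT_DATA, True
    return cnt + 1, False
```

# The ``wrapped`` flag corresponds to the point where the context would call
# ``next_fixed()``: the same counter must never recur with one fixed part.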
39accaaa
PG
1052class Encrypt (Crypto):
1053
48db09ba
PG
1054 lastinfo = None
1055 version = None
72a42219 1056 paramenc = None
50710d86 1057
1f3fd7b0 1058 def __init__ (self, version, paramversion, password=None, key=None, nacl=None,
30019abf 1059 counter=AES_GCM_IV_CNT_DATA, strict_ivs=True):
704ceaa5
PG
1060 """
1061 The ctor will throw immediately if one of the parameters does not conform
1062 to our expectations.
1063
1065 :type version: int to fit uint16_t
1066 :type paramversion: int to fit uint16_t
1067 :param password: mutually exclusive with ``key``
1068 :type password: str
1069 :param key: mutually exclusive with ``password``
1070 :type key: bytes
1071 :type nacl: bytes
1072 :param counter: initial object counter; the values
1073 ``AES_GCM_IV_CNT_INFOFILE`` and
1074 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1075 and cannot be reused even with different fixed parts.
1076 :type strict_ivs: bool
1077 """
1f3fd7b0
PG
1078 if password is None and key is None \
1079 or password is not None and key is not None :
1080 raise InvalidParameter ("__init__: need either key or password")
1081
1082 if key is not None:
1083 if isinstance (key, bytes) is False:
1084 raise InvalidParameter ("__init__: key must be provided as "
1085 "bytes, not %s" % type (key))
1086 if nacl is None:
1087 raise InvalidParameter ("__init__: salt must be provided along "
1088 "with encryption key")
1089 else: # password, no key
1090 if isinstance (password, str) is False:
1091 raise InvalidParameter ("__init__: password must be a string, not %s"
1092 % type (password))
1093 if len (password) == 0:
1094 raise InvalidParameter ("__init__: supplied empty password but not "
1095 "permitted for PDT encrypted files")
36b9932a
PG
1096 # version
1097 if isinstance (version, int) is False:
1098 raise InvalidParameter ("__init__: version number must be an "
1099 "integer, not %s" % type (version))
1100 if version < 0:
1101 raise InvalidParameter ("__init__: version number must be a "
1102 "nonnegative integer, not %d" % version)
1103 # paramversion
1104 if isinstance (paramversion, int) is False:
1105 raise InvalidParameter ("__init__: crypto parameter version number "
1106 "must be an integer, not %s"
1107 % type (paramversion))
1108 if paramversion < 0:
1109 raise InvalidParameter ("__init__: crypto parameter version number "
1110 "must be a nonnegative integer, not %d"
1111 % paramversion)
1112 # salt
1113 if nacl is not None:
1114 if isinstance (nacl, bytes) is False:
1115 raise InvalidParameter ("__init__: salt given, but of type %s "
1116 "instead of bytes" % type (nacl))
1117 # salt length would depend on the actual encryption so it can’t be
1118 # validated at this point
b12110dd 1119 self.fixed = [ ]
48db09ba
PG
1120 self.version = version
1121 self.paramenc = ENCRYPTION_PARAMETERS.get (paramversion) ["enc"]
72a42219 1122
1f3fd7b0 1123 super().__init__ (password, key, paramversion, nacl, counter=counter,
30019abf 1124 strict_ivs=strict_ivs)
a393d9cb
PG
1125
1126
be124bca
PG
1127 def next_fixed (self, retries=PDTCRYPT_IV_GEN_MAX_RETRIES):
1128 """
1129 Generate the next IV fixed part by reading eight bytes from
1130 ``/dev/urandom``. The buffer so obtained is tested against the fixed
1131 parts used so far to prevent accidental reuse of IVs. After a
1132 configurable number of attempts to create a unique fixed part, it will
1133 refuse to continue with an ``IVFixedPartError``. This is unlikely to
1134 ever happen on a normal system but may detect an issue with the random
1135 generator.
1136
1137 The list of fixed parts that were used by the context at hand can be
1138 accessed through the ``.fixed`` list. Its last element is the fixed
1139 part currently in use.
1140 """
1141 i = 0
1142 while i < retries:
1143 fp = os.urandom (PDTCRYPT_IV_FIXEDPART_SIZE)
1144 if fp not in self.fixed:
1145 self.fixed.append (fp)
1146 return
1147 i += 1
1148 raise IVFixedPartError ("error obtaining a unique IV fixed part from "
1149 "/dev/urandom; giving up after %d tries" % i)
1150
1151
a393d9cb 1152 def iv_make (self):
704ceaa5
PG
1153 """
1154 Construct a 12-byte IV from the current fixed part and the object
1155 counter.
1156 """
b12110dd 1157 return struct.pack(FMT_I2N_IV, self.fixed [-1], self.cnt)
a393d9cb
PG
1158
1159
cb7a3911 1160 def next (self, filename=None, counter=None):
704ceaa5
PG
1161 """
1162 Prepare for encrypting the next incoming object. Update the counter
1163 and put together the IV, possibly changing prefixes. Then create the
1164 new encryptor.
1165
1166 The argument ``counter`` can be used to specify a file counter for this
1167 object. Unless it is one of the reserved values, the counter of
1168 subsequent objects will be computed from this one.
1169
1170 If this is the first object in a series, ``filename`` is required,
1171 otherwise it is reused if not present. The value is used to derive a
1172 header sized placeholder to use until after encryption when all the
1173 inputs to construct the final header are available. This is then
1174 matched in ``.done()`` against the value found at the position of the
1175 header. The motivation for this extra check is primarily to assist
1176 format debugging: It makes stray headers easy to spot in malformed
1177 PDTCRYPT files.
1178 """
cb7a3911
PG
1179 if filename is None:
1180 if self.lastinfo is None:
1181 raise InvalidParameter ("next: filename is mandatory for "
1182 "first object")
1183 filename, _dummy = self.lastinfo
1184 else:
1185 if isinstance (filename, str) is False:
1186 raise InvalidParameter ("next: filename must be a string, not %s"
1187 % type (filename))
3031b7ae
PG
1188 if counter is not None:
1189 if isinstance (counter, int) is False:
1190 raise InvalidParameter ("next: the supplied counter is of "
1191 "invalid type %s; please pass an "
1192 "integer instead" % type (counter))
1193 self.set_object_counter (counter)
fac2cfe1 1194
50710d86 1195 self.iv = self.iv_make ()
72a42219 1196 if self.paramenc == "aes-gcm":
6178061e
PG
1197 self.enc = Cipher \
1198 ( algorithms.AES (self.key)
1199 , modes.GCM (self.iv)
1200 , backend = default_backend ()) \
1201 .encryptor ()
72a42219 1202 elif self.paramenc == "passthrough":
6178061e
PG
1203 self.enc = PassthroughCipher ()
1204 else:
b12110dd
PG
1205 raise InvalidParameter ("next: parameter version %d not known"
1206 % self.paramversion)
48db09ba
PG
1207 hdrdum = hdr_make_dummy (filename)
1208 self.lastinfo = (filename, hdrdum)
30019abf 1209 super().next (self.password, self.paramversion, self.nacl, self.iv)
72a42219 1210
3031b7ae 1211 self.set_object_counter (self.cnt + 1)
48db09ba 1212 return hdrdum
a393d9cb 1213
a393d9cb 1214
cd77dadb 1215 def done (self, cmpdata):
704ceaa5
PG
1216 """
1217 Complete encryption of an object. After this has been called, attempts
1218 of encrypting further data will cause an error until ``.next()`` is
1219 invoked properly.
1220
1221 Returns a triple of the final chunk of ciphertext, the 64 bytes object
1222 header including the “late” values, e. g. the ciphertext size and the
1223 GCM tag, and the list of IV fixed parts used by the context.
1224 """
36b9932a
PG
1225 if isinstance (cmpdata, bytes) is False:
1226 raise InvalidParameter ("done: comparison input expected as bytes, "
1227 "not %s" % type (cmpdata))
cb7a3911
PG
1228 if self.lastinfo is None:
1229 raise RuntimeError ("done: encryption context not initialized")
48db09ba
PG
1230 filename, hdrdum = self.lastinfo
1231 if cmpdata != hdrdum:
b12110dd
PG
1232 raise RuntimeError ("done: bad sync of header for object %d: "
1233 "preliminary data does not match; this likely "
1234 "indicates a wrongly repositioned stream"
1235 % self.cnt)
6178061e 1236 data = self.enc.finalize ()
633b18a9 1237 self.stats ["out"] += len (data)
cd77dadb 1238 self.ctsize += len (data)
48db09ba
PG
1239 ok, hdr = hdr_from_params (self.version, self.paramversion, self.nacl,
1240 self.iv, self.ctsize, self.enc.tag)
8a990744 1241 if ok is False:
b12110dd
PG
1242 raise InternalError ("error constructing header: %r" % hdr)
1243 return data, hdr, self.fixed
a393d9cb 1244
a393d9cb 1245
cd77dadb 1246 def process (self, buf):
704ceaa5
PG
1247 """
1248 Encrypt a chunk of plaintext with the active encryptor. Returns a pair
1249 of the size of the input consumed and the resulting ciphertext. The
1250 size **must** be checked downstream: if the maximum possible object
1251 size has been reached, the current context must be finalized and a new
1252 one established before any further data can be encrypted. The part of
1253 the plaintext beyond the consumed size was not encrypted and remains
1254 with the caller to submit once the new context is ready.
1255 """
36b9932a
PG
1256 if isinstance (buf, bytes) is False:
1257 raise InvalidParameter ("process: expected byte buffer, not %s"
1258 % type (buf))
cb7a3911
PG
1259 bsize = len (buf)
1260 newptsize = self.ptsize + bsize
1261 diff = newptsize - PDTCRYPT_MAX_OBJ_SIZE
1262 if diff > 0:
1263 bsize -= diff
1264 newptsize = PDTCRYPT_MAX_OBJ_SIZE
1265 self.ptsize = newptsize
1266 data = super().process (buf [:bsize])
cd77dadb 1267 self.ctsize += len (data)
cb7a3911 1268 return bsize, data
cd77dadb
PG
1269
1270
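# Illustration only, not part of crypto.py: a minimal sketch of the size
# capping done in ``Encrypt.process()`` and of the loop a caller needs around
# it. ``split_for_object`` and ``objects_from_chunks`` are hypothetical names,
# plain byte strings stand in for encrypted objects, and the tiny object size
# used in the example is an assumption chosen for readability.

```python
def split_for_object (ptsize, buf, max_obj_size):
    """
    Return ``(consumed, remainder, new_ptsize)``: how many bytes of *buf*
    still fit into the current object, the part that does not, and the
    plaintext size of the object afterwards.
    """
    room     = max (max_obj_size - ptsize, 0)
    consumed = min (len (buf), room)
    return consumed, buf [consumed:], ptsize + consumed

def objects_from_chunks (chunks, max_obj_size):
    """Distribute input chunks over objects of at most *max_obj_size* bytes."""
    objs, cur, ptsize = [], b"", 0
    for buf in chunks:
        while len (buf) > 0:
            n, rest, ptsize = split_for_object (ptsize, buf, max_obj_size)
            cur += buf [:n]
            buf  = rest
            if ptsize == max_obj_size:
                # object is full: finalize it and start over with a new one
                objs.append (cur)
                cur, ptsize = b"", 0
    if len (cur) > 0:
        objs.append (cur)
    return objs
```

# In the real context the "finalize and start over" step would be a call to
# ``.done()`` followed by ``.next()``; the remainder is then passed to
# ``.process()`` again.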
39accaaa 1271class Decrypt (Crypto):
a393d9cb 1272
3031b7ae 1273 tag = None # GCM tag, part of header
3031b7ae 1274 last_iv = None # check consecutive ivs in strict mode
39accaaa 1275
1f3fd7b0 1276 def __init__ (self, password=None, key=None, counter=None, fixedparts=None,
ee6aa239 1277 strict_ivs=False):
704ceaa5
PG
1278 """
1279 Sanitizing ctor for the decryption context. ``fixedparts`` specifies a
1280 list of IV fixed parts accepted during decryption. If a fixed part is
1281 encountered that is not in the list, decryption will fail.
1282
1283 :param password: mutually exclusive with ``key``
1284 :type password: str
1285 :param key: mutually exclusive with ``password``
1286 :type key: bytes
1287 :param counter: initial object counter; the values
1288 ``AES_GCM_IV_CNT_INFOFILE`` and
1289 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1290 and cannot be reused even with different fixed parts.
1291 :type fixedparts: bytes list
1292 """
1f3fd7b0
PG
1293 if password is None and key is None \
1294 or password is not None and key is not None :
1295 raise InvalidParameter ("__init__: need either key or password")
1296
1297 if key is not None:
1298 if isinstance (key, bytes) is False:
1299 raise InvalidParameter ("__init__: key must be provided as "
1300 "bytes, not %s" % type (key))
1301 else: # password, no key
1302 if isinstance (password, str) is False:
1303 raise InvalidParameter ("__init__: password must be a string, not %s"
1304 % type (password))
1305 if len (password) == 0:
1306 raise InvalidParameter ("__init__: supplied empty password but not "
1307 "permitted for PDT encrypted files")
36b9932a 1308 # fixed parts
50710d86 1309 if fixedparts is not None:
36b9932a
PG
1310 if isinstance (fixedparts, list) is False:
1311 raise InvalidParameter ("__init__: IV fixed parts must be "
1312 "supplied as list, not %s"
1313 % type (fixedparts))
b12110dd
PG
1314 self.fixed = fixedparts
1315 self.fixed.sort ()
ee6aa239 1316
a83fa4ed
PG
1317 super().__init__ (password=password, key=key, counter=counter,
1318 strict_ivs=strict_ivs)
39accaaa
PG
1319
1320
b12110dd 1321 def valid_fixed_part (self, iv):
704ceaa5
PG
1322 """
1323 Check if a fixed part was already seen.
1324 """
50710d86 1325 # check if fixed part is known
b12110dd
PG
1326 fixed, _cnt = struct.unpack (FMT_I2N_IV, iv)
1327 i = bisect.bisect_left (self.fixed, fixed)
1328 return i != len (self.fixed) and self.fixed [i] == fixed
50710d86
PG
1329
1330
ee6aa239 1331 def check_consecutive_iv (self, iv):
704ceaa5
PG
1332 """
1333 Check whether the counter part of the given IV is indeed the successor
1334 of the currently present counter. This should always be the case for
1335 the objects in a well formed PDT archive but should not be enforced
1336 when decrypting out-of-order.
1337 """
ee6aa239 1338 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
3031b7ae
PG
1339 if self.strict_ivs is True \
1340 and self.last_iv is not None \
ee6aa239
PG
1341 and self.last_iv [0] == fixed \
1342 and self.last_iv [1] != cnt - 1:
f6cd676f 1343 raise NonConsecutiveIV ("iv %s counter not successor of "
ee6aa239 1344 "last object (expected %d, found %d)"
f6cd676f 1345 % (iv_fmt (iv), self.last_iv [1] + 1, cnt))
ee6aa239
PG
1346 self.last_iv = (fixed, cnt)
1347
1348
79782fa9 1349 def next (self, hdr):
704ceaa5
PG
1350 """
1351 Start decrypting the next object. The PDTCRYPT header for the object
1352 can be given either as already parsed object or as bytes.
1353 """
dccfe104
PG
1354 if isinstance (hdr, bytes) is True:
1355 hdr = hdr_read (hdr)
36b9932a
PG
1356 elif isinstance (hdr, dict) is False:
1357 # this won’t catch malformed specs though
1358 raise InvalidParameter ("next: wrong type of parameter hdr: "
1359 "expected bytes or spec, got %s"
fbfda3d4 1360 % type (hdr))
36b9932a
PG
1361 try:
1362 paramversion = hdr ["paramversion"]
1363 nacl = hdr ["nacl"]
1364 iv = hdr ["iv"]
1365 tag = hdr ["tag"]
1366 except KeyError:
1367 raise InvalidHeader ("next: not a header %r" % hdr)
1368
30019abf 1369 super().next (self.password, paramversion, nacl, iv)
b12110dd 1370 if self.fixed is not None and self.valid_fixed_part (iv) is False:
f6cd676f
PG
1371 raise InvalidIVFixedPart ("iv %s has invalid fixed part"
1372 % iv_fmt (iv))
3031b7ae 1373 self.check_consecutive_iv (iv)
ee6aa239 1374
36b9932a 1375 self.tag = tag
b12110dd
PG
1376 defs = ENCRYPTION_PARAMETERS.get (paramversion, None)
1377 if defs is None:
1378 raise FormatError ("header contains unknown parameter version %d; "
1379 "maybe the file was created by a more recent "
1380 "version of Deltatar" % paramversion)
50710d86 1381 enc = defs ["enc"]
6178061e
PG
1382 if enc == "aes-gcm":
1383 self.enc = Cipher \
1384 ( algorithms.AES (self.key)
36b9932a 1385 , modes.GCM (iv, tag=self.tag)
6178061e
PG
1386 , backend = default_backend ()) \
1387 . decryptor ()
1388 elif enc == "passthrough":
1389 self.enc = PassthroughCipher ()
1390 else:
b12110dd
PG
1391 raise InternalError ("encryption parameter set %d refers to unknown "
1392 "mode %r" % (paramversion, enc))
f484f2d1 1393 self.set_object_counter (self.cnt + 1)
39accaaa
PG
1394
1395
db1f3ac7 1396 def done (self, tag=None):
704ceaa5
PG
1397 """
1398 Stop decryption of the current object and finalize it with the active
1399 context. This will throw an *InvalidGCMTag* exception to indicate that
1400 the authentication tag does not match the data. If the tag is correct,
1401 the rest of the plaintext is returned.
1402 """
633b18a9 1403 data = b""
db1f3ac7
PG
1404 try:
1405 if tag is None:
f484f2d1 1406 data = self.enc.finalize ()
db1f3ac7 1407 else:
36b9932a
PG
1408 if isinstance (tag, bytes) is False:
1409 raise InvalidParameter ("done: wrong type of parameter "
1410 "tag: expected bytes, got %s"
1411 % type (tag))
f484f2d1 1412 data = self.enc.finalize_with_tag (tag)
b0078f26 1413 except cryptography.exceptions.InvalidTag:
f08c604b 1414 raise InvalidGCMTag ("done: tag mismatch of object %d: %s "
b0078f26 1415 "rejected by finalize ()"
f08c604b 1416 % (self.cnt, binascii.hexlify (self.tag)))
50710d86 1417 self.ctsize += len (data)
633b18a9 1418 self.stats ["out"] += len (data)
b0078f26 1419 return data
00b3cd10
PG
1420
1421
47e27926 1422 def process (self, buf):
704ceaa5
PG
1423 """
1424 Decrypt the bytes object *buf* with the active decryptor.
1425 """
36b9932a
PG
1426 if isinstance (buf, bytes) is False:
1427 raise InvalidParameter ("process: expected byte buffer, not %s"
1428 % type (buf))
47e27926
PG
1429 self.ctsize += len (buf)
1430 data = super().process (buf)
1431 self.ptsize += len (data)
1432 return data
1433
1434
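# Illustration only, not part of crypto.py: a minimal sketch of the IV layout
# (fixed part followed by object counter) and of the sorted-list lookup used
# by ``valid_fixed_part()``. The format string ``FMT_IV`` is an assumption
# standing in for ``FMT_I2N_IV``, whose definition is not shown in this part
# of the file; the helper names are hypothetical.

```python
import bisect
import struct

# Assumed layout: 8 byte fixed part, then a little-endian 32 bit counter.
FMT_IV = "<8sL"

def iv_make (fixed, cnt):
    """Assemble a 12 byte IV from fixed part and object counter."""
    return struct.pack (FMT_IV, fixed, cnt)

def known_fixed_part (sorted_fixed, iv):
    """
    Binary search the fixed part of *iv* in *sorted_fixed*, which must be a
    sorted list of byte strings.
    """
    fixed, _cnt = struct.unpack (FMT_IV, iv)
    i = bisect.bisect_left (sorted_fixed, fixed)
    return i != len (sorted_fixed) and sorted_fixed [i] == fixed

fixed_parts = sorted ([ b"\xff" * 8, b"\x01" * 8, b"\x02" * 8 ])
iv_known    = iv_make (b"\x02" * 8, 3)
iv_unknown  = iv_make (b"\x03" * 8, 3)
```

# ``bisect_left`` only works on sorted input, which is why the decryption
# ctor sorts the list of fixed parts it is handed.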
00b3cd10 1435###############################################################################
770173c5
PG
1436## testing helpers
1437###############################################################################
1438
cb7a3911 1439def _patch_global (glob, vow, n=None):
770173c5
PG
1440 """
1441 Override a module level global, e. g. the upper file counter bound, for testing the IV logic; returns the previous value. Completely unsafe.
1442 """
1443 assert vow == "I am fully aware that this will void my warranty."
cb7a3911
PG
1444 r = globals () [glob]
1445 if n is None:
1446 n = globals () [glob + "_DEFAULT"]
1447 globals () [glob] = n
770173c5
PG
1448 return r
1449
cb7a3911
PG
1450_testing_set_AES_GCM_IV_CNT_MAX = \
1451 partial (_patch_global, "AES_GCM_IV_CNT_MAX")
1452
1453_testing_set_PDTCRYPT_MAX_OBJ_SIZE = \
1454 partial (_patch_global, "PDTCRYPT_MAX_OBJ_SIZE")
1455
a808459e
PG
1456def open2_dump_file (fname, dir_fd, force=False):
1457 outfd = -1
1458
1459 oflags = os.O_CREAT | os.O_WRONLY
6690f5e0 1460 if force is True:
a808459e
PG
1461 oflags |= os.O_TRUNC
1462 else:
1463 oflags |= os.O_EXCL
1464
1465 try:
1466 outfd = os.open (fname, oflags,
1467 stat.S_IRUSR | stat.S_IWUSR, dir_fd=dir_fd)
1468 except FileExistsError as exn:
1469 noise ("PDT: refusing to overwrite existing file %s" % fname)
1470 noise ("")
1471 raise RuntimeError ("destination file %s already exists" % fname)
1472 if PDTCRYPT_VERBOSE is True:
1473 noise ("PDT: new output file %s (fd=%d)" % (fname, outfd))
1474
1475 return outfd
1476
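# Illustration only, not part of crypto.py: a minimal sketch of the exclusive
# file creation relative to a directory descriptor as done by
# ``open2_dump_file()``. It assumes a POSIX platform where ``os.open()``
# supports ``dir_fd``; ``create_exclusive`` and ``demo`` are hypothetical
# names and the file name is made up.

```python
import os
import stat
import tempfile

def create_exclusive (fname, dir_fd, force=False):
    """Open *fname* relative to *dir_fd*; refuse to clobber unless forced."""
    oflags = os.O_CREAT | os.O_WRONLY
    oflags |= os.O_TRUNC if force is True else os.O_EXCL
    return os.open (fname, oflags, stat.S_IRUSR | stat.S_IWUSR, dir_fd=dir_fd)

def demo ():
    """Return True if the second, non-forced creation was refused."""
    with tempfile.TemporaryDirectory () as tmp:
        dir_fd = os.open (tmp, os.O_RDONLY)
        try:
            os.close (create_exclusive ("object-1.bin", dir_fd))
            try:
                os.close (create_exclusive ("object-1.bin", dir_fd))
                refused = False
            except FileExistsError:
                refused = True
            # with force=True the existing file is truncated instead
            os.close (create_exclusive ("object-1.bin", dir_fd, force=True))
            return refused
        finally:
            os.close (dir_fd)
```

# ``O_EXCL`` makes the existence check and the creation one atomic step, so no
# other process can slip a file in between the two.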
770173c5 1477###############################################################################
00b3cd10
PG
1478## freestanding invocation
1479###############################################################################
1480
da82bc58
PG
1481PDTCRYPT_SUB_PROCESS = 0
1482PDTCRYPT_SUB_SCRYPT = 1
f41973a6 1483PDTCRYPT_SUB_SCAN = 2
da82bc58
PG
1484
1485PDTCRYPT_SUB = \
1486 { "process" : PDTCRYPT_SUB_PROCESS
f41973a6
PG
1487 , "scrypt" : PDTCRYPT_SUB_SCRYPT
1488 , "scan" : PDTCRYPT_SUB_SCAN }
da82bc58 1489
e3abcdf0
PG
1490PDTCRYPT_DECRYPT = 1 << 0 # decrypt archive with password
1491PDTCRYPT_SPLIT = 1 << 1 # split archive into individual objects
da82bc58 1492PDTCRYPT_HASH = 1 << 2 # output scrypt hash for file and given password
e3abcdf0 1493
a808459e
PG
1494PDTCRYPT_SPLITNAME = "pdtcrypt-object-%d.bin"
1495PDTCRYPT_RESCUENAME = "pdtcrypt-rescue-object-%0.5d.bin"
e3abcdf0 1496
70ad9458 1497PDTCRYPT_VERBOSE = False
ee6aa239 1498PDTCRYPT_STRICTIVS = False
b07633d3 1499PDTCRYPT_OVERWRITE = False
15d3eefd 1500PDTCRYPT_BLOCKSIZE = 1 << 12
70ad9458
PG
1501PDTCRYPT_SINK = 0
1502PDTCRYPT_SOURCE = 1
1503SELF = None
1504
77058bab
PG
1505PDTCRYPT_DEFAULT_VER = 1
1506PDTCRYPT_DEFAULT_PVER = 1
1507
7b3940e5
PG
1508# scrypt hashing output control
1509PDTCRYPT_SCRYPT_INTRANATOR = 0
1510PDTCRYPT_SCRYPT_PARAMETERS = 1
4f6405d6 1511PDTCRYPT_SCRYPT_DEFAULT = PDTCRYPT_SCRYPT_INTRANATOR
7b3940e5
PG
1512
1513PDTCRYPT_SCRYPT_FORMAT = \
1514 { "i2n" : PDTCRYPT_SCRYPT_INTRANATOR
1515 , "params" : PDTCRYPT_SCRYPT_PARAMETERS }
1516
4c62ddc0 1517PDTCRYPT_TT_COLUMNS = 80 # assume standard terminal
15d3eefd
PG
1518
1519class PDTDecryptionError (Exception):
1520 """Decryption failed."""
1521
e3abcdf0
PG
1522class PDTSplitError (Exception):
1523 """Splitting the archive into objects failed."""
1524
15d3eefd
PG
1525
1526def noise (*a, **b):
591a722f 1527 print (file=sys.stderr, *a, **b)
15d3eefd
PG
1528
1529
89e1073c
PG
1530class PassthroughDecryptor (object):
1531
1532 curhdr = None # write current header on first data write
1533
1534 def __init__ (self):
1535 if PDTCRYPT_VERBOSE is True:
1536 noise ("PDT: no encryption; data passthrough")
1537
1538 def next (self, hdr):
1539 ok, curhdr = hdr_make (hdr)
1540 if ok is False:
1541 raise PDTDecryptionError ("bad header %r" % hdr)
1542 self.curhdr = curhdr
1543
1544 def done (self):
1545 if self.curhdr is not None:
1546 return self.curhdr
1547 return b""
1548
1549 def process (self, d):
1550 if self.curhdr is not None:
1551 d = self.curhdr + d
1552 self.curhdr = None
1553 return d
1554
1555
a83fa4ed 1556def depdtcrypt (mode, secret, ins, outs):
15d3eefd 1557 """
a83fa4ed
PG
1558 Remove PDTCRYPT layer from all objects encrypted with the secret. Used on a
1559 Deltatar backup this will yield a (possibly Gzip compressed) tarball.
15d3eefd
PG
1560 """
1561 ctleft = -1 # length of ciphertext to consume
1562 ctcurrent = 0 # total ciphertext of current object
15d3eefd
PG
1563 total_obj = 0 # total number of objects read
1564 total_pt = 0 # total plaintext bytes
1565 total_ct = 0 # total ciphertext bytes
1566 total_read = 0 # total bytes read
e3abcdf0
PG
1567 outfile = None # Python file object for output
1568
89e1073c 1569 if mode & PDTCRYPT_DECRYPT: # decryptor
a83fa4ed
PG
1570 ks = secret [0]
1571 if ks == PDTCRYPT_SECRET_PW:
1572 decr = Decrypt (password=secret [1], strict_ivs=PDTCRYPT_STRICTIVS)
1573 elif ks == PDTCRYPT_SECRET_KEY:
6257d5b3 1574 key = secret [1]
a83fa4ed
PG
1575 decr = Decrypt (key=key, strict_ivs=PDTCRYPT_STRICTIVS)
1576 else:
1577 raise InternalError ("‘%d’ does not specify a valid kind of secret"
1578 % ks)
89e1073c
PG
1579 else:
1580 decr = PassthroughDecryptor ()
1581
e3abcdf0
PG
1582 def nextout (_):
1583 """Dummy for non-split mode: output file does not vary."""
1584 return outs
1585
1586 if mode & PDTCRYPT_SPLIT:
1587 def nextout (outfile):
1588 """
1589 We were passed an fd as outs for accessing the destination
1590 directory where extracted archive components are supposed
1591 to end up in.
1592 """
1593
1594 if outfile is None:
1595 if PDTCRYPT_VERBOSE is True:
1596 noise ("PDT: no output file to close at this point")
77058bab
PG
1597 else:
1598 if PDTCRYPT_VERBOSE is True:
1599 noise ("PDT: release output file %r" % outfile)
e3abcdf0
PG
1600 # cleanup happens automatically by the GC; the next
1601 # line will error out on account of an invalid fd
1602 #outfile.close ()
1603
1604 assert total_obj > 0
1605 fname = PDTCRYPT_SPLITNAME % total_obj
1606 try:
a808459e
PG
1607 outfd = open2_dump_file (fname, outs, force=PDTCRYPT_OVERWRITE)
1608 except RuntimeError as exn:
1609 raise PDTSplitError (exn)
e3abcdf0
PG
1610 return os.fdopen (outfd, "wb", closefd=True)
1611
15d3eefd 1612
47d22679 1613 def tell (s):
b09a99eb 1614 """ESPIPE is normal on non-seekable stdio stream."""
47d22679
PG
1615 try:
1616 return s.tell ()
1617 except OSError as exn:
import errno # os.errno is no longer available as of Python 3.7
b09a99eb 1618 if exn.errno == errno.ESPIPE:
47d22679
PG
1619 return -1
1620
e3abcdf0 1621 def out (pt, outfile):
15d3eefd
PG
1622 npt = len (pt)
1623 nonlocal total_pt
1624 total_pt += npt
70ad9458 1625 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1626 noise ("PDT:\t· decrypt plaintext %d B" % (npt))
1627 try:
e3abcdf0 1628 nn = outfile.write (pt)
15d3eefd
PG
1629 except OSError as exn: # probably ENOSPC
1630 raise DecryptionError ("error (%s)" % exn)
1631 if nn != npt:
1632 raise DecryptionError ("write aborted after %d of %d B" % (nn, npt))
1633
1634 while True:
1635 if ctleft <= 0:
1636 # current object completed; in a valid archive this marks either
1637 # the start of a new header or the end of the input
1638 if ctleft == 0: # current object requires finalization
70ad9458 1639 if PDTCRYPT_VERBOSE is True:
47d22679 1640 noise ("PDT: %d finalize" % tell (ins))
5d394c0d
PG
1641 try:
1642 pt = decr.done ()
1643 except InvalidGCMTag as exn:
f08c604b
PG
1644 raise DecryptionError ("error finalizing object %d (%d B): "
1645 "%r" % (total_obj, len (pt), exn)) \
1646 from exn
e3abcdf0 1647 out (pt, outfile)
70ad9458 1648 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1649 noise ("PDT:\t· object validated")
1650
70ad9458 1651 if PDTCRYPT_VERBOSE is True:
47d22679 1652 noise ("PDT: %d hdr" % tell (ins))
15d3eefd
PG
1653 try:
1654 hdr = hdr_read_stream (ins)
dd47d6a2 1655 total_read += PDTCRYPT_HDR_SIZE
ae3d0f2a
PG
1656 except EndOfFile as exn:
1657 total_read += exn.remainder
dd47d6a2 1658 if total_ct + total_obj * PDTCRYPT_HDR_SIZE != total_read:
15d3eefd
PG
1659 raise PDTDecryptionError ("ciphertext processed (%d B) plus "
1660 "overhead (%d × %d B) does not match "
1661 "the number of bytes read (%d )"
dd47d6a2 1662 % (total_ct, total_obj, PDTCRYPT_HDR_SIZE,
15d3eefd
PG
1663 total_read))
1664 # the single good exit
1665 return total_read, total_obj, total_ct, total_pt
1666 except InvalidHeader as exn:
1667 raise PDTDecryptionError ("invalid header at position %d in %r "
ee6aa239 1668 "(%s)" % (tell (ins), exn, ins))
70ad9458 1669 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1670 pretty = hdr_fmt_pretty (hdr)
1671 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1672 pretty.splitlines (), ""))
1673 ctcurrent = ctleft = hdr ["ctsize"]
89e1073c 1674
15d3eefd 1675 decr.next (hdr)
e3abcdf0
PG
1676
1677 total_obj += 1 # used in file counter with split mode
1678
1679 # finalization complete or skipped in case of first object in
1680 # stream; create a new output file if necessary
1681 outfile = nextout (outfile)
15d3eefd 1682
70ad9458 1683 if PDTCRYPT_VERBOSE is True:
15d3eefd 1684 noise ("PDT: %d decrypt obj no. %d, %d B"
47d22679 1685 % (tell (ins), total_obj, ctleft))
15d3eefd
PG
1686
1687 # always allocate a new buffer since python-cryptography doesn’t allow
1688 # passing a bytearray :/
1689 nexpect = min (ctleft, PDTCRYPT_BLOCKSIZE)
70ad9458 1690 if PDTCRYPT_VERBOSE is True:
15d3eefd 1691 noise ("PDT:\t· [%d] %d%% done, read block (%d B of %d B remaining)"
47d22679 1692 % (tell (ins),
15d3eefd
PG
1693 100 - ctleft * 100 / (ctcurrent > 0 and ctcurrent or 1),
1694 nexpect, ctleft))
1695 ct = ins.read (nexpect)
1696 nct = len (ct)
1697 if nct < nexpect:
47d22679 1698 off = tell (ins)
ae3d0f2a
PG
1699 raise EndOfFile (nct,
1700 "hit EOF after %d of %d B in block [%d:%d); "
15d3eefd
PG
1701 "%d B ciphertext remaining for object no %d"
1702 % (nct, nexpect, off, off + nexpect, ctleft,
1703 total_obj))
1704 ctleft -= nct
1705 total_ct += nct
1706 total_read += nct
1707
70ad9458 1708 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1709 noise ("PDT:\t· decrypt ciphertext %d B" % (nct))
1710 pt = decr.process (ct)
e3abcdf0 1711 out (pt, outfile)
15d3eefd 1712
d6c15a52 1713
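# Illustration only, not part of crypto.py: a minimal sketch of the byte
# accounting in the read loop of ``depdtcrypt()``. Instead of the real
# PDTCRYPT header it uses a hypothetical toy header that holds nothing but the
# payload size as a little-endian uint32; no decryption takes place. At the
# end of the input, payload bytes plus header overhead must add up to the
# number of bytes read, otherwise there was trailing garbage.

```python
import io
import struct

HDR_SIZE = 4   # assumed toy header size, not the PDTCRYPT header size

def read_objects (ins, blocksize=4):
    """Return ``(total_read, total_obj, total_ct)`` for the stream *ins*."""
    total_read = total_obj = total_ct = 0
    while True:
        hdr = ins.read (HDR_SIZE)
        total_read += len (hdr)
        if len (hdr) < HDR_SIZE:
            # end of input: only acceptable at an object boundary
            if total_ct + total_obj * HDR_SIZE != total_read:
                raise ValueError ("payload (%d B) plus overhead (%d × %d B) "
                                  "does not match bytes read (%d)"
                                  % (total_ct, total_obj, HDR_SIZE,
                                     total_read))
            return total_read, total_obj, total_ct
        (ctleft,) = struct.unpack ("<L", hdr)
        total_obj += 1
        while ctleft > 0:
            nexpect = min (ctleft, blocksize)
            ct = ins.read (nexpect)
            if len (ct) < nexpect:
                raise EOFError ("hit EOF with %d B remaining for object %d"
                                % (ctleft, total_obj))
            ctleft     -= len (ct)
            total_ct   += len (ct)
            total_read += len (ct)

def frame (payload):
    """Prefix *payload* with the toy header."""
    return struct.pack ("<L", len (payload)) + payload

good = frame (b"abc") + frame (b"hello")
```

# Reading the payload in blocks rather than at once keeps memory use bounded
# regardless of the object size announced in the header.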
70ad9458 1714def deptdcrypt_mk_stream (kind, path):
d6c15a52 1715 """Create stream from file or stdio descriptor."""
70ad9458 1716 if kind == PDTCRYPT_SINK:
d6c15a52 1717 if path == "-":
70ad9458 1718 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: stdout")
d6c15a52
PG
1719 return sys.stdout.buffer
1720 else:
70ad9458 1721 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: file %s" % path)
d6c15a52 1722 return io.FileIO (path, "w")
70ad9458 1723 if kind == PDTCRYPT_SOURCE:
d6c15a52 1724 if path == "-":
70ad9458 1725 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: stdin")
d6c15a52
PG
1726 return sys.stdin.buffer
1727 else:
70ad9458 1728 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: file %s" % path)
d6c15a52
PG
1729 return io.FileIO (path, "r")
1730
1731 raise ValueError ("bogus stream “%s” / %s" % (kind, path))
1732
15d3eefd 1733
def mode_depdtcrypt (mode, secret, ins, outs):
    try:
        total_read, total_obj, total_ct, total_pt = \
            depdtcrypt (mode, secret, ins, outs)
    except DecryptionError as exn:
        noise ("PDT: Decryption failed:")
        noise ("PDT:")
        noise ("PDT: “%s”" % exn)
        noise ("PDT:")
        noise ("PDT: Did you specify the correct key / password?")
        noise ("")
        return 1
    except PDTSplitError as exn:
        noise ("PDT: Split operation failed:")
        noise ("PDT:")
        noise ("PDT: “%s”" % exn)
        noise ("PDT:")
        noise ("PDT: Hint: target directory should be empty.")
        noise ("")
        return 1

    if PDTCRYPT_VERBOSE is True:
        noise ("PDT: decryption successful")
        noise ("PDT: %.10d bytes read" % total_read)
        noise ("PDT: %.10d objects decrypted" % total_obj)
        noise ("PDT: %.10d bytes ciphertext" % total_ct)
        noise ("PDT: %.10d bytes plaintext" % total_pt)
        noise ("")

    return 0


def mode_scrypt (pw, ins=None, nacl=None, fmt=PDTCRYPT_SCRYPT_INTRANATOR):
    hsh = None
    paramversion = PDTCRYPT_DEFAULT_PVER
    if ins is not None:
        hsh, nacl, version, paramversion = scrypt_hashsource (pw, ins)
        defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
    else:
        nacl = binascii.unhexlify (nacl)
        defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
        version = PDTCRYPT_DEFAULT_VER

    kdfname, params = defs ["kdf"]
    if hsh is None:
        kdf = kdf_by_version (None, defs)
        hsh, _void = kdf (pw, nacl)

    import json

    if fmt == PDTCRYPT_SCRYPT_INTRANATOR:
        out = json.dumps ({ "salt"         : base64.b64encode (nacl).decode ()
                          , "key"          : base64.b64encode (hsh) .decode ()
                          , "paramversion" : paramversion })
    elif fmt == PDTCRYPT_SCRYPT_PARAMETERS:
        out = json.dumps ({ "salt"          : binascii.hexlify (nacl).decode ()
                          , "key"           : binascii.hexlify (hsh) .decode ()
                          , "version"       : version
                          , "scrypt_params" : { "N"     : params ["N"]
                                              , "r"     : params ["r"]
                                              , "p"     : params ["p"]
                                              , "dkLen" : params ["dkLen"] } })
    else:
        raise RuntimeError ("bad scrypt output scheme %r" % fmt)

    print (out)


def noise_output_candidates (cands, indent=8, cols=PDTCRYPT_TT_COLUMNS):
    """
    Print a list of offsets without garbling the terminal too much.

    The indent is counted from column zero; if it is wide enough, the “PDT: ”
    marker will be prepended, considered part of the indentation.
    """
    wd = cols - 1
    nc = len (cands)
    idt = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
    line = idt
    lpos = indent
    sep = ","
    lsep = len (sep)
    init = True # prevent leading separator

    if indent >= wd:
        raise ValueError ("the requested indentation exceeds the line "
                          "width by %d" % (indent - wd))

    for n in cands:
        ns = "%d" % n
        lns = len (ns)
        if init is False:
            line += sep
            lpos += lsep

        lpos += lns
        if lpos > wd: # line break
            noise (line)
            line = idt
            lpos = indent + lns
        elif init is True:
            init = False
        else: # space
            line += ' '
            lpos += 1

        line += ns

    if lpos != indent:
        noise (line)


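# A minimal sketch of the same wrapping idea, using the standard ``textwrap``
# module instead of the hand-rolled column counter above. The name
# ``wrap_offsets`` and the default of 80 columns are assumptions made for this
# illustration only; the function returns the lines rather than printing them.

```python
import textwrap

def wrap_offsets (offsets, indent=8, cols=80):
    """Format integers as comma separated lines narrower than *cols*."""
    width = cols - 1
    if indent >= width:
        raise ValueError ("the requested indentation exceeds the line "
                          "width by %d" % (indent - width))
    # same convention as above: a wide enough indent carries the marker
    prefix = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
    text = ", ".join ("%d" % n for n in offsets)
    # textwrap breaks only at the blanks following the commas
    return textwrap.wrap (text, width=width,
                          initial_indent=prefix, subsequent_indent=prefix)

for line in wrap_offsets (range (1000, 1040), indent=8, cols=40):
    print (line)
```

# Returning a list keeps the formatting testable without capturing output;
# a caller could pass each line to ``noise``.
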
SLICE_START = 0 # ordering is important to have starts of intervals
SLICE_END   = 1 # sorted before equal ends; otherwise a zero-length slice
                # would be closed before it was opened

def find_overlaps (slices):
    """
    Find overlapping slices: iterate open/close points of intervals, tracking
    the ones open at any time.
    """
    bounds = []
    inside = set () # of indices into slices
    ovrlp = set ()  # of indices into slices

    for i, s in enumerate (slices):
        bounds.append ((s [0], SLICE_START, i))
        bounds.append ((s [1], SLICE_END  , i))
    bounds = sorted (bounds)

    for val in bounds:
        i = val [2]
        if val [1] == SLICE_START:
            inside.add (i)
        else:
            if len (inside) > 1: # closing one that overlapped
                ovrlp |= inside
            inside.remove (i)

    return [ slices [i] for i in ovrlp ]


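# The sweep described in the docstring above can be tried in isolation. This
# is a self-contained sketch, not the module's code: ``overlapping_slices``
# and the constants ``START`` / ``END`` are names invented for the example,
# and the tie-breaking (a start sorts before an end at the same offset) is an
# assumption taken from the comment on the slice constants.

```python
START, END = 0, 1 # assumed: at equal offsets, starts sort before ends

def overlapping_slices (slices):
    """Return, sorted, every (lo, hi) slice that overlaps another one."""
    bounds = []
    for i, (lo, hi) in enumerate (slices):
        bounds.append ((lo, START, i))
        bounds.append ((hi, END, i))

    inside = set () # indices of slices currently open
    hits = set ()   # indices of slices seen overlapping
    for _pos, kind, i in sorted (bounds):
        if kind == START:
            inside.add (i)
        else:
            if len (inside) > 1: # something else is open while i closes
                hits |= inside
            inside.remove (i)

    return sorted (slices [i] for i in hits)

demo = [ (0, 10), (5, 15), (20, 25), (30, 40) ]
print (overlapping_slices (demo)) # the first two slices share 5..10
```

# Sorting the boundary points costs O(n log n); the sweep itself is a single
# pass, so no pairwise comparison of slices is needed.
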
def mode_scan (secret, fname, outs=None, nacl=None):
    """
    Dissect a binary file, looking for PDTCRYPT headers and objects.

    If *outs* is supplied, recoverable data will be dumped into the specified
    directory.
    """
    try:
        ifd = os.open (fname, os.O_RDONLY)
    except FileNotFoundError:
        noise ("PDT: failed to open %s readonly" % fname)
        noise ("")
        usage (err=True)

    try:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: scan for potential sync points")
        cands = locate_hdr_candidates (ifd)
        if len (cands) == 0:
            noise ("PDT: scan complete: input does not contain potential PDT "
                   "headers; giving up.")
            os.close (ifd) # the handler below only covers exceptions
            return -1
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: scan complete: found %d candidates:" % len (cands))
            noise_output_candidates (cands)
    except:
        os.close (ifd)
        raise

    junk, todo, slices = [], [], []
    try:
        nobj = 0
        for cand in cands:
            nobj += 1
            vdt, hdr = inspect_hdr (ifd, cand)

            if vdt == HDR_CAND_JUNK:
                junk.append (cand)
            else:
                off0 = cand + PDTCRYPT_HDR_SIZE
                if PDTCRYPT_VERBOSE is True:
                    noise ("PDT: obj %d: read payload @%d" % (nobj, off0))
                    pretty = hdr_fmt_pretty (hdr)
                    noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
                                   pretty.splitlines (), ""))

                ofd = -1
                if outs is not None:
                    ofname = PDTCRYPT_RESCUENAME % nobj
                    ofd = open2_dump_file (ofname, outs, force=PDTCRYPT_OVERWRITE)

                ctsize = hdr ["ctsize"]
                try:
                    l = try_decrypt (ifd, off0, hdr, secret, ofd=ofd)
                    ok = l == ctsize
                    slices.append ((off0, off0 + l))
                finally:
                    if ofd != -1:
                        os.close (ofd)
                if vdt == HDR_CAND_GOOD and ok is True:
                    noise ("PDT: %d → ✓ valid object %d–%d"
                           % (cand, off0, off0 + ctsize))
                elif vdt == HDR_CAND_FISHY and ok is True:
                    noise ("PDT: %d → × object %d–%d, corrupt header"
                           % (cand, off0, off0 + ctsize))
                elif vdt == HDR_CAND_GOOD and ok is False:
                    noise ("PDT: %d → × object %d–%d, problematic payload"
                           % (cand, off0, off0 + ctsize))
                elif vdt == HDR_CAND_FISHY and ok is False:
                    noise ("PDT: %d → × object %d–%d, corrupt header, problematic "
                           "ciphertext" % (cand, off0, off0 + ctsize))
                else:
                    raise Unreachable
    finally:
        os.close (ifd)

    if len (junk) == 0:
        noise ("PDT: all headers ok")
    else:
        noise ("PDT: %d candidates not parseable as headers:" % len (junk))
        noise_output_candidates (junk)

    overlap = find_overlaps (slices)
    if len (overlap) > 0:
        noise ("PDT: %d objects overlapping others" % len (overlap))
        for slice in overlap:
            noise ("PDT: × %d→%d" % (slice [0], slice [1]))


def usage (err=False):
    out = print
    if err is True:
        out = noise
    indent = ' ' * len (SELF)
    out ("usage: %s SUBCOMMAND { --help" % SELF)
    out (" %s | [ -v ] { -p PASSWORD | -k KEY }" % indent)
    out (" %s [ { -i | --in } { - | SOURCE } ]" % indent)
    out (" %s [ { -n | --nacl } { SALT } ]" % indent)
    out (" %s [ { -o | --out } { - | DESTINATION } ]" % indent)
    out (" %s [ -D | --no-decrypt ] [ -S | --split ]" % indent)
    out (" %s [ -f | --format ]" % indent)
    out ("")
    out ("\twhere")
    out ("\t\tSUBCOMMAND main mode: { process | scrypt }")
    out ("\t\t where:")
    out ("\t\t process: extract objects from PDT archive")
    out ("\t\t scrypt: calculate hash from password and first object")
    out ("\t\t-p PASSWORD password to derive the encryption key from")
    out ("\t\t-k KEY encryption key as 16 bytes in hexadecimal notation")
    out ("\t\t-s enforce strict handling of initialization vectors")
    out ("\t\t-i SOURCE file name to read from")
    out ("\t\t-o DESTINATION file to write output to")
    out ("\t\t-n SALT provide salt for scrypt mode in hex encoding")
    out ("\t\t-v print extra info")
    out ("\t\t-S split into files at object boundaries; this")
    out ("\t\t requires DESTINATION to refer to a directory")
    out ("\t\t-D PDT header and ciphertext passthrough")
    out ("\t\t-f format of SCRYPT hash output (“default” or “parameters”)")
    out ("")
    out ("\tinstead of filenames, “-” may be used to specify stdin / stdout")
    out ("")
    sys.exit (42 if err is True else 0)


def bail (msg):
    noise (msg)
    noise ("")
    usage (err=True)
    raise Unreachable


def parse_argv (argv):
    global PDTCRYPT_OVERWRITE
    global SELF
    mode = PDTCRYPT_DECRYPT
    secret = None
    insspec = None
    outsspec = None
    outs = None
    nacl = None
    scrypt_format = PDTCRYPT_SCRYPT_DEFAULT

    argvi = iter (argv)
    SELF = os.path.basename (next (argvi))

    try:
        rawsubcmd = next (argvi)
        subcommand = PDTCRYPT_SUB [rawsubcmd]
    except StopIteration:
        bail ("ERROR: subcommand required")
    except KeyError:
        bail ("ERROR: invalid subcommand “%s” specified" % rawsubcmd)

    def checked_arg ():
        nonlocal argvi
        try:
            return next (argvi)
        except StopIteration:
            bail ("ERROR: argument list incomplete")

    def checked_secret (s):
        nonlocal secret
        if secret is None:
            secret = s
        else:
            bail ("ERROR: encountered “%s” but secret already given" % arg)

    for arg in argvi:
        if arg in [ "-h", "--help" ]:
            usage ()
            raise Unreachable
        elif arg in [ "-v", "--verbose", "--wtf" ]:
            global PDTCRYPT_VERBOSE
            PDTCRYPT_VERBOSE = True
        elif arg in [ "-i", "--in", "--source" ]:
            insspec = checked_arg ()
            if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt from %s" % insspec)
        elif arg in [ "-p", "--password" ]:
            arg = checked_arg ()
            checked_secret (make_secret (password=arg))
            if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with password")
        else:
            if subcommand == PDTCRYPT_SUB_PROCESS:
                if arg in [ "-s", "--strict-ivs" ]:
                    global PDTCRYPT_STRICTIVS
                    PDTCRYPT_STRICTIVS = True
                elif arg in [ "-o", "--out", "--dest", "--sink" ]:
                    outsspec = checked_arg ()
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
                elif arg in [ "-f", "--force" ]:
                    PDTCRYPT_OVERWRITE = True
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
                elif arg in [ "-S", "--split" ]:
                    mode |= PDTCRYPT_SPLIT
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: split files")
                elif arg in [ "-D", "--no-decrypt" ]:
                    mode &= ~PDTCRYPT_DECRYPT
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: not decrypting")
                elif arg in [ "-k", "--key" ]:
                    arg = checked_arg ()
                    checked_secret (make_secret (key=arg))
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with key")
                else:
                    bail ("ERROR: unexpected positional argument “%s”" % arg)
            elif subcommand == PDTCRYPT_SUB_SCRYPT:
                if arg in [ "-n", "--nacl", "--salt" ]:
                    nacl = checked_arg ()
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: salt key with %s" % nacl)
                elif arg in [ "-f", "--format" ]:
                    arg = checked_arg ()
                    try:
                        scrypt_format = PDTCRYPT_SCRYPT_FORMAT [arg]
                    except KeyError:
                        bail ("ERROR: invalid scrypt output format %s" % arg)
                    if PDTCRYPT_VERBOSE is True:
                        noise ("PDT: scrypt output format “%s”" % scrypt_format)
                else:
                    bail ("ERROR: unexpected positional argument “%s”" % arg)
            elif subcommand == PDTCRYPT_SUB_SCAN:
                if arg in [ "-o", "--out", "--dest", "--sink" ]:
                    outsspec = checked_arg ()
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
                elif arg in [ "-f", "--force" ]:
                    PDTCRYPT_OVERWRITE = True
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
                else:
                    bail ("ERROR: unexpected positional argument “%s”" % arg)

    if secret is None:
        if PDTCRYPT_VERBOSE is True:
            noise ("ERROR: no password or key specified, trying $PDTCRYPT_PASSWORD")
        epw = os.getenv ("PDTCRYPT_PASSWORD")
        if epw is not None:
            checked_secret (make_secret (password=epw.strip ()))

    if secret is None:
        if PDTCRYPT_VERBOSE is True:
            noise ("ERROR: no password or key specified, trying $PDTCRYPT_KEY")
        ek = os.getenv ("PDTCRYPT_KEY")
        if ek is not None:
            checked_secret (make_secret (key=ek.strip ()))

    if secret is None:
        if subcommand == PDTCRYPT_SUB_SCRYPT:
            bail ("ERROR: scrypt hash mode requested but no password given")
        elif mode & PDTCRYPT_DECRYPT:
            bail ("ERROR: decryption requested but no password given")

    if mode & PDTCRYPT_SPLIT and outsspec is None:
        bail ("ERROR: split mode is incompatible with stdout sink "
              "(the default)")

    if subcommand == PDTCRYPT_SUB_SCAN and outsspec is None:
        pass # no output by default in scan mode
    elif mode & PDTCRYPT_SPLIT or subcommand == PDTCRYPT_SUB_SCAN:
        # destination must be directory
        if outsspec == "-":
            bail ("ERROR: mode is incompatible with stdout sink")
        try:
            try:
                os.makedirs (outsspec, 0o700)
            except FileExistsError:
                # if it’s a directory with appropriate perms, everything is
                # good; otherwise, below invocation of open(2) will fail
                pass
            outs = os.open (outsspec, os.O_DIRECTORY, 0o600)
        except FileNotFoundError as exn:
            bail ("ERROR: cannot create target directory “%s”" % outsspec)
        except NotADirectoryError as exn:
            bail ("ERROR: target path “%s” is not a directory" % outsspec)
    else:
        outs = deptdcrypt_mk_stream (PDTCRYPT_SINK, outsspec or "-")

    if subcommand == PDTCRYPT_SUB_SCAN:
        if insspec is None:
            bail ("ERROR: please supply an input file for scanning")
        if insspec == '-':
            bail ("ERROR: input must be seekable; please specify a file")
        return True, partial (mode_scan, secret, insspec, outs, nacl=nacl)

    if subcommand == PDTCRYPT_SUB_SCRYPT:
        if secret [0] == PDTCRYPT_SECRET_KEY:
            bail ("ERROR: scrypt mode requires a password")
        if insspec is not None and nacl is not None \
                or insspec is None and nacl is None :
            bail ("ERROR: please supply either an input file or "
                  "the salt")

    # default to stdin, except in scrypt mode without an input file
    ins = None
    if insspec is not None or subcommand != PDTCRYPT_SUB_SCRYPT:
        ins = deptdcrypt_mk_stream (PDTCRYPT_SOURCE, insspec or "-")

    if subcommand == PDTCRYPT_SUB_SCRYPT:
        return True, partial (mode_scrypt, secret [1].encode (), ins, nacl,
                              fmt=scrypt_format)

    return True, partial (mode_depdtcrypt, mode, secret, ins, outs)


def main (argv):
    ok, runner = parse_argv (argv)

    if ok is True: return runner ()

    return 1


if __name__ == "__main__":
    sys.exit (main (sys.argv))

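# A reduced sketch of the dispatch pattern used by ``parse_argv`` and ``main``
# above: the parser returns a flag plus a zero-argument callable built with
# ``functools.partial``, and the entry point merely invokes it. All names in
# the block (``run_scan``, ``run_hash``, ``pick_runner``, ``run``) and the
# two-argument command line are hypothetical and exist only for illustration.

```python
from functools import partial

def run_scan (path, outdir=None): # hypothetical mode function
    return 0 if path else 1

def run_hash (password, salt=b""): # hypothetical mode function
    return 0 if password else 1

def pick_runner (argv):
    """Return (ok, runner); runner takes no arguments and yields an exit code."""
    if len (argv) < 3:
        return False, None
    sub, arg = argv [1], argv [2]
    if sub == "scan":
        return True, partial (run_scan, arg)
    if sub == "hash":
        return True, partial (run_hash, arg, salt=b"")
    return False, None

def run (argv):
    ok, runner = pick_runner (argv)
    if ok is True: return runner ()
    return 1

print (run ([ "prog", "scan", "archive.bin" ]))
```

# Binding the arguments at parse time keeps the entry point free of any
# knowledge about which mode was selected or what parameters it needs.
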