#!/usr/bin/env python3

"""
Intra2net 2017

===============================================================================
              crypto -- Encryption Layer for the Deltatar Backup
===============================================================================

Crypto stack:

    - AES-GCM for the symmetric encryption;
    - Scrypt as KDF.

References:

    - NIST Recommendation for Block Cipher Modes of Operation: Galois/Counter
      Mode (GCM) and GMAC
      http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf

    - AES-GCM v1:
      https://cryptome.org/2014/01/aes-gcm-v1.pdf

    - Authentication weaknesses in GCM
      http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf

Trouble with python-cryptography packages: authentication tags can only be
passed in advance, i. e. to the GCM mode object before any ciphertext is
processed: https://github.com/pyca/cryptography/pull/3421

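The following sketch shows what passing the tag in advance means with the hazmat API of the ``cryptography`` package. The tag goes into ``modes.GCM()`` before any ciphertext reaches the decryptor, and a mismatch only surfaces as ``InvalidTag`` from ``finalize()``. The helpers ``gcm_encrypt()`` and ``gcm_decrypt()`` are hypothetical and not part of this module. Key and IV sizes mirror ``AES_KEY_SIZE`` and ``PDTCRYPT_HDR_SIZE_IV``. The sketch uses no additional authenticated data, which is an assumption.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def gcm_encrypt(key, iv, plaintext):
    # Hypothetical helper: AES-GCM encryption, returns ciphertext and 16 byte tag.
    enc = Cipher(algorithms.AES(key), modes.GCM(iv),
                 backend=default_backend()).encryptor()
    ciphertext = enc.update(plaintext) + enc.finalize()
    return ciphertext, enc.tag


def gcm_decrypt(key, iv, ciphertext, tag):
    # The tag must be known up front: it is handed to the GCM mode object
    # before a single byte of ciphertext is processed.
    dec = Cipher(algorithms.AES(key), modes.GCM(iv, tag),
                 backend=default_backend()).decryptor()
    plaintext = dec.update(ciphertext)
    # finalize() verifies the tag and raises InvalidTag on a mismatch.
    return plaintext + dec.finalize()


key = os.urandom(16)   # AES-128, cf. AES_KEY_SIZE
iv = os.urandom(12)    # 96 bit GCM IV, cf. PDTCRYPT_HDR_SIZE_IV
ct, tag = gcm_encrypt(key, iv, b'payload')
assert gcm_decrypt(key, iv, ct, tag) == b'payload'

try:
    gcm_decrypt(key, iv, ct, bytes(16))   # all-zero tag
except InvalidTag:
    print('tag mismatch detected')
```

Newer releases of ``cryptography`` also accept the tag late, through ``finalize_with_tag()`` on a decryptor created without one.
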
Errors
-------------------------------------------------------------------------------

Errors fall into roughly three categories:

    - Cryptographic errors or invalid data.

        - ``InvalidGCMTag`` (decryption failed on account of an invalid GCM
          tag),
        - ``InvalidIVFixedPart`` (IV fixed part of object not found in list),
        - ``DuplicateIV`` (the IV of an encrypted object already occurred),
        - ``DecryptionError`` (used in CLI decryption for presenting error
          conditions to the user).

    - Incorrect usage of the library.

        - ``InvalidParameter`` (non-conforming user-supplied parameter),
        - ``InvalidHeader`` (data passed for reading not parsable into header),
        - ``FormatError`` (cannot handle header or parameter version),
        - ``RuntimeError``.

    - Bad internal state. If one of these is encountered it means that a state
      was reached that shouldn’t occur during normal processing.

        - ``InternalError``,
        - ``Unreachable``.

Also, ``EndOfFile`` is used as a sentinel to communicate that a stream supplied
for reading is exhausted.

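The following minimal sketch shows how the ``EndOfFile`` sentinel can drive a loop over the objects of an archive. It assumes each object is a 64 byte header followed by ``ctsize`` bytes of ciphertext. The exception classes and ``read_ctsize()`` are simplified, hypothetical stand-ins for this module's ``EndOfFile``, ``InvalidHeader`` and ``hdr_read_stream()``. The payload is skipped rather than decrypted.

```python
import io
import struct

HDR_FMT = '<8sHH16s12sQ16s'          # same layout as FMT_I2N_HDR
HDR_SIZE = struct.calcsize(HDR_FMT)  # 64 bytes


class EndOfFile(Exception):
    '''Stand-in: the stream is exhausted.'''


class InvalidHeader(Exception):
    '''Stand-in: the bytes read cannot be a header.'''


def read_ctsize(stream):
    # Simplified hdr_read_stream(): returns only the ciphertext size.
    data = stream.read(HDR_SIZE)
    if len(data) == 0:
        raise EndOfFile()            # clean end between two objects
    if len(data) != HDR_SIZE:
        raise InvalidHeader('expected %d B, received %d B' % (HDR_SIZE, len(data)))
    magic, _ver, _pver, _nacl, _iv, ctsize, _tag = struct.unpack(HDR_FMT, data)
    if magic != b'PDTCRYPT':
        raise InvalidHeader('bad magic')
    return ctsize


def object_sizes(stream):
    # Walk from header to header, skipping the ciphertext, until the sentinel fires.
    sizes = []
    while True:
        try:
            ctsize = read_ctsize(stream)
        except EndOfFile:
            return sizes
        stream.seek(ctsize, io.SEEK_CUR)
        sizes.append(ctsize)


def fake_object(ctsize):
    # Dummy object: a well-formed header plus zero bytes standing in for ciphertext.
    hdr = struct.pack(HDR_FMT, b'PDTCRYPT', 1, 1, bytes(16), bytes(12),
                      ctsize, bytes(16))
    return hdr + bytes(ctsize)


print(object_sizes(io.BytesIO(fake_object(5) + fake_object(591))))  # [5, 591]
```

A truncated trailing header raises ``InvalidHeader`` instead. This lets a clean end of the archive be told apart from a damaged one.
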
Initialization Vectors
-------------------------------------------------------------------------------

Initialization vectors are checked for reuse during the lifetime of a decryptor.
The fixed counters for metadata files cannot be reused and attempts to do so
will cause a ``DuplicateIV`` error. Since each metadata counter can be used
only once and a metadata file cannot be split across several encrypted objects,
a metadata file is capped at the maximum object size of 63 GB.

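The 63 GB cap sits just below the GCM limit of 2³⁹ − 256 bits of plaintext per invocation (NIST SP 800-38D, see the references above). ``AES_GCM_MAX_SIZE`` expresses that limit in bytes, and GB means 2³⁰ bytes here. The arithmetic can be checked with the two constants restated from this module:

```python
AES_GCM_MAX_SIZE = (1 << 36) - (1 << 5)         # bytes, as in this module
PDTCRYPT_MAX_OBJ_SIZE_DEFAULT = 63 * (1 << 30)  # 63 GiB

# NIST SP 800-38D bounds the plaintext at 2^39 - 256 bits per invocation.
assert AES_GCM_MAX_SIZE * 8 == (1 << 39) - 256

# Room left between the PDT object cap and the GCM limit.
headroom = AES_GCM_MAX_SIZE - PDTCRYPT_MAX_OBJ_SIZE_DEFAULT
print(headroom, headroom == (1 << 30) - 32)  # 1073741792 True
```
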
For ordinary, non-metadata payload, there is an optional mode with strict IV
checking that causes a crypto context to fail if an IV encountered or created
was already used for decrypting or encrypting, respectively, an earlier object.
Note that this mode can trigger false positives when decrypting non-linearly,
e. g. when traversing the same object multiple times. Since the crypto context
has no notion of a position in a PDT encrypted archive, this condition must be
sorted out downstream.

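The following minimal sketch illustrates the reuse check described above. It assumes the 12 byte IV layout of ``FMT_I2N_IV``: an 8 byte fixed part followed by a little endian 32 bit counter. The ``IVTracker`` class is hypothetical. Like the ``used_ivs`` attribute of the crypto context, it keeps a set of IVs already seen, but it is not the module's actual implementation.

```python
import struct

FMT_I2N_IV = '<8sL'   # 8 byte fixed part, little endian 32 bit counter


class DuplicateIV(Exception):
    '''Stand-in for the exception of the same name in this module.'''


class IVTracker:
    '''Hypothetical helper: remembers every IV seen by one crypto context.'''

    def __init__(self):
        self.used_ivs = set()

    def check(self, iv):
        # Split the 12 byte IV, then refuse any IV that was seen before.
        fixed, cnt = struct.unpack(FMT_I2N_IV, iv)
        if iv in self.used_ivs:
            raise DuplicateIV('IV reused: fixed=%s counter=%d' % (fixed.hex(), cnt))
        self.used_ivs.add(iv)
        return fixed, cnt


tracker = IVTracker()
iv = bytes.fromhex('02ee 3dd7 a963 1eb1 0100 0000')  # IV of object 0 in the dump below
fixed, cnt = tracker.check(iv)
print(fixed.hex(), cnt)           # 02ee3dd7a9631eb1 1

try:
    tracker.check(iv)             # same object traversed a second time
except DuplicateIV as exn:
    print(exn)                    # IV reused: fixed=02ee3dd7a9631eb1 counter=1
```

The second call shows the false positive mentioned above. A context that cannot tell a legitimate re-read from an actual IV reuse has to reject both.
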
Command Line Utility
-------------------------------------------------------------------------------

``crypto.py`` may be invoked as a script for decrypting, validating, and
splitting PDT encrypted files. Consult the usage message for details.

Usage examples:

Decrypt from stdin using the password ‘foo’: ::

    $ crypto.py process foo -i - -o - <some-file.tar.gz.pdtcrypt >some-file.tar.gz

Output verbose information about the encrypted objects in the archive: ::

    $ crypto.py process foo -v -i some-file.tar.gz.pdtcrypt -o /dev/null
    PDT: decrypt from some-file.tar.gz.pdtcrypt
    PDT: decrypt to /dev/null
    PDT: source: file some-file.tar.gz.pdtcrypt
    PDT: sink: file /dev/null
    PDT: 0 hdr
    PDT: · version = 1 : 0100
    PDT: · paramversion = 1 : 0100
    PDT: · nacl : d270 b031 00d1 87e2 c946 610d 7b7f 7e5f
    PDT: · iv : 02ee 3dd7 a963 1eb1 0100 0000
    PDT: · ctsize = 591 : 4f02 0000 0000 0000
    PDT: · tag : 5b2d 6d8b 8f82 4842 12fd 0b10 b6e3 369b
    PDT: 64 decrypt obj no. 1, 591 B
    PDT: · [64] 0% done, read block (591 B of 591 B remaining)
    PDT: · decrypt ciphertext 591 B
    PDT: · decrypt plaintext 591 B
    PDT: 655 finalize

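To relate the dump above to the on-disk layout, the sketch below re-packs the field values shown for object 0 into a 64 byte header and parses it again. It uses the same ``struct`` format as ``FMT_I2N_HDR``. ``parse_header()`` is a hypothetical simplification of ``hdr_read()``. The offset 40 used for ``ctsize`` follows from the preceding field sizes (8 + 2 + 2 + 16 + 12).

```python
import struct

FMT_I2N_HDR = '<8sHH16s12sQ16s'   # as in this module; struct.calcsize() gives 64
FIELDS = ('magic', 'version', 'paramversion', 'nacl', 'iv', 'ctsize', 'tag')


def parse_header(data):
    # Simplified stand-in for hdr_read(): unpack and check the magic.
    hdr = dict(zip(FIELDS, struct.unpack(FMT_I2N_HDR, data)))
    if hdr.pop('magic') != b'PDTCRYPT':
        raise ValueError('bad magic')
    return hdr


# Field values copied from the verbose dump of object 0 above.
raw = struct.pack(FMT_I2N_HDR, b'PDTCRYPT', 1, 1,
                  bytes.fromhex('d270 b031 00d1 87e2 c946 610d 7b7f 7e5f'),
                  bytes.fromhex('02ee 3dd7 a963 1eb1 0100 0000'),
                  591,
                  bytes.fromhex('5b2d 6d8b 8f82 4842 12fd 0b10 b6e3 369b'))

hdr = parse_header(raw)
print(len(raw), hdr['ctsize'])   # 64 591
print(raw[40:48].hex())          # 4f02000000000000, the ctsize bytes of the dump
```

The offsets in the dump are consistent with this layout: ``PDT: 655 finalize`` is 64 B of header plus 591 B of ciphertext.
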

Also, the mode *scrypt* allows deriving encryption keys. To calculate the
encryption key from the password ‘foo’ and the salt of the first object in a
PDT encrypted file: ::

    $ crypto.py scrypt foo -i some-file.pdtcrypt
    {"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", "key": "JH9EkMwaM4x9F5aim5gK/Q=="}

The computed 16 byte key is given in base64 notation as the value of ``key``
and can be fed into Python’s ``base64.b64decode()`` to obtain the
corresponding binary representation.

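The following minimal sketch consumes that output from Python using only the standard library. It assumes the JSON layout shown in the example, with base64 strings under ``salt`` and ``key``. ``decode_scrypt_output()`` is a hypothetical helper and not part of this module.

```python
import base64
import binascii
import json

AES_KEY_SIZE = 16   # as defined in this module


def decode_scrypt_output(line):
    # Parse the JSON printed by the scrypt mode and decode its base64 fields.
    out = json.loads(line)
    salt = base64.b64decode(out['salt'], validate=True)
    key = base64.b64decode(out['key'], validate=True)
    if len(key) != AES_KEY_SIZE:
        raise ValueError('unexpected key length: %d B' % len(key))
    return out['paramversion'], salt, key


line = ('{"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", '
        '"key": "JH9EkMwaM4x9F5aim5gK/Q=="}')
pver, salt, key = decode_scrypt_output(line)
print(pver, len(salt), binascii.hexlify(key).decode('ascii'))
```

Within this module, ``make_secret()`` also accepts the 24 character base64 string directly as ``key``.
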
Note that in Scrypt hashing mode, no data integrity checks are performed. If
the wrong password is given, a wrong key will be derived; whether the password
was indeed correct can only be determined by decrypting. Also, since PDT
archives essentially consist of a stream of independent objects, the salt and
other parameters may change from object to object. Thus a key derived from the
first object using the above method does not necessarily apply to any of the
subsequent objects.

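The key derivation for parameter version 1 can be reproduced outside this module. The sketch below uses Python's standard ``hashlib.scrypt()`` in place of ``pylibscrypt``. This is a substitution, not what ``crypto.py`` calls, and it requires a Python built against OpenSSL 1.1 or later. The parameters come from ``ENCRYPTION_PARAMETERS``: N = 2¹⁶, r = 8, p = 1 and a 16 byte output. Because nothing is verified, a wrong password silently yields a different key.

```python
import base64
import hashlib

# Parameter version 1 from ENCRYPTION_PARAMETERS
SCRYPT_N = 1 << 16
SCRYPT_R = 8
SCRYPT_P = 1
SCRYPT_DKLEN = 16


def derive_key(password, salt):
    # Passwords are UTF-8 encoded before hashing, as in scrypt_hashsource().
    if isinstance(password, str):
        password = password.encode('utf-8')
    # Scrypt needs about 128 * r * N bytes (64 MiB here), more than the
    # 32 MiB OpenSSL allows by default, so maxmem is raised explicitly.
    return hashlib.scrypt(password, salt=salt, n=SCRYPT_N, r=SCRYPT_R,
                          p=SCRYPT_P, maxmem=128 * 1024 * 1024,
                          dklen=SCRYPT_DKLEN)


salt = base64.b64decode('Cqzbk48e3peEjzWto8D0yA==')  # salt from the example
right = derive_key('foo', salt)
wrong = derive_key('bar', salt)
print(len(right), right != wrong)   # 16 True
```
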
"""

import base64
import binascii
import bisect
import ctypes
import io
from functools import reduce, partial
import mmap
import os
import struct
import stat
import sys
import time
import types
try:
    import enum34
except ImportError as exn:
    pass

if __name__ == "__main__": ## Work around the import mechanism lest Python’s
    pwd = os.getcwd()      ## preference for local imports causes a cyclical
                           ## import (crypto → pylibscrypt → […] → ./tarfile → crypto).
    sys.path = [ p for p in sys.path if p.find ("deltatar") < 0 ]

import pylibscrypt
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
import cryptography


__all__ = [ "hdr_make", "hdr_read", "hdr_fmt", "hdr_fmt_pretty"
          , "scrypt_hashfile"
          , "PDTCRYPT_HDR_SIZE", "AES_GCM_IV_CNT_DATA"
          , "AES_GCM_IV_CNT_INFOFILE", "AES_GCM_IV_CNT_INDEX"
          ]


###############################################################################
## exceptions
###############################################################################

class EndOfFile (Exception):
    """Reached EOF."""
    remainder = 0
    msg       = 0
    def __init__ (self, n=None, msg=None):
        if n is not None:
            self.remainder = n
        self.msg = msg


class InvalidParameter (Exception):
    """Inputs not valid for PDT encryption."""
    pass


class InvalidHeader (Exception):
    """Header not valid."""
    pass


class InvalidGCMTag (Exception):
    """
    The GCM tag calculated during decryption differs from that in the object
    header.
    """
    pass


class InvalidIVFixedPart (Exception):
    """
    IV fixed part not in supplied list: either the backup is corrupt or the
    current object does not belong to it.
    """
    pass


class IVFixedPartError (Exception):
    """
    Error creating a unique IV fixed part: repeated calls to system RNG yielded
    the same sequence of bytes as the last IV used.
    """
    pass


class InvalidFileCounter (Exception):
    """
    When encrypting, an attempted reuse of a dedicated counter (info file,
    index file) was caught.
    """
    pass


class DuplicateIV (Exception):
    """
    During encryption, the current IV fixed part is identical to an already
    existing IV (same prefix and file counter). This indicates tampering or
    programmer error and cannot be recovered from.
    """
    pass


class NonConsecutiveIV (Exception):
    """
    IVs not numbered consecutively. This is a hard error with strict IV
    checking. Precludes random access to the encrypted objects.
    """
    pass


class FormatError (Exception):
    """Unusable parameters in header."""
    pass


class DecryptionError (Exception):
    """Error during decryption with ``crypto.py`` on the command line."""
    pass


class Unreachable (Exception):
    """
    Makeshift __builtin_unreachable(); always a programmer error if
    thrown.
    """
    pass


class InternalError (Exception):
    """Errors not ascribable to bad user inputs or cryptography."""
    pass


###############################################################################
## crypto layer version
###############################################################################

ENCRYPTION_PARAMETERS = \
    { 0: \
        { "kdf": ("dummy", 16)
        , "enc": "passthrough" }
    , 1: \
        { "kdf": ( "scrypt"
                 , { "dkLen"    : 16
                   , "N"        : 1 << 16
                   , "r"        : 8
                   , "p"        : 1
                   , "NaCl_LEN" : 16 })
        , "enc": "aes-gcm" } }

###############################################################################
## constants
###############################################################################

PDTCRYPT_HDR_MAGIC = b"PDTCRYPT"

PDTCRYPT_HDR_SIZE_MAGIC          = 8     #  8
PDTCRYPT_HDR_SIZE_VERSION        = 2     # 10
PDTCRYPT_HDR_SIZE_PARAMVERSION   = 2     # 12
PDTCRYPT_HDR_SIZE_NACL           = 16    # 28
PDTCRYPT_HDR_SIZE_IV             = 12    # 40
PDTCRYPT_HDR_SIZE_CTSIZE         = 8     # 48
PDTCRYPT_HDR_SIZE_TAG            = 16    # 64 GCM auth tag

PDTCRYPT_HDR_SIZE = PDTCRYPT_HDR_SIZE_MAGIC + PDTCRYPT_HDR_SIZE_VERSION \
                  + PDTCRYPT_HDR_SIZE_PARAMVERSION + PDTCRYPT_HDR_SIZE_NACL \
                  + PDTCRYPT_HDR_SIZE_IV + PDTCRYPT_HDR_SIZE_CTSIZE \
                  + PDTCRYPT_HDR_SIZE_TAG # = 64

# precalculate offsets since Python can’t do constant folding over names
HDR_OFF_VERSION      = PDTCRYPT_HDR_SIZE_MAGIC
HDR_OFF_PARAMVERSION = HDR_OFF_VERSION      + PDTCRYPT_HDR_SIZE_VERSION
HDR_OFF_NACL         = HDR_OFF_PARAMVERSION + PDTCRYPT_HDR_SIZE_PARAMVERSION
HDR_OFF_IV           = HDR_OFF_NACL         + PDTCRYPT_HDR_SIZE_NACL
HDR_OFF_CTSIZE       = HDR_OFF_IV           + PDTCRYPT_HDR_SIZE_IV
HDR_OFF_TAG          = HDR_OFF_CTSIZE       + PDTCRYPT_HDR_SIZE_CTSIZE

FMT_UINT16_LE = "<H"
FMT_UINT64_LE = "<Q"
FMT_I2N_IV    = "<8sL"   # 8 random bytes ‖ 32 bit counter
FMT_I2N_HDR   = ("<"     # little endian, no padding
                 "8s"    # magic
                 "H"     # version
                 "H"     # paramversion
                 "16s"   # sodium chloride
                 "12s"   # iv
                 "Q"     # size
                 "16s")  # GCM tag

# aes+gcm
AES_KEY_SIZE                  = 16 # b"0123456789abcdef"
AES_KEY_SIZE_B64              = 24 # b'MDEyMzQ1Njc4OWFiY2RlZg=='
AES_GCM_MAX_SIZE              = (1 << 36) - (1 << 5) # 2^39 - 2^8 bits ≅ 64 GB
PDTCRYPT_MAX_OBJ_SIZE_DEFAULT = 63 * (1 << 30)       #                    63 GB
PDTCRYPT_MAX_OBJ_SIZE         = PDTCRYPT_MAX_OBJ_SIZE_DEFAULT

# index and info files are written on the fly while encrypting so their
# counters must be available in advance
AES_GCM_IV_CNT_INFOFILE     = 1 # constant
AES_GCM_IV_CNT_INDEX        = AES_GCM_IV_CNT_INFOFILE + 1
AES_GCM_IV_CNT_DATA         = AES_GCM_IV_CNT_INDEX    + 1 # also for multivolume
AES_GCM_IV_CNT_MAX_DEFAULT  = 0xffFFffFF
AES_GCM_IV_CNT_MAX          = AES_GCM_IV_CNT_MAX_DEFAULT

# IV structure and generation
PDTCRYPT_IV_GEN_MAX_RETRIES = 10 # ×
PDTCRYPT_IV_FIXEDPART_SIZE  =  8 # B
PDTCRYPT_IV_COUNTER_SIZE    =  4 # B

# secret type: PW of string | KEY of char [16]
PDTCRYPT_SECRET_PW          = 0
PDTCRYPT_SECRET_KEY         = 1

###############################################################################
## header, trailer
###############################################################################
#
# Interface:
#
#    struct hdrinfo
#      { version      : u16
#      , paramversion : u16
#      , nacl         : [u8; 16]
#      , iv           : [u8; 12]
#      , ctsize       : usize
#      , tag          : [u8; 16] }
#
#    fn hdr_read (data : [u8; 64]) -> hdrinfo;
#    fn hdr_make (h : hdrinfo) -> (bool, [u8; 64] | String);
#    fn hdr_fmt (h : hdrinfo) -> String;
#

def hdr_read (data):
    """
    Read bytes as header structure.

    If the input could not be interpreted as a header, fail with
    ``InvalidHeader``.
    """

    try:
        mag, version, paramversion, nacl, iv, ctsize, tag = \
            struct.unpack (FMT_I2N_HDR, data)
    except Exception as exn:
        raise InvalidHeader ("error unpacking header from [%r]: %s"
                             % (binascii.hexlify (data), str (exn)))

    if mag != PDTCRYPT_HDR_MAGIC:
        raise InvalidHeader ("bad magic in header: expected [%s], got [%s]"
                             % (PDTCRYPT_HDR_MAGIC, mag))

    return \
        { "version"      : version
        , "paramversion" : paramversion
        , "nacl"         : nacl
        , "iv"           : iv
        , "ctsize"       : ctsize
        , "tag"          : tag
        }


def hdr_read_stream (instr):
    """
    Read header from stream at the current position.

    Fail with ``InvalidHeader`` if insufficient bytes were read from the
    stream, or if the content could not be interpreted as a header. Raise
    ``EndOfFile`` if the stream is already exhausted.
    """
    data = instr.read(PDTCRYPT_HDR_SIZE)
    ldata = len (data)
    if ldata == 0:
        raise EndOfFile
    elif ldata != PDTCRYPT_HDR_SIZE:
        raise InvalidHeader ("hdr_read_stream: expected %d B, received %d B"
                             % (PDTCRYPT_HDR_SIZE, ldata))
    return hdr_read (data)


def hdr_from_params (version, paramversion, nacl, iv, ctsize, tag):
    """
    Assemble the necessary values into a PDTCRYPT header.

    :type version: int to fit uint16_t
    :type paramversion: int to fit uint16_t
    :type nacl: bytes to fit uint8_t[16]
    :type iv: bytes to fit uint8_t[12]
    :type ctsize: int to fit uint64_t
    :type tag: bytes to fit uint8_t[16]

    Returns a pair of a success flag and either the header bytes or an error
    message.
    """
    buf = bytearray (PDTCRYPT_HDR_SIZE)
    bufv = memoryview (buf)

    try:
        struct.pack_into (FMT_I2N_HDR, bufv, 0,
                          PDTCRYPT_HDR_MAGIC,
                          version, paramversion, nacl, iv, ctsize, tag)
    except Exception as exn:
        return False, "error assembling header: %s" % str (exn)

    return True, bytes (buf)


def hdr_make_dummy (s):
    """
    Create a header sized block of bytes initialized to a value derived from a
    string. Used to verify we’ve jumped back correctly to the actual position
    of the object header.
    """
    c = reduce (lambda a, c: a + ord(c), s, 0) % 0xFF
    return bytes (bytearray (struct.pack ("B", c)) * PDTCRYPT_HDR_SIZE)


def hdr_make (hdr):
    """
    Assemble a header from the given header structure.
    """
    return hdr_from_params (version=hdr.get("version"),
                            paramversion=hdr.get("paramversion"),
                            nacl=hdr.get("nacl"), iv=hdr.get("iv"),
                            ctsize=hdr.get("ctsize"), tag=hdr.get("tag"))


HDR_FMT = "I2n_header { version: %d, paramversion: %d, nacl: %s[%d]," \
                    " iv: %s[%d], ctsize: %d, tag: %s[%d] }"

def hdr_fmt (h):
    """Format a header structure into readable output."""
    return HDR_FMT % (h["version"], h["paramversion"],
                      binascii.hexlify (h["nacl"]), len(h["nacl"]),
                      binascii.hexlify (h["iv"]), len(h["iv"]),
                      h["ctsize"],
                      binascii.hexlify (h["tag"]), len(h["tag"]))


def hex_spaced_of_bytes (b):
    """Format bytes object, hexdump style."""
    return " ".join ([ "%.2x%.2x" % (c1, c2)
                       for c1, c2 in zip (b[0::2], b[1::2]) ]) \
         + (len (b) | 1 == len (b) and " %.2x" % b[-1] or "") # odd lengths


def hdr_iv_counter (h):
    """Extract the variable part of the IV of the given header."""
    _fixed, cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
    return cnt


def hdr_iv_fixed (h):
    """Extract the fixed part of the IV of the given header."""
    fixed, _cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
    return fixed


hdr_dump = hex_spaced_of_bytes


HDR_FMT_PRETTY = \
"""version      = %-4d                 : %s
paramversion = %-4d                 : %s
nacl                                : %s
iv                                  : %s
ctsize       = %-20d : %s
tag                                 : %s
"""

def hdr_fmt_pretty (h):
    """
    Format header structure into multi-line representation of its contents and
    their raw representation. (Omit the implicit “PDTCRYPT” magic bytes that
    precede every header.)
    """
    return HDR_FMT_PRETTY \
                % (h["version"],
                   hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["version"])),
                   h["paramversion"],
                   hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["paramversion"])),
                   hex_spaced_of_bytes (h["nacl"]),
                   hex_spaced_of_bytes (h["iv"]),
                   h["ctsize"],
                   hex_spaced_of_bytes (struct.pack (FMT_UINT64_LE, h["ctsize"])),
                   hex_spaced_of_bytes (h["tag"]))

IV_FMT = "((f %s) (c %d))"

def iv_fmt (iv):
    """Format the two components of an IV in a readable fashion."""
    fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
    return IV_FMT % (binascii.hexlify (fixed), cnt)


###############################################################################
## restoration
###############################################################################

class Location (object):
    n = 0
    offset = 0

def restore_loc_fmt (loc):
    return "%d off:%d" \
        % (loc.n, loc.offset)

def locate_hdr_candidates (fd):
    """
    Walk over instances of the magic string in the payload, collecting their
    positions. If the offset of the first found instance is not zero, the file
    begins with leading garbage.

    :return: The list of offsets in the file.
    """
    cands = []

    mm = mmap.mmap(fd, 0, mmap.MAP_SHARED, mmap.PROT_READ)
    pos = 0
    while True:
        pos = mm.find (PDTCRYPT_HDR_MAGIC, pos)
        if pos == -1:
            break
        cands.append (pos)
        pos += 1

    return cands


HDR_CAND_GOOD  = 0 # header marks begin of valid object
HDR_CAND_FISHY = 1 # inconclusive (tag mismatch, obj overlap etc.)
HDR_CAND_JUNK  = 2 # not a header / object unreadable

HDR_VERDICT_NAME = \
    { HDR_CAND_GOOD  : "valid"
    , HDR_CAND_FISHY : "fishy"
    , HDR_CAND_JUNK  : "junk"
    }


def verdict_fmt (vdt):
    return HDR_VERDICT_NAME [vdt]


def inspect_hdr (fd, off):
    """
    Attempt to parse a header in *fd* at position *off*.

    Returns a verdict about the quality of that header plus the parsed header
    when readable.
    """

    _ = os.lseek (fd, off, os.SEEK_SET)

    if os.lseek (fd, 0, os.SEEK_CUR) != off:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: %d → dismissed (lseek() past EOF)" % off)
        return HDR_CAND_JUNK, None

    raw = os.read (fd, PDTCRYPT_HDR_SIZE)
    if len (raw) != PDTCRYPT_HDR_SIZE:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: %d → dismissed (EOF inside header)" % off)
        return HDR_CAND_JUNK, None

    try:
        hdr = hdr_read (raw)
    except InvalidHeader as exn:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: %d → dismissed (invalid: [%s])" % (off, str (exn)))
        return HDR_CAND_JUNK, None

    obj0 = off + PDTCRYPT_HDR_SIZE
    objX = obj0 + hdr ["ctsize"]

    eof = os.lseek (fd, 0, os.SEEK_END)
    if eof < objX:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: %d → EOF inside object (%d≤%d≤%d); adjusting size to "
                   "%d" % (off, obj0, eof, objX, (eof - obj0)))
        # try reading up to the end
        hdr ["ctsize"] = eof - obj0
        return HDR_CAND_FISHY, hdr

    return HDR_CAND_GOOD, hdr


def try_decrypt (ifd, off, hdr, secret, ofd=-1):
    """
    Attempt to decrypt the object in the (seekable) descriptor *ifd* starting
    at *off* using the metadata in *hdr* and *secret*. An output fd can be
    specified with *ofd*; if it is *-1* – the default –, the decrypted payload
    will be discarded.

    Always creates a fresh decryptor, so validation steps across objects don’t
    apply.

    Errors during GCM tag validation are reported but otherwise ignored.
    """
    ctleft = hdr ["ctsize"]
    pos = off

    ks = secret [0]
    if ks == PDTCRYPT_SECRET_PW:
        decr = Decrypt (password=secret [1])
    elif ks == PDTCRYPT_SECRET_KEY:
        key = secret [1]
        decr = Decrypt (key=key)
    else:
        raise RuntimeError

    decr.next (hdr)

    try:
        os.lseek (ifd, pos, os.SEEK_SET)
        pt = b""
        while ctleft > 0:
            cnksiz = min (ctleft, PDTCRYPT_BLOCKSIZE)
            cnk = os.read (ifd, cnksiz)
            ctleft -= cnksiz
            pos += cnksiz
            pt = decr.process (cnk)
            if ofd != -1:
                os.write (ofd, pt)
        try:
            pt = decr.done ()
        except InvalidGCMTag:
            noise ("PDT: GCM tag mismatch for object %d–%d"
                   % (off, off + hdr ["ctsize"]))
            pt = b"" # last chunk was already written, don’t emit it twice
        if len (pt) > 0 and ofd != -1:
            os.write (ofd, pt)

    except Exception as exn:
        noise ("PDT: error decrypting object %d–%d@%d, %d B remaining [%s]"
               % (off, off + hdr ["ctsize"], pos, ctleft, exn))
        raise

    return pos - off


def readable_objects_offsets (ifd, secret, cands):
    """
    From a list of candidates, locate the ones that mark the start of actual
    readable PDTCRYPT objects.
    """
    good = []

    for i, cand in enumerate (cands):
        vdt, hdr = inspect_hdr (ifd, cand)
        if vdt == HDR_CAND_JUNK:
            pass # ignore unreadable ones
        elif vdt in [HDR_CAND_GOOD, HDR_CAND_FISHY]:
            ctsize = hdr ["ctsize"]
            off0 = cand + PDTCRYPT_HDR_SIZE
            ok = try_decrypt (ifd, off0, hdr, secret) == ctsize
            if ok is True:
                good.append ((cand, off0 + ctsize))

    overlap = find_overlaps (good)

    return [ g [0] for g in good ]


def reconstruct_offsets (fname, secret):
    ifd = os.open (fname, os.O_RDONLY)

    try:
        cands = locate_hdr_candidates (ifd)
        return readable_objects_offsets (ifd, secret, cands)
    finally:
        os.close (ifd)


###############################################################################
## helpers
###############################################################################

def make_secret (password=None, key=None):
    """
    Safely create a “secret” value that consists either of a key or a password.
    Inputs are validated: the password is accepted as (UTF-8 encoded) bytes or
    string; for the key, only a bytes object of the proper size or a
    hexadecimal or base64 encoded string thereof is accepted.

    If both are provided, the key is preferred over the password; no checks are
    performed whether the key is derived from the password.

    :returns: secret value if inputs were acceptable | None otherwise.
    """
    if key is not None:
        if isinstance (key, str) is True:
            key = key.encode ("utf-8")
        if isinstance (key, bytes) is True:
            if len (key) == AES_KEY_SIZE:
                return (PDTCRYPT_SECRET_KEY, key)
            if len (key) == AES_KEY_SIZE * 2:
                try:
                    key = binascii.unhexlify (key)
                    return (PDTCRYPT_SECRET_KEY, key)
                except binascii.Error: # garbage in string
                    pass
            if len (key) == AES_KEY_SIZE_B64:
                try:
                    key = base64.b64decode (key)
                    # the base64 processor is very tolerant and allows for
                    # arbitrary trailing and leading data, thus the data
                    # obtained must be checked for the proper length
                    if len (key) == AES_KEY_SIZE:
                        return (PDTCRYPT_SECRET_KEY, key)
                except binascii.Error: # “incorrect padding”
                    pass
    elif password is not None:
        if isinstance (password, str) is True:
            return (PDTCRYPT_SECRET_PW, password)
        elif isinstance (password, bytes) is True:
            try:
                password = password.decode ("utf-8")
                return (PDTCRYPT_SECRET_PW, password)
            except UnicodeDecodeError:
                pass

    return None


###############################################################################
## passthrough / null encryption
###############################################################################

class PassthroughCipher (object):

    tag = struct.pack ("<QQ", 0, 0)

    def __init__          (self)    : pass

    def update            (self, b) : return b

    def finalize          (self)    : return b""

    def finalize_with_tag (self, _) : return b""

###############################################################################
## convenience wrapper
###############################################################################


def kdf_dummy (klen, password, _nacl):
    """
    Fake KDF for testing purposes that is called when parameter version zero is
    encountered.
    """
    q, r = divmod (klen, len (password))
    if isinstance (password, bytes) is False:
        password = password.encode ()
    return password * q + password [:r], b""


SCRYPT_KEY_MEMO = { } # static because needed for both the info file and the archive


def kdf_scrypt (params, password, nacl):
    """
    Wrapper for the Scrypt KDF, corresponds to parameter version one. The
    computation result is memoized based on the inputs to facilitate spawning
    multiple encryption contexts.
    """
    N = params["N"]
    r = params["r"]
    p = params["p"]
    dkLen = params["dkLen"]

    if nacl is None:
        nacl = os.urandom (params["NaCl_LEN"])

    key_parms = (password, nacl, N, r, p, dkLen)
    global SCRYPT_KEY_MEMO
    if key_parms not in SCRYPT_KEY_MEMO:
        SCRYPT_KEY_MEMO [key_parms] = \
            pylibscrypt.scrypt (password, nacl, N, r, p, dkLen)
    return SCRYPT_KEY_MEMO [key_parms], nacl


def kdf_by_version (paramversion=None, defs=None):
    """
    Pick the KDF handler corresponding to the parameter version or the
    definition set.

    :rtype: function (password : str, nacl : bytes) -> (key : bytes, nacl : bytes)
    """
    if paramversion is not None:
        defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
    if defs is None:
        raise InvalidParameter ("no encryption parameters for version %r"
                                % paramversion)
    (kdf, params) = defs["kdf"]
    fn = None
    if kdf == "scrypt" : fn = kdf_scrypt
    if kdf == "dummy"  : fn = kdf_dummy
    if fn is None:
        raise ValueError ("key derivation method %r unknown" % kdf)
    return partial (fn, params)


###############################################################################
## SCRYPT hashing
###############################################################################

def scrypt_hashsource (pw, ins):
    """
    Calculate the SCRYPT hash from the password and the information contained
    in the first header found in ``ins``.

    This does not validate whether the first object is encrypted correctly.
    Problems with the input are reported via ``noise()``, in which case 1 is
    returned instead of the ``(hash, salt, version, paramversion)`` tuple.
    """
    if isinstance (pw, str) is True:
        pw = str.encode (pw)
    elif isinstance (pw, bytes) is False:
        raise InvalidParameter ("password must be a string, not %s"
                                % type (pw))
    if isinstance (ins, io.BufferedReader) is False and \
            isinstance (ins, io.FileIO) is False:
        raise InvalidParameter ("file to hash must be opened in “binary” mode")
    hdr = None
    try:
        hdr = hdr_read_stream (ins)
    except EndOfFile as exn:
        noise ("PDT: malformed input: end of file reading first object header")
        noise ("PDT:")
        return 1

    nacl = hdr ["nacl"]
    pver = hdr ["paramversion"]
    if PDTCRYPT_VERBOSE is True:
        noise ("PDT: salt of first object : %s" % binascii.hexlify (nacl))
        noise ("PDT: parameter version of archive : %d" % pver)

    try:
        defs = ENCRYPTION_PARAMETERS.get(pver, None)
        if defs is None:
            raise ValueError ("no encryption parameters for version %d" % pver)
        kdfname, params = defs ["kdf"]
        if kdfname != "scrypt":
            noise ("PDT: input is not an SCRYPT archive")
            noise ("")
            return 1
        kdf = kdf_by_version (None, defs)
    except ValueError as exn:
        noise ("PDT: object has unknown parameter version %d" % pver)
        return 1

    hsh, _void = kdf (pw, nacl)

    return hsh, nacl, hdr ["version"], pver


def scrypt_hashfile (pw, fname):
    """
    Calculate the SCRYPT hash from the password and the information contained
    in the first header found in the given file. The header is read only at
    offset zero.
    """
    with deptdcrypt_mk_stream (PDTCRYPT_SOURCE, fname or "-") as ins:
        hsh, _void, _void, _void = scrypt_hashsource (pw, ins)
    return hsh


###############################################################################
## AES-GCM context
###############################################################################

class Crypto (object):
    """
    Encryption context to remain alive throughout an entire tarfile pass.
    """
    enc  = None
    nacl = None
    key  = None
    cnt  = None # file counter (uint32_t != 0)
    iv   = None # current IV
    fixed        = None  # accu for 64 bit fixed parts of IV
    used_ivs     = None  # tracks IVs
    strict_ivs   = False # if True, panic on duplicate object IV
    password     = None
    paramversion = None
    stats        = None  # data counters, created per instance in __init__

    ctsize = -1
    ptsize = -1
    info_counter_used  = False
    index_counter_used = False

    def __init__ (self, *al, **akv):
        # a dict assigned at class level would be shared by all contexts
        self.stats    = { "in"  : 0
                        , "out" : 0
                        , "obj" : 0 }
        self.used_ivs = set ()
        self.set_parameters (*al, **akv)


    def next_fixed (self):
        # NOP for decryption
        pass


    def set_object_counter (self, cnt=None):
        """
        Safely set the internal counter of encrypted objects. Numerous
        constraints apply:

        The same counter may not be reused in combination with one IV fixed
        part. This is validated elsewhere in the IV handling.

        Counter zero is invalid. The first two counters are reserved for
        metadata. The implementation does not allow for splitting metadata
        files over multiple encrypted objects. (This would be possible by
        assigning new fixed parts.) Thus in a Deltatar backup there is at most
        one object with a counter value of one and at most one with a value of
        two. On creation of a context, the initial counter may be chosen. The
        globals ``AES_GCM_IV_CNT_INFOFILE`` and ``AES_GCM_IV_CNT_INDEX`` can be
        used to request one of the reserved values. If one of these values has
        been used, any further attempt of setting the counter to that value
        will be rejected with an ``InvalidFileCounter`` exception.

        Out-of-bounds values (i. e. below one or above
        ``AES_GCM_IV_CNT_MAX + 1``) cause an ``InvalidParameter`` exception to
        be thrown. The value ``AES_GCM_IV_CNT_MAX + 1`` wraps the counter
        around: it is reset to ``AES_GCM_IV_CNT_DATA`` and a new IV fixed part
        is requested.
        """
        if cnt is None:
            self.cnt = AES_GCM_IV_CNT_DATA
            return
        if cnt < 1 or cnt > AES_GCM_IV_CNT_MAX + 1:
            raise InvalidParameter ("invalid counter value %d requested: "
                                    "acceptable values are from 1 to %d"
                                    % (cnt, AES_GCM_IV_CNT_MAX))
        if cnt == AES_GCM_IV_CNT_INFOFILE:
            if self.info_counter_used is True:
                raise InvalidFileCounter ("attempted to reuse info file "
                                          "counter %d: must be unique" % cnt)
            self.info_counter_used = True
        elif cnt == AES_GCM_IV_CNT_INDEX:
            if self.index_counter_used is True:
                raise InvalidFileCounter ("attempted to reuse index file "
                                          "counter %d: must be unique" % cnt)
            self.index_counter_used = True
        if cnt <= AES_GCM_IV_CNT_MAX:
            self.cnt = cnt
            return
        # cnt == AES_GCM_IV_CNT_MAX + 1 → wrap
        self.cnt = AES_GCM_IV_CNT_DATA
        self.next_fixed ()


    def set_parameters (self, password=None, key=None, paramversion=None,
                        nacl=None, counter=None, strict_ivs=False):
        """
        Configure the internal state of a crypto context. Not intended for
        external use.
        """
        self.next_fixed ()
        self.set_object_counter (counter)
        self.strict_ivs = strict_ivs

        if paramversion is not None:
            self.paramversion = paramversion

        if key is not None:
            self.key, self.nacl = key, nacl
            return

        if password is not None:
            if isinstance (password, bytes) is False:
                password = str.encode (password)
            self.password = password
            if paramversion is None and nacl is None:
                # postpone key setup until first header is available
                return
            kdf = kdf_by_version (paramversion)
            if kdf is not None:
                self.key, self.nacl = kdf (password, nacl)


    def process (self, buf):
        """
        Encrypt / decrypt a buffer. Invokes the ``.update()`` method on the
        wrapped encryptor or decryptor, respectively.

        The Cryptography exception ``AlreadyFinalized`` is translated to an
        ``InternalError`` at this point. It may occur in sound code when the GC
        closes an encrypting stream after an error. Everywhere else it must be
        treated as a bug.
        """
        if self.enc is None:
            raise RuntimeError ("process: context not initialized")
        self.stats ["in"] += len (buf)
        try:
            out = self.enc.update (buf)
        except cryptography.exceptions.AlreadyFinalized as exn:
            raise InternalError (exn)
        self.stats ["out"] += len (out)
        return out


    def next (self, password, paramversion, nacl, iv):
        """
        Prepare for processing another object: Reset the data counters and
        change the configuration in case one of the variable parameters differs
        from the last object. Also check the IV for duplicates and error out
        if strict checking was requested.
        """
        self.ctsize = 0
        self.ptsize = 0
        self.stats ["obj"] += 1

        self.check_duplicate_iv (iv)

        if (   self.paramversion != paramversion
            or self.password     != password
            or self.nacl         != nacl):
            self.set_parameters (password=password, paramversion=paramversion,
                                 nacl=nacl, strict_ivs=self.strict_ivs)


    def check_duplicate_iv (self, iv):
        """
        Add an IV (the 12 byte representation as in the header) to the set of
        IVs seen so far. With strict checking enabled, an IV that is already
        in the set causes a ``DuplicateIV`` exception. Depending on the
        context, this may indicate a serious error (IV reuse).
        """
        if self.strict_ivs is True and iv in self.used_ivs:
            raise DuplicateIV ("iv %s was reused" % iv_fmt (iv))
        # record the IV; adding one that was seen before is a no-op
        self.used_ivs.add (iv)


    def counters (self):
        """
        Access the data counters.
        """
        return self.stats ["obj"], self.stats ["in"], self.stats ["out"]


    def drop (self):
        """
        Clear the current context regardless of its finalization state. The
        next operation must be ``.next()``.
        """
        self.enc = None


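# Sketch (hypothetical, not Deltatar's implementation): how a sequence of
# 12-byte AES-GCM IVs can be built from an 8-byte random fixed part and a
# 32-bit object counter, following the rules documented in
# set_object_counter () and applied by Encrypt.next_fixed () and
# Encrypt.iv_make (). Assumed for the example only: counters 1 and 2 are
# the reserved metadata values, data objects start at 3, and the IV is
# packed as "<8sI"; the module's real values are AES_GCM_IV_CNT_* and
# FMT_I2N_IV.
class _SketchIVSequence (object):
    CNT_INFOFILE = 1           # assumed reserved counter (info file)
    CNT_INDEX    = 2           # assumed reserved counter (index)
    CNT_DATA     = 3           # assumed first counter for data objects
    CNT_MAX      = 0xffffffff  # largest value of a uint32_t

    def __init__ (self, randbytes=None, retries=8):
        import os
        self.randbytes     = randbytes or os.urandom
        self.retries       = retries
        self.fixed         = []      # all fixed parts used, newest last
        self.reserved_used = set ()
        self.cnt           = self.CNT_DATA
        self.new_fixed ()

    def new_fixed (self):
        # draw until a fixed part comes up that was not used before
        for _ in range (self.retries):
            fp = self.randbytes (8)
            if fp not in self.fixed:
                self.fixed.append (fp)
                return
        raise RuntimeError ("no unique fixed part after %d tries"
                            % self.retries)

    def set_counter (self, cnt):
        if cnt < 1 or cnt > self.CNT_MAX + 1:
            raise ValueError ("counter %d out of range" % cnt)
        if cnt in (self.CNT_INFOFILE, self.CNT_INDEX):
            if cnt in self.reserved_used:
                raise ValueError ("reserved counter %d already used" % cnt)
            self.reserved_used.add (cnt)
        if cnt <= self.CNT_MAX:
            self.cnt = cnt
        else:
            # counter exhausted: restart at the first data counter under a
            # fresh fixed part so that no (fixed part, counter) pair repeats
            self.cnt = self.CNT_DATA
            self.new_fixed ()

    def next_iv (self):
        import struct
        iv = struct.pack ("<8sI", self.fixed [-1], self.cnt)
        self.set_counter (self.cnt + 1)
        return iv
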
class Encrypt (Crypto):

    lastinfo = None
    version  = None
    paramenc = None

    def __init__ (self, version, paramversion, password=None, key=None, nacl=None,
                  counter=AES_GCM_IV_CNT_DATA, strict_ivs=True):
        """
        The ctor will throw immediately if one of the parameters does not conform
        to our expectations.

        :type version: int to fit uint16_t
        :type paramversion: int to fit uint16_t
        :param password: mutually exclusive with ``key``
        :type password: str
        :param key: mutually exclusive with ``password``
        :type key: bytes
        :param nacl: salt; mandatory when ``key`` is given
        :type nacl: bytes
        :type counter: initial object counter; the values
                       ``AES_GCM_IV_CNT_INFOFILE`` and
                       ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
                       and cannot be reused even with different fixed parts.
        :type strict_ivs: bool
        """
        if password is None and key is None \
                or password is not None and key is not None:
            raise InvalidParameter ("__init__: need either key or password")

        if key is not None:
            if isinstance (key, bytes) is False:
                raise InvalidParameter ("__init__: key must be provided as "
                                        "bytes, not %s" % type (key))
            if nacl is None:
                raise InvalidParameter ("__init__: salt must be provided along "
                                        "with encryption key")
        else: # password, no key
            if isinstance (password, str) is False:
                raise InvalidParameter ("__init__: password must be a string, not %s"
                                        % type (password))
            if len (password) == 0:
                raise InvalidParameter ("__init__: supplied empty password but not "
                                        "permitted for PDT encrypted files")
        # version
        if isinstance (version, int) is False:
            raise InvalidParameter ("__init__: version number must be an "
                                    "integer, not %s" % type (version))
        if version < 0:
            raise InvalidParameter ("__init__: version number must be a "
                                    "nonnegative integer, not %d" % version)
        # paramversion
        if isinstance (paramversion, int) is False:
            raise InvalidParameter ("__init__: crypto parameter version number "
                                    "must be an integer, not %s"
                                    % type (paramversion))
        if paramversion < 0:
            raise InvalidParameter ("__init__: crypto parameter version number "
                                    "must be a nonnegative integer, not %d"
                                    % paramversion)
        # salt
        if nacl is not None:
            if isinstance (nacl, bytes) is False:
                raise InvalidParameter ("__init__: salt given, but of type %s "
                                        "instead of bytes" % type (nacl))
            # salt length would depend on the actual encryption so it can’t be
            # validated at this point
        self.fixed   = [ ]
        self.version = version
        defs = ENCRYPTION_PARAMETERS.get (paramversion)
        if defs is None:
            raise InvalidParameter ("__init__: unknown crypto parameter "
                                    "version %d" % paramversion)
        self.paramenc = defs ["enc"]

        super().__init__ (password, key, paramversion, nacl, counter=counter,
                          strict_ivs=strict_ivs)


    def next_fixed (self, retries=PDTCRYPT_IV_GEN_MAX_RETRIES):
        """
        Generate the next IV fixed part by reading eight bytes from
        ``/dev/urandom``. The buffer so obtained is tested against the fixed
        parts used so far to prevent accidental reuse of IVs. After a
        configurable number of attempts to create a unique fixed part, it will
        refuse to continue with an ``IVFixedPartError``. This is unlikely to
        ever happen on a normal system but may detect an issue with the random
        generator.

        The list of fixed parts that were used by the context at hand can be
        accessed through the ``.fixed`` list. Its last element is the fixed
        part currently in use.
        """
        i = 0
        while i < retries:
            fp = os.urandom (PDTCRYPT_IV_FIXEDPART_SIZE)
            if fp not in self.fixed:
                self.fixed.append (fp)
                return
            i += 1
        raise IVFixedPartError ("error obtaining a unique IV fixed part from "
                                "/dev/urandom; giving up after %d tries" % i)


    def iv_make (self):
        """
        Construct a 12-byte IV from the current fixed part and the object
        counter.
        """
        return struct.pack (FMT_I2N_IV, self.fixed [-1], self.cnt)


    def next (self, filename=None, counter=None):
        """
        Prepare for encrypting the next incoming object. Update the counter
        and put together the IV, possibly changing prefixes. Then create the
        new encryptor.

        The argument ``counter`` can be used to specify a file counter for this
        object. Unless it is one of the reserved values, the counter of
        subsequent objects will be computed from this one.

        If this is the first object in a series, ``filename`` is required,
        otherwise it is reused if not present. The value is used to derive a
        header sized placeholder to use until after encryption when all the
        inputs to construct the final header are available. This is then
        matched in ``.done()`` against the value found at the position of the
        header. The motivation for this extra check is primarily to assist
        format debugging: It makes stray headers easy to spot in malformed
        PDTCRYPT files.
        """
        if filename is None:
            if self.lastinfo is None:
                raise InvalidParameter ("next: filename is mandatory for "
                                        "first object")
            filename, _dummy = self.lastinfo
        else:
            if isinstance (filename, str) is False:
                raise InvalidParameter ("next: filename must be a string, not %s"
                                        % type (filename))
        if counter is not None:
            if isinstance (counter, int) is False:
                raise InvalidParameter ("next: the supplied counter is of "
                                        "invalid type %s; please pass an "
                                        "integer instead" % type (counter))
            self.set_object_counter (counter)

        self.iv = self.iv_make ()
        if self.paramenc == "aes-gcm":
            self.enc = Cipher \
                ( algorithms.AES (self.key)
                , modes.GCM (self.iv)
                , backend = default_backend ()) \
                .encryptor ()
        elif self.paramenc == "passthrough":
            self.enc = PassthroughCipher ()
        else:
            raise InvalidParameter ("next: parameter version %d not known"
                                    % self.paramversion)
        hdrdum = hdr_make_dummy (filename)
        self.lastinfo = (filename, hdrdum)
        super().next (self.password, self.paramversion, self.nacl, self.iv)

        self.set_object_counter (self.cnt + 1)
        return hdrdum


    def done (self, cmpdata):
        """
        Complete encryption of an object. After this has been called, attempts
        of encrypting further data will cause an error until ``.next()`` is
        invoked properly.

        Returns a triple: the final chunk of ciphertext, the 64-byte object
        header with all values filled in, including the “late” ones such as
        the ciphertext size and the GCM tag, and the list of IV fixed parts
        used so far.
        """
        if isinstance (cmpdata, bytes) is False:
            raise InvalidParameter ("done: comparison input expected as bytes, "
                                    "not %s" % type (cmpdata))
        if self.lastinfo is None:
            raise RuntimeError ("done: encryption context not initialized")
        filename, hdrdum = self.lastinfo
        if cmpdata != hdrdum:
            raise RuntimeError ("done: bad sync of header for object %d: "
                                "preliminary data does not match; this likely "
                                "indicates a wrongly repositioned stream"
                                % self.cnt)
        data = self.enc.finalize ()
        self.stats ["out"] += len (data)
        self.ctsize += len (data)
        ok, hdr = hdr_from_params (self.version, self.paramversion, self.nacl,
                                   self.iv, self.ctsize, self.enc.tag)
        if ok is False:
            raise InternalError ("error constructing header: %r" % hdr)
        return data, hdr, self.fixed


    def process (self, buf):
        """
        Encrypt a chunk of plaintext with the active encryptor. Returns a
        pair: the number of plaintext bytes consumed and the resulting
        ciphertext. The former **must** be checked downstream: if the maximum
        possible object size has been reached, only part of the input was
        encrypted. The current context must then be finalized and a new one
        established before the remainder of the plaintext (the part of
        ``buf`` past the consumed bytes) can be encrypted.
        """
        if isinstance (buf, bytes) is False:
            raise InvalidParameter ("process: expected byte buffer, not %s"
                                    % type (buf))
        bsize = len (buf)
        newptsize = self.ptsize + bsize
        diff = newptsize - PDTCRYPT_MAX_OBJ_SIZE
        if diff > 0:
            bsize -= diff
            newptsize = PDTCRYPT_MAX_OBJ_SIZE
        self.ptsize = newptsize
        data = super().process (buf [:bsize])
        self.ctsize += len (data)
        return bsize, data


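# Sketch (hypothetical helper, not part of Deltatar) of the caller side of
# the Encrypt.process () contract: the number of bytes consumed can be
# smaller than the input once the per-object size limit is reached, so the
# caller has to close the current object (done () in the real code) and
# continue with the remainder in a new one (next ()). A toy process ()
# that only counts bytes stands in for the encryptor; max_obj_size plays
# the role of PDTCRYPT_MAX_OBJ_SIZE. Returns the plaintext size of every
# object produced.
def _sketch_split_into_objects (data, max_obj_size, blocksize=4):
    objects = []   # plaintext size of every finished object
    cur     = 0    # plaintext consumed by the current object

    def process (buf):
        nonlocal cur
        n = min (len (buf), max_obj_size - cur)   # truncate at the limit
        cur += n
        return n

    pos = 0
    while pos < len (data):
        chunk = data [pos:pos + blocksize]
        while chunk:
            n = process (chunk)
            chunk = chunk [n:]
            if chunk:                  # limit hit: finish object, start anew
                objects.append (cur)
                cur = 0
        pos += blocksize
    if cur > 0:
        objects.append (cur)
    return objects
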
class Decrypt (Crypto):

    tag     = None # GCM tag, part of header
    last_iv = None # check consecutive ivs in strict mode

    def __init__ (self, password=None, key=None, counter=None, fixedparts=None,
                  strict_ivs=False):
        """
        Sanitizing ctor for the decryption context. ``fixedparts`` specifies a
        list of IV fixed parts accepted during decryption. If a fixed part is
        encountered that is not in the list, decryption will fail.

        :param password: mutually exclusive with ``key``
        :type password: str
        :param key: mutually exclusive with ``password``
        :type key: bytes
        :type counter: initial object counter; the values
                       ``AES_GCM_IV_CNT_INFOFILE`` and
                       ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
                       and cannot be reused even with different fixed parts.
        :type fixedparts: bytes list
        :type strict_ivs: bool
        """
        if password is None and key is None \
                or password is not None and key is not None:
            raise InvalidParameter ("__init__: need either key or password")

        if key is not None:
            if isinstance (key, bytes) is False:
                raise InvalidParameter ("__init__: key must be provided as "
                                        "bytes, not %s" % type (key))
        else: # password, no key
            if isinstance (password, str) is False:
                raise InvalidParameter ("__init__: password must be a string, not %s"
                                        % type (password))
            if len (password) == 0:
                raise InvalidParameter ("__init__: supplied empty password but not "
                                        "permitted for PDT encrypted files")
        # fixed parts
        if fixedparts is not None:
            if isinstance (fixedparts, list) is False:
                raise InvalidParameter ("__init__: IV fixed parts must be "
                                        "supplied as list, not %s"
                                        % type (fixedparts))
            # sorted copy for bisection; leaves the caller’s list untouched
            self.fixed = sorted (fixedparts)

        super().__init__ (password=password, key=key, counter=counter,
                          strict_ivs=strict_ivs)


    def valid_fixed_part (self, iv):
        """
        Check whether the fixed part of the given IV is among the fixed parts
        this context was created with. The list is kept sorted so the lookup
        can use bisection.
        """
        # check if fixed part is known
        fixed, _cnt = struct.unpack (FMT_I2N_IV, iv)
        i = bisect.bisect_left (self.fixed, fixed)
        return i != len (self.fixed) and self.fixed [i] == fixed


    def check_consecutive_iv (self, iv):
        """
        Check whether the counter part of the given IV is indeed the successor
        of the currently present counter. This should always be the case for
        the objects in a well formed PDT archive but should not be enforced
        when decrypting out-of-order.
        """
        fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
        if self.strict_ivs is True \
                and self.last_iv is not None \
                and self.last_iv [0] == fixed \
                and self.last_iv [1] != cnt - 1:
            raise NonConsecutiveIV ("iv %s counter not successor of "
                                    "last object (expected %d, found %d)"
                                    % (iv_fmt (iv), self.last_iv [1] + 1, cnt))
        # store the fixed part, not the whole IV, so the next comparison
        # above matches like with like
        self.last_iv = (fixed, cnt)


    def next (self, hdr):
        """
        Start decrypting the next object. The PDTCRYPT header for the object
        can be given either as an already parsed header or as raw bytes.
        """
        if isinstance (hdr, bytes) is True:
            hdr = hdr_read (hdr)
        elif isinstance (hdr, dict) is False:
            # this won’t catch malformed specs though
            raise InvalidParameter ("next: wrong type of parameter hdr: "
                                    "expected bytes or spec, got %s"
                                    % type (hdr))
        try:
            paramversion = hdr ["paramversion"]
            nacl         = hdr ["nacl"]
            iv           = hdr ["iv"]
            tag          = hdr ["tag"]
        except KeyError:
            raise InvalidHeader ("next: not a header %r" % hdr)

        super().next (self.password, paramversion, nacl, iv)
        if self.fixed is not None and self.valid_fixed_part (iv) is False:
            raise InvalidIVFixedPart ("iv %s has invalid fixed part"
                                      % iv_fmt (iv))
        self.check_consecutive_iv (iv)

        self.tag = tag
        defs = ENCRYPTION_PARAMETERS.get (paramversion, None)
        if defs is None:
            raise FormatError ("header contains unknown parameter version %d; "
                               "maybe the file was created by a more recent "
                               "version of Deltatar" % paramversion)
        enc = defs ["enc"]
        if enc == "aes-gcm":
            self.enc = Cipher \
                ( algorithms.AES (self.key)
                , modes.GCM (iv, tag=self.tag)
                , backend = default_backend ()) \
                .decryptor ()
        elif enc == "passthrough":
            self.enc = PassthroughCipher ()
        else:
            raise InternalError ("encryption parameter set %d refers to unknown "
                                 "mode %r" % (paramversion, enc))
        self.set_object_counter (self.cnt + 1)


    def done (self, tag=None):
        """
        Stop decryption of the current object and finalize it with the active
        context. This will throw an ``InvalidGCMTag`` exception to indicate
        that the authentication tag does not match the data. If the tag is
        correct, the rest of the plaintext is returned.

        If ``tag`` is supplied, it is passed to ``finalize_with_tag()`` for
        authentication.
        """
        data = b""
        try:
            if tag is None:
                data = self.enc.finalize ()
            else:
                if isinstance (tag, bytes) is False:
                    raise InvalidParameter ("done: wrong type of parameter "
                                            "tag: expected bytes, got %s"
                                            % type (tag))
                data = self.enc.finalize_with_tag (tag)
        except cryptography.exceptions.InvalidTag:
            raise InvalidGCMTag ("done: tag mismatch of object %d: %s "
                                 "rejected by finalize ()"
                                 % (self.cnt,
                                    binascii.hexlify (tag or self.tag)))
        self.ptsize += len (data)
        self.stats ["out"] += len (data)
        return data


    def process (self, buf):
        """
        Decrypt the bytes object *buf* with the active decryptor.
        """
        if isinstance (buf, bytes) is False:
            raise InvalidParameter ("process: expected byte buffer, not %s"
                                    % type (buf))
        self.ctsize += len (buf)
        data = super().process (buf)
        self.ptsize += len (data)
        return data


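# Sketch (hypothetical, not Deltatar's code) of the two IV checks made by
# Decrypt: membership of the IV fixed part in a sorted whitelist, looked up
# with bisect as in valid_fixed_part (), and the strict-mode rule of
# check_consecutive_iv () that, within one fixed part, every object
# counter must follow its predecessor. The IV is assumed to be an 8-byte
# fixed part plus a little-endian uint32 counter ("<8sI"); the module's
# real layout is whatever FMT_I2N_IV specifies.
class _SketchIVChecker (object):
    def __init__ (self, fixedparts):
        self.fixed = sorted (fixedparts)   # bisect needs sorted input
        self.last  = None                  # (fixed part, counter) seen last

    def known_fixed_part (self, iv):
        import bisect, struct
        fixed, _cnt = struct.unpack ("<8sI", iv)
        i = bisect.bisect_left (self.fixed, fixed)
        return i != len (self.fixed) and self.fixed [i] == fixed

    def check_consecutive (self, iv):
        import struct
        fixed, cnt = struct.unpack ("<8sI", iv)
        if self.last is not None and self.last [0] == fixed \
                and self.last [1] != cnt - 1:
            raise ValueError ("counter %d does not follow %d"
                              % (cnt, self.last [1]))
        # a new fixed part starts a new run, so there is nothing to compare
        self.last = (fixed, cnt)
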
###############################################################################
## testing helpers
###############################################################################

def _patch_global (glob, vow, n=None):
    """
    Override the module global ``glob`` for testing, e. g. the upper file
    counter bound for exercising the IV logic. If ``n`` is omitted, the
    value of ``<glob>_DEFAULT`` is restored. Returns the previous value.
    Completely unsafe.
    """
    assert vow == "I am fully aware that this will void my warranty."
    r = globals () [glob]
    if n is None:
        n = globals () [glob + "_DEFAULT"]
    globals () [glob] = n
    return r

_testing_set_AES_GCM_IV_CNT_MAX = \
    partial (_patch_global, "AES_GCM_IV_CNT_MAX")

_testing_set_PDTCRYPT_MAX_OBJ_SIZE = \
    partial (_patch_global, "PDTCRYPT_MAX_OBJ_SIZE")

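# Sketch (hypothetical, not part of Deltatar) of the save/restore pattern a
# test needs around overrides such as _testing_set_AES_GCM_IV_CNT_MAX ():
# the setter hands back the old value, and that value has to be put back
# even when the test fails. The context manager below applies the pattern
# to any namespace dict (in real use this could be a module's globals ());
# the class name is illustrative only.
class _SketchPatchedGlobal (object):
    def __init__ (self, namespace, name, value):
        self.ns, self.name, self.value = namespace, name, value

    def __enter__ (self):
        self.old = self.ns [self.name]
        self.ns [self.name] = self.value
        return self.old

    def __exit__ (self, *exc):
        self.ns [self.name] = self.old   # runs on normal exit and on error
        return False                     # never swallow the exception
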
def open2_dump_file (fname, dir_fd, force=False):
    """
    Create ``fname`` for writing relative to the directory descriptor
    ``dir_fd``, with permissions 0600, and return the raw file descriptor.
    An existing file is truncated if ``force`` is set; otherwise a
    ``RuntimeError`` is raised.
    """
    outfd = -1

    oflags = os.O_CREAT | os.O_WRONLY
    if force is True:
        oflags |= os.O_TRUNC
    else:
        oflags |= os.O_EXCL

    try:
        outfd = os.open (fname, oflags,
                         stat.S_IRUSR | stat.S_IWUSR, dir_fd=dir_fd)
    except FileExistsError as exn:
        noise ("PDT: refusing to overwrite existing file %s" % fname)
        noise ("")
        raise RuntimeError ("destination file %s already exists" % fname)
    if PDTCRYPT_VERBOSE is True:
        noise ("PDT: new output file %s (fd=%d)" % (fname, outfd))

    return outfd

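# Sketch (hypothetical, not part of Deltatar) of the flag logic in
# open2_dump_file (): with O_EXCL, os.open () refuses an existing name and
# raises FileExistsError, whereas O_TRUNC empties the file. The name is
# resolved relative to an open directory descriptor via dir_fd, which only
# works on platforms where os.open is listed in os.supports_dir_fd.
def _sketch_open_relative (dir_fd, fname, force):
    import os, stat
    flags = os.O_CREAT | os.O_WRONLY | (os.O_TRUNC if force else os.O_EXCL)
    return os.open (fname, flags, stat.S_IRUSR | stat.S_IWUSR, dir_fd=dir_fd)
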
###############################################################################
## freestanding invocation
###############################################################################

PDTCRYPT_SUB_PROCESS = 0
PDTCRYPT_SUB_SCRYPT  = 1
PDTCRYPT_SUB_SCAN    = 2

PDTCRYPT_SUB = \
    { "process" : PDTCRYPT_SUB_PROCESS
    , "scrypt"  : PDTCRYPT_SUB_SCRYPT
    , "scan"    : PDTCRYPT_SUB_SCAN }

PDTCRYPT_DECRYPT = 1 << 0 # decrypt archive with password or key
PDTCRYPT_SPLIT   = 1 << 1 # split archive into individual objects
PDTCRYPT_HASH    = 1 << 2 # output scrypt hash for file and given password

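# Sketch (illustration only, not part of Deltatar): the PDTCRYPT_DECRYPT,
# PDTCRYPT_SPLIT and PDTCRYPT_HASH values are distinct single bits, so a
# mode is composed with | and queried with &, as depdtcrypt () does with
# ``mode & PDTCRYPT_SPLIT``. The bit values are repeated locally so the
# example runs on its own; the function name is hypothetical.
def _sketch_describe_mode (mode):
    flags = (("decrypt", 1 << 0), ("split", 1 << 1), ("hash", 1 << 2))
    return [name for name, bit in flags if mode & bit]
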
PDTCRYPT_SPLITNAME  = "pdtcrypt-object-%d.bin"
PDTCRYPT_RESCUENAME = "pdtcrypt-rescue-object-%0.5d.bin"

PDTCRYPT_VERBOSE   = False
PDTCRYPT_STRICTIVS = False
PDTCRYPT_OVERWRITE = False
PDTCRYPT_BLOCKSIZE = 1 << 12
PDTCRYPT_SINK      = 0
PDTCRYPT_SOURCE    = 1
SELF               = None

PDTCRYPT_DEFAULT_VER  = 1
PDTCRYPT_DEFAULT_PVER = 1

# scrypt hashing output control
PDTCRYPT_SCRYPT_INTRANATOR = 0
PDTCRYPT_SCRYPT_PARAMETERS = 1
PDTCRYPT_SCRYPT_DEFAULT    = PDTCRYPT_SCRYPT_INTRANATOR

PDTCRYPT_SCRYPT_FORMAT = \
    { "i2n"    : PDTCRYPT_SCRYPT_INTRANATOR
    , "params" : PDTCRYPT_SCRYPT_PARAMETERS }

PDTCRYPT_TT_COLUMNS = 80 # assume standard terminal

class PDTDecryptionError (Exception):
    """Decryption failed."""

class PDTSplitError (Exception):
    """Splitting the archive into objects failed."""


def noise (*a, **b):
    print (file=sys.stderr, *a, **b)


class PassthroughDecryptor (object):
    """
    Stand-in for ``Decrypt`` when no secret was supplied: objects are copied
    to the output unchanged, headers included. The header of each object is
    emitted in front of its first chunk of data or, for an object without
    data, by ``.done()``.
    """

    curhdr = None # write current header on first data write

    def __init__ (self):
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: no encryption; data passthrough")

    def next (self, hdr):
        ok, curhdr = hdr_make (hdr)
        if ok is False:
            raise PDTDecryptionError ("bad header %r" % hdr)
        self.curhdr = curhdr

    def done (self):
        if self.curhdr is not None:
            return self.curhdr
        return b""

    def process (self, d):
        if self.curhdr is not None:
            d = self.curhdr + d
            self.curhdr = None
        return d


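# Sketch (hypothetical class, not part of Deltatar) of the deferred-header
# pattern behind PassthroughDecryptor: the header of an object is held back
# and emitted in front of the first data chunk, or by done () if the object
# carried no data, so the output reproduces the input byte for byte. This
# sketch also clears the pending header in done (), which keeps repeated
# calls harmless.
class _SketchDeferredHeader (object):
    def __init__ (self):
        self.pending = None

    def next (self, hdr):
        self.pending = hdr          # remember until something is written

    def process (self, data):
        if self.pending is not None:
            data, self.pending = self.pending + data, None
        return data

    def done (self):
        hdr, self.pending = self.pending, None
        return hdr if hdr is not None else b""
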
def depdtcrypt (mode, secret, ins, outs):
    """
    Remove PDTCRYPT layer from all objects encrypted with the secret. Used on a
    Deltatar backup this will yield a (possibly Gzip compressed) tarball.
    """
    ctleft     = -1   # length of ciphertext to consume
    ctcurrent  = 0    # total ciphertext of current object
    total_obj  = 0    # total number of objects read
    total_pt   = 0    # total plaintext bytes
    total_ct   = 0    # total ciphertext bytes
    total_read = 0    # total bytes read
    outfile    = None # Python file object for output

    if mode & PDTCRYPT_DECRYPT: # decryptor
        ks = secret [0]
        if ks == PDTCRYPT_SECRET_PW:
            decr = Decrypt (password=secret [1], strict_ivs=PDTCRYPT_STRICTIVS)
        elif ks == PDTCRYPT_SECRET_KEY:
            key = secret [1]
            decr = Decrypt (key=key, strict_ivs=PDTCRYPT_STRICTIVS)
        else:
            raise InternalError ("‘%d’ does not specify a valid kind of secret"
                                 % ks)
    else:
        decr = PassthroughDecryptor ()

    def nextout (_):
        """Dummy for non-split mode: output file does not vary."""
        return outs

    if mode & PDTCRYPT_SPLIT:
        def nextout (outfile):
            """
            We were passed an fd as outs for accessing the destination
            directory where extracted archive components are supposed
            to end up in.
            """

            if outfile is None:
                if PDTCRYPT_VERBOSE is True:
                    noise ("PDT: no output file to close at this point")
            else:
                if PDTCRYPT_VERBOSE is True:
                    noise ("PDT: release output file %r" % outfile)
                # cleanup happens automatically by the GC; the next
                # line will error out on account of an invalid fd
                #outfile.close ()

            assert total_obj > 0
            fname = PDTCRYPT_SPLITNAME % total_obj
            try:
                outfd = open2_dump_file (fname, outs, force=PDTCRYPT_OVERWRITE)
            except RuntimeError as exn:
                raise PDTSplitError (exn)
            return os.fdopen (outfd, "wb", closefd=True)


    def tell (s):
        """
        Return the stream position, or -1 for a non-seekable stream such as
        a pipe on stdio (ESPIPE). Any other error is propagated.
        """
        import errno # the errno module, not the undocumented os.errno
        try:
            return s.tell ()
        except OSError as exn:
            if exn.errno == errno.ESPIPE:
                return -1
            raise

    def out (pt, outfile):
        nonlocal total_pt
        npt = len (pt)
        total_pt += npt
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT:\t· decrypt plaintext %d B" % (npt))
        try:
            nn = outfile.write (pt)
        except OSError as exn: # probably ENOSPC
            raise DecryptionError ("error (%s)" % exn)
        if nn != npt:
            raise DecryptionError ("write aborted after %d of %d B" % (nn, npt))

    while True:
        if ctleft <= 0:
            # current object completed; in a valid archive this marks either
            # the start of a new header or the end of the input
            if ctleft == 0: # current object requires finalization
                if PDTCRYPT_VERBOSE is True:
                    noise ("PDT: %d finalize" % tell (ins))
                try:
                    pt = decr.done ()
                except InvalidGCMTag as exn:
                    raise DecryptionError ("error finalizing object %d (%d B): "
                                           "%r" % (total_obj, ctcurrent, exn)) \
                          from exn
                out (pt, outfile)
                if PDTCRYPT_VERBOSE is True:
                    noise ("PDT:\t· object validated")

            if PDTCRYPT_VERBOSE is True:
                noise ("PDT: %d hdr" % tell (ins))
            try:
                hdr = hdr_read_stream (ins)
                total_read += PDTCRYPT_HDR_SIZE
            except EndOfFile as exn:
                total_read += exn.remainder
                if total_ct + total_obj * PDTCRYPT_HDR_SIZE != total_read:
                    raise PDTDecryptionError ("ciphertext processed (%d B) plus "
                                              "overhead (%d × %d B) does not match "
                                              "the number of bytes read (%d)"
                                              % (total_ct, total_obj, PDTCRYPT_HDR_SIZE,
                                                 total_read))
                # the single good exit
                return total_read, total_obj, total_ct, total_pt
            except InvalidHeader as exn:
                raise PDTDecryptionError ("invalid header at position %d in %r "
                                          "(%s)" % (tell (ins), ins, exn))
            if PDTCRYPT_VERBOSE is True:
                pretty = hdr_fmt_pretty (hdr)
                noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
                               pretty.splitlines (), ""))
            ctcurrent = ctleft = hdr ["ctsize"]

            decr.next (hdr)

            total_obj += 1 # used in file counter with split mode

            # finalization complete or skipped in case of first object in
            # stream; create a new output file if necessary
            outfile = nextout (outfile)

            if PDTCRYPT_VERBOSE is True:
                noise ("PDT: %d decrypt obj no. %d, %d B"
                       % (tell (ins), total_obj, ctleft))

        # always allocate a new buffer since python-cryptography doesn’t allow
        # passing a bytearray :/
        nexpect = min (ctleft, PDTCRYPT_BLOCKSIZE)
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT:\t· [%d] %d%% done, read block (%d B of %d B remaining)"
                   % (tell (ins),
                      100 - ctleft * 100 / (ctcurrent > 0 and ctcurrent or 1),
                      nexpect, ctleft))
        ct = ins.read (nexpect)
        nct = len (ct)
        if nct < nexpect:
            off = tell (ins)
            raise EndOfFile (nct,
                             "hit EOF after %d of %d B in block [%d:%d); "
                             "%d B ciphertext remaining for object no. %d"
                             % (nct, nexpect, off, off + nexpect, ctleft,
                                total_obj))
        ctleft -= nct
        total_ct += nct
        total_read += nct

        if PDTCRYPT_VERBOSE is True:
            noise ("PDT:\t· decrypt ciphertext %d B" % (nct))
        pt = decr.process (ct)
        out (pt, outfile)


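# Sketch (hypothetical framing, not the PDTCRYPT header layout) of the
# byte accounting in depdtcrypt (): every object is a fixed-size header
# announcing the ciphertext length, followed by exactly that many bytes, so
# after a clean end of input the bytes read must equal the ciphertext total
# plus one header per object. An 8-byte little-endian length field stands
# in for the real header here; the function name is illustrative only.
def _sketch_read_objects (ins):
    import struct
    HDR = 8                                  # assumed header size
    total_obj = total_ct = total_read = 0
    objects = []
    while True:
        hdr = ins.read (HDR)
        total_read += len (hdr)
        if len (hdr) == 0:                   # clean end of input
            break
        if len (hdr) < HDR:
            raise EOFError ("truncated header after object %d" % total_obj)
        (ctsize,) = struct.unpack ("<Q", hdr)
        ct = ins.read (ctsize)
        total_read += len (ct)
        if len (ct) < ctsize:
            raise EOFError ("truncated object %d" % (total_obj + 1))
        total_obj += 1
        total_ct  += len (ct)
        objects.append (ct)
    # the invariant depdtcrypt () checks before its single good exit
    if total_ct + total_obj * HDR != total_read:
        raise RuntimeError ("byte accounting mismatch")
    return objects
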
def deptdcrypt_mk_stream (kind, path):
    """Create stream from file or stdio descriptor."""
    if kind == PDTCRYPT_SINK:
        if path == "-":
            if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: stdout")
            return sys.stdout.buffer
        else:
            if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: file %s" % path)
            return io.FileIO (path, "w")
    if kind == PDTCRYPT_SOURCE:
        if path == "-":
            if PDTCRYPT_VERBOSE is True: noise ("PDT: source: stdin")
            return sys.stdin.buffer
        else:
            if PDTCRYPT_VERBOSE is True: noise ("PDT: source: file %s" % path)
            return io.FileIO (path, "r")

    raise ValueError ("bogus stream “%s” / %s" % (kind, path))


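# Sketch (hypothetical, not part of Deltatar) of why the tell () helper in
# depdtcrypt () special-cases ESPIPE: a pipe, which is what
# sys.stdin.buffer is when data is piped in, has no file position, so
# tell () raises OSError with errno ESPIPE, while a seekable stream reports
# its offset. Any other error is re-raised.
def _sketch_position_or_minus_one (stream):
    import errno
    try:
        return stream.tell ()
    except OSError as exn:
        if exn.errno == errno.ESPIPE:
            return -1
        raise
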
def mode_depdtcrypt (mode, secret, ins, outs):
    try:
        total_read, total_obj, total_ct, total_pt = \
            depdtcrypt (mode, secret, ins, outs)
    except DecryptionError as exn:
        noise ("PDT: Decryption failed:")
        noise ("PDT:")
        noise ("PDT: “%s”" % exn)
        noise ("PDT:")
        noise ("PDT: Did you specify the correct key / password?")
        noise ("")
        return 1
    except PDTSplitError as exn:
        noise ("PDT: Split operation failed:")
        noise ("PDT:")
        noise ("PDT: “%s”" % exn)
        noise ("PDT:")
        noise ("PDT: Hint: target directory should be empty.")
        noise ("")
        return 1

    if PDTCRYPT_VERBOSE is True:
        noise ("PDT: decryption successful")
        noise ("PDT: %.10d bytes read" % total_read)
        noise ("PDT: %.10d objects decrypted" % total_obj)
        noise ("PDT: %.10d bytes ciphertext" % total_ct)
        noise ("PDT: %.10d bytes plaintext" % total_pt)
        noise ("")

    return 0


def mode_scrypt (pw, ins=None, nacl=None, fmt=PDTCRYPT_SCRYPT_INTRANATOR):
    hsh = None
    paramversion = PDTCRYPT_DEFAULT_PVER
    if ins is not None:
        hsh, nacl, version, paramversion = scrypt_hashsource (pw, ins)
        defs = ENCRYPTION_PARAMETERS.get (paramversion, None)
    else:
        nacl    = binascii.unhexlify (nacl)
        defs    = ENCRYPTION_PARAMETERS.get (paramversion, None)
        version = PDTCRYPT_DEFAULT_VER

    kdfname, params = defs ["kdf"]
    if hsh is None:
        kdf = kdf_by_version (None, defs)
        hsh, _void = kdf (pw, nacl)

    import json

    if fmt == PDTCRYPT_SCRYPT_INTRANATOR:
        out = json.dumps ({ "salt"          : base64.b64encode (nacl).decode ()
                          , "key"           : base64.b64encode (hsh) .decode ()
                          , "paramversion"  : paramversion })
    elif fmt == PDTCRYPT_SCRYPT_PARAMETERS:
        out = json.dumps ({ "salt"          : binascii.hexlify (nacl).decode ()
                          , "key"           : binascii.hexlify (hsh) .decode ()
                          , "version"       : version
                          , "scrypt_params" : { "N"     : params ["N"]
                                              , "r"     : params ["r"]
                                              , "p"     : params ["p"]
                                              , "dkLen" : params ["dkLen"] } })
    else:
        raise RuntimeError ("bad scrypt output scheme %r" % fmt)

    print (out)


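# Illustrative sketch (hypothetical helper, not part of PDTCRYPT): the
# “parameters” output of mode_scrypt carries the hex-encoded salt and key
# together with N, r, p and dkLen. This sketch assumes the KDF is plain
# scrypt over the raw password bytes with exactly those parameters. That is
# an assumption: the authoritative derivation is whatever kdf_by_version
# returns. Under it, the printed key can be cross-checked with the standard
# library:

def check_scrypt_parameters_sketch (pw, js):
    """Return True if the key in the JSON string *js* derives from *pw*."""
    import binascii
    import hashlib
    import hmac
    import json

    d   = json.loads (js)
    prm = d ["scrypt_params"]
    n, r, p = prm ["N"], prm ["r"], prm ["p"]
    key = hashlib.scrypt (pw, salt=binascii.unhexlify (d ["salt"]),
                          n=n, r=r, p=p, dklen=prm ["dkLen"],
                          # room for scrypt's working memory, which can
                          # exceed OpenSSL's 32 MiB default for large N
                          maxmem=128 * r * (n + p + 2) + 1024 * 1024)
    return hmac.compare_digest (key, binascii.unhexlify (d ["key"]))

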
def noise_output_candidates (cands, indent=8, cols=PDTCRYPT_TT_COLUMNS):
    """
    Print a list of offsets without garbling the terminal too much.

    The indent is counted from column zero; if it is at least five columns
    wide, the “PDT: ” marker is prepended and counts as part of the
    indentation.
    """
    wd   = cols - 1
    nc   = len (cands)
    idt  = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
    line = idt
    lpos = indent
    sep  = ","
    lsep = len (sep)
    init = True # prevent leading separator

    if indent >= wd:
        raise ValueError ("the requested indentation exceeds the line "
                          "width by %d" % (indent - wd))

    for n in cands:
        ns  = "%d" % n
        lns = len (ns)
        if init is False:
            line += sep
            lpos += lsep

        lpos += lns
        if lpos > wd: # line break
            noise (line)
            line = idt
            lpos = indent + lns
        elif init is True:
            init = False
        else: # space
            line += ' '
            lpos += 1

        line += ns

    if lpos != indent:
        noise (line)


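# Illustrative sketch (hypothetical helper): roughly the same layout as
# noise_output_candidates, built with textwrap instead of manual column
# tracking. Differences: it returns the lines instead of passing them to
# noise(), cols defaults to 80 rather than PDTCRYPT_TT_COLUMNS, and every
# line, indentation included, is capped at cols - 1 characters.

def wrap_candidates_sketch (cands, indent=8, cols=80):
    import textwrap
    idt  = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
    text = ", ".join ("%d" % n for n in cands)
    # textwrap breaks at the spaces after the commas, so a wrapped line
    # ends in “,” just like the hand-rolled version above
    return textwrap.wrap (text, width=cols - 1,
                          initial_indent=idt, subsequent_indent=idt)

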
SLICE_START = 1 # at equal offsets, ends (0) sort before starts (1), so
SLICE_END   = 0 # slices that merely touch are not reported as overlapping

def find_overlaps (slices):
    """
    Find overlapping slices: iterate open/close points of intervals, tracking
    the ones open at any time.
    """
    bounds = []
    inside = set () # of indices into slices
    ovrlp  = set () # of indices into slices

    for i, s in enumerate (slices):
        if s [0] == s [1]:
            # empty slice: its end would sort before its start and the
            # removal below would fail; it cannot overlap anything anyway
            continue
        bounds.append ((s [0], SLICE_START, i))
        bounds.append ((s [1], SLICE_END  , i))
    bounds = sorted (bounds)

    for val in bounds:
        i = val [2]
        if val [1] == SLICE_START:
            inside.add (i)
        else:
            if len (inside) > 1: # closing one that overlapped
                ovrlp |= inside
            inside.remove (i)

    return [ slices [i] for i in ovrlp ]


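# Illustrative sketch (hypothetical helper): a quadratic cross-check for
# find_overlaps, useful as a test oracle. Slices are treated as half-open
# intervals [start, end). Slices that merely touch, and empty slices, never
# count as overlapping. Results are returned in input order.

def find_overlaps_naive_sketch (slices):
    hits = set ()
    for i, (a0, a1) in enumerate (slices):
        for j, (b0, b1) in enumerate (slices):
            # two non-empty half-open intervals overlap iff each one
            # starts before the other one ends
            if i < j and a0 < a1 and b0 < b1 and a0 < b1 and b0 < a1:
                hits.update ((i, j))
    return [ slices [i] for i in sorted (hits) ]

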
def mode_scan (secret, fname, outs=None, nacl=None):
    """
    Dissect a binary file, looking for PDTCRYPT headers and objects.

    If *outs* is supplied, recoverable data will be dumped into the specified
    directory.
    """
    try:
        ifd = os.open (fname, os.O_RDONLY)
    except FileNotFoundError:
        noise ("PDT: failed to open %s readonly" % fname)
        noise ("")
        usage (err=True)

    try:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: scan for potential sync points")
        cands = locate_hdr_candidates (ifd)
        if len (cands) == 0:
            noise ("PDT: scan complete: input does not contain potential PDT "
                   "headers; giving up.")
            os.close (ifd)
            return -1
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: scan complete: found %d candidates:" % len (cands))
            noise_output_candidates (cands)
    except:
        os.close (ifd)
        raise

    junk, todo, slices = [], [], []
    try:
        nobj = 0
        for cand in cands:
            nobj += 1
            vdt, hdr = inspect_hdr (ifd, cand)

            vdts = verdict_fmt (vdt)

            if vdt == HDR_CAND_JUNK:
                noise ("PDT: obj %d: %s object: bad header, skipping"
                       % (nobj, vdts))
                junk.append (cand)
            else:
                off0 = cand + PDTCRYPT_HDR_SIZE
                if PDTCRYPT_VERBOSE is True:
                    noise ("PDT: obj %d: read payload @%d" % (nobj, off0))
                    pretty = hdr_fmt_pretty (hdr)
                    noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
                                   pretty.splitlines (), ""))

                ofd = -1
                if outs is not None:
                    ofname = PDTCRYPT_RESCUENAME % nobj
                    ofd = open2_dump_file (ofname, outs, force=PDTCRYPT_OVERWRITE)

                ctsize = hdr ["ctsize"]
                try:
                    l  = try_decrypt (ifd, off0, hdr, secret, ofd=ofd)
                    ok = l == ctsize
                    slices.append ((off0, off0 + l))
                finally:
                    if ofd != -1:
                        os.close (ofd)
                if vdt == HDR_CAND_GOOD and ok is True:
                    noise ("PDT: %d → ✓ %s object %d–%d"
                           % (cand, vdts, off0, off0 + ctsize))
                elif vdt == HDR_CAND_FISHY and ok is True:
                    noise ("PDT: %d → × %s object %d–%d, corrupt header"
                           % (cand, vdts, off0, off0 + ctsize))
                elif vdt == HDR_CAND_GOOD and ok is False:
                    noise ("PDT: %d → × %s object %d–%d, problematic payload"
                           % (cand, vdts, off0, off0 + ctsize))
                elif vdt == HDR_CAND_FISHY and ok is False:
                    noise ("PDT: %d → × %s object %d–%d, corrupt header, problematic "
                           "ciphertext" % (cand, vdts, off0, off0 + ctsize))
                else:
                    raise Unreachable
    finally:
        os.close (ifd)

    if len (junk) == 0:
        noise ("PDT: all headers ok")
    else:
        noise ("PDT: %d candidates not parseable as headers:" % len (junk))
        noise_output_candidates (junk)

    overlap = find_overlaps (slices)
    if len (overlap) > 0:
        noise ("PDT: %d objects overlapping others" % len (overlap))
        for slice in overlap:
            noise ("PDT:    × %d→%d" % (slice [0], slice [1]))


def usage (err=False):
    out = print
    if err is True:
        out = noise
    indent = ' ' * len (SELF)
    out ("usage: %s SUBCOMMAND { --help" % SELF)
    out ("       %s            | [ -v ] { -p PASSWORD | -k KEY }" % indent)
    out ("       %s              [ { -i | --in } { - | SOURCE } ]" % indent)
    out ("       %s              [ { -n | --nacl } { SALT } ]" % indent)
    out ("       %s              [ { -o | --out } { - | DESTINATION } ]" % indent)
    out ("       %s              [ -D | --no-decrypt ] [ -S | --split ]" % indent)
    out ("       %s              [ -f | --format ]" % indent)
    out ("")
    out ("\twhere")
    out ("\t\tSUBCOMMAND      main mode: { process | scrypt }")
    out ("\t\t                where:")
    out ("\t\t                   process: extract objects from PDT archive")
    out ("\t\t                   scrypt:  calculate hash from password and first object")
    out ("\t\t-p PASSWORD     password to derive the encryption key from")
    out ("\t\t-k KEY          encryption key as 16 bytes in hexadecimal notation")
    out ("\t\t-s              enforce strict handling of initialization vectors")
    out ("\t\t-i SOURCE       file name to read from")
    out ("\t\t-o DESTINATION  file to write output to")
    out ("\t\t-n SALT         provide salt for scrypt mode in hex encoding")
    out ("\t\t-v              print extra info")
    out ("\t\t-S              split into files at object boundaries; this")
    out ("\t\t                requires DESTINATION to refer to a directory")
    out ("\t\t-D              PDT header and ciphertext passthrough")
    out ("\t\t-f              scrypt: format of hash output (“default” or “parameters”);")
    out ("\t\t                process, scan: overwrite existing files")
    out ("")
    out ("\tinstead of filenames, “-” may be used to specify stdin / stdout")
    out ("")
    sys.exit ((err is True) and 42 or 0)


def bail (msg):
    noise (msg)
    noise ("")
    usage (err=True)
    raise Unreachable


def parse_argv (argv):
    global PDTCRYPT_OVERWRITE
    global PDTCRYPT_STRICTIVS
    global PDTCRYPT_VERBOSE
    global SELF
    mode          = PDTCRYPT_DECRYPT
    secret        = None
    insspec       = None
    outsspec      = None
    outs          = None
    nacl          = None
    scrypt_format = PDTCRYPT_SCRYPT_DEFAULT

    argvi = iter (argv)
    SELF  = os.path.basename (next (argvi))

    try:
        rawsubcmd  = next (argvi)
        subcommand = PDTCRYPT_SUB [rawsubcmd]
    except StopIteration:
        bail ("ERROR: subcommand required")
    except KeyError:
        bail ("ERROR: invalid subcommand “%s” specified" % rawsubcmd)

    def checked_arg ():
        nonlocal argvi
        try:
            return next (argvi)
        except StopIteration:
            bail ("ERROR: argument list incomplete")

    def checked_secret (s):
        nonlocal secret
        if secret is None:
            secret = s
        else:
            bail ("ERROR: encountered “%s” but secret already given" % arg)

    for arg in argvi:
        if arg in [ "-h", "--help" ]:
            usage ()
            raise Unreachable
        elif arg in [ "-v", "--verbose", "--wtf" ]:
            PDTCRYPT_VERBOSE = True
        elif arg in [ "-i", "--in", "--source" ]:
            insspec = checked_arg ()
            if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt from %s" % insspec)
        elif arg in [ "-p", "--password" ]:
            # keep “arg” pointing at the option so an error message cannot
            # echo the password itself
            pw = checked_arg ()
            checked_secret (make_secret (password=pw))
            if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with password")
        else:
            if subcommand == PDTCRYPT_SUB_PROCESS:
                if arg in [ "-s", "--strict-ivs" ]:
                    PDTCRYPT_STRICTIVS = True
                elif arg in [ "-o", "--out", "--dest", "--sink" ]:
                    outsspec = checked_arg ()
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
                elif arg in [ "-f", "--force" ]:
                    PDTCRYPT_OVERWRITE = True
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
                elif arg in [ "-S", "--split" ]:
                    mode |= PDTCRYPT_SPLIT
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: split files")
                elif arg in [ "-D", "--no-decrypt" ]:
                    mode &= ~PDTCRYPT_DECRYPT
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: not decrypting")
                elif arg in [ "-k", "--key" ]:
                    key = checked_arg ()
                    checked_secret (make_secret (key=key))
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with key")
                else:
                    bail ("ERROR: unexpected positional argument “%s”" % arg)
            elif subcommand == PDTCRYPT_SUB_SCRYPT:
                if arg in [ "-n", "--nacl", "--salt" ]:
                    nacl = checked_arg ()
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: salt key with %s" % nacl)
                elif arg in [ "-f", "--format" ]:
                    arg = checked_arg ()
                    try:
                        scrypt_format = PDTCRYPT_SCRYPT_FORMAT [arg]
                    except KeyError:
                        bail ("ERROR: invalid scrypt output format %s" % arg)
                    if PDTCRYPT_VERBOSE is True:
                        noise ("PDT: scrypt output format “%s”" % scrypt_format)
                else:
                    bail ("ERROR: unexpected positional argument “%s”" % arg)
            elif subcommand == PDTCRYPT_SUB_SCAN:
                if arg in [ "-o", "--out", "--dest", "--sink" ]:
                    outsspec = checked_arg ()
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
                elif arg in [ "-f", "--force" ]:
                    PDTCRYPT_OVERWRITE = True
                    if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
                else:
                    bail ("ERROR: unexpected positional argument “%s”" % arg)

    if secret is None:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: no password or key specified, trying $PDTCRYPT_PASSWORD")
        epw = os.getenv ("PDTCRYPT_PASSWORD")
        if epw is not None:
            checked_secret (make_secret (password=epw.strip ()))

    if secret is None:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: no password or key specified, trying $PDTCRYPT_KEY")
        ek = os.getenv ("PDTCRYPT_KEY")
        if ek is not None:
            checked_secret (make_secret (key=ek.strip ()))

    if secret is None:
        if subcommand == PDTCRYPT_SUB_SCRYPT:
            bail ("ERROR: scrypt hash mode requested but no password given")
        elif mode & PDTCRYPT_DECRYPT:
            bail ("ERROR: decryption requested but no password or key given")

    if mode & PDTCRYPT_SPLIT and outsspec is None:
        bail ("ERROR: split mode is incompatible with stdout sink "
              "(the default)")

    if subcommand == PDTCRYPT_SUB_SCAN and outsspec is None:
        pass # no output by default in scan mode
    elif mode & PDTCRYPT_SPLIT or subcommand == PDTCRYPT_SUB_SCAN:
        # destination must be directory
        if outsspec == "-":
            bail ("ERROR: mode is incompatible with stdout sink")
        try:
            try:
                os.makedirs (outsspec, 0o700)
            except FileExistsError:
                # if it’s a directory with appropriate perms, everything is
                # good; otherwise, below invocation of open(2) will fail
                pass
            outs = os.open (outsspec, os.O_DIRECTORY, 0o600)
        except FileNotFoundError as exn:
            bail ("ERROR: cannot create target directory “%s”" % outsspec)
        except NotADirectoryError as exn:
            bail ("ERROR: target path “%s” is not a directory" % outsspec)
    else:
        outs = deptdcrypt_mk_stream (PDTCRYPT_SINK, outsspec or "-")

    if subcommand == PDTCRYPT_SUB_SCAN:
        if insspec is None:
            bail ("ERROR: please supply an input file for scanning")
        if insspec == '-':
            bail ("ERROR: input must be seekable; please specify a file")
        return True, partial (mode_scan, secret, insspec, outs, nacl=nacl)

    if subcommand == PDTCRYPT_SUB_SCRYPT:
        if secret [0] == PDTCRYPT_SECRET_KEY:
            bail ("ERROR: scrypt mode requires a password")
        if     insspec is not None and nacl is not None \
            or insspec is None     and nacl is None     :
            bail ("ERROR: please supply either an input file or "
                  "the salt")

    # default to stdin
    ins = None
    if insspec is not None or subcommand != PDTCRYPT_SUB_SCRYPT:
        ins = deptdcrypt_mk_stream (PDTCRYPT_SOURCE, insspec or "-")

    if subcommand == PDTCRYPT_SUB_SCRYPT:
        return True, partial (mode_scrypt, secret [1].encode (), ins, nacl,
                              fmt=scrypt_format)

    return True, partial (mode_depdtcrypt, mode, secret, ins, outs)


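# Illustrative sketch (hypothetical helper): the order in which parse_argv
# looks for a secret. A password or key given on the command line wins.
# Otherwise $PDTCRYPT_PASSWORD is tried before $PDTCRYPT_KEY, each with
# surrounding whitespace stripped. The ("password", ...) / ("key", ...)
# tuples are an assumption made here for illustration; the real code wraps
# the value with make_secret().

def resolve_secret_sketch (cli_secret=None, environ=None):
    import os
    env = os.environ if environ is None else environ
    if cli_secret is not None:
        return cli_secret
    epw = env.get ("PDTCRYPT_PASSWORD")
    if epw is not None:
        return ("password", epw.strip ())
    ek = env.get ("PDTCRYPT_KEY")
    if ek is not None:
        return ("key", ek.strip ())
    return None

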
def main (argv):
    ok, runner = parse_argv (argv)

    if ok is True: return runner ()

    return 1


if __name__ == "__main__":
    sys.exit (main (sys.argv))
