allow for detecting overlapping objects with tarfile
[python-delta-tar] / deltatar / crypto.py
CommitLineData
00b3cd10
PG
1#!/usr/bin/env python3
2
3"""
83f2d71e 4Intra2net 2017
00b3cd10
PG
5
6===============================================================================
704ceaa5 7 crypto -- Encryption Layer for the Deltatar Backup
00b3cd10
PG
8===============================================================================
9
10Crypto stack:
11
12 - AES-GCM for the symmetric encryption;
13 - Scrypt as KDF.
14
15References:
16
17 - NIST Recommendation for Block Cipher Modes of Operation: Galois/Counter
18 Mode (GCM) and GMAC
19 http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf
20
21 - AES-GCM v1:
22 https://cryptome.org/2014/01/aes-gcm-v1.pdf
23
24 - Authentication weaknesses in GCM
25 http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf
26
83f2d71e
PG
27Trouble with python-cryptography packages: authentication tags can only be
28passed in advance: https://github.com/pyca/cryptography/pull/3421
29
6d08915c
PG
30Errors
31-------------------------------------------------------------------------------
32
33Errors fall into roughly three categories:
34
704ceaa5 35 - Cryptographical errors or invalid data.
6d08915c
PG
36
37 - ``InvalidGCMTag`` (decryption failed on account of an invalid GCM
38 tag),
39 - ``InvalidIVFixedPart`` (IV fixed part of object not found in list),
f6cd676f 40 - ``DuplicateIV`` (the IV of an encrypted object already occurred),
704ceaa5
PG
41 - ``DecryptionError`` (used in CLI decryption for presenting error
42 conditions to the user).
6d08915c
PG
43
44 - Incorrect usage of the library.
45
46 - ``InvalidParameter`` (non-conforming user supplied parameter),
47 - ``InvalidHeader`` (data passed for reading not parsable into header),
48 - ``FormatError`` (cannot handle header or parameter version),
49 - ``RuntimeError``.
50
51 - Bad internal state. If one of these is encountered it means that a state
52 was reached that shouldn’t occur during normal processing.
53
54 - ``InternalError``,
55 - ``Unreachable``.
56
57Also, ``EndOfFile`` is used as a sentinel to communicate that a stream supplied
58for reading is exhausted.
59
f6cd676f
PG
60Initialization Vectors
61-------------------------------------------------------------------------------
62
63Initialization vectors are checked reuse during the lifetime of a decryptor.
704ceaa5
PG
64The fixed counters for metadata files cannot be reused and attempts to do so
65will cause a DuplicateIV error. This means the length of objects encrypted with
66a metadata counter is capped at 63 GB.
67
68For ordinary, non-metadata payload, there is an optional mode with strict IV
69checking that causes a crypto context to fail if an IV encountered or created
70was already used for decrypting or encrypting, respectively, an earlier object.
71Note that this mode can trigger false positives when decrypting non-linearly,
72e. g. when traversing the same object multiple times. Since the crypto context
73has no notion of a position in a PDT encrypted archive, this condition must be
74sorted out downstream.
75
76Command Line Utility
77-------------------------------------------------------------------------------
78
79``crypto.py`` may be invoked as a script for decrypting, validating, and
80splitting PDT encrypted files. Consult the usage message for details.
81
82Usage examples:
83
84Decrypt from stdin using the password ‘foo’: ::
85
86 $ crypto.py process foo -i - -o - <some-file.tar.gz.pdtcrypt >some-file.tar.gz
87
88Output verbose information about the encrypted objects in the archive: ::
89
90 $ crypto.py process foo -v -i some-file.tar.gz.pdtcrypt -o /dev/null
91 PDT: decrypt from some-file.tar.gz.pdtcrypt
92 PDT: decrypt to /dev/null
93 PDT: source: file some-file.tar.gz.pdtcrypt
94 PDT: sink: file /dev/null
95 PDT: 0 hdr
96 PDT: · version = 1 : 0100
97 PDT: · paramversion = 1 : 0100
98 PDT: · nacl : d270 b031 00d1 87e2 c946 610d 7b7f 7e5f
99 PDT: · iv : 02ee 3dd7 a963 1eb1 0100 0000
100 PDT: · ctsize = 591 : 4f02 0000 0000 0000
101 PDT: · tag : 5b2d 6d8b 8f82 4842 12fd 0b10 b6e3 369b
102 PDT: 64 decrypt obj no. 1, 591 B
103 PDT: · [64] 0% done, read block (591 B of 591 B remaining)
104 PDT: · decrypt ciphertext 591 B
105 PDT: · decrypt plaintext 591 B
106 PDT: 655 finalize
107
108
109Also, the mode *scrypt* allows deriving encryption keys. To calculate the
110encryption key from the password ‘foo’ and the salt of the first object in a
111PDT encrypted file: ::
112
113 $ crypto.py scrypt foo -i some-file.pdtcrypt
4f6405d6 114 {"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", "key": "JH9EkMwaM4x9F5aim5gK/Q=="}
704ceaa5
PG
115
116The computed 16 byte key is given in hexadecimal notation in the value to
117``hash`` and can be fed into Python’s ``binascii.unhexlify()`` to obtain the
118corresponding binary representation.
119
120Note that in Scrypt hashing mode, no data integrity checks are being performed.
121If the wrong password is given, a wrong key will be derived. Whether the password
122was indeed correct can only be determined by decrypting. Note that since PDT
123archives essentially consist of a stream of independent objects, the salt and
124other parameters may change. Thus a key derived using above method from the
125first object doesn’t necessarily apply to any of the subsequent objects.
f6cd676f 126
00b3cd10
PG
127"""
128
7b3940e5 129import base64
00b3cd10 130import binascii
50710d86 131import bisect
00b3cd10
PG
132import ctypes
133import io
c46c8670 134from functools import reduce, partial
f41973a6 135import mmap
00b3cd10
PG
136import os
137import struct
a808459e 138import stat
00b3cd10
PG
139import sys
140import time
da82bc58 141import types
00b3cd10
PG
142try:
143 import enum34
144except ImportError as exn:
145 pass
146
6257d5b3 147if __name__ == "__main__": ## Work around the import mechanism lest Python’s
00b3cd10
PG
148 pwd = os.getcwd() ## preference for local imports causes a cyclical
149 ## import (crypto → pylibscrypt → […] → ./tarfile → crypto).
150 sys.path = [ p for p in sys.path if p.find ("deltatar") < 0 ]
151
152import pylibscrypt
153from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
154from cryptography.hazmat.backends import default_backend
15d3eefd 155import cryptography
00b3cd10
PG
156
157
a64085a8 158__all__ = [ "hdr_make", "hdr_read", "hdr_fmt", "hdr_fmt_pretty"
b360b772 159 , "scrypt_hashfile"
3031b7ae
PG
160 , "PDTCRYPT_HDR_SIZE", "AES_GCM_IV_CNT_DATA"
161 , "AES_GCM_IV_CNT_INFOFILE", "AES_GCM_IV_CNT_INDEX"
2d6fd8c8 162 ]
00b3cd10 163
a393d9cb
PG
164
165###############################################################################
15d3eefd
PG
166## exceptions
167###############################################################################
168
169class EndOfFile (Exception):
170 """Reached EOF."""
ae3d0f2a
PG
171 remainder = 0
172 msg = 0
8a8ac469 173 def __init__ (self, n=None, msg=None):
5d394c0d
PG
174 if n is not None:
175 self.remainder = n
176 self.msg = msg
15d3eefd 177
b0078f26 178
b12110dd
PG
179class InvalidParameter (Exception):
180 """Inputs not valid for PDT encryption."""
181 pass
182
b0078f26 183
15d3eefd
PG
184class InvalidHeader (Exception):
185 """Header not valid."""
186 pass
187
b0078f26
PG
188
189class InvalidGCMTag (Exception):
190 """
191 The GCM tag calculated during decryption differs from that in the object
192 header.
193 """
194 pass
195
196
26b42ad4 197class InvalidIVFixedPart (Exception):
89ec6e2f
PG
198 """
199 IV fixed part not in supplied list: either the backup is corrupt or the
200 current object does not belong to it.
201 """
26b42ad4
PG
202 pass
203
b0078f26 204
be124bca 205class IVFixedPartError (Exception):
89ec6e2f
PG
206 """
207 Error creating a unique IV fixed part: repeated calls to system RNG yielded
208 the same sequence of bytes as the last IV used.
209 """
be124bca
PG
210 pass
211
212
fac2cfe1 213class InvalidFileCounter (Exception):
89ec6e2f
PG
214 """
215 When encrypting, an attempted reuse of a dedicated counter (info file,
216 index file) was caught.
217 """
fac2cfe1
PG
218 pass
219
220
ee6aa239 221class DuplicateIV (Exception):
89ec6e2f
PG
222 """
223 During encryption, the current IV fixed part is identical to an already
224 existing IV (same prefix and file counter). This indicates tampering or
225 programmer error and cannot be recovered from.
226 """
ee6aa239
PG
227 pass
228
229
230class NonConsecutiveIV (Exception):
89ec6e2f
PG
231 """
232 IVs not numbered consecutively. This is a hard error with strict IV
233 checking. Precludes random access to the encrypted objects.
234 """
ee6aa239
PG
235 pass
236
237
b12110dd
PG
238class FormatError (Exception):
239 """Unusable parameters in header."""
240 pass
241
b0078f26 242
15d3eefd 243class DecryptionError (Exception):
89ec6e2f 244 """Error during decryption with ``crypto.py`` on the command line."""
15d3eefd
PG
245 pass
246
b0078f26 247
70ad9458 248class Unreachable (Exception):
89ec6e2f
PG
249 """
250 Makeshift __builtin_unreachable(); always a programmer error if
251 thrown.
252 """
70ad9458
PG
253 pass
254
b0078f26 255
b12110dd
PG
256class InternalError (Exception):
257 """Errors not ascribable to bad user inputs or cryptography."""
258 pass
259
15d3eefd
PG
260
261###############################################################################
a393d9cb
PG
262## crypto layer version
263###############################################################################
264
265ENCRYPTION_PARAMETERS = \
c46c8670 266 { 0: \
dd23cbc9
PG
267 { "kdf": ("dummy", 16)
268 , "enc": "passthrough" }
c46c8670 269 , 1: \
dd23cbc9
PG
270 { "kdf": ( "scrypt"
271 , { "dkLen" : 16
272 , "N" : 1 << 16
273 , "r" : 8
274 , "p" : 1
275 , "NaCl_LEN" : 16 })
276 , "enc": "aes-gcm" } }
a393d9cb 277
00b3cd10
PG
278###############################################################################
279## constants
280###############################################################################
281
dd47d6a2 282PDTCRYPT_HDR_MAGIC = b"PDTCRYPT"
00b3cd10 283
dd47d6a2
PG
284PDTCRYPT_HDR_SIZE_MAGIC = 8 # 8
285PDTCRYPT_HDR_SIZE_VERSION = 2 # 10
286PDTCRYPT_HDR_SIZE_PARAMVERSION = 2 # 12
287PDTCRYPT_HDR_SIZE_NACL = 16 # 28
288PDTCRYPT_HDR_SIZE_IV = 12 # 40
289PDTCRYPT_HDR_SIZE_CTSIZE = 8 # 48
290PDTCRYPT_HDR_SIZE_TAG = 16 # 64 GCM auth tag
00b3cd10 291
dd47d6a2
PG
292PDTCRYPT_HDR_SIZE = PDTCRYPT_HDR_SIZE_MAGIC + PDTCRYPT_HDR_SIZE_VERSION \
293 + PDTCRYPT_HDR_SIZE_PARAMVERSION + PDTCRYPT_HDR_SIZE_NACL \
294 + PDTCRYPT_HDR_SIZE_IV + PDTCRYPT_HDR_SIZE_CTSIZE \
295 + PDTCRYPT_HDR_SIZE_TAG # = 64
00b3cd10
PG
296
297# precalculate offsets since Python can’t do constant folding over names
dd47d6a2
PG
298HDR_OFF_VERSION = PDTCRYPT_HDR_SIZE_MAGIC
299HDR_OFF_PARAMVERSION = HDR_OFF_VERSION + PDTCRYPT_HDR_SIZE_VERSION
300HDR_OFF_NACL = HDR_OFF_PARAMVERSION + PDTCRYPT_HDR_SIZE_PARAMVERSION
301HDR_OFF_IV = HDR_OFF_NACL + PDTCRYPT_HDR_SIZE_NACL
302HDR_OFF_CTSIZE = HDR_OFF_IV + PDTCRYPT_HDR_SIZE_IV
303HDR_OFF_TAG = HDR_OFF_CTSIZE + PDTCRYPT_HDR_SIZE_CTSIZE
00b3cd10
PG
304
305FMT_UINT16_LE = "<H"
306FMT_UINT64_LE = "<Q"
50710d86 307FMT_I2N_IV = "<8sL" # 8 random bytes ‖ 32 bit counter
83f2d71e
PG
308FMT_I2N_HDR = ("<" # host byte order
309 "8s" # magic
310 "H" # version
311 "H" # paramversion
312 "16s" # sodium chloride
313 "12s" # iv
3b53fb98
PG
314 "Q" # size
315 "16s") # GCM tag
00b3cd10
PG
316
317# aes+gcm
addcec42
PG
318AES_KEY_SIZE = 16 # b"0123456789abcdef"
319AES_KEY_SIZE_B64 = 24 # b'MDEyMzQ1Njc4OWFiY2RlZg=='
cb7a3911
PG
320AES_GCM_MAX_SIZE = (1 << 36) - (1 << 5) # 2^39 - 2^8 b ≅ 64 GB
321PDTCRYPT_MAX_OBJ_SIZE_DEFAULT = 63 * (1 << 30) # 63 GB
322PDTCRYPT_MAX_OBJ_SIZE = PDTCRYPT_MAX_OBJ_SIZE_DEFAULT
00b3cd10 323
3031b7ae
PG
324# index and info files are written on-the fly while encrypting so their
325# counters must be available inadvance
cb7a3911
PG
326AES_GCM_IV_CNT_INFOFILE = 1 # constant
327AES_GCM_IV_CNT_INDEX = AES_GCM_IV_CNT_INFOFILE + 1
328AES_GCM_IV_CNT_DATA = AES_GCM_IV_CNT_INDEX + 1 # also for multivolume
329AES_GCM_IV_CNT_MAX_DEFAULT = 0xffFFffFF
330AES_GCM_IV_CNT_MAX = AES_GCM_IV_CNT_MAX_DEFAULT
2d6fd8c8 331
be124bca
PG
332# IV structure and generation
333PDTCRYPT_IV_GEN_MAX_RETRIES = 10 # ×
334PDTCRYPT_IV_FIXEDPART_SIZE = 8 # B
335PDTCRYPT_IV_COUNTER_SIZE = 4 # B
39accaaa 336
addcec42
PG
337# secret type: PW of string | KEY of char [16]
338PDTCRYPT_SECRET_PW = 0
339PDTCRYPT_SECRET_KEY = 1
340
00b3cd10 341###############################################################################
39accaaa 342## header, trailer
00b3cd10
PG
343###############################################################################
344#
345# Interface:
346#
347# struct hdrinfo
348# { version : u16
349# , paramversion : u16
350# , nacl : [u8; 16]
351# , iv : [u8; 12]
704ceaa5
PG
352# , ctsize : usize
353# , tag : [u8; 16] }
83f2d71e 354#
00b3cd10 355# fn hdr_read (f : handle) -> hdrinfo;
c2d1c3ec 356# fn hdr_make (f : handle, h : hdrinfo) -> IOResult<usize>;
00b3cd10
PG
357# fn hdr_fmt (h : hdrinfo) -> String;
358#
359
83f2d71e 360def hdr_read (data):
704ceaa5
PG
361 """
362 Read bytes as header structure.
363
364 If the input could not be interpreted as a header, fail with
365 ``InvalidHeader``.
366 """
83f2d71e 367
00b3cd10 368 try:
3b53fb98 369 mag, version, paramversion, nacl, iv, ctsize, tag = \
83f2d71e
PG
370 struct.unpack (FMT_I2N_HDR, data)
371 except Exception as exn:
15d3eefd
PG
372 raise InvalidHeader ("error unpacking header from [%r]: %s"
373 % (binascii.hexlify (data), str (exn)))
00b3cd10 374
dd47d6a2 375 if mag != PDTCRYPT_HDR_MAGIC:
15d3eefd 376 raise InvalidHeader ("bad magic in header: expected [%s], got [%s]"
dd47d6a2 377 % (PDTCRYPT_HDR_MAGIC, mag))
00b3cd10 378
15d3eefd 379 return \
00b3cd10
PG
380 { "version" : version
381 , "paramversion" : paramversion
382 , "nacl" : nacl
383 , "iv" : iv
384 , "ctsize" : ctsize
3b53fb98 385 , "tag" : tag
00b3cd10
PG
386 }
387
388
39accaaa 389def hdr_read_stream (instr):
704ceaa5
PG
390 """
391 Read header from stream at the current position.
392
393 Fail with ``InvalidHeader`` if insufficient bytes were read from the
394 stream, or if the content could not be interpreted as a header.
395 """
dd47d6a2 396 data = instr.read(PDTCRYPT_HDR_SIZE)
ae3d0f2a 397 ldata = len (data)
8a8ac469
PG
398 if ldata == 0:
399 raise EndOfFile
400 elif ldata != PDTCRYPT_HDR_SIZE:
401 raise InvalidHeader ("hdr_read_stream: expected %d B, received %d B"
402 % (PDTCRYPT_HDR_SIZE, ldata))
47e27926 403 return hdr_read (data)
39accaaa
PG
404
405
3b53fb98 406def hdr_from_params (version, paramversion, nacl, iv, ctsize, tag):
704ceaa5
PG
407 """
408 Assemble the necessary values into a PDTCRYPT header.
409
410 :type version: int to fit uint16_t
411 :type paramversion: int to fit uint16_t
412 :type nacl: bytes to fit uint8_t[16]
413 :type iv: bytes to fit uint8_t[12]
414 :type size: int to fit uint64_t
415 :type tag: bytes to fit uint8_t[16]
416 """
dd47d6a2 417 buf = bytearray (PDTCRYPT_HDR_SIZE)
83f2d71e 418 bufv = memoryview (buf)
00b3cd10 419
00b3cd10 420 try:
83f2d71e 421 struct.pack_into (FMT_I2N_HDR, bufv, 0,
dd47d6a2 422 PDTCRYPT_HDR_MAGIC,
3b53fb98 423 version, paramversion, nacl, iv, ctsize, tag)
83f2d71e 424 except Exception as exn:
a83fa4ed 425 return False, "error assembling header: %s" % str (exn)
00b3cd10 426
83f2d71e 427 return True, bytes (buf)
00b3cd10 428
00b3cd10 429
8a990744
PG
430def hdr_make_dummy (s):
431 """
432 Create a header sized block of bytes initialized to a value derived from a
433 string. Used to verify we’ve jumped back correctly to the actual position
434 of the object header.
435 """
436 c = reduce (lambda a, c: a + ord(c), s, 0) % 0xFF
dd47d6a2 437 return bytes (bytearray (struct.pack ("B", c)) * PDTCRYPT_HDR_SIZE)
8a990744
PG
438
439
a393d9cb 440def hdr_make (hdr):
704ceaa5
PG
441 """
442 Assemble a header from the given header structure.
443 """
a393d9cb
PG
444 return hdr_from_params (version=hdr.get("version"),
445 paramversion=hdr.get("paramversion"),
446 nacl=hdr.get("nacl"), iv=hdr.get("iv"),
3b53fb98 447 ctsize=hdr.get("ctsize"), tag=hdr.get("tag"))
a393d9cb
PG
448
449
83f2d71e 450HDR_FMT = "I2n_header { version: %d, paramversion: %d, nacl: %s[%d]," \
89131745 451 " iv: %s[%d], ctsize: %d, tag: %s[%d] }"
00b3cd10 452
83f2d71e 453def hdr_fmt (h):
704ceaa5 454 """Format a header structure into readable output."""
83f2d71e
PG
455 return HDR_FMT % (h["version"], h["paramversion"],
456 binascii.hexlify (h["nacl"]), len(h["nacl"]),
457 binascii.hexlify (h["iv"]), len(h["iv"]),
db1f3ac7
PG
458 h["ctsize"],
459 binascii.hexlify (h["tag"]), len(h["tag"]))
00b3cd10 460
00b3cd10 461
83f2d71e 462def hex_spaced_of_bytes (b):
704ceaa5 463 """Format bytes object, hexdump style."""
83f2d71e
PG
464 return " ".join ([ "%.2x%.2x" % (c1, c2)
465 for c1, c2 in zip (b[0::2], b[1::2]) ]) \
466 + (len (b) | 1 == len (b) and " %.2x" % b[-1] or "") # odd lengths
00b3cd10 467
591a722f 468
3031b7ae
PG
469def hdr_iv_counter (h):
470 """Extract the variable part of the IV of the given header."""
471 _fixed, cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
472 return cnt
473
474
475def hdr_iv_fixed (h):
476 """Extract the fixed part of the IV of the given header."""
477 fixed, _cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
478 return fixed
479
480
83f2d71e 481hdr_dump = hex_spaced_of_bytes
00b3cd10 482
00b3cd10 483
15d3eefd
PG
484HDR_FMT_PRETTY = \
485"""version = %-4d : %s
486paramversion = %-4d : %s
487nacl : %s
488iv : %s
489ctsize = %-20d : %s
490tag : %s
83f2d71e 491"""
00b3cd10 492
83f2d71e 493def hdr_fmt_pretty (h):
704ceaa5
PG
494 """
495 Format header structure into multi-line representation of its contents and
496 their raw representation. (Omit the implicit “PDTCRYPT” magic bytes that
497 precede every header.)
498 """
83f2d71e
PG
499 return HDR_FMT_PRETTY \
500 % (h["version"],
501 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["version"])),
502 h["paramversion"],
503 hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["paramversion"])),
504 hex_spaced_of_bytes (h["nacl"]),
505 hex_spaced_of_bytes (h["iv"]),
506 h["ctsize"],
15d3eefd
PG
507 hex_spaced_of_bytes (struct.pack (FMT_UINT64_LE, h["ctsize"])),
508 hex_spaced_of_bytes (h["tag"]))
00b3cd10 509
f6cd676f
PG
510IV_FMT = "((f %s) (c %d))"
511
512def iv_fmt (iv):
704ceaa5 513 """Format the two components of an IV in a readable fashion."""
f6cd676f
PG
514 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
515 return IV_FMT % (binascii.hexlify (fixed), cnt)
516
00b3cd10 517
00b3cd10 518###############################################################################
f41973a6
PG
519## restoration
520###############################################################################
521
522class Location (object):
523 n = 0
524 offset = 0
525
526def restore_loc_fmt (loc):
527 return "%d off:%d" \
528 % (loc.n, loc.offset)
529
530def locate_hdr_candidates (fd):
531 """
532 Walk over instances of the magic string in the payload, collecting their
533 positions. If the offset of the first found instance is not zero, the file
534 begins with leading garbage.
535
536 :return: The list of offsets in the file.
537 """
538 cands = []
539
540 mm = mmap.mmap(fd, 0, mmap.MAP_SHARED, mmap.PROT_READ)
541 pos = 0
542 while True:
543 pos = mm.find (PDTCRYPT_HDR_MAGIC, pos)
544 if pos == -1:
545 break
546 cands.append (pos)
547 pos += 1
548
549 return cands
550
551
6c8073ab
PG
552HDR_CAND_GOOD = 0 # header marks begin of valid object
553HDR_CAND_FISHY = 1 # inconclusive (tag mismatch, obj overlap etc.)
554HDR_CAND_JUNK = 2 # not a header / object unreadable
555
556
557def inspect_hdr (fd, off):
558 """
559 Attempt to parse a header in *fd* at position *off*.
560
561 Returns a verdict about the quality of that header plus the parsed header
562 when readable.
563 """
564
565 _ = os.lseek (fd, off, os.SEEK_SET)
566
567 if os.lseek (fd, 0, os.SEEK_CUR) != off:
568 if PDTCRYPT_VERBOSE is True:
569 noise ("PDT: %d → dismissed (lseek() past EOF)" % off)
570 return HDR_CAND_JUNK, None
571
572 raw = os.read (fd, PDTCRYPT_HDR_SIZE)
573 if len (raw) != PDTCRYPT_HDR_SIZE:
574 if PDTCRYPT_VERBOSE is True:
575 noise ("PDT: %d → dismissed (EOF inside header)" % off)
576 return HDR_CAND_JUNK, None
577
578 try:
579 hdr = hdr_read (raw)
580 except InvalidHeader as exn:
581 if PDTCRYPT_VERBOSE is True:
582 noise ("PDT: %d → dismissed (invalid: [%s])" % (off, str (exn)))
583 return HDR_CAND_JUNK, None
584
585 obj0 = off + PDTCRYPT_HDR_SIZE
586 objX = obj0 + hdr ["ctsize"]
587
588 eof = os.lseek (fd, 0, os.SEEK_END)
589 if eof < objX:
590 if PDTCRYPT_VERBOSE is True:
591 noise ("PDT: %d → EOF inside object (%d≤%d≤%d); adjusting size to "
592 "%d" % (off, obj0, eof, objX, (eof - obj0)))
593 # try reading up to the end
594 hdr ["ctsize"] = eof - obj0
595 return HDR_CAND_FISHY, hdr
596
597 return HDR_CAND_GOOD, hdr
598
599
a808459e 600def try_decrypt (ifd, off, hdr, secret, ofd=-1):
6c8073ab 601 """
a808459e
PG
602 Attempt to decrypt the object in the (seekable) descriptor *ifd* starting
603 at *off* using the metadata in *hdr* and *secret*. An output fd can be
604 specified with *ofd*; if it is *-1* – the default –, the decrypted payload
605 will be discarded.
70a33834
PG
606
607 Always creates a fresh decryptor, so validation steps across objects don’t
608 apply.
202104ed
PG
609
610 Errors during GCM tag validation are ignored.
6c8073ab 611 """
70a33834
PG
612 ctleft = hdr ["ctsize"]
613 pos = off
614
615 ks = secret [0]
616 if ks == PDTCRYPT_SECRET_PW:
617 decr = Decrypt (password=secret [1])
618 elif ks == PDTCRYPT_SECRET_KEY:
6257d5b3 619 key = secret [1]
70a33834
PG
620 decr = Decrypt (key=key)
621 else:
622 raise RuntimeError
623
70a33834
PG
624 decr.next (hdr)
625
626 try:
a808459e 627 os.lseek (ifd, pos, os.SEEK_SET)
70a33834
PG
628 while ctleft > 0:
629 cnksiz = min (ctleft, PDTCRYPT_BLOCKSIZE)
a808459e 630 cnk = os.read (ifd, cnksiz)
70a33834
PG
631 ctleft -= cnksiz
632 pos += cnksiz
a808459e
PG
633 pt = decr.process (cnk)
634 if ofd != -1:
635 os.write (ofd, pt)
202104ed
PG
636 try:
637 pt = decr.done ()
638 except InvalidGCMTag:
639 noise ("PDT: GCM tag mismatch for object %d–%d"
640 % (off, off + hdr ["ctsize"]))
a808459e
PG
641 if len (pt) > 0 and ofd != -1:
642 os.write (ofd, pt)
70a33834 643
70a33834
PG
644 except Exception as exn:
645 noise ("PDT: error decrypting object %d–%d@%d, %d B remaining [%s]"
646 % (off, off + hdr ["ctsize"], pos, ctleft, exn))
647 raise
6c8073ab 648
70a33834 649 return pos - off
6c8073ab
PG
650
651
6690f5e0
PG
652def readable_objects_offsets (ifd, secret, cands):
653 """
654 From a list of candidates, locate the ones that mark the start of actual
655 readable PDTCRYPT objects.
656 """
657 good = []
24afaf18
PG
658
659 for i, cand in enumerate (cands):
6690f5e0
PG
660 vdt, hdr = inspect_hdr (ifd, cand)
661 if vdt == HDR_CAND_JUNK:
662 pass # ignore unreadable ones
663 elif vdt in [HDR_CAND_GOOD, HDR_CAND_FISHY]:
24afaf18 664 ctsize = hdr ["ctsize"]
6690f5e0 665 off0 = cand + PDTCRYPT_HDR_SIZE
24afaf18 666 ok = try_decrypt (ifd, off0, hdr, secret) == ctsize
6690f5e0 667 if ok is True:
24afaf18
PG
668 good.append ((cand, off0 + ctsize))
669
670 overlap = find_overlaps (good)
671
672 return [ g [0] for g in good ]
6690f5e0
PG
673
674
675def reconstruct_offsets (fname, secret):
676 ifd = os.open (fname, os.O_RDONLY)
677
678 try:
679 cands = locate_hdr_candidates (ifd)
680 return readable_objects_offsets (ifd, secret, cands)
681 finally:
682 os.close (ifd)
683
684
f41973a6 685###############################################################################
addcec42
PG
686## helpers
687###############################################################################
688
689def make_secret (password=None, key=None):
690 """
691 Safely create a “secret” value that consists either of a key or a password.
692 Inputs are validated: the password is accepted as (UTF-8 encoded) bytes or
693 string; for the key only a bytes object of the proper size or a base64
694 encoded string thereof is accepted.
695
696 If both are provided, the key is preferred over the password; no checks are
697 performed whether the key is derived from the password.
698
699 :returns: secret value if inputs were acceptable | None otherwise.
700 """
701 if key is not None:
702 if isinstance (key, str) is True:
703 key = key.encode ("utf-8")
704 if isinstance (key, bytes) is True:
705 if len (key) == AES_KEY_SIZE:
706 return (PDTCRYPT_SECRET_KEY, key)
6257d5b3
PG
707 if len (key) == AES_KEY_SIZE * 2:
708 try:
709 key = binascii.unhexlify (key)
710 return (PDTCRYPT_SECRET_KEY, key)
711 except binascii.Error: # garbage in string
712 pass
addcec42
PG
713 if len (key) == AES_KEY_SIZE_B64:
714 try:
715 key = base64.b64decode (key)
716 # the base64 processor is very tolerant and allows for
6257d5b3 717 # arbitrary trailing and leading data thus the data obtained
addcec42
PG
718 # must be checked for the proper length
719 if len (key) == AES_KEY_SIZE:
720 return (PDTCRYPT_SECRET_KEY, key)
721 except binascii.Error: # “incorrect padding”
722 pass
723 elif password is not None:
724 if isinstance (password, str) is True:
725 return (PDTCRYPT_SECRET_PW, password)
726 elif isinstance (password, bytes) is True:
727 try:
728 password = password.decode ("utf-8")
729 return (PDTCRYPT_SECRET_PW, password)
730 except UnicodeDecodeError:
731 pass
732
733 return None
734
735
736###############################################################################
6178061e
PG
737## passthrough / null encryption
738###############################################################################
739
740class PassthroughCipher (object):
741
742 tag = struct.pack ("<QQ", 0, 0)
743
744 def __init__ (self) : pass
745
746 def update (self, b) : return b
747
50710d86 748 def finalize (self) : return b""
6178061e
PG
749
750 def finalize_with_tag (self, _) : return b""
751
752###############################################################################
a393d9cb 753## convenience wrapper
00b3cd10
PG
754###############################################################################
755
c46c8670
PG
756
757def kdf_dummy (klen, password, _nacl):
704ceaa5
PG
758 """
759 Fake KDF for testing purposes that is called when parameter version zero is
760 encountered.
761 """
c46c8670
PG
762 q, r = divmod (klen, len (password))
763 if isinstance (password, bytes) is False:
764 password = password.encode ()
765 return password * q + password [:r], b""
766
767
768SCRYPT_KEY_MEMO = { } # static because needed for both the info file and the archive
769
770
771def kdf_scrypt (params, password, nacl):
704ceaa5
PG
772 """
773 Wrapper for the Scrypt KDF, corresponds to parameter version one. The
774 computation result is memoized based on the inputs to facilitate spawning
775 multiple encryption contexts.
776 """
c46c8670
PG
777 N = params["N"]
778 r = params["r"]
779 p = params["p"]
780 dkLen = params["dkLen"]
781
782 if nacl is None:
783 nacl = os.urandom (params["NaCl_LEN"])
784
785 key_parms = (password, nacl, N, r, p, dkLen)
786 global SCRYPT_KEY_MEMO
787 if key_parms not in SCRYPT_KEY_MEMO:
788 SCRYPT_KEY_MEMO [key_parms] = \
789 pylibscrypt.scrypt (password, nacl, N, r, p, dkLen)
790 return SCRYPT_KEY_MEMO [key_parms], nacl
a64085a8
PG
791
792
da82bc58 793def kdf_by_version (paramversion=None, defs=None):
704ceaa5
PG
794 """
795 Pick the KDF handler corresponding to the parameter version or the
796 definition set.
797
798 :rtype: function (password : str, nacl : str) -> str
799 """
da82bc58
PG
800 if paramversion is not None:
801 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
a64085a8 802 if defs is None:
1ed44e7b
PG
803 raise InvalidParameter ("no encryption parameters for version %r"
804 % paramversion)
a64085a8 805 (kdf, params) = defs["kdf"]
c46c8670
PG
806 fn = None
807 if kdf == "scrypt" : fn = kdf_scrypt
808 if kdf == "dummy" : fn = kdf_dummy
809 if fn is None:
a64085a8 810 raise ValueError ("key derivation method %r unknown" % kdf)
c46c8670 811 return partial (fn, params)
a64085a8
PG
812
813
b360b772
PG
814###############################################################################
815## SCRYPT hashing
816###############################################################################
817
818def scrypt_hashsource (pw, ins):
819 """
820 Calculate the SCRYPT hash from the password and the information contained
821 in the first header found in ``ins``.
822
823 This does not validate whether the first object is encrypted correctly.
824 """
c1ecc2e2
PG
825 if isinstance (pw, str) is True:
826 pw = str.encode (pw)
827 elif isinstance (pw, bytes) is False:
828 raise InvalidParameter ("password must be a string, not %s"
829 % type (password))
830 if isinstance (ins, io.BufferedReader) is False and \
831 isinstance (ins, io.FileIO) is False:
832 raise InvalidParameter ("file to hash must be opened in “binary” mode")
b360b772
PG
833 hdr = None
834 try:
835 hdr = hdr_read_stream (ins)
836 except EndOfFile as exn:
837 noise ("PDT: malformed input: end of file reading first object header")
838 noise ("PDT:")
839 return 1
840
841 nacl = hdr ["nacl"]
842 pver = hdr ["paramversion"]
843 if PDTCRYPT_VERBOSE is True:
844 noise ("PDT: salt of first object : %s" % binascii.hexlify (nacl))
845 noise ("PDT: parameter version of archive : %d" % pver)
846
847 try:
848 defs = ENCRYPTION_PARAMETERS.get(pver, None)
849 kdfname, params = defs ["kdf"]
850 if kdfname != "scrypt":
851 noise ("PDT: input is not an SCRYPT archive")
852 noise ("")
853 return 1
854 kdf = kdf_by_version (None, defs)
855 except ValueError as exn:
856 noise ("PDT: object has unknown parameter version %d" % pver)
857
858 hsh, _void = kdf (pw, nacl)
859
c1ecc2e2 860 return hsh, nacl, hdr ["version"], pver
b360b772
PG
861
862
863def scrypt_hashfile (pw, fname):
704ceaa5
PG
864 """
865 Calculate the SCRYPT hash from the password and the information contained
866 in the first header found in the given file. The header is read only at
867 offset zero.
868 """
b360b772 869 with deptdcrypt_mk_stream (PDTCRYPT_SOURCE, fname or "-") as ins:
c1ecc2e2 870 hsh, _void, _void, _void = scrypt_hashsource (pw, ins)
b360b772
PG
871 return hsh
872
873
874###############################################################################
875## AES-GCM context
876###############################################################################
877
a393d9cb
PG
878class Crypto (object):
879 """
880 Encryption context to remain alive throughout an entire tarfile pass.
881 """
6178061e 882 enc = None
a393d9cb
PG
883 nacl = None
884 key = None
50710d86
PG
885 cnt = None # file counter (uint32_t != 0)
886 iv = None # current IV
30019abf
PG
887 fixed = None # accu for 64 bit fixed parts of IV
888 used_ivs = None # tracks IVs
889 strict_ivs = False # if True, panic on duplicate object IV
48db09ba
PG
890 password = None
891 paramversion = None
633b18a9
PG
892 stats = { "in" : 0
893 , "out" : 0
894 , "obj" : 0 }
fa47412e 895
fa47412e
PG
896 ctsize = -1
897 ptsize = -1
3031b7ae
PG
898 info_counter_used = False
899 index_counter_used = False
a393d9cb 900
a64085a8 901 def __init__ (self, *al, **akv):
30019abf 902 self.used_ivs = set ()
a64085a8 903 self.set_parameters (*al, **akv)
39accaaa
PG
904
905
704ceaa5 906 def next_fixed (self):
be124bca 907 # NOP for decryption
50710d86
PG
908 pass
909
910
911 def set_object_counter (self, cnt=None):
704ceaa5
PG
912 """
913 Safely set the internal counter of encrypted objects. Numerous
914 constraints apply:
915
916 The same counter may not be reused in combination with one IV fixed
917 part. This is validated elsewhere in the IV handling.
918
919 Counter zero is invalid. The first two counters are reserved for
920 metadata. The implementation does not allow for splitting metadata
921 files over multiple encrypted objects. (This would be possible by
922 assigning new fixed parts.) Thus in a Deltatar backup there is at most
923 one object with a counter value of one and two. On creation of a
924 context, the initial counter may be chosen. The globals
925 ``AES_GCM_IV_CNT_INFOFILE`` and ``AES_GCM_IV_CNT_INDEX`` can be used to
926 request one of the reserved values. If one of these values has been
927 used, any further attempt of setting the counter to that value will
928 be rejected with an ``InvalidFileCounter`` exception.
929
930 Out of bounds values (i. e. below one and more than the maximum of 2³²)
931 cause an ``InvalidParameter`` exception to be thrown.
932 """
50710d86
PG
933 if cnt is None:
934 self.cnt = AES_GCM_IV_CNT_DATA
935 return
936 if cnt == 0 or cnt > AES_GCM_IV_CNT_MAX + 1:
b12110dd
PG
937 raise InvalidParameter ("invalid counter value %d requested: "
938 "acceptable values are from 1 to %d"
939 % (cnt, AES_GCM_IV_CNT_MAX))
50710d86
PG
940 if cnt == AES_GCM_IV_CNT_INFOFILE:
941 if self.info_counter_used is True:
fac2cfe1
PG
942 raise InvalidFileCounter ("attempted to reuse info file "
943 "counter %d: must be unique" % cnt)
50710d86 944 self.info_counter_used = True
3031b7ae
PG
945 elif cnt == AES_GCM_IV_CNT_INDEX:
946 if self.index_counter_used is True:
fac2cfe1
PG
947 raise InvalidFileCounter ("attempted to reuse index file "
948 " counter %d: must be unique" % cnt)
3031b7ae 949 self.index_counter_used = True
50710d86
PG
950 if cnt <= AES_GCM_IV_CNT_MAX:
951 self.cnt = cnt
952 return
953 # cnt == AES_GCM_IV_CNT_MAX + 1 → wrap
954 self.cnt = AES_GCM_IV_CNT_DATA
704ceaa5 955 self.next_fixed ()
50710d86
PG
956
957
1f3fd7b0 958 def set_parameters (self, password=None, key=None, paramversion=None,
be124bca 959 nacl=None, counter=None, strict_ivs=False):
704ceaa5
PG
960 """
961 Configure the internal state of a crypto context. Not intended for
962 external use.
963 """
be124bca 964 self.next_fixed ()
50710d86 965 self.set_object_counter (counter)
30019abf
PG
966 self.strict_ivs = strict_ivs
967
a83fa4ed
PG
968 if paramversion is not None:
969 self.paramversion = paramversion
970
1f3fd7b0
PG
971 if key is not None:
972 self.key, self.nacl = key, nacl
973 return
974
a83fa4ed
PG
975 if password is not None:
976 if isinstance (password, bytes) is False:
977 password = str.encode (password)
978 self.password = password
979 if paramversion is None and nacl is None:
980 # postpone key setup until first header is available
981 return
982 kdf = kdf_by_version (paramversion)
983 if kdf is not None:
984 self.key, self.nacl = kdf (password, nacl)
fa47412e 985
39accaaa 986
39accaaa 987 def process (self, buf):
704ceaa5
PG
988 """
989 Encrypt / decrypt a buffer. Invokes the ``.update()`` method on the
990 wrapped encryptor or decryptor, respectively.
991
992 The Cryptography exception ``AlreadyFinalized`` is translated to an
993 ``InternalError`` at this point. It may occur in sound code when the GC
994 closes an encrypting stream after an error. Everywhere else it must be
995 treated as a bug.
996 """
cb7a3911
PG
997 if self.enc is None:
998 raise RuntimeError ("process: context not initialized")
999 self.stats ["in"] += len (buf)
fac2cfe1
PG
1000 try:
1001 out = self.enc.update (buf)
1002 except cryptography.exceptions.AlreadyFinalized as exn:
1003 raise InternalError (exn)
cb7a3911
PG
1004 self.stats ["out"] += len (out)
1005 return out
39accaaa
PG
1006
1007
30019abf 1008 def next (self, password, paramversion, nacl, iv):
704ceaa5
PG
1009 """
1010 Prepare for encrypting another object: Reset the data counters and
1011 change the configuration in case one of the variable parameters differs
1012 from the last object. Also check the IV for duplicates and error out
1013 if strict checking was requested.
1014 """
fa47412e
PG
1015 self.ctsize = 0
1016 self.ptsize = 0
1017 self.stats ["obj"] += 1
30019abf
PG
1018
1019 self.check_duplicate_iv (iv)
1020
6178061e
PG
1021 if ( self.paramversion != paramversion
1022 or self.password != password
1023 or self.nacl != nacl):
1f3fd7b0 1024 self.set_parameters (password=password, paramversion=paramversion,
30019abf
PG
1025 nacl=nacl, strict_ivs=self.strict_ivs)
1026
1027
1028 def check_duplicate_iv (self, iv):
704ceaa5
PG
1029 """
1030 Add an IV (the 12 byte representation as in the header) to the list. With
1031 strict checking enabled, this will throw a ``DuplicateIV``. Depending on
1032 the context, this may indicate a serious error (IV reuse).
1033 """
30019abf
PG
1034 if self.strict_ivs is True and iv in self.used_ivs:
1035 raise DuplicateIV ("iv %s was reused" % iv_fmt (iv))
1036 # vi has not been used before; add to collection
1037 self.used_ivs.add (iv)
fa47412e
PG
1038
1039
633b18a9 1040 def counters (self):
704ceaa5
PG
1041 """
1042 Access the data counters.
1043 """
633b18a9
PG
1044 return self.stats ["obj"], self.stats ["in"], self.stats ["out"]
1045
1046
8de91f4f
PG
1047 def drop (self):
1048 """
1049 Clear the current context regardless of its finalization state. The
1050 next operation must be ``.next()``.
1051 """
1052 self.enc = None
1053
1054
39accaaa
PG
1055class Encrypt (Crypto):
1056
48db09ba
PG
1057 lastinfo = None
1058 version = None
72a42219 1059 paramenc = None
50710d86 1060
1f3fd7b0 1061 def __init__ (self, version, paramversion, password=None, key=None, nacl=None,
30019abf 1062 counter=AES_GCM_IV_CNT_DATA, strict_ivs=True):
704ceaa5
PG
1063 """
1064 The ctor will throw immediately if one of the parameters does not conform
1065 to our expectations.
1066
1067 counter=AES_GCM_IV_CNT_DATA, strict_ivs=True):
1068 :type version: int to fit uint16_t
1069 :type paramversion: int to fit uint16_t
1070 :param password: mutually exclusive with ``key``
1071 :type password: bytes
1072 :param key: mutually exclusive with ``password``
1073 :type key: bytes
1074 :type nacl: bytes
1075 :type counter: initial object counter the values
1076 ``AES_GCM_IV_CNT_INFOFILE`` and
1077 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1078 and cannot be reused even with different fixed parts.
1079 :type strict_ivs: bool
1080 """
1f3fd7b0
PG
1081 if password is None and key is None \
1082 or password is not None and key is not None :
1083 raise InvalidParameter ("__init__: need either key or password")
1084
1085 if key is not None:
1086 if isinstance (key, bytes) is False:
1087 raise InvalidParameter ("__init__: key must be provided as "
1088 "bytes, not %s" % type (key))
1089 if nacl is None:
1090 raise InvalidParameter ("__init__: salt must be provided along "
1091 "with encryption key")
1092 else: # password, no key
1093 if isinstance (password, str) is False:
1094 raise InvalidParameter ("__init__: password must be a string, not %s"
1095 % type (password))
1096 if len (password) == 0:
1097 raise InvalidParameter ("__init__: supplied empty password but not "
1098 "permitted for PDT encrypted files")
36b9932a
PG
1099 # version
1100 if isinstance (version, int) is False:
1101 raise InvalidParameter ("__init__: version number must be an "
1102 "integer, not %s" % type (version))
1103 if version < 0:
1104 raise InvalidParameter ("__init__: version number must be a "
1105 "nonnegative integer, not %d" % version)
1106 # paramversion
1107 if isinstance (paramversion, int) is False:
1108 raise InvalidParameter ("__init__: crypto parameter version number "
1109 "must be an integer, not %s"
1110 % type (paramversion))
1111 if paramversion < 0:
1112 raise InvalidParameter ("__init__: crypto parameter version number "
1113 "must be a nonnegative integer, not %d"
1114 % paramversion)
1115 # salt
1116 if nacl is not None:
1117 if isinstance (nacl, bytes) is False:
1118 raise InvalidParameter ("__init__: salt given, but of type %s "
1119 "instead of bytes" % type (nacl))
1120 # salt length would depend on the actual encryption so it can’t be
1121 # validated at this point
b12110dd 1122 self.fixed = [ ]
48db09ba
PG
1123 self.version = version
1124 self.paramenc = ENCRYPTION_PARAMETERS.get (paramversion) ["enc"]
72a42219 1125
1f3fd7b0 1126 super().__init__ (password, key, paramversion, nacl, counter=counter,
30019abf 1127 strict_ivs=strict_ivs)
a393d9cb
PG
1128
1129
be124bca
PG
1130 def next_fixed (self, retries=PDTCRYPT_IV_GEN_MAX_RETRIES):
1131 """
1132 Generate the next IV fixed part by reading eight bytes from
1133 ``/dev/urandom``. The buffer so obtained is tested against the fixed
1134 parts used so far to prevent accidental reuse of IVs. After a
1135 configurable number of attempts to create a unique fixed part, it will
1136 refuse to continue with an ``IVFixedPartError``. This is unlikely to
1137 ever happen on a normal system but may detect an issue with the random
1138 generator.
1139
1140 The list of fixed parts that were used by the context at hand can be
1141 accessed through the ``.fixed`` list. Its last element is the fixed
1142 part currently in use.
1143 """
1144 i = 0
1145 while i < retries:
1146 fp = os.urandom (PDTCRYPT_IV_FIXEDPART_SIZE)
1147 if fp not in self.fixed:
1148 self.fixed.append (fp)
1149 return
1150 i += 1
1151 raise IVFixedPartError ("error obtaining a unique IV fixed part from "
1152 "/dev/urandom; giving up after %d tries" % i)
1153
1154
a393d9cb 1155 def iv_make (self):
704ceaa5
PG
1156 """
1157 Construct a 12-bytes IV from the current fixed part and the object
1158 counter.
1159 """
b12110dd 1160 return struct.pack(FMT_I2N_IV, self.fixed [-1], self.cnt)
a393d9cb
PG
1161
1162
cb7a3911 1163 def next (self, filename=None, counter=None):
704ceaa5
PG
1164 """
1165 Prepare for encrypting the next incoming object. Update the counter
1166 and put together the IV, possibly changing prefixes. Then create the
1167 new encryptor.
1168
1169 The argument ``counter`` can be used to specify a file counter for this
1170 object. Unless it is one of the reserved values, the counter of
1171 subsequent objects will be computed from this one.
1172
1173 If this is the first object in a series, ``filename`` is required,
1174 otherwise it is reused if not present. The value is used to derive a
1175 header sized placeholder to use until after encryption when all the
1176 inputs to construct the final header are available. This is then
1177 matched in ``.done()`` against the value found at the position of the
1178 header. The motivation for this extra check is primarily to assist
1179 format debugging: It makes stray headers easy to spot in malformed
1180 PDTCRYPT files.
1181 """
cb7a3911
PG
1182 if filename is None:
1183 if self.lastinfo is None:
1184 raise InvalidParameter ("next: filename is mandatory for "
1185 "first object")
1186 filename, _dummy = self.lastinfo
1187 else:
1188 if isinstance (filename, str) is False:
1189 raise InvalidParameter ("next: filename must be a string, no %s"
1190 % type (filename))
3031b7ae
PG
1191 if counter is not None:
1192 if isinstance (counter, int) is False:
1193 raise InvalidParameter ("next: the supplied counter is of "
1194 "invalid type %s; please pass an "
1195 "integer instead" % type (counter))
1196 self.set_object_counter (counter)
fac2cfe1 1197
50710d86 1198 self.iv = self.iv_make ()
72a42219 1199 if self.paramenc == "aes-gcm":
6178061e
PG
1200 self.enc = Cipher \
1201 ( algorithms.AES (self.key)
1202 , modes.GCM (self.iv)
1203 , backend = default_backend ()) \
1204 .encryptor ()
72a42219 1205 elif self.paramenc == "passthrough":
6178061e
PG
1206 self.enc = PassthroughCipher ()
1207 else:
b12110dd
PG
1208 raise InvalidParameter ("next: parameter version %d not known"
1209 % self.paramversion)
48db09ba
PG
1210 hdrdum = hdr_make_dummy (filename)
1211 self.lastinfo = (filename, hdrdum)
30019abf 1212 super().next (self.password, self.paramversion, self.nacl, self.iv)
72a42219 1213
3031b7ae 1214 self.set_object_counter (self.cnt + 1)
48db09ba 1215 return hdrdum
a393d9cb 1216
a393d9cb 1217
cd77dadb 1218 def done (self, cmpdata):
704ceaa5
PG
1219 """
1220 Complete encryption of an object. After this has been called, attempts
1221 of encrypting further data will cause an error until ``.next()`` is
1222 invoked properly.
1223
1224 Returns a 64 bytes buffer containing the object header including all
1225 values including the “late” ones e. g. the ciphertext size and the
1226 GCM tag.
1227 """
36b9932a
PG
1228 if isinstance (cmpdata, bytes) is False:
1229 raise InvalidParameter ("done: comparison input expected as bytes, "
1230 "not %s" % type (cmpdata))
cb7a3911
PG
1231 if self.lastinfo is None:
1232 raise RuntimeError ("done: encryption context not initialized")
48db09ba
PG
1233 filename, hdrdum = self.lastinfo
1234 if cmpdata != hdrdum:
b12110dd
PG
1235 raise RuntimeError ("done: bad sync of header for object %d: "
1236 "preliminary data does not match; this likely "
1237 "indicates a wrongly repositioned stream"
1238 % self.cnt)
6178061e 1239 data = self.enc.finalize ()
633b18a9 1240 self.stats ["out"] += len (data)
cd77dadb 1241 self.ctsize += len (data)
48db09ba
PG
1242 ok, hdr = hdr_from_params (self.version, self.paramversion, self.nacl,
1243 self.iv, self.ctsize, self.enc.tag)
8a990744 1244 if ok is False:
b12110dd
PG
1245 raise InternalError ("error constructing header: %r" % hdr)
1246 return data, hdr, self.fixed
a393d9cb 1247
a393d9cb 1248
cd77dadb 1249 def process (self, buf):
704ceaa5
PG
1250 """
1251 Encrypt a chunk of plaintext with the active encryptor. Returns the
1252 size of the input consumed. This **must** be checked downstream. If the
1253 maximum possible object size has been reached, the current context must
1254 be finalized and a new one established before any further data can be
1255 encrypted. The second argument is the remainder of the plaintext that
1256 was not encrypted for the caller to use immediately after the new
1257 context is ready.
1258 """
36b9932a
PG
1259 if isinstance (buf, bytes) is False:
1260 raise InvalidParameter ("process: expected byte buffer, not %s"
1261 % type (buf))
cb7a3911
PG
1262 bsize = len (buf)
1263 newptsize = self.ptsize + bsize
1264 diff = newptsize - PDTCRYPT_MAX_OBJ_SIZE
1265 if diff > 0:
1266 bsize -= diff
1267 newptsize = PDTCRYPT_MAX_OBJ_SIZE
1268 self.ptsize = newptsize
1269 data = super().process (buf [:bsize])
cd77dadb 1270 self.ctsize += len (data)
cb7a3911 1271 return bsize, data
cd77dadb
PG
1272
1273
39accaaa 1274class Decrypt (Crypto):
a393d9cb 1275
3031b7ae 1276 tag = None # GCM tag, part of header
3031b7ae 1277 last_iv = None # check consecutive ivs in strict mode
39accaaa 1278
1f3fd7b0 1279 def __init__ (self, password=None, key=None, counter=None, fixedparts=None,
ee6aa239 1280 strict_ivs=False):
704ceaa5
PG
1281 """
1282 Sanitizing ctor for the decryption context. ``fixedparts`` specifies a
1283 list of IV fixed parts accepted during decryption. If a fixed part is
1284 encountered that is not in the list, decryption will fail.
1285
1286 :param password: mutually exclusive with ``key``
1287 :type password: bytes
1288 :param key: mutually exclusive with ``password``
1289 :type key: bytes
1290 :type counter: initial object counter the values
1291 ``AES_GCM_IV_CNT_INFOFILE`` and
1292 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1293 and cannot be reused even with different fixed parts.
1294 :type fixedparts: bytes list
1295 """
1f3fd7b0
PG
1296 if password is None and key is None \
1297 or password is not None and key is not None :
1298 raise InvalidParameter ("__init__: need either key or password")
1299
1300 if key is not None:
1301 if isinstance (key, bytes) is False:
1302 raise InvalidParameter ("__init__: key must be provided as "
1303 "bytes, not %s" % type (key))
1304 else: # password, no key
1305 if isinstance (password, str) is False:
1306 raise InvalidParameter ("__init__: password must be a string, not %s"
1307 % type (password))
1308 if len (password) == 0:
1309 raise InvalidParameter ("__init__: supplied empty password but not "
1310 "permitted for PDT encrypted files")
36b9932a 1311 # fixed parts
50710d86 1312 if fixedparts is not None:
36b9932a
PG
1313 if isinstance (fixedparts, list) is False:
1314 raise InvalidParameter ("__init__: IV fixed parts must be "
1315 "supplied as list, not %s"
1316 % type (fixedparts))
b12110dd
PG
1317 self.fixed = fixedparts
1318 self.fixed.sort ()
ee6aa239 1319
a83fa4ed
PG
1320 super().__init__ (password=password, key=key, counter=counter,
1321 strict_ivs=strict_ivs)
39accaaa
PG
1322
1323
b12110dd 1324 def valid_fixed_part (self, iv):
704ceaa5
PG
1325 """
1326 Check if a fixed part was already seen.
1327 """
50710d86 1328 # check if fixed part is known
b12110dd
PG
1329 fixed, _cnt = struct.unpack (FMT_I2N_IV, iv)
1330 i = bisect.bisect_left (self.fixed, fixed)
1331 return i != len (self.fixed) and self.fixed [i] == fixed
50710d86
PG
1332
1333
ee6aa239 1334 def check_consecutive_iv (self, iv):
704ceaa5
PG
1335 """
1336 Check whether the counter part of the given IV is indeed the successor
1337 of the currently present counter. This should always be the case for
1338 the objects in a well formed PDT archive but should not be enforced
1339 when decrypting out-of-order.
1340 """
ee6aa239 1341 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
3031b7ae
PG
1342 if self.strict_ivs is True \
1343 and self.last_iv is not None \
ee6aa239
PG
1344 and self.last_iv [0] == fixed \
1345 and self.last_iv [1] != cnt - 1:
f6cd676f 1346 raise NonConsecutiveIV ("iv %s counter not successor of "
ee6aa239 1347 "last object (expected %d, found %d)"
f6cd676f 1348 % (iv_fmt (self.last_iv [1]), cnt))
ee6aa239
PG
1349 self.last_iv = (iv, cnt)
1350
1351
79782fa9 1352 def next (self, hdr):
704ceaa5
PG
1353 """
1354 Start decrypting the next object. The PDTCRYPT header for the object
1355 can be given either as already parsed object or as bytes.
1356 """
dccfe104
PG
1357 if isinstance (hdr, bytes) is True:
1358 hdr = hdr_read (hdr)
36b9932a
PG
1359 elif isinstance (hdr, dict) is False:
1360 # this won’t catch malformed specs though
1361 raise InvalidParameter ("next: wrong type of parameter hdr: "
1362 "expected bytes or spec, got %s"
fbfda3d4 1363 % type (hdr))
36b9932a
PG
1364 try:
1365 paramversion = hdr ["paramversion"]
1366 nacl = hdr ["nacl"]
1367 iv = hdr ["iv"]
1368 tag = hdr ["tag"]
1369 except KeyError:
1370 raise InvalidHeader ("next: not a header %r" % hdr)
1371
30019abf 1372 super().next (self.password, paramversion, nacl, iv)
b12110dd 1373 if self.fixed is not None and self.valid_fixed_part (iv) is False:
f6cd676f
PG
1374 raise InvalidIVFixedPart ("iv %s has invalid fixed part"
1375 % iv_fmt (iv))
3031b7ae 1376 self.check_consecutive_iv (iv)
ee6aa239 1377
36b9932a 1378 self.tag = tag
b12110dd
PG
1379 defs = ENCRYPTION_PARAMETERS.get (paramversion, None)
1380 if defs is None:
1381 raise FormatError ("header contains unknown parameter version %d; "
1382 "maybe the file was created by a more recent "
1383 "version of Deltatar" % paramversion)
50710d86 1384 enc = defs ["enc"]
6178061e
PG
1385 if enc == "aes-gcm":
1386 self.enc = Cipher \
1387 ( algorithms.AES (self.key)
36b9932a 1388 , modes.GCM (iv, tag=self.tag)
6178061e
PG
1389 , backend = default_backend ()) \
1390 . decryptor ()
1391 elif enc == "passthrough":
1392 self.enc = PassthroughCipher ()
1393 else:
b12110dd
PG
1394 raise InternalError ("encryption parameter set %d refers to unknown "
1395 "mode %r" % (paramversion, enc))
f484f2d1 1396 self.set_object_counter (self.cnt + 1)
39accaaa
PG
1397
1398
db1f3ac7 1399 def done (self, tag=None):
704ceaa5
PG
1400 """
1401 Stop decryption of the current object and finalize it with the active
1402 context. This will throw an *InvalidGCMTag* exception to indicate that
1403 the authentication tag does not match the data. If the tag is correct,
1404 the rest of the plaintext is returned.
1405 """
633b18a9 1406 data = b""
db1f3ac7
PG
1407 try:
1408 if tag is None:
f484f2d1 1409 data = self.enc.finalize ()
db1f3ac7 1410 else:
36b9932a
PG
1411 if isinstance (tag, bytes) is False:
1412 raise InvalidParameter ("done: wrong type of parameter "
1413 "tag: expected bytes, got %s"
1414 % type (tag))
f484f2d1 1415 data = self.enc.finalize_with_tag (self.tag)
b0078f26 1416 except cryptography.exceptions.InvalidTag:
f08c604b 1417 raise InvalidGCMTag ("done: tag mismatch of object %d: %s "
b0078f26 1418 "rejected by finalize ()"
f08c604b 1419 % (self.cnt, binascii.hexlify (self.tag)))
50710d86 1420 self.ctsize += len (data)
633b18a9 1421 self.stats ["out"] += len (data)
b0078f26 1422 return data
00b3cd10
PG
1423
1424
47e27926 1425 def process (self, buf):
704ceaa5
PG
1426 """
1427 Decrypt the bytes object *buf* with the active decryptor.
1428 """
36b9932a
PG
1429 if isinstance (buf, bytes) is False:
1430 raise InvalidParameter ("process: expected byte buffer, not %s"
1431 % type (buf))
47e27926
PG
1432 self.ctsize += len (buf)
1433 data = super().process (buf)
1434 self.ptsize += len (data)
1435 return data
1436
1437
00b3cd10 1438###############################################################################
770173c5
PG
1439## testing helpers
1440###############################################################################
1441
cb7a3911 1442def _patch_global (glob, vow, n=None):
770173c5
PG
1443 """
1444 Adapt upper file counter bound for testing IV logic. Completely unsafe.
1445 """
1446 assert vow == "I am fully aware that this will void my warranty."
cb7a3911
PG
1447 r = globals () [glob]
1448 if n is None:
1449 n = globals () [glob + "_DEFAULT"]
1450 globals () [glob] = n
770173c5
PG
1451 return r
1452
cb7a3911
PG
1453_testing_set_AES_GCM_IV_CNT_MAX = \
1454 partial (_patch_global, "AES_GCM_IV_CNT_MAX")
1455
1456_testing_set_PDTCRYPT_MAX_OBJ_SIZE = \
1457 partial (_patch_global, "PDTCRYPT_MAX_OBJ_SIZE")
1458
a808459e
PG
1459def open2_dump_file (fname, dir_fd, force=False):
1460 outfd = -1
1461
1462 oflags = os.O_CREAT | os.O_WRONLY
6690f5e0 1463 if force is True:
a808459e
PG
1464 oflags |= os.O_TRUNC
1465 else:
1466 oflags |= os.O_EXCL
1467
1468 try:
1469 outfd = os.open (fname, oflags,
1470 stat.S_IRUSR | stat.S_IWUSR, dir_fd=dir_fd)
1471 except FileExistsError as exn:
1472 noise ("PDT: refusing to overwrite existing file %s" % fname)
1473 noise ("")
1474 raise RuntimeError ("destination file %s already exists" % fname)
1475 if PDTCRYPT_VERBOSE is True:
1476 noise ("PDT: new output file %s (fd=%d)" % (fname, outfd))
1477
1478 return outfd
1479
770173c5 1480###############################################################################
00b3cd10
PG
1481## freestanding invocation
1482###############################################################################
1483
da82bc58
PG
1484PDTCRYPT_SUB_PROCESS = 0
1485PDTCRYPT_SUB_SCRYPT = 1
f41973a6 1486PDTCRYPT_SUB_SCAN = 2
da82bc58
PG
1487
1488PDTCRYPT_SUB = \
1489 { "process" : PDTCRYPT_SUB_PROCESS
f41973a6
PG
1490 , "scrypt" : PDTCRYPT_SUB_SCRYPT
1491 , "scan" : PDTCRYPT_SUB_SCAN }
da82bc58 1492
e3abcdf0
PG
1493PDTCRYPT_DECRYPT = 1 << 0 # decrypt archive with password
1494PDTCRYPT_SPLIT = 1 << 1 # split archive into individual objects
da82bc58 1495PDTCRYPT_HASH = 1 << 2 # output scrypt hash for file and given password
e3abcdf0 1496
a808459e
PG
1497PDTCRYPT_SPLITNAME = "pdtcrypt-object-%d.bin"
1498PDTCRYPT_RESCUENAME = "pdtcrypt-rescue-object-%0.5d.bin"
e3abcdf0 1499
70ad9458 1500PDTCRYPT_VERBOSE = False
ee6aa239 1501PDTCRYPT_STRICTIVS = False
b07633d3 1502PDTCRYPT_OVERWRITE = False
15d3eefd 1503PDTCRYPT_BLOCKSIZE = 1 << 12
70ad9458
PG
1504PDTCRYPT_SINK = 0
1505PDTCRYPT_SOURCE = 1
1506SELF = None
1507
77058bab
PG
1508PDTCRYPT_DEFAULT_VER = 1
1509PDTCRYPT_DEFAULT_PVER = 1
1510
7b3940e5
PG
1511# scrypt hashing output control
1512PDTCRYPT_SCRYPT_INTRANATOR = 0
1513PDTCRYPT_SCRYPT_PARAMETERS = 1
4f6405d6 1514PDTCRYPT_SCRYPT_DEFAULT = PDTCRYPT_SCRYPT_INTRANATOR
7b3940e5
PG
1515
1516PDTCRYPT_SCRYPT_FORMAT = \
1517 { "i2n" : PDTCRYPT_SCRYPT_INTRANATOR
1518 , "params" : PDTCRYPT_SCRYPT_PARAMETERS }
1519
4c62ddc0 1520PDTCRYPT_TT_COLUMNS = 80 # assume standard terminal
15d3eefd
PG
1521
1522class PDTDecryptionError (Exception):
1523 """Decryption failed."""
1524
e3abcdf0
PG
1525class PDTSplitError (Exception):
1526 """Decryption failed."""
1527
15d3eefd
PG
1528
1529def noise (*a, **b):
591a722f 1530 print (file=sys.stderr, *a, **b)
15d3eefd
PG
1531
1532
89e1073c
PG
1533class PassthroughDecryptor (object):
1534
1535 curhdr = None # write current header on first data write
1536
1537 def __init__ (self):
1538 if PDTCRYPT_VERBOSE is True:
1539 noise ("PDT: no encryption; data passthrough")
1540
1541 def next (self, hdr):
1542 ok, curhdr = hdr_make (hdr)
1543 if ok is False:
1544 raise PDTDecryptionError ("bad header %r" % hdr)
1545 self.curhdr = curhdr
1546
1547 def done (self):
1548 if self.curhdr is not None:
1549 return self.curhdr
1550 return b""
1551
1552 def process (self, d):
1553 if self.curhdr is not None:
1554 d = self.curhdr + d
1555 self.curhdr = None
1556 return d
1557
1558
a83fa4ed 1559def depdtcrypt (mode, secret, ins, outs):
15d3eefd 1560 """
a83fa4ed
PG
1561 Remove PDTCRYPT layer from all objects encrypted with the secret. Used on a
1562 Deltatar backup this will yield a (possibly Gzip compressed) tarball.
15d3eefd
PG
1563 """
1564 ctleft = -1 # length of ciphertext to consume
1565 ctcurrent = 0 # total ciphertext of current object
15d3eefd
PG
1566 total_obj = 0 # total number of objects read
1567 total_pt = 0 # total plaintext bytes
1568 total_ct = 0 # total ciphertext bytes
1569 total_read = 0 # total bytes read
e3abcdf0
PG
1570 outfile = None # Python file object for output
1571
89e1073c 1572 if mode & PDTCRYPT_DECRYPT: # decryptor
a83fa4ed
PG
1573 ks = secret [0]
1574 if ks == PDTCRYPT_SECRET_PW:
1575 decr = Decrypt (password=secret [1], strict_ivs=PDTCRYPT_STRICTIVS)
1576 elif ks == PDTCRYPT_SECRET_KEY:
6257d5b3 1577 key = secret [1]
a83fa4ed
PG
1578 decr = Decrypt (key=key, strict_ivs=PDTCRYPT_STRICTIVS)
1579 else:
1580 raise InternalError ("‘%d’ does not specify a valid kind of secret"
1581 % ks)
89e1073c
PG
1582 else:
1583 decr = PassthroughDecryptor ()
1584
e3abcdf0
PG
1585 def nextout (_):
1586 """Dummy for non-split mode: output file does not vary."""
1587 return outs
1588
1589 if mode & PDTCRYPT_SPLIT:
1590 def nextout (outfile):
1591 """
1592 We were passed an fd as outs for accessing the destination
1593 directory where extracted archive components are supposed
1594 to end up in.
1595 """
1596
1597 if outfile is None:
1598 if PDTCRYPT_VERBOSE is True:
1599 noise ("PDT: no output file to close at this point")
77058bab
PG
1600 else:
1601 if PDTCRYPT_VERBOSE is True:
1602 noise ("PDT: release output file %r" % outfile)
e3abcdf0
PG
1603 # cleanup happens automatically by the GC; the next
1604 # line will error out on account of an invalid fd
1605 #outfile.close ()
1606
1607 assert total_obj > 0
1608 fname = PDTCRYPT_SPLITNAME % total_obj
1609 try:
a808459e
PG
1610 outfd = open2_dump_file (fname, outs, force=PDTCRYPT_OVERWRITE)
1611 except RuntimeError as exn:
1612 raise PDTSplitError (exn)
e3abcdf0
PG
1613 return os.fdopen (outfd, "wb", closefd=True)
1614
15d3eefd 1615
47d22679 1616 def tell (s):
b09a99eb 1617 """ESPIPE is normal on non-seekable stdio stream."""
47d22679
PG
1618 try:
1619 return s.tell ()
1620 except OSError as exn:
b09a99eb 1621 if exn.errno == os.errno.ESPIPE:
47d22679
PG
1622 return -1
1623
e3abcdf0 1624 def out (pt, outfile):
15d3eefd
PG
1625 npt = len (pt)
1626 nonlocal total_pt
1627 total_pt += npt
70ad9458 1628 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1629 noise ("PDT:\t· decrypt plaintext %d B" % (npt))
1630 try:
e3abcdf0 1631 nn = outfile.write (pt)
15d3eefd
PG
1632 except OSError as exn: # probably ENOSPC
1633 raise DecryptionError ("error (%s)" % exn)
1634 if nn != npt:
1635 raise DecryptionError ("write aborted after %d of %d B" % (nn, npt))
1636
1637 while True:
1638 if ctleft <= 0:
1639 # current object completed; in a valid archive this marks either
1640 # the start of a new header or the end of the input
1641 if ctleft == 0: # current object requires finalization
70ad9458 1642 if PDTCRYPT_VERBOSE is True:
47d22679 1643 noise ("PDT: %d finalize" % tell (ins))
5d394c0d
PG
1644 try:
1645 pt = decr.done ()
1646 except InvalidGCMTag as exn:
f08c604b
PG
1647 raise DecryptionError ("error finalizing object %d (%d B): "
1648 "%r" % (total_obj, len (pt), exn)) \
1649 from exn
e3abcdf0 1650 out (pt, outfile)
70ad9458 1651 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1652 noise ("PDT:\t· object validated")
1653
70ad9458 1654 if PDTCRYPT_VERBOSE is True:
47d22679 1655 noise ("PDT: %d hdr" % tell (ins))
15d3eefd
PG
1656 try:
1657 hdr = hdr_read_stream (ins)
dd47d6a2 1658 total_read += PDTCRYPT_HDR_SIZE
ae3d0f2a
PG
1659 except EndOfFile as exn:
1660 total_read += exn.remainder
dd47d6a2 1661 if total_ct + total_obj * PDTCRYPT_HDR_SIZE != total_read:
15d3eefd
PG
1662 raise PDTDecryptionError ("ciphertext processed (%d B) plus "
1663 "overhead (%d × %d B) does not match "
1664 "the number of bytes read (%d )"
dd47d6a2 1665 % (total_ct, total_obj, PDTCRYPT_HDR_SIZE,
15d3eefd
PG
1666 total_read))
1667 # the single good exit
1668 return total_read, total_obj, total_ct, total_pt
1669 except InvalidHeader as exn:
1670 raise PDTDecryptionError ("invalid header at position %d in %r "
ee6aa239 1671 "(%s)" % (tell (ins), exn, ins))
70ad9458 1672 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1673 pretty = hdr_fmt_pretty (hdr)
1674 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1675 pretty.splitlines (), ""))
1676 ctcurrent = ctleft = hdr ["ctsize"]
89e1073c 1677
15d3eefd 1678 decr.next (hdr)
e3abcdf0
PG
1679
1680 total_obj += 1 # used in file counter with split mode
1681
1682 # finalization complete or skipped in case of first object in
1683 # stream; create a new output file if necessary
1684 outfile = nextout (outfile)
15d3eefd 1685
70ad9458 1686 if PDTCRYPT_VERBOSE is True:
15d3eefd 1687 noise ("PDT: %d decrypt obj no. %d, %d B"
47d22679 1688 % (tell (ins), total_obj, ctleft))
15d3eefd
PG
1689
1690 # always allocate a new buffer since python-cryptography doesn’t allow
1691 # passing a bytearray :/
1692 nexpect = min (ctleft, PDTCRYPT_BLOCKSIZE)
70ad9458 1693 if PDTCRYPT_VERBOSE is True:
15d3eefd 1694 noise ("PDT:\t· [%d] %d%% done, read block (%d B of %d B remaining)"
47d22679 1695 % (tell (ins),
15d3eefd
PG
1696 100 - ctleft * 100 / (ctcurrent > 0 and ctcurrent or 1),
1697 nexpect, ctleft))
1698 ct = ins.read (nexpect)
1699 nct = len (ct)
1700 if nct < nexpect:
47d22679 1701 off = tell (ins)
ae3d0f2a
PG
1702 raise EndOfFile (nct,
1703 "hit EOF after %d of %d B in block [%d:%d); "
15d3eefd
PG
1704 "%d B ciphertext remaining for object no %d"
1705 % (nct, nexpect, off, off + nexpect, ctleft,
1706 total_obj))
1707 ctleft -= nct
1708 total_ct += nct
1709 total_read += nct
1710
70ad9458 1711 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1712 noise ("PDT:\t· decrypt ciphertext %d B" % (nct))
1713 pt = decr.process (ct)
e3abcdf0 1714 out (pt, outfile)
15d3eefd 1715
d6c15a52 1716
70ad9458 1717def deptdcrypt_mk_stream (kind, path):
d6c15a52 1718 """Create stream from file or stdio descriptor."""
70ad9458 1719 if kind == PDTCRYPT_SINK:
d6c15a52 1720 if path == "-":
70ad9458 1721 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: stdout")
d6c15a52
PG
1722 return sys.stdout.buffer
1723 else:
70ad9458 1724 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: file %s" % path)
d6c15a52 1725 return io.FileIO (path, "w")
70ad9458 1726 if kind == PDTCRYPT_SOURCE:
d6c15a52 1727 if path == "-":
70ad9458 1728 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: stdin")
d6c15a52
PG
1729 return sys.stdin.buffer
1730 else:
70ad9458 1731 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: file %s" % path)
d6c15a52
PG
1732 return io.FileIO (path, "r")
1733
1734 raise ValueError ("bogus stream “%s” / %s" % (kind, path))
1735
15d3eefd 1736
a83fa4ed 1737def mode_depdtcrypt (mode, secret, ins, outs):
da82bc58
PG
1738 try:
1739 total_read, total_obj, total_ct, total_pt = \
a83fa4ed 1740 depdtcrypt (mode, secret, ins, outs)
da82bc58
PG
1741 except DecryptionError as exn:
1742 noise ("PDT: Decryption failed:")
1743 noise ("PDT:")
1744 noise ("PDT: “%s”" % exn)
1745 noise ("PDT:")
a83fa4ed 1746 noise ("PDT: Did you specify the correct key / password?")
da82bc58
PG
1747 noise ("")
1748 return 1
1749 except PDTSplitError as exn:
1750 noise ("PDT: Split operation failed:")
1751 noise ("PDT:")
1752 noise ("PDT: “%s”" % exn)
1753 noise ("PDT:")
a83fa4ed 1754 noise ("PDT: Hint: target directory should be empty.")
da82bc58
PG
1755 noise ("")
1756 return 1
1757
1758 if PDTCRYPT_VERBOSE is True:
1759 noise ("PDT: decryption successful" )
1760 noise ("PDT: %.10d bytes read" % total_read)
1761 noise ("PDT: %.10d objects decrypted" % total_obj )
1762 noise ("PDT: %.10d bytes ciphertext" % total_ct )
1763 noise ("PDT: %.10d bytes plaintext" % total_pt )
1764 noise ("" )
1765
1766 return 0
1767
1768
7b3940e5 1769def mode_scrypt (pw, ins=None, nacl=None, fmt=PDTCRYPT_SCRYPT_INTRANATOR):
77058bab 1770 hsh = None
7b3940e5 1771 paramversion = PDTCRYPT_DEFAULT_PVER
77058bab
PG
1772 if ins is not None:
1773 hsh, nacl, version, paramversion = scrypt_hashsource (pw, ins)
1774 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1775 else:
1776 nacl = binascii.unhexlify (nacl)
7b3940e5 1777 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
77058bab
PG
1778 version = PDTCRYPT_DEFAULT_VER
1779
1780 kdfname, params = defs ["kdf"]
1781 if hsh is None:
1782 kdf = kdf_by_version (None, defs)
1783 hsh, _void = kdf (pw, nacl)
da82bc58
PG
1784
1785 import json
7b3940e5
PG
1786
1787 if fmt == PDTCRYPT_SCRYPT_INTRANATOR:
1788 out = json.dumps ({ "salt" : base64.b64encode (nacl).decode ()
1789 , "key" : base64.b64encode (hsh) .decode ()
1790 , "paramversion" : paramversion })
1791 elif fmt == PDTCRYPT_SCRYPT_PARAMETERS:
1792 out = json.dumps ({ "salt" : binascii.hexlify (nacl).decode ()
1793 , "key" : binascii.hexlify (hsh) .decode ()
1794 , "version" : version
1795 , "scrypt_params" : { "N" : params ["N"]
1796 , "r" : params ["r"]
1797 , "p" : params ["p"]
1798 , "dkLen" : params ["dkLen"] } })
1799 else:
1800 raise RuntimeError ("bad scrypt output scheme %r" % fmt)
1801
da82bc58
PG
1802 print (out)
1803
1804
4c62ddc0
PG
1805def noise_output_candidates (cands, indent=8, cols=PDTCRYPT_TT_COLUMNS):
1806 """
1807 Print a list of offsets without garbling the terminal too much.
1808
1809 The indent is counted from column zero; if it is wide enough, the “PDT: ”
1810 marker will be prepended, considered part of the indentation.
1811 """
1812 wd = cols - 1
1813 nc = len (cands)
1814 idt = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
1815 line = idt
1816 lpos = indent
1817 sep = ","
1818 lsep = len (sep)
1819 init = True # prevent leading separator
1820
1821 if indent >= wd:
1822 raise ValueError ("the requested indentation exceeds the line "
1823 "width by %d" % (indent - wd))
1824
1825 for n in cands:
1826 ns = "%d" % n
1827 lns = len (ns)
1828 if init is False:
1829 line += sep
1830 lpos += lsep
1831
1832 lpos += lns
1833 if lpos > wd: # line break
1834 noise (line)
1835 line = idt
1836 lpos = indent + lns
1837 elif init is True:
1838 init = False
1839 else: # space
1840 line += ' '
1841 lpos += 1
1842
1843 line += ns
1844
1845 if lpos != indent:
1846 noise (line)
1847
1848
15047fe4
PG
1849SLICE_START = 1 # ordering is important to have starts of intervals
1850SLICE_END = 0 # sorted before equal ends
1851
1852def find_overlaps (slices):
1853 """
1854 Find overlapping slices: iterate open/close points of intervals, tracking
1855 the ones open at any time.
1856 """
1857 bounds = []
1858 inside = set () # of indices into bounds
1859 ovrlp = set () # of indices into bounds
1860
1861 for i, s in enumerate (slices):
1862 bounds.append ((s [0], SLICE_START, i))
1863 bounds.append ((s [1], SLICE_END , i))
1864 bounds = sorted (bounds)
1865
1866 for val in bounds:
1867 i = val [2]
1868 if val [1] == SLICE_START:
1869 inside.add (i)
1870 else:
1871 if len (inside) > 1: # closing one that overlapped
1872 ovrlp |= inside
1873 inside.remove (i)
1874
1875 return [ slices [i] for i in ovrlp ]
1876
1877
a808459e 1878def mode_scan (secret, fname, outs=None, nacl=None):
f41973a6
PG
1879 """
1880 Dissect a binary file, looking for PDTCRYPT headers and objects.
a808459e
PG
1881
1882 If *outs* is supplied, recoverable data will be dumped into the specified
1883 directory.
f41973a6
PG
1884 """
1885 try:
a808459e 1886 ifd = os.open (fname, os.O_RDONLY)
f41973a6
PG
1887 except FileNotFoundError:
1888 noise ("PDT: failed to open %s readonly" % fname)
1889 noise ("")
1890 usage (err=True)
1891
1892 try:
1893 if PDTCRYPT_VERBOSE is True:
1894 noise ("PDT: scan for potential sync points")
a808459e 1895 cands = locate_hdr_candidates (ifd)
f41973a6
PG
1896 if len (cands) == 0:
1897 noise ("PDT: scan complete: input does not contain potential PDT "
1898 "headers; giving up.")
1899 return -1
1900 if PDTCRYPT_VERBOSE is True:
4c62ddc0
PG
1901 noise ("PDT: scan complete: found %d candidates:" % len (cands))
1902 noise_output_candidates (cands)
6c8073ab 1903 except:
a808459e 1904 os.close (ifd)
6c8073ab 1905 raise
f41973a6 1906
15047fe4 1907 junk, todo, slices = [], [], []
6c8073ab 1908 try:
a808459e 1909 nobj = 0
6c8073ab 1910 for cand in cands:
a808459e
PG
1911 nobj += 1
1912 vdt, hdr = inspect_hdr (ifd, cand)
15047fe4 1913
6c8073ab
PG
1914 if vdt == HDR_CAND_JUNK:
1915 junk.append (cand)
1916 else:
1917 off0 = cand + PDTCRYPT_HDR_SIZE
1918 if PDTCRYPT_VERBOSE is True:
a808459e 1919 noise ("PDT: obj %d: read payload @%d" % (nobj, off0))
70a33834
PG
1920 pretty = hdr_fmt_pretty (hdr)
1921 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1922 pretty.splitlines (), ""))
6c8073ab 1923
a808459e
PG
1924 ofd = -1
1925 if outs is not None:
1926 ofname = PDTCRYPT_RESCUENAME % nobj
1927 ofd = open2_dump_file (ofname, outs, force=PDTCRYPT_OVERWRITE)
1928
15047fe4 1929 ctsize = hdr ["ctsize"]
a808459e 1930 try:
15047fe4
PG
1931 l = try_decrypt (ifd, off0, hdr, secret, ofd=ofd)
1932 ok = l == ctsize
1933 slices.append ((off0, off0 + l))
a808459e
PG
1934 finally:
1935 if ofd != -1:
1936 os.close (ofd)
70a33834 1937 if vdt == HDR_CAND_GOOD and ok is True:
6c8073ab 1938 noise ("PDT: %d → ✓ valid object %d–%d"
15047fe4 1939 % (cand, off0, off0 + ctsize))
70a33834 1940 elif vdt == HDR_CAND_FISHY and ok is True:
6c8073ab 1941 noise ("PDT: %d → × object %d–%d, corrupt header"
15047fe4 1942 % (cand, off0, off0 + ctsize))
70a33834 1943 elif vdt == HDR_CAND_GOOD and ok is False:
6c8073ab 1944 noise ("PDT: %d → × object %d–%d, problematic payload"
15047fe4 1945 % (cand, off0, off0 + ctsize))
70a33834 1946 elif vdt == HDR_CAND_FISHY and ok is False:
6c8073ab 1947 noise ("PDT: %d → × object %d–%d, corrupt header, problematic "
15047fe4 1948 "ciphertext" % (cand, off0, off0 + ctsize))
6c8073ab
PG
1949 else:
1950 raise Unreachable
1951 finally:
a808459e 1952 os.close (ifd)
7b3940e5 1953
70a33834
PG
1954 if len (junk) == 0:
1955 noise ("PDT: all headers ok")
1956 else:
1957 noise ("PDT: %d candidates not parseable as headers:" % len (junk))
1958 noise_output_candidates (junk)
1959
15047fe4
PG
1960 overlap = find_overlaps (slices)
1961 if len (overlap) > 0:
1962 noise ("PDT: %d objects overlapping others" % len (overlap))
1963 for slice in overlap:
1964 noise ("PDT: × %d→%d" % (slice [0], slice [1]))
1965
70ad9458
PG
1966def usage (err=False):
1967 out = print
1968 if err is True:
1969 out = noise
5afcb45d 1970 indent = ' ' * len (SELF)
da82bc58 1971 out ("usage: %s SUBCOMMAND { --help" % SELF)
5afcb45d 1972 out (" %s | [ -v ] { -p PASSWORD | -k KEY }" % indent)
77058bab
PG
1973 out (" %s [ { -i | --in } { - | SOURCE } ]" % indent)
1974 out (" %s [ { -n | --nacl } { SALT } ]" % indent)
1975 out (" %s [ { -o | --out } { - | DESTINATION } ]" % indent)
1976 out (" %s [ -D | --no-decrypt ] [ -S | --split ]" % indent)
7b3940e5 1977 out (" %s [ -f | --format ]" % indent)
70ad9458
PG
1978 out ("")
1979 out ("\twhere")
da82bc58
PG
1980 out ("\t\tSUBCOMMAND main mode: { process | scrypt }")
1981 out ("\t\t where:")
1982 out ("\t\t process: extract objects from PDT archive")
1983 out ("\t\t scrypt: calculate hash from password and first object")
a83fa4ed
PG
1984 out ("\t\t-p PASSWORD password to derive the encryption key from")
1985 out ("\t\t-k KEY encryption key as 16 bytes in hexadecimal notation")
e3abcdf0 1986 out ("\t\t-s enforce strict handling of initialization vectors")
70ad9458
PG
1987 out ("\t\t-i SOURCE file name to read from")
1988 out ("\t\t-o DESTINATION file to write output to")
77058bab 1989 out ("\t\t-n SALT provide salt for scrypt mode in hex encoding")
70ad9458 1990 out ("\t\t-v print extra info")
e3abcdf0
PG
1991 out ("\t\t-S split into files at object boundaries; this")
1992 out ("\t\t requires DESTINATION to refer to directory")
1993 out ("\t\t-D PDT header and ciphertext passthrough")
7b3940e5 1994 out ("\t\t-f format of SCRYPT hash output (“default” or “parameters”)")
70ad9458
PG
1995 out ("")
1996 out ("\tinstead of filenames, “-” may used to specify stdin / stdout")
1997 out ("")
1998 sys.exit ((err is True) and 42 or 0)
1999
2000
a83fa4ed
PG
2001def bail (msg):
2002 noise (msg)
2003 noise ("")
2004 usage (err=True)
2005 raise Unreachable
2006
2007
70ad9458 2008def parse_argv (argv):
6690f5e0 2009 global PDTCRYPT_OVERWRITE
70ad9458 2010 global SELF
7b3940e5
PG
2011 mode = PDTCRYPT_DECRYPT
2012 secret = None
2013 insspec = None
2014 outsspec = None
a808459e 2015 outs = None
7b3940e5 2016 nacl = None
4f6405d6 2017 scrypt_format = PDTCRYPT_SCRYPT_DEFAULT
70ad9458
PG
2018
2019 argvi = iter (argv)
2020 SELF = os.path.basename (next (argvi))
2021
da82bc58
PG
2022 try:
2023 rawsubcmd = next (argvi)
2024 subcommand = PDTCRYPT_SUB [rawsubcmd]
2025 except StopIteration:
a83fa4ed 2026 bail ("ERROR: subcommand required")
da82bc58 2027 except KeyError:
a83fa4ed 2028 bail ("ERROR: invalid subcommand “%s” specified" % rawsubcmd)
da82bc58 2029
59d74e2b
PG
2030 def checked_arg ():
2031 nonlocal argvi
2032 try:
2033 return next (argvi)
2034 except StopIteration:
2035 bail ("ERROR: argument list incomplete")
2036
addcec42 2037 def checked_secret (s):
a83fa4ed
PG
2038 nonlocal secret
2039 if secret is None:
addcec42 2040 secret = s
da82bc58 2041 else:
a83fa4ed 2042 bail ("ERROR: encountered “%s” but secret already given" % arg)
da82bc58 2043
70ad9458
PG
2044 for arg in argvi:
2045 if arg in [ "-h", "--help" ]:
2046 usage ()
2047 raise Unreachable
2048 elif arg in [ "-v", "--verbose", "--wtf" ]:
2049 global PDTCRYPT_VERBOSE
2050 PDTCRYPT_VERBOSE = True
2051 elif arg in [ "-i", "--in", "--source" ]:
59d74e2b 2052 insspec = checked_arg ()
70ad9458 2053 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt from %s" % insspec)
a83fa4ed 2054 elif arg in [ "-p", "--password" ]:
59d74e2b 2055 arg = checked_arg ()
addcec42 2056 checked_secret (make_secret (password=arg))
a83fa4ed 2057 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with password")
70ad9458 2058 else:
da82bc58
PG
2059 if subcommand == PDTCRYPT_SUB_PROCESS:
2060 if arg in [ "-s", "--strict-ivs" ]:
2061 global PDTCRYPT_STRICTIVS
2062 PDTCRYPT_STRICTIVS = True
77058bab
PG
2063 elif arg in [ "-o", "--out", "--dest", "--sink" ]:
2064 outsspec = checked_arg ()
2065 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
da82bc58 2066 elif arg in [ "-f", "--force" ]:
da82bc58
PG
2067 PDTCRYPT_OVERWRITE = True
2068 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2069 elif arg in [ "-S", "--split" ]:
2070 mode |= PDTCRYPT_SPLIT
2071 if PDTCRYPT_VERBOSE is True: noise ("PDT: split files")
2072 elif arg in [ "-D", "--no-decrypt" ]:
2073 mode &= ~PDTCRYPT_DECRYPT
2074 if PDTCRYPT_VERBOSE is True: noise ("PDT: not decrypting")
a83fa4ed 2075 elif arg in [ "-k", "--key" ]:
59d74e2b 2076 arg = checked_arg ()
addcec42 2077 checked_secret (make_secret (key=arg))
a83fa4ed 2078 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with key")
da82bc58 2079 else:
a83fa4ed 2080 bail ("ERROR: unexpected positional argument “%s”" % arg)
da82bc58 2081 elif subcommand == PDTCRYPT_SUB_SCRYPT:
77058bab
PG
2082 if arg in [ "-n", "--nacl", "--salt" ]:
2083 nacl = checked_arg ()
2084 if PDTCRYPT_VERBOSE is True: noise ("PDT: salt key with %s" % nacl)
7b3940e5
PG
2085 elif arg in [ "-f", "--format" ]:
2086 arg = checked_arg ()
2087 try:
2088 scrypt_format = PDTCRYPT_SCRYPT_FORMAT [arg]
2089 except KeyError:
2090 bail ("ERROR: invalid scrypt output format %s" % arg)
2091 if PDTCRYPT_VERBOSE is True:
2092 noise ("PDT: scrypt output format “%s”" % scrypt_format)
77058bab
PG
2093 else:
2094 bail ("ERROR: unexpected positional argument “%s”" % arg)
f41973a6 2095 elif subcommand == PDTCRYPT_SUB_SCAN:
a808459e
PG
2096 if arg in [ "-o", "--out", "--dest", "--sink" ]:
2097 outsspec = checked_arg ()
2098 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2099 elif arg in [ "-f", "--force" ]:
a808459e
PG
2100 PDTCRYPT_OVERWRITE = True
2101 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2102 else:
2103 bail ("ERROR: unexpected positional argument “%s”" % arg)
70ad9458 2104
a83fa4ed 2105 if secret is None:
ecb9676d 2106 if PDTCRYPT_VERBOSE is True:
a83fa4ed 2107 noise ("ERROR: no password or key specified, trying $PDTCRYPT_PASSWORD")
ecb9676d
PG
2108 epw = os.getenv ("PDTCRYPT_PASSWORD")
2109 if epw is not None:
addcec42 2110 checked_secret (make_secret (password=epw.strip ()))
a83fa4ed
PG
2111
2112 if secret is None:
2113 if PDTCRYPT_VERBOSE is True:
2114 noise ("ERROR: no password or key specified, trying $PDTCRYPT_KEY")
2115 ek = os.getenv ("PDTCRYPT_KEY")
2116 if ek is not None:
addcec42 2117 checked_secret (make_secret (key=ek.strip ()))
ecb9676d 2118
a83fa4ed 2119 if secret is None:
da82bc58 2120 if subcommand == PDTCRYPT_SUB_SCRYPT:
a83fa4ed 2121 bail ("ERROR: scrypt hash mode requested but no password given")
da82bc58 2122 elif mode & PDTCRYPT_DECRYPT:
6257d5b3 2123 bail ("ERROR: decryption requested but no password given")
a83fa4ed 2124
a808459e
PG
2125 if mode & PDTCRYPT_SPLIT and outsspec is None:
2126 bail ("ERROR: split mode is incompatible with stdout sink "
2127 "(the default)")
2128
2129 if subcommand == PDTCRYPT_SUB_SCAN and outsspec is None:
2130 pass # no output by default in scan mode
2131 elif mode & PDTCRYPT_SPLIT or subcommand == PDTCRYPT_SUB_SCAN:
2132 # destination must be directory
2133 if outsspec == "-":
2134 bail ("ERROR: mode is incompatible with stdout sink")
2135 try:
2136 try:
2137 os.makedirs (outsspec, 0o700)
2138 except FileExistsError:
2139 # if it’s a directory with appropriate perms, everything is
2140 # good; otherwise, below invocation of open(2) will fail
2141 pass
2142 outs = os.open (outsspec, os.O_DIRECTORY, 0o600)
2143 except FileNotFoundError as exn:
2144 bail ("ERROR: cannot create target directory “%s”" % outsspec)
2145 except NotADirectoryError as exn:
2146 bail ("ERROR: target path “%s” is not a directory" % outsspec)
2147 else:
2148 outs = deptdcrypt_mk_stream (PDTCRYPT_SINK, outsspec or "-")
2149
f41973a6
PG
2150 if subcommand == PDTCRYPT_SUB_SCAN:
2151 if insspec is None:
2152 bail ("ERROR: please supply an input file for scanning")
2153 if insspec == '-':
2154 bail ("ERROR: input must be seekable; please specify a file")
a808459e 2155 return True, partial (mode_scan, secret, insspec, outs, nacl=nacl)
f41973a6 2156
77058bab
PG
2157 if subcommand == PDTCRYPT_SUB_SCRYPT:
2158 if secret [0] == PDTCRYPT_SECRET_KEY:
2159 bail ("ERROR: scrypt mode requires a password")
2160 if insspec is not None and nacl is not None \
2161 or insspec is None and nacl is None :
2162 bail ("ERROR: please supply either an input file or "
2163 "the salt")
70ad9458
PG
2164
2165 # default to stdout
77058bab
PG
2166 ins = None
2167 if insspec is not None or subcommand != PDTCRYPT_SUB_SCRYPT:
2168 ins = deptdcrypt_mk_stream (PDTCRYPT_SOURCE, insspec or "-")
da82bc58
PG
2169
2170 if subcommand == PDTCRYPT_SUB_SCRYPT:
7b3940e5
PG
2171 return True, partial (mode_scrypt, secret [1].encode (), ins, nacl,
2172 fmt=scrypt_format)
da82bc58 2173
a83fa4ed 2174 return True, partial (mode_depdtcrypt, mode, secret, ins, outs)
15d3eefd
PG
2175
2176
00b3cd10 2177def main (argv):
da82bc58 2178 ok, runner = parse_argv (argv)
f08c604b 2179
da82bc58 2180 if ok is True: return runner ()
15d3eefd 2181
da82bc58 2182 return 1
f08c604b 2183
00b3cd10
PG
2184
2185if __name__ == "__main__":
2186 sys.exit (main (sys.argv))
2187