Clarify two functions are meant to be used by disaster recovery
[python-delta-tar] / deltatar / crypto.py
#!/usr/bin/env python3

"""
Intra2net 2017

===============================================================================
              crypto -- Encryption Layer for the Deltatar Backup
===============================================================================

Crypto stack:

    - AES-GCM for the symmetric encryption;
    - Scrypt as KDF.

References:

    - NIST Recommendation for Block Cipher Modes of Operation: Galois/Counter
      Mode (GCM) and GMAC
      http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf

    - AES-GCM v1:
      https://cryptome.org/2014/01/aes-gcm-v1.pdf

    - Authentication weaknesses in GCM
      http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf

Errors
-------------------------------------------------------------------------------

Errors fall into roughly three categories:

    - Cryptographical errors or invalid data.

        - ``InvalidGCMTag`` (decryption failed on account of an invalid GCM
          tag),
        - ``InvalidIVFixedPart`` (IV fixed part of object not found in list),
        - ``DuplicateIV`` (the IV of an encrypted object already occurred),
        - ``DecryptionError`` (used in CLI decryption for presenting error
          conditions to the user).

    - Incorrect usage of the library.

        - ``InvalidParameter`` (non-conforming user supplied parameter),
        - ``InvalidHeader`` (data passed for reading not parsable into header),
        - ``FormatError`` (cannot handle header or parameter version),
        - ``RuntimeError``.

    - Bad internal state. If one of these is encountered it means that a state
      was reached that shouldn’t occur during normal processing.

        - ``InternalError``,
        - ``Unreachable``.

Also, ``EndOfFile`` is used as a sentinel to communicate that a stream supplied
for reading is exhausted.

Initialization Vectors
-------------------------------------------------------------------------------

Initialization vectors are checked for reuse during the lifetime of a decryptor.
The fixed counters for metadata files cannot be reused and attempts to do so
will cause a DuplicateIV error. This means the length of objects encrypted with
a metadata counter is capped at 63 GB.

For ordinary, non-metadata payload, there is an optional mode with strict IV
checking that causes a crypto context to fail if an IV encountered or created
was already used for decrypting or encrypting, respectively, an earlier object.
Note that this mode can trigger false positives when decrypting non-linearly,
e. g. when traversing the same object multiple times. Since the crypto context
has no notion of a position in a PDT encrypted archive, this condition must be
sorted out downstream.

Command Line Utility
-------------------------------------------------------------------------------

``crypto.py`` may be invoked as a script for decrypting, validating, and
splitting PDT encrypted files. Consult the usage message for details.

Usage examples:

Decrypt from stdin using the password ‘foo’: ::

    $ crypto.py process foo -i - -o - <some-file.tar.gz.pdtcrypt >some-file.tar.gz

Output verbose information about the encrypted objects in the archive: ::

    $ crypto.py process foo -v -i some-file.tar.gz.pdtcrypt -o /dev/null
    PDT: decrypt from some-file.tar.gz.pdtcrypt
    PDT: decrypt to /dev/null
    PDT: source: file some-file.tar.gz.pdtcrypt
    PDT: sink: file /dev/null
    PDT: 0 hdr
    PDT: · version = 1 : 0100
    PDT: · paramversion = 1 : 0100
    PDT: · nacl : d270 b031 00d1 87e2 c946 610d 7b7f 7e5f
    PDT: · iv : 02ee 3dd7 a963 1eb1 0100 0000
    PDT: · ctsize = 591 : 4f02 0000 0000 0000
    PDT: · tag : 5b2d 6d8b 8f82 4842 12fd 0b10 b6e3 369b
    PDT: 64 decrypt obj no. 1, 591 B
    PDT: · [64] 0% done, read block (591 B of 591 B remaining)
    PDT: · decrypt ciphertext 591 B
    PDT: · decrypt plaintext 591 B
    PDT: 655 finalize


Also, the mode *scrypt* allows deriving encryption keys. To calculate the
encryption key from the password ‘foo’ and the salt of the first object in a
PDT encrypted file: ::

    $ crypto.py scrypt foo -i some-file.pdtcrypt
    {"paramversion": 1, "salt": "Cqzbk48e3peEjzWto8D0yA==", "key": "JH9EkMwaM4x9F5aim5gK/Q=="}

The computed 16 byte key is given in base64 encoding in the value to ``key``
and can be fed into Python’s ``base64.b64decode()`` to obtain the
corresponding binary representation.

Note that in Scrypt hashing mode, no data integrity checks are being performed.
If the wrong password is given, a wrong key will be derived. Whether the password
was indeed correct can only be determined by decrypting. Note that since PDT
archives essentially consist of a stream of independent objects, the salt and
other parameters may change. Thus a key derived using above method from the
first object doesn’t necessarily apply to any of the subsequent objects.

"""

import base64
import binascii
import bisect
import ctypes
import io
from functools import reduce, partial
import mmap
import os
import struct
import stat
import sys
import time
import types
import errno
try:
    import enum34
except ImportError as exn:
    pass

if __name__ == "__main__": ## Work around the import mechanism lest Python’s
    pwd = os.getcwd()      ## preference for local imports causes a cyclical
    ## import (crypto → pylibscrypt → […] → ./tarfile → crypto).
    sys.path = [ p for p in sys.path if p.find ("deltatar") < 0 ]

import pylibscrypt
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
import cryptography


__all__ = [ "hdr_make", "hdr_read", "hdr_fmt", "hdr_fmt_pretty"
          , "scrypt_hashfile"
          , "PDTCRYPT_HDR_SIZE", "AES_GCM_IV_CNT_DATA"
          , "AES_GCM_IV_CNT_INFOFILE", "AES_GCM_IV_CNT_INDEX"
          ]


###############################################################################
## exceptions
###############################################################################

class EndOfFile (Exception):
    """Reached EOF."""
    remainder = 0
    msg       = 0
    def __init__ (self, n=None, msg=None):
        if n is not None:
            self.remainder = n
        self.msg = msg


class InvalidParameter (Exception):
    """Inputs not valid for PDT encryption."""
    pass


class InvalidHeader (Exception):
    """Header not valid."""
    pass


class InvalidGCMTag (Exception):
    """
    The GCM tag calculated during decryption differs from that in the object
    header.
    """
    pass


class InvalidIVFixedPart (Exception):
    """
    IV fixed part not in supplied list: either the backup is corrupt or the
    current object does not belong to it.
    """
    pass


class IVFixedPartError (Exception):
    """
    Error creating a unique IV fixed part: repeated calls to system RNG yielded
    the same sequence of bytes as the last IV used.
    """
    pass


class InvalidFileCounter (Exception):
    """
    When encrypting, an attempted reuse of a dedicated counter (info file,
    index file) was caught.
    """
    pass


class DuplicateIV (Exception):
    """
    During encryption, the current IV fixed part is identical to an already
    existing IV (same prefix and file counter). This indicates tampering or
    programmer error and cannot be recovered from.
    """
    pass


class NonConsecutiveIV (Exception):
    """
    IVs not numbered consecutively. This is a hard error with strict IV
    checking. Precludes random access to the encrypted objects.
    """
    pass


class FormatError (Exception):
    """Unusable parameters in header."""
    pass


class DecryptionError (Exception):
    """Error during decryption with ``crypto.py`` on the command line."""
    pass


class Unreachable (Exception):
    """
    Makeshift __builtin_unreachable(); always a programmer error if
    thrown.
    """
    pass


class InternalError (Exception):
    """Errors not ascribable to bad user inputs or cryptography."""
    pass


###############################################################################
## crypto layer version
###############################################################################

ENCRYPTION_PARAMETERS = \
    { 0: \
        { "kdf": ("dummy", 16)
        , "enc": "passthrough" }
    , 1: \
        { "kdf": ( "scrypt"
                 , { "dkLen"    : 16
                   , "N"        : 1 << 16
                   , "r"        : 8
                   , "p"        : 1
                   , "NaCl_LEN" : 16 })
        , "enc": "aes-gcm" } }

###############################################################################
## constants
###############################################################################

PDTCRYPT_HDR_MAGIC = b"PDTCRYPT"

PDTCRYPT_HDR_SIZE_MAGIC        = 8     #  8
PDTCRYPT_HDR_SIZE_VERSION      = 2     # 10
PDTCRYPT_HDR_SIZE_PARAMVERSION = 2     # 12
PDTCRYPT_HDR_SIZE_NACL         = 16    # 28
PDTCRYPT_HDR_SIZE_IV           = 12    # 40
PDTCRYPT_HDR_SIZE_CTSIZE       = 8     # 48
PDTCRYPT_HDR_SIZE_TAG          = 16    # 64 GCM auth tag

PDTCRYPT_HDR_SIZE = PDTCRYPT_HDR_SIZE_MAGIC        + PDTCRYPT_HDR_SIZE_VERSION \
                  + PDTCRYPT_HDR_SIZE_PARAMVERSION + PDTCRYPT_HDR_SIZE_NACL    \
                  + PDTCRYPT_HDR_SIZE_IV           + PDTCRYPT_HDR_SIZE_CTSIZE  \
                  + PDTCRYPT_HDR_SIZE_TAG # = 64

# precalculate offsets since Python can’t do constant folding over names
HDR_OFF_VERSION      = PDTCRYPT_HDR_SIZE_MAGIC
HDR_OFF_PARAMVERSION = HDR_OFF_VERSION      + PDTCRYPT_HDR_SIZE_VERSION
HDR_OFF_NACL         = HDR_OFF_PARAMVERSION + PDTCRYPT_HDR_SIZE_PARAMVERSION
HDR_OFF_IV           = HDR_OFF_NACL         + PDTCRYPT_HDR_SIZE_NACL
HDR_OFF_CTSIZE       = HDR_OFF_IV           + PDTCRYPT_HDR_SIZE_IV
HDR_OFF_TAG          = HDR_OFF_CTSIZE       + PDTCRYPT_HDR_SIZE_CTSIZE

FMT_UINT16_LE = "<H"
FMT_UINT64_LE = "<Q"
FMT_I2N_IV    = "<8sL"   # 8 random bytes ‖ 32 bit counter
FMT_I2N_HDR   = ("<"     # little endian, no padding
                 "8s"    # magic
                 "H"     # version
                 "H"     # paramversion
                 "16s"   # salt (“sodium chloride”)
                 "12s"   # iv
                 "Q"     # size
                 "16s")  # GCM tag

# aes+gcm
AES_KEY_SIZE                  = 16 # b"0123456789abcdef"
AES_KEY_SIZE_B64              = 24 # b'MDEyMzQ1Njc4OWFiY2RlZg=='
AES_GCM_MAX_SIZE              = (1 << 36) - (1 << 5) # 2^39 - 2^8 b ≅ 64 GB
PDTCRYPT_MAX_OBJ_SIZE_DEFAULT = 63 * (1 << 30)       #                63 GB
PDTCRYPT_MAX_OBJ_SIZE         = PDTCRYPT_MAX_OBJ_SIZE_DEFAULT

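# Illustrative sketch, not part of the original module: the comment on
# AES_GCM_MAX_SIZE states the GCM plaintext limit in bits (2^39 - 2^8), while
# the constant itself is expressed in bytes. The hypothetical helper below
# redoes that conversion with locally defined values, so the relation to the
# 63 GB object cap can be checked in isolation.
def example_gcm_limits ():
    max_bits  = (1 << 39) - (1 << 8)   # GCM plaintext limit in bits
    max_bytes = max_bits // 8          # same value as (1 << 36) - (1 << 5)
    obj_cap   = 63 * (1 << 30)         # per-object cap of 63 GB
    return max_bytes, obj_cap, max_bytes - obj_cap
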
# index and info files are written on the fly while encrypting so their
# counters must be available in advance
AES_GCM_IV_CNT_INFOFILE     = 1 # constant
AES_GCM_IV_CNT_INDEX        = AES_GCM_IV_CNT_INFOFILE + 1
AES_GCM_IV_CNT_DATA         = AES_GCM_IV_CNT_INDEX    + 1 # also for multivolume
AES_GCM_IV_CNT_MAX_DEFAULT  = 0xffFFffFF
AES_GCM_IV_CNT_MAX          = AES_GCM_IV_CNT_MAX_DEFAULT

# IV structure and generation
PDTCRYPT_IV_GEN_MAX_RETRIES = 10 # ×
PDTCRYPT_IV_FIXEDPART_SIZE  =  8 # B
PDTCRYPT_IV_COUNTER_SIZE    =  4 # B

# secret type: PW of string | KEY of char [16]
PDTCRYPT_SECRET_PW  = 0
PDTCRYPT_SECRET_KEY = 1

###############################################################################
## header, trailer
###############################################################################
#
# Interface:
#
#    struct hdrinfo
#      { version      : u16
#      , paramversion : u16
#      , nacl         : [u8; 16]
#      , iv           : [u8; 12]
#      , ctsize       : usize
#      , tag          : [u8; 16] }
#
#    fn hdr_read (f : handle) -> hdrinfo;
#    fn hdr_make (f : handle, h : hdrinfo) -> IOResult<usize>;
#    fn hdr_fmt (h : hdrinfo) -> String;
#

def hdr_read (data):
    """
    Read bytes as header structure.

    If the input could not be interpreted as a header, fail with
    ``InvalidHeader``.
    """

    try:
        mag, version, paramversion, nacl, iv, ctsize, tag = \
            struct.unpack (FMT_I2N_HDR, data)
    except Exception as exn:
        raise InvalidHeader ("error unpacking header from [%r]: %s"
                             % (binascii.hexlify (data), str (exn)))

    if mag != PDTCRYPT_HDR_MAGIC:
        raise InvalidHeader ("bad magic in header: expected [%s], got [%s]"
                             % (PDTCRYPT_HDR_MAGIC, mag))

    return \
        {      "version" : version
        , "paramversion" : paramversion
        ,         "nacl" : nacl
        ,           "iv" : iv
        ,       "ctsize" : ctsize
        ,          "tag" : tag
        }


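# Illustrative sketch, not part of the original module: a self-contained round
# trip through the 64 byte header layout that hdr_read() parses. The function
# name is hypothetical; the format string and magic are copied from the
# constants section, and version, paramversion and ctsize mirror the verbose
# CLI output quoted in the module docstring. Salt, IV and tag are dummies.
def example_hdr_roundtrip ():
    import struct                       # local import keeps the sketch standalone
    fmt   = "<8sHH16s12sQ16s"           # same layout as FMT_I2N_HDR
    magic = b"PDTCRYPT"
    nacl  = bytes (range (16))          # dummy salt
    iv    = bytes (8) + struct.pack ("<L", 3)   # zeroed fixed part, counter 3
    tag   = bytes (16)                  # dummy GCM tag
    raw   = struct.pack (fmt, magic, 1, 1, nacl, iv, 591, tag)
    mag, version, paramversion, nacl2, iv2, ctsize, tag2 = \
        struct.unpack (fmt, raw)
    return len (raw), mag, version, paramversion, ctsize, iv2 == iv

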
def hdr_read_stream (instr):
    """
    Read header from stream at the current position.

    Fail with ``InvalidHeader`` if insufficient bytes were read from the
    stream, or if the content could not be interpreted as a header.
    """
    data = instr.read(PDTCRYPT_HDR_SIZE)
    ldata = len (data)
    if ldata == 0:
        raise EndOfFile
    elif ldata != PDTCRYPT_HDR_SIZE:
        raise InvalidHeader ("hdr_read_stream: expected %d B, received %d B"
                             % (PDTCRYPT_HDR_SIZE, ldata))
    return hdr_read (data)


def hdr_from_params (version, paramversion, nacl, iv, ctsize, tag):
    """
    Assemble the necessary values into a PDTCRYPT header.

    :type      version: int to fit   uint16_t
    :type paramversion: int to fit   uint16_t
    :type         nacl: bytes to fit uint8_t[16]
    :type           iv: bytes to fit uint8_t[12]
    :type       ctsize: int to fit   uint64_t
    :type          tag: bytes to fit uint8_t[16]
    """
    buf  = bytearray (PDTCRYPT_HDR_SIZE)
    bufv = memoryview (buf)

    try:
        struct.pack_into (FMT_I2N_HDR, bufv, 0,
                          PDTCRYPT_HDR_MAGIC,
                          version, paramversion, nacl, iv, ctsize, tag)
    except Exception as exn:
        return False, "error assembling header: %s" % str (exn)

    return True, bytes (buf)


def hdr_make_dummy (s):
    """
    Create a header sized block of bytes initialized to a value derived from a
    string. Used to verify we’ve jumped back correctly to the actual position
    of the object header.
    """
    c = reduce (lambda a, c: a + ord(c), s, 0) % 0xFF
    return bytes (bytearray (struct.pack ("B", c)) * PDTCRYPT_HDR_SIZE)


def hdr_make (hdr):
    """
    Assemble a header from the given header structure.
    """
    return hdr_from_params (version=hdr.get("version"),
                            paramversion=hdr.get("paramversion"),
                            nacl=hdr.get("nacl"), iv=hdr.get("iv"),
                            ctsize=hdr.get("ctsize"), tag=hdr.get("tag"))


HDR_FMT = "I2n_header { version: %d, paramversion: %d, nacl: %s[%d]," \
                      " iv: %s[%d], ctsize: %d, tag: %s[%d] }"

def hdr_fmt (h):
    """Format a header structure into readable output."""
    return HDR_FMT % (h["version"], h["paramversion"],
                      binascii.hexlify (h["nacl"]), len(h["nacl"]),
                      binascii.hexlify (h["iv"]), len(h["iv"]),
                      h["ctsize"],
                      binascii.hexlify (h["tag"]), len(h["tag"]))


def hex_spaced_of_bytes (b):
    """Format bytes object, hexdump style."""
    return " ".join ([ "%.2x%.2x" % (c1, c2)
                       for c1, c2 in zip (b[0::2], b[1::2]) ]) \
         + (len (b) | 1 == len (b) and " %.2x" % b[-1] or "") # odd lengths


def hdr_iv_counter (h):
    """Extract the variable part of the IV of the given header."""
    _fixed, cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
    return cnt


def hdr_iv_fixed (h):
    """Extract the fixed part of the IV of the given header."""
    fixed, _cnt = struct.unpack (FMT_I2N_IV, h ["iv"])
    return fixed


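# Illustrative sketch, not part of the original module: how a 12 byte IV
# decomposes into the 8 byte fixed part and the 32 bit little endian counter,
# using the same format string as FMT_I2N_IV. The function name is
# hypothetical. Fed with the IV from the verbose CLI output in the module
# docstring (02ee 3dd7 a963 1eb1 0100 0000) it yields a counter of one.
def example_iv_split (iv):
    import struct                       # local import keeps the sketch standalone
    fixed, cnt = struct.unpack ("<8sL", iv)
    return fixed, cnt

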
hdr_dump = hex_spaced_of_bytes


HDR_FMT_PRETTY = \
"""version = %-4d : %s
paramversion = %-4d : %s
nacl : %s
iv : %s
ctsize = %-20d : %s
tag : %s
"""

def hdr_fmt_pretty (h):
    """
    Format header structure into multi-line representation of its contents and
    their raw representation. (Omit the implicit “PDTCRYPT” magic bytes that
    precede every header.)
    """
    return HDR_FMT_PRETTY \
            % (h["version"],
               hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["version"])),
               h["paramversion"],
               hex_spaced_of_bytes (struct.pack (FMT_UINT16_LE, h["paramversion"])),
               hex_spaced_of_bytes (h["nacl"]),
               hex_spaced_of_bytes (h["iv"]),
               h["ctsize"],
               hex_spaced_of_bytes (struct.pack (FMT_UINT64_LE, h["ctsize"])),
               hex_spaced_of_bytes (h["tag"]))

IV_FMT = "((f %s) (c %d))"

def iv_fmt (iv):
    """Format the two components of an IV in a readable fashion."""
    fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
    return IV_FMT % (binascii.hexlify (fixed), cnt)


###############################################################################
## restoration
###############################################################################

class Location (object):
    n = 0
    offset = 0

def restore_loc_fmt (loc):
    return "%d off:%d" \
        % (loc.n, loc.offset)

def locate_hdr_candidates (fd):
    """
    Walk over instances of the magic string in the payload, collecting their
    positions. If the offset of the first found instance is not zero, the file
    begins with leading garbage. Used by disaster recovery.

    :return:    The list of offsets in the file.
    """
    cands = []

    mm = mmap.mmap(fd, 0, mmap.MAP_SHARED, mmap.PROT_READ)
    pos = 0
    while True:
        pos = mm.find (PDTCRYPT_HDR_MAGIC, pos)
        if pos == -1:
            break
        cands.append (pos)
        pos += 1

    return cands


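# Illustrative sketch, not part of the original module: the candidate scan of
# locate_hdr_candidates(), reduced to an in-memory bytes object so it can be
# tried without a file descriptor or mmap. The function name is hypothetical;
# bytes.find() stands in for the mmap.find() call above. A first offset other
# than zero indicates leading garbage.
def example_find_magic (buf, magic=b"PDTCRYPT"):
    offsets = []
    pos = buf.find (magic)
    while pos != -1:
        offsets.append (pos)
        pos = buf.find (magic, pos + 1)  # continue one byte past the last hit
    return offsets

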
HDR_CAND_GOOD  = 0 # header marks begin of valid object
HDR_CAND_FISHY = 1 # inconclusive (tag mismatch, obj overlap etc.)
HDR_CAND_JUNK  = 2 # not a header / object unreadable

HDR_VERDICT_NAME = \
    { HDR_CAND_GOOD  : "valid"
    , HDR_CAND_FISHY : "fishy"
    , HDR_CAND_JUNK  : "junk"
    }


def verdict_fmt (vdt):
    return HDR_VERDICT_NAME [vdt]


def inspect_hdr (fd, off):
    """
    Attempt to parse a header in *fd* at position *off*.

    Returns a verdict about the quality of that header plus the parsed header
    when readable.
    """

    _ = os.lseek (fd, off, os.SEEK_SET)

    if os.lseek (fd, 0, os.SEEK_CUR) != off:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: %d → dismissed (lseek() past EOF)" % off)
        return HDR_CAND_JUNK, None

    raw = os.read (fd, PDTCRYPT_HDR_SIZE)
    if len (raw) != PDTCRYPT_HDR_SIZE:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: %d → dismissed (EOF inside header)" % off)
        return HDR_CAND_JUNK, None

    try:
        hdr = hdr_read (raw)
    except InvalidHeader as exn:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: %d → dismissed (invalid: [%s])" % (off, str (exn)))
        return HDR_CAND_JUNK, None

    obj0 = off + PDTCRYPT_HDR_SIZE
    objX = obj0 + hdr ["ctsize"]

    eof = os.lseek (fd, 0, os.SEEK_END)
    if eof < objX:
        if PDTCRYPT_VERBOSE is True:
            noise ("PDT: %d → EOF inside object (%d≤%d≤%d); adjusting size to "
                   "%d" % (off, obj0, eof, objX, (eof - obj0)))
        # try reading up to the end
        hdr ["ctsize"] = eof - obj0
        return HDR_CAND_FISHY, hdr

    return HDR_CAND_GOOD, hdr


def try_decrypt (ifd, off, hdr, secret, ofd=-1):
    """
    Attempt to decrypt the object in the (seekable) descriptor *ifd* starting
    at *off* using the metadata in *hdr* and *secret*. An output fd can be
    specified with *ofd*; if it is *-1* – the default –, the decrypted payload
    will be discarded.

    Always creates a fresh decryptor, so validation steps across objects don’t
    apply.

    Errors during GCM tag validation are ignored. Used by disaster recovery.
    """
    ctleft = hdr ["ctsize"]
    pos    = off

    ks = secret [0]
    if ks == PDTCRYPT_SECRET_PW:
        decr = Decrypt (password=secret [1])
    elif ks == PDTCRYPT_SECRET_KEY:
        key = secret [1]
        decr = Decrypt (key=key)
    else:
        raise RuntimeError

    decr.next (hdr)

    try:
        os.lseek (ifd, pos, os.SEEK_SET)
        pt = b""
        while ctleft > 0:
            cnksiz = min (ctleft, PDTCRYPT_BLOCKSIZE)
            cnk    = os.read (ifd, cnksiz)
            ctleft -= cnksiz
            pos    += cnksiz
            pt     = decr.process (cnk)
            if ofd != -1:
                os.write (ofd, pt)
        try:
            pt = decr.done ()
        except InvalidGCMTag:
            noise ("PDT: GCM tag mismatch for object %d–%d"
                   % (off, off + hdr ["ctsize"]))
        if len (pt) > 0 and ofd != -1:
            os.write (ofd, pt)

    except Exception as exn:
        noise ("PDT: error decrypting object %d–%d@%d, %d B remaining [%s]"
               % (off, off + hdr ["ctsize"], pos, ctleft, exn))
        raise

    return pos - off


def readable_objects_offsets (ifd, secret, cands):
    """
    From a list of candidates, locate the ones that mark the start of actual
    readable PDTCRYPT objects.
    """
    good = []

    for i, cand in enumerate (cands):
        vdt, hdr = inspect_hdr (ifd, cand)
        if vdt == HDR_CAND_JUNK:
            pass # ignore unreadable ones
        elif vdt in [HDR_CAND_GOOD, HDR_CAND_FISHY]:
            ctsize = hdr ["ctsize"]
            off0 = cand + PDTCRYPT_HDR_SIZE
            ok = try_decrypt (ifd, off0, hdr, secret) == ctsize
            if ok is True:
                good.append ((cand, off0 + ctsize))

    overlap = find_overlaps (good)

    return [ g [0] for g in good ]


def reconstruct_offsets (fname, secret):
    ifd = os.open (fname, os.O_RDONLY)

    try:
        cands = locate_hdr_candidates (ifd)
        return readable_objects_offsets (ifd, secret, cands)
    finally:
        os.close (ifd)


###############################################################################
## helpers
###############################################################################

def make_secret (password=None, key=None):
    """
    Safely create a “secret” value that consists either of a key or a password.
    Inputs are validated: the password is accepted as (UTF-8 encoded) bytes or
    string; for the key only a bytes object of the proper size or a hexadecimal
    or base64 encoded string thereof is accepted.

    If both are provided, the key is preferred over the password; no checks are
    performed whether the key is derived from the password.

    :returns: secret value if inputs were acceptable | None otherwise.
    """
    if key is not None:
        if isinstance (key, str) is True:
            key = key.encode ("utf-8")
        if isinstance (key, bytes) is True:
            if len (key) == AES_KEY_SIZE:
                return (PDTCRYPT_SECRET_KEY, key)
            if len (key) == AES_KEY_SIZE * 2:
                try:
                    key = binascii.unhexlify (key)
                    return (PDTCRYPT_SECRET_KEY, key)
                except binascii.Error: # garbage in string
                    pass
            if len (key) == AES_KEY_SIZE_B64:
                try:
                    key = base64.b64decode (key)
                    # the base64 processor is very tolerant and allows for
                    # arbitrary trailing and leading data thus the data obtained
                    # must be checked for the proper length
                    if len (key) == AES_KEY_SIZE:
                        return (PDTCRYPT_SECRET_KEY, key)
                except binascii.Error: # “incorrect padding”
                    pass
    elif password is not None:
        if isinstance (password, str) is True:
            return (PDTCRYPT_SECRET_PW, password)
        elif isinstance (password, bytes) is True:
            try:
                password = password.decode ("utf-8")
                return (PDTCRYPT_SECRET_PW, password)
            except UnicodeDecodeError:
                pass

    return None


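# Illustrative sketch, not part of the original module: make_secret() tells
# the three accepted key encodings apart by length alone (16 raw bytes, 32 hex
# digits, 24 base64 characters). The hypothetical helper below derives those
# lengths for a given raw key using only the standard library.
def example_key_encodings (key):
    import base64, binascii             # local imports keep the sketch standalone
    as_hex = binascii.hexlify (key)     # 2 characters per byte
    as_b64 = base64.b64encode (key)     # 4 characters per 3 bytes, padded
    return len (key), len (as_hex), len (as_b64)

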
###############################################################################
## passthrough / null encryption
###############################################################################

class PassthroughCipher (object):

    tag = struct.pack ("<QQ", 0, 0)

    def __init__          (self)    : pass

    def update            (self, b) : return b

    def finalize          (self)    : return b""

    def finalize_with_tag (self, _) : return b""

###############################################################################
## convenience wrapper
###############################################################################


def kdf_dummy (klen, password, _nacl):
    """
    Fake KDF for testing purposes that is called when parameter version zero is
    encountered.
    """
    q, r = divmod (klen, len (password))
    if isinstance (password, bytes) is False:
        password = password.encode ()
    return password * q + password [:r], b""


SCRYPT_KEY_MEMO = { } # static because needed for both the info file and the archive


def kdf_scrypt (params, password, nacl):
    """
    Wrapper for the Scrypt KDF, corresponds to parameter version one. The
    computation result is memoized based on the inputs to facilitate spawning
    multiple encryption contexts.
    """
    N = params["N"]
    r = params["r"]
    p = params["p"]
    dkLen = params["dkLen"]

    if nacl is None:
        nacl = os.urandom (params["NaCl_LEN"])

    key_parms = (password, nacl, N, r, p, dkLen)
    global SCRYPT_KEY_MEMO
    if key_parms not in SCRYPT_KEY_MEMO:
        SCRYPT_KEY_MEMO [key_parms] = \
            pylibscrypt.scrypt (password, nacl, N, r, p, dkLen)
    return SCRYPT_KEY_MEMO [key_parms], nacl


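# Illustrative sketch, not part of the original module: the memoization scheme
# of kdf_scrypt(), rebuilt on the standard library. Note the swap: this uses
# hashlib.scrypt() instead of pylibscrypt, which assumes a Python built against
# an OpenSSL that provides scrypt. The names example_kdf_memoized and
# EXAMPLE_KDF_MEMO are hypothetical, and the cache is deliberately kept
# separate from SCRYPT_KEY_MEMO. The defaults mirror parameter version one.
EXAMPLE_KDF_MEMO = { }

def example_kdf_memoized (password, salt, n=1 << 16, r=8, p=1, dklen=16):
    import hashlib                      # local import keeps the sketch standalone
    parms = (password, salt, n, r, p, dklen)
    if parms not in EXAMPLE_KDF_MEMO:
        EXAMPLE_KDF_MEMO [parms] = hashlib.scrypt (
            password, salt=salt, n=n, r=r, p=p, dklen=dklen,
            maxmem=256 * n * r * p)     # headroom over the roughly 128·N·r bytes needed
    return EXAMPLE_KDF_MEMO [parms]

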
def kdf_by_version (paramversion=None, defs=None):
    """
    Pick the KDF handler corresponding to the parameter version or the
    definition set.

    :rtype: function (password : str, nacl : str) -> str
    """
    if paramversion is not None:
        defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
    if defs is None:
        raise InvalidParameter ("no encryption parameters for version %r"
                                % paramversion)
    (kdf, params) = defs["kdf"]
    fn = None
    if kdf == "scrypt" : fn = kdf_scrypt
    if kdf == "dummy"  : fn = kdf_dummy
    if fn is None:
        raise ValueError ("key derivation method %r unknown" % kdf)
    return partial (fn, params)


###############################################################################
## SCRYPT hashing
###############################################################################

def scrypt_hashsource (pw, ins):
    """
    Calculate the SCRYPT hash from the password and the information contained
    in the first header found in ``ins``.

    This does not validate whether the first object is encrypted correctly.
    """
    if isinstance (pw, str) is True:
        pw = str.encode (pw)
    elif isinstance (pw, bytes) is False:
        raise InvalidParameter ("password must be a string, not %s"
                                % type (pw))
    if isinstance (ins, io.BufferedReader) is False and \
            isinstance (ins, io.FileIO) is False:
        raise InvalidParameter ("file to hash must be opened in “binary” mode")
    hdr = None
    try:
        hdr = hdr_read_stream (ins)
    except EndOfFile as exn:
        noise ("PDT: malformed input: end of file reading first object header")
        noise ("PDT:")
        return 1

    nacl = hdr ["nacl"]
    pver = hdr ["paramversion"]
    if PDTCRYPT_VERBOSE is True:
        noise ("PDT: salt of first object : %s" % binascii.hexlify (nacl))
        noise ("PDT: parameter version of archive : %d" % pver)

    try:
        defs = ENCRYPTION_PARAMETERS.get(pver, None)
        kdfname, params = defs ["kdf"]
        if kdfname != "scrypt":
            noise ("PDT: input is not an SCRYPT archive")
            noise ("")
            return 1
        kdf = kdf_by_version (None, defs)
    except ValueError as exn:
        noise ("PDT: object has unknown parameter version %d" % pver)

    hsh, _void = kdf (pw, nacl)

    return hsh, nacl, hdr ["version"], pver


def scrypt_hashfile (pw, fname):
    """
    Calculate the SCRYPT hash from the password and the information contained
    in the first header found in the given file. The header is read only at
    offset zero.
    """
    with deptdcrypt_mk_stream (PDTCRYPT_SOURCE, fname or "-") as ins:
        hsh, _void, _void, _void = scrypt_hashsource (pw, ins)
        return hsh


###############################################################################
## AES-GCM context
###############################################################################

class Crypto (object):
    """
    Encryption context to remain alive throughout an entire tarfile pass.
    """
    enc          = None
    nacl         = None
    key          = None
    cnt          = None  # file counter (uint32_t != 0)
    iv           = None  # current IV
    fixed        = None  # accu for 64 bit fixed parts of IV
    used_ivs     = None  # tracks IVs
    strict_ivs   = False # if True, panic on duplicate object IV
    password     = None
    paramversion = None
    stats = { "in"  : 0
            , "out" : 0
            , "obj" : 0 }

    ctsize  = -1
    ptsize  = -1
    info_counter_used  = False
    index_counter_used = False

    def __init__ (self, *al, **akv):
        self.used_ivs = set ()
        self.set_parameters (*al, **akv)


    def next_fixed (self):
        # NOP for decryption
        pass


920 def set_object_counter (self, cnt=None):
704ceaa5
PG
921 """
922 Safely set the internal counter of encrypted objects. Numerous
923 constraints apply:
924
925 The same counter may not be reused in combination with one IV fixed
926 part. This is validated elsewhere in the IV handling.
927
928 Counter zero is invalid. The first two counters are reserved for
929 metadata. The implementation does not allow for splitting metadata
930 files over multiple encrypted objects. (This would be possible by
931 assigning new fixed parts.) Thus in a Deltatar backup there is at most
932 one object with counter value one and at most one with counter value
933 two. On creation of a context, the initial counter may be chosen. The
934 globals ``AES_GCM_IV_CNT_INFOFILE`` and ``AES_GCM_IV_CNT_INDEX`` can be
935 used to request one of the reserved values. If one of these values has
936 been used, any further attempt to set the counter to that value will
937 be rejected with an ``InvalidFileCounter`` exception.
938
939 Out of bounds values (i. e. below one or above the maximum of 2³²)
940 cause an ``InvalidParameter`` exception to be thrown.
941 """
50710d86
PG
942 if cnt is None:
943 self.cnt = AES_GCM_IV_CNT_DATA
944 return
945 if cnt == 0 or cnt > AES_GCM_IV_CNT_MAX + 1:
b12110dd
PG
946 raise InvalidParameter ("invalid counter value %d requested: "
947 "acceptable values are from 1 to %d"
948 % (cnt, AES_GCM_IV_CNT_MAX))
50710d86
PG
949 if cnt == AES_GCM_IV_CNT_INFOFILE:
950 if self.info_counter_used is True:
fac2cfe1
PG
951 raise InvalidFileCounter ("attempted to reuse info file "
952 "counter %d: must be unique" % cnt)
50710d86 953 self.info_counter_used = True
3031b7ae
PG
954 elif cnt == AES_GCM_IV_CNT_INDEX:
955 if self.index_counter_used is True:
fac2cfe1
PG
956 raise InvalidFileCounter ("attempted to reuse index file "
957 "counter %d: must be unique" % cnt)
3031b7ae 958 self.index_counter_used = True
50710d86
PG
959 if cnt <= AES_GCM_IV_CNT_MAX:
960 self.cnt = cnt
961 return
962 # cnt == AES_GCM_IV_CNT_MAX + 1 → wrap
963 self.cnt = AES_GCM_IV_CNT_DATA
704ceaa5 964 self.next_fixed ()
50710d86
PG
965
966
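# The counter rules documented in ``set_object_counter()`` can be sketched in
# isolation. This is a minimal illustration only: the constant values, the
# ``CounterState`` class and the ``CounterError`` exception are assumptions
# made for the example and are not taken from this module.

```python
# Stand-alone sketch of the counter rules; all names and values are assumed.
CNT_INFOFILE = 1           # reserved: info file (assumed value)
CNT_INDEX    = 2           # reserved: index file (assumed value)
CNT_DATA     = 3           # first counter of regular data objects (assumed)
CNT_MAX      = 0xffffffff  # assumed upper bound of a uint32_t counter


class CounterError (Exception):
    """Hypothetical stand-in for the module's own exceptions."""


class CounterState (object):

    def __init__ (self):
        self.cnt = CNT_DATA
        self.reserved_used = set ()  # reserved counters handed out so far
        self.wraps = 0               # how often a new fixed part was needed

    def set (self, cnt=None):
        if cnt is None:
            self.cnt = CNT_DATA
            return
        if cnt < 1 or cnt > CNT_MAX + 1:
            raise CounterError ("counter %d out of bounds" % cnt)
        if cnt in (CNT_INFOFILE, CNT_INDEX):
            if cnt in self.reserved_used:
                raise CounterError ("reserved counter %d reused" % cnt)
            self.reserved_used.add (cnt)
        if cnt <= CNT_MAX:
            self.cnt = cnt
            return
        # cnt == CNT_MAX + 1: wrap around; a fresh fixed part is required
        self.cnt = CNT_DATA
        self.wraps += 1
```

# Unlike the method above, the sketch also rejects negative counters, which
# is what the docstring ("below one") suggests.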
1f3fd7b0 967 def set_parameters (self, password=None, key=None, paramversion=None,
be124bca 968 nacl=None, counter=None, strict_ivs=False):
704ceaa5
PG
969 """
970 Configure the internal state of a crypto context. Not intended for
971 external use.
972 """
be124bca 973 self.next_fixed ()
50710d86 974 self.set_object_counter (counter)
30019abf
PG
975 self.strict_ivs = strict_ivs
976
a83fa4ed
PG
977 if paramversion is not None:
978 self.paramversion = paramversion
979
1f3fd7b0
PG
980 if key is not None:
981 self.key, self.nacl = key, nacl
982 return
983
a83fa4ed
PG
984 if password is not None:
985 if isinstance (password, bytes) is False:
986 password = str.encode (password)
987 self.password = password
988 if paramversion is None and nacl is None:
989 # postpone key setup until first header is available
990 return
991 kdf = kdf_by_version (paramversion)
992 if kdf is not None:
993 self.key, self.nacl = kdf (password, nacl)
fa47412e 994
39accaaa 995
39accaaa 996 def process (self, buf):
704ceaa5
PG
997 """
998 Encrypt / decrypt a buffer. Invokes the ``.update()`` method on the
999 wrapped encryptor or decryptor, respectively.
1000
1001 The Cryptography exception ``AlreadyFinalized`` is translated to an
1002 ``InternalError`` at this point. It may occur in sound code when the GC
1003 closes an encrypting stream after an error. Everywhere else it must be
1004 treated as a bug.
1005 """
cb7a3911
PG
1006 if self.enc is None:
1007 raise RuntimeError ("process: context not initialized")
1008 self.stats ["in"] += len (buf)
fac2cfe1
PG
1009 try:
1010 out = self.enc.update (buf)
1011 except cryptography.exceptions.AlreadyFinalized as exn:
1012 raise InternalError (exn)
cb7a3911
PG
1013 self.stats ["out"] += len (out)
1014 return out
39accaaa
PG
1015
1016
30019abf 1017 def next (self, password, paramversion, nacl, iv):
704ceaa5
PG
1018 """
1019 Prepare for encrypting another object: Reset the data counters and
1020 change the configuration in case one of the variable parameters differs
1021 from the last object. Also check the IV for duplicates and error out
1022 if strict checking was requested.
1023 """
fa47412e
PG
1024 self.ctsize = 0
1025 self.ptsize = 0
1026 self.stats ["obj"] += 1
30019abf
PG
1027
1028 self.check_duplicate_iv (iv)
1029
6178061e
PG
1030 if ( self.paramversion != paramversion
1031 or self.password != password
1032 or self.nacl != nacl):
1f3fd7b0 1033 self.set_parameters (password=password, paramversion=paramversion,
30019abf
PG
1034 nacl=nacl, strict_ivs=self.strict_ivs)
1035
1036
1037 def check_duplicate_iv (self, iv):
704ceaa5
PG
1038 """
1039 Add an IV (the 12 byte header representation) to the set of used IVs. With
1040 strict checking enabled, a repeated IV raises ``DuplicateIV``. Depending on
1041 the context, this may indicate a serious error (IV reuse).
1042 """
30019abf
PG
1043 if self.strict_ivs is True and iv in self.used_ivs:
1044 raise DuplicateIV ("iv %s was reused" % iv_fmt (iv))
1045 # IV accepted; add to collection
1046 self.used_ivs.add (iv)
fa47412e
PG
1047
1048
633b18a9 1049 def counters (self):
704ceaa5
PG
1050 """
1051 Access the data counters.
1052 """
633b18a9
PG
1053 return self.stats ["obj"], self.stats ["in"], self.stats ["out"]
1054
1055
8de91f4f
PG
1056 def drop (self):
1057 """
1058 Clear the current context regardless of its finalization state. The
1059 next operation must be ``.next()``.
1060 """
1061 self.enc = None
1062
1063
39accaaa
PG
1064class Encrypt (Crypto):
1065
48db09ba
PG
1066 lastinfo = None
1067 version = None
72a42219 1068 paramenc = None
50710d86 1069
1f3fd7b0 1070 def __init__ (self, version, paramversion, password=None, key=None, nacl=None,
30019abf 1071 counter=AES_GCM_IV_CNT_DATA, strict_ivs=True):
704ceaa5
PG
1072 """
1073 The ctor will throw immediately if one of the parameters does not conform
1074 to our expectations.
1075
1077 :type version: int to fit uint16_t
1078 :type paramversion: int to fit uint16_t
1079 :param password: mutually exclusive with ``key``
1080 :type password: str
1081 :param key: mutually exclusive with ``password``
1082 :type key: bytes
1083 :type nacl: bytes
1084 :param counter: initial object counter; the values
1085 ``AES_GCM_IV_CNT_INFOFILE`` and
1086 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1087 and cannot be reused even with different fixed parts.
1088 :type strict_ivs: bool
1089 """
1f3fd7b0
PG
1090 if password is None and key is None \
1091 or password is not None and key is not None :
1092 raise InvalidParameter ("__init__: need either key or password")
1093
1094 if key is not None:
1095 if isinstance (key, bytes) is False:
1096 raise InvalidParameter ("__init__: key must be provided as "
1097 "bytes, not %s" % type (key))
1098 if nacl is None:
1099 raise InvalidParameter ("__init__: salt must be provided along "
1100 "with encryption key")
1101 else: # password, no key
1102 if isinstance (password, str) is False:
1103 raise InvalidParameter ("__init__: password must be a string, not %s"
1104 % type (password))
1105 if len (password) == 0:
1106 raise InvalidParameter ("__init__: supplied empty password but not "
1107 "permitted for PDT encrypted files")
36b9932a
PG
1108 # version
1109 if isinstance (version, int) is False:
1110 raise InvalidParameter ("__init__: version number must be an "
1111 "integer, not %s" % type (version))
1112 if version < 0:
1113 raise InvalidParameter ("__init__: version number must be a "
1114 "nonnegative integer, not %d" % version)
1115 # paramversion
1116 if isinstance (paramversion, int) is False:
1117 raise InvalidParameter ("__init__: crypto parameter version number "
1118 "must be an integer, not %s"
1119 % type (paramversion))
1120 if paramversion < 0:
1121 raise InvalidParameter ("__init__: crypto parameter version number "
1122 "must be a nonnegative integer, not %d"
1123 % paramversion)
1124 # salt
1125 if nacl is not None:
1126 if isinstance (nacl, bytes) is False:
1127 raise InvalidParameter ("__init__: salt given, but of type %s "
1128 "instead of bytes" % type (nacl))
1129 # salt length would depend on the actual encryption so it can’t be
1130 # validated at this point
b12110dd 1131 self.fixed = [ ]
48db09ba
PG
1132 self.version = version
1133 self.paramenc = ENCRYPTION_PARAMETERS.get (paramversion) ["enc"]
72a42219 1134
1f3fd7b0 1135 super().__init__ (password, key, paramversion, nacl, counter=counter,
30019abf 1136 strict_ivs=strict_ivs)
a393d9cb
PG
1137
1138
be124bca
PG
1139 def next_fixed (self, retries=PDTCRYPT_IV_GEN_MAX_RETRIES):
1140 """
1141 Generate the next IV fixed part by reading eight bytes from
1142 ``/dev/urandom``. The buffer so obtained is tested against the fixed
1143 parts used so far to prevent accidental reuse of IVs. After a
1144 configurable number of attempts to create a unique fixed part, it will
1145 refuse to continue with an ``IVFixedPartError``. This is unlikely to
1146 ever happen on a normal system but may detect an issue with the random
1147 generator.
1148
1149 The list of fixed parts that were used by the context at hand can be
1150 accessed through the ``.fixed`` list. Its last element is the fixed
1151 part currently in use.
1152 """
1153 i = 0
1154 while i < retries:
1155 fp = os.urandom (PDTCRYPT_IV_FIXEDPART_SIZE)
1156 if fp not in self.fixed:
1157 self.fixed.append (fp)
1158 return
1159 i += 1
1160 raise IVFixedPartError ("error obtaining a unique IV fixed part from "
1161 "/dev/urandom; giving up after %d tries" % i)
1162
1163
a393d9cb 1164 def iv_make (self):
704ceaa5
PG
1165 """
1166 Construct a 12-byte IV from the current fixed part and the object
1167 counter.
1168 """
b12110dd 1169 return struct.pack(FMT_I2N_IV, self.fixed [-1], self.cnt)
a393d9cb
PG
1170
1171
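# How a 12 byte IV is assembled from an eight byte fixed part and a 32 bit
# counter can be shown with ``struct`` alone. A minimal sketch: the format
# string ``">8sL"`` (big endian) and the helper names are assumptions; the
# actual value of ``FMT_I2N_IV`` is defined elsewhere in this module.

```python
import os
import struct

FMT_IV = ">8sL"  # assumed layout: 8 B fixed part + 4 B big-endian counter


def iv_make (fixed, cnt):
    """Pack fixed part (bytes) and object counter (int) into one IV."""
    return struct.pack (FMT_IV, fixed, cnt)


def iv_split (iv):
    """Inverse of iv_make(): recover fixed part and counter."""
    return struct.unpack (FMT_IV, iv)


fixed = os.urandom (8)   # random fixed part, as in next_fixed()
iv = iv_make (fixed, 3)  # counter value chosen arbitrarily
```

# Because the counter occupies the trailing four bytes, consecutive objects
# under one fixed part differ only in that suffix.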
cb7a3911 1172 def next (self, filename=None, counter=None):
704ceaa5
PG
1173 """
1174 Prepare for encrypting the next incoming object. Update the counter
1175 and put together the IV, possibly changing prefixes. Then create the
1176 new encryptor.
1177
1178 The argument ``counter`` can be used to specify a file counter for this
1179 object. Unless it is one of the reserved values, the counter of
1180 subsequent objects will be computed from this one.
1181
1182 If this is the first object in a series, ``filename`` is required,
1183 otherwise it is reused if not present. The value is used to derive a
1184 header sized placeholder to use until after encryption when all the
1185 inputs to construct the final header are available. This is then
1186 matched in ``.done()`` against the value found at the position of the
1187 header. The motivation for this extra check is primarily to assist
1188 format debugging: It makes stray headers easy to spot in malformed
1189 PDTCRYPT files.
1190 """
cb7a3911
PG
1191 if filename is None:
1192 if self.lastinfo is None:
1193 raise InvalidParameter ("next: filename is mandatory for "
1194 "first object")
1195 filename, _dummy = self.lastinfo
1196 else:
1197 if isinstance (filename, str) is False:
1198 raise InvalidParameter ("next: filename must be a string, not %s"
1199 % type (filename))
3031b7ae
PG
1200 if counter is not None:
1201 if isinstance (counter, int) is False:
1202 raise InvalidParameter ("next: the supplied counter is of "
1203 "invalid type %s; please pass an "
1204 "integer instead" % type (counter))
1205 self.set_object_counter (counter)
fac2cfe1 1206
50710d86 1207 self.iv = self.iv_make ()
72a42219 1208 if self.paramenc == "aes-gcm":
6178061e
PG
1209 self.enc = Cipher \
1210 ( algorithms.AES (self.key)
1211 , modes.GCM (self.iv)
1212 , backend = default_backend ()) \
1213 .encryptor ()
72a42219 1214 elif self.paramenc == "passthrough":
6178061e
PG
1215 self.enc = PassthroughCipher ()
1216 else:
b12110dd
PG
1217 raise InvalidParameter ("next: parameter version %d not known"
1218 % self.paramversion)
48db09ba
PG
1219 hdrdum = hdr_make_dummy (filename)
1220 self.lastinfo = (filename, hdrdum)
30019abf 1221 super().next (self.password, self.paramversion, self.nacl, self.iv)
72a42219 1222
3031b7ae 1223 self.set_object_counter (self.cnt + 1)
48db09ba 1224 return hdrdum
a393d9cb 1225
a393d9cb 1226
cd77dadb 1227 def done (self, cmpdata):
704ceaa5
PG
1228 """
1229 Complete encryption of an object. After this has been called, attempts
1230 to encrypt further data will cause an error until ``.next()`` is
1231 invoked properly.
1232
1233 Returns a triple of the remaining ciphertext, the 64 byte object
1234 header including the “late” values, e. g. the ciphertext size and the
1235 GCM tag, and the list of IV fixed parts used so far.
1236 """
36b9932a
PG
1237 if isinstance (cmpdata, bytes) is False:
1238 raise InvalidParameter ("done: comparison input expected as bytes, "
1239 "not %s" % type (cmpdata))
cb7a3911
PG
1240 if self.lastinfo is None:
1241 raise RuntimeError ("done: encryption context not initialized")
48db09ba
PG
1242 filename, hdrdum = self.lastinfo
1243 if cmpdata != hdrdum:
b12110dd
PG
1244 raise RuntimeError ("done: bad sync of header for object %d: "
1245 "preliminary data does not match; this likely "
1246 "indicates a wrongly repositioned stream"
1247 % self.cnt)
6178061e 1248 data = self.enc.finalize ()
633b18a9 1249 self.stats ["out"] += len (data)
cd77dadb 1250 self.ctsize += len (data)
48db09ba
PG
1251 ok, hdr = hdr_from_params (self.version, self.paramversion, self.nacl,
1252 self.iv, self.ctsize, self.enc.tag)
8a990744 1253 if ok is False:
b12110dd
PG
1254 raise InternalError ("error constructing header: %r" % hdr)
1255 return data, hdr, self.fixed
a393d9cb 1256
a393d9cb 1257
cd77dadb 1258 def process (self, buf):
704ceaa5
PG
1259 """
1260 Encrypt a chunk of plaintext with the active encryptor. Returns a pair
1261 of the size of the input consumed and the resulting ciphertext. The
1262 size **must** be checked downstream: if the maximum possible object
1263 size has been reached, the current context must be finalized and a new
1264 one established before any further data can be encrypted. The part of
1265 the buffer that was not consumed is left for the caller to resubmit
1266 once the new context is ready.
1267 """
36b9932a
PG
1268 if isinstance (buf, bytes) is False:
1269 raise InvalidParameter ("process: expected byte buffer, not %s"
1270 % type (buf))
cb7a3911
PG
1271 bsize = len (buf)
1272 newptsize = self.ptsize + bsize
1273 diff = newptsize - PDTCRYPT_MAX_OBJ_SIZE
1274 if diff > 0:
1275 bsize -= diff
1276 newptsize = PDTCRYPT_MAX_OBJ_SIZE
1277 self.ptsize = newptsize
1278 data = super().process (buf [:bsize])
cd77dadb 1279 self.ctsize += len (data)
cb7a3911 1280 return bsize, data
cd77dadb
PG
1281
1282
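# The size clamping in ``Encrypt.process()`` and the caller side handling of
# a partially consumed buffer can be sketched without any cryptography. The
# tiny ``MAX_OBJ_SIZE`` and the helpers ``clamp`` and ``feed`` are assumptions
# for illustration; the real limit is ``PDTCRYPT_MAX_OBJ_SIZE``.

```python
MAX_OBJ_SIZE = 16  # deliberately tiny, assumed limit for the example


def clamp (ptsize, buf):
    """Return (bytes to consume from buf, resulting plaintext size)."""
    bsize = len (buf)
    newptsize = ptsize + bsize
    diff = newptsize - MAX_OBJ_SIZE
    if diff > 0:  # buffer would exceed the object: take only what fits
        bsize -= diff
        newptsize = MAX_OBJ_SIZE
    return bsize, newptsize


def feed (chunks):
    """Distribute chunks over objects; return the payload of each object."""
    objects = [ b"" ]
    ptsize = 0
    for buf in chunks:
        while len (buf) > 0:
            n, ptsize = clamp (ptsize, buf)
            objects [-1] += buf [:n]
            buf = buf [n:]
            if len (buf) > 0:  # object full: finalize it, start the next
                objects.append (b"")
                ptsize = 0
    return objects
```

# In the real code the step "finalize it, start the next" corresponds to
# calling ``.done()`` followed by ``.next()`` on the encryption context.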
39accaaa 1283class Decrypt (Crypto):
a393d9cb 1284
3031b7ae 1285 tag = None # GCM tag, part of header
3031b7ae 1286 last_iv = None # check consecutive ivs in strict mode
39accaaa 1287
1f3fd7b0 1288 def __init__ (self, password=None, key=None, counter=None, fixedparts=None,
ee6aa239 1289 strict_ivs=False):
704ceaa5
PG
1290 """
1291 Sanitizing ctor for the decryption context. ``fixedparts`` specifies a
1292 list of IV fixed parts accepted during decryption. If a fixed part is
1293 encountered that is not in the list, decryption will fail.
1294
1295 :param password: mutually exclusive with ``key``
1296 :type password: str
1297 :param key: mutually exclusive with ``password``
1298 :type key: bytes
1299 :param counter: initial object counter; the values
1300 ``AES_GCM_IV_CNT_INFOFILE`` and
1301 ``AES_GCM_IV_CNT_INDEX`` are unique in each backup set
1302 and cannot be reused even with different fixed parts.
1303 :type fixedparts: bytes list
1304 """
1f3fd7b0
PG
1305 if password is None and key is None \
1306 or password is not None and key is not None :
1307 raise InvalidParameter ("__init__: need either key or password")
1308
1309 if key is not None:
1310 if isinstance (key, bytes) is False:
1311 raise InvalidParameter ("__init__: key must be provided as "
1312 "bytes, not %s" % type (key))
1313 else: # password, no key
1314 if isinstance (password, str) is False:
1315 raise InvalidParameter ("__init__: password must be a string, not %s"
1316 % type (password))
1317 if len (password) == 0:
1318 raise InvalidParameter ("__init__: supplied empty password but not "
1319 "permitted for PDT encrypted files")
36b9932a 1320 # fixed parts
50710d86 1321 if fixedparts is not None:
36b9932a
PG
1322 if isinstance (fixedparts, list) is False:
1323 raise InvalidParameter ("__init__: IV fixed parts must be "
1324 "supplied as list, not %s"
1325 % type (fixedparts))
b12110dd
PG
1326 self.fixed = fixedparts
1327 self.fixed.sort ()
ee6aa239 1328
a83fa4ed
PG
1329 super().__init__ (password=password, key=key, counter=counter,
1330 strict_ivs=strict_ivs)
39accaaa
PG
1331
1332
b12110dd 1333 def valid_fixed_part (self, iv):
704ceaa5
PG
1334 """
1335 Check whether the fixed part of an IV is in the list of accepted ones.
1336 """
50710d86 1337 # check if fixed part is known
b12110dd
PG
1338 fixed, _cnt = struct.unpack (FMT_I2N_IV, iv)
1339 i = bisect.bisect_left (self.fixed, fixed)
1340 return i != len (self.fixed) and self.fixed [i] == fixed
50710d86
PG
1341
1342
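# The membership test in ``valid_fixed_part()`` relies on the list of fixed
# parts being sorted, which the constructor ensures. A minimal sketch of the
# same binary search; the helper name ``contains_sorted`` and the sample
# values are assumptions.

```python
import bisect


def contains_sorted (haystack, needle):
    """Binary search for needle in the sorted list haystack."""
    i = bisect.bisect_left (haystack, needle)
    return i != len (haystack) and haystack [i] == needle


# eight byte fixed parts, sorted once up front as in Decrypt.__init__()
accepted = sorted ([ b"\x02" * 8, b"\x01" * 8, b"\xff" * 8 ])
```

# ``bisect_left`` returns the insertion point, so the element found there
# still has to be compared against the needle.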
ee6aa239 1343 def check_consecutive_iv (self, iv):
704ceaa5
PG
1344 """
1345 Check whether the counter part of the given IV is indeed the successor
1346 of the currently present counter. This should always be the case for
1347 the objects in a well formed PDT archive but should not be enforced
1348 when decrypting out-of-order.
1349 """
ee6aa239 1350 fixed, cnt = struct.unpack (FMT_I2N_IV, iv)
3031b7ae
PG
1351 if self.strict_ivs is True \
1352 and self.last_iv is not None \
ee6aa239
PG
1353 and self.last_iv [0] == fixed \
1354 and self.last_iv [1] != cnt - 1:
f6cd676f 1355 raise NonConsecutiveIV ("iv %s counter not successor of "
ee6aa239 1356 "last object (expected %d, found %d)"
afa13ebc 1357 % (iv_fmt (iv), self.last_iv [1] + 1, cnt))
ee6aa239
PG
1358 self.last_iv = (fixed, cnt)
1359
1360
79782fa9 1361 def next (self, hdr):
704ceaa5
PG
1362 """
1363 Start decrypting the next object. The PDTCRYPT header for the object
1364 can be given either as already parsed object or as bytes.
1365 """
dccfe104
PG
1366 if isinstance (hdr, bytes) is True:
1367 hdr = hdr_read (hdr)
36b9932a
PG
1368 elif isinstance (hdr, dict) is False:
1369 # this won’t catch malformed specs though
1370 raise InvalidParameter ("next: wrong type of parameter hdr: "
1371 "expected bytes or spec, got %s"
fbfda3d4 1372 % type (hdr))
36b9932a
PG
1373 try:
1374 paramversion = hdr ["paramversion"]
1375 nacl = hdr ["nacl"]
1376 iv = hdr ["iv"]
1377 tag = hdr ["tag"]
1378 except KeyError:
1379 raise InvalidHeader ("next: not a header %r" % hdr)
1380
30019abf 1381 super().next (self.password, paramversion, nacl, iv)
b12110dd 1382 if self.fixed is not None and self.valid_fixed_part (iv) is False:
f6cd676f
PG
1383 raise InvalidIVFixedPart ("iv %s has invalid fixed part"
1384 % iv_fmt (iv))
3031b7ae 1385 self.check_consecutive_iv (iv)
ee6aa239 1386
36b9932a 1387 self.tag = tag
b12110dd
PG
1388 defs = ENCRYPTION_PARAMETERS.get (paramversion, None)
1389 if defs is None:
1390 raise FormatError ("header contains unknown parameter version %d; "
1391 "maybe the file was created by a more recent "
1392 "version of Deltatar" % paramversion)
50710d86 1393 enc = defs ["enc"]
6178061e
PG
1394 if enc == "aes-gcm":
1395 self.enc = Cipher \
1396 ( algorithms.AES (self.key)
36b9932a 1397 , modes.GCM (iv, tag=self.tag)
6178061e
PG
1398 , backend = default_backend ()) \
1399 . decryptor ()
1400 elif enc == "passthrough":
1401 self.enc = PassthroughCipher ()
1402 else:
b12110dd
PG
1403 raise InternalError ("encryption parameter set %d refers to unknown "
1404 "mode %r" % (paramversion, enc))
f484f2d1 1405 self.set_object_counter (self.cnt + 1)
39accaaa
PG
1406
1407
db1f3ac7 1408 def done (self, tag=None):
704ceaa5
PG
1409 """
1410 Stop decryption of the current object and finalize it with the active
1411 context. This will throw an ``InvalidGCMTag`` exception to indicate that
1412 the authentication tag does not match the data. If the tag is correct,
1413 the rest of the plaintext is returned.
1414 """
633b18a9 1415 data = b""
db1f3ac7
PG
1416 try:
1417 if tag is None:
f484f2d1 1418 data = self.enc.finalize ()
db1f3ac7 1419 else:
36b9932a
PG
1420 if isinstance (tag, bytes) is False:
1421 raise InvalidParameter ("done: wrong type of parameter "
1422 "tag: expected bytes, got %s"
1423 % type (tag))
f484f2d1 1424 data = self.enc.finalize_with_tag (self.tag)
b0078f26 1425 except cryptography.exceptions.InvalidTag:
f08c604b 1426 raise InvalidGCMTag ("done: tag mismatch of object %d: %s "
b0078f26 1427 "rejected by finalize ()"
f08c604b 1428 % (self.cnt, binascii.hexlify (self.tag)))
50710d86 1429 self.ctsize += len (data)
633b18a9 1430 self.stats ["out"] += len (data)
b0078f26 1431 return data
00b3cd10
PG
1432
1433
47e27926 1434 def process (self, buf):
704ceaa5
PG
1435 """
1436 Decrypt the bytes object *buf* with the active decryptor.
1437 """
36b9932a
PG
1438 if isinstance (buf, bytes) is False:
1439 raise InvalidParameter ("process: expected byte buffer, not %s"
1440 % type (buf))
47e27926
PG
1441 self.ctsize += len (buf)
1442 data = super().process (buf)
1443 self.ptsize += len (data)
1444 return data
1445
1446
00b3cd10 1447###############################################################################
770173c5
PG
1448## testing helpers
1449###############################################################################
1450
cb7a3911 1451def _patch_global (glob, vow, n=None):
770173c5
PG
1452 """
1453 Adapt upper file counter bound for testing IV logic. Completely unsafe.
1454 """
1455 assert vow == "I am fully aware that this will void my warranty."
cb7a3911
PG
1456 r = globals () [glob]
1457 if n is None:
1458 n = globals () [glob + "_DEFAULT"]
1459 globals () [glob] = n
770173c5
PG
1460 return r
1461
cb7a3911
PG
1462_testing_set_AES_GCM_IV_CNT_MAX = \
1463 partial (_patch_global, "AES_GCM_IV_CNT_MAX")
1464
1465_testing_set_PDTCRYPT_MAX_OBJ_SIZE = \
1466 partial (_patch_global, "PDTCRYPT_MAX_OBJ_SIZE")
1467
a808459e
PG
1468def open2_dump_file (fname, dir_fd, force=False):
1469 outfd = -1
1470
1471 oflags = os.O_CREAT | os.O_WRONLY
6690f5e0 1472 if force is True:
a808459e
PG
1473 oflags |= os.O_TRUNC
1474 else:
1475 oflags |= os.O_EXCL
1476
1477 try:
1478 outfd = os.open (fname, oflags,
1479 stat.S_IRUSR | stat.S_IWUSR, dir_fd=dir_fd)
1480 except FileExistsError as exn:
1481 noise ("PDT: refusing to overwrite existing file %s" % fname)
1482 noise ("")
1483 raise RuntimeError ("destination file %s already exists" % fname)
1484 if PDTCRYPT_VERBOSE is True:
1485 noise ("PDT: new output file %s (fd=%d)" % (fname, outfd))
1486
1487 return outfd
1488
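# The overwrite protection of ``open2_dump_file()`` rests on ``O_EXCL``. A
# minimal sketch of the same flag logic, run against a temporary directory;
# the helper name ``create_exclusive`` and the file name are assumptions, and
# ``dir_fd`` support is assumed to be available (POSIX systems).

```python
import os
import stat
import tempfile


def create_exclusive (fname, dir_fd, force=False):
    """Open fname relative to dir_fd; refuse to clobber unless forced."""
    oflags = os.O_CREAT | os.O_WRONLY
    oflags |= os.O_TRUNC if force is True else os.O_EXCL
    return os.open (fname, oflags, stat.S_IRUSR | stat.S_IWUSR,
                    dir_fd=dir_fd)


with tempfile.TemporaryDirectory () as tmp:
    dfd = os.open (tmp, os.O_RDONLY)
    try:
        os.close (create_exclusive ("obj.bin", dfd))  # first creation
        try:
            os.close (create_exclusive ("obj.bin", dfd))
            refused = False
        except FileExistsError:  # O_EXCL: the file is already there
            refused = True
        # with force=True the existing file is truncated instead
        os.close (create_exclusive ("obj.bin", dfd, force=True))
    finally:
        os.close (dfd)
```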
770173c5 1489###############################################################################
00b3cd10
PG
1490## freestanding invocation
1491###############################################################################
1492
da82bc58
PG
1493PDTCRYPT_SUB_PROCESS = 0
1494PDTCRYPT_SUB_SCRYPT = 1
f41973a6 1495PDTCRYPT_SUB_SCAN = 2
da82bc58
PG
1496
1497PDTCRYPT_SUB = \
1498 { "process" : PDTCRYPT_SUB_PROCESS
f41973a6
PG
1499 , "scrypt" : PDTCRYPT_SUB_SCRYPT
1500 , "scan" : PDTCRYPT_SUB_SCAN }
da82bc58 1501
e3abcdf0
PG
1502PDTCRYPT_DECRYPT = 1 << 0 # decrypt archive with password
1503PDTCRYPT_SPLIT = 1 << 1 # split archive into individual objects
da82bc58 1504PDTCRYPT_HASH = 1 << 2 # output scrypt hash for file and given password
e3abcdf0 1505
a808459e
PG
1506PDTCRYPT_SPLITNAME = "pdtcrypt-object-%d.bin"
1507PDTCRYPT_RESCUENAME = "pdtcrypt-rescue-object-%0.5d.bin"
e3abcdf0 1508
70ad9458 1509PDTCRYPT_VERBOSE = False
ee6aa239 1510PDTCRYPT_STRICTIVS = False
b07633d3 1511PDTCRYPT_OVERWRITE = False
15d3eefd 1512PDTCRYPT_BLOCKSIZE = 1 << 12
70ad9458
PG
1513PDTCRYPT_SINK = 0
1514PDTCRYPT_SOURCE = 1
1515SELF = None
1516
77058bab
PG
1517PDTCRYPT_DEFAULT_VER = 1
1518PDTCRYPT_DEFAULT_PVER = 1
1519
7b3940e5
PG
1520# scrypt hashing output control
1521PDTCRYPT_SCRYPT_INTRANATOR = 0
1522PDTCRYPT_SCRYPT_PARAMETERS = 1
4f6405d6 1523PDTCRYPT_SCRYPT_DEFAULT = PDTCRYPT_SCRYPT_INTRANATOR
7b3940e5
PG
1524
1525PDTCRYPT_SCRYPT_FORMAT = \
1526 { "i2n" : PDTCRYPT_SCRYPT_INTRANATOR
1527 , "params" : PDTCRYPT_SCRYPT_PARAMETERS }
1528
4c62ddc0 1529PDTCRYPT_TT_COLUMNS = 80 # assume standard terminal
15d3eefd
PG
1530
1531class PDTDecryptionError (Exception):
1532 """Decryption failed."""
1533
e3abcdf0
PG
1534class PDTSplitError (Exception):
1535 """Splitting failed."""
1536
15d3eefd
PG
1537
1538def noise (*a, **b):
591a722f 1539 print (file=sys.stderr, *a, **b)
15d3eefd
PG
1540
1541
89e1073c
PG
1542class PassthroughDecryptor (object):
1543
1544 curhdr = None # write current header on first data write
1545
1546 def __init__ (self):
1547 if PDTCRYPT_VERBOSE is True:
1548 noise ("PDT: no encryption; data passthrough")
1549
1550 def next (self, hdr):
1551 ok, curhdr = hdr_make (hdr)
1552 if ok is False:
1553 raise PDTDecryptionError ("bad header %r" % hdr)
1554 self.curhdr = curhdr
1555
1556 def done (self):
1557 if self.curhdr is not None:
1558 return self.curhdr
1559 return b""
1560
1561 def process (self, d):
1562 if self.curhdr is not None:
1563 d = self.curhdr + d
1564 self.curhdr = None
1565 return d
1566
1567
a83fa4ed 1568def depdtcrypt (mode, secret, ins, outs):
15d3eefd 1569 """
a83fa4ed
PG
1570 Remove PDTCRYPT layer from all objects encrypted with the secret. Used on a
1571 Deltatar backup this will yield a (possibly Gzip compressed) tarball.
15d3eefd
PG
1572 """
1573 ctleft = -1 # length of ciphertext to consume
1574 ctcurrent = 0 # total ciphertext of current object
15d3eefd
PG
1575 total_obj = 0 # total number of objects read
1576 total_pt = 0 # total plaintext bytes
1577 total_ct = 0 # total ciphertext bytes
1578 total_read = 0 # total bytes read
e3abcdf0
PG
1579 outfile = None # Python file object for output
1580
89e1073c 1581 if mode & PDTCRYPT_DECRYPT: # decryptor
a83fa4ed
PG
1582 ks = secret [0]
1583 if ks == PDTCRYPT_SECRET_PW:
1584 decr = Decrypt (password=secret [1], strict_ivs=PDTCRYPT_STRICTIVS)
1585 elif ks == PDTCRYPT_SECRET_KEY:
6257d5b3 1586 key = secret [1]
a83fa4ed
PG
1587 decr = Decrypt (key=key, strict_ivs=PDTCRYPT_STRICTIVS)
1588 else:
1589 raise InternalError ("‘%d’ does not specify a valid kind of secret"
1590 % ks)
89e1073c
PG
1591 else:
1592 decr = PassthroughDecryptor ()
1593
e3abcdf0
PG
1594 def nextout (_):
1595 """Dummy for non-split mode: output file does not vary."""
1596 return outs
1597
1598 if mode & PDTCRYPT_SPLIT:
1599 def nextout (outfile):
1600 """
1601 We were passed an fd as outs for accessing the destination
1602 directory in which the extracted archive components are supposed
1603 to end up.
1604 """
1605
1606 if outfile is None:
1607 if PDTCRYPT_VERBOSE is True:
1608 noise ("PDT: no output file to close at this point")
77058bab
PG
1609 else:
1610 if PDTCRYPT_VERBOSE is True:
1611 noise ("PDT: release output file %r" % outfile)
e3abcdf0
PG
1612 # cleanup happens automatically by the GC; the next
1613 # line will error out on account of an invalid fd
1614 #outfile.close ()
1615
1616 assert total_obj > 0
1617 fname = PDTCRYPT_SPLITNAME % total_obj
1618 try:
a808459e
PG
1619 outfd = open2_dump_file (fname, outs, force=PDTCRYPT_OVERWRITE)
1620 except RuntimeError as exn:
1621 raise PDTSplitError (exn)
e3abcdf0
PG
1622 return os.fdopen (outfd, "wb", closefd=True)
1623
15d3eefd 1624
47d22679 1625 def tell (s):
b09a99eb 1626 """ESPIPE is normal on non-seekable stdio stream."""
47d22679
PG
1627 try:
1628 return s.tell ()
1629 except OSError as exn:
2a307f41 1630 if exn.errno == errno.ESPIPE:
47d22679
PG
1631 return -1
1632
e3abcdf0 1633 def out (pt, outfile):
15d3eefd
PG
1634 npt = len (pt)
1635 nonlocal total_pt
1636 total_pt += npt
70ad9458 1637 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1638 noise ("PDT:\t· decrypt plaintext %d B" % (npt))
1639 try:
e3abcdf0 1640 nn = outfile.write (pt)
15d3eefd
PG
1641 except OSError as exn: # probably ENOSPC
1642 raise DecryptionError ("error (%s)" % exn)
1643 if nn != npt:
1644 raise DecryptionError ("write aborted after %d of %d B" % (nn, npt))
1645
1646 while True:
1647 if ctleft <= 0:
1648 # current object completed; in a valid archive this marks either
1649 # the start of a new header or the end of the input
1650 if ctleft == 0: # current object requires finalization
70ad9458 1651 if PDTCRYPT_VERBOSE is True:
47d22679 1652 noise ("PDT: %d finalize" % tell (ins))
5d394c0d
PG
1653 try:
1654 pt = decr.done ()
1655 except InvalidGCMTag as exn:
f08c604b
PG
1656 raise DecryptionError ("error finalizing object %d (%d B): "
1657 "%r" % (total_obj, len (pt), exn)) \
1658 from exn
e3abcdf0 1659 out (pt, outfile)
70ad9458 1660 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1661 noise ("PDT:\t· object validated")
1662
70ad9458 1663 if PDTCRYPT_VERBOSE is True:
47d22679 1664 noise ("PDT: %d hdr" % tell (ins))
15d3eefd
PG
1665 try:
1666 hdr = hdr_read_stream (ins)
dd47d6a2 1667 total_read += PDTCRYPT_HDR_SIZE
ae3d0f2a
PG
1668 except EndOfFile as exn:
1669 total_read += exn.remainder
dd47d6a2 1670 if total_ct + total_obj * PDTCRYPT_HDR_SIZE != total_read:
15d3eefd
PG
1671 raise PDTDecryptionError ("ciphertext processed (%d B) plus "
1672 "overhead (%d × %d B) does not match "
1673 "the number of bytes read (%d)"
dd47d6a2 1674 % (total_ct, total_obj, PDTCRYPT_HDR_SIZE,
15d3eefd
PG
1675 total_read))
1676 # the single good exit
1677 return total_read, total_obj, total_ct, total_pt
1678 except InvalidHeader as exn:
1679 raise PDTDecryptionError ("invalid header at position %d in %r "
ee6aa239 1680 "(%s)" % (tell (ins), ins, exn))
70ad9458 1681 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1682 pretty = hdr_fmt_pretty (hdr)
1683 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1684 pretty.splitlines (), ""))
1685 ctcurrent = ctleft = hdr ["ctsize"]
89e1073c 1686
15d3eefd 1687 decr.next (hdr)
e3abcdf0
PG
1688
1689 total_obj += 1 # used in file counter with split mode
1690
1691 # finalization complete or skipped in case of first object in
1692 # stream; create a new output file if necessary
1693 outfile = nextout (outfile)
15d3eefd 1694
70ad9458 1695 if PDTCRYPT_VERBOSE is True:
15d3eefd 1696 noise ("PDT: %d decrypt obj no. %d, %d B"
47d22679 1697 % (tell (ins), total_obj, ctleft))
15d3eefd
PG
1698
1699 # always allocate a new buffer since python-cryptography doesn’t allow
1700 # passing a bytearray :/
1701 nexpect = min (ctleft, PDTCRYPT_BLOCKSIZE)
70ad9458 1702 if PDTCRYPT_VERBOSE is True:
15d3eefd 1703 noise ("PDT:\t· [%d] %d%% done, read block (%d B of %d B remaining)"
47d22679 1704 % (tell (ins),
15d3eefd
PG
1705 100 - ctleft * 100 / (ctcurrent > 0 and ctcurrent or 1),
1706 nexpect, ctleft))
1707 ct = ins.read (nexpect)
1708 nct = len (ct)
1709 if nct < nexpect:
47d22679 1710 off = tell (ins)
ae3d0f2a
PG
1711 raise EndOfFile (nct,
1712 "hit EOF after %d of %d B in block [%d:%d); "
15d3eefd
PG
1713 "%d B ciphertext remaining for object no %d"
1714 % (nct, nexpect, off, off + nexpect, ctleft,
1715 total_obj))
1716 ctleft -= nct
1717 total_ct += nct
1718 total_read += nct
1719
70ad9458 1720 if PDTCRYPT_VERBOSE is True:
15d3eefd
PG
1721 noise ("PDT:\t· decrypt ciphertext %d B" % (nct))
1722 pt = decr.process (ct)
e3abcdf0 1723 out (pt, outfile)
15d3eefd 1724
d6c15a52 1725
70ad9458 1726def deptdcrypt_mk_stream (kind, path):
d6c15a52 1727 """Create stream from file or stdio descriptor."""
70ad9458 1728 if kind == PDTCRYPT_SINK:
d6c15a52 1729 if path == "-":
70ad9458 1730 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: stdout")
d6c15a52
PG
1731 return sys.stdout.buffer
1732 else:
70ad9458 1733 if PDTCRYPT_VERBOSE is True: noise ("PDT: sink: file %s" % path)
d6c15a52 1734 return io.FileIO (path, "w")
70ad9458 1735 if kind == PDTCRYPT_SOURCE:
d6c15a52 1736 if path == "-":
70ad9458 1737 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: stdin")
d6c15a52
PG
1738 return sys.stdin.buffer
1739 else:
70ad9458 1740 if PDTCRYPT_VERBOSE is True: noise ("PDT: source: file %s" % path)
d6c15a52
PG
1741 return io.FileIO (path, "r")
1742
1743 raise ValueError ("bogus stream “%s” / %s" % (kind, path))
1744
15d3eefd 1745
a83fa4ed 1746def mode_depdtcrypt (mode, secret, ins, outs):
da82bc58
PG
1747 try:
1748 total_read, total_obj, total_ct, total_pt = \
a83fa4ed 1749 depdtcrypt (mode, secret, ins, outs)
da82bc58
PG
1750 except DecryptionError as exn:
1751 noise ("PDT: Decryption failed:")
1752 noise ("PDT:")
1753 noise ("PDT: “%s”" % exn)
1754 noise ("PDT:")
a83fa4ed 1755 noise ("PDT: Did you specify the correct key / password?")
da82bc58
PG
1756 noise ("")
1757 return 1
1758 except PDTSplitError as exn:
1759 noise ("PDT: Split operation failed:")
1760 noise ("PDT:")
1761 noise ("PDT: “%s”" % exn)
1762 noise ("PDT:")
a83fa4ed 1763 noise ("PDT: Hint: target directory should be empty.")
da82bc58
PG
1764 noise ("")
1765 return 1
1766
1767 if PDTCRYPT_VERBOSE is True:
1768 noise ("PDT: decryption successful" )
1769 noise ("PDT: %.10d bytes read" % total_read)
1770 noise ("PDT: %.10d objects decrypted" % total_obj )
1771 noise ("PDT: %.10d bytes ciphertext" % total_ct )
1772 noise ("PDT: %.10d bytes plaintext" % total_pt )
1773 noise ("" )
1774
1775 return 0
1776
1777
7b3940e5 1778def mode_scrypt (pw, ins=None, nacl=None, fmt=PDTCRYPT_SCRYPT_INTRANATOR):
77058bab 1779 hsh = None
7b3940e5 1780 paramversion = PDTCRYPT_DEFAULT_PVER
77058bab
PG
1781 if ins is not None:
1782 hsh, nacl, version, paramversion = scrypt_hashsource (pw, ins)
1783 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
1784 else:
1785 nacl = binascii.unhexlify (nacl)
7b3940e5 1786 defs = ENCRYPTION_PARAMETERS.get(paramversion, None)
77058bab
PG
1787 version = PDTCRYPT_DEFAULT_VER
1788
1789 kdfname, params = defs ["kdf"]
1790 if hsh is None:
1791 kdf = kdf_by_version (None, defs)
1792 hsh, _void = kdf (pw, nacl)
da82bc58
PG
1793
1794 import json
7b3940e5
PG
1795
1796 if fmt == PDTCRYPT_SCRYPT_INTRANATOR:
1797 out = json.dumps ({ "salt" : base64.b64encode (nacl).decode ()
1798 , "key" : base64.b64encode (hsh) .decode ()
1799 , "paramversion" : paramversion })
1800 elif fmt == PDTCRYPT_SCRYPT_PARAMETERS:
1801 out = json.dumps ({ "salt" : binascii.hexlify (nacl).decode ()
1802 , "key" : binascii.hexlify (hsh) .decode ()
1803 , "version" : version
1804 , "scrypt_params" : { "N" : params ["N"]
1805 , "r" : params ["r"]
1806 , "p" : params ["p"]
1807 , "dkLen" : params ["dkLen"] } })
1808 else:
1809 raise RuntimeError ("bad scrypt output scheme %r" % fmt)
1810
da82bc58
PG
1811 print (out)
1812
1813
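# Illustration only, not part of crypto.py: a minimal sketch of the two JSON
# layouts that mode_scrypt prints. The helper name format_scrypt_result, the
# format names "default" / "parameters" (taken from the usage text) and the
# placeholder version numbers are assumptions; the real code derives the key
# with the configured KDF and reads the parameters from ENCRYPTION_PARAMETERS.

```python
import base64
import binascii
import json


def format_scrypt_result(salt, key, fmt="default", paramversion=1,
                         version=1, params=None):
    """Serialize salt and derived key as base64 or hex JSON (sketch)."""
    if fmt == "default":
        # compact layout: base64-encoded salt and key plus parameter version
        return json.dumps({"salt": base64.b64encode(salt).decode(),
                           "key": base64.b64encode(key).decode(),
                           "paramversion": paramversion})
    if fmt == "parameters":
        # verbose layout: hex-encoded salt and key plus the scrypt parameters
        return json.dumps({"salt": binascii.hexlify(salt).decode(),
                           "key": binascii.hexlify(key).decode(),
                           "version": version,
                           "scrypt_params": params or {}})
    raise RuntimeError("bad scrypt output scheme %r" % fmt)
```

# Both layouts carry the same salt and key; only the encoding and the
# accompanying metadata differ.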
4c62ddc0
PG
1814def noise_output_candidates (cands, indent=8, cols=PDTCRYPT_TT_COLUMNS):
1815 """
1816 Print a list of offsets without garbling the terminal too much.
1817
1818 The indent is counted from column zero; if it is wide enough, the “PDT: ”
1819 marker will be prepended, considered part of the indentation.
1820 """
1821 wd = cols - 1
1822 nc = len (cands)
1823 idt = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
1824 line = idt
1825 lpos = indent
1826 sep = ","
1827 lsep = len (sep)
1828 init = True # prevent leading separator
1829
1830 if indent >= wd:
1831 raise ValueError ("the requested indentation exceeds the line "
1832 "width by %d" % (indent - wd))
1833
1834 for n in cands:
1835 ns = "%d" % n
1836 lns = len (ns)
1837 if init is False:
1838 line += sep
1839 lpos += lsep
1840
1841 lpos += lns
1842 if lpos > wd: # line break
1843 noise (line)
1844 line = idt
1845 lpos = indent + lns
1846 elif init is True:
1847 init = False
1848 else: # space
1849 line += ' '
1850 lpos += 1
1851
1852 line += ns
1853
1854 if lpos != indent:
1855 noise (line)
1856
1857
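# Illustration only, not part of crypto.py: the wrapping logic of
# noise_output_candidates restated as a function that returns the lines
# instead of printing them, which makes the behaviour easy to check. The
# name wrap_offsets and the default of 80 columns are assumptions.

```python
def wrap_offsets(cands, indent=8, cols=80):
    """Return the wrapped, comma-separated lines for a list of offsets."""
    wd = cols - 1
    if indent >= wd:
        raise ValueError("the requested indentation exceeds the line "
                         "width by %d" % (indent - wd))

    # the marker counts as part of the indentation when it fits
    idt = " " * indent if indent < 5 else "PDT: " + " " * (indent - 5)
    lines = []
    line, lpos, first = idt, indent, True

    for n in cands:
        ns = "%d" % n
        if not first:               # separator after every item but the first
            line += ","
            lpos += 1
        lpos += len(ns)
        if lpos > wd:               # item does not fit: flush, start new line
            lines.append(line)
            line, lpos = idt, indent + len(ns)
        elif first:
            first = False
        else:                       # item fits: pad after the separator
            line += " "
            lpos += 1
        line += ns

    if lpos != indent:              # flush the last, partially filled line
        lines.append(line)
    return lines
```

# Note that a line that was wrapped keeps its trailing comma, as in the
# original function.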
15047fe4
PG
1858SLICE_START = 1 # ordering is important: SLICE_END < SLICE_START, so ends of
1859SLICE_END = 0 # intervals are sorted before equal starts
1860
1861def find_overlaps (slices):
1862 """
1863 Find overlapping slices: iterate open/close points of intervals, tracking
1864 the ones open at any time.
1865 """
1866 bounds = []
1867 inside = set () # of indices into slices
1868 ovrlp = set () # of indices into slices
1869
1870 for i, s in enumerate (slices):
1871 bounds.append ((s [0], SLICE_START, i))
1872 bounds.append ((s [1], SLICE_END , i))
1873 bounds = sorted (bounds)
1874
1875 for val in bounds:
1876 i = val [2]
1877 if val [1] == SLICE_START:
1878 inside.add (i)
1879 else:
1880 if len (inside) > 1: # closing one that overlapped
1881 ovrlp |= inside
1882 inside.remove (i)
1883
1884 return [ slices [i] for i in ovrlp ]
1885
1886
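# Illustration only, not part of crypto.py: a self-contained restatement of
# the sweep used by find_overlaps, assuming every slice is a non-empty
# (start, end) pair. Because the end marker (0) is smaller than the start
# marker (1), an end sorts before a start at the same offset, so slices that
# merely touch are not reported. The name overlapping and the sorted return
# value are assumptions made for easier testing.

```python
START, END = 1, 0   # END < START: at equal offsets, ends are handled first


def overlapping(slices):
    """Return the slices that overlap at least one other slice (sketch)."""
    bounds = []
    for i, (lo, hi) in enumerate(slices):
        bounds.append((lo, START, i))
        bounds.append((hi, END, i))

    inside = set()   # indices of slices currently open
    ovrlp = set()    # indices of slices found to overlap another one

    for _pos, kind, i in sorted(bounds):
        if kind == START:
            inside.add(i)
        else:
            if len(inside) > 1:     # closing while others are still open
                ovrlp |= inside
            inside.remove(i)

    return sorted(slices[i] for i in ovrlp)
```

# Example: (0, 5) and (5, 10) share only a boundary and are not reported,
# whereas (0, 6) and (5, 10) both are.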
a808459e 1887def mode_scan (secret, fname, outs=None, nacl=None):
f41973a6
PG
1888 """
1889 Dissect a binary file, looking for PDTCRYPT headers and objects.
a808459e
PG
1890
1891 If *outs* is supplied, recoverable data will be dumped into the specified
1892 directory.
f41973a6
PG
1893 """
1894 try:
a808459e 1895 ifd = os.open (fname, os.O_RDONLY)
f41973a6
PG
1896 except FileNotFoundError:
1897 noise ("PDT: failed to open %s readonly" % fname)
1898 noise ("")
1899 usage (err=True)
1900
1901 try:
1902 if PDTCRYPT_VERBOSE is True:
1903 noise ("PDT: scan for potential sync points")
a808459e 1904 cands = locate_hdr_candidates (ifd)
f41973a6
PG
1905 if len (cands) == 0:
1906 noise ("PDT: scan complete: input does not contain potential PDT "
1907 "headers; giving up.")
1908 return -1
1909 if PDTCRYPT_VERBOSE is True:
4c62ddc0
PG
1910 noise ("PDT: scan complete: found %d candidates:" % len (cands))
1911 noise_output_candidates (cands)
6c8073ab 1912 except:
a808459e 1913 os.close (ifd)
6c8073ab 1914 raise
f41973a6 1915
15047fe4 1916 junk, todo, slices = [], [], []
6c8073ab 1917 try:
a808459e 1918 nobj = 0
6c8073ab 1919 for cand in cands:
a808459e
PG
1920 nobj += 1
1921 vdt, hdr = inspect_hdr (ifd, cand)
15047fe4 1922
5ed4c57d
PG
1923 vdts = verdict_fmt (vdt)
1924
6c8073ab 1925 if vdt == HDR_CAND_JUNK:
5ed4c57d 1926 noise ("PDT: obj %d: %s object: bad header, skipping" % (nobj, vdts))
6c8073ab
PG
1927 junk.append (cand)
1928 else:
1929 off0 = cand + PDTCRYPT_HDR_SIZE
1930 if PDTCRYPT_VERBOSE is True:
a808459e 1931 noise ("PDT: obj %d: read payload @%d" % (nobj, off0))
70a33834
PG
1932 pretty = hdr_fmt_pretty (hdr)
1933 noise (reduce (lambda a, e: (a + "\n" if a else "") + "PDT:\t· " + e,
1934 pretty.splitlines (), ""))
6c8073ab 1935
a808459e
PG
1936 ofd = -1
1937 if outs is not None:
1938 ofname = PDTCRYPT_RESCUENAME % nobj
1939 ofd = open2_dump_file (ofname, outs, force=PDTCRYPT_OVERWRITE)
1940
15047fe4 1941 ctsize = hdr ["ctsize"]
a808459e 1942 try:
15047fe4
PG
1943 l = try_decrypt (ifd, off0, hdr, secret, ofd=ofd)
1944 ok = l == ctsize
1945 slices.append ((off0, off0 + l))
a808459e
PG
1946 finally:
1947 if ofd != -1:
1948 os.close (ofd)
70a33834 1949 if vdt == HDR_CAND_GOOD and ok is True:
5ed4c57d
PG
1950 noise ("PDT: %d → ✓ %s object %d–%d"
1951 % (cand, vdts, off0, off0 + ctsize))
70a33834 1952 elif vdt == HDR_CAND_FISHY and ok is True:
5ed4c57d
PG
1953 noise ("PDT: %d → × %s object %d–%d, corrupt header"
1954 % (cand, vdts, off0, off0 + ctsize))
70a33834 1955 elif vdt == HDR_CAND_GOOD and ok is False:
5ed4c57d
PG
1956 noise ("PDT: %d → × %s object %d–%d, problematic payload"
1957 % (cand, vdts, off0, off0 + ctsize))
70a33834 1958 elif vdt == HDR_CAND_FISHY and ok is False:
5ed4c57d
PG
1959 noise ("PDT: %d → × %s object %d–%d, corrupt header, problematic "
1960 "ciphertext" % (cand, vdts, off0, off0 + ctsize))
6c8073ab
PG
1961 else:
1962 raise Unreachable
1963 finally:
a808459e 1964 os.close (ifd)
7b3940e5 1965
70a33834
PG
1966 if len (junk) == 0:
1967 noise ("PDT: all headers ok")
1968 else:
1969 noise ("PDT: %d candidates not parseable as headers:" % len (junk))
1970 noise_output_candidates (junk)
1971
15047fe4
PG
1972 overlap = find_overlaps (slices)
1973 if len (overlap) > 0:
1974 noise ("PDT: %d objects overlapping others" % len (overlap))
1975 for slice in overlap:
1976 noise ("PDT: × %d→%d" % (slice [0], slice [1]))
1977
70ad9458
PG
1978def usage (err=False):
1979 out = print
1980 if err is True:
1981 out = noise
5afcb45d 1982 indent = ' ' * len (SELF)
da82bc58 1983 out ("usage: %s SUBCOMMAND { --help" % SELF)
5afcb45d 1984 out (" %s | [ -v ] { -p PASSWORD | -k KEY }" % indent)
77058bab
PG
1985 out (" %s [ { -i | --in } { - | SOURCE } ]" % indent)
1986 out (" %s [ { -n | --nacl } { SALT } ]" % indent)
1987 out (" %s [ { -o | --out } { - | DESTINATION } ]" % indent)
1988 out (" %s [ -D | --no-decrypt ] [ -S | --split ]" % indent)
7b3940e5 1989 out (" %s [ -f | --format ]" % indent)
70ad9458
PG
1990 out ("")
1991 out ("\twhere")
da82bc58
PG
1992 out ("\t\tSUBCOMMAND main mode: { process | scrypt }")
1993 out ("\t\t where:")
1994 out ("\t\t process: extract objects from PDT archive")
1995 out ("\t\t scrypt: calculate hash from password and first object")
a83fa4ed
PG
1996 out ("\t\t-p PASSWORD password to derive the encryption key from")
1997 out ("\t\t-k KEY encryption key as 16 bytes in hexadecimal notation")
e3abcdf0 1998 out ("\t\t-s enforce strict handling of initialization vectors")
70ad9458
PG
1999 out ("\t\t-i SOURCE file name to read from")
2000 out ("\t\t-o DESTINATION file to write output to")
77058bab 2001 out ("\t\t-n SALT provide salt for scrypt mode in hex encoding")
70ad9458 2002 out ("\t\t-v print extra info")
e3abcdf0
PG
2003 out ("\t\t-S split into files at object boundaries; this")
2004 out ("\t\t requires DESTINATION to refer to directory")
2005 out ("\t\t-D PDT header and ciphertext passthrough")
7b3940e5 2006 out ("\t\t-f format of SCRYPT hash output (“default” or “parameters”)")
70ad9458
PG
2007 out ("")
2008 out ("\tinstead of filenames, “-” may be used to specify stdin / stdout")
2009 out ("")
2010 sys.exit ((err is True) and 42 or 0)
2011
2012
a83fa4ed
PG
2013def bail (msg):
2014 noise (msg)
2015 noise ("")
2016 usage (err=True)
2017 raise Unreachable
2018
2019
70ad9458 2020def parse_argv (argv):
6690f5e0 2021 global PDTCRYPT_OVERWRITE
70ad9458 2022 global SELF
7b3940e5
PG
2023 mode = PDTCRYPT_DECRYPT
2024 secret = None
2025 insspec = None
2026 outsspec = None
a808459e 2027 outs = None
7b3940e5 2028 nacl = None
4f6405d6 2029 scrypt_format = PDTCRYPT_SCRYPT_DEFAULT
70ad9458
PG
2030
2031 argvi = iter (argv)
2032 SELF = os.path.basename (next (argvi))
2033
da82bc58
PG
2034 try:
2035 rawsubcmd = next (argvi)
2036 subcommand = PDTCRYPT_SUB [rawsubcmd]
2037 except StopIteration:
a83fa4ed 2038 bail ("ERROR: subcommand required")
da82bc58 2039 except KeyError:
a83fa4ed 2040 bail ("ERROR: invalid subcommand “%s” specified" % rawsubcmd)
da82bc58 2041
59d74e2b
PG
2042 def checked_arg ():
2043 nonlocal argvi
2044 try:
2045 return next (argvi)
2046 except StopIteration:
2047 bail ("ERROR: argument list incomplete")
2048
addcec42 2049 def checked_secret (s):
a83fa4ed
PG
2050 nonlocal secret
2051 if secret is None:
addcec42 2052 secret = s
da82bc58 2053 else:
a83fa4ed 2054 bail ("ERROR: encountered “%s” but secret already given" % arg)
da82bc58 2055
70ad9458
PG
2056 for arg in argvi:
2057 if arg in [ "-h", "--help" ]:
2058 usage ()
2059 raise Unreachable
2060 elif arg in [ "-v", "--verbose", "--wtf" ]:
2061 global PDTCRYPT_VERBOSE
2062 PDTCRYPT_VERBOSE = True
2063 elif arg in [ "-i", "--in", "--source" ]:
59d74e2b 2064 insspec = checked_arg ()
70ad9458 2065 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt from %s" % insspec)
a83fa4ed 2066 elif arg in [ "-p", "--password" ]:
59d74e2b 2067 arg = checked_arg ()
addcec42 2068 checked_secret (make_secret (password=arg))
a83fa4ed 2069 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with password")
70ad9458 2070 else:
da82bc58
PG
2071 if subcommand == PDTCRYPT_SUB_PROCESS:
2072 if arg in [ "-s", "--strict-ivs" ]:
2073 global PDTCRYPT_STRICTIVS
2074 PDTCRYPT_STRICTIVS = True
77058bab
PG
2075 elif arg in [ "-o", "--out", "--dest", "--sink" ]:
2076 outsspec = checked_arg ()
2077 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
da82bc58 2078 elif arg in [ "-f", "--force" ]:
da82bc58
PG
2079 PDTCRYPT_OVERWRITE = True
2080 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2081 elif arg in [ "-S", "--split" ]:
2082 mode |= PDTCRYPT_SPLIT
2083 if PDTCRYPT_VERBOSE is True: noise ("PDT: split files")
2084 elif arg in [ "-D", "--no-decrypt" ]:
2085 mode &= ~PDTCRYPT_DECRYPT
2086 if PDTCRYPT_VERBOSE is True: noise ("PDT: not decrypting")
a83fa4ed 2087 elif arg in [ "-k", "--key" ]:
59d74e2b 2088 arg = checked_arg ()
addcec42 2089 checked_secret (make_secret (key=arg))
a83fa4ed 2090 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypting with key")
da82bc58 2091 else:
a83fa4ed 2092 bail ("ERROR: unexpected positional argument “%s”" % arg)
da82bc58 2093 elif subcommand == PDTCRYPT_SUB_SCRYPT:
77058bab
PG
2094 if arg in [ "-n", "--nacl", "--salt" ]:
2095 nacl = checked_arg ()
2096 if PDTCRYPT_VERBOSE is True: noise ("PDT: salt key with %s" % nacl)
7b3940e5
PG
2097 elif arg in [ "-f", "--format" ]:
2098 arg = checked_arg ()
2099 try:
2100 scrypt_format = PDTCRYPT_SCRYPT_FORMAT [arg]
2101 except KeyError:
2102 bail ("ERROR: invalid scrypt output format %s" % arg)
2103 if PDTCRYPT_VERBOSE is True:
2104 noise ("PDT: scrypt output format “%s”" % scrypt_format)
77058bab
PG
2105 else:
2106 bail ("ERROR: unexpected positional argument “%s”" % arg)
f41973a6 2107 elif subcommand == PDTCRYPT_SUB_SCAN:
a808459e
PG
2108 if arg in [ "-o", "--out", "--dest", "--sink" ]:
2109 outsspec = checked_arg ()
2110 if PDTCRYPT_VERBOSE is True: noise ("PDT: decrypt to %s" % outsspec)
2111 elif arg in [ "-f", "--force" ]:
a808459e
PG
2112 PDTCRYPT_OVERWRITE = True
2113 if PDTCRYPT_VERBOSE is True: noise ("PDT: overwrite existing files")
2114 else:
2115 bail ("ERROR: unexpected positional argument “%s”" % arg)
70ad9458 2116
a83fa4ed 2117 if secret is None:
ecb9676d 2118 if PDTCRYPT_VERBOSE is True:
a83fa4ed 2119 noise ("ERROR: no password or key specified, trying $PDTCRYPT_PASSWORD")
ecb9676d
PG
2120 epw = os.getenv ("PDTCRYPT_PASSWORD")
2121 if epw is not None:
addcec42 2122 checked_secret (make_secret (password=epw.strip ()))
a83fa4ed
PG
2123
2124 if secret is None:
2125 if PDTCRYPT_VERBOSE is True:
2126 noise ("ERROR: no password or key specified, trying $PDTCRYPT_KEY")
2127 ek = os.getenv ("PDTCRYPT_KEY")
2128 if ek is not None:
addcec42 2129 checked_secret (make_secret (key=ek.strip ()))
ecb9676d 2130
a83fa4ed 2131 if secret is None:
da82bc58 2132 if subcommand == PDTCRYPT_SUB_SCRYPT:
a83fa4ed 2133 bail ("ERROR: scrypt hash mode requested but no password given")
da82bc58 2134 elif mode & PDTCRYPT_DECRYPT:
6257d5b3 2135 bail ("ERROR: decryption requested but no password given")
a83fa4ed 2136
a808459e
PG
2137 if mode & PDTCRYPT_SPLIT and outsspec is None:
2138 bail ("ERROR: split mode is incompatible with stdout sink "
2139 "(the default)")
2140
2141 if subcommand == PDTCRYPT_SUB_SCAN and outsspec is None:
2142 pass # no output by default in scan mode
2143 elif mode & PDTCRYPT_SPLIT or subcommand == PDTCRYPT_SUB_SCAN:
2144 # destination must be directory
2145 if outsspec == "-":
2146 bail ("ERROR: mode is incompatible with stdout sink")
2147 try:
2148 try:
2149 os.makedirs (outsspec, 0o700)
2150 except FileExistsError:
2151 # if it’s a directory with appropriate perms, everything is
2152 # good; otherwise, below invocation of open(2) will fail
2153 pass
2154 outs = os.open (outsspec, os.O_DIRECTORY, 0o600)
2155 except FileNotFoundError as exn:
2156 bail ("ERROR: cannot create target directory “%s”" % outsspec)
2157 except NotADirectoryError as exn:
2158 bail ("ERROR: target path “%s” is not a directory" % outsspec)
2159 else:
2160 outs = deptdcrypt_mk_stream (PDTCRYPT_SINK, outsspec or "-")
2161
f41973a6
PG
2162 if subcommand == PDTCRYPT_SUB_SCAN:
2163 if insspec is None:
2164 bail ("ERROR: please supply an input file for scanning")
2165 if insspec == '-':
2166 bail ("ERROR: input must be seekable; please specify a file")
a808459e 2167 return True, partial (mode_scan, secret, insspec, outs, nacl=nacl)
f41973a6 2168
77058bab
PG
2169 if subcommand == PDTCRYPT_SUB_SCRYPT:
2170 if secret [0] == PDTCRYPT_SECRET_KEY:
2171 bail ("ERROR: scrypt mode requires a password")
2172 if insspec is not None and nacl is not None \
2173 or insspec is None and nacl is None :
2174 bail ("ERROR: please supply either an input file or "
2175 "the salt")
70ad9458
PG
2176
2177 # default to stdin
77058bab
PG
2178 ins = None
2179 if insspec is not None or subcommand != PDTCRYPT_SUB_SCRYPT:
2180 ins = deptdcrypt_mk_stream (PDTCRYPT_SOURCE, insspec or "-")
da82bc58
PG
2181
2182 if subcommand == PDTCRYPT_SUB_SCRYPT:
7b3940e5
PG
2183 return True, partial (mode_scrypt, secret [1].encode (), ins, nacl,
2184 fmt=scrypt_format)
da82bc58 2185
a83fa4ed 2186 return True, partial (mode_depdtcrypt, mode, secret, ins, outs)
15d3eefd
PG
2187
2188
00b3cd10 2189def main (argv):
da82bc58 2190 ok, runner = parse_argv (argv)
f08c604b 2191
da82bc58 2192 if ok is True: return runner ()
15d3eefd 2193
da82bc58 2194 return 1
f08c604b 2195
00b3cd10
PG
2196
2197if __name__ == "__main__":
2198 sys.exit (main (sys.argv))
2199