tkt028 · August 6, 2022 15:47
diff --git a/arq_data_format.txt b/arq_data_format.txt
 Arq stores backup data in a format similar to that of the open-source version
 control system 'git'.  

 Content-Addressable Storage
 ---------------------------
 At the most basic level, Arq stores "blobs" using the SHA1 hash of the
 contents as the name, much like git. Because of this, each unique blob is only
 stored once. If 2 files on your system have the same contents, only 1 copy of
 the contents will be stored. If the contents of a file change, the SHA1 hash is
 different and the file is stored as a different blob.

 Files are blobs, and commits and trees are blobs as well.

 (It's not quite that simple actually. To make the names less susceptible to
 lookup tables, Arq actually calculates the SHA1 hash of the computerUUID
 concatenated with the blob's data. But we'll use "SHA1" as shorthand throughout
 this document for this SHA1-derived identifier.)
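
 As a rough illustration of the naming scheme just described, the sketch below
 hashes the computerUUID followed by the blob's data. The function name and the
 choice to encode the UUID as UTF-8 text are assumptions made for this example;
 the text above does not say how the UUID is serialized before hashing.

```python
import hashlib


def blob_name(computer_uuid: str, data: bytes) -> str:
    """Return the hex SHA1 of the computerUUID concatenated with the blob data."""
    digest = hashlib.sha1()
    # Assumption: the UUID is prepended in its UTF-8 text form.
    digest.update(computer_uuid.encode("utf-8"))
    digest.update(data)
    return digest.hexdigest()
```

 Two files with identical contents get the same name under one computerUUID,
 which is why each unique blob is stored only once.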


 "Computer UUID"
 ---------------

 When you first run Arq and add a target ("destination"), it creates a
 "universally unique identifier" (UUID) for your computer (referred to below as
 the "computerUUID"). All backup objects are stored with that as a prefix.


 Encryption Dat File
 -------------------

 The first time you add a folder to Arq for backing up, it prompts you to choose
 an encryption password.  Arq creates 2 randomly-generated encryption keys.  The
 first key is used for encrypting/decrypting; the second key is used for
 creating HMACs.

 Arq stores those keys, encrypted with the encryption password you chose, in a
 file called /<computerUUID>/encryptionv2.dat. You can change your encryption
 password at any time by decrypting this file with the old encryption password
 and then re-encrypting it with your new encryption password.

 The encryptionv2.dat file format is:

 header                      45 4e 43 52 ENCR
                            59 50 54 49 YPTI
                            4f 4e 56 32 ONV2
 salt                        xx xx xx xx
                            xx xx xx xx
 HMACSHA256                  xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
 IV                          xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
 encrypted master keys       xx xx xx xx
                            ...


 To create the encryptionv2.dat file:
 1. Generate a random salt.
 2. Generate a random IV.
 3. Generate 2 random 32-byte "master keys" (64 bytes total).
 4. Derive 64-byte encryption key from user-supplied encryption password using PBKDF2/HMACSHA1 (200000 rounds) and the salt from step 1.
 5. Encrypt the master keys with AES256-CBC using the first 32 bytes of the derived key from step 4 and IV from step 2.
 6. Calculate the HMAC-SHA256 of (IV + encrypted master keys) using the second 32 bytes of the derived key from step 4.
 7. Concatenate the items as described in the file format shown above.

 To get the 2 "master keys":
 1. Copy salt from the 8 bytes after the header.
 2. Derive 64-byte encryption key from user-supplied encryption password using PBKDF2/HMACSHA1 (200000 rounds) and the salt from step 1.
 3. Calculate HMAC-SHA256 of (IV + encrypted master keys) using second 32 bytes of key from step 2, and verify against HMAC-SHA256 in the file.
 4. Decrypt the ciphertext with AES256-CBC using the first 32 bytes of the derived key from step 2 and the IV from the file to get the 2 32-byte "master keys".

 Note: We use HMACSHA1 as the PRF with PBKDF2 because that's the only one available on Windows (in .NET).
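
 The first three steps of reading encryptionv2.dat can be sketched with the
 Python standard library alone. This is only a sketch under stated assumptions:
 the function names are made up for this example, the password is assumed to be
 encoded as UTF-8, and the AES256-CBC decryption itself is left out because it
 needs a third-party cipher library.

```python
import hashlib
import hmac

HEADER = b"ENCRYPTIONV2"            # 12-byte header from the layout above
SALT_LEN, HMAC_LEN, IV_LEN = 8, 32, 16


def derive_key(password: str, salt: bytes, rounds: int = 200000) -> bytes:
    """Derive the 64-byte key with PBKDF2/HMACSHA1."""
    # Assumption: the password is encoded as UTF-8 before key derivation.
    return hashlib.pbkdf2_hmac("sha1", password.encode("utf-8"), salt, rounds, dklen=64)


def open_encryption_dat(blob: bytes, password: str, rounds: int = 200000):
    """Verify the file's HMAC; return (AES key, IV, encrypted master keys)."""
    if blob[:len(HEADER)] != HEADER:
        raise ValueError("missing ENCRYPTIONV2 header")
    pos = len(HEADER)
    salt = blob[pos:pos + SALT_LEN]
    pos += SALT_LEN
    stored_mac = blob[pos:pos + HMAC_LEN]
    pos += HMAC_LEN
    iv = blob[pos:pos + IV_LEN]
    pos += IV_LEN
    encrypted_keys = blob[pos:]

    derived = derive_key(password, salt, rounds)
    # The second 32 bytes of the derived key authenticate (IV + encrypted keys).
    expected = hmac.new(derived[32:], iv + encrypted_keys, hashlib.sha256).digest()
    if not hmac.compare_digest(stored_mac, expected):
        raise ValueError("HMAC mismatch: wrong password or corrupt file")
    # The caller decrypts encrypted_keys with AES256-CBC using derived[:32] and iv.
    return derived[:32], iv, encrypted_keys
```

 Checking the HMAC before decrypting means a wrong password is reported as an
 error instead of producing garbage master keys.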


 EncryptedObject
 ---------------

 We use the term "EncryptedObject" throughout this document as shorthand to
 describe an object containing data in the following format:

 header                              41 52 51 4f  ARQO
 HMACSHA256                          xx xx xx xx
                                    xx xx xx xx
                                    xx xx xx xx
                                    xx xx xx xx
                                    xx xx xx xx
                                    xx xx xx xx
                                    xx xx xx xx
                                    xx xx xx xx
 master IV                           xx xx xx xx
                                    xx xx xx xx
                                    xx xx xx xx
                                    xx xx xx xx
 encrypted data IV + session key     xx xx xx xx 
                                    ...
 ciphertext                          xx xx xx xx
                                    ...

 To create an EncryptedObject:
 1. Generate a random session key (Arq reuses it for up to 256 objects before replacing it).
 2. Generate a random "data IV".
 3. Encrypt plaintext with session key and data IV.
 4. Generate a random "master IV".
 5. Encrypt (data IV + session key) with AES256-CBC using the first "master key" from the Encryption Dat File and the "master IV".
 6. Calculate HMAC-SHA256 of (master IV + "encrypted data IV + session key" + ciphertext) using the second 32-byte "master key".
 7. Assemble the data in the format shown above.

 To get the plaintext:
 1. Calculate HMAC-SHA256 of (master IV + "encrypted data IV + session key" + ciphertext) and verify against HMAC-SHA256 in the file using the second "master key" from the Encryption Dat File.
 2. Decrypt "encrypted data IV + session key" using the first "master key" from the Encryption Dat File and the "master IV".
 3. Decrypt the ciphertext using the session key and data IV.
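
 The HMAC check in step 1 of reading an EncryptedObject can be sketched as
 follows. The function name is hypothetical. The layout above does not state
 the length of the "encrypted data IV + session key" field, so this sketch
 returns everything after the master IV as one unsplit byte string and leaves
 the two AES decryptions to the caller.

```python
import hashlib
import hmac

ARQO_HEADER = b"ARQO"
HMAC_LEN, IV_LEN = 32, 16


def verify_encrypted_object(obj: bytes, hmac_key: bytes):
    """Check the HMAC-SHA256 of an EncryptedObject; return (master IV, remainder)."""
    if obj[:4] != ARQO_HEADER:
        raise ValueError("missing ARQO header")
    stored_mac = obj[4:4 + HMAC_LEN]
    master_iv = obj[4 + HMAC_LEN:4 + HMAC_LEN + IV_LEN]
    # remainder = "encrypted data IV + session key" followed by the ciphertext
    remainder = obj[4 + HMAC_LEN + IV_LEN:]
    # hmac_key is the second "master key" from the Encryption Dat File.
    expected = hmac.new(hmac_key, master_iv + remainder, hashlib.sha256).digest()
    if not hmac.compare_digest(stored_mac, expected):
        raise ValueError("HMAC mismatch")
    return master_iv, remainder
```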



 Folder Configuration Files
 --------------------------

 Each time you add a folder for backup, Arq creates a UUID for it and stores 2
 objects at the target:

 object: /<computer_uuid>/buckets/<folder_uuid>

    This file contains:
        1. the 9-byte header "encrypted"
        2. an EncryptedObject containing a plist like this:

        <plist version="1.0">
            <dict>
                <key>AWSRegionName</key>
                <string>us-east-1</string>
                <key>BucketUUID</key>
                <string>408E376B-ECF7-4688-902A-1E7671BC5B9A</string>
                <key>BucketName</key>
                <string>company</string>
                <key>ComputerUUID</key>
                <string>600150F6-70BB-47C6-A538-6F3A2258D524</string>
                <key>LocalPath</key>
                <string>/Users/stefan/src/company</string>
                <key>LocalMountPoint</key>
                <string>/</string>
                <key>StorageType</key>
                <integer>1</integer>
                <key>VaultName</key>
                <string>arq_408E376B-ECF7-4688-902A-1E7671BC5B9A</string>
                <key>VaultCreatedTime</key>
                <real>12345678.0</real>
                <key>Excludes</key>
                <dict>
                    <key>Enabled</key>
                    <false></false>
                    <key>MatchAny</key>
                    <true></true>
                    <key>Conditions</key>
                    <array></array>
                </dict>
            </dict>
        </plist>

    Only Glacier-backed folders have "VaultName" and "VaultCreatedTime" keys.

    NOTE: The folder's UUID and name are called "BucketUUID" and "BucketName"
    in the plist; this is a holdover from previous iterations of Arq and is not
    to be confused with S3's "bucket" concept.
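
 Once the EncryptedObject has been decrypted, the plist can be read with
 Python's plistlib. The helper names below are made up for this sketch, and the
 Glacier check simply tests for the two keys that only Glacier-backed folders
 carry.

```python
import plistlib

ENCRYPTED_PREFIX = b"encrypted"  # the 9-byte header described above


def strip_encrypted_prefix(raw: bytes) -> bytes:
    """Remove the 9-byte "encrypted" header, leaving the EncryptedObject."""
    if not raw.startswith(ENCRYPTED_PREFIX):
        raise ValueError("folder configuration file lacks the 'encrypted' header")
    return raw[len(ENCRYPTED_PREFIX):]


def parse_folder_config(decrypted_plist: bytes) -> dict:
    """Parse the decrypted XML plist into a dictionary."""
    return plistlib.loads(decrypted_plist)


def is_glacier_folder(config: dict) -> bool:
    """Only Glacier-backed folders have VaultName and VaultCreatedTime."""
    return "VaultName" in config and "VaultCreatedTime" in config
```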



 Commits, Trees and Blobs
 ------------------------

 When Arq backs up a folder, it creates 3 types of objects: "commits", "trees"
 and "blobs". 

 Each backup that you see in Arq corresponds to a "commit" object in the backup
 data.  Its name is the SHA1 of its contents. The commit contains the SHA1 of a
 "tree" object in the backup data. This tree corresponds to the folder you're
 backing up.  

 Each tree contains "nodes"; each node has either the SHA1 of another tree, or
 the SHA1 of a file (or multiple SHA1s, see "Tree format" below).

 All commits, trees and blobs are stored as EncryptedObjects (see
 "EncryptedObject" above).


 Commit Format
 -------------

 A "commit" contains the following bytes (see "Data Format Documentation
 Conventions" below for explanation of [String], [UInt32], [Date], etc):

    43 6f 6d 6d 69 74 56 30 31 31      "CommitV011"
    [String:"<author>"]
    [String:"<comment>"]
    [UInt64:num_parent_commits]        (this is always 0 or 1)
    (
        [String:parent_commit_sha1] /* can't be null */
        [Bool:parent_commit_encryption_key_stretched] /* present for Commit version >= 4 */
    )   /* repeat num_parent_commits times */
    [String:tree_sha1] /* can't be null */
    [Bool:tree_encryption_key_stretched] /* present for Commit version >= 4 */
    [Bool:tree_is_compressed] /* present for Commit version 8 and 9 only; indicates Gzip compression or none */
    [CompressionType:tree_compression_type] /* present for Commit version >= 10 */

    [String:"file://<hostname><path_to_folder>"]
    [String:"<merge_common_ancestor_sha1>"] /* only present for Commit version 7 or *older* (was never used) */
    [Bool:is_merge_common_ancestor_encryption_key_stretched] /* only present for Commit version 4 to 7 */
    [Date:creation_date]
    [UInt64:num_failed_files] /* only present for Commit version 3 or later */
    (
        [String:"<relative_path>"] /* only present for Commit version 3 or later */
        [String:"<error_message>"] /* only present for Commit version 3 or later */
    )   /* repeat num_failed_files times */
    [Bool:has_missing_nodes] /* only present for Commit version 8 or later */
    [Bool:is_complete] /* only present for Commit version 9 or later */
    [Data:config_plist_xml] /* a copy of the XML file as described above */
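
 Because many commit fields exist only in certain versions, a parser has to
 read the version from the header first. The sketch below restates the version
 conditions listed above as a lookup. The function names are hypothetical, and
 it assumes the header is always "CommitV" followed by 3 decimal digits.

```python
def commit_version(buf: bytes) -> int:
    """Extract the version number from a header such as b"CommitV011"."""
    header = buf[:10]
    # Assumption: the header is "CommitV" plus exactly 3 ASCII digits.
    if len(header) != 10 or not header.startswith(b"CommitV") or not header[7:].isdigit():
        raise ValueError("not a commit header")
    return int(header[7:].decode("ascii"))


def commit_optional_fields(version: int) -> dict:
    """Map each version-dependent field to whether it is present."""
    return {
        "parent_commit_encryption_key_stretched": version >= 4,
        "tree_encryption_key_stretched": version >= 4,
        "tree_is_compressed": version in (8, 9),
        "tree_compression_type": version >= 10,
        "merge_common_ancestor_sha1": version <= 7,
        "is_merge_common_ancestor_encryption_key_stretched": 4 <= version <= 7,
        "failed_files": version >= 3,
        "has_missing_nodes": version >= 8,
        "is_complete": version >= 9,
    }
```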


    
 Tree Format
 -----------

 A tree contains the following bytes:

    54 72 65 65 56 30 31 39             "TreeV019"
    [Bool:xattrs_are_compressed] /* present for Tree versions 12-18 */
    [CompressionType:xattrs_compression_type] /* present for Tree version >= 19; indicates Gzip compression or none */
    [Bool:acl_is_compressed] /* present for Tree versions 12-18 */
    [CompressionType:acl_compression_type] /* present for Tree version >= 19; indicates Gzip compression or none */
    [BlobKey:xattrs_blob_key]
    [UInt64:xattrs_size]
    [BlobKey:acl_blob_key]
    [Int32:uid]
    [Int32:gid]
    [Int32:mode]
    [Int64:mtime_sec]
    [Int64:mtime_nsec]
    [Int64:flags]
    [Int32:finderFlags]
    [Int32:extendedFinderFlags]
    [Int32:st_dev]
    [Int32:st_ino]
    [UInt32:st_nlink]
    [Int32:st_rdev]
    [Int64:ctime_sec]
    [Int64:ctime_nsec]
    [Int64:st_blocks]
    [UInt32:st_blksize]
    [UInt64:aggregate_size_on_disk] /* only present for Tree version 11 to 16 (never used) */
    [Int64:create_time_sec] /* only present for Tree version 15 or later */
    [Int64:create_time_nsec] /* only present for Tree version 15 or later */
    [UInt32:missing_node_count] /* only present for Tree version 18 or later */
    (
        [String:"<missing_node_name>"] /* only present for Tree version 18 or later */
    )   /* repeat <missing_node_count> times */
    [UInt32:node_count] 
    (
        [String:"<file name>"] /* can't be null */
        [Node]
    )   /* repeat <node_count> times */


 Each [Node] contains the following bytes:

    [Bool:isTree]
    [Bool:data_are_compressed] /* present for Tree versions 12-18 */
    [CompressionType:data_compression_type] /* present for Tree version >= 19; indicates Gzip compression or none */
    [Bool:xattrs_are_compressed] /* present for Tree versions 12-18 */
    [CompressionType:xattrs_compression_type] /* present for Tree version >= 19; indicates Gzip compression or none */
    [Bool:acl_is_compressed] /* present for Tree versions 12-18 */
    [CompressionType:acl_compression_type] /* present for Tree version >= 19; indicates Gzip compression or none */
    [Int32:data_blob_keys_count] 
    (
        [BlobKey:data_blob_key]
    )   /* repeat <data_blob_keys_count> times */
    [UInt64:data_size]
    [String:"<thumbnail sha1>"] /* only present for Tree version 18 or earlier (never used) */
    [Bool:is_thumbnail_encryption_key_stretched] /* only present for Tree version 14 to 18 */
    [String:"<preview sha1>"] /* only present for Tree version 18 or earlier (never used) */
    [Bool:is_preview_encryption_key_stretched] /* only present for Tree version 14 to 18 */
    [BlobKey:xattrs_blob_key]
    [UInt64:xattrs_size]
    [BlobKey:acl_blob_key]
    [Int32:uid]
    [Int32:gid]
    [Int32:mode]
    [Int64:mtime_sec]
    [Int64:mtime_nsec]
    [Int64:flags]
    [Int32:finderFlags]
    [Int32:extendedFinderFlags]
    [String:"<finder file type>"]
    [String:"<finder file creator>"]
    [Bool:is_file_extension_hidden]
    [Int32:st_dev]
    [Int32:st_ino]
    [UInt32:st_nlink]
    [Int32:st_rdev]
    [Int64:ctime_sec]
    [Int64:ctime_nsec]
    [Int64:create_time_sec]
    [Int64:create_time_nsec]
    [Int64:st_blocks]
    [UInt32:st_blksize]

 Notes:

 - A Node can have multiple data SHA1s if the file is very large. Arq breaks up
  large files into multiple blobs using a rolling checksum algorithm. This way
  Arq only backs up the parts of a file that have changed.
 - "<xattrs_blob_key>" is the key of a blob containing the sorted extended
  attributes of the file (see "XAttrSet Format" below). Note this means
  extended-attribute sets are "de-duplicated".
 - "<acl_blob_key>" is the SHA1 of the blob containing the result of acl_to_text()
  on the file's ACL. Note this means the ACLs are "de-duplicated".
 - "create_time_sec" and "create_time_nsec" contain the value of the
  ATTR_CMN_CRTIME attribute of the file.


 XAttrSet Format
 ---------------

 Each XAttrSet blob contains the following bytes:

    58 41 74 74 72 53 65 74  56 30 30 32    "XAttrSetV002"
    [UInt64:xattr_count]
    (
        [String:"<xattr name>"] /* can't be null */
        [Data:xattr_data]
    )   /* repeat <xattr_count> times */
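
 A minimal sketch of reading an XAttrSet blob, using the [UInt64], [String] and
 [Data] encodings from "Data Format Documentation Conventions" below. The
 function names are invented for this example, and returning a dict assumes
 attribute names are unique within a set.

```python
import struct

XATTR_MAGIC = b"XAttrSetV002"


def read_uint64(buf: bytes, pos: int):
    """Read a network-byte-order uint64; return (value, new position)."""
    (value,) = struct.unpack_from(">Q", buf, pos)
    return value, pos + 8


def read_string(buf: bytes, pos: int):
    """Read a [String]: isNotNull flag, then 8-byte length and UTF-8 bytes."""
    not_null = buf[pos]
    pos += 1
    if not not_null:
        return None, pos
    length, pos = read_uint64(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def read_data(buf: bytes, pos: int):
    """Read a [Data]: 8-byte length followed by that many raw bytes."""
    length, pos = read_uint64(buf, pos)
    return buf[pos:pos + length], pos + length


def parse_xattr_set(buf: bytes) -> dict:
    """Parse an XAttrSet blob into {attribute name: attribute bytes}."""
    if not buf.startswith(XATTR_MAGIC):
        raise ValueError("missing XAttrSetV002 header")
    pos = len(XATTR_MAGIC)
    count, pos = read_uint64(buf, pos)
    attrs = {}
    for _ in range(count):
        name, pos = read_string(buf, pos)
        value, pos = read_data(buf, pos)
        attrs[name] = value
    return attrs
```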


 More on Object Storage
 ----------------------

 In general, each blob is stored as an object with a path of the form:

    /<computer_uuid>/objects/<sha1>

 But for small files, the overhead associated with putting and getting the
 objects to/from the storage destination makes backing them up very inefficient.

 So, small files (files under 64KB in length) are stored in "packs", which are
 explained below.


 Packs
 -----

 Each folder configured for backup maintains 2 "packsets", one for trees and
 commits, and one for all other small files. The packsets are named:

    <folder_uuid>-trees
    <folder_uuid>-blobs

 Small files are separated into 2 packsets because the trees and commits are
 cached locally (so that Arq gives reasonable performance for browsing backups);
 all other small blobs don't need to be cached.

 A packset is a set of "packs". When Arq is backing up a folder, it combines
 small files into a single larger packfile; when the packfile reaches 10MB, it
 is stored at the destination. Also, when Arq finishes backing up a folder it
 stores its unsaved packfiles no matter their sizes.

 When storing a pack, Arq stores the packfile as:

    /<computer_uuid>/packsets/<folder_uuid>-(blobs|trees)/<sha1>.pack

 It also stores an index of the SHA1s contained in the pack as:

    /<computer_uuid>/packsets/<folder_uuid>-(blobs|trees)/<sha1>.index


 Pack Index Format
 -----------------

 magic number                ff 74 4f 63  
 version (2)                 00 00 00 02 network-byte-order
 fanout[0]                   00 00 00 02 (4-byte count of SHA1s starting with 0x00)
 ...
 fanout[255]                 00 00 f0 f2 (4-byte count of total objects == count of SHA1s starting with 0xff or smaller)
 object[0]                   00 00 00 00 (8-byte network-byte-order offset)
                            00 00 00 00
                            00 00 00 00 (8-byte network-byte-order data length)
                            00 00 00 00
                            00 xx xx xx (sha1 starting with 00)
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            00 00 00 00 (4 bytes for alignment)
 object[1]                   00 00 00 00 (8-byte network-byte-order offset)
                            00 00 00 00
                            00 00 00 00 (8-byte network-byte-order data length)
                            00 00 00 00
                            00 xx xx xx (sha1 starting with 00)
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            00 00 00 00 (4 bytes for alignment)
 object[2]                   00 00 00 00 (8-byte network-byte-order offset)
                            00 00 00 00
                            00 00 00 00 (8-byte network-byte-order data length)
                            00 00 00 00
                            00 xx xx xx (sha1 starting with 00)
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            00 00 00 00 (4 bytes for alignment)
 ...
 object[f0f1]                00 00 00 00 (8-byte network-byte-order offset)
                            00 00 00 00
                            00 00 00 00 (8-byte network-byte-order data length)
                            00 00 00 00
                            ff xx xx xx (sha1 starting with ff)
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            00 00 00 00 (4 bytes for alignment)
 Glacier archiveId not null  01          (1 byte)                                    /* Glacier only */
 Glacier archiveId strlen    00 00 00 00 (network-byte-order 8 bytes)                /* Glacier only */
                            00 00 00 08                                             /* Glacier only */
 Glacier archiveId string    xx xx xx xx (n bytes)                                   /* Glacier only */
                            xx xx xx xx                                             /* Glacier only */
 Glacier pack size           00 00 00 00 (8-byte network-byte-order data length)     /* Glacier only */
                            00 00 00 00                                             /* Glacier only */
 20-byte SHA1 of all of the  xx xx xx xx
 above                       xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
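
 The index layout above can be read with a fixed-size record of 40 bytes per
 object. The sketch below is an illustration only: the function names are
 hypothetical, it assumes the entries are sorted by SHA1 (which the cumulative
 fanout table implies), and it skips the Glacier-only fields that may sit
 between the last entry and the trailing SHA1.

```python
import hashlib
import struct

INDEX_MAGIC = bytes.fromhex("ff744f63")
ENTRY = struct.Struct(">QQ20s4x")   # offset, data length, SHA1, 4 alignment bytes
FANOUT_START = 8                    # after the magic number and the version
ENTRIES_START = FANOUT_START + 256 * 4


def parse_pack_index(buf: bytes):
    """Return (fanout, entries); entries are (sha1, offset, length) tuples."""
    if buf[:4] != INDEX_MAGIC:
        raise ValueError("bad magic number")
    (version,) = struct.unpack_from(">I", buf, 4)
    if version != 2:
        raise ValueError("unsupported index version")
    # The last 20 bytes are the SHA1 of everything before them.
    if hashlib.sha1(buf[:-20]).digest() != buf[-20:]:
        raise ValueError("index checksum mismatch")
    fanout = struct.unpack_from(">256I", buf, FANOUT_START)
    entries = []
    for i in range(fanout[255]):    # fanout[255] is the total object count
        offset, length, sha1 = ENTRY.unpack_from(buf, ENTRIES_START + i * ENTRY.size)
        entries.append((sha1, offset, length))
    return fanout, entries


def find_object(fanout, entries, sha1: bytes):
    """Return (offset, length) for sha1, or None if it is not in this pack."""
    first = sha1[0]
    # fanout[b] counts SHA1s whose first byte is <= b, so the candidates
    # for first byte b occupy entries[fanout[b - 1]:fanout[b]].
    start = fanout[first - 1] if first > 0 else 0
    for candidate, offset, length in entries[start:fanout[first]]:
        if candidate == sha1:
            return offset, length
    return None
```

 The fanout table narrows each lookup to the entries that share the SHA1's
 first byte, so only a small slice of the index is scanned.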


 Pack File Format
 ----------------

 signature                   50 41 43 4b ("PACK")
 version (2)                 00 00 00 02 (network-byte-order 4 bytes)
 object count                00 00 00 00 (network-byte-order 8 bytes)
                            00 00 f0 f2
 object[0] mimetype not null 01          (1 byte) (this is usually zero)
 object[0] mimetype strlen   00 00 00 00 (network-byte-order 8 bytes) (this isn't here if not-null is zero)
                            00 00 00 08
 object[0] mimetype string   xx xx xx xx (n bytes)
                            xx xx xx xx
 object[0] name not null     01          (1 byte) (this is usually zero)
 object[0] name strlen       00 00 00 00 (network-byte-order 8 bytes) (this isn't here if not-null is zero)
                            00 00 00 08
 object[0] name string       xx xx xx xx (n bytes)
                            xx xx xx xx
 object[0] data length       00 00 00 00 (network-byte-order 8 bytes)
                            00 00 00 06
 object[0] data              xx xx xx xx (n bytes)
                            xx xx
 ...
 object[f0f1] mimetype not null 01       (1 byte) (this is usually zero)
 object[f0f1] mimetype len   00 00 00 00 (network-byte-order 8 bytes) (this isn't here if not-null is zero)
                            00 00 00 08
 object[f0f1] mimetype str   xx xx xx xx (n bytes)
                            xx xx xx xx
 object[f0f1] name not null  01          (1 byte) (this is usually zero)
 object[f0f1] name strlen    00 00 00 00 (network-byte-order 8 bytes) (this isn't here if not-null is zero)
                            00 00 00 08
 object[f0f1] name string    xx xx xx xx (n bytes)
                            xx xx xx xx
 object[f0f1] data length    00 00 00 00 (network-byte-order 8 bytes)
                            00 00 00 04
 object[f0f1] data           12 34 12 34
 20-byte SHA1 of all of the  xx xx xx xx
 above                       xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
                            xx xx xx xx
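
 A pack file can be walked sequentially, since every object carries its own
 lengths. The following is a sketch with made-up function names. It assumes
 the mimetype and name use the same not-null flag plus 8-byte length encoding
 shown in the layout above, and that the trailing SHA1 directly follows the
 last object.

```python
import hashlib
import struct


def _read_optional_string(buf: bytes, pos: int):
    """Read a not-null flag, then (if set) an 8-byte length and UTF-8 bytes."""
    not_null = buf[pos]
    pos += 1
    if not not_null:
        return None, pos
    (length,) = struct.unpack_from(">Q", buf, pos)
    pos += 8
    return buf[pos:pos + length].decode("utf-8"), pos + length


def parse_pack(buf: bytes):
    """Return a list of (mimetype, name, data) tuples from a pack file."""
    if buf[:4] != b"PACK":
        raise ValueError("missing PACK signature")
    version, count = struct.unpack_from(">IQ", buf, 4)
    if version != 2:
        raise ValueError("unsupported pack version")
    pos = 16                        # 4-byte signature + 4-byte version + 8-byte count
    objects = []
    for _ in range(count):
        mimetype, pos = _read_optional_string(buf, pos)
        name, pos = _read_optional_string(buf, pos)
        (length,) = struct.unpack_from(">Q", buf, pos)
        pos += 8
        objects.append((mimetype, name, buf[pos:pos + length]))
        pos += length
    # The final 20 bytes are the SHA1 of everything that precedes them.
    if hashlib.sha1(buf[:pos]).digest() != buf[pos:pos + 20]:
        raise ValueError("pack checksum mismatch")
    return objects
```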



 Data Format Documentation Conventions
 -------------------------------------

 We used a few shortcuts in some of the data format explanations above:

 [BlobKey:value]

    A [BlobKey] is stored as:

        [String:sha1] /* can't be null */
        [Bool:is_encryption_key_stretched] /* only present for Tree version 14 or later, Commit version 4 or later */
        [UInt32:storage_type] /* 1==S3, 2==Glacier; only present for Tree version 17 or later */
        [String:archive_id] /* only present for Tree version 17 or later, if storage_type==2 */
        [UInt64:archive_size] /* only present for Tree version 17 or later, if storage_type==2 */
        [Date:archive_upload_date] /* only present for Tree version 17 or later, if storage_type==2 */


 [Bool:value]

    A [Bool] is stored as 1 byte, either 00 or 01.

 [String:"<string>"]

    A [String] is stored as:

        00 or 01    isNotNull flag

        if not null:

            00 00 00 00    8-byte network-byte-order length
            00 00 00 0c
            xx xx xx xx    UTF-8 string data
            xx xx xx xx    
            xx xx xx xx    
        
 [UInt32:<the_number>]

    A [UInt32] is stored as:

            00 00 00 00     network-byte-order uint32_t

 [Int32:<the_number>]

    An [Int32] is stored as:

            00 00 00 00     network-byte-order int32_t

 [UInt64:<the_number>]

    A [UInt64] is stored as:

            00 00 00 00     network-byte-order uint64_t
            00 00 00 00

 [Int64:<the_number>]

    An [Int64] is stored as:

            00 00 00 00     network-byte-order int64_t
            00 00 00 00

 [Date:<the_date>]
    
    A [Date] is stored as:

        00 or 01        isNotNull flag
        if not null:

            00 00 01 26     8-byte network-byte-order milliseconds 
            a8 79 09 48     since the first instant of 1 January 1970, GMT.

 [Data:<xattr_data>]

    A [Data] is stored as:

        [UInt64:<length>]       data length
        xx xx xx xx             bytes
        xx xx xx xx
        xx xx xx xx
        ...

 [CompressionType]

    Compression type is stored as an [Int32].
    0 == none
    1 == Gzip
    2 == LZ4
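
 The [Bool], [Date] and [CompressionType] encodings above can be decoded as
 sketched below. The function names are hypothetical, and reading the
 millisecond count as a signed 64-bit value is an assumption; the text only
 says it is 8 bytes in network byte order.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
COMPRESSION_TYPES = {0: "none", 1: "Gzip", 2: "LZ4"}


def read_bool(buf: bytes, pos: int):
    """Read a 1-byte [Bool]; return (value, new position)."""
    return buf[pos] == 1, pos + 1


def read_date(buf: bytes, pos: int):
    """Read a [Date]: isNotNull flag, then milliseconds since 1 January 1970 GMT."""
    not_null, pos = read_bool(buf, pos)
    if not not_null:
        return None, pos
    # Assumption: the 8-byte millisecond count is signed.
    (millis,) = struct.unpack_from(">q", buf, pos)
    return EPOCH + timedelta(milliseconds=millis), pos + 8


def read_compression_type(buf: bytes, pos: int):
    """Read a [CompressionType] stored as a network-byte-order Int32."""
    (value,) = struct.unpack_from(">i", buf, pos)
    if value not in COMPRESSION_TYPES:
        raise ValueError("unknown compression type %d" % value)
    return COMPRESSION_TYPES[value], pos + 4
```

 Each reader returns the decoded value together with the new position, so
 calls can be chained while walking through a commit or tree.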
	Arq stores backup data in a format similar to that of the open-source version
	control system 'git'.

	Content-Addressable Storage
	---------------------------
	At the most basic level, Arq stores "blobs" using the SHA1 hash of the
	contents as the name, much like git. Because of this, each unique blob is only
	stored once. If 2 files on your system have the same contents, only 1 copy of
	the contents will be stored. If the contents of a file change, the SHA1 hash is
	different and the file is stored as a different blob.

	Files are blobs, and commits and trees are blobs as well.

	(It's not quite that simple actually. To make the names less susceptible to
	lookup tables, Arq actually calculates the SHA1 hash of the computerUUID
	concatenated with the blob's data. But we'll use "SHA1" as shorthand throughout
	this document for this SHA1-derived identifier.)


	"Computer UUID"
	---------------

	When you first run Arq and add a target ("destination"), it creates a
	"universally unique identifier" (UUID) for your computer (referred to below as
	the "computerUUID"). All backup objects are stored with that as a prefix.


	Encryption Dat File
	-------------------

	The first time you add a folder to Arq for backing up, it prompts you to choose
	an encryption password. Arq creates 2 randomly-generated encryption keys. The
	first key is used for encrypting/decrypting; the second key is used for
	creating HMACs.

	Arq stores those keys, encrypted with the encryption password you chose, in a
	file called /<computerUUID>/encryptionv2.dat. You can change your encryption
	password at any time by decrypting this file with the old encryption password
	and then re-encrypting it with your new encryption password.

	The encryptionv2.dat file format is:

	header 45 4e 43 52 ENCR
	59 50 54 49 YPTI
	4f 4e 56 32 ONV2
	salt xx xx xx xx
	xx xx xx xx
	HMACSHA256 xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	IV xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	encrypted master keys xx xx xx xx
	...


	To create the encryptionv2.dat file:
	1. Generate a random salt.
	2. Generate a random IV.
	3. Generate 2 random 32-byte "master keys" (64 bytes total).
	4. Derive 64-byte encryption key from user-supplied encryption password using PBKDF2/HMACSHA1 (200000 rounds) and the salt from step 1.
	5. Encrypt the master keys with AES256-CBC using the first 32 bytes of the derived key from step 4 and IV from step 2.
	6. Calculate the HMAC-SHA256 of (IV + encrypted master keys) using the second 32 bytes of the derived key from step 4.
	7. Concatenate the items as described in the file format shown above.

	To get the 2 "master keys":
	1. Copy salt from the 8 bytes after the header.
	2. Derive 64-byte encryption key from user-supplied encryption password using PBKDF2/HMACSHA1 (200000 rounds) and the salt from step 1.
	3. Calculate HMAC-SHA256 of (IV + encrypted master keys) using second 32 bytes of key from step 2, and verify against HMAC-SHA256 in the file.
	4. Decrypt the ciphertext using the first 32 bytes of the derived key from step 2 to get 2 32-byte "master keys".

	Note: We use HMACSHA1 as the PRF with PBKDF2 because that's the only one available on Windows (in .NET).


	EncryptedObject
	---------------

	We use the term "EncryptedObject" throughout this document as shorthand to
	describe an object containing data in the following format:

	header 41 52 51 4f ARQO
	HMACSHA256 xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	master IV xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	xx xx xx xx
	encrypted data IV + session key xx xx xx xx
	...
	ciphertext xx xx xx xx
	...

	To create an EncryptedObject:
	1. Generate a random session key (Arq reuses it for up to 256 objects before replacing it).
	2. Generate a random "data IV".
	3. Encrypt plaintext with session key and data IV.
	4. Generate a random "master IV".
	5. Encrypt (data IV + session key) with AES256-CBC using the first "master key" from the Encryption Dat File and the "master IV".
	4. Calculate HMAC-SHA256 of (master IV + "encrypted data IV + session key" + ciphertext) using the second 32-byte "master key".
	7. Assemble the data in the format shown above.

	To get the plaintext:
	1. Calculate HMAC-SHA256 of (master IV + "encrypted data IV + session key" + ciphertext) and verify against HMAC-SHA256 in the file using the second "master key" from the Encryption Dat File.
	2. Decrypt "encrypted data IV + session key" using the first "master key" from the Encryption Dat File and the "master IV".
	2. Decrypt the ciphertext using the session key and data IV.



	Folder Configuration Files
	--------------------------

	Each time you add a folder for backup, Arq creates a UUID for it and stores 2
	objects at the target:

	object: /<computer_uuid>/buckets/<folder_uuid>

	This file contains a "plist"-format XML document containing:
	1. the 9-byte header "encrypted"
	2. an EncryptedObject containing a plist like this:

	<plist version="1.0">
	<dict>
	<key>AWSRegionName</key>
	<string>us-east-1</string>
	<key>BucketUUID</key>
	<string>408E376B-ECF7-4688-902A-1E7671BC5B9A</string>
	<key>BucketName</key>
	<string>company</string>
	<key>ComputerUUID</key>
	<string>600150F6-70BB-47C6-A538-6F3A2258D524</string>
	<key>LocalPath</key>
	<string>/Users/stefan/src/company</string>
	<key>LocalMountPoint</key>
	</string>/</string>
	<key>StorageType</key>
	<integer>1</integer>
	<key>VaultName</key>
	<string>arq_408E376B-ECF7-4688-902A-1E7671BC5B9A</string>
	<key>VaultCreatedTime</key>
	<real>12345678.0</real>
	<key>Excludes</key>
	<dict>
	<key>Enabled</key>
	<false></false>
	<key>MatchAny</key>
	<true></true>
	<key>Conditions</key>
	<array></array>
	</dict>
	</dict>
	</plist>

	Only Glacier-backed folders have "VaultName" and "VaultCreatedTime" keys.

	NOTE: The folder's UUID and name are called "BucketUUID" and "BucketName"
	in the plist; this is a holdover from previous iterations of Arq and is not
	to be confused with S3's "bucket" concept.



	Commits, Trees and Blobs
	------------------------

	When Arq backs up a folder, it creates 3 types of objects: "commits", "trees"
	and "blobs".

	Each backup that you see in Arq corresponds to a "commit" object in the backup
	data. Its name is the SHA1 of its contents. The commit contains the SHA1 of a
	"tree" object in the backup data. This tree corresponds to the folder you're
	backing up.

	Each tree contains "nodes"; each node has either the SHA1 of another tree, or
	the SHA1 of a file (or multiple SHA1s, see "Tree format" below).

	All commits, trees and blobs are stored as EncryptedObjects (see
	"EncryptedObject" above).


	Commit Format
	-------------

	A "commit" contains the following bytes (see "Data Format Documentation" below
	for explanation of [String], [UInt32], [Date], etc):

	43 6f 6d 6d 69 74 56 30 31 31 "CommitV011"
	[String:"<author>"]
	[String:"<comment>"]
	[UInt64:num_parent_commits] (this is always 0 or 1)
	(
	[String:parent_commit_sha1] /* can't be null */
	[Bool:parent_commit_encryption_key_stretched]] /* present for Commit version >= 4 */
	) /* repeat num_parent_commits times */
	[String:tree_sha1]] /* can't be null */
	[Bool:tree_encryption_key_stretched]] /* present for Commit version >= 4 */
	[Bool:tree_is_compressed] /* present for Commit version 8 and 9 only; indicates Gzip compression or none */
	[CompressionType:tree_compression_type] /* present for Commit version >= 10 */

	[String:"file://<hostname><path_to_folder>"]
	[String:"<merge_common_ancestor_sha1>"] /* only present for Commit version 7 or older (was never used) */
	[Bool:is_merge_common_ancestor_encryption_key_stretched] /* only present for Commit version 4 to 7 */
	[Date:creation_date]
	[UInt64:num_failed_files] /* only present for Commit version 3 or later */
	(
	[String:"<relative_path>"] /* only present for Commit version 3 or later */
	[String:"<error_message>"] /* only present for Commit version 3 or later */
	) /* repeat num_failed_files times */
	[Bool:has_missing_nodes] /* only present for Commit version 8 or later */
	[Bool:is_complete] /* only present for Commit version 9 or later */
	[Data:config_plist_xml] /* a copy of the XML file as described above */



	Tree Format
	-----------

	A tree contains the following bytes:

	54 72 65 65 56 30 31 36 "Treev019"
	[Bool:xattrs_are_compressed] /* present for Tree versions 12-18 */
	[CompressionType:xattrs_compression_type] /* present for Tree version >= 19; indicates Gzip compression or none */
	[Bool:acl_is_compressed] /* present for Tree versions 12-18 */
	[CompressionType:acl_compression_type] /* present for Tree version >= 19; indicates Gzip compression or none */
	[BlobKey:xattrs_blob_key]
	[UInt64:xattrs_size]
	[BlobKey:acl_blob_key]
	[Int32:uid]
	[Int32:gid]
	[Int32:mode]
	[Int64:mtime_sec]
	[Int64:mtime_nsec]
	[Int64:flags]
	[Int32:finderFlags]
	[Int32:extendedFinderFlags]
	[Int32:st_dev]
	[Int32:st_ino]
	[UInt32:st_nlink]
	[Int32:st_rdev]
	[Int64:ctime_sec]
	[Int64:ctime_nsec]
	[Int64:st_blocks]
	[UInt32:st_blksize]
	[UInt64:aggregate_size_on_disk] /* only present for Tree version 11 to 16 (never used) */
	[Int64:create_time_sec] /* only present for Tree version 15 or later */
	[Int64:create_time_nsec] /* only present for Tree version 15 or later */
	[UInt32:missing_node_count] /* only present for Tree version 18 or later */
	(
	[String:"<missing_node_name>"] /* only present for Tree version 18 or later */
	) /* repeat <missing_node_count> times */
	[UInt32:node_count]
	(
	[String:"<file name>"] /* can't be null */
	[Node]
	) /* repeat <node_count> times */


	Each [Node] contains the following bytes:

	[Bool:isTree]
	[Bool:data_are_compressed] /* present for Tree versions 12-18 */
	[CompressionType:data_compression_type] /* present for Tree version >= 19; indicates Gzip compression or none */
	[Bool:xattrs_are_compressed] /* present for Tree versions 12-18 */
	[CompressionType:xattrs_compression_type] /* present for Tree version >= 19; indicates Gzip compression or none */
	[Bool:acl_is_compressed] /* present for Tree versions 12-18 */
	[CompressionType:acl_compression_type] /* present for Tree version >= 19; indicates Gzip compression or none */
	[Int32:data_blob_keys_count]
	(
	[BlobKey:data_blob_key]
	) /* repeat <data_blob_keys_count> times */
	[UInt64:data_size]
	[String:"<thumbnail sha1>"] /* only present for Tree version 18 or earlier (never used) */
	[Bool:is_thumbnail_encryption_key_stretched] /* only present for Tree version 14 to 18 */
	[String:"<preview sha1>"] /* only present for Tree version 18 or earlier (never used) */
	[Bool:is_preview_encryption_key_stretched] /* only present for Tree version 14 to 18 */
	[BlobKey:xattrs_blob_key]
	[UInt64:xattrs_size]
	[BlobKey:acl_blob_key]
	[Int32:uid]
	[Int32:gid]
	[Int32:mode]
	[Int64:mtime_sec]
	[Int64:mtime_nsec]
	[Int64:flags]
	[Int32:finderFlags]
	[Int32:extendedFinderFlags]
	[String:"<finder file type>"]
	[String:"<finder file creator>"]
	[Bool:is_file_extension_hidden]
	[Int32:st_dev]
	[Int32:st_ino]
	[UInt32:st_nlink]
	[Int32:st_rdev]
	[Int64:ctime_sec]
	[Int64:ctime_nsec]
	[Int64:create_time_sec]
	[Int64:create_time_nsec]
	[Int64:st_blocks]
	[UInt32:st_blksize]

	Notes:

	- A Node can have multiple data blob keys if the file is very large. Arq breaks
	  up large files into multiple blobs using a rolling checksum algorithm. This
	  way Arq only backs up the parts of a file that have changed.
	- "<xattrs_blob_key>" is the key of a blob containing the sorted extended
	  attributes of the file (see "XAttrSet Format" below). Note this means
	  extended-attribute sets are "de-duplicated".
	- "<acl_blob_key>" is the key of a blob containing the result of acl_to_text()
	  on the file's ACL. Note this means the ACLs are "de-duplicated".
	- "create_time_sec" and "create_time_nsec" contain the value of the
	  ATTR_CMN_CRTIME attribute of the file.


	XAttrSet Format
	---------------

	Each XAttrSet blob contains the following bytes:

	58 41 74 74 72 53 65 74 56 30 30 32 "XAttrSetV002"
	[UInt64:xattr_count]
	(
	[String:"<xattr name>"] /* can't be null */
	[Data:xattr_data]
	) /* repeat <xattr_count> times */
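As an illustration, an XAttrSet blob could be read as sketched below. This assumes the blob has already been decrypted and decompressed, and that the name/data pair repeats xattr_count times; the helper names are hypothetical.

```python
import struct
from io import BytesIO


def read_exact(stream, n):
    """Read exactly n bytes or fail loudly on a truncated blob."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of blob")
    return data


def read_uint64(stream):
    # [UInt64]: 8 bytes, network byte order
    return struct.unpack(">Q", read_exact(stream, 8))[0]


def read_string(stream):
    # [String]: 1-byte isNotNull flag, then 8-byte length and UTF-8 bytes
    if read_exact(stream, 1) == b"\x00":
        return None
    return read_exact(stream, read_uint64(stream)).decode("utf-8")


def read_data(stream):
    # [Data]: 8-byte length followed by that many raw bytes
    return read_exact(stream, read_uint64(stream))


def parse_xattr_set(blob):
    """Return {xattr name: xattr bytes} from a plaintext XAttrSetV002 blob."""
    stream = BytesIO(blob)
    if read_exact(stream, 12) != b"XAttrSetV002":
        raise ValueError("not an XAttrSetV002 blob")
    attrs = {}
    for _ in range(read_uint64(stream)):
        name = read_string(stream)
        attrs[name] = read_data(stream)
    return attrs
```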


	More on Object Storage
	----------------------

	In general, each blob is stored as an object with a path of the form:

	/<computer_uuid>/objects/<sha1>

	But for small files, the overhead associated with putting and getting the
	objects to/from the storage destination makes backing them up very inefficient.

	So, small files (files under 64KB in length) are stored in "packs", which are
	explained below.


	Packs
	-----

	Each folder configured for backup maintains 2 "packsets", one for trees and
	commits, and one for all other small files. The packsets are named:

	<folder_uuid>-trees
	<folder_uuid>-blobs

	Small files are separated into 2 packsets because the trees and commits are
	cached locally (so that Arq gives reasonable performance for browsing backups);
	all other small blobs don't need to be cached.

	A packset is a set of "packs". When Arq is backing up a folder, it combines
	small files into a single larger packfile; when the packfile reaches 10MB, it
	is stored at the destination. Also, when Arq finishes backing up a folder it
	stores its unsaved packfiles no matter their sizes.
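The buffering behaviour described above can be sketched roughly as follows. This is not Arq's code: the class name and the store_pack callback are hypothetical, "64KB" and "10MB" are assumed to be binary units, and the pack size is assumed to be measured by the sum of blob lengths.

```python
SMALL_BLOB_LIMIT = 64 * 1024          # "64KB" (assumed to mean 65,536 bytes)
PACK_FLUSH_SIZE = 10 * 1024 * 1024    # "10MB" (assumed to mean 10 MiB)


class PacksetWriter:
    """Buffers small blobs and hands full packs to a caller-supplied function."""

    def __init__(self, store_pack):
        # store_pack is a hypothetical callback taking a list of (sha1, data).
        self._store_pack = store_pack
        self._pending = []
        self._pending_size = 0

    def add(self, sha1, data):
        if len(data) >= SMALL_BLOB_LIMIT:
            raise ValueError("too large for a pack; store it as its own object")
        self._pending.append((sha1, data))
        self._pending_size += len(data)
        if self._pending_size >= PACK_FLUSH_SIZE:
            self.flush()

    def flush(self):
        """Store whatever is buffered, whatever its size (end of a folder backup)."""
        if self._pending:
            self._store_pack(self._pending)
            self._pending = []
            self._pending_size = 0
```

A caller would call add() for each small blob and flush() once when the folder backup finishes.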

	When storing a pack, Arq stores the packfile as:

	/<computer_uuid>/packsets/<folder_uuid>-(blobs|trees)/<sha1>.pack

	It also stores an index of the SHA1s contained in the pack as:

	/<computer_uuid>/packsets/<folder_uuid>-(blobs|trees)/<sha1>.index
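For reference, the storage paths described in this document can be built as in the short sketch below. The function names are hypothetical; only the path layout comes from the text above.

```python
def object_path(computer_uuid: str, sha1: str) -> str:
    """Path of a blob stored as a standalone object."""
    return f"/{computer_uuid}/objects/{sha1}"


def pack_paths(computer_uuid: str, folder_uuid: str, packset: str, pack_sha1: str):
    """Return the (.pack, .index) paths for one pack in a folder's packset."""
    if packset not in ("blobs", "trees"):
        raise ValueError("packset must be 'blobs' or 'trees'")
    base = f"/{computer_uuid}/packsets/{folder_uuid}-{packset}/{pack_sha1}"
    return base + ".pack", base + ".index"
```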


	Pack Index Format
	-----------------

	magic number                ff 74 4f 63
	version (2)                 00 00 00 02 network-byte-order
	fanout[0]                   00 00 00 02 (4-byte count of SHA1s starting with 0x00)
	...
	fanout[255]                 00 00 f0 f2 (4-byte count of total objects == count of SHA1s starting with 0xff or smaller)
	object[0]                   00 00 00 00 (8-byte network-byte-order offset)
	                            00 00 00 00
	                            00 00 00 00 (8-byte network-byte-order data length)
	                            00 00 00 00
	                            00 xx xx xx (sha1 starting with 00)
	                            xx xx xx xx
	                            xx xx xx xx
	                            xx xx xx xx
	                            xx xx xx xx
	                            00 00 00 00 (4 bytes for alignment)
	object[1]                   00 00 00 00 (8-byte network-byte-order offset)
	                            00 00 00 00
	                            00 00 00 00 (8-byte network-byte-order data length)
	                            00 00 00 00
	                            00 xx xx xx (sha1 starting with 00)
	                            xx xx xx xx
	                            xx xx xx xx
	                            xx xx xx xx
	                            xx xx xx xx
	                            00 00 00 00 (4 bytes for alignment)
	object[2]                   00 00 00 00 (8-byte network-byte-order offset)
	                            00 00 00 00
	                            00 00 00 00 (8-byte network-byte-order data length)
	                            00 00 00 00
	                            00 xx xx xx (sha1 starting with 00)
	                            xx xx xx xx
	                            xx xx xx xx
	                            xx xx xx xx
	                            xx xx xx xx
	                            00 00 00 00 (4 bytes for alignment)
	...
	object[f0f1]                00 00 00 00 (8-byte network-byte-order offset)
	                            00 00 00 00
	                            00 00 00 00 (8-byte network-byte-order data length)
	                            00 00 00 00
	                            ff xx xx xx (sha1 starting with ff)
	                            xx xx xx xx
	                            xx xx xx xx
	                            xx xx xx xx
	                            xx xx xx xx
	                            00 00 00 00 (4 bytes for alignment)
	Glacier archiveId not null  01          (1 byte) /* Glacier only */
	Glacier archiveId strlen    00 00 00 00 (network-byte-order 8 bytes) /* Glacier only */
	                            00 00 00 08 /* Glacier only */
	Glacier archiveId string    xx xx xx xx (n bytes) /* Glacier only */
	                            xx xx xx xx /* Glacier only */
	Glacier pack size           00 00 00 00 (8-byte network-byte-order data length) /* Glacier only */
	                            00 00 00 00 /* Glacier only */
	20-byte SHA1 of all of the  xx xx xx xx
	above                       xx xx xx xx
	                            xx xx xx xx
	                            xx xx xx xx
	                            xx xx xx xx
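A minimal sketch of reading such an index follows. It assumes the trailing checksum is a plain SHA1 over every preceding byte, and it skips over the Glacier-only fields rather than decoding them; parse_pack_index and the constant names are hypothetical.

```python
import hashlib
import struct

INDEX_MAGIC = bytes.fromhex("ff744f63")
# Per object: 8-byte offset, 8-byte data length, 20-byte SHA1, 4 padding bytes.
INDEX_ENTRY = struct.Struct(">QQ20s4x")
FANOUT_OFFSET = 8                      # after the magic number and version
ENTRIES_OFFSET = FANOUT_OFFSET + 256 * 4


def parse_pack_index(blob: bytes) -> dict:
    """Return {sha1 hex: (offset, data length)} for every object in the index."""
    if blob[:4] != INDEX_MAGIC:
        raise ValueError("bad magic number")
    (version,) = struct.unpack_from(">I", blob, 4)
    if version != 2:
        raise ValueError("unsupported index version %d" % version)
    # Assumption: the final 20 bytes are SHA1(all bytes before them).
    if hashlib.sha1(blob[:-20]).digest() != blob[-20:]:
        raise ValueError("index checksum mismatch")

    fanout = struct.unpack_from(">256I", blob, FANOUT_OFFSET)
    count = fanout[255]                # cumulative, so the last slot is the total
    if ENTRIES_OFFSET + count * INDEX_ENTRY.size > len(blob) - 20:
        raise ValueError("index is truncated")

    entries = {}
    for i in range(count):
        offset, length, sha1 = INDEX_ENTRY.unpack_from(
            blob, ENTRIES_OFFSET + i * INDEX_ENTRY.size)
        entries[sha1.hex()] = (offset, length)
    return entries
```

Because the fanout table is cumulative, the objects whose SHA1 starts with byte b occupy entry indices fanout[b-1] up to (but not including) fanout[b], with 0 as the lower bound for b == 0.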


	Pack File Format
	----------------

	signature                       50 41 43 4b ("PACK")
	version (2)                     00 00 00 02 (network-byte-order 4 bytes)
	object count                    00 00 00 00 (network-byte-order 8 bytes)
	                                00 00 f0 f2
	object[0] mimetype not null     01          (1 byte) (this is usually zero)
	object[0] mimetype strlen       00 00 00 00 (network-byte-order 8 bytes) (this isn't here if not-null is zero)
	                                00 00 00 08
	object[0] mimetype string       xx xx xx xx (n bytes)
	                                xx xx xx xx
	object[0] name not null         01          (1 byte) (this is usually zero)
	object[0] name strlen           00 00 00 00 (network-byte-order 8 bytes) (this isn't here if not-null is zero)
	                                00 00 00 08
	object[0] name string           xx xx xx xx (n bytes)
	                                xx xx xx xx
	object[0] data length           00 00 00 00 (network-byte-order 8 bytes)
	                                00 00 00 06
	object[0] data                  xx xx xx xx (n bytes)
	                                xx xx
	...
	object[f0f1] mimetype not null  01          (1 byte) (this is usually zero)
	object[f0f1] mimetype len       00 00 00 00 (network-byte-order 8 bytes) (this isn't here if not-null is zero)
	                                00 00 00 08
	object[f0f1] mimetype str       xx xx xx xx (n bytes)
	                                xx xx xx xx
	object[f0f1] name not null      01          (1 byte) (this is usually zero)
	object[f0f1] name strlen        00 00 00 00 (network-byte-order 8 bytes) (this isn't here if not-null is zero)
	                                00 00 00 08
	object[f0f1] name string        xx xx xx xx (n bytes)
	                                xx xx xx xx
	object[f0f1] data length        00 00 00 00 (network-byte-order 8 bytes)
	                                00 00 00 04
	object[f0f1] data               12 34 12 34
	20-byte SHA1 of all of the      xx xx xx xx
	above                           xx xx xx xx
	                                xx xx xx xx
	                                xx xx xx xx
	                                xx xx xx xx
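A pack file laid out this way could be walked sequentially as in the sketch below. It assumes the trailing checksum is a plain SHA1 of the preceding bytes and that each object's data is returned as stored (still encrypted and/or compressed); the function names are hypothetical.

```python
import hashlib
import struct
from io import BytesIO


def _read(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of pack")
    return data


def _read_uint64(stream):
    return struct.unpack(">Q", _read(stream, 8))[0]


def _read_nullable_string(stream):
    # 1-byte not-null flag; the length and bytes follow only when it is 01.
    if _read(stream, 1) == b"\x00":
        return None
    return _read(stream, _read_uint64(stream)).decode("utf-8")


def parse_pack(blob: bytes) -> list:
    """Return a list of (mimetype, name, data) tuples, one per packed object."""
    # Assumption: the final 20 bytes are SHA1(all bytes before them).
    if hashlib.sha1(blob[:-20]).digest() != blob[-20:]:
        raise ValueError("pack checksum mismatch")
    stream = BytesIO(blob[:-20])
    if _read(stream, 4) != b"PACK":
        raise ValueError("bad signature")
    (version,) = struct.unpack(">I", _read(stream, 4))
    if version != 2:
        raise ValueError("unsupported pack version %d" % version)

    objects = []
    for _ in range(_read_uint64(stream)):
        mimetype = _read_nullable_string(stream)
        name = _read_nullable_string(stream)
        data = _read(stream, _read_uint64(stream))
        objects.append((mimetype, name, data))
    return objects
```

In practice a reader would more likely use the offset and data length from the pack index to fetch a single object instead of scanning the whole pack.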



	Data Format Documentation Conventions
	-------------------------------------

	We used a few shortcuts in some of the data format explanations above:

	[BlobKey:value]

	A [BlobKey] is stored as:

	[String:sha1] /* can't be null */
	[Bool:is_encryption_key_stretched] /* only present for Tree version 14 or later, Commit version 4 or later */
	[UInt32:storage_type] /* 1==S3, 2==Glacier; only present for Tree version 17 or later */
	[String:archive_id] /* only present for Tree version 17 or later, if storage_type==2 */
	[UInt64:archive_size] /* only present for Tree version 17 or later, if storage_type==2 */
	[Date:archive_upload_date] /* only present for Tree version 17 or later, if storage_type==2 */


	[Bool:value]

	A [Bool] is stored as 1 byte, either 00 or 01.

	[String:"<string>"]

	A [String] is stored as:

	00 or 01        isNotNull flag

	if not null:

	00 00 00 00     8-byte network-byte-order length
	00 00 00 0c
	xx xx xx xx     UTF-8 string data
	xx xx xx xx
	xx xx xx xx

	[UInt32:<the_number>]

	A [UInt32] is stored as:

	00 00 00 00 network-byte-order uint32_t

	[Int32:<the_number>]

	An [Int32] is stored as:

	00 00 00 00 network-byte-order int32_t

	[UInt64:<the_number>]

	A [UInt64] is stored as:

	00 00 00 00 network-byte-order uint64_t
	00 00 00 00

	[Int64:<the_number>]

	An [Int64] is stored as:

	00 00 00 00 network-byte-order int64_t
	00 00 00 00

	[Date:<the_date>]

	A [Date] is stored as:

	00 or 01        isNotNull flag

	if not null:

	00 00 01 26     8-byte network-byte-order milliseconds
	a8 79 09 48     since the first instant of 1 January 1970, GMT.
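As a worked example, the sample bytes above can be decoded as sketched below. Treating the millisecond count as a signed 64-bit integer is an assumption (the text only says "8-byte"), and decode_date is a hypothetical name.

```python
import struct
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_date(raw: bytes):
    """Decode a [Date]: 1-byte isNotNull flag, then 8-byte big-endian milliseconds."""
    if raw[0] == 0:
        return None
    # Assumption: the count is signed, so dates before 1970 would be negative.
    (millis,) = struct.unpack_from(">q", raw, 1)
    return UNIX_EPOCH + timedelta(milliseconds=millis)


# Flag byte 01 followed by the example value 00 00 01 26 a8 79 09 48.
example = bytes.fromhex("01 00 00 01 26 a8 79 09 48")
decoded = decode_date(example)
```

The example value is 1,265,546,889,544 milliseconds, which falls on 7 February 2010 (UTC).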

	[Data:<xattr_data>]

	A [Data] is stored as:

	[UInt64:<length>]   data length
	xx xx xx xx         bytes
	xx xx xx xx
	xx xx xx xx
	...

	[CompressionType]

	Compression type is stored as an [Int32].
	0 == none
	1 == Gzip
	2 == LZ4
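In code, the three values listed above map naturally onto an enumeration. The sketch below only decodes the field; the class and function names are hypothetical, and no decompression is attempted.

```python
import struct
from enum import IntEnum


class CompressionType(IntEnum):
    NONE = 0
    GZIP = 1
    LZ4 = 2


def decode_compression_type(raw: bytes, offset: int = 0) -> CompressionType:
    """Decode a [CompressionType] stored as a network-byte-order [Int32]."""
    (value,) = struct.unpack_from(">i", raw, offset)
    # Values other than 0, 1, or 2 raise ValueError.
    return CompressionType(value)
```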