Created
November 16, 2016 20:20
DPN sync process for APTrust
I sync replication requests from remote nodes to my own node. Any requests in which I'm the to_node go into my processing queue. For each replication request in the queue, I do this:
1. Copy the bag from the remote node, via rsync/ssh.
2. Calculate the sha256 digest of the bag's tag manifest.
3. Send that fixity value back to the ingest node. If I get back a record in which StoreRequested == false, I delete the bag from my staging area and consider the job done.
4. If StoreRequested == true, I validate the bag by making sure all required files and tags are present, and all checksums in the manifest-sha256.txt match. If the bag is invalid, I cancel the transfer on the remote node with a cancel reason indicating that the bag did not pass validation. I delete the bag from staging, and am done.
5. If the bag is valid, I copy it to long-term storage and delete it from my staging area.
6. I update the transfer record on the ingest node to say Stored = true.
My own node does not know the bag is stored until the next time I sync from the remote node.
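The steps above can be sketched roughly as follows. This is only an illustration in Python, not the actual APTrust code: the `Transfer` class, `process_replication`, and the callables passed into it (`send_fixity`, `validate_bag`, `store_bag`, `delete_staged`, `update_remote`) are all hypothetical names, and step 1 (the rsync/ssh copy) is assumed to have already placed the bag in staging.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Step 2: hex sha256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class Transfer:
    # Hypothetical stand-in for a replication transfer record.
    fixity_value: str = ""
    store_requested: bool = False
    stored: bool = False
    cancel_reason: str = ""


def process_replication(
    xfer: Transfer,
    tag_manifest: Path,
    send_fixity: Callable[[Transfer], Transfer],  # PUT to the ingest node, returns its record
    validate_bag: Callable[[], bool],             # the expensive full validation
    store_bag: Callable[[], None],                # copy to long-term storage
    delete_staged: Callable[[], None],            # remove the bag from staging
    update_remote: Callable[[Transfer], None],    # save the record on the ingest node
) -> str:
    # Steps 2-3: cheap digest first, then ask the ingest node what to do.
    xfer.fixity_value = sha256_of_file(tag_manifest)
    xfer = send_fixity(xfer)
    if not xfer.store_requested:
        delete_staged()
        return "done_not_stored"

    # Step 4: only now pay for full validation.
    if not validate_bag():
        xfer.cancel_reason = "bag did not pass validation"
        update_remote(xfer)
        delete_staged()
        return "cancelled"

    # Steps 5-6: store, clean up staging, report back.
    store_bag()
    delete_staged()
    xfer.stored = True
    update_remote(xfer)
    return "stored"
```

Passing the network and storage operations in as callables keeps the ordering of the steps visible and lets the flow be exercised with fakes.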
It is expected that the replication transfer request initially has its store_requested set to false. store_requested is set to true by the from_node upon a successful fixity response from the to_node. This represents a state similar to status=confirmed in v1, correct? When you send fixity back, the only field the to_node changes in the PUT to the from_node is the fixity_value, correct?
Yes - store_requested starts out false, and is set to true only when the from_node gets a valid fixity from the to_node. When the to_node sends the fixity value, it updates fixity_value and updated_at.
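As a rough illustration of that exchange, here is a minimal Python sketch. The `ReplicationTransfer` class and both function names are hypothetical, and it assumes that a "valid fixity" means the reported digest equals the digest the from_node already holds for the tag manifest. What the from_node does on a mismatch is not covered in this thread, so the sketch simply leaves store_requested false.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ReplicationTransfer:
    # Hypothetical record; only the fields discussed here are modeled.
    fixity_value: Optional[str] = None
    store_requested: bool = False  # starts out false
    updated_at: Optional[datetime] = None


def to_node_report_fixity(
    xfer: ReplicationTransfer, digest: str, now: Optional[datetime] = None
) -> ReplicationTransfer:
    """to_node side: change only fixity_value and updated_at."""
    stamp = now or datetime.now(timezone.utc)
    return replace(xfer, fixity_value=digest, updated_at=stamp)


def from_node_review_fixity(
    xfer: ReplicationTransfer, expected_digest: str
) -> ReplicationTransfer:
    """from_node side: request storage only when the digest matches (assumed rule)."""
    matches = xfer.fixity_value is not None and xfer.fixity_value == expected_digest
    return replace(xfer, store_requested=matches)
```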
By the way, I do validation AFTER calculating the tag manifest fixity because validating a 250GB bag is really expensive, and I don't want to even start that work if I got a bad bag.