OpenNeuro has implemented a data retention policy stating that datasets that have been in draft state for more than 28 days may be reverted to the latest snapshot. Unfortunately, we don't currently have an interface for viewing the changes made since the last snapshot, so users may not know whether they want to create a new snapshot.
This gist shows two ways to view the changes using the OpenNeuro CLI. We will use ds000001 as an example.
The easy but slow approach is to use the CLI to download two copies of your dataset, the most recent tag and the draft, and run diff -r on the pair:
$ openneuro download --snapshot 1.0.0 ds000001 ds000001-v1.0.0/
$ openneuro download --draft ds000001 ds000001-draft/
$ diff -r ds000001-v1.0.0 ds000001-draft
This will show changes in text files and binary files like so:
diff -r ds000001-v1.0.0/participants.tsv ds000001-draft/participants.tsv
2c2
< sub-01 F 26
---
> sub-01 F 27
Binary files ds000001-v1.0.0/sub-01/anat/sub-01_T1w.nii.gz and ds000001-draft/sub-01/anat/sub-01_T1w.nii.gz differ
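If you only want to know which files changed, not how, diff's -q flag (report only whether files differ) keeps the output short. The sketch below is an illustration rather than part of the original workflow: it builds two small throwaway directories with made-up contents in place of the real downloads, so it runs without network access.

```shell
# Toy stand-ins for the two downloads (hypothetical contents).
workdir=$(mktemp -d)
mkdir -p "$workdir/ds000001-v1.0.0" "$workdir/ds000001-draft"
printf 'participant_id\tsex\tage\nsub-01\tF\t26\n' > "$workdir/ds000001-v1.0.0/participants.tsv"
printf 'participant_id\tsex\tage\nsub-01\tF\t27\n' > "$workdir/ds000001-draft/participants.tsv"
printf 'unchanged\n' > "$workdir/ds000001-v1.0.0/README"
printf 'unchanged\n' > "$workdir/ds000001-draft/README"

# -r recurses into subdirectories; -q reports only which files differ.
# diff exits with status 1 when it finds differences, hence the "|| true".
changed=$(cd "$workdir" && diff -rq ds000001-v1.0.0 ds000001-draft || true)
echo "$changed"
```

With real downloads, the same diff -rq command can be run directly on the two directories created above.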
By using DataLad, changes can be seen without downloading the full content. This will require installing DataLad and the OpenNeuro CLI, and setting up the credential helper.
Once that is done, open your dataset's page on OpenNeuro, click the "Clone" button, and copy the OpenNeuro URL. If my dataset were ds000001, I would get https://openneuro.org/git/0/ds000001. From there I could clone the dataset and compare the draft with the latest version:
$ datalad clone https://openneuro.org/git/0/ds000001
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
[INFO ] https://openneuro.org/git/0/ds000001/config download failed: Not Found
[INFO ] access to 1 dataset sibling s3-PRIVATE not auto-enabled, enable with:
| datalad siblings -d "/data/bids/ds000001" enable -s s3-PRIVATE
install(ok): /data/bids/ds000001 (dataset)
$ cd ds000001
Supposing I have one change saved in the draft, git describe shows both that and the most recent version:
$ git describe --tags
1.0.0-1-ga5184e8
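The describe output reads as: the most recent tag (1.0.0), the number of commits since that tag (1), and the abbreviated commit hash prefixed with g (a5184e8). If you would rather have those pieces separately, for example to use in a script, something like the following could work. It is a sketch run against a throwaway repository with made-up contents, not against a real dataset clone.

```shell
# Throwaway repository standing in for a cloned dataset (hypothetical contents).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "example@example.com"
git config user.name "Example"
printf 'sub-01\tF\t26\n' > participants.tsv
git add participants.tsv
git commit -q -m "Initial snapshot"
git tag 1.0.0
printf 'sub-01\tF\t27\n' > participants.tsv
git commit -q -am "Draft change"

# Most recent tag reachable from the draft, without the -N-g<hash> suffix.
latest=$(git describe --tags --abbrev=0)
# Number of draft commits made since that tag.
ahead=$(git rev-list --count "$latest"..HEAD)
echo "latest snapshot: $latest, draft commits since: $ahead"
```

In a real clone only the last three commands are needed; a count of 0 would mean the draft has no changes beyond the latest snapshot.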
I would then compare the draft with that version using git diff:
$ git diff 1.0.0
This has the output:
diff --git a/participants.tsv b/participants.tsv
index 4367938..6ca1efd 100644
--- a/participants.tsv
+++ b/participants.tsv
@@ -1,5 +1,5 @@
participant_id sex age
-sub-01 F 26
+sub-01 F 27
sub-02 M 24
sub-03 F 27
sub-04 F 20
diff --git a/sub-01/anat/sub-01_T1w.nii.gz b/sub-01/anat/sub-01_T1w.nii.gz
index 25cb343..7b5fdfe 120000
--- a/sub-01/anat/sub-01_T1w.nii.gz
+++ b/sub-01/anat/sub-01_T1w.nii.gz
@@ -1 +1 @@
-../../.git/annex/objects/V7/Pj/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz
\ No newline at end of file
+../../.git/annex/objects/K0/1M/MD5E-s5736750--4ba3ad9eaa54aab87d97fa0d60b576ad.nii.gz/MD5E-s5736750--4ba3ad9eaa54aab87d97fa0d60b576ad.nii.gz
\ No newline at end of file
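The second hunk looks odd because the image is an annexed file: git tracks only a symlink (mode 120000) pointing at a git-annex key, so the diff shows the key changing rather than the file content. For MD5E keys like these, the number after -s is the file size in bytes and the part after -- is the MD5 checksum plus extension, so you can tell how much the file grew or shrank without downloading it. A minimal sketch, assuming keys of exactly this form (size_of is a hypothetical helper, not a git-annex command):

```shell
# Keys copied from the diff output above.
old_key='MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz'
new_key='MD5E-s5736750--4ba3ad9eaa54aab87d97fa0d60b576ad.nii.gz'

# Hypothetical helper: pull the byte count out of the -s<size> field.
size_of() {
    printf '%s\n' "$1" | sed -E 's/^[^-]+-s([0-9]+)--.*/\1/'
}

old=$(size_of "$old_key")
new=$(size_of "$new_key")
echo "size changed by $((new - old)) bytes"
# prints: size changed by 73513 bytes
```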
To highlight specific differences in TSV files with many columns, consider using the --word-diff option, e.g.,
$ git diff --word-diff=color 1.0.0 participants.tsv
diff --git a/participants.tsv b/participants.tsv
index 4367938..6ca1efd 100644
--- a/participants.tsv
+++ b/participants.tsv
@@ -1,5 +1,5 @@
participant_id sex age
sub-01 F [-26-]{+27+}
sub-02 M 24
sub-03 F 27
sub-04 F 20
The colored output does not render well in Markdown, so the change is shown here with plain [-removed-] and {+added+} markers, but thanks to @bpoldrack for the tip!