This guide walks you through all the steps necessary to use Singularity to:
- Prepare an open dataset from S3
- Send deals to a local emulated storage provider, f02815405
- Make retrievals from the emulated storage provider using HTTP and Bitswap
- Download the latest pre-built Singularity release
- Download the latest pre-built sim-sp release
- Do not use a package format such as '.deb' or '.rpm', as it will be installed to your system PATH
- Extract both downloads to the same folder. After extraction, the folder should contain two executables, singularity and sim-sp, along with some docs and a license file
- Open a terminal and change the working directory to that folder
- Make sure everything works by running
./singularity version
./sim-sp -h
- Windows users need to use the commands below, with the .exe suffix, for the whole workshop
singularity.exe version
sim-sp.exe -h
- Mac users may need to open each executable in Finder once to bypass Gatekeeper
- Download the latest IPFS CLI
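The folder layout described above can be checked with a small shell function. This is a minimal sketch and not part of Singularity or sim-sp: the function name check_workshop_tools is hypothetical, and it assumes a POSIX shell on Linux or macOS with the two executables extracted into the folder given as the first argument (the current folder by default).

```shell
# Hypothetical helper: report whether both workshop executables are present
# and executable in the given folder (defaults to the current folder).
check_workshop_tools() {
  dir="${1:-.}"
  missing=0
  for tool in singularity sim-sp; do
    if [ -x "$dir/$tool" ]; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Non-zero exit status when at least one executable is missing.
  return "$missing"
}
```

Calling check_workshop_tools from the extraction folder should print a found line for each executable. On Windows the file names end in .exe, so the sketch does not apply as written.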
Singularity stores its state in a SQL database. In this workshop, we will use the default SQLite backend. To initialize the database, run
./singularity admin init
Note: if you suspect you have made a mistake anywhere in the instructions below and would like to restart from scratch, you can reset the database using
./singularity admin reset --really-do-it
In this workshop, we are going to use CIViC (Clinical Interpretation of Variants in Cancer) as our source dataset.
Connect to the AWS S3 bucket as a storage connection. Let's name the connection civic
./singularity storage create s3 aws --region us-west-2 --name civic --path civic-aws-opendata
We can confirm that the storage connection is saved in the database using the command below
./singularity storage list
We can also see what is inside this storage connection using the command below, which gives further assurance that the connection is valid. Note that these folders have not been prepared yet.
./singularity storage explore civic
Now create a new preparation named civic, with the storage connection civic and default parameters
./singularity prep create --name civic --source civic
Start scanning the data source for files
./singularity prep start-scan civic civic
Now the data source is marked as ready to be scanned, but we have not yet started any worker to scan and prepare the dataset. Usually the dataset worker should always be running, but in this workshop we are going to run it on demand. The command takes about one minute to complete, depending on your Internet speed. It may look like it hangs at created pack job 2 with 43 file range, but it will finish soon. The command exits once data preparation is complete.
./singularity run dataset-worker --exit-on-error --exit-on-complete
There is one more step: DAG generation. The DAG contains all folder structure information and can be very useful for retrieval. Run the two commands below to complete DAG generation. They should complete almost instantly.
./singularity prep start-daggen civic civic
./singularity run dataset-worker --exit-on-error --exit-on-complete
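The scan and DAG generation steps above always follow the same order: schedule the work, then run the dataset worker until it exits. A minimal sketch of that sequence as a shell function follows. The function name prepare_dataset and the SINGULARITY variable are hypothetical conveniences; the commands and flags themselves are the ones used in this workshop.

```shell
# Path to the Singularity executable. SINGULARITY is a hypothetical override
# variable; by default the executable in the current folder is used.
SINGULARITY="${SINGULARITY:-./singularity}"

# Hypothetical helper: scan a source, pack it, then generate the DAG.
# $1 = preparation name, $2 = storage connection name (both "civic" here).
# The && chain stops at the first command that fails.
prepare_dataset() {
  "$SINGULARITY" prep start-scan "$1" "$2" &&
  "$SINGULARITY" run dataset-worker --exit-on-error --exit-on-complete &&
  "$SINGULARITY" prep start-daggen "$1" "$2" &&
  "$SINGULARITY" run dataset-worker --exit-on-error --exit-on-complete
}
```

With the preparation and storage connection from this workshop, the call would be prepare_dataset civic civic.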
Great, we have completed the data preparation and can now list all prepared pieces using the command below. It shows the piece_size and piece_cid of each piece, which are very important parameters in deal proposals to storage providers.
./singularity prep list-pieces civic
All files and folders in this dataset now also have a CID, which can be used for later retrieval. The CID on the first line, with an empty path, is called the RootCID; it is the CID of the root folder of this dataset. If yours is not bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q, write it down and use it in place of bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q everywhere in the following instructions.
./singularity prep explore civic civic
You may start to wonder where the CAR files are. Singularity uses inline preparation, which stores the mapping between the original data files and the CAR files, so you don't need to provision extra space for CAR files.
You can run the content provider to offer CAR file downloads to storage providers. Do not terminate the content provider until the storage provider has "sealed" the deal in the next section.
./singularity run content-provider
[Optional] You may try downloading CAR files using the command below in another terminal window (replace <piece_cid> with an actual one from the list-pieces output)
wget http://127.0.0.1:7777/piece/<piece_cid>
Deal making needs two parties: a client that sends the deal proposals and a storage provider that does the sealing. In this demo, we are going to import a leaked private key as our test client
./singularity wallet import 7b2254797065223a22736563703235366b31222c22507269766174654b6579223a226b35507976337148327349586343595a58594f5775453149326e32554539436861556b6c4e36695a5763453d227d
We also want to attach this wallet to our preparation civic, so that all deal proposals for this data preparation are sent from this wallet
./singularity prep attach-wallet civic f0808055
For the storage provider, we will run an emulated storage provider in another terminal window. It accepts any Boost deal, downloads the CAR file that is part of the deal from the Singularity content provider, and offers free retrievals. Do not terminate this window until the end of the workshop.
./sim-sp run
Finally, it's time to send out the deals. To simplify this process, we are going to send deals for all available pieces from this open dataset to the storage provider. Do not change the miner ID f02815405, and do NOT replace {PIECE_CID}.
./singularity deal schedule create --verified=false --preparation civic --provider f02815405 --url-template "http://127.0.0.1:7777/piece/{PIECE_CID}"
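The {PIECE_CID} placeholder is left untouched in the command above because it is meant to be filled in with each piece's CID when the deals are created. The sketch below only illustrates that substitution idea in shell. How Singularity performs it internally is not shown in this workshop, and the function name expand_url_template is hypothetical.

```shell
# Hypothetical illustration of placeholder substitution in a URL template.
# $1 = template containing {PIECE_CID}, $2 = piece CID to insert.
expand_url_template() {
  template="$1"
  piece_cid="$2"
  # In a basic regular expression, "{" and "}" are literal characters,
  # so the placeholder can be matched as plain text.
  printf '%s\n' "$template" | sed "s|{PIECE_CID}|$piece_cid|g"
}
```

For example, expand_url_template "http://127.0.0.1:7777/piece/{PIECE_CID}" "<piece_cid>" yields the same URL shape used in the optional wget download earlier, with the placeholder replaced by whatever CID was passed in.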
Then run the deal pusher to actually send the deals out.
./singularity run deal-pusher
You will now see things happening in three different terminal windows
- The Singularity deal pusher is sending deal proposals to the emulated storage provider
- sim-sp is receiving Boost online deals and is trying to download and parse the CAR files from the Singularity content provider
- The Singularity content provider is reading S3 objects and converting them into a CAR stream for download
Wait until you see two lines of the form Deal xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx completed successfully; then you are ready for the upcoming retrieval session. You may also stop the deal-pusher and content-provider services and leave sim-sp running for retrievals.
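If you prefer not to watch the terminal, the completion lines can be counted from a saved log. This is a minimal sketch under an assumption: the output of the terminal that prints the "completed successfully" lines has been saved to a file by some means of your choosing. The function name count_completed_deals and the file name deals.log are hypothetical.

```shell
# Hypothetical helper: count deal completion lines in a saved log file.
# $1 = path to the log file.
count_completed_deals() {
  # grep -c prints the number of matching lines; "|| true" keeps the exit
  # status at zero when there are no matches yet.
  grep -c 'Deal .* completed successfully' "$1" || true
}
```

Once count_completed_deals deals.log prints 2, both deals in this workshop have completed.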
This emulated storage provider offers both HTTP and Bitswap retrievals, similar to Boost.
The CAR files can be downloaded using the command below (replace <piece_cid> with an actual one from the list-pieces output)
wget http://127.0.0.1:7778/piece/<piece_cid>
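To fetch several pieces in one go, the wget command above can be wrapped in a loop. This is a minimal sketch: the helper names piece_url and download_pieces are hypothetical, the ports are the ones used in this workshop (7777 for the content provider, 7778 for sim-sp), and the piece CIDs must be copied from the list-pieces output.

```shell
# Hypothetical helper: build the piece download URL for a local service.
# $1 = port (7777 = content provider, 7778 = sim-sp), $2 = piece CID.
piece_url() {
  echo "http://127.0.0.1:$1/piece/$2"
}

# Hypothetical helper: download every piece CID given as an argument from
# sim-sp, saving each one as <piece_cid>.car. Stops at the first failure.
download_pieces() {
  for cid in "$@"; do
    wget -O "${cid}.car" "$(piece_url 7778 "$cid")" || return 1
  done
}
```

A call would look like download_pieces <piece_cid_1> <piece_cid_2>, with the placeholders replaced by real piece CIDs.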
The emulated storage provider also enables an IPFS Gateway, so you can browse the dataset by its RootCID at http://127.0.0.1:7778/ipfs/bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q. This RootCID comes from the output of ./singularity prep explore civic civic
You can browse the files through the IPFS Gateway and download them with any HTTP client.
First, initialize and run IPFS with the commands below
ipfs init
ipfs daemon
Then connect to the emulated storage provider using the command below in a different terminal window. If you see a warning that says repo locked, just try again
ipfs swarm connect /ip4/127.0.0.1/tcp/24001/p2p/12D3KooWDeNSud283YaRmhqbZDynLNmtATBxjUPAUJxtPyEXXp9u
Finally, you can retrieve the whole dataset using only the RootCID
ipfs get -o out bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q
Now you can examine the out folder, which should contain the whole dataset
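A quick way to confirm that the retrieval produced something is to count the files that landed in the out folder. This is a minimal sketch with a hypothetical function name, count_files; it only counts files and does not verify their content, and the expected number depends on the dataset.

```shell
# Hypothetical helper: count regular files under a folder, recursively.
# $1 = folder to inspect (for example, out).
count_files() {
  # tr strips the padding some wc implementations add around the number.
  find "$1" -type f | wc -l | tr -d ' '
}
```

Running count_files out after ipfs get should print a number greater than zero.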
@xinaxu
I am working on a rewrite of the singularity workshop and wanted to let you know I just wrapped up the first rough draft. I still have some formatting and terminal window management content to update but I wanted to get your thoughts and correct anything I overlooked beforehand. Please let me know if you see anything technically incorrect or otherwise.
https://gist.github.com/SgtCoin/6a9513afedbf8875d01655f039ad9d2e