Refget is a specification that defines a standard way to access reference sequences using an identifier derived from the sequence itself. It is a fundamental building block of GA4GH, giving our standards a way to access sequences and to identify them unambiguously.
Refget is an HTTP-based standard, so all you need to access a refget service is an HTTP library and a known identifier. The following Python code retrieves the first 10 bases of Saccharomyces cerevisiae chromosome I (MD5 identifier `6681ac2f62509cfc220d78751b8dc524`).
```python
import requests

url = 'https://refget.herokuapp.com/sequence/{}'.format('6681ac2f62509cfc220d78751b8dc524')
# Request bases 0-9 of the sequence as plain text
r = requests.get(url, headers={'Accept': 'text/plain'}, params={'start': 0, 'end': 10})
r.raise_for_status()
print(r.text)
# CCACACCACA
```
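`start` is 0-based and inclusive and `end` is exclusive, so the request above returns 10 bases. Omit the `start` and `end` parameters to retrieve the entire sequence.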
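As a minimal sketch, the helper below builds the request URL for either a sub-sequence or a whole sequence, adding the query parameters only when they are given. The function name `build_sequence_url` is hypothetical, and it only covers the simple case of a non-negative `start`/`end` pair; it does not handle any other range behaviour a server may support.

```python
from urllib.parse import urlencode


def build_sequence_url(base_url, identifier, start=None, end=None):
    """Hypothetical helper: build a refget /sequence URL.

    start is 0-based and inclusive, end is exclusive; leaving both as None
    produces a URL for the entire sequence.
    """
    url = f"{base_url.rstrip('/')}/sequence/{identifier}"
    params = {}
    for name, value in (("start", start), ("end", end)):
        if value is not None:
            if value < 0:
                raise ValueError(f"{name} must be >= 0")
            params[name] = value
    # Only append a query string when at least one parameter was supplied
    return f"{url}?{urlencode(params)}" if params else url


print(build_sequence_url('https://refget.herokuapp.com',
                         '6681ac2f62509cfc220d78751b8dc524', 0, 10))
# https://refget.herokuapp.com/sequence/6681ac2f62509cfc220d78751b8dc524?start=0&end=10
```

The resulting URL can be passed straight to `requests.get` with the same `Accept: text/plain` header as before.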
Sequence metadata, including known aliases, the sequence length and its checksums, is available as JSON from the `/metadata` endpoint:

```python
import requests

url = 'https://refget.herokuapp.com/sequence/{}/metadata'.format('6681ac2f62509cfc220d78751b8dc524')
r = requests.get(url, headers={'Accept': 'application/json'})
r.raise_for_status()
print(r.json())
# {'metadata': {'aliases': [{'alias': 'I', 'naming_authority': 'unknown'}], 'length': 230218, 'md5': '6681ac2f62509cfc220d78751b8dc524', 'trunc512': '959cb1883fc1ca9ae1394ceb475a356ead1ecceff5824ae7'}}
```
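Because the metadata lists every alias a sequence is known by, a client can build a lookup table from those names to the checksum identifier. The sketch below works offline on the example response shown above; the function name `alias_index` is hypothetical, and it assumes the response layout from that example (a `metadata` object with an `aliases` list and an `md5` field).

```python
import json

# Example response body copied from the metadata request above
METADATA_JSON = """
{"metadata": {"aliases": [{"alias": "I", "naming_authority": "unknown"}],
              "length": 230218,
              "md5": "6681ac2f62509cfc220d78751b8dc524",
              "trunc512": "959cb1883fc1ca9ae1394ceb475a356ead1ecceff5824ae7"}}
"""


def alias_index(response):
    """Hypothetical helper: map each alias name to the sequence's MD5 identifier."""
    meta = response["metadata"]
    return {entry["alias"]: meta["md5"] for entry in meta.get("aliases", [])}


doc = json.loads(METADATA_JSON)  # with requests, r.json() returns the same dict
print(alias_index(doc))          # {'I': '6681ac2f62509cfc220d78751b8dc524'}
print(doc["metadata"]["length"]) # 230218
```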
Refget implementations include:

- CRAM Reference Registry deployed by ENA at EMBL-EBI. Provides access to all submitted INSDC sequences via the refget standard.
- Refget reference implementation, with an example running at Heroku.
- AWS Serverless implementation using S3 as a sequence and metadata storage layer.
Sequences such as reference genomes have a multitude of names. For example, chromosome 1 from the latest build of the human genome (GRCh38) can be known as `chr1`, `1`, `CM000663.2` or `NC_000001.11`, depending on where you obtained your sequence. Refget instead uses a cryptographic hash function to create an identifier from the sequence content: the bases of the chromosome are digested with the MD5 algorithm, or with SHA-512 truncated to 24 bytes (TRUNC512), to produce a hexadecimal string. Chromosome 1 can now be referred to by its MD5 identifier, `6aef897c3d6ff0c78aff06ac189178dd`.
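A minimal sketch of computing both identifiers with Python's `hashlib` follows. The normalisation step (removing whitespace such as FASTA line breaks and upper-casing the bases) is an assumption made for illustration; check the refget specification for the exact rules before relying on these values, and compare against the `md5`/`trunc512` fields a refget server reports. The function names are hypothetical.

```python
import hashlib


def normalise(sequence):
    # Assumption: strip all whitespace and upper-case the bases before hashing
    return "".join(sequence.split()).upper().encode("ascii")


def md5_identifier(sequence):
    """Hypothetical helper: hex-encoded MD5 digest of the normalised sequence."""
    return hashlib.md5(normalise(sequence)).hexdigest()


def trunc512_identifier(sequence):
    """Hypothetical helper: first 24 bytes of the SHA-512 digest, hex-encoded (48 characters)."""
    return hashlib.sha512(normalise(sequence)).digest()[:24].hex()


# The same bases give the same identifier regardless of case or line wrapping
print(md5_identifier("acgt\nACGT") == md5_identifier("ACGTACGT"))  # True
print(len(trunc512_identifier("ACGTACGT")))                       # 48
```

Deriving the name from the content is what makes the identifier unambiguous: any copy of the same bases hashes to the same string, whatever the file calls the sequence.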
You can contribute changes to the hts-specs GitHub repository. If you want to be more involved we host regular conference calls.