The ENA has multiple APIs. The most important ones are:
- ENA Portal API: search ENA's databases using (potentially complex) queries.
- ENA Browser API: retrieve entire records programmatically.
In addition, quick summaries of metadata and file retrieval locations can be retrieved as ENA file reports.
To use the APIs, it is helpful to understand the relationships between different objects in ENA:
Run
: A lane (or equivalent) on a sequencing machine, used to attach sequence read data to experiments.
Experiment
: Represents the library solution that is created from a sample and used in a sequencing experiment. The experiment object contains details about the sequencing platform and library protocols.
Study
: A study groups together experiments to allow them to be cited together in a publication.
Sample
: A biological sample that was used to create a library (= experiment). It is common to have multiple libraries and sequencing experiments for a single sample.
In summary:
- One or more runs are part of an experiment.
- One or more experiments are part of a study.
- One or more experiments are associated with a sample.
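The relationships above can be pictured as a small data model. The following is only an illustrative sketch: the class names mirror the ENA object names, but the classes, the `runs_per_sample` helper and the accessions are hypothetical placeholders, not part of any ENA client library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    accession: str


@dataclass
class Experiment:
    accession: str
    sample_accession: str  # the sample the library was created from
    runs: List[Run] = field(default_factory=list)


@dataclass
class Study:
    accession: str
    experiments: List[Experiment] = field(default_factory=list)


def runs_per_sample(study: Study) -> Dict[str, List[str]]:
    """Group run accessions by sample, walking study -> experiment -> run."""
    grouped: Dict[str, List[str]] = {}
    for experiment in study.experiments:
        grouped.setdefault(experiment.sample_accession, []).extend(
            run.accession for run in experiment.runs
        )
    return grouped


# Placeholder accessions, not real ENA records.
study = Study("STUDY_A", [
    Experiment("EXP_1", "SAMPLE_X", [Run("RUN_1"), Run("RUN_2")]),
    Experiment("EXP_2", "SAMPLE_X", [Run("RUN_3")]),
    Experiment("EXP_3", "SAMPLE_Y", [Run("RUN_4")]),
])
```

Here `SAMPLE_X` has two experiments, which reflects the point that a single sample often has several libraries and sequencing experiments.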
The ENA Portal API can be used to search various databases and return many (but not all) fields.
A query for the ENA REST API can contain multiple fields. The Advanced search web application can be used to explore the different fields and allowed values to construct complex queries.
Each query starts with a result type that determines what fields to search against.
The following query will return a list of all available results:
https://www.ebi.ac.uk/ena/portal/api/results?dataPortal=ena
At the time of writing, it returned the following:
| resultId | description |
| --- | --- |
| analysis_study | Studies used for nucleotide sequence analyses from reads |
| analysis | Nucleotide sequence analyses from reads |
| assembly | Genome assemblies |
| coding | Coding sequences |
| wgs_set | Genome assembly contig sets (WGS) |
| tsa_set | Transcriptome assembly contig sets (TSA) |
| tls_set | Targeted locus study contig sets (TLS) |
| environmental | Environmental samples |
| noncoding | Non-coding sequences |
| read_study | Studies used for raw reads |
| read_experiment | Experiments used for raw reads |
| read_run | Raw reads |
| sample | Samples |
| sequence | Nucleotide sequences |
| study | Studies |
| taxon | Taxonomic classification |
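As a rough sketch, the list of result types could be fetched and parsed from Python with the standard library alone. This assumes the endpoint answers with tab-separated text whose header row is `resultId` and `description`, as the listing above suggests. The helper names are made up for illustration.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"


def results_url(data_portal="ena"):
    """Build the URL that lists the available result types."""
    return f"{PORTAL_API}/results?{urlencode({'dataPortal': data_portal})}"


def parse_tsv(text):
    """Turn tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def fetch_text(url):
    """Download a URL and decode the body as UTF-8 (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    for row in parse_tsv(fetch_text(results_url())):
        print(row["resultId"], "-", row["description"])
```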
If we are interested in looking for Raw reads, then we choose the read_run result type. This choice defines which database fields can be queried. The full set of fields that can be queried is available at this URL:
https://www.ebi.ac.uk/ena/portal/api/searchFields?result=read_run
At the time of writing, it returned the following:
| columnId | description |
| --- | --- |
| accession | accession number |
| altitude | Altitude (m) |
| assembly_quality | Quality of assembly |
| assembly_software | Assembly software |
| base_count | number of base pairs |
| binning_software | Binning software |
| bio_material | identifier for biological material including institute and collection code |
| broker_name | broker name |
| cell_line | cell line from which the sample was obtained |
| cell_type | cell type from which the sample was obtained |
| center_name | Submitting center |
| checklist | checklist name (or ID) |
| collected_by | name of the person who collected the specimen |
| collection_date | date that the specimen was collected |
| collection_date_submitted | Collection date submitted |
| completeness_score | Completeness score (%) |
| contamination_score | Contamination score (%) |
| country | locality of sample isolation: country names, oceans or seas, followed by regions and localities |
| cultivar | cultivar (cultivated variety) of plant from which sample was obtained |
| culture_collection | identifier for the sample culture including institute and collection code |

[truncated]
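One possible use of this listing is to check field names before building a query. The sketch below assumes the searchFields response is tab-separated text with the field name in the first column, as shown above. Both helper functions are hypothetical.

```python
def searchable_fields(tsv_text):
    """Collect the first column (columnId) of a searchFields response."""
    lines = tsv_text.strip().splitlines()
    # Skip the header row and ignore blank lines.
    return {line.split("\t")[0] for line in lines[1:] if line.strip()}


def unknown_fields(wanted, tsv_text):
    """Return the requested field names that the listing does not contain."""
    available = searchable_fields(tsv_text)
    return [name for name in wanted if name not in available]


# A two-row excerpt of the listing above, used as sample input.
sample_listing = (
    "columnId\tdescription\n"
    "accession\taccession number\n"
    "altitude\tAltitude (m)\n"
)
```

A call such as `unknown_fields(["accession", "not_a_field"], sample_listing)` would flag only the second name.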
To obtain the search results in XML format, the query can also be used in the ENA Browser API (see below), via its search endpoint:
https://www.ebi.ac.uk/ena/browser/api/xml/search?result=read_run&query=secondary_study_accession="SRP212869"
To return a list of all accessions with raw reads (i.e. runs) for the same study, we can perform a search in the Portal API against the read_run result:
https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=secondary_study_accession="SRP212869"
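When such a URL is assembled in code, the double quotes around the value need percent-encoding. A minimal sketch using only the standard library follows. The function name and its parameters are illustrative, and it covers just the single `field="value"` form used above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"


def quoted_search_url(result, field, value, fmt=None):
    """Build a Portal API search URL for one field="value" condition."""
    params = {"result": result, "query": f'{field}="{value}"'}
    if fmt:
        params["format"] = fmt
    # urlencode percent-encodes the quotes and the inner "=" sign.
    return f"{PORTAL_API}/search?{urlencode(params)}"


url = quoted_search_url("read_run", "secondary_study_accession", "SRP212869")

# Decoding the query string recovers the original condition.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```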
The filereport API endpoint offers summarised reports about a provided accession. It bypasses the search and fetches information directly from a data cache, which speeds up delivery.
Note: The &result=read_run parameter must be added to the URL.
For example, the following URL retrieves the file report for study SRP212869 in TSV format, including all run_accessions and the FTP locations of their FASTQ files:
https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRP212869&result=read_run
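A file report like this could be turned into a mapping from run to FASTQ locations. This sketch makes several assumptions: the TSV has `run_accession` and `fastq_ftp` columns, and a run with several FASTQ files lists them in one cell separated by semicolons. The function names are made up for the example.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"


def filereport_url(accession, result="read_run"):
    """Build the filereport URL, always including the result parameter."""
    params = {"accession": accession, "result": result}
    return f"{PORTAL_API}/filereport?{urlencode(params)}"


def fastq_locations(report_tsv):
    """Map each run accession to its list of FASTQ locations.

    Assumes a ';'-separated fastq_ftp cell; empty cells give an empty list.
    """
    rows = csv.DictReader(
        io.StringIO(report_tsv), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return {
        row["run_accession"]: [
            path for path in (row.get("fastq_ftp") or "").split(";") if path
        ]
        for row in rows
    }


def fetch_report(accession):
    """Download the file report as text (needs network access)."""
    with urlopen(filereport_url(accession)) as response:
        return response.read().decode("utf-8")
```

For instance, `fastq_locations(fetch_report("SRP212869"))` would give one entry per run in the study, provided the column assumptions hold.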
File reports are available for the following accession types:
- Study accessions (ERP, SRP, DRP, PRJ prefixes)
- Experiment accessions (ERX, SRX, DRX prefixes)
- Sample accessions (ERS, SRS, DRS, SAM prefixes)
- Run accessions (ERR, SRR, DRR prefixes)
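The prefix rules in this list are simple enough to encode in a lookup table, for example to validate an accession before requesting a file report. The table below copies the prefixes listed above. The function itself is only a sketch and does not check the rest of the accession.

```python
# Prefixes taken from the list of accession types above.
ACCESSION_PREFIXES = {
    "study": ("ERP", "SRP", "DRP", "PRJ"),
    "experiment": ("ERX", "SRX", "DRX"),
    "sample": ("ERS", "SRS", "DRS", "SAM"),
    "run": ("ERR", "SRR", "DRR"),
}


def accession_type(accession):
    """Return the object type implied by the prefix, or None if unknown."""
    upper = accession.strip().upper()
    for kind, prefixes in ACCESSION_PREFIXES.items():
        # str.startswith accepts a tuple of candidate prefixes.
        if upper.startswith(prefixes):
            return kind
    return None
```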
- Find run accessions and descriptions/titles for all runs in the study with secondary_study_accession SRP212869:
https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=secondary_study_accession=SRP212869&format=tsv
We can specify the result format as ‘&format=tsv’ or ‘&format=json’. TSV is the default.
- Find the same data using the official ENA study accession PRJNA552470 instead:
https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=study_accession=PRJNA552470&format=tsv
- Specify additional fields
Hint: Use the Advanced search web application to explore the available fields.
Look up just a few fields:
https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=study_accession=PRJNA552470&fields=sample_accession,experiment_accession,study_accession
or lots of fields:
https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=study_accession=PRJNA552470&fields=accession,center_name,description,experiment_accession,experiment_alias,fastq_aspera,fastq_ftp,instrument_model,library_layout,library_selection,library_source,library_strategy,parent_study,read_count,sample_accession,sample_alias,sample_description,sample_title,secondary_sample_accession,secondary_study_accession,scientific_name,study_accession,study_alias,study_title,submission_accession,tax_id,tissue_type&format=tsv
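Long field lists are easier to maintain as a Python list that is joined when the URL is built. The sketch below reproduces the shorter example URL above and reads one field from a JSON response. It assumes that `format=json` returns an array of objects keyed by field name. The helper names are illustrative.

```python
import json
from urllib.parse import urlencode

PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"


def search_url(result, query, fields=None, fmt=None):
    """Build a search URL with an optional field list and output format."""
    params = {"result": result, "query": query}
    if fields:
        params["fields"] = ",".join(fields)
    if fmt:
        params["format"] = fmt
    # Keep "," and "=" readable, matching the URLs shown in this section.
    return f"{PORTAL_API}/search?{urlencode(params, safe=',=')}"


def column(json_text, name):
    """Extract one field from every record of a JSON search response."""
    return [record.get(name) for record in json.loads(json_text)]


few_fields = ["sample_accession", "experiment_accession", "study_accession"]
url = search_url("read_run", "study_accession=PRJNA552470", few_fields)
```

Passing `fmt="json"` appends `&format=json`, and the response text can then be handed to `column` to pull out a single field across all runs.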
The ENA Browser API can be used to retrieve entire ENA records in EMBL flat file, FASTA or XML format, depending on the record type.
The endpoint to retrieve records by accession is:
https://www.ebi.ac.uk/ena/browser/api/xml/<accession>
- Return the record for Sample SAMN03401168 in XML format:
https://www.ebi.ac.uk/ena/browser/api/xml/SAMN03401168
- Return records for multiple Experiment accessions in XML format by concatenating the accessions with commas:
https://www.ebi.ac.uk/ena/browser/api/xml/SRX952421,SRX952422
- Return records for multiple Runs in XML format:
https://www.ebi.ac.uk/ena/browser/api/xml/SRR11028503,SRR11028504
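The comma-joined URLs above can be generated from a list of accessions, and the returned XML read with the standard library parser. This sketch assumes the response is a set-style root element whose direct children each carry an `accession` attribute. The small XML string used here is a hand-written stand-in for a real response, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"


def record_url(accessions, fmt="xml"):
    """Build a Browser API URL for one or more accessions."""
    return f"{BROWSER_API}/{fmt}/{','.join(accessions)}"


def accessions_in(xml_text):
    """List the accession attribute of each top-level record element."""
    root = ET.fromstring(xml_text)
    return [child.get("accession") for child in root if child.get("accession")]


url = record_url(["SRR11028503", "SRR11028504"])

# Hand-written stand-in for a response; real records contain far more detail.
stub = (
    '<RUN_SET>'
    '<RUN accession="SRR11028503"/>'
    '<RUN accession="SRR11028504"/>'
    '</RUN_SET>'
)
```

Comparing `accessions_in(...)` against the requested list is one way to notice accessions that were not returned.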