@tomsing1
Created April 3, 2023 22:18
Rmd file that retrieves and parses ENA's XML specification for NGS experiments
---
title: "Controlled vocabulary for sequencing experiments"
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup}
library(magrittr)
library(stringr)
library(xml2)
```
## ENA's annotation schema
The European Nucleotide Archive (ENA) has published a
[schema](ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/SRA.experiment.xsd)
with the controlled vocabulary for annotation of sequencing experiments.
This document lists the terms available for the following annotation fields:

- `library_strategy`
- `library_source`
- `library_selection`
- `library_layout`

Please choose a `Keyword` from the tables below.
```{r}
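# Download and parse the XSD schema from ENA's FTP server
url <- paste0("ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/",
              "SRA.experiment.xsd")
root <- xml2::read_xml(url)
# Top-level children of the schema root; the queries below start from these
nodes <- xml2::xml_children(root)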
```
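The lookup pattern used in the sections that follow can be tried offline on a small schema fragment. The sketch below is only an illustration: the inline XSD is a toy stand-in for the real schema (its structure and description text are assumptions), and `extract_terms()` is a hypothetical helper, not a function from `xml2`.

```r
library(xml2)

# Toy XSD fragment standing in for the real ENA schema (illustrative only).
xsd <- '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="typeLibraryStrategy">
    <xs:restriction base="xs:string">
      <xs:enumeration value="WGS">
        <xs:annotation>
          <xs:documentation>Whole genome
            shotgun</xs:documentation>
        </xs:annotation>
      </xs:enumeration>
      <xs:enumeration value="OTHER"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>'

# Hypothetical helper: return the enumerated keywords of a named type
# together with their whitespace-normalised documentation text.
extract_terms <- function(doc, type_name) {
  xpath <- sprintf("//*[@name='%s']//xs:enumeration", type_name)
  terms <- xml_find_all(doc, xpath)
  data.frame(
    Keyword = xml_attr(terms, "value"),
    Description = gsub("\\s+", " ", trimws(xml_text(terms))),
    stringsAsFactors = FALSE
  )
}

toy <- extract_terms(read_xml(xsd), "typeLibraryStrategy")
print(toy)
```

Terms without documentation, such as `OTHER` in the toy fragment, end up with an empty `Description` rather than being dropped.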
### Library strategy
```{r results='asis'}
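# Collect the <xs:enumeration> entries of the typeLibraryStrategy type
terms <- nodes %>%
  xml_find_all("//*[@name='typeLibraryStrategy']") %>%
  xml_children() %>%
  xml_find_all("xs:enumeration")
# One row per term: the enumeration value and its documentation text
lib_strategy <- data.frame(
  Keyword = terms %>%
    xml_attr("value", default = NA_character_),
  Description = terms %>%
    xml_text() %>%
    str_squish()
)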
knitr::kable(lib_strategy)
```
### Library source
```{r results='asis'}
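# Collect the <xs:enumeration> entries of the typeLibrarySource type
terms <- nodes %>%
  xml_find_all("//*[@name='typeLibrarySource']") %>%
  xml_children() %>%
  xml_find_all("xs:enumeration")
# One row per term: the enumeration value and its documentation text
lib_source <- data.frame(
  Keyword = terms %>%
    xml_attr("value", default = NA_character_),
  Description = terms %>%
    xml_text() %>%
    str_squish()
)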
knitr::kable(lib_source)
```
### Library selection
```{r results='asis'}
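# Collect the <xs:enumeration> entries of the typeLibrarySelection type
terms <- nodes %>%
  xml_find_all("//*[@name='typeLibrarySelection']") %>%
  xml_children() %>%
  xml_find_all("xs:enumeration")
# One row per term: the enumeration value and its documentation text
lib_selection <- data.frame(
  Keyword = terms %>%
    xml_attr("value", default = NA_character_),
  Description = terms %>%
    xml_text() %>%
    str_squish()
)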
knitr::kable(lib_selection)
```
### Library layout
```{r results='asis'}
# Collect the <xs:element> entries nested under the LIBRARY_LAYOUT element
terms <- nodes %>%
  xml_find_all("//*[@name='LIBRARY_LAYOUT']") %>%
  xml_find_all("./xs:complexType") %>%
  xml_children() %>%
  xml_find_all("./xs:element")
# One row per layout: the element name and its documentation text
lib_layout <- data.frame(
  Keyword = terms %>%
    xml_attr("name", default = NA_character_),
  Description = terms %>%
    xml_text() %>%
    str_squish()
)
knitr::kable(lib_layout)
```
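Once the keyword tables exist, they can be used to check an annotation before submission. The following is a minimal sketch under stated assumptions: `check_keyword()` is a hypothetical helper, and the `allowed` vector is a hand-written stand-in for a `Keyword` column such as `lib_layout$Keyword`.

```r
# Stand-in for a Keyword column, e.g. lib_layout$Keyword (assumed values).
allowed <- c("SINGLE", "PAIRED")

# Hypothetical helper: stop with an informative message if any value is
# not part of the controlled vocabulary; otherwise return it invisibly.
check_keyword <- function(value, allowed) {
  bad <- setdiff(value, allowed)
  if (length(bad) > 0) {
    stop("Unknown keyword(s): ", paste(bad, collapse = ", "), call. = FALSE)
  }
  invisible(value)
}

# An exact match passes silently.
check_keyword("PAIRED", allowed)

# The comparison is case-sensitive, so a lower-case variant is rejected.
result <- tryCatch(
  check_keyword("paired", allowed),
  error = function(e) conditionMessage(e)
)
print(result)
```

Matching is deliberately exact here; whether to normalise case before comparing is a design choice left to the user.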
## Reproducibility
<details>
<summary>
Session Information
</summary>
```{r}
sessionInfo()
```
</details>