Created
April 3, 2023 22:18
Rmd file that retrieves and parses ENA's XML specification for NGS experiments
---
title: "Controlled vocabulary for sequencing experiments"
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(magrittr)
library(stringr)
library(xml2)
```

## ENA's annotation schema

The European Nucleotide Archive (ENA) has published a
[schema](ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/SRA.experiment.xsd)
with the controlled vocabulary for the annotation of sequencing experiments.
This document lists the terms available for the following annotation fields:

- `library_strategy`
- `library_source`
- `library_selection`
- `library_layout`

Please choose a `Keyword` from the tables below.

```{r}
# Download and parse the XSD, then collect the top-level schema nodes
url <- paste0("ftp://ftp.ebi.ac.uk/pub/databases/ena/doc/xsd/sra_1_5/",
              "SRA.experiment.xsd")
root <- xml2::read_xml(url)
nodes <- xml2::xml_children(root)
```

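The sections below all follow the same pattern: locate a named type in the schema, collect its `xs:enumeration` entries, and tabulate each entry's `value` attribute alongside its documentation text. As a minimal sketch of that pattern, the block below runs against a small inline schema fragment rather than the downloaded file. The fragment, its placeholder description, and the helper name `enumeration_table` are assumptions made for illustration and are not part of ENA's schema or of this document's own chunks.

```r
library(xml2)

# Hypothetical, trimmed-down schema fragment used only for illustration;
# the description text is a placeholder, not ENA's wording.
xsd <- '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="typeLibraryStrategy">
    <xs:restriction base="xs:string">
      <xs:enumeration value="WGS">
        <xs:annotation>
          <xs:documentation>
            Placeholder description
            for WGS.
          </xs:documentation>
        </xs:annotation>
      </xs:enumeration>
      <xs:enumeration value="RNA-Seq"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>'

# Hypothetical helper: tabulate the enumeration values of a named simple type.
enumeration_table <- function(doc, type_name) {
  # The xs prefix is resolved from the namespaces declared in the document
  xpath <- sprintf(".//xs:simpleType[@name='%s']//xs:enumeration", type_name)
  terms <- xml_find_all(doc, xpath)
  data.frame(
    Keyword = xml_attr(terms, "value"),
    # Trim the text and collapse internal runs of whitespace to one space
    Description = gsub("\\s+", " ", trimws(xml_text(terms))),
    stringsAsFactors = FALSE
  )
}

example <- enumeration_table(read_xml(xsd), "typeLibraryStrategy")
```

Passing `root` in place of the parsed fragment should give a table equivalent to the one built in the next section, assuming the real schema nests its enumerations in the same way.
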
### Library strategy

```{r results='asis'}
terms <- nodes %>%
  xml_find_all("//*[@name='typeLibraryStrategy']") %>%
  xml_children() %>%
  xml_find_all("xs:enumeration")
lib_strategy <- data.frame(
  Keyword = terms %>%
    xml_attr("value", default = NA_character_),
  Description = terms %>%
    xml_text() %>%
    str_squish()
)
knitr::kable(lib_strategy)
```

### Library source

```{r results='asis'}
terms <- nodes %>%
  xml_find_all("//*[@name='typeLibrarySource']") %>%
  xml_children() %>%
  xml_find_all("xs:enumeration")
lib_source <- data.frame(
  Keyword = terms %>%
    xml_attr("value", default = NA_character_),
  Description = terms %>%
    xml_text() %>%
    str_squish()
)
knitr::kable(lib_source)
```

### Library selection

```{r results='asis'}
terms <- nodes %>%
  xml_find_all("//*[@name='typeLibrarySelection']") %>%
  xml_children() %>%
  xml_find_all("xs:enumeration")
lib_selection <- data.frame(
  Keyword = terms %>%
    xml_attr("value", default = NA_character_),
  Description = terms %>%
    xml_text() %>%
    str_squish()
)
knitr::kable(lib_selection)
```

### Library layout

```{r results='asis'}
terms <- nodes %>%
  xml_find_all("//*[@name='LIBRARY_LAYOUT']") %>%
  xml_find_all("./xs:complexType") %>%
  xml_children() %>%
  xml_find_all("./xs:element")
lib_layout <- data.frame(
  Keyword = terms %>%
    xml_attr("name", default = NA_character_),
  Description = terms %>%
    xml_text() %>%
    str_squish()
)
knitr::kable(lib_layout)
```

## Reproducibility

<details>
<summary>
Session Information
</summary>

```{r}
sessionInfo()
```

</details>