Parsing XML Elements from mzXML files with the R package 'XML'

Parse mzXML

Chase Clark June 11, 2018

Examples of both MALDI MS1 and LC-MS/MS files

Example of IDBac mzXML conversion output: Bruker autoFlex MALDI MS1

First few lines of the mzXML file:

<?xml version="1.0" encoding="ISO-8859-1"?>
  <mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2 http://sashimi.sourceforge.net/schema_revision/mzXML_3.2/mzXML_idx_3.2.xsd">
  <msRun scanCount="12" startTime="PT0S" endTime="PT0S">
  <parentFile fileName="file://C:\Users\chase\Downloads\New folder\5-23-18\p\0_E24\1\1SLin/fid"
fileType="RAWData"
fileSha1="7e930d9101f845943ada76b116b38213a2d7c39c"/>
  <parentFile fileName="file://C:\Users\chase\Downloads\New folder\5-23-18\sm\0_E24\1\1SRef/fid"
# mzXML path
filePath <- "C:/Users/chase/Downloads/IDBac/Converted_To_mzXML/117.mzXML"

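# Parse the mzXML; the R object is a pointer to the parsed (libxml2) document
filePointer <- XML::xmlInternalTreeParse(filePath)

# Names of the child elements of <msRun> (the first child of the root <mzXML> element)
tags <- base::names(XML::xmlRoot(filePointer)[[1]])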

# Show unique XML elements
unique(as.character(tags))
## [1] "parentFile"     "msInstrument"   "dataProcessing" "scan"
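Because names() returns one entry per child element, tabulating it shows how many of each element the run contains. The sketch below is self-contained: the inline mzXML fragment and its scanCount of 2 are made up so the code runs without the original files, and comparing the number of scan elements with the scanCount attribute of msRun is an added sanity check, not part of the original workflow.

```r
# Hypothetical, trimmed-down mzXML text (made-up values) so the example runs
# on its own; a real file would be parsed from its path instead.
mzxml_text <- paste0(
  '<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2">',
  '<msRun scanCount="2" startTime="PT0S" endTime="PT0S">',
  '<parentFile fileName="file://example/fid" fileType="RAWData" fileSha1="aaaa"/>',
  '<msInstrument/>',
  '<dataProcessing/>',
  '<scan num="1" msLevel="1"/>',
  '<scan num="2" msLevel="1"/>',
  '</msRun></mzXML>'
)

doc <- XML::xmlInternalTreeParse(mzxml_text, asText = TRUE)
msRun <- XML::xmlRoot(doc)[[1]]

# One entry per child element of <msRun>, so table() counts each element name
tag_counts <- table(as.character(names(msRun)))
print(tag_counts)

# Sanity check: does the number of <scan> children match the declared scanCount?
declared_scans <- as.integer(XML::xmlGetAttr(msRun, "scanCount"))
found_scans <- as.integer(tag_counts[["scan"]])
print(declared_scans == found_scans)
```

This only counts scan elements that are direct children of msRun; if a converter nests MS2 scans inside their MS1 scan, the check can report FALSE even for a valid file.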

Extract one of these elements (parentFile) from the mzXML file

parentFile <- XML::xmlElementsByTagName(XML::xmlRoot(filePointer)[[1]], "parentFile")
# These are still just pointers
parentFile[1]
## $parentFile
## <parentFile fileName="file://C:\Users\chase\Downloads\New folder\5-23-18\p\0_E24\1\1SLin/fid" fileType="RAWData" fileSha1="7e930d9101f845943ada76b116b38213a2d7c39c"/>
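xmlElementsByTagName() only searches the direct children of the node it is given. An XPath query is one alternative that searches the whole document. The sketch below assumes the file declares the mzXML 3.2 default namespace shown in the header above: because every element sits in that namespace, the query needs a namespace prefix. The prefix name mz is an arbitrary choice, and the two-element document is made up.

```r
ns_uri <- "http://sashimi.sourceforge.net/schema_revision/mzXML_3.2"

# Made-up document with two <parentFile> elements in the mzXML default namespace
txt <- paste0(
  '<mzXML xmlns="', ns_uri, '"><msRun scanCount="0">',
  '<parentFile fileName="file://a/fid" fileType="RAWData" fileSha1="aaaa"/>',
  '<parentFile fileName="file://b/fid" fileType="RAWData" fileSha1="bbbb"/>',
  '</msRun></mzXML>'
)
doc <- XML::xmlInternalTreeParse(txt, asText = TRUE)

# Without a prefix, "//parentFile" asks for elements in no namespace,
# so it matches nothing in a namespaced mzXML document
no_prefix <- XML::getNodeSet(doc, "//parentFile")
print(length(no_prefix))

# Map an arbitrary prefix ("mz") to the mzXML namespace URI and use it in the path
mz_ns <- c(mz = ns_uri)
hits <- XML::getNodeSet(doc, "//mz:parentFile", namespaces = mz_ns)
print(length(hits))

# xpathSApply() applies a function to every match; here it extracts fileName
file_names <- XML::xpathSApply(doc, "//mz:parentFile", XML::xmlGetAttr,
                               "fileName", namespaces = mz_ns)
print(file_names)
```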

Show that each element of that list is still just a pointer

summary(parentFile[[1]])
##                 Length                 Class1                 Class2 
##                      1 XMLInternalElementNode        XMLInternalNode 
##                 Class3                   Mode 
##        XMLAbstractNode            externalptr
as.character(parentFile[1])
## [1] "<pointer: 0x0000000011e65c90>"
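These objects point into a tree held in C-level memory, so they are only meaningful in the current R session; in general, external pointers do not survive saveRDS()/readRDS() or a restart. If the element itself is needed as text, XML::saveXML() serializes a node to an ordinary character string. A minimal sketch with a made-up one-element document (placeholder path and hash):

```r
txt <- paste0(
  '<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2">',
  '<msRun scanCount="0">',
  '<parentFile fileName="file://example/fid" fileType="RAWData" fileSha1="aaaa"/>',
  '</msRun></mzXML>'
)
doc <- XML::xmlInternalTreeParse(txt, asText = TRUE)
node <- XML::xmlElementsByTagName(XML::xmlRoot(doc)[[1]], "parentFile")[[1]]

# Still a pointer-backed node object
print(class(node))

# With no file argument, saveXML() returns the node as a character string
node_text <- XML::saveXML(node)
print(node_text)
```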

To get the element's actual values, read its attributes with XML::xmlAttrs()

XML::xmlAttrs(parentFile[[1]])
##                                                                          fileName 
## "file://C:\\Users\\chase\\Downloads\\New folder\\5-23-18\\p\\0_E24\\1\\1SLin/fid" 
##                                                                          fileType 
##                                                                         "RAWData" 
##                                                                          fileSha1 
##                                        "7e930d9101f845943ada76b116b38213a2d7c39c"
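xmlAttrs() returns a named character vector, so ordinary R tools work from here. The MALDI header above contains more than one parentFile element; the sketch below (hypothetical paths and placeholder hashes) reads a single attribute with XML::xmlGetAttr() and collects the attributes of every parentFile element into one data frame, one row per element. The attr_column() helper is illustrative, not part of the XML package.

```r
# Made-up <msRun> with two <parentFile> children
txt <- paste0(
  '<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2">',
  '<msRun scanCount="0">',
  '<parentFile fileName="file://run/1SLin/fid" fileType="RAWData" fileSha1="aaaa"/>',
  '<parentFile fileName="file://run/1SRef/fid" fileType="RAWData" fileSha1="bbbb"/>',
  '</msRun></mzXML>'
)
doc <- XML::xmlInternalTreeParse(txt, asText = TRUE)
parentFile <- XML::xmlElementsByTagName(XML::xmlRoot(doc)[[1]], "parentFile")

# A single attribute of a single node, returned as a plain string
first_sha1 <- XML::xmlGetAttr(parentFile[[1]], "fileSha1")
print(first_sha1)

# Hypothetical helper: one attribute across all nodes, NA where it is absent
attr_column <- function(nodes, name) {
  unname(vapply(nodes, XML::xmlGetAttr, character(1),
                name = name, default = NA_character_))
}

parent_df <- data.frame(
  fileName = attr_column(parentFile, "fileName"),
  fileType = attr_column(parentFile, "fileType"),
  fileSha1 = attr_column(parentFile, "fileSha1"),
  stringsAsFactors = FALSE
)
print(parent_df)
```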

Example of an LC-MS/MS mzXML file: Bruker QTOF data converted by MSConvert

# mzXML path
filePath <- "C:/Users/chase/Desktop/D051_1-4_01_1639.mzXML"

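# Parse the mzXML; the R object is a pointer to the parsed (libxml2) document
filePointer <- XML::xmlInternalTreeParse(filePath)

# Names of the child elements of <msRun> (the first child of the root <mzXML> element)
tags <- base::names(XML::xmlRoot(filePointer)[[1]])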

# Show unique XML elements
unique(as.character(tags))
## [1] "parentFile"     "msInstrument"   "dataProcessing" "scan"
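For LC-MS/MS data the scan elements are usually the interesting part, since they mix MS1 and MS2 spectra. The sketch below applies the same xmlElementsByTagName() approach to scan, assuming the mzXML convention of an msLevel attribute on each scan and a precursorMz child on MS2 scans; the three scans and their values are made up. It also assumes scans are direct children of msRun, as the element listing above suggests; if a converter nests MS2 scans inside their MS1 scan, xmlElementsByTagName(..., recursive = TRUE) would be needed instead.

```r
# Made-up LC-MS/MS run: one MS1 scan followed by two MS2 scans
txt <- paste0(
  '<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2">',
  '<msRun scanCount="3">',
  '<scan num="1" msLevel="1" retentionTime="PT60.0S"/>',
  '<scan num="2" msLevel="2" retentionTime="PT60.5S">',
  '<precursorMz precursorIntensity="1000">445.12</precursorMz></scan>',
  '<scan num="3" msLevel="2" retentionTime="PT61.0S">',
  '<precursorMz precursorIntensity="800">512.3</precursorMz></scan>',
  '</msRun></mzXML>'
)
doc <- XML::xmlInternalTreeParse(txt, asText = TRUE)
scans <- XML::xmlElementsByTagName(XML::xmlRoot(doc)[[1]], "scan")

# msLevel of every scan as a plain character vector
ms_level <- unname(vapply(scans, XML::xmlGetAttr, character(1), name = "msLevel"))
print(table(ms_level))

# For MS2 scans, read the text content of the <precursorMz> child as a number
ms2_scans <- scans[ms_level == "2"]
precursor_mz <- unname(vapply(ms2_scans, function(s) {
  as.numeric(XML::xmlValue(XML::xmlElementsByTagName(s, "precursorMz")[[1]]))
}, numeric(1)))
print(precursor_mz)
```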

Extract one of these elements (parentFile) from the mzXML file

parentFile <- XML::xmlElementsByTagName(XML::xmlRoot(filePointer)[[1]], "parentFile")
# These are still just pointers
parentFile[1]
## $parentFile
## <parentFile fileName="file://C:\Users\Vanessa\OneDrive - University of Illinois at Chicago\MS\D051_1-4_01_1639.d/Analysis.baf" fileType="RAWData" fileSha1="184b4f34fd13965e98f5996859f1eb3fd0d762fd"/>

Show that each element of that list is still just a pointer

summary(parentFile[[1]])
##                 Length                 Class1                 Class2 
##                      1 XMLInternalElementNode        XMLInternalNode 
##                 Class3                   Mode 
##        XMLAbstractNode            externalptr
as.character(parentFile[1])
## [1] "<pointer: 0x0000000018da0690>"

To get the element's actual values, read its attributes with XML::xmlAttrs()

XML::xmlAttrs(parentFile[[1]])
##                                                                                                       fileName 
## "file://C:\\Users\\Vanessa\\OneDrive - University of Illinois at Chicago\\MS\\D051_1-4_01_1639.d/Analysis.baf" 
##                                                                                                       fileType 
##                                                                                                      "RAWData" 
##                                                                                                       fileSha1 
##                                                                     "184b4f34fd13965e98f5996859f1eb3fd0d762fd"
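The MALDI and LC-MS/MS walk-throughs repeat the same steps, so they can be wrapped in a function. read_parent_files() below is a hypothetical helper, not part of the original gist; like both examples, it assumes msRun is the first child of the root element. The usage lines write a small made-up mzXML file to a temporary path so the sketch runs on its own.

```r
# Hypothetical helper: parse an mzXML file and return the attributes of every
# <parentFile> element as a data frame (plain strings, no pointers)
read_parent_files <- function(filePath) {
  filePointer <- XML::xmlInternalTreeParse(filePath)
  msRun <- XML::xmlRoot(filePointer)[[1]]   # assumes <msRun> is the first child
  nodes <- XML::xmlElementsByTagName(msRun, "parentFile")
  attr_column <- function(name) {
    unname(vapply(nodes, XML::xmlGetAttr, character(1),
                  name = name, default = NA_character_))
  }
  data.frame(
    fileName = attr_column("fileName"),
    fileType = attr_column("fileType"),
    fileSha1 = attr_column("fileSha1"),
    stringsAsFactors = FALSE
  )
}

# Usage: write a made-up mzXML file to a temporary path, then read it back
tmp <- tempfile(fileext = ".mzXML")
writeLines(c(
  '<?xml version="1.0" encoding="ISO-8859-1"?>',
  '<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2">',
  '  <msRun scanCount="0">',
  '    <parentFile fileName="file://example.d/Analysis.baf" fileType="RAWData" fileSha1="cccc"/>',
  '  </msRun>',
  '</mzXML>'
), tmp)
pf <- read_parent_files(tmp)
print(pf)
```

Returning a data frame rather than the node list means the result can be saved, compared across files, or joined with other metadata without keeping the parsed document alive.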