@bdkosher
Last active April 10, 2017 14:51
Extracts application numbers from XML downloads from https://pairbulkdata.uspto.gov
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import java.nio.file.Paths
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants

if (args.length < 1) {
    println 'Please provide an argument pointing to a bulk data XML file.'
    System.exit(1)
}

// Stream the file with StAX so a large bulk download is never held in memory at once.
// withReader closes the underlying file reader when the closure finishes.
Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8).withReader { xml ->
    def r = XMLInputFactory.newInstance().createXMLStreamReader(xml)
    boolean isApplicationNumber = false
    StringBuilder appNumber = new StringBuilder()
    while (r.hasNext()) {
        int eventType = r.next()
        if (eventType == XMLStreamConstants.START_ELEMENT) {
            // Match on the literal 'uscom' prefix and the element's local name.
            isApplicationNumber = r.prefix == 'uscom' && r.localName == 'ApplicationNumberText'
            appNumber.setLength(0)
        } else if (eventType == XMLStreamConstants.CHARACTERS && isApplicationNumber) {
            // Text can arrive in several CHARACTERS events, so accumulate it
            // (only while inside a matching element).
            appNumber << r.text
        } else if (eventType == XMLStreamConstants.END_ELEMENT) {
            if (isApplicationNumber) {
                // Strip all whitespace before printing the application number.
                println appNumber.toString().replaceAll(/\s/, '')
            }
            isApplicationNumber = false
            appNumber.setLength(0)
        }
    }
    r.close()
}
@bdkosher
Author

Usage: groovy ExtractAppNumbersFromPBD.groovy C:\whatever\2017.xml >> app_nums.txt
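
To try the extraction logic without downloading a bulk data file, something like the following sketch may help. It is not part of the gist: the `extractAppNumbers` helper, the sample document, and the `urn:example:uscom` namespace URI are all made up for illustration (the real files declare their own namespace for the `uscom` prefix). It uses the same StAX approach, matching on the `uscom` prefix and the `ApplicationNumberText` local name.

```groovy
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants

// Hypothetical helper (not in the gist): returns the whitespace-stripped text of
// every element with prefix 'uscom' and local name 'ApplicationNumberText'.
List<String> extractAppNumbers(Reader xml) {
    def reader = XMLInputFactory.newInstance().createXMLStreamReader(xml)
    def found = []
    StringBuilder text = null   // non-null only while inside a matching element
    try {
        while (reader.hasNext()) {
            int eventType = reader.next()
            if (eventType == XMLStreamConstants.START_ELEMENT) {
                text = (reader.prefix == 'uscom' && reader.localName == 'ApplicationNumberText') ?
                        new StringBuilder() : null
            } else if (eventType == XMLStreamConstants.CHARACTERS) {
                if (text != null) {
                    text << reader.text
                }
            } else if (eventType == XMLStreamConstants.END_ELEMENT) {
                if (text != null) {
                    found << text.toString().replaceAll(/\s/, '')
                }
                text = null
            }
        }
    } finally {
        reader.close()
    }
    found
}

// Made-up sample; the namespace URI is a placeholder, not the real one.
def sample = '''<root xmlns:uscom="urn:example:uscom">
  <uscom:ApplicationNumberText> 12345678 </uscom:ApplicationNumberText>
  <uscom:Other>ignored</uscom:Other>
  <uscom:ApplicationNumberText>87654321</uscom:ApplicationNumberText>
</root>'''

extractAppNumbers(new StringReader(sample)).each { println it }
```

Returning a list instead of printing directly makes the logic easy to check on small inputs; for the multi-gigabyte bulk files, printing each number as it is found (as the script does) keeps memory use flat.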
