Downloaded "Antigenic Formulae Of The Salmonella Serovars 2007 9th edition" from:
https://www.pasteur.fr/ip/portal/action/WebdriveActionEvent/oid/01s-000036-089
Remove page numbers:
- match
\d{3}/\d{3}\s+
- replace
- (empty string)
Remove alphabet headers:
- match
^\s*[A-Z]\s+
- replace
- (empty string)
Remove extra whitespace:
- match
^\s+
- replace
- (empty string)
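The three cleanup steps above can be sketched in Python as follows. This is only an illustration, not the tool originally used: it assumes the patterns were applied per line (hence re.MULTILINE), and the function name clean_lines and the sample rows are invented for the example.

```python
import re

def clean_lines(text):
    """Apply the three cleanup patterns in the order listed above."""
    # Remove page numbers of the form NNN/NNN plus trailing whitespace.
    text = re.sub(r"\d{3}/\d{3}\s+", "", text)
    # Remove single-letter alphabet headers at the start of a line
    # (assumption: "^" is meant to match at every line start).
    text = re.sub(r"^\s*[A-Z]\s+", "", text, flags=re.MULTILINE)
    # Remove any remaining leading whitespace.
    text = re.sub(r"^\s+", "", text, flags=re.MULTILINE)
    return text

# Invented sample input, not taken from the PDF.
sample = "  A\n  Alpha    1,4    a    1,5\n012/167 \n  Beta    6,7    b    1,2\n"
cleaned = clean_lines(sample)
```

Because \s also matches newlines, the page-number and header patterns remove the whole line they sit on, not just the visible characters.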
Manually fixed the Senftenberg entry, which was broken across 2 lines
Fix "z <number> ]" (z, space, subscript number, space, bracket):
- match
z (\d+) \]
- replace
z\1]
Fix "z <number> ," (z, space, subscript number, space, comma):
- match
z (\d+) ,
- replace
z\1,
Fix remaining "z <number>" (z, space, subscript number):
- match
z (\d+)
- replace
z\1
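The three z-subscript fixes above can be sketched as a single Python function. The order matters: the two more specific patterns (trailing bracket, trailing comma) run first so that the space after the number is removed too. The function name fix_z_subscripts and the test strings are illustrative only.

```python
import re

def fix_z_subscripts(text):
    """Rejoin 'z' with its subscript number, most specific pattern first."""
    # "z 42 ]" becomes "z42]"
    text = re.sub(r"z (\d+) \]", r"z\1]", text)
    # "z 15 ," becomes "z15,"
    text = re.sub(r"z (\d+) ,", r"z\1,", text)
    # Any remaining "z 10" becomes "z10"
    text = re.sub(r"z (\d+)", r"z\1", text)
    return text

# Invented test strings in the style of antigenic formulae.
fixed = fix_z_subscripts("l,z 13 ,z 28")
```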
Manually removed the space in:
Paratyphi A
Paratyphi B
Paratyphi C
to give:
ParatyphiA
ParatyphiB
ParatyphiC
Find all serotypes with H0 (five columns):
- match
^(\w+) {2,}(\S+) {2,}(\S+) {2,}(\S+) {2,}(\S+).*
- replace
\1\t\2\t\3\t\4\t\5
Find all serotypes without H0 (four columns, trailing tab added):
- match
^(\w+) {2,}(\S+) {2,}(\S+) {2,}(\S+).*
- replace
\1\t\2\t\3\t\4\t
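The two column-splitting steps above can be sketched in Python as follows. The five-column pattern is applied first; once a line has been rewritten with tabs, the four-column pattern no longer matches it, since that pattern requires runs of two or more spaces. The names FIVE_COLS, FOUR_COLS and to_tsv are invented here, and re.MULTILINE is an assumption about how "^" was applied.

```python
import re

# Serovar, O-antigens, H1, H2 and a fifth field, separated by 2+ spaces.
FIVE_COLS = re.compile(
    r"^(\w+) {2,}(\S+) {2,}(\S+) {2,}(\S+) {2,}(\S+).*", re.MULTILINE
)
# Same layout without the fifth field.
FOUR_COLS = re.compile(
    r"^(\w+) {2,}(\S+) {2,}(\S+) {2,}(\S+).*", re.MULTILINE
)

def to_tsv(text):
    """Convert space-aligned rows to tab-separated rows."""
    # Rows with a fifth field first.
    text = FIVE_COLS.sub(r"\1\t\2\t\3\t\4\t\5", text)
    # Remaining rows get an empty fifth field (trailing tab).
    text = FOUR_COLS.sub(r"\1\t\2\t\3\t\4\t", text)
    return text

# Invented sample rows, not taken from the PDF.
rows = to_tsv("Alpha  1,4  a  1,5  z27\nBeta    6,7   b   1,2\n")
```

Since the first group is (\w+), a serovar name containing a space would not match either pattern.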
Headers:
- Serovar
- O-antigens
- H1
- H2
- H(other)
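As a minimal sketch, the header row listed above could be prepended to the tab-separated output like this. The function name with_header and the sample row are hypothetical.

```python
# Column names exactly as listed above.
HEADERS = ["Serovar", "O-antigens", "H1", "H2", "H(other)"]

def with_header(tsv_text):
    """Prepend a tab-separated header row to already-converted rows."""
    return "\t".join(HEADERS) + "\n" + tsv_text

# Invented sample row with an empty fifth field.
table = with_header("Beta\t6,7\tb\t1,2\t\n")
```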