BED file parsing

Overview

A BED file needs to be parsed and reformatted into a CSV file. Broadly speaking, there are two options: use a GUI, or use scripting.

GUI

Use BBEdit, Atom, or Apple's TextEdit (see the caution below) to search and replace.

Caution: Apple's TextEdit has an extremely useful GUI for search and replace that even simplifies regular expressions. However, it may replace some characters, such as the straight double-quote character, with variants that not all text editors or command-line tools recognize (for example, curly quotes). You've been warned.

Scripting

I haven't been able to find a one-step process, but the following scripts accomplish the task and can also serve as a cookbook for similar tasks.

Parse out only the track line and the following line:

  grep -A1 'track' radCohort_geneList.bed.txt > radCohort_geneList.unparsed.txt 
  # truncated output:
  # track name="SOMETHING" description=""
  # chr1  87863625	87864548 
  # --

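Remove the '--' separator lines that grep -A inserts between groups of matches. The pattern is anchored with ^ so that only lines consisting of '--' are dropped:

  perl -pe 's/^--\n//' radCohort_geneList.unparsed.txt > tmp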

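Append the BED line to its track line. The -0 switch sets perl's record separator to the NUL character, so a text file with no NUL bytes is read as a single record. The negative lookahead then removes every newline except those followed by five lowercase letters (the start of the next 'track' line) or by the end of the file: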

  perl -0pe 's/\n(?!([a-z]{5}|$))//g' tmp > tmp2
  # truncated output:
  # track name="ATM" description="ATM serine/threonine kinase [Source:HGNC Symbol;Acc:HGNC:795]" chr11      108222484

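Convert the tabs to commas and keep only the first comma-separated field, which discards the remaining BED columns. Because cut splits on every comma, a description that itself contains a comma would be truncated at that point: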

  perl -pe 's/\t/,/g' tmp2 | cut -d, -f1 > tmp3
  # commands can be piped into each other, example:
  perl -pe 's/\t/,/g' tmp2 | sed 's/itemRgb=\"On\"/chromosome=\"/g' | cut -d, -f1 | sed 's/$/\"/' > tmp4
  # truncated output from the above piped command is:
  # track name="ATM" description="ATM serine/threonine kinase [Source:HGNC Symbol;Acc:HGNC:795]" chromosome="chr11"

Convert the space-delimited fields into comma-separated ones. Here the pattern '" ' (a double quote followed by a space) marks the field boundary; modify it as necessary for other inputs:
```
  sed 's/\" /,/g' tmp4 > radCohort_geneList.txt
```

Finally, remove the 'track ' string from the beginning of the line:

  sed 's/^track //g' radCohort_geneList.txt > radCohort_geneList.csv

disulfidebond/parse_bed_file.md
