A BED file needs to be parsed and reformatted into a CSV file. Broadly speaking, there are two options: use a GUI, or use scripting.
GUI: use the search-and-replace function of BBEdit, Atom, or Apple's TextEdit (see the caution below). An example is shown here:
- Caution: Apple's TextEdit has an extremely useful search-and-replace GUI that even simplifies regex. However, it may replace some characters, such as the double-quote character, with versions that not all text editors recognize (likely curly "smart" quotes). You've been warned.
Scripting: I haven't been able to find a one-step process, but the following commands accomplish the task and can also serve as a cookbook for similar tasks.
-
Extract each track line together with the line that follows it:
grep -A1 'track' radCohort_geneList.bed.txt > radCohort_geneList.unparsed.txt
# truncated output:
# track name="SOMETHING" description=""
# chr1 87863625 87864548
# --
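A minimal sketch of this step on a small, hypothetical two-record file. The file name and its contents are invented for illustration, and the trailing `itemRgb="On"` on each track line is an assumption inferred from the sed step further down. The example shows the `--` separator that grep prints between groups that are not adjacent in the input. It also shows that `-A1` keeps only the first BED line after each track line, so the second chr11 line is dropped.

```shell
# Work in a scratch directory so the example files do not clutter the cwd.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample BED file (invented data): the first track line is
# followed by two BED lines, the second track line by one.
{
  printf 'track name="ATM" description="ATM kinase" itemRgb="On"\n'
  printf 'chr11\t108222484\t108369102\n'
  printf 'chr11\t108222500\t108222600\n'
  printf 'track name="TP53" description="tumor protein p53" itemRgb="On"\n'
  printf 'chr17\t7661779\t7687538\n'
} > sample.bed.txt

# -A1 prints each matching line plus one line of trailing context;
# grep inserts "--" between groups that are not adjacent in the input.
unparsed=$(grep -A1 'track' sample.bed.txt)
printf '%s\n' "$unparsed"
```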
-
Remove the '--' group separators that grep inserts between non-adjacent matches:
perl -pe 's/--\n//' radCohort_geneList.unparsed.txt > tmp
-
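Append each BED line to its track line using a regex negative lookahead. With -0, Perl reads the whole file as one string (provided it contains no null characters). The substitution then deletes every newline except those followed by five lowercase letters (i.e. the next 'track') or by the end of the file:
perl -0pe 's/\n(?!([a-z]{5}|$))//g' tmp > tmp2
# truncated output:
# track name="ATM" description="ATM serine/threonine kinase [Source:HGNC Symbol;Acc:HGNC:795]" chr11 108222484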
-
Convert tabs to commas and keep only the first comma-separated field, which drops the BED columns after the chromosome name:
perl -pe 's/\t/,/g' tmp2 | cut -d, -f1 > tmp3
# commands can be piped into each other; for example, the following pipeline also relabels
# itemRgb="On" as chromosome=" and closes the quote, producing the tmp4 file used in the next step:
perl -pe 's/\t/,/g' tmp2 | sed 's/itemRgb="On"/chromosome="/g' | cut -d, -f1 | sed 's/$/"/' > tmp4
# truncated output from the above piped command:
# track name="ATM" description="ATM serine/threonine kinase [Source:HGNC Symbol;Acc:HGNC:795]" chromosome="chr11"
-
Convert the space-delimited fields into comma-separated ones. Here the pattern '" ' (a closing double quote followed by a space) is replaced with a comma, which also removes that closing quote; modify the pattern as necessary:
sed 's/" /,/g' tmp4 > radCohort_geneList.txt
-
Finally, remove the 'track ' prefix from the beginning of each line:
sed 's/^track //' radCohort_geneList.txt > radCohort_geneList.csv