@alexpreynolds
Created October 24, 2024 04:34
Convert Gencode GFF3 to GeneSearch widget JSON
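#!/usr/bin/env python
import sys
import json

# Read tab-delimited GFF3 "gene" records from stdin and write a compact JSON array to stdout.
genes = []
for line in sys.stdin:
    elems = line.rstrip().split('\t')
    chromosome = elems[0]
    start = int(elems[3])
    end = int(elems[4])
    # GFF3 coordinates are 1-based and inclusive, so end - start is one less than the span in bases.
    length = end - start
    # Column 9 holds semicolon-delimited key=value attributes.
    md = elems[8].split(';')
    hgnc = [x for x in md if x.startswith('gene_name=')][0].split('=', 1)[1]
    # Named gene_type to avoid shadowing the built-in type(); the JSON key is still 'type'.
    gene_type = [x for x in md if x.startswith('gene_type=')][0].split('=', 1)[1]
    posneg = elems[6]  # strand column
    gene = {
        'chromosome': chromosome,
        'start': start,
        'end': end,
        'length': length,
        'hgnc': hgnc,
        'type': gene_type,
        'posneg': posneg
    }
    genes.append(gene)
sys.stdout.write(json.dumps(genes, separators=(',', ':')))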
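To make the per-record mapping concrete, here is a minimal sketch of the same conversion applied to a single line. The helper names `parse_attributes` and `gene_record` are hypothetical and not part of the script above, and the sample record is made up for illustration rather than taken from a real Gencode file.

```python
import json

def parse_attributes(column):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column.split(';'):
        if '=' in field:
            key, value = field.split('=', 1)
            attrs[key] = value
    return attrs

def gene_record(line):
    """Build one output record with the same keys the script above emits."""
    elems = line.rstrip('\n').split('\t')
    attrs = parse_attributes(elems[8])
    start, end = int(elems[3]), int(elems[4])
    return {
        'chromosome': elems[0],
        'start': start,
        'end': end,
        'length': end - start,  # same arithmetic as the script above
        'hgnc': attrs['gene_name'],
        'type': attrs['gene_type'],
        'posneg': elems[6],
    }

# Made-up record, for illustration only.
sample = 'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tID=G1;gene_type=protein_coding;gene_name=GENE1'
print(json.dumps(gene_record(sample), separators=(',', ':')))
# prints {"chromosome":"chr1","start":100,"end":200,"length":100,"hgnc":"GENE1","type":"protein_coding","posneg":"+"}
```

Parsing the attributes into a dict means a record without `gene_name` fails with a `KeyError` naming the missing key, which may be easier to diagnose than an `IndexError` from an empty list.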
alexpreynolds commented Oct 24, 2024

From an uncompressed (plain-text) GFF3 file:

$ ./gff3_gene_to_genesearch_json.py < <(awk -vFS="\t" -vOFS="\t" '($3 == "gene")' gencode.gff3) > genesearch.json

Or from compressed records:

$ ./gff3_gene_to_genesearch_json.py < <(gunzip -c gencode.gff3.gz | awk -vFS="\t" -vOFS="\t" '($3 == "gene")') > genesearch.json

Or from the web:

$ ./gff3_gene_to_genesearch_json.py < <(wget -qO- https://url/path/to/gencode.gff3.gz | gunzip -c | awk -vFS="\t" -vOFS="\t" '($3 == "gene")') > genesearch.json
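If awk or gunzip are not available, the same pre-filtering can be done in Python. The following is a rough sketch, not part of the gist: `open_gff3` and `gene_lines` are hypothetical helper names, and the choice between plain and gzip input is assumed to follow from the `.gz` file extension.

```python
import gzip
import io

def open_gff3(path):
    """Open a plain or gzip-compressed GFF3 file in text mode (chosen by extension)."""
    if path.endswith('.gz'):
        return gzip.open(path, 'rt')
    return open(path, 'r')

def gene_lines(handle):
    """Yield lines whose third tab-delimited column is 'gene', like the awk filter."""
    for line in handle:
        if line.startswith('#'):
            continue  # skip GFF3 directives and comments
        elems = line.rstrip('\n').split('\t')
        if len(elems) >= 9 and elems[2] == 'gene':
            yield line

# Small in-memory demonstration with made-up records.
demo = io.StringIO(
    '##gff-version 3\n'
    'chr1\tX\tgene\t100\t200\t.\t+\t.\tID=G1;gene_type=protein_coding;gene_name=GENE1\n'
    'chr1\tX\texon\t100\t150\t.\t+\t.\tID=E1;gene_type=protein_coding;gene_name=GENE1\n'
)
for line in gene_lines(demo):
    print(line, end='')
```

With a real file, something like `with open_gff3('gencode.gff3.gz') as fh:` followed by a loop over `gene_lines(fh)` would yield the lines the conversion script expects on standard input.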
