Building 10X reference genomes from Ensembl
# Visit the Ensembl ftp site.
# ftp://ftp.ensembl.org/pub/release-95/
#
# You want to find data under the following two URLs:
# 1. ftp://ftp.ensembl.org/pub/release-95/fasta/[YOUR_SPECIES_HERE]/dna/
# 2. ftp://ftp.ensembl.org/pub/release-95/gtf/[YOUR_SPECIES_HERE]/
#
# The first file of interest is under the fasta URL:
# [YOUR_SPECIES_HERE].[ASSEMBLY].dna.primary_assembly.fa.gz
# or, if that doesn't exist,
# [YOUR_SPECIES_HERE].[ASSEMBLY].dna.toplevel.fa.gz
#
# The second file of interest is under the gtf URL:
# [YOUR_SPECIES_HERE].[ASSEMBLY].[ENSEMBL_RELEASE].gtf.gz (the release number, e.g. 95)
#
# With those two URLs in hand, define these 4 things:
reference_name="species-assembly"
reference_version="3.0.0" # or whatever you want!
fasta_url="ftp://.../fasta/...fa.gz"
gtf_url="ftp://.../gtf/...gtf.gz"
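# names of the compressed downloads
fasta_gz=$(basename "${fasta_url}")
gtf_gz=$(basename "${gtf_url}")
# names after gunzip (the .gz suffix is gone), which is what cellranger will read
fasta_file="${fasta_gz%.gz}"
gtf_file="${gtf_gz%.gz}"
gtf_file_filt="${gtf_file%.*}.filtered.${gtf_file##*.}"
wget "${fasta_url}"
gunzip "${fasta_gz}"
wget "${gtf_url}"
gunzip "${gtf_gz}"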
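# For illustration only: how the two parameter expansions used to build the
# filtered file name behave on a decompressed GTF name. The file name below is
# a hypothetical example, not one of the files downloaded above.
example_gtf="Homo_sapiens.GRCh38.95.gtf"
example_stem="${example_gtf%.*}"     # drops the shortest trailing ".suffix"
example_ext="${example_gtf##*.}"     # keeps only the text after the last "."
example_filt="${example_stem}.filtered.${example_ext}"
echo "${example_filt}"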
# Check to see what biotypes you have in your data (run on the decompressed gtf):
# e.g. cut -f9 "${gtf_file}" | grep -Eo 'gene_biotype "(\w+)"' | sort | uniq
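# A possible extension of the check above (a sketch, not part of the original
# recipe): count genes per biotype, to see roughly how many genes each
# --attribute filter would keep. count_biotypes is a hypothetical helper name;
# it assumes a decompressed, tab-separated GTF that contains "gene" feature lines.
count_biotypes() {
    # keep only "gene" records (column 3), then tally the gene_biotype attribute
    awk -F'\t' '$3 == "gene"' "$1" \
        | cut -f9 \
        | grep -Eo 'gene_biotype "[^"]+"' \
        | sort | uniq -c | sort -rn
}
# Usage: count_biotypes "${gtf_file}"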
# load cellranger
# module load cellranger/3.0.2
# make STAR compatible gtf
# add whatever "gene_biotypes" you are interested in
# here's the bare minimum
cellranger mkgtf \
  "${gtf_file}" \
  "${gtf_file_filt}" \
  --attribute=gene_biotype:protein_coding \
  --attribute=gene_biotype:lincRNA \
  --attribute=gene_biotype:antisense
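# If you want more biotypes than the bare minimum, one way to avoid a long
# hand-written argument list is to build the --attribute flags in a loop.
# This is only a sketch: attr_args is a hypothetical variable name, and the
# list below simply repeats the three biotypes used above. Extend it with the
# biotypes you found in your own gtf.
attr_args=""
for biotype in protein_coding lincRNA antisense; do
    attr_args="${attr_args} --attribute=gene_biotype:${biotype}"
done
# attr_args is left unquoted on purpose so sh/bash split it into separate
# arguments (safe here because biotype names contain no whitespace):
# cellranger mkgtf "${gtf_file}" "${gtf_file_filt}" ${attr_args}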
# make STAR compatible reference
cellranger mkref \
  --genome="${reference_name}" \
  --fasta="${fasta_file}" \
  --genes="${gtf_file_filt}" \
  --ref-version="${reference_version}" \
  --nthreads=2