git clone https://gist.github.com/cb33c735c7cf3f3cf8e8.git r-and-sql-demo
cd r-and-sql-demo
pwd
Open RStudio, set the working directory to the path that pwd reported, and open the demo.R script to follow along.
"""
https://www.biostars.org/p/152517/
Example of how to work with Ensembl release 81 GTF files, which:
1) already have genes and transcripts included
2) have unique IDs for genes, transcripts, and exons in the corresponding
"<featuretype>_id" attribute
#!/usr/bin/env bash
# Ryan Dale, July 2015
#
# CollectRnaSeqMetrics.jar from Picard [1] needs an interval list corresponding
# to ribosomal RNA. The format is described at [2].
#
# SAM header creation idea from [3]; idea for using rmsk tables to get rRNA is
# from [4].
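The interval list mentioned above can be illustrated with a small Python sketch. This is not the gist's bash code: it assumes the rRNA regions are already available as tab-separated BED lines, and the function names `sq_header_lines` and `bed_to_interval_line` are hypothetical. It relies on the interval list layout of a SAM-style header followed by 1-based, inclusive rows of sequence name, start, end, strand, and interval name.

```python
def sq_header_lines(chrom_sizes):
    """Build SAM-style @SQ header lines from (name, length) pairs."""
    return ["@SQ\tSN:{0}\tLN:{1}".format(name, length) for name, length in chrom_sizes]


def bed_to_interval_line(bed_line):
    """Convert one BED line (0-based, half-open) to an interval list row
    (1-based, inclusive): name, start, end, strand, interval name."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Assumption: fall back to "." and "+" when name or strand are missing.
    name = fields[3] if len(fields) > 3 else "."
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
    return "\t".join([chrom, str(start + 1), str(end), strand, name])


if __name__ == "__main__":
    for header in sq_header_lines([("chr1", 1000)]):
        print(header)
    print(bed_to_interval_line("chr1\t99\t200\trRNA_1\t0\t-\n"))
```

Only the start coordinate shifts by one, because a half-open BED end and an inclusive 1-based end refer to the same base.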
The MIT License (MIT)
Copyright (c) 2016 Ryan Dale
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
from gffutils.iterators import DataIterator

input_filename = 'example.gff'
output_filename = 'output.gff'

with open(output_filename, 'w') as fout:
    for feature in DataIterator(input_filename):
        # len() works to get the length of a feature in bp
        if len(feature) < 1000:
            continue
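The length filter in the preview above can also be sketched without gffutils, using only the standard library. This is an illustration under stated assumptions, not the gist's code: the helper names `feature_length` and `filter_by_length` are hypothetical, input lines are assumed to be tab-separated GFF, and length is taken as end - start + 1 because GFF coordinates are 1-based and inclusive.

```python
def feature_length(gff_line):
    """Length in bp of a tab-separated GFF feature line (1-based, inclusive)."""
    fields = gff_line.rstrip("\n").split("\t")
    start, end = int(fields[3]), int(fields[4])
    return end - start + 1


def filter_by_length(lines, min_length=1000):
    """Yield comment/blank lines unchanged and features of at least min_length bp."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
        elif feature_length(line) >= min_length:
            yield line


if __name__ == "__main__":
    example = [
        "##gff-version 3\n",
        "chr1\tsrc\tgene\t1\t402\t0\t+\t.\tID=a\n",
        "chr1\tsrc\tgene\t6780\t8600\t0\t-\t.\tID=b\n",
    ]
    for kept in filter_by_length(example):
        print(kept, end="")
```

Passing comment lines through keeps directives such as `##gff-version 3` in the filtered output.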
#!/bin/bash
set -e
set -o pipefail

# All-in-one installation script to download, configure, and run cloudbiolinux
# to install bioinformatics tools locally without needing sudo. The executables
# will go into $INSTALL_DIR:
INSTALL_DIR=~/tmp/cbl_demo

# See https://github.com/chapmanb/cloudbiolinux for more info on customizing
import pybedtools

# This demo uses files that ship with pybedtools
a = pybedtools.example_bedtool('a.bed')
fasta = pybedtools.example_filename('test.fa')

# Use a properly-formatted BED file, and then post-process the resulting fasta.
x = a.sequence(fi=fasta, s=True)
for i, line in enumerate(open(x.seqfn)):
    if line.startswith('>') and i > 0:
##gff-version 3
scaffold_28 prediction gene 1 402 0 + . ID=545184;Name=545184
scaffold_28 prediction gene 805 981 0 - . ID=93782;Name=93782
scaffold_28 prediction gene 2030 2721 0 + . ID=545205;Name=545205
scaffold_28 prediction gene 3273 3545 0 - . Name=YOL159C-A;Synteny=no_synteny;SystematicGeneName=YOL159C-A;ID=38792
scaffold_28 prediction gene 5318 5833 0 - . Name=YOL159C;Synteny=no_synteny;SystematicGeneName=YOL159C;ID=38793
scaffold_28 prediction gene 6780 8600 0 - . Name=ENB1;Synteny=no_synteny;SystematicGeneName=YOL158C;StandardGeneName=ENB1;ID=38794
scaffold_28 prediction gene 9698 11467 0 - . Name=IMA4;Synteny=no_synteny;SystematicGeneName=YJL221C;StandardGeneName=IMA4;ID=38795
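The ninth column of the records above holds `key=value` pairs separated by semicolons. As a rough illustration of how that column could be read, here is a standard-library sketch; the function name `parse_gff3_attributes` is hypothetical, and it assumes each key appears once and that values may be percent-encoded.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column):
    """Parse a GFF3 attributes column such as 'ID=1;Name=x' into a dict."""
    attrs = {}
    for pair in column.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        attrs[unquote(key)] = unquote(value)
    return attrs


if __name__ == "__main__":
    column = ("Name=ENB1;Synteny=no_synteny;SystematicGeneName=YOL158C;"
              "StandardGeneName=ENB1;ID=38794")
    attrs = parse_gff3_attributes(column)
    print(attrs["ID"], attrs["StandardGeneName"])
```

The sketch does not depend on attribute order, which matters here because `ID` comes first in some records and last in others.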
This gist provides example data for the metaseq_demo.py script
import pybedtools
import pandas

def split_coverage(x):
    """
    Split a coverage file created using bedtools coverage -hist -- which will
    have trailing "all" hist lines -- into 1) a BedTool object with valid BED
    lines and 2) a pandas DataFrame of all coverage, parsed from the trailing
    "all" lines.
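The split described in the docstring above can be sketched with plain Python lists in place of BedTool and DataFrame objects. This is an illustration only, not the gist's implementation: the function name `split_hist_lines` is hypothetical, and it assumes tab-separated input in which each summary line starts with `all` followed by depth, number of bases at that depth, total size, and fraction.

```python
def split_hist_lines(lines):
    """Separate per-feature lines from the trailing 'all' histogram lines.

    Returns (feature_lines, hist), where hist is a list of
    (depth, n_bases, total_size, fraction) tuples.
    """
    feature_lines = []
    hist = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "all":
            depth, n_bases, total_size, fraction = fields[1:5]
            hist.append((int(depth), int(n_bases), int(total_size), float(fraction)))
        else:
            feature_lines.append(line)
    return feature_lines, hist


if __name__ == "__main__":
    example = [
        "chr1\t0\t10\t0\t4\t10\t0.4\n",
        "chr1\t0\t10\t1\t6\t10\t0.6\n",
        "all\t0\t4\t10\t0.4\n",
        "all\t1\t6\t10\t0.6\n",
    ]
    features, hist = split_hist_lines(example)
    print(len(features), hist)
```

The two returned lists could then be handed to whatever BED and table tooling is in use.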