git clone https://gist.github.com/cb33c735c7cf3f3cf8e8.git r-and-sql-demo
cd r-and-sql-demo
pwd
Open RStudio, set the working directory to the path that pwd
reported, and open the demo.R script to follow along.
"""
https://www.biostars.org/p/152517/
Example of how to work with Ensembl release 81 GTF files, which:
1) already have genes and transcripts included
2) have unique IDs for genes, transcripts, and exons in the corresponding
   "<featuretype>_id" attribute
#!/usr/bin/env bash
# Ryan Dale, July 2015
#
# CollectRnaSeqMetrics.jar from Picard [1] needs an interval list corresponding
# to ribosomal RNA. The format is described at [2].
#
# SAM header creation idea from [3]; idea for using rmsk tables to get rRNA is
# from [4].
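
A minimal Python sketch of the interval-list layout that the header comment above refers to, assuming the commonly documented Picard format: a SAM-style header followed by tab-separated sequence, start, end, strand, and name columns, with 1-based inclusive coordinates. The function name, the header fields shown, the chromosome size, and the rRNA record are all hypothetical, and this is not the bash script's own method.

```python
def interval_list_lines(chrom_sizes, bed_records):
    """Build interval-list lines from chromosome sizes and BED-like records.

    chrom_sizes: dict of chromosome name -> length (hypothetical input).
    bed_records: iterable of (chrom, start, end, name, strand) tuples with
                 0-based, half-open BED coordinates.
    """
    # SAM-style header: one @HD line, then one @SQ line per sequence.
    lines = ["@HD\tVN:1.4"]
    for chrom, size in chrom_sizes.items():
        lines.append(f"@SQ\tSN:{chrom}\tLN:{size}")
    for chrom, start, end, name, strand in bed_records:
        # BED starts are 0-based; interval lists are 1-based and inclusive,
        # so only the start coordinate shifts by one.
        lines.append(f"{chrom}\t{start + 1}\t{end}\t{strand}\t{name}")
    return lines


# Made-up example data, for illustration only.
lines = interval_list_lines(
    {"chr1": 1000},
    [("chr1", 99, 200, "rRNA_demo", "+")],
)
print("\n".join(lines))
```
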
The MIT License (MIT)

Copyright (c) 2016 Ryan Dale

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
from gffutils.iterators import DataIterator

input_filename = 'example.gff'
output_filename = 'output.gff'

with open(output_filename, 'w') as fout:
    for feature in DataIterator(input_filename):
        # len() works to get the length of a feature in bp
        if len(feature) < 1000:
            continue
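
The length filter in the preview above can be sketched without gffutils, using only the standard library. This assumes tab-separated GFF lines with 1-based, inclusive coordinates, so a feature's length is end - start + 1. The helper names and the example records are hypothetical, and what the original script does with the features it keeps is not shown in the preview.

```python
def feature_length(gff_line):
    """Length in bp of a GFF feature (1-based, inclusive coordinates)."""
    fields = gff_line.rstrip("\n").split("\t")
    start, end = int(fields[3]), int(fields[4])
    return end - start + 1


def keep_long_features(lines, min_length=1000):
    """Yield GFF feature lines that are at least min_length bp long."""
    for line in lines:
        # Skip directives, comments, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        # Same test as the preview: drop anything shorter than the cutoff.
        if feature_length(line) < min_length:
            continue
        yield line


# Made-up records for illustration.
example = [
    "##gff-version 3\n",
    "chr1\tdemo\tgene\t1\t402\t.\t+\t.\tID=short\n",
    "chr1\tdemo\tgene\t6780\t8600\t.\t-\t.\tID=long\n",
]
kept = list(keep_long_features(example))
```
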
#!/bin/bash
set -e
set -o pipefail
# All-in-one installation script to download, configure, and run cloudbiolinux
# to install bioinformatics tools locally without needing sudo. The executables
# will go into $INSTALL_DIR:
INSTALL_DIR=~/tmp/cbl_demo
# See https://github.com/chapmanb/cloudbiolinux for more info on customizing
import pybedtools

# This demo uses files that ship with pybedtools
a = pybedtools.example_bedtool('a.bed')
fasta = pybedtools.example_filename('test.fa')

# Use a properly-formatted BED file, and then post-process the resulting fasta.
x = a.sequence(fi=fasta, s=True)
for i, line in enumerate(open(x.seqfn)):
    if line.startswith('>') and i > 0:
##gff-version 3
scaffold_28 prediction gene 1 402 0 + . ID=545184;Name=545184
scaffold_28 prediction gene 805 981 0 - . ID=93782;Name=93782
scaffold_28 prediction gene 2030 2721 0 + . ID=545205;Name=545205
scaffold_28 prediction gene 3273 3545 0 - . Name=YOL159C-A;Synteny=no_synteny;SystematicGeneName=YOL159C-A;ID=38792
scaffold_28 prediction gene 5318 5833 0 - . Name=YOL159C;Synteny=no_synteny;SystematicGeneName=YOL159C;ID=38793
scaffold_28 prediction gene 6780 8600 0 - . Name=ENB1;Synteny=no_synteny;SystematicGeneName=YOL158C;StandardGeneName=ENB1;ID=38794
scaffold_28 prediction gene 9698 11467 0 - . Name=IMA4;Synteny=no_synteny;SystematicGeneName=YJL221C;StandardGeneName=IMA4;ID=38795
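
The ninth column of the records above holds semicolon-separated key=value attributes. As a rough illustration, a minimal standard-library parser might look like the following. The function name is hypothetical, and the sketch ignores parts of the GFF3 specification such as URL-escaped characters and comma-separated multiple values.

```python
def parse_gff3_attributes(column):
    """Turn 'key=value;key=value' into a dict; blank entries are skipped."""
    attributes = {}
    for entry in column.strip().split(";"):
        if not entry:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = entry.partition("=")
        attributes[key] = value
    return attributes


# Attribute column copied from the ENB1 record above.
attrs = parse_gff3_attributes(
    "Name=ENB1;Synteny=no_synteny;SystematicGeneName=YOL158C;"
    "StandardGeneName=ENB1;ID=38794"
)
print(attrs["ID"], attrs["SystematicGeneName"])
```
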
This gist provides example data for the metaseq_demo.py script
import pybedtools
import pandas


def split_coverage(x):
    """
    Split a coverage file created using bedtools coverage -hist -- which will
    have trailing "all" hist lines -- into 1) a BedTool object with valid BED
    lines and 2) a pandas DataFrame of all coverage, parsed from the trailing
    "all" lines.