- `ballgown` creates a ballgown object from tablemaker output
- `ballgownrsem` creates a ballgown object from RSEM output (not yet well-tested).
- `gffRead` and `gffReadGR` read GTF (annotation) files into R
  - `gffRead` gives you a data frame
  - `gffReadGR` gives you a GRanges object
  - `gffReadGR` with `splitByTranscript=TRUE` gives you a GRangesList of transcripts in the GTF file

The `stattest` function:

- can give it either a full ballgown object, or a feature-by-sample matrix (see argument `gowntable`)
  - BE CAREFUL WITH PHENOTYPE DATA ORDER
- returns a data frame: feature ID, p-values, and q-values. No nonsense. Optionally returns coefficients (fold changes) for a 2-group comparison.
- can do timecourse tests automatically
- you can specify your own model matrices
- automatically library-adjusts and log transforms, but you can turn this off.
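
As a rough sketch of the two-group case described above, the call below uses the small example object that ships with ballgown (loaded with `data(bg)`), which is assumed here to carry a `group` column in its phenotype data. The block is guarded so that it only runs when ballgown is installed.

```r
# Sketch only: runs when the ballgown package is available.
if (requireNamespace("ballgown", quietly = TRUE)) {
  library(ballgown)
  data(bg)  # example ballgown object bundled with the package

  # Two-group transcript-level test on FPKM. getFC = TRUE asks for
  # fold changes alongside the p-values and q-values.
  results <- stattest(bg, feature = "transcript", meas = "FPKM",
                      covariate = "group", getFC = TRUE)
  print(head(results))
}
```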
- functions that end in `expr` extract expression matrices. There are default units for each of them:
  - `texpr(bg)` gives you an FPKM transcript matrix
  - `eexpr(bg)` gives you an exon `rcount` matrix (read counts)
  - `iexpr(bg)` gives you an intron `rcount` matrix (# reads supporting the intron)
- adding the `all` tag to a call to an `expr` function gives you genomic location data too. `texpr(bg, 'all')` is particularly informative (contains gene IDs, transcript lengths, etc.)
- `structure(bg)$trans`, `structure(bg)$exon`, and `structure(bg)$intron` give assembly structures in GRangesList (transcripts) or GRanges (exon/intron) form
- `tGene` finds the gene to which a transcript belongs. (such grammar wow)
- `transcriptIDs(bg)`, `transcriptNames(bg)` give sets of transcript IDs for the bg object.
- `geneIDs(bg)`, `geneNames(bg)` give sets of gene IDs for the bg object.
- `seqnames(bg)` gives the chromosome names included in the object
- `indexes(bg)` is a list, so use `$`
  - `indexes(bg)$t2g`, `indexes(bg)$i2t`, `indexes(bg)$e2t` connect exons, introns, transcripts, genes.
  - `indexes(bg)$bamfiles` can contain paths to bam files. Probably wise to have this in the same order as pData, but it's not as crucial since no existing functions depend on this.
- `exprfilter` subsets a ballgown object based on mean transcript expression (FPKM or cov)
- `subset` subsets a ballgown object based either on genomic location or on phenotype
- `pData(bg)` is the phenotype component of a ballgown object.

SUPER IMPORTANT BUSINESS: make SURE the rows of pData (one per sample) are in the same order as the columns of `texpr(bg)`. I've tried to implement some automatic checks, but I'm sure they are not perfect.
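
One way to guard against a sample-order mismatch is to reorder the phenotype table by sample ID before attaching it. The sketch below uses toy base-R data in place of `texpr(bg)` and `pData(bg)`. The `FPKM.<sample>` column naming and the `id` column are assumptions made for illustration, so adjust them to whatever your own object uses.

```r
# Hypothetical toy data standing in for texpr(bg) and pData(bg).
expr <- matrix(1:6, nrow = 2,
               dimnames = list(c("tx1", "tx2"),
                               c("FPKM.s1", "FPKM.s2", "FPKM.s3")))
pheno <- data.frame(id = c("s3", "s1", "s2"),
                    group = c("B", "A", "A"),
                    stringsAsFactors = FALSE)

# Strip the measurement prefix to recover sample IDs from the column names.
expr_ids <- sub("^FPKM\\.", "", colnames(expr))

# Reorder phenotype rows to follow the expression columns, then verify.
pheno <- pheno[match(expr_ids, pheno$id), ]
stopifnot(!anyNA(pheno$id), identical(pheno$id, expr_ids))
```

The final `stopifnot` fails loudly if a sample is missing from the phenotype table or the order still disagrees, which is cheaper than debugging a silently mislabeled comparison later.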
- `getGenes` was designed to label assembled transcripts with annotated gene names
- `annotate_assembly` matches two sets of transcripts based on overlap & calculates percents.
- `checkAssembledTx` plots assembled & annotated transcripts together
- `contains` checks whether one set of GRanges is fully contained within another. (I found this useful for determining whether a set of assembled transcripts contained coding sequences).
- `clusterTranscripts` does k-means or hierarchical clustering of the transcripts within a gene
- `pctOverlap` calculates the percent overlap between two GRanges objects
- `collapseTranscripts` actually calculates cluster-level expression
- `plotTranscripts` plots all the isoforms of a specific gene
- `plotMeans` plots the isoform structure for a gene but makes a multipanel plot, one for each phenotype group, and colors the isoforms by mean abundance.
- `plotLatentTranscripts` does transcript clustering and plots the results

- One of:
  - fasta file containing annotated transcripts
  - GTF file containing transcript structures + chromosome sequences. (check out the iGenomes indexes).
- specify groups / fold changes.

That's all you need: Polyester can automate the rest.

Can do simple simulations with one function: `simulate_experiment`. It's fairly well-documented (in my completely biased opinion; please let me know if it's confusing).
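
A minimal sketch of a two-group simulation follows. It assumes the example `chr22.fa` file bundled with polyester and keeps only the first 20 transcripts to stay small. The 20x coverage figure, the 100 bp read length used to derive it, and the temporary output paths are illustrative choices, not recommendations. The simulation itself is guarded so it only runs when polyester and Biostrings are installed.

```r
# Fold-change matrix for a hypothetical 20-transcript, two-group design:
# one row per transcript, one column per group.
n_tx <- 20
fold_changes <- matrix(1, nrow = n_tx, ncol = 2)
fold_changes[1:2, 1] <- 4   # transcripts 1-2 are up 4x in group 1
fold_changes[3:4, 2] <- 4   # transcripts 3-4 are up 4x in group 2

if (requireNamespace("polyester", quietly = TRUE) &&
    requireNamespace("Biostrings", quietly = TRUE)) {
  library(polyester)
  library(Biostrings)

  # Write the first n_tx transcripts of the bundled example to a temp FASTA.
  fasta_file <- system.file("extdata", "chr22.fa", package = "polyester")
  small_fasta <- readDNAStringSet(fasta_file)[seq_len(n_tx)]
  small_file <- file.path(tempdir(), "chr22_small.fa")
  writeXStringSet(small_fasta, small_file)

  # Roughly 20x coverage per transcript, assuming 100 bp reads.
  reads_per_tx <- round(20 * width(small_fasta) / 100)

  simulate_experiment(small_file,
                      reads_per_transcript = reads_per_tx,
                      num_reps = c(2, 2),
                      fold_changes = fold_changes,
                      outdir = file.path(tempdir(), "simulated_reads"))
}
```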

You can play around with different parts of the simulation using the internal functions. They're all exported.

- `generate_fragments`
- `reverse_complement`
- `get_reads`
- `add_error` and `add_platform_error`

You can also add a bunch of biases to your simulation:

- GC bias (not on GitHub yet, coming ASAP)
- positional bias (in `generate_fragments`)
- non-normal fragment length distribution (in `generate_fragments`)
- non-uniform error model (using `add_platform_error` instead of `add_error`). In my experience this is a bit slow.

I don't have all the features fully wrapped into `simulate_experiment` yet, so usability isn't quite guaranteed for the new stuff. I promise I'll put them in soon (definitely before the spring Bioconductor release, and hopefully well before then).

The other major feature of Polyester is that you can just define a read count matrix (number of reads to simulate per transcript per sample) and simulate that exact number of reads per transcript. The function is `simulate_experiment_countmat`.
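
As a sketch of the count-matrix route, the block below builds a transcripts-by-samples matrix and hands it to `simulate_experiment_countmat` through its `readmat` argument. The 20 transcripts, 6 samples, Poisson counts, and temporary output paths are all hypothetical. The matrix needs one row per transcript in the FASTA file, so the example subsets the `chr22.fa` file bundled with polyester to match. The simulation step only runs when polyester and Biostrings are installed.

```r
# Hypothetical design: 20 transcripts, 6 samples.
n_tx <- 20
n_samples <- 6
set.seed(1)

# Exact number of reads to simulate per transcript (rows) per sample (columns).
countmat <- matrix(rpois(n_tx * n_samples, lambda = 50),
                   nrow = n_tx, ncol = n_samples)

if (requireNamespace("polyester", quietly = TRUE) &&
    requireNamespace("Biostrings", quietly = TRUE)) {
  library(polyester)
  library(Biostrings)

  # Keep the first n_tx transcripts so the FASTA matches the matrix rows.
  fasta_file <- system.file("extdata", "chr22.fa", package = "polyester")
  small_file <- file.path(tempdir(), "chr22_countmat.fa")
  writeXStringSet(readDNAStringSet(fasta_file)[seq_len(n_tx)], small_file)

  simulate_experiment_countmat(small_file,
                               readmat = countmat,
                               outdir = file.path(tempdir(), "countmat_reads"))
}
```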

I have some code for ROC curves hidden in the ballgown package: `ballgown:::assessSim`. (It's too specific to export, but code for calculating sensitivity/specificity for an annotated/assembled simulation is a huge pain to write, so I stuck it in Ballgown. The `sim_tx_info.txt` file output by Polyester is basically the exact input you need for this.)