## Setup
Sys.setenv(CUDA_VISIBLE_DEVICES='')
options(tensorflow.extract.warn_tensors_passed_asis = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(glue)
library(envir)
library(tensorflow)
import os
import hashlib
from pathlib import Path
import requests
import logging
from colorama import Fore, Style, init
import gzip
import shutil
import time
import random
import os
import random

# Get a list of all FASTA files in the fasta folder
FASTA_FILES, = glob_wildcards("fasta/{fasta_file}.fasta")

rule all:
    input:
        expand("gff/{fasta_file}.gff", fasta_file=FASTA_FILES),
        expand("reformatted_gff_shuffled/{fasta_file}.gff", fasta_file=FASTA_FILES),
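The file discovery and target expansion in the Snakefile above can be approximated in plain Python. This is a minimal sketch, not Snakemake's implementation: `list_fasta_stems` and `expand_targets` are hypothetical helper names, and the sketch assumes a flat directory, whereas `glob_wildcards` can also match wildcards across subdirectories.

```python
import tempfile
from pathlib import Path


def list_fasta_stems(fasta_dir):
    """Return the sorted stems of all *.fasta files directly inside fasta_dir.

    Hypothetical helper: roughly what glob_wildcards("fasta/{fasta_file}.fasta")
    yields for a flat directory.
    """
    return sorted(p.stem for p in Path(fasta_dir).glob("*.fasta"))


def expand_targets(pattern, stems):
    """Fill the {fasta_file} placeholder once per stem, similar to expand()."""
    return [pattern.format(fasta_file=stem) for stem in stems]


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.fasta", "b.fasta", "notes.txt"):
        (Path(tmp) / name).touch()
    stems = list_fasta_stems(tmp)
    targets = expand_targets("gff/{fasta_file}.gff", stems)
```

Only the two `.fasta` files contribute a target; `notes.txt` is ignored because it does not match the pattern.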
import os
import argparse
import random

def is_fasta(filename):
    try:
        with open(filename, 'r') as f:
            first_line = f.readline().strip()
            if not first_line:
                return 'empty'
## app.R ##
library(shinydashboard)
library(shiny)
library(keras)
library(deepG)
library(ggplot2)
library(dplyr)
library(DT)
library(hdf5r)
library(plotly)
---
title: "GenomeNet Viewer"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    social: menu
    theme: united #cerulean
    source_code: embed
runtime: shiny
---
#' @title Trains a (mostly) LSTM model on genomic data. Designed for developing genome-based language models (GenomeNet)
#'
#' @description
#' The depth and the number of neurons per layer of the network can be specified. The first layer can be a convolutional neural network (CNN) layer designed to capture codons.
#' If a path to a folder containing FASTA files is provided, batches will be generated using an external generator, which
#' is recommended for big training sets. Alternatively, a dataset can be supplied that holds the preprocessed batches (generated by \code{preprocessSemiRedundant()})
#' and keeps them in RAM. Also supports training on instances with multiple GPUs and scales linearly with the number of GPUs present.
#' @param train_type Either "lm" for language model, "label_header" or "label_folder". A language model is trained to predict the next character in a sequence.
#' label_header/label_folder are trained to predict a corresponding class, given a sequence as input. If "label_header", class will be read from f
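The "lm" training type described above pairs each window of a sequence with the character that follows it. The sketch below illustrates that idea in plain Python and is not the package's own preprocessing: the four-letter alphabet `VOCAB`, the helper names, and the `step` parameter are assumptions made for illustration (the model inputs shown elsewhere on this page use a last dimension of 6, so the real vocabulary is presumably larger).

```python
VOCAB = "acgt"  # assumed alphabet, for illustration only


def one_hot(ch):
    """Encode one character as a 0/1 list; characters outside VOCAB are all zeros."""
    return [1 if ch == v else 0 for v in VOCAB]


def next_char_samples(seq, maxlen, step=1):
    """Build (window, next character) pairs for a next-character language model.

    Each window holds maxlen one-hot encoded characters; the target is the
    one-hot encoding of the character right after the window.
    Returns two empty lists if seq is not longer than maxlen.
    """
    seq = seq.lower()
    x, y = [], []
    for start in range(0, len(seq) - maxlen, step):
        window = seq[start:start + maxlen]
        x.append([one_hot(ch) for ch in window])
        y.append(one_hot(seq[start + maxlen]))
    return x, y


x, y = next_char_samples("ACGTACGTAC", maxlen=4, step=2)
```

With a sequence of length 10, `maxlen=4` and `step=2`, the windows start at positions 0, 2 and 4, so three samples are produced. A smaller `step` yields more, heavily overlapping samples.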
trainMinimalFunctionalAPI <- function(path = "example_files/fasta") {
  library(wavenet)
  message("Initialize model! This can take a few minutes.")
  maxlen <- 1000
  input <- keras::layer_input(batch_shape = c(64, maxlen, 6))
  # https://github.com/ibab/tensorflow-wavenet/blob/master/wavenet/ops.py#L46
  first <- keras::layer_conv_1d(
trainMinimalFunctionalAPI <- function(path = "example_files/fasta") {
  message("Initialize model! This can take a few minutes.")
  input <- keras::layer_input(batch_shape = c(256, 50, 6))
  cnn <-
    keras::layer_conv_1d(
      object = input,
      kernel_size = 3,
The Human Microbiome Project Consortium
Curtis Huttenhower, Dirk Gevers, Rob Knight, Sahar Abubucker, Jonathan H. Badger, Asif T. Chinwalla, Heather H. Creasy, Ashlee M. Earl, Michael G. FitzGerald, Robert S. Fulton, Michelle G. Giglio, Kymberlie Hallsworth-Pepin, Elizabeth A. Lobos, Ramana Madupu, Vincent Magrini, John C. Martin, Makedonka Mitreva, Donna M. Muzny, Erica J. Sodergren, James Versalovic, Aye M. Wollam, Kim C. Worley, Jennifer R. Wortman, Sarah K. Young, Qiandong Zeng, Kjersti M. Aagaard, Olukemi O. Abolude, Emma Allen-Vercoe, Eric J. Alm, Lucia Alvarado, Gary L. Andersen, Scott Anderson, Elizabeth Appelbaum, Harindra M. Arachchi, Gary Armitage, Cesar A. Arze, Tulin Ayvaz, Carl C. Baker, Lisa Begg, Tsegahiwot Belachew, Veena Bhonagiri, Monika Bihan, Martin J. Blaser, Toby Bloom, Vivien Bonazzi, J. Paul Brooks, Gregory A. Buck, Christian J. Buhay, Dana A. Busam, Joseph L. Campbell, Shane R. Canon, Brandi L. Cantarel, Patrick S. G. Chain, I-Min A. Chen, Lei Chen, Shaila Chhibba, Ken Chu, Dawn M. C