Chase Clark chasemc

library(ggplot2)
library(data.table)
library(geofacet)
library(magrittr)
raw_data <- data.table::fread("http://covidtracking.com/api/states/daily.csv")
raw_data$date <- as.Date(as.character(raw_data$date), "%Y%m%d")
raw_data <- raw_data[date > "2020-03-15", ]
chasemc / update_idbac_db.R
Created September 4, 2020 13:51
Update to the new IDBac db schema (currently needs the "prioritizer" git branch of IDBac)
old_files_path <- "/home/user/Downloads/db/old"
a <- list.files(old_files_path, full.names = FALSE)
a <- tools::file_path_sans_ext(a)
for (i in a) {
  pool <- IDBacApp::idbac_connect(fileName = i,
                                  filePath = old_files_path)[[1]]
  IDBacApp::idbac_update_db(pool = pool,
                            copy_overwrite = "copy")
chasemc / rich_print_table.py
Created December 3, 2020 23:35
Print delimited files to the terminal with https://github.com/willmcgugan/rich (dependencies: pandas, rich)
import os
import pathlib
from rich.console import Console
from rich.table import Table
from rich.table import Column
import pandas as pd
def make_rich(df, title="mytitle"):
    table = Table(title=title)
chasemc / rmarkdown.md
Last active January 18, 2021 21:04
Count how many lines/rows contain X, from many files

Counting Lines

The Problem

I have a lot of files, in a lot of nested directories: 347430 directories, 379286 files

To anonymize what I’m doing we’ll say I have two types of files: apples.csv.gz and oranges.csv.gz (if you don’t know- “gz” means the
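
The counting task described above (how many lines contain X, across many gzipped files in nested directories) could be sketched in Python roughly as follows. This is only an illustration and not the approach the write-up itself takes; the function name `count_matching_lines` and its parameters are hypothetical.

```python
import gzip
from pathlib import Path


def count_matching_lines(root, filename, needle):
    """Count lines containing `needle` in every file named `filename` under `root`.

    Hypothetical helper: walks all nested directories, opens each gzip file
    in text mode, and streams it line by line so large files are never
    loaded into memory at once.
    """
    total = 0
    for path in Path(root).rglob(filename):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            total += sum(1 for line in handle if needle in line)
    return total
```

Calling `count_matching_lines("some/root", "apples.csv.gz", "X")` would return one total over every `apples.csv.gz` found below `some/root`. With hundreds of thousands of files, a single-process walk like this may be slow, so treat it as a baseline for checking results rather than a tuned solution.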

chasemc / _run.sh
Created January 24, 2021 19:14 — forked from jexp/_run.sh
Rendering large graphs with vivagraph.js, neo4j-javascript-driver (binary-bolt), meetup dataset and compiled runtime. Oh the joy :)
npm install neo4j-driver
node test-neo-driver.js
chasemc / antismash_get_cds.sh
Created January 28, 2021 14:21
Get CDS IDs from all antismash genbank files in all subdirectories
find . -name "*region*gbk" | xargs grep "CDS " -A3 | grep "/ID=" | cut -d'"' -f2
https://github.com/Micromeda/InterProScan-Docker/blob/master/LICENSE
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
chasemc / ga_and_len.sh
Created February 21, 2021 18:56
Extract genomic_accessions and lengths from "ftp.ncbi.nlm.nih.gov/genomes............._assembly_report.txt"
#!/usr/bin/bash
curl -s "$1" |\
sed -ne '/# Sequence-Name\tSequence-Role\tAssigned-Molecule\tAssigned-Molecule-Location\/Type\tGenBank-Accn\tRelationship\tRefSeq-Accn\tAssembly-Unit\tSequence-Length\tUCSC-style-name/,$ p' |\
awk -F"\t" 'NR==1 {for (i=1; i<=NF; i++) {f[$i] = i}}{ print $(f["RefSeq-Accn"]), $(f["Sequence-Length"])}' |\
sed 1d
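
For comparison, the same column extraction can be sketched in Python on report text that has already been downloaded. This is a minimal sketch, not part of the gist: the function name `accessions_and_lengths` is hypothetical, and it locates the header by its `# Sequence-Name` prefix rather than matching the full header line as the `sed` command does.

```python
def accessions_and_lengths(report_text, accession_column="RefSeq-Accn"):
    """Return (accession, length) pairs from the text of an assembly report.

    Hypothetical helper: assumes a tab-separated table whose header is the
    comment line starting with "# Sequence-Name". Values are returned as
    strings, as the awk command prints them.
    """
    lines = report_text.splitlines()
    # Locate the header row; everything after it is data.
    start = next(i for i, line in enumerate(lines)
                 if line.startswith("# Sequence-Name"))
    header = lines[start][2:].split("\t")  # drop the leading "# "
    acc_idx = header.index(accession_column)
    len_idx = header.index("Sequence-Length")
    pairs = []
    for line in lines[start + 1:]:
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        pairs.append((fields[acc_idx], fields[len_idx]))
    return pairs
```

Looking columns up by name, as the `awk` step also does, keeps the extraction working if the column order changes, provided the names stay the same.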
#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
message("Installing necessary libraries if not already installed")
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# mzR is distributed through Bioconductor, not CRAN, so install it with BiocManager
if (!requireNamespace("mzR", quietly = TRUE))
  BiocManager::install("mzR")
if (!requireNamespace("data.table", quietly = TRUE))
  install.packages("data.table")
chasemc / md5_as_filename.sh
Created April 15, 2021 12:20
Rename files by md5 and chosen extension
#!/usr/bin/bash
# $1 is the name (path) of the file or files to find and hash
# $2 is the extension given to each renamed file
# -type f skips directories, which md5sum cannot hash
find $1 -type f -print0 | xargs -0 md5sum |
while read -r newname oldname; do
  mv -v "$oldname" "${newname}.$2"
done
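
The same idea can be sketched for a single file in Python. This is an illustrative variant, not a port of the script: the function name `rename_to_md5` is hypothetical, and it keeps the renamed file in its original directory, whereas the shell version moves it to a path relative to the current working directory.

```python
import hashlib
from pathlib import Path


def rename_to_md5(path, extension):
    """Rename `path` to "<md5 hex digest>.<extension>" in the same directory.

    Hypothetical helper: the file is hashed in 1 MiB chunks so large files
    are not read into memory all at once. Returns the new path.
    """
    path = Path(path)
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    target = path.with_name(f"{digest.hexdigest()}.{extension}")
    path.rename(target)
    return target
```

Files with identical content hash to the same name, so on POSIX systems a later rename silently replaces an earlier one. That may be the desired deduplication or an unwanted loss, depending on the use case.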