Sean Davis seandavi

Apache Airflow setup

base installations

sudo apt-get update
sudo apt-get install python3 python3-pip virtualenv
# sudo apt-get install emacs tmux 
seandavi / parallel_bulk_example.py
Created April 27, 2019 13:02
elasticsearch parallel bulk indexing example in python
"""parallel bulk indexing
Indexes a json file using parallel_bulk.
"""
import elasticsearch
from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk as pb
from collections import deque
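The preview above stops at the imports. As a rough sketch of how `parallel_bulk` and `deque` are commonly combined (not necessarily what the rest of this gist does), the following assumes the input is newline-delimited JSON; the function names `generate_actions` and `index_file`, the index name, and the thread count are illustrative assumptions.

```python
import json
from collections import deque
from typing import Iterable, Iterator


def generate_actions(lines: Iterable[str], index: str) -> Iterator[dict]:
    """Yield one bulk action per non-empty JSON line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield {"_index": index, "_source": json.loads(line)}


def index_file(client, path: str, index: str, threads: int = 4) -> None:
    """Stream a newline-delimited JSON file into Elasticsearch (assumed layout)."""
    # Imported lazily so generate_actions works without the client installed.
    from elasticsearch.helpers import parallel_bulk

    with open(path) as handle:
        # parallel_bulk is a lazy generator: nothing is sent until it is
        # consumed. A zero-length deque drains it without storing results.
        deque(
            parallel_bulk(client, generate_actions(handle, index), thread_count=threads),
            maxlen=0,
        )
```

Draining the generator with `deque(..., maxlen=0)` is the usual reason for importing `deque` alongside `parallel_bulk`; iterate over the results instead if per-document success flags are needed.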
seandavi / create_eks_cluster.sh
Created April 21, 2019 19:12
Small bash script to install eksctl and then create cluster
#!/bin/bash
# Start EKS cluster on AWS
# Install eksctl (https://eksctl.io/)
# On mac, homebrew
brew tap weaveworks/tap
brew install weaveworks/tap/eksctl
# start cluster (takes a few minutes)
seandavi / omicidx_intro.Rmd
Last active March 15, 2019 15:33
Quick introduction to using the OmicIDX from R
---
title: "Playing with OmicIDX"
author: "Sean Davis"
date: "3/14/2019"
output:
  BiocStyle::html_document:
    toc_float: True
---
# Introduction to the OmicIDX API
seandavi / omicidx-beta-graphql-intro.ipynb
Created March 12, 2019 00:47
Quick introduction to the OmicIDX GraphQL API
seandavi / Dockerfile
Created February 6, 2019 19:04
Dockerfile for blog post on using GCR. Builds SRA-toolkit with dbGaP access as an example
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y wget
# We do things this way to keep the docker image
# size down. See https://nickjanetakis.com/blog/docker-tip-3-chain-your-docker-run-instructions-to-shrink-your-images
RUN wget http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.9.2/sratoolkit.2.9.2-ubuntu64.tar.gz \
    && tar -xvzf sratoolkit.2.9.2-ubuntu64.tar.gz \
    && rm sratoolkit.2.9.2-ubuntu64.tar.gz
seandavi / read_and_process_files_beam.py
Created January 29, 2019 22:21
Read and process full files based on wildcard path using Apache Beam/Google Cloud Platform/DataFlow
from __future__ import print_function
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.filesystems import FileSystems
import urllib
import json
import argparse
import logging
logging.basicConfig(level=logging.INFO)
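The preview ends before the pipeline itself. One way to read whole files matched by a wildcard is sketched below, under stated assumptions: it targets Python 3 rather than the Python 2 style of the imports above, assumes each file holds newline-delimited JSON, and the names `parse_json_lines`, `read_whole_file`, and `run` are hypothetical, not taken from the gist.

```python
import json
from typing import Iterator, List, Optional


def parse_json_lines(text: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a file's full contents."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def read_whole_file(path: str) -> str:
    """Read an entire file through Beam's filesystem abstraction."""
    from apache_beam.io.filesystems import FileSystems

    handle = FileSystems.open(path)
    try:
        return handle.read().decode("utf-8")
    finally:
        handle.close()


def run(pattern: str, argv: Optional[List[str]] = None) -> None:
    """Expand a wildcard, then process each matched file as a unit."""
    import apache_beam as beam
    from apache_beam.io.filesystems import FileSystems
    from apache_beam.options.pipeline_options import PipelineOptions

    # The wildcard is expanded while the pipeline is being constructed.
    paths = [
        metadata.path
        for result in FileSystems.match([pattern])
        for metadata in result.metadata_list
    ]
    with beam.Pipeline(options=PipelineOptions(argv)) as pipeline:
        (
            pipeline
            | "Paths" >> beam.Create(paths)
            | "ReadFile" >> beam.Map(read_whole_file)
            | "Parse" >> beam.FlatMap(parse_json_lines)
            | "Log" >> beam.Map(print)
        )
```

Reading each file inside a `Map` keeps one file per element, which suits formats that cannot be split by line; for plain line-oriented input, `beam.io.ReadFromText` with the same pattern is simpler.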
seandavi / fusion_genes.Rmd
Created January 24, 2019 00:21
TARGET osteosarcoma fusion gene analysis sketch
---
title: "fusion genes"
output:
  html_document:
    self_contained: true
---
```{r include=FALSE}
library(knitr)
seandavi / dataflow_example_sra.py
Last active February 1, 2019 18:49
simple dataflow pipeline from sra json
# requires python 2.7
# pip install apache_beam
from __future__ import print_function
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
import json
import argparse
import logging
import urllib2
import urllib
seandavi / mapping_example.json
Created November 20, 2018 15:33
For luqum issue
{
  "sra_experiment_joined2": {
    "mappings": {
      "doc": {
        "properties": {
          "library_name": {
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"