Eric T. Dawson (edawson)

edawson / benchmark-encoding.md
Created July 25, 2022 15:42 — forked from shenwei356/benchmark-encoding.md
k-mer encoding and decoding

Functions:

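# encoding: ACTG, i.e. A=0, C=1, T=2, G=3 (upper- or lowercase)

def nuc2int_nochecking(b):
    # (ord(b) >> 1) & 3 maps A/a->0, C/c->1, T/t->2, G/g->3;
    # any other character also gets a code, hence "no checking"
    return (ord(b) >> 1) & 3, True

def nuc2int_if(b):
    # validate with an explicit chain of comparisons before encoding
    if b == 'a' or b == 'c' or b == 'g' or b == 't' \
            or b == 'A' or b == 'C' or b == 'G' or b == 'T':
        # assumed completion: same arithmetic as above, plus a failure flag
        return (ord(b) >> 1) & 3, True
    return 0, False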
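Per-base codes like these are usually combined into a whole k-mer by packing each base into 2 bits of an integer and shifting left as bases are appended. The sketch below assumes the ACTG order produced by `(ord(b) >> 1) & 3` (A=0, C=1, T=2, G=3); `kmer_encode`, `kmer_decode` and `base_code` are hypothetical names, not taken from the benchmarked code.

```python
"""Sketch: 2-bit packing of k-mers in ACTG code order (A=0, C=1, T=2, G=3)."""

DECODE = "ACTG"                 # 2-bit code -> base, matching (ord(b) >> 1) & 3
VALID = frozenset("ACGTacgt")   # accepted input characters


def base_code(b):
    """Return the 2-bit code for one base, raising on anything but ACGT/acgt."""
    if b not in VALID:
        raise ValueError(f"invalid base: {b!r}")
    return (ord(b) >> 1) & 3


def kmer_encode(kmer):
    """Pack a k-mer into an int, first base in the most significant bits."""
    code = 0
    for b in kmer:
        code = (code << 2) | base_code(b)
    return code


def kmer_decode(code, k):
    """Unpack an int produced by kmer_encode into an uppercase k-mer of length k."""
    bases = []
    for _ in range(k):
        bases.append(DECODE[code & 3])  # lowest 2 bits hold the last base
        code >>= 2
    return "".join(reversed(bases))


if __name__ == "__main__":
    c = kmer_encode("ACTG")      # 00 01 10 11 in binary
    print(c)                     # 27
    print(kmer_decode(c, 4))     # ACTG
```

`k` has to be passed to the decoder because leading A's encode as zero bits and cannot be recovered from the integer alone. In a fixed-width 64-bit integer this scheme holds at most 32 bases; Python integers are unbounded, so the sketch does not enforce that limit.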

#############
## Download the 30X GRCh37 (hg19)-aligned BAM from Google's public sequencing of HG002,
## along with its BAI index.
#############
wget https://storage.googleapis.com/brain-genomics-public/research/sequencing/grch37/bam/hiseqx/wgs_pcr_free/30x/HG002.hiseqx.pcr-free.30x.dedup.grch37.bam
wget https://storage.googleapis.com/brain-genomics-public/research/sequencing/grch37/bam/hiseqx/wgs_pcr_free/30x/HG002.hiseqx.pcr-free.30x.dedup.grch37.bam.bai
#!/bin/bash
########################
## In this gist, we'll reuse the commands from our Parabricks 3.6 tutorial to align reads and generate BAM files.
## Check out the full post at https://medium.com/@johnnyisraeli/accelerating-germline-and-somatic-genomic-analysis-of-whole-genomes-and-exomes-with-nvidia-clara-e3deeae2acc9
## and the Gists at:
## https://gist.github.com/edawson/e84b2785db75d3c0aea9cc6a59969d45#file-full_pipeline_and_data_prep_parabricks3-6-sh
## and
## https://gist.github.com/edawson/e84b2785db75d3c0aea9cc6a59969d45#file-step_1_align_reads_parabricks3-6-sh
###########
#!/bin/bash
## Download the HG002 30X FASTQs (R1 and R2), kindly sequenced and shared by Google
gsutil cp gs://brain-genomics-public/research/sequencing/fastq/hiseqx/wgs_pcr_free/30x/HG002.hiseqx.pcr-free.30x.R1.fastq.gz .
gsutil cp gs://brain-genomics-public/research/sequencing/fastq/hiseqx/wgs_pcr_free/30x/HG002.hiseqx.pcr-free.30x.R2.fastq.gz .
## Download GRCh38
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
gunzip GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
edawson / nextflow cheat sheet
Created November 26, 2020 20:19 — forked from elowy01/nextflow cheat sheet
nextflow cheat sheet
#Example 1:
#!/usr/bin/env nextflow
params.str = 'Hello world!'
process AFcalc {
    """
    echo '${params.str}'
    """
}
edawson / cudamap.cc
Created April 3, 2020 19:29 — forked from sjolsen/cudamap.cc
Combining memory-mapped I/O and CUDA mapped memory
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cerrno>
#include <cstring>
#include <memory>
#include <stdexcept>
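The memory-mapped I/O half of this idea can be illustrated independently of CUDA. The sketch below uses Python's standard-library `mmap` module to map a whole file read-only, so its bytes can be sliced without an explicit `read()`; the operating system pages data in on access. The GPU side (exposing such a mapping to device code) is not reproduced here, and `map_readonly` is a hypothetical helper name.

```python
import mmap
import os
import tempfile


def map_readonly(path):
    """Map an entire (non-empty) file read-only."""
    with open(path, "rb") as f:
        # length 0 maps the whole file; mmap keeps its own duplicate of the
        # file descriptor, so closing f when the with-block ends is safe
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(b"ACGTACGTNN")
    mm = map_readonly(path)
    print(mm[:4], len(mm))   # b'ACGT' 10
    mm.close()               # unmap before deleting the file
    os.remove(path)
```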
edawson / readBam.C
Created March 19, 2020 03:02 — forked from PoisonAlien/readBam.C
reading bam files in C using htslib
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <htslib/sam.h>
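int main(int argc, char *argv[]){
	samFile *fp_in = argc > 1 ? hts_open(argv[1],"r") : NULL; //open bam file
	if (fp_in == NULL) { fprintf(stderr, "usage: %s <in.bam>\n", argv[0]); return 1; }
	bam_hdr_t *bamHdr = sam_hdr_read(fp_in); //read header
	bam1_t *aln = bam_init1(); //initialize an alignment
	while (sam_read1(fp_in, bamHdr, aln) >= 0) //returns -1 at EOF, < -1 on error
		printf("%s\t%lld\n", bam_get_qname(aln), (long long)aln->core.pos + 1); //example: name, 1-based pos
	bam_destroy1(aln); //free the record, header and file handle
	bam_hdr_destroy(bamHdr);
	sam_close(fp_in);
	return 0;
}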
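Inside such a read loop, bases are typically pulled out of each record with htslib's `bam_get_seq` and `bam_seqi`. The SAM/BAM specification stores each base in 4 bits, two bases per byte, with the first base in the high nibble, as an index into the alphabet `=ACMGRSVTWYHKDBN` (htslib's `seq_nt16_str`). The Python sketch below decodes such a packed buffer to make that layout concrete; `decode_bam_seq` is a hypothetical helper, not part of htslib.

```python
NT16 = "=ACMGRSVTWYHKDBN"  # 4-bit code -> IUPAC symbol, per the SAM/BAM spec


def decode_bam_seq(packed, l_seq):
    """Decode l_seq bases from BAM's 4-bit packed sequence bytes."""
    out = []
    for i in range(l_seq):
        byte = packed[i >> 1]
        # even positions live in the high nibble, odd positions in the low nibble
        code = (byte >> 4) if i % 2 == 0 else (byte & 0x0F)
        out.append(NT16[code])
    return "".join(out)


if __name__ == "__main__":
    # A=1, C=2, G=4, T=8, N=15 -> bytes 0x12, 0x48, 0xF0 (final low nibble is padding)
    print(decode_bam_seq(bytes([0x12, 0x48, 0xF0]), 5))  # ACGTN
```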
edawson / Genomics_A_Programmers_Guide.md
Created May 17, 2019 14:19 — forked from andy-thomason/Genomics_A_Programmers_Guide.md
Genomics: a programmer's introduction

Genomics - A programmer's guide.

Andy Thomason is a Senior Programmer at Genomics PLC. He has been writing graphics systems, games and compilers since the '70s and specialises in code performance.

https://www.genomicsplc.com

Bedtools Cheatsheet

General:

Tool      Description
flank     Create new intervals from the flanks of existing intervals.
slop      Adjust the size of intervals.
shift     Adjust the position of intervals.
subtract  Remove intervals based on overlaps between two files.
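The interval arithmetic behind `slop` and `flank` is simple enough to sketch. The Python below is a simplified, unstranded model on BED-style half-open intervals [start, end), clipped to 0 and to the chromosome length, roughly as bedtools clips against its genome file. It ignores options such as `-l`/`-r`, `-s` and `-pct`; the function names are hypothetical, and dropping zero-length flanks is a choice made for this sketch.

```python
def slop(start, end, b, chrom_len):
    """Widen [start, end) by b bases on each side, clipped to [0, chrom_len]."""
    return max(0, start - b), min(chrom_len, end + b)


def flank(start, end, b, chrom_len):
    """Left and right b-base flanks of [start, end), clipped; empty flanks are dropped."""
    left = (max(0, start - b), start)
    right = (end, min(chrom_len, end + b))
    return [iv for iv in (left, right) if iv[1] > iv[0]]


if __name__ == "__main__":
    print(slop(100, 200, 50, 1000))   # (50, 250)
    print(flank(100, 200, 50, 1000))  # [(50, 100), (200, 250)]
    print(slop(10, 990, 50, 1000))    # (0, 1000)
```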
edawson / wdl_idioms.wdl
Last active March 3, 2019 17:16
An example WDL file which documents some idioms of the language
## Tasks are upper camel-cased
task CheckSex{
File sampleBAM
File sampleIndex
## Optional parameters receive a '?' after the type
Int? diskGB
## select_first can be used to set default values; WDL has no reassignment,
## so the defaulted value goes into a new declaration
Int diskSize = select_first([diskGB, 100])
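`select_first` takes an array of optional values and returns the first one that is defined; the WDL specification treats an array with no defined value as an error. A minimal Python model, with `None` standing in for an unset WDL optional, shows why `select_first([diskGB, 100])` acts as a default. This is an illustration of the semantics, not how any WDL engine implements it.

```python
def select_first(values):
    """Model of WDL select_first: first non-None element, error if all are None."""
    for v in values:
        if v is not None:
            return v
    raise ValueError("select_first: no defined value in array")


if __name__ == "__main__":
    disk_gb = None                        # Int? diskGB left unset by the caller
    print(select_first([disk_gb, 100]))   # 100
    print(select_first([250, 100]))       # 250
```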