Clint Valentine clintval
from collections import defaultdict
import matplotlib.pyplot as plt
import numpy as np
import vcf
from palettable.colorbrewer.qualitative import Paired_12
from scipy.stats import gaussian_kde
from vcf_figures.helpers import *
@clintval
clintval / README.md
Last active December 10, 2019 21:03
A terrible Exome Pipeline in Snakemake

Keybase proof

I hereby claim:

  • I am clintval on github.
  • I am clintval (https://keybase.io/clintval) on keybase.
  • I have a public key ASA1U0QRFP5NAtESP-6ztuMfcyXE23LNCGe9r7Vlb71nRgo

To claim this, I am signing this object:

@clintval
clintval / r-bioinformatics-specialties.csv
Last active December 23, 2019 02:52
Skillsets from the /r/bioinformatics Slack Group
Name,Slack Call Sign,Location,Interests,Skills (General),Preferred Language,Programming Languages,Education Background,Github Username,Preferred group within collab
Mark van der Sman,mvdsman,"Leiden, The Netherlands","Visualisation, genomics, pattern recognition/ML, statistical analysis, phylogenetics/evolution (and many more)","Visualisation, genomics, web development, some transcriptomics and some Machine Learning/Natural Computing. Biologist going for MSc BI",R,"Most experienced in R/RShiny, decent in Python and HTML/CSS/Markdown, basics in C++, PHP, SQL. Flaming hatred towards Matlab","B.Sc. Biology + minor Data Science, going for M.Sc. Bioinformatics",MvdSman,Visualisation
Anthony Fejes,apfejes,"Bay Area, California","Visualization, transcriptomics, epigenomics, genomics","Biologist, biochemist, programmer, entrepreneur",Python,"Python, C, SQL/No-SQL, Java, etc. (Anything except R, Javascript)","B.Sc. Biochemistry
B.I.S.
M.Sc. Microbiology and Immunology
PhD. Bioinformatics",apfejes,
Dimitrios - Georgios

INFO Fields

| Key | Definition |
| --- | --- |
| SAMPLE | Sample name |
| TYPE | Variant type: SNV, Insertion, Deletion, or Complex |
| DP | Total depth |
| End | Chromosome end position |
| VD | Variant depth |
| AF | Allele frequency |
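
To see how these keys fit together, here is a minimal sketch in Python that splits an INFO column into a dictionary and recomputes the allele frequency. It assumes the usual VCF layout of `KEY=VALUE` pairs separated by `;`, and it assumes that AF is the variant depth divided by the total depth, which the table does not state outright. The helper name `parse_info` and the record values are hypothetical.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column of `KEY=VALUE` pairs separated by `;` into a dict."""
    fields = {}
    for item in info.split(";"):
        # Flag-style keys without "=" end up with an empty string as value.
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


# Hypothetical record: the values are made up for illustration only.
record = parse_info("SAMPLE=sample-1;TYPE=SNV;DP=200;VD=50;AF=0.25")

depth = int(record["DP"])          # total depth
variant_depth = int(record["VD"])  # reads supporting the variant

# Assumption: allele frequency is variant depth over total depth.
expected_af = variant_depth / depth
print(record["TYPE"], expected_af)
```

Comparing `expected_af` against the reported `AF` value is a cheap sanity check when reading records by hand.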
@clintval
clintval / genbank-accession-cheatsheet.md
Created December 23, 2019 05:14
GenBank Accession Number Reference Sheet

GenBank Accession Number Reference Sheet:

The International Nucleotide Sequence Database Collaboration (INSDC) consists of the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL) and GenBank at NCBI. As part of the Collaboration, all three organizations accept new sequence submissions and share sequence data among the three databases. To facilitate the exchange of data, each member of the collaboration is assigned certain accession prefixes. In addition to the accession number, GenBank records also have a GI number. The GI number is simply a series of digits assigned consecutively to sequences submitted to NCBI.

Format of GenBank accession numbers:

# Requires the STAR executable to be at:
# /pipeline/packages/star
#
# Overhang is set to the read length (template cycles) of 142 - 1:
#
OVERHANG=141
git clone \
https://github.com/dpryan79/ChromosomeMappings.git \
@clintval
clintval / validate-remote-s3-paths.py
Created December 26, 2019 16:55
Validate the S3 URIs in some delimited sample sheets actually exist
# After a `pip install sample-sheet pendant`
from sample_sheet import SampleSheet
from pendant.aws.s3 import S3Uri
def s3_validate_sample_sheet(path):
    for sample in SampleSheet(path):
        left = S3Uri(sample.PathToFastq1)
        right = S3Uri(sample.PathToFastq2)
        assert left.object_exists()
object SampleUtil {
  /** Join all of the data across a collection of samples. All fields will be joined on the delimiter `";"`. Regardless
    * of the lanes the libraries were sequenced on, the resulting sample will have the lanes field cleared to [[None]].
    * The merged sample will have its ordinal set to zero.
    *
    * @throws IllegalArgumentException when there are no libraries to merge
    * @throws IllegalArgumentException when trying to join samples with different sample names
    */
  def merge(samples: Seq[Sample]): Sample = {
@clintval
clintval / aws-cli-time-travel.md
Created October 16, 2020 15:52
AWS CLI ls time travel

Fooled by the timezone you live in?

EDT - implicit

❯ aws s3 ls s3://example-ngs-data/30-415555663/ | head -n1
2020-10-14 17:57:17 24784494053 sample-1_S21_L003_R1_001.fastq.gz
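
The listing above carries no timezone marker, so the timestamp only means something once you know which zone the shell was in. The following is a minimal sketch in Python of how the same instant reads in other zones. It assumes the listing was rendered in EDT (UTC-4), as the heading says, and it uses fixed offsets rather than a timezone database. The helper name `reinterpret` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets valid for mid-October 2020, when US daylight time was in effect.
EDT = timezone(timedelta(hours=-4), "EDT")
PDT = timezone(timedelta(hours=-7), "PDT")

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def reinterpret(stamp: str, source: timezone, target: timezone) -> str:
    """Read a zone-less listing timestamp as `source` local time and render it in `target`."""
    local = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=source)
    return local.astimezone(target).strftime(STAMP_FORMAT)


# Timestamp copied from the listing above, assumed to be EDT.
as_pdt = reinterpret("2020-10-14 17:57:17", EDT, PDT)
as_utc = reinterpret("2020-10-14 17:57:17", EDT, timezone.utc)
print(as_pdt)
print(as_utc)
```

Under that assumption the object was written at 14:57:17 Pacific time, or 21:57:17 UTC.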

PDT - explicit