Clint Valentine clintval
@clintval
clintval / dxapp.json
Created January 30, 2021 16:15
DNAnexus App for checking md5
{
  "name": "md5check",
  "title": "MD5 checksum",
  "summary": "Generate an MD5 checksum of one or more files and save the output in a text file. Hard timeout policy of 10 hours. Use batch input processing or an instance with large storage if you have a large amount of data to process. Questions go to [email protected]",
  "dxapi": "1.0.0",
  "version": "0.0.1",
  "inputSpec": [
    {
      "name": "i_put",
      "label": "Input File(s)",
@clintval
clintval / amplicon_record.rs
Created January 26, 2021 05:50
An amplicon record for VarDictJava's amplicon-mode
/// A record of output from VarDict/VarDictJava run in amplicon-aware mode.
#[derive(Debug, Deserialize)]
struct AmpliconVariant<'a> {
    pub sample: &'a str,
    pub interval_name: &'a str,
    pub contig: &'a str,
    pub start: u64,
    pub end: u64,
    pub ref_allele: &'a str,
@clintval
clintval / kindle.rb
Created January 17, 2021 18:41 — forked from tobi/kindle.rb
Download your Kindle Highlights to local markdown files. Great for Obsidian.md.
#!/usr/bin/env ruby
# gem install activesupport
require 'active_support/inflector'
require 'active_support/core_ext/string'
# gem install webrick (only ruby3)
require 'webrick'
# gem install mechanize
@clintval
clintval / how-to-download-wistia-hosted.md
Last active January 18, 2021 15:26
Download Wistia Video
  1. Go to the video’s URL
  2. Right-Click on the video and select “Copy link and thumbnail”
  3. Open a notepad and paste the link you just copied.
  4. Search for a link that contains ?wvideo=<id>, where the appended <id> is the video identifier code.
  5. Copy the video identifier code from the notepad and append it to this link: http://fast.wistia.net/embed/iframe/<id>
  6. Open up a new browser tab using the newly created link, and the video will load full-screen.
  7. Use the browser's developer tools to view the page source of the new video in another tab.
  8. Search for the first link/URL that ends in “.bin”
  9. Copy and paste that link into a new tab, and then replace the “.bin” with “.mp4”, and hit enter.
  10. The video should now download.
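
The URL handling in steps 4, 5, and 9 can be sketched in Python. This is only an illustration of the string manipulation, not a downloader: the function names and the example URLs are hypothetical, and it assumes the copied link carries the identifier in a `wvideo` query parameter as described above.

```python
from urllib.parse import parse_qs, urlparse

# Base of the embed link given in step 5.
EMBED_BASE = "http://fast.wistia.net/embed/iframe/"


def extract_video_id(copied_link: str) -> str:
    """Return the value of the `wvideo` query parameter (step 4)."""
    query = parse_qs(urlparse(copied_link).query)
    ids = query.get("wvideo")
    if not ids:
        raise ValueError("no wvideo parameter found in link")
    return ids[0]


def embed_url(video_id: str) -> str:
    """Append the identifier to the embed link (step 5)."""
    return EMBED_BASE + video_id


def bin_to_mp4(asset_url: str) -> str:
    """Swap a trailing `.bin` for `.mp4` (step 9)."""
    if not asset_url.endswith(".bin"):
        raise ValueError("expected a URL ending in .bin")
    return asset_url[: -len(".bin")] + ".mp4"


if __name__ == "__main__":
    # Hypothetical link, for illustration only.
    video_id = extract_video_id("https://example.com/page?wvideo=abc123")
    print(embed_url(video_id))
    print(bin_to_mp4("https://example.com/deliveries/xyz.bin"))
```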
@clintval
clintval / Counter.scala
Created November 25, 2020 01:43
Use a Bloom filter to count all the unique elements in an iterator, approximately
package com.twinstrandbio.math
import breeze.util.BloomFilter
import com.fulcrumgenomics.commons.util.SimpleCounter
import scala.reflect.runtime.{universe => ru}
/** Methods for counting. */
object Counter {
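
The idea in the description (approximate distinct counting with a Bloom filter) can be sketched in Python. This is one plausible reading of the technique and not the gist's Scala implementation: the `BloomFilter` class, its default sizes, and the `approx_distinct` helper are all assumptions made for illustration. An element is counted only when the filter has definitely not seen it, so false positives can cause an undercount but never an overcount.

```python
import hashlib
from typing import Hashable, Iterable, Iterator


class BloomFilter:
    """A small Bloom filter backed by a bytearray (illustrative sizes)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: Hashable) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        data = repr(item).encode("utf-8")
        for seed in range(self.num_hashes):
            salt = seed.to_bytes(8, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add_if_absent(self, item: Hashable) -> bool:
        """Insert `item`; return True if it was definitely not present."""
        was_absent = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                was_absent = True
                self.bits[byte] |= 1 << bit
        return was_absent


def approx_distinct(items: Iterable[Hashable],
                    num_bits: int = 1 << 16,
                    num_hashes: int = 4) -> int:
    """Count elements the filter had not seen before (a lower bound)."""
    bloom = BloomFilter(num_bits, num_hashes)
    return sum(1 for item in items if bloom.add_if_absent(item))


if __name__ == "__main__":
    print(approx_distinct(list(range(1000)) * 2))
```

The memory cost is fixed by `num_bits` regardless of how many elements pass through, which is the usual reason to prefer this over an exact set when the iterator is very large.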
@clintval
clintval / aws-cli-time-travel.md
Created October 16, 2020 15:52
AWS CLI ls time travel

Fooled by the timezone you live in?

EDT - implicit

❯ aws s3 ls s3://example-ngs-data/30-415555663/ | head -n1
2020-10-14 17:57:17 24784494053 sample-1_S21_L003_R1_001.fastq.gz
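
To make the time-zone effect concrete, here is a small Python sketch that re-expresses the timestamp from the listing above in another zone. It assumes the listing was printed in US Eastern Daylight Time, as the "EDT" label suggests, and it uses fixed UTC offsets (-4 h for EDT, -7 h for PDT) rather than a time-zone database; the `reinterpret` helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets valid for mid-October 2020 (daylight saving in effect).
EDT = timezone(timedelta(hours=-4), "EDT")
PDT = timezone(timedelta(hours=-7), "PDT")

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def reinterpret(stamp: str, shown_in: timezone, wanted: timezone) -> str:
    """Read `stamp` as local time in `shown_in` and render it in `wanted`."""
    naive = datetime.strptime(stamp, STAMP_FORMAT)
    aware = naive.replace(tzinfo=shown_in)
    return aware.astimezone(wanted).strftime(STAMP_FORMAT + " %Z")


if __name__ == "__main__":
    print(reinterpret("2020-10-14 17:57:17", EDT, PDT))
    # → 2020-10-14 14:57:17 PDT
    print(reinterpret("2020-10-14 17:57:17", EDT, timezone.utc))
    # → 2020-10-14 21:57:17 UTC
```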

PDT - explicit

object SampleUtil {
  /** Join all of the data across a collection of samples. All fields will be joined on the delimiter `";"`. Regardless
    * of the lanes the libraries were sequenced on, the resulting sample will have the lanes field cleared to [[None]].
    * The merged sample will have its ordinal set to zero.
    *
    * @throws IllegalArgumentException when there are no libraries to merge
    * @throws IllegalArgumentException when trying to join samples with different sample names
    */
  def merge(samples: Seq[Sample]): Sample = {
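
The contract in the doc comment above can be sketched in Python. This is not the Scala implementation: the `Sample` fields shown here (`name`, `library_id`, `description`, `lanes`, `ordinal`) are hypothetical stand-ins, and `ValueError` plays the role of `IllegalArgumentException`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; the real class likely has more fields."""
    name: str
    library_id: str
    description: str
    lanes: Optional[Tuple[int, ...]] = None
    ordinal: int = 0


def merge(samples: Sequence[Sample], delimiter: str = ";") -> Sample:
    """Join samples that share a name into a single sample."""
    if not samples:
        raise ValueError("there are no samples to merge")
    names = {sample.name for sample in samples}
    if len(names) != 1:
        raise ValueError(f"cannot merge different sample names: {sorted(names)}")
    return Sample(
        name=samples[0].name,
        # Free-text fields are joined on the delimiter, in input order.
        library_id=delimiter.join(s.library_id for s in samples),
        description=delimiter.join(s.description for s in samples),
        lanes=None,  # cleared regardless of the input lanes
        ordinal=0,   # merged samples always get ordinal zero
    )


if __name__ == "__main__":
    merged = merge([
        Sample("s1", "libA", "first", (1,), 1),
        Sample("s1", "libB", "second", (2,), 2),
    ])
    print(merged)
```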
@clintval
clintval / validate-remote-s3-paths.py
Created December 26, 2019 16:55
Validate the S3 URIs in some delimited sample sheets actually exist
# After a `pip install sample-sheet pendant`
from sample_sheet import SampleSheet
from pendant.aws.s3 import S3Uri
def s3_validate_sample_sheet(path):
    for sample in SampleSheet(path):
        left = S3Uri(sample.PathToFastq1)
        right = S3Uri(sample.PathToFastq2)
        assert left.object_exists()
# Requires the STAR executable to be at:
# /pipeline/packages/star
#
# Overhang is set to the read length (template cycles) of 142 - 1:
#
OVERHANG=141
git clone \
https://github.com/dpryan79/ChromosomeMappings.git \
@clintval
clintval / genbank-accession-cheatsheet.md
Created December 23, 2019 05:14
GenBank Accession Number Reference Sheet

GenBank Accession Number Reference Sheet:

The International Nucleotide Sequence Database Collaboration (INSDC) consists of the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL) and GenBank at NCBI. As part of the Collaboration, all three organizations accept new sequence submissions and share sequence data among the three databases. To facilitate the exchange of data, each member of the collaboration is assigned certain accession prefixes. In addition to the accession number, GenBank records also have a GI number. The GI number is simply a series of digits assigned consecutively to sequences submitted to NCBI.
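
To illustrate what an accession prefix is, here is a minimal Python sketch that splits an accession into its leading letters, its digits, and an optional `.version` suffix. The pattern is deliberately loose: it does not validate how many letters or digits a given record type requires, it does not say which database a prefix belongs to, and it does not handle RefSeq-style identifiers that contain an underscore. The `parse_accession` helper and the example identifiers are assumptions chosen for illustration.

```python
import re
from typing import NamedTuple, Optional

# Letters, then digits, then an optional ".<version>" suffix.
ACCESSION_PATTERN = re.compile(r"^([A-Z]+)(\d+)(?:\.(\d+))?$")


class Accession(NamedTuple):
    prefix: str
    digits: str
    version: Optional[int]


def parse_accession(text: str) -> Accession:
    """Split an accession string into prefix, digits, and version."""
    match = ACCESSION_PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a letters-then-digits accession: {text!r}")
    prefix, digits, version = match.groups()
    return Accession(prefix, digits, int(version) if version else None)


if __name__ == "__main__":
    print(parse_accession("U12345.1"))
    print(parse_accession("AF123456"))
```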

Format of GenBank accession numbers: