disulfidebond disulfidebond

  • UWisconsin-Madison
  • Madison, WI
@disulfidebond
disulfidebond / sort_org_sra_data.md
Created July 24, 2019 17:47
Sort and Organize SRA Data

Overview

The Baylor-09, Baylor-10, and Baylor-11 samples have been uploaded to SRA, but need to be organized. The following workflow was created for this task.

Methods and Code

First, download the metadata file with SRA accessions from https://submit.ncbi.nlm.nih.gov

Then, run the following bash command to parse out the identifiers from the files. As long as you use a unique identifier, such as sample_name, and only search through the downloaded files, the following command will work:

@disulfidebond
disulfidebond / sra_upload_walkthrough.md
Last active July 17, 2019 15:31
SRA Upload Walkthrough

Overview

This writeup will provide instruction and a template to follow when creating submissions to the SRA archive.

Description

  1. You must have an NCBI account before beginning. If you have an existing account from another institution, you may use it, but it may be better to register a new account that links you to your current institution.

  2. Log in to NCBI. From the home page, clicking the submission wizard link shows all of the possible submission types. Selecting Sequence Read Archive takes you to a help page.

@disulfidebond
disulfidebond / bash_black_magic_trickery_2.md
Created May 29, 2019 18:41
Tips and Tricks using Bash to solve Informatics problems

Overview

This is the first of a series of write-ups that demonstrate using Bash tricks to solve an Informatics Problem.
The format is as follows: the Overview section briefly describes the Problem, along with its pitfalls and difficulties. The Method section describes any relevant background and CS theory; note that it may be blank. The Solution section describes how to solve the described problem.

This problem involves one approach to fix a malformed fasta file. Briefly, a fasta file must have the format:

    >some_header Spaces allowed
    AATTCCGGAACCGGAACCAA
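
Before repairing a file, it can help to confirm that it really deviates from this format. The following is a minimal sketch of such a check, not the fix described in this write-up. The function name `check_fasta` and the rules it enforces (a header before any sequence, at least one sequence line per header, only letters, `*`, or `-` in sequence lines) are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# check_fasta FILE (hypothetical helper)
# Exit status 0 if FILE looks like well-formed FASTA, 1 otherwise.
check_fasta() {
  awk '
    /^[[:space:]]*$/ { next }                    # ignore blank lines
    /^>/ {
      if (seen_header && !seen_seq) bad = 1      # previous header had no sequence
      if (length($0) < 2) bad = 1                # ">" with no header text
      seen_header = 1; seen_seq = 0
      next
    }
    {
      if (!seen_header) bad = 1                  # sequence before any header
      if ($0 !~ /^[A-Za-z*-]+$/) bad = 1         # unexpected character in sequence
      seen_seq = 1
    }
    END {
      if (!seen_header || !seen_seq) bad = 1     # empty file or trailing header
      exit (bad ? 1 : 0)
    }
  ' "$1"
}

# Example usage:
#   check_fasta sample.fasta && echo "looks OK" || echo "malformed"
```

Note that a file with Windows line endings fails this check, because the trailing carriage return is not an allowed sequence character.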

@disulfidebond
disulfidebond / ont_guppy_setup.md
Last active March 27, 2023 10:04
ONT Guppy setup

Overview

This markdown file contains the steps involved in configuring a new computer, running Ubuntu 16.04, to run ONT Guppy GPU basecalling.

Prerequisites

  • CUDA must be installed, which can be simple or extremely difficult, depending on whether the CUDA gods smile on you.
  • The computer must be running Ubuntu 16.04 'xenial', with all updates installed.

Steps

@disulfidebond
disulfidebond / prob_roche_peptide.md
Last active May 24, 2019 17:57
Probability for QA of Roche Peptide array

Description

Given a Peptide Array with region R of l columns and w rows, what is the probability that a randomly selected l * w region is indicative of contamination?

  • Possible contamination is defined as x positive luminescent data values within a total possible number of values t.
  • Contamination is defined as N successive positive Bernoulli trials of possible contamination regions. For example, if 9 out of a possible 10 tested regions are positive, then this is considered contamination.

General Solution

  • N = number of tested regions, where each tested region is defined as a Bernoulli trial
  • r = success, where success is defined as above a contamination threshold
  • p = probability of contamination; for example, 4 positive luminescent values within a total possible 10 values would be 4/10 = 0.4
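
One way to read these definitions, assuming the N tested regions are independent and share the same p, and taking r as the number of positive regions, is as a binomial model. This is a sketch under those assumptions rather than the solution given in the original write-up:

```latex
% Probability of exactly r positive regions out of N (binomial model, assumed)
P(X = r) = \binom{N}{r}\, p^{r} (1 - p)^{N - r}

% Probability of at least r positive regions
P(X \ge r) = \sum_{k=r}^{N} \binom{N}{k}\, p^{k} (1 - p)^{N - k}

% Worked example with N = 10, r = 9, p = 0.4
P(X = 9)   = \binom{10}{9} (0.4)^{9} (0.6)^{1} = 10 \times 0.000262144 \times 0.6 \approx 0.00157
P(X \ge 9) = P(X = 9) + (0.4)^{10} \approx 0.00157 + 0.00010 \approx 0.00168
```

Under this reading, seeing 9 of 10 regions positive by chance alone when p = 0.4 would be rare, at roughly 0.17%.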

@disulfidebond
disulfidebond / CUDA_install_ubuntu16.md
Last active January 7, 2021 11:15
CUDA install on Ubuntu 16.04

Overview

This gist provides the steps to take when setting up CUDA on Ubuntu 16.04, along with comments and advice. Before starting, it is critical both to read through this guide and to research which NVIDIA drivers are required for your GPU. Do not rely on the installer to do this for you. In the event of a conflict between the driver that you looked up and the driver that is suggested by the installer, always go with what you looked up.

Type of installation

There are two main ways to install CUDA on Ubuntu.

Easy Way

  • Install using package manager.
  • The NVIDIA instructions have detailed information for the curious. On the download page, this is the 'deb(network)' option.
  • To install using the package manager:
@disulfidebond
disulfidebond / bash_black_magic_trickery_1.md
Last active May 9, 2019 15:59
Tips and Tricks using Bash to solve Informatics problems

Overview

This is the first of a series of write-ups that demonstrate using Bash tricks to solve an Informatics Problem.
The format is as follows: the Overview section briefly describes the Problem, along with its pitfalls and difficulties. The Method section describes any relevant background and CS theory; note that it may be blank. The Solution section describes how to solve the described problem.

A BAM file is suspected of being corrupted, due to EOF errors in an analysis. The file was re-downloaded, but the same EOF/truncated error appeared at the end of the analysis and caused the command to fail. Two steps will be taken to validate the BAM file: re-verifying the checksum, and scanning the file in depth.
The former can be done quickly, while the latter will be time- and compute-intensive. Both will be described here.
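
The two validation steps can be sketched as shell functions. This is a minimal sketch, not the commands from the original write-up. It assumes the published checksum is an MD5 digest, that GNU `md5sum` is available, and that `samtools` is installed. The function names `verify_md5` and `scan_bam` are hypothetical.

```shell
#!/usr/bin/env bash
# verify_md5 FILE EXPECTED (hypothetical helper)
# Quick step: succeed only if FILE's MD5 digest equals EXPECTED.
verify_md5() {
  local actual
  actual=$(md5sum < "$1" | cut -d' ' -f1)
  [ "$actual" = "$2" ]
}

# scan_bam FILE (hypothetical helper)
# Slow step: quickcheck inspects the header and the EOF marker,
# then "view -c" decompresses and counts every record in the file.
scan_bam() {
  samtools quickcheck -v "$1" && samtools view -c "$1" > /dev/null
}

# Example usage (file name and digest are placeholders):
#   verify_md5 sample.bam "<expected md5>" || echo "checksum mismatch"
#   scan_bam sample.bam || echo "BAM appears truncated or corrupted"
```

A matching checksum only shows that the download reproduces the source file. If the source itself was truncated before upload, only the full scan will reveal it.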

@disulfidebond
disulfidebond / stats_commentary_outliers.md
Created April 5, 2019 20:31
Commentary on removing outliers

Overview

Outliers are a common problem in statistical analysis. It is critical to ensure that bias is not accidentally introduced when any dataset is filtered for datapoints that are outside of an expected range. Generally speaking, removing outliers follows a guideline that datapoints may be removed if they incorrectly influence the analysis in a way that is not consistent with the experimental design. Criteria for filtering should always be established ahead of time, and never be changed after the fact.

Methods

Several methods exist for filtering. One is Cook's Test, or Cook's Distance, which measures the effect that an outlier datapoint has on the remaining datapoints, and whether that change is outside a predetermined threshold. Cook's Distance can be implemented in R, Graphpad Prism, and other statistical analysis software.
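
For reference, the usual textbook definition of Cook's Distance for a least-squares fit is given below. The symbols n, k, and s^2 are introduced here for illustration and do not come from the original write-up.

```latex
% Cook's Distance for observation i
%   \hat{y}_j       fitted value for observation j using all n datapoints
%   \hat{y}_{j(i)}  fitted value for observation j with observation i left out
%   k               number of fitted model parameters
%   s^2             mean squared error of the full fit
D_i = \frac{\sum_{j=1}^{n} \left( \hat{y}_j - \hat{y}_{j(i)} \right)^{2}}{k\, s^{2}}
```

Commonly cited rules of thumb flag a datapoint when D_i exceeds 1 or 4/n. Whichever threshold is used, it should be fixed before the data are examined, in keeping with the guideline above.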

In this instance, the dataset was being tested for normality as part of a larger workflow. A [QQ-plot](https://stats.sta

@disulfidebond
disulfidebond / fun_with_applescript_2.md
Last active April 23, 2020 07:40
Applescript automation with NCBI

Overview

AppleScript has a long and somewhat storied history with Apple. It dates back to OS 'Classic', and has been intermittently updated and forgotten over the decades. Although elements of AppleScript can be seen in Swift, and Objective-C to a degree, AppleScript shares very little in common syntactically with either.

Particularly with the advent of Swift, AppleScript has fallen by the wayside, although it is still extremely useful in certain situations, such as automation, or as an interface between Bash and the Apple GUI.

Advanced usage

Let's say you want to run more advanced tasks, like clicking a button or link on a webpage.

Building from the previous gist on AppleScript, create an AppleScript file with the following function:

@disulfidebond
disulfidebond / fun_with_applescript_1.md
Last active March 30, 2019 20:25
Applescript automation with NCBI

Overview

AppleScript has a long and somewhat storied history with Apple. It dates back to OS 'Classic', and has been intermittently updated and forgotten over the decades. Although elements of AppleScript can be seen in Swift, and Objective-C to a degree, AppleScript shares very little in common syntactically with either.

Particularly with the advent of Swift, AppleScript has fallen by the wayside, although it is still extremely useful in certain situations, such as automation, or as an interface between Bash and the Apple GUI.

Basics

The basic usage via Bash (or Terminal) is:

    osascript OPTION [COMMAND || SCRIPTNAME]
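
As a concrete instance of this usage, the `-e` option passes a single AppleScript statement on the command line. The wrapper below is a minimal sketch: the function name `run_osa` is hypothetical, and the guard is there only because `osascript` ships with macOS and is absent elsewhere.

```shell
#!/usr/bin/env bash
# run_osa STATEMENT (hypothetical helper)
# Run one AppleScript statement with "osascript -e", or return 127
# if osascript is not installed (for example, on Linux).
run_osa() {
  if command -v osascript > /dev/null 2>&1; then
    osascript -e "$1"
  else
    echo "osascript not found; it is only available on macOS" >&2
    return 127
  fi
}

# Example usage:
#   run_osa 'return 2 + 3'
#   run_osa 'display dialog "Hello from Bash"'
```

To run a saved script instead of an inline statement, pass the script's file name to `osascript` in place of `-e STATEMENT`.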