
Ryan Schubert RyanSchu

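# Collect the unique gene names listed in the second column of the file.
gene_list = []
with open('/home/ryan/multi_coding_subset.txt', 'r') as assoc:
    for line in assoc:
        # Each line holds two tab-separated fields: an intron and a vector of genes.
        intron, gene_vec = line.split('\t')
        # Strip the vector syntax: the leading c, quotes, parentheses, spaces, newline.
        # Note: replace('c', '') removes every lowercase 'c', including any inside a gene name.
        gene_vec = gene_vec.replace('c','').replace('\"','').replace('(','').replace(')','').replace(' ','').replace('\n','')
        newvec = gene_vec.split(',')
        for i in newvec:
            # Keep only the first occurrence of each gene.
            if i not in gene_list:
                gene_list.append(str(i))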
RyanSchu / Welcome to the Wheeler Lab.md
Last active September 12, 2019 18:57
a primer for new members

Greetings! If you're reading this, you've been welcomed into the Wheeler Lab for the semester. Congratulations! This collection of documents will serve as a guide to some of the tools you'll be using this semester. It is by no means comprehensive, but the hope is that it will serve as a directory pointing you towards more useful resources, including tutorials, cheat sheets, papers, Twitter threads, SOPs, and manual pages. Most of the lab is geared towards independent problem solvers. Feel free to shoot any of the senior members a message for help, but you learn the most by just trying. Good luck and get to work!

Things you'll probably use

Everything on this list is something you are likely to use. It has been divided by programming language/interface and ordered by how useful I find each tool, though many of these rankings are arbitrary, as I use most of these tools every day.

command line/bash

  • awk x (Also see my [awk cheat sheet]

About Awk

Awk is a text-processing language that comes standard with most distributions of Linux/Unix. In my personal experience, awk is faster at parsing, filtering, and writing text files than either Python or R, with few exceptions. This cheat sheet goes over the basic awk commands that I use the most.

How does awk work

Awk processes a text file line by line and applies some condition to each line based on its contents. I have found it most useful on text files of large matrices (that is, text files with distinct, consistent columns) or on text that has clear, consistent delimiters. Awk interprets each column in your line and stores it as a variable numbered from 1 to n, where n is the number of columns you have. Say you have a file that looks like this:

ID  gene_name type  start stop Chr
ENSG0 C1orf22 protein_coding 178965 183312 chr2
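
As a rough analogy only (this is Python, not awk), the column numbering described above can be sketched like this. The helper `field` is hypothetical and exists just to mimic awk's 1-based `$n` variables; it assumes awk's default behaviour of splitting each line on whitespace.

```python
# One data line from the example file above.
line = "ENSG0 C1orf22 protein_coding 178965 183312 chr2"

# By default awk splits each line on runs of whitespace.
fields = line.split()
nf = len(fields)  # awk exposes the column count as NF


def field(n):
    """Hypothetical helper: return awk-style field $n (1-based); $0 is the whole line."""
    return line if n == 0 else fields[n - 1]


print(field(2))   # second column, what awk calls $2
print(field(nf))  # last column, what awk calls $NF
```

In awk itself none of this bookkeeping is needed: the columns are already available as `$1` through `$NF` on every line.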
RyanSchu / Qsub_dependencies.md
Last active May 17, 2019 22:57
Creating dependencies in qsub

Hi guys, lately we have all been using a lot of cores to run things. We want to make sure we always leave at least one core open, but it can be a pain to wait for things to finish before you qsub new things. This gist shows the basics of how to check memory/CPU usage and create dependencies, so you can submit all your jobs without taking up all the cores at once. We can do this using the -W flag in qsub.
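
A minimal sketch of what such a dependency looks like, assuming a PBS/Torque-style qsub where `-W depend=afterok:<jobid>` holds a job until the named job has finished successfully. The function name, script name, and job ID are hypothetical, and the code only builds the command rather than submitting anything.

```python
def dependent_qsub_command(script, after_job_id):
    """Build (but do not run) a qsub command that waits for another job."""
    # afterok: start only once the named job has finished without errors
    # (assumes PBS/Torque-style dependency syntax).
    return ["qsub", "-W", f"depend=afterok:{after_job_id}", script]


# Hypothetical job ID and script name, for illustration only.
cmd = dependent_qsub_command("step2.sh", "12345")
print(" ".join(cmd))  # qsub -W depend=afterok:12345 step2.sh
```

The job ID to depend on is the one qsub prints when the first job is submitted.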

Checking memory

If the CLI seems to be running slowly, check the system memory with the command free -h. This will display the following columns:

total        used        free      shared  buff/cache   available

We care about free memory. If the free memory drops too low (say, less than 80GB), someone with sudo privileges (Ryan or Dr. Wheeler) can clear out the buff/cache memory with the following commands

RyanSchu / Email Match.md
Last active November 6, 2022 22:46
Matching an email - Regex tutorial

Regular Expression Tutorial: Matching an Email

Many strings have a structure, pattern, or logic that can be used to identify and validate data. Regular expressions (regex) are a means of identifying strings that meet some such structure. This tutorial will go through a regex example that identifies strings that are in a valid email structure.

Summary

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
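
To see the pattern in action, here is a small sketch using Python's `re` module. The sample addresses and the helper name `split_email` are made up for illustration; the pattern itself is copied from the summary above without the surrounding `/` delimiters, which Python does not use.

```python
import re

# Same pattern as in the summary, minus the / delimiters.
EMAIL_RE = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")


def split_email(address):
    """Hypothetical helper: return (user, domain, tld) or None if no match."""
    match = EMAIL_RE.match(address)
    return match.groups() if match else None


print(split_email("ryan@example.com"))  # ('ryan', 'example', 'com')
print(split_email("not-an-email"))      # None
print(split_email("Ryan@Example.com"))  # None: the pattern only allows lowercase letters
```

The three capture groups correspond to the user name, the domain, and the top-level domain. As the last line shows, the pattern as written rejects uppercase letters unless a case-insensitive flag is added.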