Daniel E Cook (danielecook)
@danielecook
danielecook / django_chado_schema.py
Last active December 10, 2015 21:18
A first attempt at Django models for the Chado database schema.
# This is an auto-generated Django model module.
# You'll have to do the following manually to clean this up:
# * Rearrange models' order
# * Make sure each model has one field with primary_key=True
# Feel free to rename the models, but don't rename db_table values or field names.
#
# Also note: You'll have to insert the output of 'django-admin.py sqlcustom [appname]'
# into your database.
#
# I made the following changes to the models generated using inspectdb
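For context, a minimal sketch of what one inspectdb-style model for Chado's feature table might look like is shown below; the fields are illustrative and may not match the generated module exactly.
# Illustrative sketch only: a single inspectdb-style model for Chado's
# "feature" table. The field choices are assumptions, not the generated output.
from django.db import models

class Feature(models.Model):
    feature_id = models.IntegerField(primary_key=True)
    name = models.CharField(max_length=255, blank=True, null=True)
    uniquename = models.TextField()
    residues = models.TextField(blank=True, null=True)
    seqlen = models.IntegerField(blank=True, null=True)

    class Meta:
        db_table = 'feature'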
@danielecook
danielecook / format_pubs.py
Created May 14, 2013 19:59
This script takes a CSV containing authors and associated PubMed identifiers (PMIDs) of their publications and outputs a formatted HTML document of their publications.
"""
Daniel E. Cook 2013
(danielecook.com)
This script takes a CSV containing authors and associated PubMed identifiers (PMIDs) of their publications and outputs a formatted HTML document of their publications.
The first row of the CSV should contain the authors, and each row below their publications (as PMIDs). If you put something other than a PMID in, it will simply be output as-is,
so you can add publications that might not be in PubMed or that you want to display in a certain way.
This script might be useful for individuals who maintain publication lists for researchers at a university, for instance.
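As a rough illustration of the approach (not the gist's verbatim code), the citation lookup for a single PMID could be done with Biopython's Entrez module; the summary field names below are assumed from PubMed's esummary output.
# Hypothetical helper: fetch a short citation string for one PMID.
# Anything that is not a plain PMID is returned unchanged, per the
# description above.
from Bio import Entrez

Entrez.email = "you@example.com"  # placeholder; NCBI asks for a real address

def citation(pmid):
    if not str(pmid).strip().isdigit():
        return pmid
    handle = Entrez.esummary(db="pubmed", id=str(pmid))
    record = Entrez.read(handle)[0]
    return "{0} {1} ({2})".format(record["Title"], record["Source"],
                                  record["PubDate"].split(" ")[0])

print("<li>{0}</li>".format(citation("12345678")))  # replace with a real PMID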
@danielecook
danielecook / get_snp.py
Last active December 17, 2015 16:18
This is a set of functions for pulling SNP information from the Entrez database and parsing it into an array in Python. Requires Biopython (pip install biopython).
from pprint import pprint as pp
from Bio import Entrez
Entrez.email = "[email protected]"
def pull_line(var_set, line):
    """
    This function parses data from lines in one of three ways:
    1.) Pulls variables out of a particular line when defined as "variablename=[value]" - uses a string to find the variable.
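The preview cuts off mid-docstring. As a minimal, hypothetical usage sketch (not the gist's own parsing logic), a raw dbSNP record can be pulled through the Entrez client configured above and then handed to the parsing functions.
# Hypothetical sketch: pull the raw esummary record for one rsID from
# the "snp" database, reusing the Entrez import and email set above.
handle = Entrez.esummary(db="snp", id="334")  # numeric part of an rsID, e.g. rs334
print(handle.read()[:500])  # raw XML; parse with the functions above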
@danielecook
danielecook / setup_ucsc_pub_files.sh
Last active December 21, 2015 09:39
Simple bash script to download publication tables from the UCSC Genome Browser and merge them.
# Download the pub tables. Requires wget, which can be installed with Homebrew on macOS.
# More information is available here: http://brew.sh/
mkdir ../data/
# Download Files
wget --timestamping --directory-prefix='../data/' 'http://hgdownload.cse.ucsc.edu/goldenPath/hgFixed/database/pubsArticle.txt.gz'
wget --timestamping --directory-prefix='../data/' 'http://hgdownload.cse.ucsc.edu/goldenPath/hgFixed/database/pubsMarkerAnnot.txt.gz'
## wget --timestamping --directory-prefix='../data/' 'http://hgdownload.cse.ucsc.edu/goldenPath/hgFixed/database/pubsSequenceAnnot.txt.gz'
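The merge step itself is not shown in the preview; a hypothetical sketch in Python of joining the two tables on a shared article identifier follows. The column positions are assumptions about the UCSC pubs table layout and should be checked against the corresponding .sql schema files.
# Hypothetical merge sketch, not part of the gist. Assumes column 0 of
# both tables is a shared article ID; verify against the .sql schemas.
import gzip

articles = {}
with gzip.open("../data/pubsArticle.txt.gz", "rt") as f:
    for line in f:
        fields = line.rstrip("\n").split("\t")
        articles[fields[0]] = fields  # pubsArticle is large; fine for a sketch

with gzip.open("../data/pubsMarkerAnnot.txt.gz", "rt") as f, \
        open("../data/pubs_merged.txt", "w") as out:
    for line in f:
        fields = line.rstrip("\n").split("\t")
        if fields[0] in articles:
            out.write("\t".join(fields + articles[fields[0]]) + "\n")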
@danielecook
danielecook / setup_chado_human.sh
Created August 29, 2013 16:14
The script I used to setup a chado database and load in some basic human reference data.
#!/usr/bin/env bash
# Requires wget; it can be installed with Homebrew on macOS.
## Installation Variables ##
CHADO_DB_USERNAME=""
CHADO_DB_PASS=""
CHADO_DB_NAME="chado"
PATH_TO_PSQL="/Applications/Postgres.app/Contents/MacOS/bin/psql" # Path for Postgres.app on macOS; adjust for your installation.
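The rest of the script is not shown; as an illustrative follow-up (not part of the gist), the variables above can be reused to confirm the load from Python with psycopg2.
# Hypothetical sanity check, not part of the gist: connect to the chado
# database defined above and count rows in its central "feature" table.
# Requires psycopg2 (pip install psycopg2).
import psycopg2

conn = psycopg2.connect(dbname="chado",            # CHADO_DB_NAME
                        user="your_username",      # CHADO_DB_USERNAME
                        password="your_password")  # CHADO_DB_PASS
cur = conn.cursor()
cur.execute("SELECT count(*) FROM feature;")
print(cur.fetchone()[0])
conn.close()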
@danielecook
danielecook / fetch_gene_coordinates.py
Last active March 24, 2017 10:56
Quick function for fetching gene coordinates given a gene name. The genome build can be specified.
# Note: Requires mysqldb; install using:
# pip install MySQL-python
from MySQLdb.constants import FIELD_TYPE
import _mysql
db = None
def fetch_gene_coordinates(gene_name, build):
    global db  # db is global to prevent reconnecting.
    if db is None:
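The query itself falls past the preview; a hedged sketch of how the lookup could be completed against UCSC's public MySQL server is below, using MySQLdb rather than the low-level _mysql module, with refGene assumed as the source table.
# Sketch only: look up gene coordinates from UCSC's public MySQL server.
# The refGene table and its name2/chrom/txStart/txEnd columns are
# assumptions; the gist may query a different table.
import MySQLdb

def fetch_gene_coordinates(gene_name, build="hg19"):
    conn = MySQLdb.connect(host="genome-mysql.soe.ucsc.edu",
                           user="genome", db=build)
    cur = conn.cursor()
    cur.execute("SELECT chrom, txStart, txEnd FROM refGene WHERE name2 = %s",
                (gene_name,))
    row = cur.fetchone()
    conn.close()
    return row  # e.g. ('chr17', start, end), or None if not found

print(fetch_gene_coordinates("BRCA1"))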
@danielecook
danielecook / extract_excel.py
Created November 5, 2013 16:21
This gist will extract each individual worksheet from an excel workbook and export it as a CSV.
# Extract all worksheets from an Excel file and export them as individual CSVs
# Install xlrd with 'pip install xlrd'
# Thanks to Boud from http://stackoverflow.com/questions/10802417/how-to-save-an-excel-worksheet-as-csv-from-python-unix
import xlrd
import csv
# Open the workbook
x = xlrd.open_workbook('excel_file.xlsx')
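The preview ends right after opening the workbook; a sketch of the remaining loop (not the gist's verbatim code) that writes each sheet to its own CSV might look like this.
# Sketch of the remaining steps: write each worksheet to <sheet name>.csv.
# Note that xlrd 2.x dropped .xlsx support, so this assumes xlrd < 2.0
# (or an .xls input file).
for sheet_name in x.sheet_names():
    sheet = x.sheet_by_name(sheet_name)
    with open(sheet_name + ".csv", "w", newline="") as out:
        writer = csv.writer(out)
        for row_idx in range(sheet.nrows):
            writer.writerow(sheet.row_values(row_idx))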
@danielecook
danielecook / extract_excel_worksheets_to_csv.py
Created November 9, 2013 02:13
This quick function exports all of the worksheets within an Excel workbook as individual CSVs to make the data easier to work with.
import xlrd # pip install xlrd
import csv
import os
def export_workbook(filename):
    # Open workbook for initial extraction
    workbook = xlrd.open_workbook(filename)
    filename = os.path.splitext(filename)[0]  # Remove extension
    if not os.path.exists(filename):
        os.makedirs(filename)
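The rest of the function is past the preview; as a hypothetical usage example, it can be applied to every workbook in a folder.
# Hypothetical usage, not part of the gist: export every .xlsx workbook
# in the current directory into its own folder of per-sheet CSVs.
import glob

for path in glob.glob("*.xlsx"):
    export_workbook(path)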
@danielecook
danielecook / kegg.sh
Created January 15, 2014 17:43
This chunk of code produces 'kegg_merged.txt', a file consisting of genes and their respective pathways. The gist downloads a number of files from the UCSC Genome Browser and merges them together.
# Download KEGG Data (Pathways)
#==============================#
# Download select files from UCSC (hg19)
for var in keggPathway keggMapDesc knownGene kgXref
do
    wget --timestamping --directory-prefix kegg "ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/$var.txt.gz"
    gunzip kegg/$var.txt.gz
done
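The merge that produces 'kegg_merged.txt' is not shown; a hypothetical Python sketch of joining the tables into gene/pathway rows follows. The column positions are assumptions about the UCSC table layouts and should be checked against the matching .sql files.
# Hypothetical merge sketch, not the gist's own code. Assumed layouts:
#   keggPathway: kgID, locusID, mapID
#   keggMapDesc: mapID, description
#   kgXref:      kgID, ..., geneSymbol in column 4
def read_table(path):
    with open(path) as f:
        return [line.rstrip("\n").split("\t") for line in f]

map_desc = {row[0]: row[1] for row in read_table("kegg/keggMapDesc.txt")}
symbols = {row[0]: row[4] for row in read_table("kegg/kgXref.txt")}

with open("kegg_merged.txt", "w") as out:
    for row in read_table("kegg/keggPathway.txt"):
        kg_id, map_id = row[0], row[2]
        out.write("\t".join([symbols.get(kg_id, kg_id), map_id,
                             map_desc.get(map_id, "")]) + "\n")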
@danielecook
danielecook / hapmap_create_sqlite.py
Created January 20, 2014 06:00
Downloads allele frequencies from HapMap 2010-08_phaseII+III and constructs an sqlite database for easy access.
#!/usr/bin/env python
import sqlite3
import os
import glob
import time
import sqlalchemy
from sqlalchemy import Table, Column, Index, Integer, String, Float, MetaData, ForeignKey
from sqlalchemy import create_engine
import datetime
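The schema itself is beyond the preview; a minimal sketch of what an allele-frequency table could look like with the SQLAlchemy imports above is below, with column names that are assumptions rather than the gist's actual schema.
# Minimal sketch, not the gist's schema: an SQLite allele-frequency table
# built with the SQLAlchemy objects imported above. Column names are assumptions.
engine = create_engine("sqlite:///hapmap_freqs.db")
metadata = MetaData()

allele_freq = Table("allele_freq", metadata,
                    Column("rsid", String, primary_key=True),
                    Column("population", String, primary_key=True),
                    Column("chrom", String),
                    Column("pos", Integer),
                    Column("ref_allele", String),
                    Column("ref_freq", Float),
                    Column("other_allele", String),
                    Column("other_freq", Float))

metadata.create_all(engine)

# Rows parsed from the downloaded frequency files would be inserted like this:
with engine.begin() as conn:
    conn.execute(allele_freq.insert(),
                 [{"rsid": "rs0000000", "population": "CEU",  # dummy placeholder row
                   "chrom": "chr1", "pos": 12345,
                   "ref_allele": "A", "ref_freq": 0.8,
                   "other_allele": "G", "other_freq": 0.2}])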