library(ggplot2)
library(ggrepel)  # provides geom_text_repel()
# Volcano plot
res <- read.table("Male_vs_Female_Pre", header = TRUE)
res$Significant <- ifelse(res$adj.P.Val < 0.05, "adj.P.Val < 0.05", "Not Significant")
ggplot(res, aes(x = logFC, y = -log10(P.Value))) +
  geom_point(aes(color = Significant)) +
  scale_color_manual(values = c("red", "grey")) +
  theme_bw(base_size = 12) + theme(legend.position = "bottom") +
geom_text_repel( |
# Aim: Identify genes with at least 2-fold higher mRNA levels (FPKM) in a particular tissue
# compared to all other tissues.
# Author: Gireesh Bogu
# Date: Jun 25th, 2017
# Location: CRG, Barcelona
# Problem: How to identify tissue-specific genes, especially when you have a
# large number of tissues (>50) and an even larger number of samples per tissue (>100, for example).
# GTEx (2) has 53 tissue sites and each tissue site has 10 to 400 samples |
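
The aim stated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's method: it assumes each tissue's samples are summarized by their median FPKM, that a small pseudocount (`0.01`, a hypothetical choice) guards against zero expression, and that "2-fold higher than all other tissues" means at least 2-fold higher than the second-highest tissue. The gene and tissue names are made up.

```python
from statistics import median


def tissue_specific_genes(expr, min_fold=2.0, pseudocount=0.01):
    """Return {gene: tissue} for genes whose top tissue beats every other tissue.

    expr maps gene -> tissue -> list of per-sample FPKM values.
    Median summarization and the pseudocount are assumptions of this sketch.
    """
    specific = {}
    for gene, by_tissue in expr.items():
        # Collapse the many samples of each tissue into a single median value.
        medians = {tissue: median(values) for tissue, values in by_tissue.items()}
        if len(medians) < 2:
            continue  # nothing to compare against
        ranked = sorted(medians.items(), key=lambda kv: kv[1], reverse=True)
        (top_tissue, top), (_, runner_up) = ranked[0], ranked[1]
        # Beating the runner-up by min_fold implies beating all other tissues.
        if top + pseudocount >= min_fold * (runner_up + pseudocount):
            specific[gene] = top_tissue
    return specific


# Hypothetical toy data: GENE_A is liver-enriched, GENE_B is broadly expressed.
expr = {
    "GENE_A": {"liver": [40.0, 50.0, 60.0], "brain": [4.0, 5.0, 6.0], "heart": [8.0, 10.0, 12.0]},
    "GENE_B": {"liver": [10.0, 12.0, 11.0], "brain": [9.0, 10.0, 8.0], "heart": [10.0, 9.0, 11.0]},
}
result = tissue_specific_genes(expr)
```

Comparing only against the runner-up tissue keeps the work per gene to one sort over the tissue medians, which matters when there are more than 50 tissues.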
# Author: Gireesh Bogu
# Location: CRG, Barcelona
# Date: June 1, 2017
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# What it does: joins multiple files (found by their location instead of specifying each file name)
# that share repeat_ids, and renames the columns with the file names.
# file1 = SRRX10101 |
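
The join described above might look roughly like the following in Python. It is a sketch under assumptions that the preview does not confirm: each input is a tab-delimited file with a header row, the repeat ID is in the first column and the value in the second, files are picked up with a hypothetical `*.txt` pattern, and IDs missing from a file are filled with `NA`. The second file name, `SRRX10102`, is invented for the demonstration.

```python
import csv
import tempfile
from pathlib import Path


def join_by_repeat_id(directory, pattern="*.txt"):
    """Outer-join all matching two-column files on their first column.

    Returns (header, rows). Each value column is named after its file stem.
    """
    files = sorted(Path(directory).glob(pattern))
    names = [f.stem for f in files]
    table = {}  # repeat_id -> {file stem: value}
    for f in files:
        with f.open(newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            next(reader, None)  # skip the header row
            for row in reader:
                if len(row) < 2:
                    continue  # ignore blank or malformed lines
                table.setdefault(row[0], {})[f.stem] = row[1]
    header = ["repeat_id"] + names
    rows = [[rid] + [table[rid].get(name, "NA") for name in names] for rid in sorted(table)]
    return header, rows


# Demonstration with two small temporary files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "SRRX10101.txt").write_text("repeat_id\tcount\nL1\t5\nAlu\t7\n")
    Path(tmp, "SRRX10102.txt").write_text("repeat_id\tcount\nAlu\t3\n")
    header, rows = join_by_repeat_id(tmp)
```

Globbing the directory is what removes the need to list each file by name; the file stem then doubles as the column name.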
# Author: Gireesh Bogu, Date: 27th May 2017, Place: CRG, Barcelona
# Worked on a file with 200 million rows and 3 columns (50 GB file) (Python 2.7)
# Make sure the file has 3 columns with a proper header
# Make sure there are no duplicates in the file
# Make sure the file is tab delimited
###################################################################################
# USE THIS IF IT IS NOT A BIG FILE
import pandas as pd |
# How to plot a BIG data set (600 million rows/values with 8555 keys)
# Use the following example
library(dplyr)
library(ggplot2)
data(diamonds)
# Plot the density of the different keys
ggplot(diamonds, aes(x=depth)) + geom_line(aes(color= cut), stat="density", size=0.4, alpha=0.4) |
# Author: Gireesh Bogu
# Location: Barcelona
# Time: Dec 21, 2016
# Aim-1 [Accomplished]: Send flat rental and utilities bills to my forgetful flatmate on the first day of every month :/
# Aim-2 [Pending]: Calculate bills from the bank account automatically (tricky for two reasons: (1) bank information is probably hard to access and (2) the timing of bills on the bank statement often does not match the actual monthly bills)
# Aim-3 [Pending]: Aim-1 uses Gmail, but it would be nice to extend this to Facebook, as she checks it more often than Gmail ;)
# Add the usage line below to crontab (open it with: crontab -e)
# Usage: 0 9 1 * * /fullpath/annoyingFlatmateAlerts.py (sends the email at 9 A.M. on the first day of every month)
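
The script body is not shown in this preview, so the following is only a minimal sketch of what Aim-1 could look like, not the author's code. The function names, addresses and amounts are hypothetical. It assumes Gmail's SSL SMTP endpoint (`smtp.gmail.com`, port 465) and a password that Gmail accepts for SMTP logins, such as an app password.

```python
import smtplib
from datetime import date
from email.message import EmailMessage


def build_bill_reminder(sender, recipient, rent, utilities, today=None):
    """Compose the monthly reminder; all arguments are illustrative."""
    today = today or date.today()
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Rent and utilities for {today:%B %Y}"
    msg.set_content(
        f"Rent: {rent:.2f}\n"
        f"Utilities: {utilities:.2f}\n"
        f"Total: {rent + utilities:.2f}\n"
    )
    return msg


def send_via_gmail(msg, password):
    """Send the message through Gmail over SSL (assumed setup, not called here)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(msg["From"], password)
        server.send_message(msg)


# Build (but do not send) a reminder for a fixed date.
reminder = build_bill_reminder(
    "me@example.com", "flatmate@example.com", 450.0, 62.5, today=date(2016, 12, 1)
)
```

Keeping message construction separate from sending means the text can be checked without network access, and cron only has to call the script once a month.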
# See below for UPDATES that include shorter ways to do the conversions
# How to convert GTF format into BED12 format (Human-hg19)?
# How to convert GTF or BED format into BIGBED format?
# Why BIGBED? If a GTF or BED file is too large to upload to UCSC, you can use track hubs. However, track hubs accept neither format, so you need the bigBed format.
# First, download the UCSC utilities
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/gtfToGenePred | |
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/genePredToBed | |
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed |
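
The conversion steps that follow the download are cut off in this preview. As a hedged sketch, the usual chain of the three utilities can be assembled like this in Python. The file names (`genes.gtf`, `hg19.chrom.sizes`, the `genes` prefix) are hypothetical. The sketch assumes the downloaded binaries sit in the current directory and have been made executable (`chmod +x`), and that the BED file is sorted by chromosome and start before `bedToBigBed` is run.

```python
import subprocess


def gtf_to_bigbed_commands(gtf, chrom_sizes, prefix):
    """Return the command lines for GTF -> genePred -> BED12 -> sorted BED -> bigBed."""
    gene_pred = f"{prefix}.genePred"
    bed = f"{prefix}.bed"
    sorted_bed = f"{prefix}.sorted.bed"
    return [
        ["./gtfToGenePred", gtf, gene_pred],
        ["./genePredToBed", gene_pred, bed],
        # bedToBigBed expects input sorted by chromosome, then start position.
        ["sort", "-k1,1", "-k2,2n", "-o", sorted_bed, bed],
        ["./bedToBigBed", sorted_bed, chrom_sizes, f"{prefix}.bb"],
    ]


def run_pipeline(commands):
    """Run each step in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Only build the commands here; call run_pipeline(commands) once the inputs exist.
commands = gtf_to_bigbed_commands("genes.gtf", "hg19.chrom.sizes", "genes")
```

Building the commands as lists rather than one shell string avoids quoting problems with unusual file names.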