Ege Ulgen egeulgen

🌊
View GitHub Profile
@egeulgen
egeulgen / split_df_list
Created October 30, 2018 19:22
Splits a dataframe into rows, one for each value of a delimited list held in one of its columns (pandas)
def splitDataFrameList(df, target_column, separator):
    ''' df = dataframe to split,
    target_column = the column containing the values to split
    separator = the symbol used to perform the split
    returns: a dataframe with each entry for the target column separated, with each element moved into a new row.
    The values in the other columns are duplicated across the newly divided rows.
    '''
    def splitListToRows(row, row_accumulator, target_column, separator):
        split_row = row[target_column].split(separator)
        for s in split_row:
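The preview stops inside the loop. As a minimal sketch of the behaviour the docstring describes, and not the gist's actual remaining code, the same row-splitting logic can be written over plain dictionaries. The name `split_rows` and the sample records are hypothetical, chosen only for illustration.

```python
def split_rows(rows, target_column, separator):
    """Return a new list with one row per separated value in target_column.

    rows is a list of dicts (a stand-in for dataframe rows, assumed here
    for illustration); the other columns are copied to every new row.
    """
    new_rows = []
    for row in rows:
        for value in row[target_column].split(separator):
            new_row = dict(row)             # copy the other columns unchanged
            new_row[target_column] = value  # keep a single value per row
            new_rows.append(new_row)
    return new_rows


# Hypothetical sample data
records = [
    {"id": 1, "tags": "a,b"},
    {"id": 2, "tags": "c"},
]
result = split_rows(records, "tags", ",")
```

With pandas 0.25 or later, a comparable result is usually available by splitting the column with `Series.str.split` and then calling `DataFrame.explode` on it.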
egeulgen / excel_col_finder.py
Last active October 23, 2019 12:46
Finds the Excel-style column name for a zero-based index, e.g. 3 >> D, 26 >> AA, 28 >> AC.
def indexExcelColumnFinder(self, idx):
    ''' Find Excel-style Column Name
    Given a 0-based index 'idx', returns the
    corresponding Excel-style column naming
    (e.g. 3 >> D, 26 >> AA, 27 >> AB etc.)
    '''
    excelColumnNameList = []
    # list(...) keeps the alphabet indexable on Python 3, where map() returns an iterator
    alphabet = list(map(chr, range(65, 91)))
    if idx < 26:
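The preview ends at the first branch. One common way to do the conversion described in the docstring is bijective base-26 arithmetic, sketched below. This is an illustrative standalone function, not the gist's own continuation, and `excel_column_name` is a hypothetical name.

```python
import string


def excel_column_name(idx):
    """Return the Excel-style column name for a 0-based index."""
    if idx < 0:
        raise ValueError("idx must be non-negative")
    letters = []
    n = idx + 1  # work 1-based: Excel names have no zero digit
    while n > 0:
        # Subtracting 1 before dividing maps 1..26 onto A..Z
        n, remainder = divmod(n - 1, 26)
        letters.append(string.ascii_uppercase[remainder])
    # Letters were collected least-significant first
    return "".join(reversed(letters))
```

For the indices quoted in the description, `excel_column_name(3)` gives `D`, `excel_column_name(26)` gives `AA` and `excel_column_name(28)` gives `AC`.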
egeulgen / boto3_progress_bar.py
Last active October 28, 2024 12:22
Displays a progress bar and percentage when downloading with boto3
class ProgressPercentage(object):
    ''' Progress Class
    Class for calculating and displaying download progress
    '''
    def __init__(self, client, bucket, filename):
        ''' Initialize
        initialize with: file name, file size and lock.
        Set seen_so_far to 0. Set progress bar length
        '''
        self._filename = filename
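The rest of the class is not shown in the preview. The sketch below illustrates the general shape of such a callback under stated assumptions: it takes the total size directly instead of looking it up through the client, and the names `DownloadProgress`, `bar_length` and `stream` are hypothetical. boto3 calls a transfer callback with the number of bytes moved since the previous call, which is the only interface the class relies on.

```python
import sys
import threading


class DownloadProgress(object):
    """Callable that tracks bytes seen and writes a text progress bar."""

    def __init__(self, filename, total_size, bar_length=30, stream=sys.stdout):
        self._filename = filename
        self._size = float(total_size)
        self._seen_so_far = 0
        self._bar_length = bar_length
        self._stream = stream
        self._lock = threading.Lock()  # callbacks may arrive from several threads

    def __call__(self, bytes_amount):
        # bytes_amount is the number of bytes transferred since the last call
        with self._lock:
            self._seen_so_far += bytes_amount
            fraction = self._seen_so_far / self._size if self._size else 1.0
            filled = int(self._bar_length * fraction)
            bar = "#" * filled + "-" * (self._bar_length - filled)
            self._stream.write(
                "\r%s [%s] %.1f%%" % (self._filename, bar, fraction * 100)
            )
            self._stream.flush()


# Intended use with boto3 (not run here; bucket, key and local_path are placeholders):
#   size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
#   client.download_file(bucket, key, local_path,
#                        Callback=DownloadProgress(key, size))
```

Passing the size in keeps the class independent of any S3 client, so it can be exercised without network access. The constructor in the gist instead receives `client` and `bucket`, presumably to obtain the file size itself.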
egeulgen / multi_merge.R
Last active October 23, 2019 12:45
Merges multiple files into a single dataframe
multimerge <- function(mypath){
  filenames <- list.files(path=mypath, full.names=TRUE)
  datalist <- lapply(filenames, function(x) read.csv(file=x, header=TRUE))
  result_df <- Reduce(function(x,y) merge(x,y), datalist)
  return(result_df)
}
### Cleaner and faster
# import files
# pattern is a regular expression, not a glob, so anchor on the extension
files <- list.files(pattern="\\.csv$")

Extract fields 2, 4, and 5 from file.txt:

awk '{print $2,$4,$5}' file.txt

Print each line where the 5th field is equal to ‘abc123’:

awk '$5 == "abc123"' file.txt

Print each line where the 5th field is not equal to ‘abc123’:

egeulgen / factorize.R
Last active November 9, 2023 17:16
Returns all (positive and negative) factors of the given number
FUN <- function(x) {
  x <- as.integer(x)
  div <- seq_len(abs(x))
  factors <- div[x %% div == 0L]
  factors <- list(neg = -factors, pos = factors)
  return(factors)
}
egeulgen / issue22.R
Last active October 22, 2019 11:12
M.musculus pathfindR analysis for issue 22
##################################################
## Project: pathfindR
## Script purpose: Try to resolve issue 22
## Date: Oct 15, 2019
## Author: Ege Ulgen
##################################################
options(stringsAsFactors = FALSE)
# Create M.musculus KEGG Gene Sets ----------------------------------------