Dmitry Grapov dgrapov

Biotechnology, data, math, art and code.

dgrapov / covariate_adjust.R

Created January 6, 2024 08:56

Example of linear model based covariate adjustment

	#get linear model residuals
	#' @import dplyr
	#' @export
	dave_lm_adjust<-function(data,formula,test_vars,adjust=TRUE,progress=TRUE){

	if (progress == TRUE){ pb <- txtProgressBar(min = 0, max = length(test_vars), style = 3)} else {pb<-NULL}

	out <- lapply(seq_along(test_vars), function(i) {
	if (progress == TRUE) {
	setTxtProgressBar(pb, i)
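
The preview above is cut off, so the body of `dave_lm_adjust` is not shown. As a rough illustration of the general idea only (regress each variable on the covariates and keep the residuals), here is a minimal sketch. The helper `adjust_by_lm`, its arguments, and the toy data are hypothetical and are not taken from the gist.

```r
# Hypothetical helper (not the gist's function): regress each test variable
# on the covariates and return the residuals, optionally shifted back to
# the variable's original mean.
adjust_by_lm <- function(data, covariates, test_vars, keep_mean = TRUE) {
  out <- lapply(test_vars, function(v) {
    # build a formula such as: y ~ age + batch
    f <- reformulate(covariates, response = v)
    # na.exclude pads the residuals with NA for rows dropped by lm()
    fit <- lm(f, data = data, na.action = na.exclude)
    res <- residuals(fit)
    if (keep_mean) res <- res + mean(data[[v]], na.rm = TRUE)
    res
  })
  names(out) <- test_vars
  as.data.frame(out)
}

# toy data: y depends on the covariate x plus noise
set.seed(1)
toy <- data.frame(x = rnorm(50))
toy$y <- 2 * toy$x + rnorm(50)
adjusted <- adjust_by_lm(toy, covariates = "x", test_vars = "y")
```

After adjustment, `adjusted$y` should be uncorrelated with `toy$x`, because least-squares residuals are orthogonal to the fitted covariates.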

dgrapov / gist:d15aedea295f32fa43d76b0a864c577b

Created May 22, 2019 13:03

Data Science Exercise

	DATA SCIENCE EXERCISE

	The following challenge requires the beer reviews data set, beer_reviews.csv. It can be downloaded from the following site: https://data.world/socialmediadata/beeradvocate (you can create a free temporary account to download the .csv).

	Questions to answer using this data:
	Which brewery produces the strongest beers by ABV%?
	If you had to pick 3 beers to recommend using only this data, which would you pick?
	Which of the factors (aroma, taste, appearance, palate) are most important in determining the overall quality of a beer?

	Additional math/coding question unrelated to the data:

dgrapov / replace_in.R

Created April 6, 2018 02:07

Deep error

	> in
	Error: unexpected 'in' in "in"

dgrapov / pca.R

Created March 24, 2018 03:09

basic principal components analysis and visualization in R

	# Basic PCA example
	# use www.createdatasol.com for
	# an advanced user interface

	#required packages for plotting
	library(ggplot2)
	library(ggrepel)

	#load data
	data<-read.csv('~/Sampledata.csv',
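
The preview stops at the `read.csv` call, so the rest of the script is not visible here. A minimal sketch of a basic PCA and score plot might look like the following. It assumes the built-in `USArrests` data as a stand-in for the author's file, uses `geom_text` rather than ggrepel to keep dependencies small, and only plots if ggplot2 is installed.

```r
# PCA on a built-in data set (USArrests stands in for the author's file)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# percent of variance explained by each component
var_explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# sample scores on the first two components, with row labels
scores <- data.frame(pca$x[, 1:2], label = rownames(USArrests))

# draw the score plot only when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(scores, aes(x = PC1, y = PC2, label = label)) +
    geom_point() +
    geom_text(size = 3, vjust = -0.8) +
    labs(x = sprintf("PC1 (%.1f%%)", var_explained[1]),
         y = sprintf("PC2 (%.1f%%)", var_explained[2]))
  print(p)
}
```

Scaling the variables (`scale. = TRUE`) is a choice made for this sketch because the `USArrests` columns are on different scales; whether the original script scales its data is not shown.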

dgrapov / example.R

Created February 2, 2018 04:48

Example of a shiny app with data upload and different plot options

dgrapov / tanimoto.R

Created January 10, 2018 21:53

fast (?) implementations of tanimoto distance calculations

	#' @title fast_tanimoto
	#' @param mat matrix or data frame of numeric values
	#' @param output 'matrix' (default) or 'edge list' (non-redundant and undirected)
	#' @param progress TRUE, show progress
	#' @import reshape2
	fast_tanimoto<-function(mat,output='matrix',progress=TRUE){
	mat[is.na(mat)]<-0

	#scoring function
	score<-function(x){sum(x==2)/sum(x>0)}
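
The rest of `fast_tanimoto` is not shown. One reading of the `score` helper is that `x` is the element-wise sum of two 0/1 rows, so `sum(x==2)` counts shared features and `sum(x>0)` counts features present in either row. Under that assumption, the same pairwise similarity can be sketched with matrix algebra. The helper `tanimoto_matrix` is hypothetical and is not the gist's implementation.

```r
# Hypothetical helper: pairwise Tanimoto similarity between the rows of a
# binary (0/1) matrix, computed with matrix algebra instead of a loop.
tanimoto_matrix <- function(mat) {
  mat <- as.matrix(mat)
  mat[is.na(mat)] <- 0
  shared <- tcrossprod(mat)                     # features present in both rows
  counts <- rowSums(mat)                        # features present in each row
  either <- outer(counts, counts, "+") - shared # features present in either row
  shared / either                               # NaN when both rows are all zero
}

fp <- rbind(a = c(1, 1, 0, 0),
            b = c(1, 0, 1, 0),
            c = c(1, 1, 0, 0))
sim <- tanimoto_matrix(fp)
```

Rows `a` and `c` are identical, so their similarity is 1. Rows `a` and `b` share one of three features present in either row. A distance, if needed, can be taken as `1 - sim`.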

dgrapov / plotly_select_DT.R

Last active September 10, 2020 01:25

ggplot2 to plotly to shiny to box/lasso select to DT

	#plotly box or lasso select linked to
	# DT data table
	# using Wage data
	# the out group: is sex:Male, region:Middle Atlantic +


	library(ggplot2)
	library(plotly)
	library(dplyr)
	library(ISLR)

dgrapov / SOM example.R

Last active March 11, 2023 11:21

Self-organizing map (SOM) example in R

	#SOM example using wines data set
	library(kohonen)
	data(wines)
	set.seed(7)

	#create SOM grid
	sommap <- som(scale(wines), grid = somgrid(2, 2, "hexagonal"))

	## use hierarchical clustering to cluster the codebook vectors
	groups<-3

dgrapov / example.R

Last active February 21, 2023 15:27

Convert adjacency (or other) matrix to edge list

	library(reshape2)

	gen.mat.to.edge.list<-function(mat,symmetric=TRUE,diagonal=FALSE,text=FALSE){
	#create edge list from matrix
	# if symmetric duplicates are removed
	mat<-as.matrix(mat)
	id<-is.na(mat) # used to allow missing
	mat[id]<-"nna"
	if(symmetric){mat[lower.tri(mat)]<-"na"} # use to allow missing values
	if(!diagonal){diag(mat)<-"na"}
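
The function body is truncated in this preview. The idea described in its comments (drop the redundant triangle for symmetric input, optionally drop the diagonal, then list the remaining cells) can be sketched in base R as follows. `mat_to_edge_list` is a hypothetical helper, it assumes the matrix has row and column names, and it leaves missing values as `NA` instead of using placeholder strings.

```r
# Hypothetical base R helper: turn a square matrix into an edge list.
# Assumes the matrix has row and column names.
# For symmetric input only the upper triangle is kept, so each
# undirected pair appears once.
mat_to_edge_list <- function(mat, symmetric = TRUE, diagonal = FALSE) {
  mat <- as.matrix(mat)
  keep <- matrix(TRUE, nrow(mat), ncol(mat))
  if (symmetric) keep[lower.tri(keep)] <- FALSE
  if (!diagonal) diag(keep) <- FALSE
  idx <- which(keep, arr.ind = TRUE)  # row/col positions of the kept cells
  data.frame(source = rownames(mat)[idx[, "row"]],
             target = colnames(mat)[idx[, "col"]],
             value  = mat[idx],
             stringsAsFactors = FALSE)
}

adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0), nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
edges <- mat_to_edge_list(adj)
```

For the 3 x 3 example this keeps the three upper-triangle pairs (a-b, a-c, b-c). Zero-valued pairs are retained, so filter on `value` afterwards if only existing edges are wanted.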

dgrapov / RECA_test.R

Created August 21, 2015 13:42

Testing RECA: Relevant Component Analysis for Supervised Distance Metric Learning

	#R code, testing RECA with the iris data
	library(RECA)

	#test data
	data(iris)
	x<-iris[,-5]
	y<-iris$Species

	#similar groups (species) in each chunk (n=3)
	chunksvec<-as.numeric(y)