#!/bin/bash
# Job requirements
# Submit this script with: sbatch thefilename
# For more details about each parameter, see the SLURM sbatch documentation: https://slurm.schedmd.com/sbatch.html
#SBATCH --time=8:00:00        # walltime
#SBATCH --ntasks=1            # number of tasks
#SBATCH --cpus-per-task=16    # number of CPUs per task, e.g. if your code is multi-threaded
#SBATCH --nodes=1             # number of nodes
#SBATCH -p datamover          # partition(s)
"""Prototype of distance based clumping.""" | |
from typing import TYPE_CHECKING | |
import numpy as np | |
import pyspark.ml.functions as fml | |
import pyspark.sql.functions as f | |
from pyspark.ml.linalg import DenseVector, Vectors, VectorUDT | |
from pyspark.sql import SparkSession |
"""Prototype of distance based clumping.""" | |
import pyspark.sql.functions as f | |
from pyspark.sql import Column, SparkSession, Window | |
spark = SparkSession.builder.getOrCreate() | |
data = [ | |
("s1", "chr1", 3, 2.0, False), | |
("s1", "chr1", 4, 3.0, False), |
""" | |
Compute all vs all Bayesian colocalisation analysis for all Genetics Portal | |
This script calculates posterior probabilities of different causal variants | |
configurations under the assumption of a single causal variant for each trait. | |
Logic reproduced from: https://github.com/chr1swallace/coloc/blob/main/R/claudia.R | |
""" | |
from functools import reduce |
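Under the single-causal-variant assumption, the five coloc hypotheses (H0: no association; H1/H2: association with only trait 1 or trait 2; H3: two distinct causal variants; H4: one shared causal variant) are scored from per-SNP log approximate Bayes factors. A minimal numpy sketch of that posterior computation, assuming lABF arrays and the default coloc priors p1, p2, p12; it mirrors the claudia.R logic but is illustrative rather than a line-for-line port:

import numpy as np

def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def logdiff(a, b):
    """Numerically stable log(exp(a) - exp(b)) for a > b."""
    return a + np.log1p(-np.exp(b - a))

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log approximate Bayes factors."""
    labf1 = np.asarray(labf1)
    labf2 = np.asarray(labf2)
    lsum = labf1 + labf2
    lh0 = 0.0
    lh1 = np.log(p1) + logsumexp(labf1)
    lh2 = np.log(p2) + logsumexp(labf2)
    # H3: all SNP pairs minus the same-SNP pairs that belong to H4
    lh3 = np.log(p1) + np.log(p2) + logdiff(
        logsumexp(labf1) + logsumexp(labf2), logsumexp(lsum)
    )
    lh4 = np.log(p12) + logsumexp(lsum)
    lall = np.array([lh0, lh1, lh2, lh3, lh4])
    return np.exp(lall - logsumexp(lall))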
from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.linalg import VectorUDT, Vectors
import pyspark.sql.types as T
import pyspark.sql.functions as F

sparkConf = SparkConf()
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.mode', 'AUTO')
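The imports signal the usual MLlib regression workflow: assemble feature columns into a vector, then fit a LinearRegression on it. A minimal sketch of that pattern; the toy data and column names are made up for illustration:

spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()

# Toy data: two features and a response (illustrative only)
df = spark.createDataFrame(
    [(1.0, 2.0, 2.1), (2.0, 1.0, 2.4), (3.0, 4.0, 5.0), (4.0, 3.0, 5.6)],
    ["x1", "x2", "y"],
)
assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
model = LinearRegression(featuresCol="features", labelCol="y").fit(
    assembler.transform(df)
)
print(model.coefficients, model.intercept)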
import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.sql import SparkSession
from functools import reduce

sparkConf = SparkConf()
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.mode', 'AUTO')
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.project.id',
                          'open-targets-eu-dev')
from os import sep

import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.sql import SparkSession

sparkConf = SparkConf()
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.mode', 'AUTO')
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.project.id',
                          'open-targets-eu-dev')
import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.sql import SparkSession

sparkConf = SparkConf()
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.mode', 'AUTO')
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.project.id',
                          'open-targets-eu-dev')
# Establish the Spark connection
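The preview stops at the connection step. A minimal sketch of what typically follows, assuming a local master and an illustrative bucket path; the requester-pays settings above mean GCS reads are billed to the configured project:

spark = (
    SparkSession.builder
    .config(conf=sparkConf)
    .master('local[*]')
    .getOrCreate()
)

# Reads from a requester-pays GCS bucket are billed to the project set above
df = spark.read.parquet('gs://example-bucket/path/to/dataset')  # hypothetical path
df.printSchema()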
library("tidyverse") | |
library("sparklyr") | |
library("sparklyr.nested") | |
library("cowplot") | |
library("ggsci") | |
#Spark config | |
config <- spark_config() | |
# Allowing to GCP datasets access |
---
title: "Batch-query all platform evidence associated with a gene/target list (R)"
output:
  md_document:
    variant: markdown_github
---

How to batch-access information related to a list of targets from the Open Targets Platform is a recurrent question. Here, I provide an example of how to access all target-disease evidence for a set of IFN-gamma signalling related proteins. I then reduce the evidence to focus on the coding and non-coding variants clinically associated with the gene list of interest. I used R and sparklyr, but a Python implementation would be very similar. The Platform documentation and the Community space contain very similar examples.
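Since the post notes that a Python port would look much the same, here is a minimal PySpark sketch of the batch query; the evidence path and the Ensembl gene IDs are illustrative placeholders, not values from the original tutorial:

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical local copy of the Platform evidence dataset (downloaded per release)
evidence = spark.read.parquet('path/to/evidence')

# Hypothetical target list: IFN-gamma signalling genes by Ensembl ID
targets = ['ENSG00000111537', 'ENSG00000027697']

result = (
    evidence
    .filter(F.col('targetId').isin(targets))
    .select('targetId', 'diseaseId', 'datasourceId', 'score')
)
result.show()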