Sean Davis seandavi
digraph ChatGPT {
node [shape=box]
subgraph cluster_0 {
label="Microsoft Azure"
style=dashed
ChatGPT
}
subgraph cluster_1 {
label="On Prem or Cloud"
style="dashed"
seandavi / annotationhub_alabaster.R
Created June 7, 2023 16:30
Use alabaster to process all of AnnotationHub...
library(alabaster)
library(AnnotationHub)
library(jsonlite)
ah = AnnotationHub()
STAGE_DIR = '/tmp/ah_staging'
dir.create(STAGE_DIR, showWarnings = FALSE)
# Record a failed resource's error message as error.json in its staging directory
write_error = function(dirname, e) {
  jsonlite::write_json(list(message=jsonlite::unbox(e$message)), file.path(dirname,'error.json'))
}
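Because `write_error(dirname, e)` takes a condition object, each resource is presumably processed inside `tryCatch()`, so that one failing resource does not abort the whole run. The following is a minimal base-R sketch of that pattern, not the gist's actual loop. `process_all`, `process_one` and the IDs `"AH1"`/`"AH2"` are hypothetical. To keep the sketch self-contained, errors are collected in a list instead of being written to `error.json`.

```r
# Hypothetical driver: process_one(id) stands in for the real per-resource work
# (e.g., fetching a resource and saving it with alabaster).
process_all <- function(ids, process_one) {
  results <- vector("list", length(ids))
  names(results) <- ids
  for (id in ids) {
    results[[id]] <- tryCatch(
      list(status = "ok", value = process_one(id)),
      # A failure is recorded and the loop continues; in the gist, this is
      # presumably where write_error() would be called.
      error = function(e) list(status = "error", message = conditionMessage(e))
    )
  }
  results
}

# Usage with hypothetical IDs: the second one fails, the first still succeeds
res <- process_all(c("AH1", "AH2"), function(id) {
  if (id == "AH2") stop("download failed")
  nchar(id)
})
```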
seandavi / prompts.md
Created June 2, 2023 23:44
LLM prompts

Meetings

Summary from Zoom transcript

You are an AI assistant that summarizes meeting notes generated by Zoom. For the meeting notes provided as input, provide these items if possible from the content:

  • meeting date, time, and title
  • meeting attendees as a bulleted list
  • executive summary (3-5 sentences)
seandavi / bioconductor_ehub_ahub_dataconductor.dot
Created May 31, 2023 13:00
Graphviz DOT diagram of the ExperimentHub and AnnotationHub replacement by dataconductor
digraph G {
fontname="Helvetica,Arial,sans-serif"
node [fontname="Helvetica,Arial,sans-serif"]
edge [fontname="Helvetica,Arial,sans-serif"]
edge[color="#00000050"]
subgraph cluster_0 {
seandavi / cytoband_table_prompt.txt
Last active May 10, 2023 21:22
Using GPT-4 to augment AnnotationHub resource metadata
I would like you to describe the UCSC genome browser table called "cytoBand". The first few lines of the table are here:
chrom chromStart chromEnd name gieStain
chr1 0 2300000 p36.33 gneg
chr1 2300000 5300000 p36.32 gpos25
chr1 5300000 7100000 p36.31 gneg
chr1 7100000 9100000 p36.23 gpos25
chr1 9100000 12500000 p36.22 gneg
chr1 12500000 15900000 p36.21 gpos50
chr1 15900000 20100000 p36.13 gneg
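To make the columns concrete, the sample rows above can be parsed in R. UCSC tables use 0-based, half-open coordinates, so a band's length is `chromEnd - chromStart`. In `gieStain`, `gneg` marks a Giemsa-negative band and `gposN` a Giemsa-positive band with roughly N% staining intensity. This is a small sketch using only the rows shown. Other stain values (such as `acen`, `gvar` or `stalk`) are left as `NA` here, and the derived column names `length`, `stain_pct` and `arm` are illustrative choices.

```r
# The sample cytoBand rows from the prompt, as whitespace-separated text
cyto_text <- "chrom chromStart chromEnd name gieStain
chr1 0 2300000 p36.33 gneg
chr1 2300000 5300000 p36.32 gpos25
chr1 5300000 7100000 p36.31 gneg
chr1 7100000 9100000 p36.23 gpos25
chr1 9100000 12500000 p36.22 gneg
chr1 12500000 15900000 p36.21 gpos50
chr1 15900000 20100000 p36.13 gneg"
cyto <- read.table(text = cyto_text, header = TRUE, stringsAsFactors = FALSE)

# 0-based, half-open coordinates: length is simply end minus start
cyto$length <- cyto$chromEnd - cyto$chromStart

# Staining intensity: 0 for gneg, N for gposN, NA for anything else
pct <- rep(NA_real_, nrow(cyto))
pct[cyto$gieStain == "gneg"] <- 0
is_pos <- startsWith(cyto$gieStain, "gpos")
pct[is_pos] <- as.numeric(substring(cyto$gieStain[is_pos], 5))
cyto$stain_pct <- pct

# Chromosome arm (p = short arm, q = long arm) is the first letter of the band name
cyto$arm <- substr(cyto$name, 1, 1)
```

The seven bands are contiguous, so their lengths sum to the `chromEnd` of the last row.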
seandavi / main.tf
Last active June 4, 2023 01:19
Terraform for setting up OpenAI APIs on Microsoft Azure
# <[email protected]>, 2023-06-02
#
# Terraform for setting up GPT-4, GPT-3.5-turbo, and text-embedding-ada-002
# endpoints. Note that not all models are available in all regions, so
# check before changing the region here, currently set to "southcentralus"
#
# Assumes the Azure CLI (az) is authenticated (requires an Azure subscription)
# and that Terraform is installed
#
terraform {
---
title: "data-engineering-R"
---
## Background^[From https://www.stitchdata.com/columnardatabase/]
Suppose you're a retailer maintaining a web-based storefront. An ecommerce site generates a lot of data. Consider product purchase transactions:
![Purchase table](https://www.stitchdata.com/static/purchase-table-69d1c4b69867e15fda5daf0005e9b81d.png)
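The cited background concerns row-oriented versus column-oriented storage. The following R sketch illustrates the idea with hypothetical purchase records; the columns and values are invented and do not come from the image above. A row store keeps each transaction together, so aggregating one field still means visiting every record. A column store keeps each field contiguous, so the same query reads only the columns it needs. An R `data.frame` is itself column-oriented: it is a list of equal-length column vectors.

```r
# Hypothetical purchase transactions (illustrative values only)
# Row-oriented layout: one list element per transaction
rows <- list(
  list(order_id = 1L, product = "shirt", qty = 2L, price = 15),
  list(order_id = 2L, product = "hat",   qty = 1L, price = 10),
  list(order_id = 3L, product = "shirt", qty = 1L, price = 15)
)

# Column-oriented layout: the same data as a data.frame (a list of column vectors)
cols <- do.call(rbind, lapply(rows, as.data.frame))

# Row store: walk every record and pull out two fields from each
revenue_rows <- sum(vapply(rows, function(r) r$qty * r$price, numeric(1)))

# Column store: touch only the qty and price vectors, in one vectorized step
revenue_cols <- sum(cols$qty * cols$price)
```

Both layouts give the same total. The difference lies in how much data each must read to produce it.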
{"title":"Mesothelioma_52413","status":"Public on Dec 23 2022","submission_date":"2022-08-05","last_update_date":"2022-12-23","type":"genomic","anchor":null,"contact":{"city":"Nagoya","name":{"first":"Shinya","middle":"","last":"Toyokuni"},"email":"[email protected]","state":"Aichi","address":"65 Tsuruma-Cho, Showa-Ku","department":"Pathology","country":"Japan","web_link":null,"institute":"Nagoya University","zip_postal_code":null,"phone":null},"description":null,"accession":"GSM6433302","biosample":null,"tag_count":null,"tag_length":null,"platform_id":"GPL10451","hyb_protocol":"The labeled DNA was hybridized with Agilent SurePrint G3 Mouse CGH 4x180k microarray at 67°C for 24 hours according to the manufacturer's protocol (Version 8.0).","channel_count":2,"scan_protocol":"The slides were scanned in an Agilent DNA microarray scanner with SureScan High-Resolution Technology (G2565CA).","data_row_count":174012,"library_source":null,"overall_design":null,"sra_experiment":null,"data_processing":"The sc
## -----------------------------------------------------------------------------
## GEOquery
## -----------------------------------------------------------------------------
library(GEOquery)
gse = getGEO("GSE103512")[[1]]
## -----------------------------------------------------------------------------
library(SummarizedExperiment)
se = as(gse, "SummarizedExperiment")
seandavi / bio361-geoquery-walkthrough.R
Last active April 4, 2022 16:07
Simple GEOquery example for four cancer/normal sample sets
## ----message=FALSE,warning=FALSE----------------------------------------------
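pkgs = c(
  "ggplot2",
  "GEOquery",
  "SummarizedExperiment"
)
# installed.packages() has no `repos` argument; it lists locally installed packages
# (assumes BiocManager itself is already installed)
ins = installed.packages()
for(pkg in pkgs) {
  if(!(pkg %in% rownames(ins)))
    BiocManager::install(pkg)
}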
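The install-if-missing loop can also be written in vectorized form. `BiocManager::install()` accepts a character vector, so a single call can install everything that is missing. The helper `missing_pkgs` below is hypothetical. The demo checks two base packages and a deliberately fake name, so it runs without installing anything.

```r
# Hypothetical helper: the subset of `pkgs` not found among installed packages
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Demo: "stats" and "utils" ship with R, so only the fake name is reported
demo <- missing_pkgs(c("stats", "utils", "notARealPackage123"))

# In the walkthrough, this would become:
# to_install <- missing_pkgs(pkgs)
# if (length(to_install) > 0) BiocManager::install(to_install)
```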