James Stevenson jsstevenson

@jsstevenson
jsstevenson / metakb_strenght_nodes.json
Created August 28, 2025 20:47
MetaKB Strength nodes
[
  {
    "n": {
      "identity": 1625,
      "labels": [
        "Strength"
      ],
      "properties": {
        "mappings": "[{\"id\":null,\"extensions\":null,\"coding\":{\"id\":null,\"extensions\":null,\"name\":\"preclinical evidence\",\"system\":\"https://go.osu.edu/evidence-codes\",\"systemVersion\":null,\"code\":\"e000009\",\"iris\":null},\"relation\":\"exactMatch\"}]",
        "primary_coding": "{\"id\":null,\"extensions\":null,\"name\":null,\"system\":\"https://civic.readthedocs.io/en/latest/model/evidence/level.html\",\"systemVersion\":null,\"code\":\"D\",\"iris\":null}",

uv notes

uv is a Python tool for managing dependencies, builds, and runtimes, inspired by major players in other languages such as cargo and npm. It should further reduce the hassle of dealing with Python runtimes and environments, and I would strongly recommend switching over to it.

installation

  1. Cleanup

Get rid of pyenv

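Get a DictReader for the interactions TSV:

import csv
from pathlib import Path

# open() does not expand "~", so resolve the home directory explicitly
path = Path("~/Downloads/interactions.tsv").expanduser()
with path.open(newline="") as f:
    next(f)  # skip the first two lines; DictReader reads column names from the next one
    next(f)
    lines = list(csv.DictReader(f, delimiter="\t"))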
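
The skip-then-parse pattern above can be tried without the real file. This is a minimal sketch on made-up in-memory data: the two preamble lines, the column names, and the values are all hypothetical and do not come from the actual interactions TSV.

```python
import csv
import io

# Hypothetical contents: two lines to skip, then the column header row.
sample = (
    "# first line to skip\n"
    "# second line to skip\n"
    "gene\tdrug\tscore\n"
    "GENE1\tDRUG1\t0.5\n"
    "GENE2\tDRUG2\t0.25\n"
)

f = io.StringIO(sample)
next(f)  # discard the first line
next(f)  # discard the second line
# DictReader now treats the third line as the header row.
rows = list(csv.DictReader(f, delimiter="\t"))

print(rows[0])
```

Every value comes back as a string, so numeric columns such as the hypothetical score need an explicit float() conversion.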
# This file is autogenerated by maturin v1.7.6
# To update, run
#
# maturin generate-ci github -o tmp.yaml
#
name: CI
on:
  push:
    branches:
jsstevenson / bcl2_interaction_scores.txt
Last active November 25, 2024 15:27
SELECT g.name, d.name, i.score, i.gene_specificity, i.drug_specificity, i.evidence_score
FROM interactions i
LEFT JOIN genes g ON g.id = i.gene_id
LEFT JOIN drugs d ON d.id = i.drug_id
WHERE g.name = 'BCL2'
ORDER BY i.score
 name | name                      | score                | gene_specificity    | drug_specificity     | evidence_score
------+---------------------------+----------------------+---------------------+----------------------+----------------
 BCL2 | BORTEZOMIB                | 0.007488459062167905 | 0.17546867625942625 | 0.042676899500264064 | 1
 BCL2 | CARBOPLATIN               | 0.00757760738433657  | 0.17546867625942625 | 0.04318495782764816  | 1
 BCL2 | DOCETAXEL ANHYDROUS       | 0.007668903858846649 | 0.17546867625942625 | 0.04370525852436681  | 1
 BCL2 | VINCRISTINE               | 0.007858259509682369 | 0.17546867625942625 | 0.044784400710153646 | 1
 BCL2 | DOXORUBICIN HYDROCHLORIDE | 0.008601608382219891 | 0.17546867625942625
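
In the four complete rows above, score appears to equal gene_specificity × drug_specificity × evidence_score. This is an observation from these rows only, not a formula stated in the source. A minimal check, with the values copied from the table and a hypothetical helper name:

```python
import math

# (drug, score, gene_specificity, drug_specificity, evidence_score),
# copied from the four complete rows of the query output.
rows = [
    ("BORTEZOMIB", 0.007488459062167905, 0.17546867625942625, 0.042676899500264064, 1),
    ("CARBOPLATIN", 0.00757760738433657, 0.17546867625942625, 0.04318495782764816, 1),
    ("DOCETAXEL ANHYDROUS", 0.007668903858846649, 0.17546867625942625, 0.04370525852436681, 1),
    ("VINCRISTINE", 0.007858259509682369, 0.17546867625942625, 0.044784400710153646, 1),
]


def product_matches(score, gene_spec, drug_spec, evidence, rel_tol=1e-6):
    """Return True if score is close to the product of the three factors."""
    return math.isclose(score, gene_spec * drug_spec * evidence, rel_tol=rel_tol)


# Hypothetical helper applied to each row (the drug name is dropped).
all_match = all(product_matches(*row[1:]) for row in rows)
print(all_match)
```

Because evidence_score is 1 in every row shown, these rows cannot distinguish a product from other ways of combining that factor.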

misc notes

  • Dependencies on intake form seem incomplete -- OBI, Uberon?

  • Treatment of 'history' is inconsistent. BFO history (process) is imported but unreferenced, and "smoking history" is defined as a "quality" but not as a subclass of "medical history".

Criteria

1. Ontology scope

{
  "pre_mapped": {
    "id": "ga4gh:VA.FLe4-pSUs7vjdVtVD4TmUNL4JhrBbqTd",
    "type": "Allele",
    "extensions": [
      {
        "name": "vrs_ref_allele_seq",
        "value": "Y"
      }
    ],
{
"schemaVersion": 1,
"label": "QC",
"message": "N/A",
"color": "white",
"logoSvg": "<svg id=\"Layer_1\" data-name=\"Layer 1\" xmlns=\"http://www.w3.org/2000/svg\" viewBox=\"0 0 135.54 133.3\"><defs><style>.cls-1{fill:#fff;}.cls-2{fill:#231f20;}</style></defs><path class=\"cls-1\" d=\"M85.31,20.58l-4-16.64s0,0,0,0c-1.11-.14-2.21-.26-3.33-.35s-2.24-.15-3.36-.18-2.26,0-3.4,0c-.82,0-1.66.06-2.49.11L63.21,19.66c-.22,0-.43.11-.65.15l0,.11c-1.58.3-3.13.68-4.65,1.12s-3,1-4.45,1.53L40.52,11.64c-.58.32-1.17.63-1.74,1-1,.56-1.91,1.14-2.83,1.74s-1.84,1.23-2.73,1.88-1.77,1.31-2.63,2l-.73.6L34.72,35c-.15.16-.28.33-.42.48v0c-1.12,1.18-2.17,2.42-3.17,3.7s-1.95,2.59-2.84,4h-.06v0l-16.83-1c-.31.62-.62,1.23-.91,1.85-.46,1-.88,2-1.29,3.07-.28.69-.53,1.39-.79,2.09-.12.35-.26.7-.38,1.05-.36,1-.69,2.1-1,3.17-.1.33-.17.68-.26,1l13.4,10.13a54.29,54.29,0,0,0-.56,10.73L5.25,84.54c.09.52.16,1,.26,1.55.21,1.08.44,2.16.7,3.23s.54,2.14.85,3.2.65,2.12,1,3.16c.14.42.31.82.46,1.24h0L25.22,97l0,0h.13a51.6,51.6,0,0,0,5.88,9
jsstevenson / tmp-dashboard-results.json
Created May 27, 2024 17:12
tmp-dashboard-results.json
{
  "oboscore": {
    "dashboard_score_max_impact": {
      "dashboard": 1,
      "impact": 1,
      "impact_external": 3,
      "no_base": 5,
      "overall_error": 20,
      "overall_info": 5,
      "overall_warning": 10,

notes

Each kind of response is slightly different, but this tries to make them more consistent in a few ways:

  • No more gene vs normalized gene object. Everything is a GA4GH core Gene. This means no more associated_with vs xref, so one less kind of MatchType.
  • The outermost level includes the query, any additional parameters passed to the API endpoint (I think this is good practice to include, though I'm not sure), and service information.
  • The outermost level also includes a match key that points to what the individual Python QueryHandler methods would return. IMO it makes more sense to move the service-level details into the REST API response, because you don't typically include them in Python-to-Python calls (e.g. another class doesn't need to know which version of Gene Normalizer is running, since it literally shares the environment).
  • match objects include source metadata and warnings. In some of the responses, we have previously included source metadata closer to the actual source matches, b
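
The layering described in these notes can be sketched with dataclasses. Everything here is an assumption for illustration: the class names, the field names other than match, and the placeholder values are hypothetical and are not taken from the Gene Normalizer codebase.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Match:
    """What a Python QueryHandler method might return (hypothetical shape)."""

    gene: Optional[Dict[str, Any]]  # a GA4GH core Gene, shown as a plain dict
    match_type: Optional[str]
    source_meta: Dict[str, Any] = field(default_factory=dict)
    warnings: List[str] = field(default_factory=list)


@dataclass
class ApiResponse:
    """REST-only wrapper: query, extra parameters, and service information."""

    query: str
    params: Dict[str, Any]
    service_meta: Dict[str, str]
    match: Match


def build_response(query: str, params: Dict[str, Any], match: Match) -> ApiResponse:
    # Placeholder service metadata; a real service would report its own values.
    service_meta = {"name": "example-service", "version": "0.0.0"}
    return ApiResponse(query=query, params=params, service_meta=service_meta, match=match)


match = Match(gene={"type": "Gene", "label": "BRAF"}, match_type="label")
payload = asdict(build_response("BRAF", {"limit": 1}, match))
print(sorted(payload))
```

Keeping Match free of service metadata means Python callers get the same object the REST layer wraps, without fields they do not need.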