Linus Kohl linuskohl

#!/usr/bin/perl
#
# Author: Linus Kohl
# E-Mail: [email protected]
# Org: MunichResearch
#
use strict;
use UMLS::Interface;
use UMLS::Similarity::lch;
use UMLS::Similarity::path;
linuskohl / biosses_meta.csv
Last active June 26, 2020 19:24
Gizem Soğancıoğlu, Hakime Öztürk, Arzucan Özgür; BIOSSES: a semantic sentence similarity estimation system for the biomedical domain. Bioinformatics 2017; 33 (14): i49-i58. doi: 10.1093/bioinformatics/btx238
Id,Text1,Text2,Annotator A,Annotator B,Annotator C,Annotator D,Annotator E,Avg,Var
1,0,94,4,4,4,4,4,4,0
2,1,95,3,3,3,3,3,3,0
3,2,96,2,2,3,2,2,2.2,0.2
4,3,97,3,3,4,3,3,3.2,0.2
5,4,98,3,3,4,3,3,3.2,0.2
6,5,99,3,3,4,3,3,3.2,0.2
7,6,100,1,1,3,2,1,1.6,0.8
8,7,101,3,3,3,3,3,3,0
9,8,102,2,1,1,2,1,1.4,0.3
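
In the rows shown above, the Avg and Var columns look consistent with the mean and the sample variance (n - 1 denominator) of the five annotator scores. This is an inference from the visible rows, not something the source states. A minimal sketch that recomputes both values for a few of those rows, using only the Python standard library (the `summarize` helper and the `annotator_scores` dictionary are hypothetical names introduced for illustration):

```python
from statistics import mean, variance

# Annotator scores (A to E) copied from the rows with Id 1, 3, 7 and 9 above.
annotator_scores = {
    1: [4, 4, 4, 4, 4],
    3: [2, 2, 3, 2, 2],
    7: [1, 1, 3, 2, 1],
    9: [2, 1, 1, 2, 1],
}


def summarize(scores):
    """Return (mean, sample variance) rounded to one decimal place.

    Assumption: Var is the sample variance (n - 1 denominator), which
    matches the visible rows but is not confirmed by the source.
    """
    return round(mean(scores), 1), round(variance(scores), 1)


for row_id, scores in annotator_scores.items():
    avg, var = summarize(scores)
    print(row_id, avg, var)
```

If the population variance (n denominator) were used instead, the row with Id 3 would give 0.16 rather than 0.2, so the visible values favour the sample variance.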
linuskohl / biosses_texts.csv
Created June 26, 2020 19:23
Gizem Soğancıoğlu, Hakime Öztürk, Arzucan Özgür; BIOSSES: a semantic sentence similarity estimation system for the biomedical domain. Bioinformatics 2017; 33 (14): i49-i58. doi: 10.1093/bioinformatics/btx238
Id,Text
0,It has recently been shown that Craf is essential for Kras G12D-induced NSCLC.
1,"The Bcl-2 inhibitor ABT-737 induces regression of solid tumors and its derivatives are in the early clinical phase as cancer therapeutics; however, it targets Bcl-2, Bcl-XL, and Bcl-w, but not Mcl-1, which induces resistance against apoptotic cell death triggered by ABT-737."
2,Previous studies demonstrated that the decrease level of 5 hmC in tumors was due to the reduced expression of TET1/2/3 and IDH2 genes or tumor derived IDH1 and IDH2 mutations.
3,"More recently, IDH mutations and resultant 2-hydroxyglutarate (2HG) production in leukemia cells were reported to induce global DNA hypermethylation through impaired TET2 catalytic function."
4,Recent in vitro studies using shRNA-based approaches have suggested a role for TET2 in regulating myeloid differentiation and in regulating stem/progenitor cell proliferation.
5,"Recently, it was reported that expression of IDH1R132H suppresses TET2 activity and the mutations of
import io
import os
import string
import csv
import xml
import re
import unicodedata
import itertools
import requests
from functools import partial
import pandas as pd

# Load precomputed pairwise CUI similarities (lch, path, wup)
cui_similarities = pd.read_csv("cui_pairings_out.csv", header=None, names=["cui_0","cui_1","lch","path","wup"])
# Build index for faster access
cui_similarities_reverse = cui_similarities.copy()
cui_similarities_reverse.rename(columns={"cui_0": "cui_1", "cui_1": "cui_0"}, inplace=True)
cui_table = pd.concat([cui_similarities, cui_similarities_reverse], sort=False)
cui_table.set_index(["cui_0","cui_1"], inplace=True)
# Sort the full MultiIndex once; a sorted index is what makes .loc lookups fast
cui_table = cui_table.sort_index()
# Create evaluation DataFrame containing BIOSSES pairings and additional information for evaluation
evaluation = pd.DataFrame(biosses_meta.loc[:,['Text1', 'Text2', 'Avg', 'Var']])
# Add CUI information
evaluation['Text1_CUIs'] = evaluation['Text1'].apply(lambda x: biosses_texts.loc[x,'UMLS_CUIs'])
evaluation['Text2_CUIs'] = evaluation['Text2'].apply(lambda x: biosses_texts.loc[x,'UMLS_CUIs'])
# Add UMLS terms
evaluation['Text1_UMLS_TERMS'] = evaluation['Text1'].apply(lambda x: biosses_texts.loc[x,'UMLS_Terms'])
evaluation['Text2_UMLS_TERMS'] = evaluation['Text2'].apply(lambda x: biosses_texts.loc[x,'UMLS_Terms'])
# Add texts for evaluation purposes
evaluation['Text1'] = evaluation['Text1'].apply(lambda x: biosses_texts.loc[x,'Text'])
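
The lookup table built above stores every CUI pair in both orders, so a similarity can be fetched without knowing which CUI came first. A minimal sketch of that idea on toy data, assuming pandas is installed; the CUIs, the scores and the `lookup` helper are made-up placeholders and are not taken from `cui_pairings_out.csv`:

```python
import pandas as pd

# Toy similarity rows; CUIs and scores are invented for illustration only.
pairs = pd.DataFrame(
    [
        ["C0000001", "C0000002", 1.5, 0.25, 0.6],
        ["C0000001", "C0000003", 0.9, 0.125, 0.4],
    ],
    columns=["cui_0", "cui_1", "lch", "path", "wup"],
)

# Mirror each row so that (a, b) and (b, a) both resolve to the same scores.
mirrored = pairs.rename(columns={"cui_0": "cui_1", "cui_1": "cui_0"})
table = pd.concat([pairs, mirrored], sort=False)
table = table.set_index(["cui_0", "cui_1"]).sort_index()


def lookup(cui_a, cui_b, measure="path"):
    """Return the stored score for a CUI pair, or None if the pair is absent."""
    try:
        return float(table.loc[(cui_a, cui_b), measure])
    except KeyError:
        return None


print(lookup("C0000001", "C0000002"))
print(lookup("C0000002", "C0000001"))
print(lookup("C0000002", "C0000003"))
```

Mirroring doubles the table size in exchange for order-independent lookups; sorting the pair before each lookup would be an alternative that keeps the table at its original size.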
linuskohl / dataset.json
Created April 13, 2024 08:50
Staedel-Museum dataset
[
{
"identifier": "sg1153",
"title": "The Rose Lover",
"subjects": [
"adult man",
"walking, hiking (recreation)",
"peeping, voyeur",
"flowers: rose",
"couple of lovers",