theotheo

@theotheo
theotheo / extraction-evolution-ascii.md
Last active March 27, 2026 09:05
BloodGPT: Extraction evolution diagram (draw.io XML)

From "Save Pages" to "Extract Facts"

Evolution of document extraction approach — BloodGPT BG-1059

SOURCE DOCUMENT (МедЛаб Диагностика, 5 dates):
┌──────────────────┬────────┬─────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│ Indicator        │ Unit   │ Normal  │ 15.01.24 │ 28.02.24 │ 10.04.24 │ 22.06.24 │ 05.09.24 │
├──────────────────┼────────┼─────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ Hemoglobin (HGB) │ g/L    │ 130–160 │ 118 ↓    │ 122 ↓    │ 135      │ 142      │ 138      │
│ Total cholesterol│ mmol/L │ <5.2    │ 6.8 ↑    │ 6.2 ↑    │ 5.5 ↑    │ 4.9      │ 4.7      │
@theotheo
theotheo / extraction-evolution.md
Created March 27, 2026 08:18
BloodGPT: Evolution of document extraction approach — from Save Pages to Extract Facts (BG-1059)

From "Save Pages" to "Extract Facts"

Evolution of document extraction approach — BloodGPT, March 2026

Context

Problem: our recognition pipeline moves forward page by page and forgets earlier pages. With multi-date lab tables, page breaks, and mixed document types, data gets lost. We needed a way to preserve all information from medical documents.

This diagram traces how our thinking evolved from "save raw text" to the current fact-based extraction with FHIR-mapped types.
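As a rough illustration of the "extract facts" idea, the sketch below turns one row of a multi-date lab table into one record per date, so no column is lost when the pipeline moves on. The `LabFact` class, its field names, and `row_to_facts` are hypothetical and only loosely modeled on a FHIR Observation (code, value, unit, effective date, interpretation); this is not the project's actual schema. The sample values are the hemoglobin row of a five-date lab table, with the unit written as g/L.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class LabFact:
    """One extracted fact: a single analyte measured on a single date.

    Hypothetical structure, loosely following a FHIR Observation;
    not the project's real schema.
    """
    code: str
    value: float
    unit: str
    effective: date
    flag: str  # "low", "high", or "normal", as marked in the source row


# Arrow markers used in the source table to flag out-of-range values.
FLAGS = {"↓": "low", "↑": "high"}


def row_to_facts(code, unit, dates, cells):
    """Turn one multi-date table row into one fact per date column."""
    facts = []
    for day, cell in zip(dates, cells):
        parts = cell.split()
        value = float(parts[0])
        flag = FLAGS.get(parts[1], "normal") if len(parts) > 1 else "normal"
        effective = datetime.strptime(day, "%d.%m.%y").date()
        facts.append(LabFact(code, value, unit, effective, flag))
    return facts


dates = ["15.01.24", "28.02.24", "10.04.24", "22.06.24", "05.09.24"]
cells = ["118 ↓", "122 ↓", "135", "142", "138"]
facts = row_to_facts("HGB", "g/L", dates, cells)
for fact in facts:
    print(fact)
```

Because each fact carries its own date, facts taken from different pages can later be merged without relying on page order.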

Example document (used throughout)

@theotheo
theotheo / hourglass-pipeline.drawio
Last active March 18, 2026 18:40
BloodGPT Hourglass Pipeline Diagram
<mxGraphModel dx="1200" dy="800" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="0" pageScale="1" pageWidth="1200" pageHeight="800" math="0" shadow="0">
<root>
<mxCell id="0"/>
<mxCell id="1" parent="0"/>
<!-- Title -->
<mxCell id="title" value="BloodGPT B2B Pipeline" style="text;html=1;fontSize=20;fontStyle=1;fontColor=#37474F;align=center;verticalAlign=bottom;" vertex="1" parent="1">
<mxGeometry x="100" y="0" width="400" height="36" as="geometry"/>
</mxCell>
@theotheo
theotheo / selenium_postnauka.py
Created November 14, 2020 13:27
Quick-and-dirty script for scraping the PostNauka quiz about sociologists https://postnauka.ru/tests/155870
# %%
import time
import json
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
@theotheo
theotheo / app.gs
Created December 25, 2019 15:53
List all files in a Google Drive folder
// Based on https://stackoverflow.com/a/36277049/1248256
// TODO: Set folder ID
var folderId = '';
// Main function 1: List all folders, & write into the current sheet.
function listFolers(){
getFolderTree(folderId, false);
};
@theotheo
theotheo / doc.md
Last active October 13, 2019 04:53
Python links
@theotheo
theotheo / list.md
Last active October 12, 2019 06:42
Python books and courses

Beginner

2017, Craig Richardson, "Программируем с Minecraft. Создай свой мир с помощью Python" (Programming with Minecraft: Create Your Own World with Python)

Minecraft!

2019, Mark Lutz, "Изучаем Python. Том 1" (Learning Python, Volume 1)

A big, detailed book

Stepik course "Программирование на Python" (Programming in Python) https://stepik.org/course/67/promo

An introductory course

@theotheo
theotheo / .adoc
Created July 31, 2019 10:59
Example asciidoctor-reveal.js slides

Introduction

Introductory words

Speaker notes

Quote

from dostoevsky.tokenization import UDBaselineTokenizer
from dostoevsky.word_vectors import SocialNetworkWordVectores
from dostoevsky.models import SocialNetworkModel
tokenizer = UDBaselineTokenizer()
tokens = tokenizer.split('всё очень плохо') # [('всё', 'ADJ'), ('очень', 'ADV'), ('плохо', 'ADV')]
word_vectors_container = SocialNetworkWordVectores()
vectors = word_vectors_container.get_word_vectors(tokens)
@theotheo
theotheo / Extract_from_Blum.ipynb
Last active April 7, 2019 03:54
Quick-and-dirty extraction of lists from the Blum book