SOURCE DOCUMENT (МедЛаб Диагностика, 5 dates):
┌───────────────────┬────────┬───────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│ Parameter         │ Unit   │ Reference │ 15.01.24 │ 28.02.24 │ 10.04.24 │ 22.06.24 │ 05.09.24 │
├───────────────────┼────────┼───────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ Hemoglobin (HGB)  │ g/L    │ 130–160   │ 118 ↓    │ 122 ↓    │ 135      │ 142      │ 138      │
│ Total cholesterol │ mmol/L │ <5.2      │ 6.8 ↑    │ 6.2 ↑    │ 5.5 ↑    │ 4.9      │ 4.7      │
Problem: our recognition pipeline only moves forward and forgets the pages it has already processed. With multi-date lab tables, page breaks, and mixed document types, data gets lost. We needed a way to preserve all information from medical documents.
This diagram shows how our thinking evolved from "save raw text" to the current fact-based extraction with FHIR-mapped types.
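To make "fact-based extraction" concrete, here is a minimal sketch of how one row of a multi-date lab table could be expanded into one fact per date, so that no column is dropped. Everything in it is an assumption for illustration: the `LabFact` shape, the `row_to_facts` helper, and the choice of the FHIR `Observation` resource type for lab values are hypothetical and are not taken from the actual pipeline.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional


@dataclass(frozen=True)
class LabFact:
    """One measurement on one date (hypothetical shape, not the real schema)."""
    fhir_type: str        # assumed mapping: lab values -> FHIR "Observation"
    name: str
    unit: str
    reference: str
    measured_on: date
    value: float
    flag: Optional[str]   # "low", "high", or None when no arrow is present


# Arrow markers as they appear in the source table.
FLAGS = {"↓": "low", "↑": "high"}


def row_to_facts(name: str, unit: str, reference: str,
                 dates: List[str], cells: List[str]) -> List[LabFact]:
    """Expand one table row into one fact per date column."""
    if len(dates) != len(cells):
        raise ValueError("each date column needs exactly one cell")
    facts = []
    for raw_date, cell in zip(dates, cells):
        parts = cell.split()
        value = float(parts[0])
        flag = FLAGS.get(parts[1]) if len(parts) > 1 else None
        measured_on = datetime.strptime(raw_date, "%d.%m.%y").date()
        facts.append(LabFact("Observation", name, unit, reference,
                             measured_on, value, flag))
    return facts


dates = ["15.01.24", "28.02.24", "10.04.24", "22.06.24", "05.09.24"]
hgb = row_to_facts("Hemoglobin (HGB)", "g/L", "130–160", dates,
                   ["118 ↓", "122 ↓", "135", "142", "138"])
```

Because each fact carries its own date, unit, and reference range, a row that continues across a page break can be extracted in pieces and merged later without losing track of which value belongs to which date.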
<mxGraphModel dx="1200" dy="800" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="0" pageScale="1" pageWidth="1200" pageHeight="800" math="0" shadow="0">
  <root>
    <mxCell id="0"/>
    <mxCell id="1" parent="0"/>
    <!-- Title -->
    <mxCell id="title" value="BloodGPT B2B Pipeline" style="text;html=1;fontSize=20;fontStyle=1;fontColor=#37474F;align=center;verticalAlign=bottom;" vertex="1" parent="1">
      <mxGeometry x="100" y="0" width="400" height="36" as="geometry"/>
    </mxCell>
# %%
import time
import json
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
// Based on https://stackoverflow.com/a/36277049/1248256
// TODO: Set folder ID
var folderId = '';
// Main function 1: List all folders, & write into the current sheet.
function listFolers(){
  getFolderTree(folderId, false);
};
Minecraft!
A big, detailed book
Stepik course "Programming in Python" https://stepik.org/course/67/promo
Beginner course
from dostoevsky.tokenization import UDBaselineTokenizer
from dostoevsky.word_vectors import SocialNetworkWordVectores
from dostoevsky.models import SocialNetworkModel
tokenizer = UDBaselineTokenizer()
tokens = tokenizer.split('всё очень плохо') # [('всё', 'ADJ'), ('очень', 'ADV'), ('плохо', 'ADV')]
word_vectors_container = SocialNetworkWordVectores()
vectors = word_vectors_container.get_word_vectors(tokens)