More detailed thoughts about data extraction

This gist contains some ideas about using LLMs to extract data from papers (specifically related to biology, aging research and the like).

Just to quickly expand a bit on what I was trying to say when our meeting was cut off:

I think LLM data extraction can be viewed as a problem that is tractable at three different layers:
1. purely text-based, e.g. use `pdftotext` to turn a PDF into a text document, then use LLMs to summarize, extract, tag, ... papers in order to have machine-readable data.
@Vindaar
Vindaar / dynlib_based_nim_repl_clean.nim
Created September 22, 2023 10:20
Dynlib based Nim REPL using compiler API, cleaned up a bit
import std / [strutils, strformat, tables, dynlib, os]
import noise, shell
import compiler/[llstream, renderer, types, magicsys, ast,
transf, # for code transformation (for -> while etc)
injectdestructors, # destructor injection
pathutils, # AbsoluteDir
modulegraphs] # getBody
import ./nimeval_dynlib_clean
Vindaar / dynlib_based_nim_repl_v2.nim
Last active September 22, 2023 00:29
Dynlib based Nim REPL using compiler API
import std / [strutils, strformat, tables, dynlib, os]
import noise, shell
import compiler/[llstream, renderer, types, magicsys, ast,
transf, # for code transformation (for -> while etc)
injectdestructors, # destructor injection
pathutils, # AbsoluteDir
modulegraphs] # getBody
import ./nimeval_dynlib
Vindaar / dynlib_based_nim_repl.nim
Created September 22, 2023 00:25
Toy dynlib based Nim REPL
import noise, strutils, strformat, shell, tables, dynlib
proc printHelp() = echo ""
const procTmpl = """
{.push cdecl, exportc, dynlib.}
$#
{.pop.}
"""
const exprTmpl = """
Vindaar / mandelbrot.nim
Last active September 9, 2023 16:14
Embedding ggplotnim in SDL2
import datamancer
import std / [math, complex]
const xn = 960
const yn = 960
const xmin = -2.0
const xmax = 0.6
const ymin = -1.5
const ymax = 1.5
const MAX_ITERS = 200
Vindaar / weather.json
Last active December 22, 2022 16:39
Wind speed and angles linear interpolation
{"type":"Feature","geometry":{"type":"Point","coordinates":[18.9276,69.69,100]},"properties":{"meta":{"updated_at":"2022-12-22T14:42:02Z","units":{"air_pressure_at_sea_level":"hPa","air_temperature":"celsius","cloud_area_fraction":"%","precipitation_amount":"mm","relative_humidity":"%","wind_from_direction":"degrees","wind_speed":"m/s"}},"timeseries":[{"time":"2022-12-22T14:00:00Z","data":{"instant":{"details":{"air_pressure_at_sea_level":986.2,"air_temperature":-2.1,"cloud_area_fraction":86.7,"relative_humidity":81.6,"wind_from_direction":184.4,"wind_speed":6.1}},"next_12_hours":{"summary":{"symbol_code":"lightsnowshowers_day"}},"next_1_hours":{"summary":{"symbol_code":"lightsnow"},"details":{"precipitation_amount":0.2}},"next_6_hours":{"summary":{"symbol_code":"snow"},"details":{"precipitation_amount":2.2}}}},{"time":"2022-12-22T15:00:00Z","data":{"instant":{"details":{"air_pressure_at_sea_level":985.9,"air_temperature":-1.9,"cloud_area_fraction":84.9,"relative_humidity":80.7,"wind_from_direction":187.0,"wi
import ggplotnim, math
import arraymancer
const ε = 3
proc φ(r: float): float =
result = exp(-pow((ε.float * r), 2.0))
proc toMatrix(n: int, start, stop: float): Tensor[float] =
result = zeros[float]([n, n])
let xs = linspace(start, stop, n)
Vindaar / generalized_unit_system_conversion.nim
Created November 25, 2022 15:07
PoC for a generalized conversion of unit system A to system B
#[
This code solves the unit conversion from system A to system B generically.
If natural units are involved, the remaining units (e.g. Energy in particle physics
natural units) are combined with the constants set to 1 that replace real units.
It solves the following system of linear equations which arises from:
`Π_i,α A_i^α = Π_i^β B_i^β`
where `A_i` is the i-th unit of unit system `A` and `α` the power needed to raise
Vindaar / cluster_breaking_dbscan.csv
Created October 29, 2021 14:35
Cluster causing an out-of-bounds access in dbscan / kd tree
x y
702 376
699 376
703 376
719 376
723 376
654 375
656 375
657 375
660 375