This gist contains some ideas about using LLMs to extract data from papers (specifically related to biology, aging research and the like).
Just to quickly expand a bit on what I was trying to say when our meeting was cut off:
I think LLM data extraction can be viewed as a problem that can be tackled at 3 different layers:
1. purely text-based, e.g. use `pdftotext` to turn a PDF into a text document, then use LLMs to summarize, extract from, tag, ... papers in order to get machine-readable data.