Shawn Graham shawngraham

Perform coreference resolution on the text, replacing all mentions of the same entity with a consistent unique identifier. Prioritize resolving pronouns based on proximity and grammatical role, but consider the semantic context to avoid incorrect substitutions. Do not resolve 'it' if it refers to an implied or abstract concept (e.g., 'It is widely believed...'). RULES: 1. Identify all pronouns and noun phrases referring to the same real-world entity. 2. The first mention of an entity should retain its original descriptive text and be used for all subsequent mentions. 3. Preserve the original text structure and context. 4. Ensure replacements are consistent throughout the text. 5. Ensure that Organization names are not changed. 6. Replace 'they' with the appropriate entities. 7. Replace 'it' with the appropriate entity ONLY if 'it' clearly refers to a previously mentioned concrete noun phrase. Avoid replacing 'it' if it introduces a new concept or refers to a clause, proposition, or abstract idea. 8. Replace s

Using the schema at https://github.com/shawngraham/structured-cultural-heritage-crime/blob/main/prompt.py and the Gemini 1.5 Pro model, with its 2-million-token context window, I dropped in the 129 articles from Trafficking Culture that we annotated and analyzed, and had the model generate the triplets in a single call. The initial result was some 2,000 triplets; after permuting for unique lines in Sublime Text, about 1,000 remained.
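The de-duplication step can also be done in a script instead of an editor. What follows is a minimal sketch, not the method actually used: it assumes the triplets are plain CSV lines and that only exact duplicates (after trimming whitespace) should be dropped. The function name `unique_lines` and the sample rows are hypothetical.

```python
def unique_lines(lines):
    """Drop blank lines and exact duplicates, keeping first-seen order."""
    seen = set()
    kept = []
    for line in lines:
        key = line.strip()
        if not key or key in seen:
            continue  # skip blanks and anything already kept
        seen.add(key)
        kept.append(key)
    return kept


# Hypothetical sample rows, for illustration only.
raw_triplets = [
    "Dealer A,is_the_owner_of,Gallery B",
    "Dealer A,is_the_owner_of,Gallery B",
    "",
    "Dealer C,is_the_owner_of,Auction House D ",
]
deduplicated = unique_lines(raw_triplets)
```

Because this only catches exact matches, near-duplicates (different spellings of the same entity, for instance) would still need a separate pass.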

You are a knowledge graph triplet extractor. When given a text file, you return a properly formatted CSV with triplets with three columns subject,predicate,object in grammatical order. Only use the PREDEFINED RELATIONS LISTED BELOW and use the full name for each entity.

The predefined relations are:

    'is_the_owner_of': Denotes a business relationship where an actor controls, or is the legal owner of, a business, gallery, auction house, or other for-profit organization.

hasLocation: Connect sites to their geographic coordinates or descriptive location

Example: Little John's Farm hasLocation Reading, Berkshire

hasGeology: Describe the underlying geological composition of a site

Example: Little John's Farm hasGeology "alluvial deposits overlying river gravels"
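One way to hold a model to a predefined list of relations is to filter its output afterwards. The sketch below is an assumption about how that might look, not part of the original prompt: `ALLOWED_PREDICATES` contains only the two relations shown above, and `triplets_to_csv` is a hypothetical helper that writes subject,predicate,object rows and silently drops any triplet whose predicate is not on the list.

```python
import csv
import io

# Only the two relations listed above; a real schema would have more.
ALLOWED_PREDICATES = {"hasLocation", "hasGeology"}


def triplets_to_csv(triplets, allowed=ALLOWED_PREDICATES):
    """Return CSV text for the triplets whose predicate is in `allowed`."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["subject", "predicate", "object"])
    for subject, predicate, obj in triplets:
        if predicate not in allowed:
            continue  # relation is not predefined, so drop the row
        writer.writerow([subject, predicate, obj])
    return buffer.getvalue()


example_triplets = [
    ("Little John's Farm", "hasLocation", "Reading, Berkshire"),
    ("Little John's Farm", "hasGeology", "alluvial deposits overlying river gravels"),
    ("Little John's Farm", "hasOwner", "unknown"),  # hypothetical relation, filtered out
]
csv_text = triplets_to_csv(example_triplets)
```

The `csv` module quotes the object "Reading, Berkshire" because it contains a comma, which keeps the file at exactly three columns.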

	import mesa

	class LetterAgent(mesa.Agent):
	    def __init__(self, model):
	        super().__init__(model)
	        self.letters_sent = 0
	        self.letters_received = 0

	    def step(self):
	        print(f"Hi, I am agent {self.unique_id}.")

	%%capture
	!python3 -m pip install git+https://github.com/p-lod/plodlib
	!pip install requests_cache
	!pip install rdflib

	import plodlib
	import json
	import pandas as pd
	from string import Template
	import rdflib as rdf

	Input Text:
	This archive presents appendices B-I and supplementary material resulting from the programme of archaeological works undertaken during the construction scheme to widen the A1 trunk road between Dishforth and Leeming Bar in North Yorkshire. The Iron Age to early medieval evidence from Healam Bridge, along with other evidence for Roman activity along the route is published in two volumes

	Extracted Entities:

	Input Text:
	This collection comprises images, spreadsheets, reports, vector graphics, and scanned site records and drawings from archaeological recording by Archaeological Research Services at Lower Radbourne Deserted Medieval Village, Warwickshire. The work was undertaken between April and December 2021. Area C32070 was dominated by intercutting features predominantly dated to two broad phases, prehistoric and medieval. The prehistoric features were represented by a large ring ditch, potentially dating to the Early Bronze Age, four smaller potential Bronze Age ring ditches and a series of inte

	# and here we're going to load the models from huggingface and test them against the same output
	# so you can see the difference that fine-tuning the original smol-135 makes
	# this was trained on free-tier google colab for not very long on 800 rows of training data that I
	# wrangled into correct shape. Training scripts etc will be shared in due course, but not bad for a first stab, eh?
	import torch
	from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel
	import json

	def generate_response_pipeline(pipeline, input_text, max_length=500):

	# you will need a free api key from Groq (nb, not GROK!!)
	# because groq gives access to certain models through a very fast infrastructure
	# go to https://console.groq.com/keys and sign up.
	# as of dec 2024 you do not need to provide credit card details or that sort of thing;
	# if you do, consult llm.datasette.io for alternative models you could use.

	!pip install llm
	!llm install llm-groq
	!llm keys set groq


	import subprocess
	import os

	def iterate_model_conversation(model1, model2, initial_prompt, output_folder, num_rounds=10):
	    """
	    Iterate a conversation between two models using an output folder.

	    :param model1: First model command (without -c and prompt)
	    :param model2: Second model command (without -c and prompt)

	import ollama
	import argparse

	def read_file(file):
	    """Read and return file contents, with error handling."""
	    try:
	        with open(file, 'r') as f:
	            return f.read()
	    except FileNotFoundError:
	        print(f"The file {file} could not be found.")