Perform coreference resolution on the text, replacing all mentions of the same entity with a consistent unique identifier. Prioritize resolving pronouns based on proximity and grammatical role, but consider the semantic context to avoid incorrect substitutions. Do not resolve 'it' if it refers to an implied or abstract concept (e.g., 'It is widely believed...').
RULES:
1. Identify all pronouns and noun phrases referring to the same real-world entity.
2. The first mention of an entity should retain its original descriptive text and be used for all subsequent mentions.
3. Preserve the original text structure and context.
4. Ensure replacements are consistent throughout the text.
5. Ensure that Organization names are not changed.
6. Replace 'they' with the appropriate entities.
7. Replace 'it' with the appropriate entity ONLY if 'it' clearly refers to a previously mentioned concrete noun phrase. Avoid replacing 'it' if it introduces a new concept or refers to a clause, proposition, or abstract idea.
8. Replace s
import mesa

class LetterAgent(mesa.Agent):
    def __init__(self, model):
        super().__init__(model)
        self.letters_sent = 0
        self.letters_received = 0

    def step(self):
        print(f"Hi, I am agent {self.unique_id}.")
%%capture
!python3 -m pip install git+https://github.com/p-lod/plodlib
!pip install requests_cache
!pip install rdflib
import plodlib
import json
import pandas as pd
from string import Template
import rdflib as rdf
Input Text:
This archive presents appendices B-I and supplementary material resulting from the programme of archaeological works undertaken during the construction scheme to widen the A1 trunk road between Dishforth and Leeming Bar in North Yorkshire. The Iron Age to early medieval evidence from Healam Bridge, along with other evidence for Roman activity along the route is published in two volumes
Extracted Entities:
Input Text:
This collection comprises images, spreadsheets, reports, vector graphics, and scanned site records and drawings from archaeological recording by Archaeological Research Services at Lower Radbourne Deserted Medieval Village, Warwickshire. The work was undertaken between April and December 2021. Area C32070 was dominated by intercutting features predominantly dated to two broad phases, prehistoric and medieval. The prehistoric features were represented by a large ring ditch, potentially dating to the Early Bronze Age, four smaller potential Bronze Age ring ditches and a series of inte
# and here we're going to load the models from huggingface and test them against the same output
# so you can see the difference that fine-tuning the original smol-135 makes
# this was trained on free-tier google colab for not very long on 800 rows of training data that I
# wrangled into correct shape. Training scripts etc will be shared in due course, but not bad for a first stab, eh?
import torch
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import json

def generate_response_pipeline(pipeline, input_text, max_length=500):
# you will need a free api key from Groq (nb, not GROK!!)
# because groq gives access to certain models through a very fast infrastructure
# go to https://console.groq.com/keys and sign up.
# as of dec 2024 you do not need to provide credit card details or that sort of thing;
# if you do, consult llm.datasette.io for alternative models you could use.
!pip install llm
!llm install llm-groq
!llm keys set groq
Using the schema at https://github.com/shawngraham/structured-cultural-heritage-crime/blob/main/prompt.py and the Gemini 1.5 Pro model with its 2 million token context window, I dropped in the 129 articles from Trafficking Culture that we annotated and analyzed, and had it generate the triplets with a single call to the model. The initial result was some 2000 triplets; after using Sublime Text to permute for unique lines, about 1000 remain when all is said and done.
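The de-duplication above was done by hand in Sublime Text. A minimal Python sketch of the same idea, assuming the triplets sit one per line in plain text, might look like this. The function name `unique_lines` and the sample rows are invented for illustration and are not taken from the actual output.

```python
def unique_lines(lines):
    """Return lines with exact duplicates removed, keeping first-seen order."""
    seen = set()
    kept = []
    for line in lines:
        key = line.strip()
        if not key or key in seen:
            continue  # skip blank lines and repeats
        seen.add(key)
        kept.append(key)
    return kept


# Hypothetical sample rows, not real model output.
sample = [
    "Person A,is_the_owner_of,Gallery B",
    "Person A,is_the_owner_of,Gallery B",
    "Person C,is_the_owner_of,Auction House D",
]
deduped = unique_lines(sample)
print(len(sample), "->", len(deduped))
```

This only catches exact repeats (after trimming whitespace); triplets that differ in spelling or entity naming would still need a manual pass.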
You are a knowledge graph triplet extractor. When given a text file, you return a properly formatted CSV with triplets with three columns subject,predicate,object in grammatical order. Only use the PREDEFINED RELATIONS LISTED BELOW and use the full name for each entity.
The predefined relations are:
'is_the_owner_of': Denotes a business relationship where an actor controls, or is the legal owner of, a business, gallery, auction house, or other for-profit organization.
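Since the prompt restricts the model to predefined relations, one way to check the returned CSV is to filter rows by predicate. The sketch below is an assumption about how that check could be done, not part of the original workflow: `split_valid_triplets` is a hypothetical helper, and `ALLOWED_PREDICATES` holds only the one relation visible in this excerpt, so the rest of the schema would need to be added.

```python
import csv
import io

# Only the relation visible in the prompt excerpt; extend with the full schema.
ALLOWED_PREDICATES = {"is_the_owner_of"}


def split_valid_triplets(csv_text, allowed=ALLOWED_PREDICATES):
    """Split CSV rows into (valid, rejected) by column count and predicate."""
    valid, rejected = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        row = [cell.strip() for cell in row]
        if not any(row):
            continue  # blank line
        if row == ["subject", "predicate", "object"]:
            continue  # header row
        if len(row) == 3 and all(row) and row[1] in allowed:
            valid.append(tuple(row))
        else:
            rejected.append(row)
    return valid, rejected


# Hypothetical rows for illustration only.
sample_csv = (
    "subject,predicate,object\n"
    "Person A,is_the_owner_of,Gallery B\n"
    "Person A,sold_to,Person C\n"
)
valid, rejected = split_valid_triplets(sample_csv)
print(len(valid), "valid;", len(rejected), "rejected")
```

Rejected rows are kept rather than dropped so they can be inspected for relations the model invented.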
hasLocation: Connect sites to their geographic coordinates or descriptive location
Example: Little John's Farm hasLocation Reading, Berkshire
hasGeology: Describe the underlying geological composition of a site
Example: Little John's Farm hasGeology "alluvial deposits overlying river gravels"
import subprocess
import os

def iterate_model_conversation(model1, model2, initial_prompt, output_folder, num_rounds=10):
    """
    Iterate a conversation between two models using an output folder.

    :param model1: First model command (without -c and prompt)
    :param model2: Second model command (without -c and prompt)
import ollama
import argparse

def read_file(file):
    """Read and return file contents, with error handling."""
    try:
        with open(file, 'r') as f:
            return f.read()
    except FileNotFoundError:
        print(f"The file {file} could not be found.")