import mesa

class LetterAgent(mesa.Agent):
    def __init__(self, model):
        super().__init__(model)
        self.letters_sent = 0
        self.letters_received = 0

    def step(self):
        print(f"Hi, I am agent {self.unique_id}.")
@shawngraham
shawngraham / search.py
Created January 6, 2025 13:51
get images by motif from p-lod
%%capture
!python3 -m pip install git+https://github.com/p-lod/plodlib
!pip install requests_cache
!pip install rdflib
import plodlib
import json
import pandas as pd
from string import Template
import rdflib as rdf
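The imports above hint at the approach: build a parameterised SPARQL query with `string.Template`, then run it against P-LOD via `rdflib`. A minimal sketch of the templating step only; the predicate URI and query shape here are illustrative assumptions, not p-lod's actual vocabulary.

```python
from string import Template

# Hypothetical query skeleton: the predicate URI is an illustrative
# stand-in, not p-lod's actual schema.
MOTIF_QUERY = Template("""SELECT ?image WHERE {
  ?image <urn:p-lod:id:depicts> <$motif_uri> .
}""")

def build_motif_query(motif_uri):
    """Substitute the motif URI into the SPARQL skeleton."""
    return MOTIF_QUERY.substitute(motif_uri=motif_uri)

print(build_motif_query("urn:p-lod:id:ariadne"))
```

The substituted string would then be sent to the P-LOD endpoint via `rdflib` or `requests`.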
@shawngraham
shawngraham / results.txt
Created December 20, 2024 19:22
retraining ModernBERT to identify archaeological metadata
Input Text:
This archive presents appendices B-I and supplementary material resulting from the programme of archaeological works undertaken during the construction scheme to widen the A1 trunk road between Dishforth and Leeming Bar in North Yorkshire. The Iron Age to early medieval evidence from Healam Bridge, along with other evidence for Roman activity along the route is published in two volumes
Extracted Entities:
Input Text:
This collection comprises images, spreadsheets, reports, vector graphics, and scanned site records and drawings from archaeological recording by Archaeological Research Services at Lower Radbourne Deserted Medieval Village, Warwickshire. The work was undertaken between April and December 2021. Area C32070 was dominated by intercutting features predominantly dated to two broad phases, prehistoric and medieval. The prehistoric features were represented by a large ring ditch, potentially dating to the Early Bronze Age, four smaller potential Bronze Age ring ditches and a series of inte
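For context, a token-classification model's raw output is usually BIO tags that get aggregated into entity spans like those shown above. A minimal sketch of that aggregation, with invented tags and labels (not ModernBERT's actual predictions):

```python
def bio_to_entities(tokens, tags):
    """Collapse BIO-tagged tokens into (entity_text, label) spans."""
    entities, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:  # an "O" tag closes any open span
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities

# Illustrative tags only -- not the model's real label set.
tokens = ["Healam", "Bridge", "is", "in", "North", "Yorkshire"]
tags = ["B-SITE", "I-SITE", "O", "O", "B-LOC", "I-LOC"]
print(bio_to_entities(tokens, tags))
# → [('Healam Bridge', 'SITE'), ('North Yorkshire', 'LOC')]
```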
@shawngraham
shawngraham / comparison.py
Last active December 20, 2024 17:05
I finetuned smol-135 on archaeological metadata from ADS reports to make an archae metadata extractor. This code snippet passes a row of downloaded data (where all the fields have been smooshed together, i.e. dirty data!) through smol-135 AND my finetuned version so you can see the difference.
# Here we load the models from Hugging Face and test them against the same input,
# so you can see the difference that fine-tuning the original smol-135 makes.
# This was trained on free-tier Google Colab, for not very long, on 800 rows of
# training data that I wrangled into the correct shape. Training scripts etc. will
# be shared in due course, but not bad for a first stab, eh?
import torch
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import json

def generate_response_pipeline(pipe, input_text, max_length=500):
    # parameter renamed from `pipeline` so it doesn't shadow the transformers import
    output = pipe(input_text, max_length=max_length)
    return output[0]["generated_text"]
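The comparison idea can be sketched with stand-in callables: run the same smooshed-together row through each generator and collect the outputs side by side. The lambdas below are placeholders for the base and fine-tuned pipelines, and the row text is invented.

```python
def compare_models(generators, input_text):
    """Run the same dirty-metadata row through each named generator and collect outputs."""
    results = {}
    for name, generate in generators.items():
        results[name] = generate(input_text)
        print(f"--- {name} ---\n{results[name]}\n")
    return results

# Placeholder callables; in practice these would wrap the HF pipelines.
row = "Lower Radbourne DMV, Warwickshire, 2021, ring ditch, Bronze Age"
out = compare_models(
    {"smol-135": lambda t: "(base model output)",
     "smol-135-finetuned": lambda t: "(fine-tuned output)"},
    row,
)
```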
@shawngraham
shawngraham / 0-install-what-you-need.py
Created December 18, 2024 19:42
run these two code blocks in colab.research.google.com
# You will need a free API key from Groq (NB: Groq, not Grok!)
# because Groq gives access to certain models through very fast infrastructure.
# Go to https://console.groq.com/keys and sign up.
# As of Dec 2024 you do not need to provide credit card details or that sort of thing;
# if that changes, consult llm.datasette.io for alternative models you could use.
!pip install llm
!llm install llm-groq
!llm keys set groq
@shawngraham
shawngraham / prompt.md
Created December 12, 2024 16:49
a prompt for coreference resolution. Seems to work well with gemini-1.5-pro

Perform coreference resolution on the text, replacing all mentions of the same entity with a consistent unique identifier. Prioritize resolving pronouns based on proximity and grammatical role, but consider the semantic context to avoid incorrect substitutions. Do not resolve 'it' if it refers to an implied or abstract concept (e.g., 'It is widely believed...'). RULES: 1. Identify all pronouns and noun phrases referring to the same real-world entity. 2. The first mention of an entity should retain its original descriptive text and be used for all subsequent mentions. 3. Preserve the original text structure and context. 4. Ensure replacements are consistent throughout the text. 5. Ensure that Organization names are not changed. 6. Replace 'they' with the appropriate entities. 7. Replace 'it' with the appropriate entity ONLY if 'it' clearly refers to a previously mentioned concrete noun phrase. Avoid replacing 'it' if it introduces a new concept or refers to a clause, proposition, or abstract idea. 8. Replace s
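One way to use the prompt is to prepend it to the passage and send the result as a Gemini generateContent request. A sketch of the payload shape only; the endpoint, API key, and full rule text are omitted, and the request structure is an assumption to check against the Gemini REST docs.

```python
import json

# The full rule text above is elided here for brevity.
COREF_PROMPT = "Perform coreference resolution on the text, ..."

def build_request(text):
    """Assemble a generateContent-style payload: the rules plus the passage to resolve."""
    return {"contents": [{"parts": [{"text": f"{COREF_PROMPT}\n\nTEXT:\n{text}"}]}]}

print(json.dumps(build_request("Graham founded the lab. He directs it."), indent=2))
```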

@shawngraham
shawngraham / 1.prompt.md
Last active December 7, 2024 00:42
2 million tokens is a lot

Using the schema at https://github.com/shawngraham/structured-cultural-heritage-crime/blob/main/prompt.py and the Gemini 1.5 Pro model with its 2-million-token context window, I dropped in the 129 articles from Trafficking Culture that we annotated and analyzed, and had it generate the triplets with a single call to the model. The initial result was some 2000 triplets; after using Sublime Text's Permute Lines → Unique to deduplicate, about 1000 remained when all was said and done.

You are a knowledge graph triplet extractor. When given a text file, you return a properly formatted CSV with triplets with three columns subject,predicate,object in grammatical order. Only use the PREDEFINED RELATIONS LISTED BELOW and use the full name for each entity.

The predefined relations are:

'is_the_owner_of': Denotes a business relationship where an actor controls, or is the legal owner of, a business, gallery, auction house, or other for-profit organization.

'hasLocation': Connects sites to their geographic coordinates or descriptive location.

Example: Little John's Farm hasLocation Reading, Berkshire

'hasGeology': Describes the underlying geological composition of a site.

Example: Little John's Farm hasGeology "alluvial deposits overlying river gravels"
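The dedup step (Permute Lines → Unique in Sublime Text) can equally be done in a few lines of Python. A sketch that parses the model's subject,predicate,object CSV and keeps first occurrences, using example rows from the schema above:

```python
import csv
import io

def unique_triplets(csv_text):
    """Parse subject,predicate,object rows and drop exact duplicates, keeping order."""
    seen, out = set(), []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) == 3 and tuple(row) not in seen:
            seen.add(tuple(row))
            out.append(tuple(row))
    return out

sample = """Little John's Farm,hasLocation,"Reading, Berkshire"
Little John's Farm,hasLocation,"Reading, Berkshire"
Little John's Farm,hasGeology,alluvial deposits overlying river gravels"""
print(unique_triplets(sample))
```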

@shawngraham
shawngraham / 0.conversation.py
Last active November 25, 2024 15:34
What happens when Gemma and Llama talk to each other about Canadian history? Let's find out. Uses llm.datasette.io and the groq plugin.
import subprocess
import os

def iterate_model_conversation(model1, model2, initial_prompt, output_folder, num_rounds=10):
    """
    Iterate a conversation between two models using an output folder.

    :param model1: First model command (without -c and prompt)
    :param model2: Second model command (without -c and prompt)
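The gist preview stops before the loop itself, but the turn-taking logic can be sketched with the stdlib alone. Here a `run` callable stands in for the subprocess calls to `llm`; the fake runner and model names are placeholders.

```python
def alternate_conversation(run, model1, model2, initial_prompt, num_rounds=3):
    """Alternate a prompt between two models; each reply becomes the next prompt."""
    transcript, prompt = [], initial_prompt
    for round_num in range(num_rounds):
        speaker = model1 if round_num % 2 == 0 else model2
        reply = run(speaker, prompt)
        transcript.append((speaker, reply))
        prompt = reply  # feed the answer back to the other model
    return transcript

# Stand-in runner; the real version shells out to `llm` via subprocess.
fake_run = lambda model, prompt: f"{model} says something about: {prompt[:20]}"
log = alternate_conversation(fake_run, "gemma", "llama", "Who was Laura Secord?")
```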
@shawngraham
shawngraham / 1-nuextractor.py
Last active November 21, 2024 15:48
using ollama and nuextract-tiny1.5 (via https://ollama.com/sroecker/nuextract-tiny-v1.5/) to extract structured text.
import ollama
import argparse

def read_file(file):
    """Read and return file contents, with error handling."""
    try:
        with open(file, 'r') as f:
            return f.read()
    except FileNotFoundError:
        print(f"The file {file} could not be found.")
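Once the file is read, the text needs wrapping in NuExtract's prompt layout, which pairs a JSON schema template with the source text. A sketch of that assembly; the `<|input|>` / `### Template:` / `<|output|>` markers and the schema fields are my reading of the NuExtract model card, so verify them before relying on this.

```python
import json

def build_nuextract_prompt(template, text):
    """Wrap a JSON schema and the source text in NuExtract's expected prompt layout."""
    return (
        "<|input|>\n### Template:\n"
        + json.dumps(template, indent=4)
        + "\n### Text:\n"
        + text
        + "\n<|output|>"
    )

# Hypothetical schema; the real fields depend on what you want extracted.
schema = {"site_name": "", "period": "", "features": []}
prompt = build_nuextract_prompt(schema, "A large ring ditch, potentially Early Bronze Age.")
print(prompt)
```

The resulting string would then be passed to `ollama.generate()` with the nuextract-tiny model named in the description above.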