@bdelacretaz
Last active November 8, 2024 13:57
Client-side similarity search using the orama vector database
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<style>
.error {
font-weight: bold;
color: red;
}
.output {
background-color: beige;
padding: 1em;
}
</style>
</head>
<body>
<h2>Client-side similarity search</h2>
<p>This example uses the Orama client-side vector DB and @xenova/transformers to compute embeddings.</p>
<p>Model: <b id="model"></b></p>
<p><em id="progress">Loading...</em></p>
<h2>What's this?</h2>
<p>
This page simulates dynamic article recommendations for someone reading articles on a news website.
</p>
<p>
The recommendations are computed locally in the browser based on the title of the article
that the user is currently reading, provided as input to the similarity search.
</p>
<p>
For now we use a static list of article titles, in the code of this page.
</p>
<input disabled id="run" type="submit" value="Run another recommendation">
<h3>Recommendations</h3>
<p>Input title: <b id="input"></b></p>
<ul class="output" id="response"></ul>
<script type="module">
import { create, insertMultiple, search } from 'https://cdn.jsdelivr.net/npm/@orama/orama@latest/+esm';
import { pipeline } from "https://cdn.jsdelivr.net/npm/@xenova/transformers";
import { faker } from 'https://esm.sh/@faker-js/faker';
// TODO use BBC news feeds as a data source, https://www.bbc.co.uk/news/10628494
const bbc = [
"'A humanitarian at his core'",
"'Biggest injustice in history' - Brazil reacts to Ballon d'Or result",
"'Helping a human is better than scoring a goal' - Adebayor",
"'I can’t afford to leave home on £1,500 a month – here’s what I need from the Budget'",
"'My farm was destroyed by drought then floods - I am confused'",
"'Never in a million years did we think our baby could have a stroke'",
"'We are poisoning ourselves': Ghana gold rush sparks environmental disaster",
"'You can't show weakness' - why African leaders maintain secrecy around their health",
"A party in power for 58 years pledges change for Botswana",
"A student of Mourinho - is 'crowd pleaser' Amorim right for Man Utd?",
"Africa providing 'incredible talent' for NBA",
"African POTY debate as Boniface & Salah miss out",
"Alex Salmond laid to rest in funeral near family home",
"Alex Salmond: Champion of independence leaves a fractured political legacy",
"All aboard the sparkling railway breaking new ground for East Africa",
"And what could Harris/Trump do differently?",
"Assisted dying could lead to coercion - Streeting",
"Assisted dying could lead to coercion, says Streeting",
"Asylum seekers moved off Bibby Stockholm barge",
"At least 93 killed and missing in Israeli strike on Gaza, health ministry says",
"Attack on Chad military base kills at least 40 soldiers",
"BBC reporter: They ransacked my home and left my town in ruins",
"Badenoch says she may need a softer approach",
"Ballon d'Or nominee Lookman hunts for greatness",
"Bethell earns first England Test call-up for NZ tour",
"Bibby Stockholm: Asylum seekers moved off barge",
"Blue skies and bitter tears: Africa's top shots",
"Bolivian government denies attempt to kill Evo Morales",
"Born in France but searching for a future in Africa",
"Boulter makes light work of Hong Kong opener",
"Boy fell ill after former spy helped him feed ducks",
"Brides-for-cash suspects arrested in South Africa",
"Budget watchdog will break impartiality, claims Hunt",
"Budget: 'I can't afford to leave home on £1,500 a month'",
"Bus fares to rise to £3 in England under new cap",
"Calls for MP to resign role over fuel payment vote",
"Can Rachel Reeves use her defining Budget to escape UK's 'doom loop'?",
"Challenged on death of Dawn Sturgess, Russia’s ambassador appears to laugh and dismisses inquiry",
"Chancellor sets out new funding for extra NHS appointments",
"Children saved from car stuck in path of oncoming train",
"Conservative leadership: How does the contest work and who chooses the winner?",
"Conservative leadership: Who are the candidates?",
"Deadly drugs found in fake anxiety medicines being bought online in UK",
"Dream wins and nightmares for Labour: Starmer's 100 days in power",
"Emotional Ngannou stops Ferreira in first round on MMA return",
"Employers' National Insurance hike to raise £20bn",
"End of hereditary peers moves one step closer",
"England's Slade and Spencer start against All Blacks",
"Faisal Islam: Tax-raising Budget will affect you for years",
"Former Bolivian president shares 'assassination attempt' video",
"Former Tory MP reprimanded for sexual misconduct",
"Former Trump aide Steve Bannon released from jail",
"Former Trump aide Steve Bannon to host podcast after prison release",
"Harris or Trump: How UK is preparing for new US president",
"Henry Zeffman: How a LinkedIn post sparked a transatlantic row",
"Hezbollah announces Naim Qassem as new leader",
"Hotel collapse in Argentina kills one, say reports",
"How a communist from the Tata family became one of Britain's first Asian MPs",
"How a cult leader 'radicalised' group into coroner kidnap attempt",
"Huge Australian 'fiasco' ship to be mothballed in Edinburgh",
"I knew I could have a stroke, but not at 31",
"I'm open-minded on England smacking ban - Phillipson",
"Inside a hospital on the front line of Sudan’s hunger crisis",
"Is Elon Musk’s Starlink a game changer for Africa?",
"Japan’s politics gets a rare dose of upheaval after snap election",
"Just Stop Oil activists given London protest ban",
"Kemi Badenoch: Political scrapper set on 'governing right'",
"Labour suspends MP after video appears to show him punching man",
"Lebanon says 60 killed in Israel strikes on eastern valley",
"Libya to appeal over Afcon sanctions linked to Nigeria boycott",
"Long and winding road that took a recording console from Beatles session to the skip",
"Lost Chopin waltz unearthed after almost 200 years",
"Man's brain tumour shrinks by half in therapy trial",
"Mount Fuji remains snowless for longer than ever before",
"Mount Fuji still without snow in late October - longest wait in 130 years",
"Mpox - what we know... and what we don't",
"New Zealand win their first Women's T20 World Cup title with a comprehensive win over South Africa in Dubai.",
"Nigerian MP apologises after viral taxi slapping video",
"No new freeports in Budget after 'comms cock-up'",
"No tax rises in payslips for 'working people', vows minister",
"PhD student finds lost city in Mexico jungle by accident",
"Philippines' Duterte admits to drug war 'death squad'",
"Putin gathers allies to show West's pressure isn’t working",
"Rabada takes 300th Test wicket - how does he compare?",
"Rachel Reeves - Labour's first chancellor for 14 years",
"Rachel Reeves: Playing Labour's first big gambit",
"Rachel Reeves: Playing Labour's very close first big gambit",
"Reeves pledges £1.4bn for 'crumbling' classrooms",
"Risking death to smuggle alcohol past Somali bandits and Islamist fighters",
"Robert Jenrick: In a hurry to fix the Tory brand",
"Sammy Wilson admits meetings with Sinn Féin",
"Sanctions for Russian disinformation linked to Kate rumours",
"Senior Malawi politician accused of plotting to kill president",
"Sir Keir Starmer: Working people know who they are",
"South Africa beaten by New Zealand in T20 World Cup final",
"South Africa beaten 32:0 by New Zealand in World Cup final",
"South Africa government split over Ukraine visa deal",
"South Africa unity government split over Ukraine visa deal",
"Southport murders accused facing terror charge",
"Speaker rebukes Reeves for Budget comments in US",
"Sporting confirm Man Utd interest in Amorim",
"Sporting confirm Man Utd strong interest in Amorim's team",
"Starmer: MP Mike Amesbury CCTV footage 'shocking'",
"Tariffs hurt his business. He's voting for Trump anyway",
"Tax rises needed to avert austerity, Starmer says",
"The brothers breaking freestyle football world records",
"The man battling Nigeria’s 'witch-hunters'",
"The man lined up to be Kenya's next deputy president",
"He will likely be Kenya's next deputy president",
"They're tough-talking and on the Tory right - but how do Badenoch and Jenrick differ?",
"TikTok founder becomes China's richest man",
"Tommy Robinson Jailed",
"Tommy Robinson jailed for contempt of court",
"Tommy Robinson jailed for contempt of court and other issues",
"Too loyal? Too stubborn? Questions for Ineos after Ten Hag sacking",
"Watch: Ballot drop box set on fire in Washington state",
"We asked Puerto Ricans about 'island of garbage' joke. Here's what they said",
"We need answers, says family of murdered MP David Amess",
"What a discovered lost Maya city might have looked like",
"What form could reparations for slavery take?",
"What is a 'working person', according to Labour",
"What is assisted dying and could the law change?",
"What satellite images reveal about Israel's strikes on Iran",
"What the US election outcome means for Ukraine, Gaza and world conflict",
"What time is the Budget and what could be in it?",
"When is the Budget and what might be in it?",
"Why an A road with 'charm' has been voted UK's best - and which others made top 10?",
"Why are we building homes when so many are standing empty?",
"Why is Turkey deepening its ties with Somalia?",
"Why the King can't say 'sorry' for slavery",
"Why the next Tory leader needs to go Cornish",
"Worry over toxic Delhi air as pollution worsens",
"Would raising employer National Insurance break Labour's pledge?",
"Zambia mourns seven footballers killed in bus crash",
"Zimbabwe set new T20 world record in Gambia win",
];
function setInfo(id, text, cssClass) {
id = id ? id : 'progress';
console.log(id, text);
const e = document.getElementById(id);
if (e == null) {
throw Error("Cannot find label " + id);
}
if (cssClass) {
e.classList.add(cssClass);
}
e.innerText = text;
}
function el(name, text, parent) {
const e = document.createElement(name);
if(text) {
e.textContent = text;
}
if(parent) {
parent.append(e);
}
return e;
}
// Optional URL parameters:
// k = number of source titles to keep, m = variants generated per title, s = minimum similarity
const params = new URLSearchParams(window.location.search);
const pKeep = params.get('k');
const pMult = params.get('m');
const pSim = params.get('s');
const titles = [];
const multiply = pMult ? Number(pMult) : 3;
const keepOnly = pKeep ? Number(pKeep) : bbc.length;
const sources = bbc.slice(0, keepOnly);
setInfo(null, `Generating ${sources.length * multiply} titles...`);
for (const t of sources) {
//titles.push(t);
for (let i = 0; i < multiply; i++) {
titles.push(`${t} - ${faker.person.firstName()} ${faker.person.lastName()} interviewed in ${faker.location.city()}`);
}
}
setInfo(null, `${titles.length} titles generated.`);
/*
// Variant with randomly generated titles (replaces the 'titles' declaration above)
let titles = [];
const nTitles = 50;
setInfo(null, `Generating ${nTitles} titles`);
for(var i=0; i<nTitles; i++) {
titles.push(faker.helpers.fake(`
{{person.firstName}} {{person.lastName}},
arriving from {{location.country}},
talks {{music.genre}}
with {{music.artist}}
in {{location.city}}.
`));
}
*/
function randomTitle() {
// Math.floor keeps the index within 0..titles.length-1, even when Math.random() returns 0
return titles[Math.floor(Math.random() * titles.length)];
}
// https://huggingface.co/Xenova has a list of available models
const model = {
name: 'Xenova/flan-t5-small'
}
setInfo('model', model.name);
const pipePromise = pipeline('feature-extraction', model.name);
async function getEmbeddingFromText(text) {
const pipe = await pipePromise;
const output = await pipe(text, {
pooling: "mean",
normalize: true,
});
return Array.from(output.data);
}
const refEmbed = await getEmbeddingFromText('nothing');
model.dimensions = refEmbed.length;
setInfo(null, `${model.name} has ${model.dimensions} dimensions`);
let db;
async function main() {
setInfo(null, 'Initializing DB...');
// await is harmless if create() is synchronous in the loaded Orama version
db = await create({
schema: {
title: 'string',
embedding: `vector[${model.dimensions}]`,
},
});
setInfo(null, `DB created, loading ${titles.length} titles...`);
const data = [];
setInfo(null, `Computing ${titles.length} embeddings...`);
for (const title of titles) {
data.push({ title, embedding: await getEmbeddingFromText(title) });
}
setInfo(null, `Loading ${titles.length} titles...`);
await insertMultiple(db, data);
setInfo(null, `${titles.length} titles loaded`);
// Awaited so that errors reach the catch() attached to main()
await recommend();
}
async function recommend() {
const input = randomTitle();
// Minimum similarity for a hit, overridable with the 's' URL parameter
const minimumSimilarity = pSim ? Number(pSim) : 0.8;
const response = document.querySelector('#response');
response.innerHTML = '';
setInfo('input', `(${minimumSimilarity}) ${input}`);
// await is harmless if search() is synchronous in the loaded Orama version
const results = await search(db, {
mode: 'vector',
vector: {
value: await getEmbeddingFromText(input),
property: 'embedding',
},
similarity: minimumSimilarity,
// includeVectors 'true' will return the embeddings in the response (which can be very large).
includeVectors: true,
});
const scoreFmt = new Intl.NumberFormat('en-IN', { maximumSignificantDigits: 3 });
// The input title is itself stored in the DB, so exclude it from the recommendations
const hits = results.hits.filter(r => r.document.title !== input);
if (hits.length === 0) {
el('li', 'No results', response);
} else {
for (const r of hits) {
el('li', `(${scoreFmt.format(r.score)}) ${r.document.title}`, response);
}
}
document.querySelector('#run').removeAttribute('disabled');
}
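main()
.catch(e => { setInfo(null, e, 'error'); });
// Report errors from later runs as well, not only from the initial one
document.querySelector('#run').addEventListener('click', () => {
recommend().catch(e => { setInfo(null, e, 'error'); });
});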
</script>
</body>
</html>
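
The `s` URL parameter in the page above sets the minimum similarity passed to the vector search. As a rough illustration of what such a threshold does, the sketch below ranks a few hand-made vectors against a query and drops those under the threshold. It assumes the score behaves like cosine similarity. `cosineSimilarity` and `bruteForceSearch` are hypothetical helper names, not part of Orama's API, and the two-dimensional vectors stand in for real embeddings.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// For vectors already normalized to length 1 this equals their dot product.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vector lengths differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction, treat it as dissimilar
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical brute-force search: score every document, keep those at or
// above the threshold, and return the best `limit` hits, highest score first.
function bruteForceSearch(docs, queryVector, minimumSimilarity = 0.8, limit = 10) {
  return docs
    .map(d => ({ title: d.title, score: cosineSimilarity(d.embedding, queryVector) }))
    .filter(hit => hit.score >= minimumSimilarity)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy data: made-up 2-D "embeddings" for illustration only.
const docs = [
  { title: "a", embedding: [1, 0] },
  { title: "b", embedding: [0.6, 0.8] },
  { title: "c", embedding: [0, 1] },
];

const hits = bruteForceSearch(docs, [1, 0], 0.5);
console.log(hits.map(h => h.title)); // titles in order: "a", "b"
```

A real vector database avoids recomputing norms on every query and may use an index instead of scanning every document. The linear scan here is only meant to make the effect of the threshold visible: raising it to 0.8 in this example would leave only "a".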