Benjamin Trent (benwtrent)

benwtrent / helpers.py
Created January 31, 2023 16:09
Some helpers for loading and testing BEIR data sets with Elasticsearch
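from pathlib import Path

import loaders  # local helper module (not shown in this preview)
from sentence_transformers import SentenceTransformer

# Load queries, qrels, etc. and create embeddings for the queries
queries = loaders.load_jsonl(jsonl_path=Path("./data/queries.jsonl"))
# model_id is not defined in this preview; device="mps" targets Apple silicon GPUs
embedding_model = SentenceTransformer(model_id, device="mps")
query_embeddings = embedding_model.encode([d['text'] for d in queries])
# encode() returns a NumPy array; convert it to plain lists so it can be serialized
query_embeddings = query_embeddings.tolist()
# Attach each embedding to its query dict under the "embedding" key
query_and_embeddings = [dict(item, embedding=embedding) for item, embedding in zip(queries, query_embeddings)]
qrels = loaders.load_beir_qrels(qrels_file=Path("./data/qrels/test.tsv"))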
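The `loaders` module is not included in this preview. As a hedged sketch, assuming the standard BEIR file layout (`queries.jsonl` with one JSON object per line carrying `_id` and `text`, and a tab-separated `qrels/test.tsv` whose header is `query-id`, `corpus-id`, `score`), hypothetical stand-ins for `load_jsonl` and `load_beir_qrels` could look like this. The `recall_at_k` helper is likewise hypothetical: a simple per-query recall averaged over queries, not BEIR's own evaluator.

```python
import csv
import json
from pathlib import Path


def load_jsonl(jsonl_path: Path) -> list:
    """Hypothetical stand-in for loaders.load_jsonl: one JSON object per non-empty line."""
    with jsonl_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_beir_qrels(qrels_file: Path) -> dict:
    """Hypothetical stand-in for loaders.load_beir_qrels.

    Assumes the BEIR TSV layout: a header row, then query-id, corpus-id, score.
    Returns {query_id: {doc_id: relevance}}.
    """
    qrels = {}
    with qrels_file.open(encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader, None)  # skip the "query-id corpus-id score" header
        for row in reader:
            if len(row) < 3:
                continue  # tolerate blank or malformed lines
            query_id, doc_id, score = row[:3]
            qrels.setdefault(query_id, {})[doc_id] = int(score)
    return qrels


def recall_at_k(qrels: dict, results: dict, k: int = 10) -> float:
    """Mean fraction of relevant docs (score > 0) found in each query's top-k results."""
    per_query = []
    for query_id, judged in qrels.items():
        relevant = {doc_id for doc_id, score in judged.items() if score > 0}
        if not relevant:
            continue  # queries without relevant docs carry no recall signal
        top_k = set(results.get(query_id, [])[:k])
        per_query.append(len(relevant & top_k) / len(relevant))
    return sum(per_query) / len(per_query) if per_query else 0.0
```

For example, if query `q1` has two relevant documents and one of them is retrieved, while `q2` has its single relevant document retrieved, the mean recall is (0.5 + 1.0) / 2 = 0.75.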
benwtrent / rallyrun0.txt
Last active November 18, 2022 23:42
Rally runs with new Lucene build
------------------------------------------------------
Final Score
------------------------------------------------------
| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
|--------------------------------------------------------------:|---------------------------------------------:|---------------:|---------------:|-------------:|-------:|---------:|
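Only the header of the comparison table survives in this preview. A hedged reading of its last columns, assuming `Diff` is contender minus baseline and `Diff %` is that difference relative to the baseline (verify against the Rally version that produced the report), is sketched below; `compare` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def compare(baseline: float, contender: float) -> Tuple[float, Optional[float]]:
    """Return (diff, diff_percent) for one row of a baseline/contender comparison.

    Assumption: diff = contender - baseline, and diff_percent expresses that
    difference relative to the baseline. The percentage is undefined (None)
    when the baseline is zero.
    """
    diff = contender - baseline
    if baseline == 0:
        return diff, None
    return diff, diff / baseline * 100.0


# Whether a positive diff is good depends on the metric: higher is better for
# throughput, lower is better for latency or service time.
```

For instance, a baseline of 200 and a contender of 250 give a diff of 50 and a diff of 25%.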
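package org.apache.pylucene.codecs;

import org.apache.lucene.codecs.lucene95.Lucene95Codec;
import org.apache.lucene.codecs.KnnVectorsFormat;

// Lucene 9.5 codec subclass that PyLucene (JCC) lets Python code extend
public class PyLucene95Codec extends Lucene95Codec {
    // Handle to the Python object backing this Java instance
    private long pythonObject;
    public void pythonExtension(long pythonObject) {
        this.pythonObject = pythonObject;
    }
    // ... remainder of the class not shown in this preview
}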
benwtrent / PyLucene94Codec.java
Created November 10, 2022 14:31
PyLucene94Codec extension
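package org.apache.pylucene.codecs;

import org.apache.lucene.codecs.lucene94.Lucene94Codec;
import org.apache.lucene.codecs.KnnVectorsFormat;

// Lucene 9.4 codec subclass that PyLucene (JCC) lets Python code extend
public class PyLucene94Codec extends Lucene94Codec {
    // Handle to the Python object backing this Java instance
    private long pythonObject;
    public void pythonExtension(long pythonObject) {
        this.pythonObject = pythonObject;
    }
    // ... remainder of the class not shown in this preview
}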
benwtrent / lucenepyknn.py
Created October 18, 2022 19:27
ANN-Benchmarks integration using Lucene KNN.
"""
ann-benchmarks interface for Apache Lucene.
"""
import sklearn.preprocessing
import numpy as np
from struct import Struct
import lucene
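The imports hint at how the integration prepares vectors. As a hedged sketch, assuming the angular (cosine) metric: scaling every vector to unit L2 length (what `sklearn.preprocessing.normalize` does by default) turns cosine similarity into a plain dot product, and an exact brute-force top-k gives the true neighbours that approximate results are scored against. Packing vectors into little-endian float32 bytes with `struct.Struct` is an assumption about why the original imports it. Pure Python keeps the example dependency-free; all function names are hypothetical.

```python
import math
from struct import Struct


def l2_normalize(vec: list) -> list:
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def exact_top_k(query: list, data: list, k: int) -> list:
    """Brute-force k nearest neighbours by cosine similarity (indices, best first)."""
    q = l2_normalize(query)
    # On unit vectors the dot product equals the cosine similarity
    scores = [sum(a * b for a, b in zip(q, l2_normalize(v))) for v in data]
    return sorted(range(len(data)), key=lambda i: scores[i], reverse=True)[:k]


def pack_float32(vec: list) -> bytes:
    """Pack a vector as little-endian float32 bytes (hypothetical use of Struct)."""
    return Struct(f"<{len(vec)}f").pack(*vec)


def recall(found: list, truth: list) -> float:
    """Fraction of the true neighbours that the approximate search returned."""
    return len(set(found) & set(truth)) / len(truth)
```

Normalizing up front also matters on the Lucene side: its `DOT_PRODUCT` vector similarity expects both document and query vectors to be unit length.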
benwtrent / EntropyBenchmark.java
Created November 10, 2020 16:39
Benchmark for calculating the entropy of a string.
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
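The benchmark body is not shown here. Assuming "entropy" means Shannon entropy over the string's character frequencies, H = Σ p · log2(1/p), a minimal sketch of the quantity being benchmarked follows. The original is Java with JMH; this only illustrates the calculation, and `shannon_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character; 0.0 for the empty string."""
    if not s:
        return 0.0
    n = len(s)
    # Sum p * log2(1/p) over the frequency of each distinct character
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())
```

For example, `"aabb"` yields 1.0 bit per character and `"abcd"` yields 2.0, while a string of one repeated character yields 0.0.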
benwtrent / datafeed_config.json
Last active October 15, 2020 12:53
Calculating the percent true and using it in a job
{
  "indices" : [
    "kibana_sample_data_flights"
  ],
  "query" : {
    "bool" : {
      "must" : [
        {
          "match_all" : { }
        }
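Only the start of the datafeed is visible. As a hedged illustration of the quantity named in the title, the percent true of a boolean field over a group of documents is the number of `true` values divided by the number of documents that have the field, times 100. The field `Cancelled` and the per-`Carrier` grouping used in the usage example are assumptions about the Kibana flights sample data, not details taken from the original datafeed; `percent_true` is a hypothetical helper.

```python
from collections import defaultdict


def percent_true(docs: list, field: str, group_by: str) -> dict:
    """Percentage of docs per group where `field` is True.

    Docs missing `field` are skipped, so they count towards neither the
    numerator nor the denominator.
    """
    true_counts = defaultdict(int)
    totals = defaultdict(int)
    for doc in docs:
        if field not in doc:
            continue
        key = doc[group_by]
        totals[key] += 1
        if doc[field] is True:
            true_counts[key] += 1
    return {key: 100.0 * true_counts[key] / totals[key] for key in totals}
```

A per-bucket value like this is the kind of metric an anomaly detection job could then model, for example with a `mean` or `high_mean` detector.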
PUT users/_mapping
{
  "properties": {
    "geo.location.point": {
      "type": "geo_point"
    }
  }
}
POST users/_update_by_query
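The `_update_by_query` body is not shown in this preview. Whatever script populates `geo.location.point`, the value has to be in a format Elasticsearch accepts for a `geo_point` field. The hedged sketch below builds four of those formats from separate latitude and longitude values (`geo_point_formats` is a hypothetical helper) and highlights the easy-to-miss ordering difference: the object and string forms are latitude first, while the array and WKT forms are longitude first.

```python
def geo_point_formats(lat: float, lon: float) -> dict:
    """Return one point in several formats accepted by an Elasticsearch geo_point field."""
    if not -90.0 <= lat <= 90.0 or not -180.0 <= lon <= 180.0:
        raise ValueError(f"out-of-range coordinates: lat={lat}, lon={lon}")
    return {
        "object": {"lat": lat, "lon": lon},  # explicit keys
        "string": f"{lat},{lon}",            # "lat,lon": latitude first
        "array": [lon, lat],                 # GeoJSON order: longitude first
        "wkt": f"POINT ({lon} {lat})",       # WKT is also longitude first
    }
```

Any one of these values can be stored in the mapped field; the array form is the one most often written with the coordinates swapped.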
benwtrent / es-commands.es
Created June 26, 2020 11:39
Helpful actions for detecting anomalous behavior by geo area
###
# transform definition
###
{
  "id" : "count_by_grid_15",
  "source" : {
    "index" : [
      "gtfs*"
    ],
    "query" : {
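The transform's `pivot` is cut off here. Assuming `count_by_grid_15` groups documents into `geotile_grid` cells at precision (zoom) 15 and counts them, the sketch below reproduces that grouping offline: it maps each point to a `zoom/x/y` Web Mercator tile key, the same key shape `geotile_grid` uses, and counts points per tile. The zoom level and the helper names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

MAX_LAT = 85.0511287798066  # latitude limit of the Web Mercator projection


def tile_key(lat: float, lon: float, zoom: int) -> str:
    """Map tile key "zoom/x/y" for a point (Web Mercator, origin at top-left)."""
    n = 1 << zoom  # number of tiles along each axis at this zoom
    lat_rad = math.radians(max(-MAX_LAT, min(MAX_LAT, lat)))
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp edge cases (lon == 180, lat at the limit) into the last valid tile
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return f"{zoom}/{x}/{y}"


def count_by_tile(points: Iterable[Tuple[float, float]], zoom: int = 15) -> Counter:
    """Count (lat, lon) points per tile, mirroring a count grouped by grid cell."""
    return Counter(tile_key(lat, lon, zoom) for lat, lon in points)
```

Higher zoom levels give smaller cells: each step doubles the number of tiles along both axes, so at zoom 15 there are 32,768 tiles per axis.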
package org.elasticsearch.benchmark.ml;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;