Andrew Kornilov (frutik)
Machine Learning Scientist II - Content Intelligence
About the job
At Booking.com, data drives our decisions. Technology is at our core and innovation is everywhere. But our company is more than datasets, lines of code or A/B tests. We’re the thrill of the first night in a new place. The excitement of the next morning. The friends you make. The journeys you take. The sights you see. And the food you sample. Through our products, partners and people, we can empower everyone to experience the world.
The Content Intelligence team builds the Content Intelligence Platform by consuming millions of images and textual inputs every day and enriching them with ML capabilities. These enrichments then serve downstream applications and personalize our customers' experience (think of choosing and surfacing the right images and reviews when customers book their next vacation).
Moreover, the team plays a key role in building in-house LLMs for needs such as moderation, translation, and an AI trip-planner chatbot.
/**
* Levenshtein distance between two strings.
* - O(m*n) time
* - O(min(m,n)) memory
*/
function levenshtein(a, b) {
  if (a === b) return 0;
  if (!a) return b.length;
  if (!b) return a.length;
  if (a.length < b.length) [a, b] = [b, a]; // keep the DP row over the shorter string
  let prev = Array.from({ length: b.length + 1 }, (_, i) => i);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++)
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1));
    prev = cur;
  }
  return prev[b.length];
}
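For example, 'kitten' and 'sitting' differ by two substitutions and one insertion:

console.log(levenshtein('kitten', 'sitting')); // 3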
Performance implications of medium size values and TOAST in Postgres and how to mitigate them - https://pganalyze.com/blog/5mins-postgres-TOAST-performance
The Surprising Impact of Medium-Size Texts on PostgreSQL Performance. Why TOAST is the best thing since sliced bread - https://hakibenita.com/sql-medium-text-performance
Speed up your queries by avoiding to hit the TOAST - https://medium.com/@walttonm/speed-up-your-queries-by-avoiding-to-hit-the-toast-a71feaaeacc2
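One quick way to check whether medium-size values are actually landing in TOAST is to compare a table's main size with its TOAST relation size; a minimal sketch (the table name reviews is just an example):

SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid)) AS main_size,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_class c
WHERE c.relname = 'reviews'
  AND c.reltoastrelid <> 0;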
from mlx_lm import load, generate
model, tokenizer = load('Qwen/Qwen2-7B-Instruct-MLX', tokenizer_config={"eos_token": "<|im_end|>"})
prompt = "Why people call putin khuilo."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# render the chat template and generate a completion
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = generate(model, tokenizer, prompt=text, verbose=True)
frutik / hamming_dist.sql
Created November 20, 2024 20:03 — forked from MNoorFawi/hamming_dist.sql
Hamming Distance in PostgreSQL Database
CREATE OR REPLACE FUNCTION hamming_distance(
A0 bigint, A1 bigint, A2 bigint, A3 bigint,
B0 bigint, B1 bigint, B2 bigint, B3 bigint
)
RETURNS integer AS $$
BEGIN
    RETURN
        bits_count(A0 # B0) +
        bits_count(A1 # B1) +
        bits_count(A2 # B2) +
        bits_count(A3 # B3);
END;
$$ LANGUAGE plpgsql;
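The function relies on a bits_count() popcount helper that the preview does not show; one possible definition (the forked gist may define it differently):

CREATE OR REPLACE FUNCTION bits_count(value bigint)
RETURNS integer AS $$
BEGIN
    -- cast to a 64-bit string of 0s and 1s and count the 1s
    RETURN length(replace(value::bit(64)::text, '0', ''));
END;
$$ LANGUAGE plpgsql IMMUTABLE;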
frutik / bloated_indices
Last active October 22, 2024 20:48
Useful PostgreSQL queries for when your Postgres goes crazy
SELECT
c.relname AS index_name,
pg_size_pretty(pg_relation_size(c.oid)) AS index_size,
pg_size_pretty(pg_total_relation_size(c.oid) - pg_relation_size(c.oid)) AS index_bloat_size,
ROUND((pg_total_relation_size(c.oid) - pg_relation_size(c.oid)) / pg_relation_size(c.oid)::numeric * 100, 2) AS bloat_percentage
FROM
pg_class c
JOIN
pg_namespace n ON c.relnamespace = n.oid
WHERE
    c.relkind = 'i'
    AND n.nspname NOT IN ('pg_catalog', 'information_schema')
    AND pg_relation_size(c.oid) > 0
ORDER BY
    bloat_percentage DESC;
import onnxruntime as ort
from transformers import AutoTokenizer
session = ort.InferenceSession('./bge-small-en/model.onnx')
tokenizer = AutoTokenizer.from_pretrained("./bge-small-en")
inputs = tokenizer("hello world.", padding="longest", return_tensors="np")
inputs_onnx = {key: ort.OrtValue.ortvalue_from_numpy(value) for key, value in inputs.items()}
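The preview stops before the actual inference call. A minimal sketch of running the session and pooling the output, assuming the exported bge-small-en model accepts the tokenizer's input names and returns the last hidden state as its first output:

# Plain numpy inputs work with session.run(); the OrtValue dict above would go through run_with_ort_values() instead.
outputs = session.run(None, dict(inputs))
last_hidden_state = outputs[0]        # shape: (batch, seq_len, hidden)
embedding = last_hidden_state[:, 0]   # CLS-token pooling, the usual choice for BGE models
print(embedding.shape)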
https://brandur.org/fragments/postgres-partitioning-2022
https://pganalyze.com/blog/postgresql-partitioning-django
https://django-postgres-extra.readthedocs.io/en/master/table_partitioning.html
https://hevodata.com/learn/postgresql-partitions/
https://www.2ndquadrant.com/en/blog/postgresql-12-foreign-keys-and-partitioned-tables/
https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-LIMITATIONS
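For reference, the declarative partitioning these links cover boils down to a parent table with a partition key plus per-range children; a minimal sketch (table and column names are made up for illustration):

CREATE TABLE events (
    id         bigserial,
    created_at timestamptz NOT NULL,
    payload    jsonb,
    PRIMARY KEY (id, created_at)  -- unique constraints must include the partition key
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_q4 PARTITION OF events
    FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');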
def divide_chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]

a = 'First sentence. Second sentence. Third sentence. Fourth sentence.'  # placeholder text; the original sample string is not shown
b = [i.strip() for i in a.split('.') if i.strip()]
c = list(divide_chunks(b, 3))
d = ['. '.join(i + ['']).strip() for i in c]
y = '\n\n'.join(d)
print(y)
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" -v=8 | python -m json.tool