| Machine Learning Scientist II - Content Intelligence | |
| About the job | |
| At Booking.com, data drives our decisions. Technology is at our core and innovation is everywhere. But our company is more than datasets, lines of code or A/B tests. We’re the thrill of the first night in a new place. The excitement of the next morning. The friends you make. The journeys you take. The sights you see. And the food you sample. Through our products, partners and people, we can empower everyone to experience the world. | |
| The Content Intelligence team builds the Content Intelligence Platform by consuming millions of images and textual inputs every day and enriching them with ML capabilities. These enriched inputs will eventually serve downstream applications and personalize our customers' experience (for example, choosing and surfacing the right images and reviews when customers book their next vacation). | |
| Moreover, the team plays a key role in building in-house LLMs for needs such as moderation, translation, and an AI trip planner chatbot. |
| /** | |
| * Levenshtein distance between two strings. | |
| * - O(m*n) time | |
| * - O(min(m,n)) memory | |
| */ | |
| function levenshtein(a, b) { | |
| if (a === b) return 0; | |
| if (!a) return b.length; | |
| if (!b) return a.length; |
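The preview above is cut off after the early-exit checks. As a rough illustration of how the stated bounds (O(m*n) time, O(min(m,n)) memory) can be met, here is a minimal Python sketch of the two-row dynamic-programming approach. It is not the author's remaining code, which is not shown; the variable names and the choice to swap the strings so the shorter one defines the row length are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with two rolling rows (assumed approach, not the original code)."""
    # Same early exits as the preview above.
    if a == b:
        return 0
    if not a:
        return len(b)
    if not b:
        return len(a)

    # Make b the shorter string so each row holds min(m, n) + 1 entries.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] is the distance between the first i-1 chars of a and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]
```

Only two rows are alive at any time, which is where the memory bound comes from; the nested loops still visit every (i, j) pair once.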
| Performance implications of medium size values and TOAST in Postgres and how to mitigate them - https://pganalyze.com/blog/5mins-postgres-TOAST-performance | |
| The Surprising Impact of Medium-Size Texts on PostgreSQL Performance. Why TOAST is the best thing since sliced bread - https://hakibenita.com/sql-medium-text-performance | |
| Speed up your queries by avoiding to hit the TOAST - https://medium.com/@walttonm/speed-up-your-queries-by-avoiding-to-hit-the-toast-a71feaaeacc2 |
| from mlx_lm import load, generate | |
| model, tokenizer = load('Qwen/Qwen2-7B-Instruct-MLX', tokenizer_config={"eos_token": "<|im_end|>"}) | |
| prompt = "Why people call putin khuilo." | |
| messages = [ | |
| {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."}, | |
| {"role": "user", "content": prompt} | |
| ] | |
| text = tokenizer.apply_chat_template( |
| CREATE OR REPLACE FUNCTION hamming_distance( | |
| A0 bigint, A1 bigint, A2 bigint, A3 bigint, | |
| B0 bigint, B1 bigint, B2 bigint, B3 bigint | |
| ) | |
| RETURNS integer AS $$ | |
| BEGIN | |
| RETURN | |
| bits_count(A0 # B0) + | |
| bits_count(A1 # B1) + | |
| bits_count(A2 # B2) + |
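The SQL preview is truncated, but the visible part suggests the idea: a 256-bit hash is stored as four 64-bit words, each pair is XORed (`#` in Postgres), and the set bits are summed. The same computation can be sketched in Python as below. This is an illustration under assumptions: `bits_count` is taken to be a user-defined popcount helper that is not shown in the source, and the mask is added here because Postgres `bigint` values are signed while Python integers are unbounded.

```python
MASK64 = (1 << 64) - 1  # keep XOR results within 64 bits, mirroring a signed bigint's bit pattern


def hamming_distance(a_words, b_words):
    """Count differing bits between two hashes given as equal-length sequences of 64-bit words."""
    if len(a_words) != len(b_words):
        raise ValueError("both hashes must have the same number of words")
    # XOR marks the bit positions that differ; counting the 1s gives the distance per word.
    return sum(bin((x ^ y) & MASK64).count("1") for x, y in zip(a_words, b_words))
```

For example, `hamming_distance([0, 0, 0, 0], [1, 3, 0, -1])` counts 1 + 2 + 0 + 64 differing bits, since -1 as a 64-bit word has every bit set.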
| SELECT | |
| c.relname AS index_name, | |
| pg_size_pretty(pg_relation_size(c.oid)) AS index_size, | |
| pg_size_pretty(pg_total_relation_size(c.oid) - pg_relation_size(c.oid)) AS index_bloat_size, | |
| ROUND((pg_total_relation_size(c.oid) - pg_relation_size(c.oid)) / NULLIF(pg_relation_size(c.oid), 0)::numeric * 100, 2) AS bloat_percentage | |
| FROM | |
| pg_class c | |
| JOIN | |
| pg_namespace n ON c.relnamespace = n.oid | |
| WHERE |
| import onnxruntime as ort | |
| from transformers import AutoTokenizer | |
| session = ort.InferenceSession('./bge-small-en/model.onnx') | |
| tokenizer = AutoTokenizer.from_pretrained("./bge-small-en") | |
| inputs = tokenizer("hello world.", padding="longest", return_tensors="np") | |
| inputs_onnx = {key: ort.OrtValue.ortvalue_from_numpy(value) for key, value in inputs.items()} |
| https://brandur.org/fragments/postgres-partitioning-2022 | |
| https://pganalyze.com/blog/postgresql-partitioning-django | |
| https://django-postgres-extra.readthedocs.io/en/master/table_partitioning.html | |
| https://hevodata.com/learn/postgresql-partitions/ | |
| https://www.2ndquadrant.com/en/blog/postgresql-12-foreign-keys-and-partitioned-tables/ | |
| https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-LIMITATIONS |
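| def divide_chunks(l, n): | |
|     # Yield successive n-sized chunks from l. | |
|     for i in range(0, len(l), n): | |
|         yield l[i:i + n] | |
| a = '...'  # input text; the original value is garbled in the source | |
| b = [i.strip() for i in a.split('.') if i.strip()] | |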
| c = list(divide_chunks(b, 3)) | |
| d = ['. '.join(i + ['']).strip() for i in c] | |
| y = '\n\n'.join(d) | |
| print(y) |
| kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" -v=8 | python -m json.tool |