@malteos
malteos / deepseek-retrieval-evaluation.md
Created February 6, 2025 21:47
DeepSeek deep thinking about retrieval system evaluation with multiple annotators and low inter-rater agreement.

Prompt

In the context of an information retrieval system, I have human-annotated data in the form of queries paired with retrieved documents. The documents are labeled according to their relevance for the given query. Each document is annotated by two to three annotators. How can I evaluate two retrieval systems with this test data while accounting for low inter-rater agreement? Provide Python code as an example.

Thinking

Okay, so I need to figure out how to evaluate two retrieval systems when there's low inter-rater agreement in the annotations. Let's start by understanding the problem. The user has queries and retrieved documents, each document labeled by 2-3 annotators for relevance. But since the annotators don't agree much, traditional evaluation metrics like average precision or NDCG might not be reliable because they depend on consistent relevance labels.
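
One way to make the problem above concrete is sketched below. It is not the answer from the truncated trace, only a minimal illustration under stated assumptions: graded labels from the 2-3 annotators are averaged into a soft relevance score, each system is scored per query with NDCG@k, and the two systems are compared with a paired bootstrap over queries. All function names, the toy data, and the choice of mean aggregation are hypothetical.

```python
import math
import random
from statistics import mean


def aggregate_labels(annotations):
    """Average the annotator grades per (query, doc) into one soft relevance score.

    Assumption: mean aggregation; majority vote or a probabilistic
    annotator model would be alternatives.
    """
    return {
        q: {d: mean(grades) for d, grades in docs.items()}
        for q, docs in annotations.items()
    }


def ndcg_at_k(ranking, rel, k=10):
    """NDCG@k with linear gain; unjudged documents count as 0."""
    dcg = sum(rel.get(d, 0.0) / math.log2(i + 2) for i, d in enumerate(ranking[:k]))
    ideal = sorted(rel.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Mean per-query difference (A - B) and a 95% percentile bootstrap interval."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # Resample queries with replacement and record the mean difference each time.
    means = sorted(
        mean(rng.choice(diffs) for _ in diffs) for _ in range(n_resamples)
    )
    lo = means[int(0.025 * n_resamples)]
    hi = means[int(0.975 * n_resamples) - 1]
    return mean(diffs), (lo, hi)


# Toy data (hypothetical): grades 0-2 from two or three annotators per document.
annotations = {
    "q1": {"d1": [2, 1, 2], "d2": [0, 1], "d3": [0, 0]},
    "q2": {"d4": [1, 2], "d5": [0, 0, 1], "d6": [2, 2]},
}
run_a = {"q1": ["d1", "d2", "d3"], "q2": ["d6", "d4", "d5"]}
run_b = {"q1": ["d3", "d2", "d1"], "q2": ["d5", "d4", "d6"]}

rel = aggregate_labels(annotations)
queries = sorted(rel)
scores_a = [ndcg_at_k(run_a[q], rel[q]) for q in queries]
scores_b = [ndcg_at_k(run_b[q], rel[q]) for q in queries]
diff, ci = paired_bootstrap(scores_a, scores_b)
print(f"mean NDCG difference (A - B): {diff:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With only a handful of queries the interval is very coarse; the sketch is meant to show the shape of the comparison, not to give a reliable significance estimate.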

First, I should recall what inter-rater agreement means. Low agreement suggests that the relevance judgments are subjective or ambiguous. So, using a sin

@malteos
malteos / mteb_bm25.py
Created June 26, 2024 06:13
Run BM25 baseline on MTEB retrieval tasks
"""Evaluate BM25 on MTEB tasks
Usage:
python mteb_bm25.py -t <task name> --output_folder=./data/results
Notes:
- https://github.com/xhluca/bm25s (promising implementation)
- https://github.com/beir-cellar/beir/blob/main/examples/retrieval/evaluation/lexical/evaluate_bm25.py
- https://colab.research.google.com/drive/1HfutiEhHMJLXiWGT8pcipxT5L2TpYEdt?usp=sharing#scrollTo=nqotyXuIBPt6
@malteos
malteos / docker-without-desktop-macos.md
Created May 28, 2024 11:03
Run Docker (without Docker Desktop) on MacOS with Apple Silicon (M1/M2/...)

Run Docker (without Docker Desktop) on MacOS with Apple Silicon (M1/M2/...)

Docker Desktop requires an expensive license for commercial use: https://www.docker.com/pricing/faq/

# Install minikube
brew install minikube

# Install Docker CLI
brew install docker
#!/bin/bash
#SBATCH --job-name=oxw-bloom-1b7-twc-german
#SBATCH --ntasks-per-node=1 # crucial - only 1 task per dist per node!
#SBATCH --nodes=4
#SBATCH --gres=gpu:4 # ---> does not matter on JUWELS
#SBATCH --cpus-per-task=48 # number of cores per tasks
#SBATCH --hint=nomultithread # we get physical cores not logical
#SBATCH --time=0-12:00:00
#SBATCH --output=%j.%x.out
#SBATCH --partition=booster
# Copyright 2022 EleutherAI and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,

Connect via SSH to a Slurm compute job that runs as Enroot container

Being able to SSH directly into a compute job lets you use remote development tools, such as your IDE's debugger (VSCode, PyCharm, ...), for GPU jobs as well.

  • Slurm: Scheduling system that many HPC clusters use
  • Enroot: Container system like Docker for NVIDIA GPUs

General problem:

import argparse
import os
import torch
from transformers.models.auto import AutoModelForCausalLM
LAYER_FILE_PREFIX = 'layer_'
MODEL_FILE_PREFIX = 'model_'
EMBEDDING_LAYER_INDEX = 1
@malteos
malteos / letsencrypt-ssl-dns-docker.sh
Created December 2, 2018 12:38
Obtain Lets-Encrypt SSL Certificate via Docker DNS challenge
# Obtain Lets-Encrypt SSL Certificate via Docker DNS challenge
# adjust:
# - domains (-d foo.me)
mkdir letsencrypt_etc letsencrypt_var
docker run -it --rm --name certbot \
-v "./letsencrypt_etc:/etc/letsencrypt" \
-v "./letsencrypt_var:/var/lib/letsencrypt" \
certbot/certbot certonly -d foo.me -d "*.foo.me" --manual --preferred-challenges dns
@malteos
malteos / Mixxx_2deck_keyboard_mapping.kbd.cfg
Created April 20, 2017 13:24
Keyboard mapping for Mixxx DJ software. High-mid-low equalizer.
[AutoDJ]
[Master]
[VinylControl]
[PreviewDeck1]
[Channel1]
play y

Returns only 'Main Page'

curl -XPOST 'localhost:9200/wiki_content/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "_source": [
    "title"
  ],
  "query": {
    "bool": {
      "should": [