@mvdbeek
mvdbeek / sorted-pondering-seahorse.md
Created April 1, 2026 08:56
Plan: Harden API parameter validation for galaxyproject/galaxy (#22341-#22350)

Harden API Parameter Validation — 10 Sentry Issues

Context

The 10 most recent GitHub issues filed on galaxyproject/galaxy (#22341–#22350) all stem from malformed or malicious API inputs that bypass validation and crash deep inside application code, producing 500 errors logged to Sentry. Each of these should be caught at the API boundary and returned as a 4xx response.
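
As a rough illustration of rejecting bad input before it reaches application code, the sketch below validates a date parameter up front. It is not Galaxy's implementation: `RequestParameterInvalid` and `parse_date_param` are hypothetical names, and mapping `status_code` to an HTTP response is assumed to happen in the surrounding API layer.

```python
from datetime import datetime


class RequestParameterInvalid(ValueError):
    """Hypothetical error type that an API layer would map to HTTP 400."""

    status_code = 400


def parse_date_param(name: str, raw: object) -> datetime:
    """Validate a date parameter before it reaches application code."""
    if not isinstance(raw, str):
        # Reject bytes, lists, and anything else that is not text.
        raise RequestParameterInvalid(
            f"{name!r} must be a string, got {type(raw).__name__}"
        )
    try:
        return datetime.fromisoformat(raw)
    except ValueError as exc:
        # Convert the low-level parse failure into a client error.
        raise RequestParameterInvalid(
            f"{name!r} is not an ISO 8601 date: {raw!r}"
        ) from exc
```

With this shape, an input such as `b'file'` fails with a client error at the edge instead of surfacing as an unhandled `ValueError` further down.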


Fix 1 — #22350: ValueError parsing date b'file' in history controller

@mvdbeek
mvdbeek / diagnosis.md
Last active March 30, 2026 14:13
Root cause analysis: flaky test_swift_objectstore composite_output (galaxyproject/galaxy#21226)

Root Cause Analysis: Flaky test_swift_objectstore.py::test_tools[composite_output]

Issue: galaxyproject/galaxy#21226
Test: test/integration/objectstore/test_swift_objectstore.py::test_tools[composite_output]

Symptoms

The test transiently fails in two ways:

  1. IncompleteRead: IncompleteRead(5242880 bytes read, 24576 more expected) — the server sets Content-Length for the full file but the connection closes after exactly 5 MB (80 × 64 KB chunks).
@mvdbeek
mvdbeek / issue_21230_investigation.md
Created March 28, 2026 16:59
Investigation: flaky test_search_delete_hdca_output (#21230) — array_agg ORDER BY silently dropped by SQLAlchemy

Investigation: Flaky test_search_delete_hdca_output (Issue #21230)

Summary

func.array_agg(column, order_by=column) in SQLAlchemy silently drops the order_by keyword argument, generating array_agg(path_sig) instead of array_agg(path_sig ORDER BY path_sig). This means the HDCA signature arrays used for job search equivalence checking are unordered, causing non-deterministic comparison results on PostgreSQL.
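
To see why a dropped ORDER BY matters for equivalence checking, here is a small plain-Python sketch. It does not use SQLAlchemy or Galaxy code, and the path signature values are made up; it only shows that an order-sensitive comparison of two arrays with identical elements can fail unless both sides are put into a canonical order first.

```python
from typing import Sequence


def naive_match(a: Sequence[str], b: Sequence[str]) -> bool:
    """Order-sensitive comparison, as happens when arrays arrive unordered."""
    return list(a) == list(b)


def ordered_match(a: Sequence[str], b: Sequence[str]) -> bool:
    """Compare after sorting, mimicking what ORDER BY inside array_agg ensures."""
    return sorted(a) == sorted(b)


# The same set of hypothetical path signatures, returned in two row orders.
first_run = ["forward:1", "reverse:2", "forward:3"]
second_run = ["reverse:2", "forward:3", "forward:1"]

print(naive_match(first_run, second_run))    # order-dependent result
print(ordered_match(first_run, second_run))  # stable regardless of row order
```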

Root Cause

In lib/galaxy/managers/jobs.py, the agg_expression method:

@mvdbeek
mvdbeek / test_n_plus_one_workflow_download.py
Created March 26, 2026 15:05
Test demonstrating N+1 fix for workflow download (GALAXY-MAIN-14JQ)
"""Demonstrate the N+1 fix for workflow download (GALAXY-MAIN-14JQ).
Creates a workflow with many steps and connections, then measures
the number of SQL statements emitted when accessing input_connections
with and without the selectinload fix.
"""
import logging
import threading
import time
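
The measurement idea described in the docstring can be sketched with the standard library alone. This is an analogy rather than the gist's actual test: it uses `sqlite3` instead of SQLAlchemy, the `step` and `connection` tables are hypothetical, and the batched loader only imitates the single `IN (...)` query that `selectinload` is known for.

```python
import sqlite3


def build_db(n_steps: int) -> sqlite3.Connection:
    """Create an in-memory database with n_steps steps, two connections each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE step (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE connection (id INTEGER PRIMARY KEY, step_id INTEGER)")
    ids = range(1, n_steps + 1)
    conn.executemany("INSERT INTO step (id) VALUES (?)", [(i,) for i in ids])
    conn.executemany(
        "INSERT INTO connection (step_id) VALUES (?)",
        [(i,) for i in ids for _ in range(2)],
    )
    conn.commit()
    return conn


def count_selects(conn, loader):
    """Run loader(conn) and count the SELECT statements it emits."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        result = loader(conn)
    finally:
        conn.set_trace_callback(None)
    selects = sum(1 for s in seen if s.lstrip().upper().startswith("SELECT"))
    return result, selects


def load_lazy(conn):
    """N+1 pattern: one query for the steps, then one more per step."""
    steps = [r[0] for r in conn.execute("SELECT id FROM step ORDER BY id")]
    return {
        s: [r[0] for r in conn.execute(
            "SELECT id FROM connection WHERE step_id = ? ORDER BY id", (s,))]
        for s in steps
    }


def load_batched(conn):
    """Batched pattern: one query for the steps, one for all connections."""
    steps = [r[0] for r in conn.execute("SELECT id FROM step ORDER BY id")]
    loaded = {s: [] for s in steps}
    marks = ",".join("?" * len(steps))
    query = f"SELECT id, step_id FROM connection WHERE step_id IN ({marks}) ORDER BY id"
    for conn_id, step_id in conn.execute(query, steps):
        loaded[step_id].append(conn_id)
    return loaded


db = build_db(50)
lazy_result, lazy_count = count_selects(db, load_lazy)
batched_result, batched_count = count_selects(db, load_batched)
print(lazy_count, batched_count)
```

Both loaders return the same mapping; only the number of statements differs, which is the quantity an N+1 regression test would assert on.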
@mvdbeek
mvdbeek / test_url_builder_benchmark.py
Created March 20, 2026 13:39
Benchmark: UrlBuilder.url_path_for caching — 87x speedup for history contents serialization
"""Benchmark for UrlBuilder.url_path_for caching.
Demonstrates the performance difference when serializing N history items,
each requiring up to 2 url_path_for calls. Without caching, each call
does a linear scan through all registered routes. With caching, only the
first call per route name scans; subsequent calls go directly to the
matching route.
"""
import sys
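
The caching idea in the docstring can be sketched as follows. This is a simplified stand-in, not Galaxy's `UrlBuilder`: the `Route` and `CachingUrlBuilder` classes, the route names, and the path templates are all hypothetical, and the `scans` counter exists only to make the effect observable.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Route:
    name: str
    path_template: str


class CachingUrlBuilder:
    """Resolve route names to paths, scanning the route list once per name."""

    def __init__(self, routes: List[Route]) -> None:
        self._routes = routes
        self._cache: Dict[str, Route] = {}
        self.scans = 0  # number of linear scans performed

    def _find_route(self, name: str) -> Route:
        # Linear scan over all registered routes (the uncached cost).
        self.scans += 1
        for route in self._routes:
            if route.name == name:
                return route
        raise KeyError(name)

    def url_path_for(self, name: str, **path_params: str) -> str:
        route = self._cache.get(name)
        if route is None:
            route = self._find_route(name)
            self._cache[name] = route
        return route.path_template.format(**path_params)


routes = [Route(f"route_{i}", f"/api/r{i}/{{item_id}}") for i in range(500)]
builder = CachingUrlBuilder(routes)
paths = [builder.url_path_for("route_499", item_id=str(n)) for n in range(1000)]
print(builder.scans, paths[0])
```

Only the first lookup per route name pays for the scan; later calls are a dictionary hit followed by string formatting.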
@mvdbeek
mvdbeek / nested_collection_query_planner.md
Last active March 14, 2026 09:28
PostgreSQL query planner catastrophe with dataset_collection_element self-joins in Galaxy

PostgreSQL Query Planner Catastrophe: dataset_collection_element Self-Joins

Summary

Nested dataset collection queries in Galaxy degrade from ~8ms to 843 seconds when PostgreSQL's planner chooses hash/merge joins for dataset_collection_element self-joins. The fix: replace DCE-to-DCE joins with nested ARRAY(subquery) expressions.

Root Cause

@mvdbeek
mvdbeek / bwa_mem2_schema.json
Created March 9, 2026 10:47
BWA-MEM2 Galaxy tool parameter request schema
{
"$defs": {
"BatchDataInstance": {
"additionalProperties": false,
"properties": {
"src": {
"enum": [
"hda",
"ldda",
"hdca"
@mvdbeek
mvdbeek / plan.md
Created March 3, 2026 17:45
Plan: Fix Interrupted Celery set_meta Causes Stuck Non-Terminal Jobs (#20186)


Problem Analysis

When metadata_strategy: directory_celery (or celery_extended) is configured, if the Celery process is interrupted (OOM killed, process restart, etc.) while executing a set_job_metadata task, jobs become permanently stuck in a non-terminal state (running) with no recovery mechanism.

Root Cause

The handler blocks forever on .get() when a worker dies.
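
One general way to avoid waiting forever on a result that will never arrive is to bound the wait and treat expiry as a failure. The sketch below shows that pattern with the standard library's `concurrent.futures.Future` standing in for a task result; it is not the plan's actual fix, `wait_for_task` is a hypothetical helper, and the choice of timeout is an assumption.

```python
from concurrent.futures import Future, TimeoutError as FutureTimeoutError
from typing import Any, Tuple


def wait_for_task(result: Future, timeout_s: float) -> Tuple[str, Any]:
    """Wait for a task result, but give up after timeout_s seconds."""
    try:
        return "ok", result.result(timeout=timeout_s)
    except FutureTimeoutError:
        # The worker may have died; report an error state so the caller
        # can move the job to a terminal state instead of hanging.
        return "error", None


# A result that is never set, simulating a worker killed mid-task.
lost = Future()
print(wait_for_task(lost, timeout_s=0.05))

# A result that completed normally.
finished = Future()
finished.set_result({"metadata": "set"})
print(wait_for_task(finished, timeout_s=0.05))
```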

@mvdbeek
mvdbeek / ISSUE_21642.md
Created February 26, 2026 12:58
Triage artifacts for Galaxy issue #21642 - Remote data fetch not respecting quota

Issue #21642: Fetching data from repositories does not seem to respect storage quota

State: OPEN
Author: martenson
Labels: area/backend, area/jobs, kind/bug
Assignees: mvdbeek
Comments: 1

Description

@mvdbeek
mvdbeek / migration.diff
Created February 24, 2026 10:23
26.0 db migrations
diff --git a/lib/galaxy/model/migrations/alembic/env.py b/lib/galaxy/model/migrations/alembic/env.py
index 98091912b80..c9db41cc84d 100644
--- a/lib/galaxy/model/migrations/alembic/env.py
+++ b/lib/galaxy/model/migrations/alembic/env.py
@@ -1,7 +1,7 @@
import logging
import re
+from collections.abc import Callable
from typing import (
- Callable,