The last 10 GitHub issues filed on galaxyproject/galaxy (#22341–#22350) all stem from malformed or malicious API inputs that bypass validation and crash deep inside application code, producing 500 errors that are logged to Sentry. Each of these inputs should be rejected at the API boundary and returned as a 4xx response.
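As a rough illustration of rejecting bad input at the boundary, the sketch below validates one query parameter before any application code runs. The names BadRequest, parse_limit and handle are hypothetical and are not Galaxy APIs; a real handler would rely on the framework's own validation and error mapping.

```python
class BadRequest(Exception):
    """Hypothetical error type that an API layer would map to HTTP 400."""

    status_code = 400


def parse_limit(raw, maximum=1000):
    """Validate a 'limit' query parameter before it reaches application code."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise BadRequest(f"limit must be an integer, got {raw!r}")
    if not 0 < value <= maximum:
        raise BadRequest(f"limit must be between 1 and {maximum}, got {value}")
    return value


def handle(raw_limit):
    """Hypothetical handler: returns (status, message) instead of crashing."""
    try:
        limit = parse_limit(raw_limit)
    except BadRequest as exc:
        # Malformed input is reported as a client error, not a 500.
        return exc.status_code, str(exc)
    return 200, f"returning up to {limit} items"
```

The point of the design is that the deeper code only ever sees a validated integer, so malformed input cannot surface as an unhandled exception.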
Issue: galaxyproject/galaxy#21226
Test: test/integration/objectstore/test_swift_objectstore.py::test_tools[composite_output]
The test transiently fails in two ways:
- IncompleteRead: IncompleteRead(5242880 bytes read, 24576 more expected). The server sets Content-Length for the full file, but the connection closes after exactly 5 MB (80 × 64 KB chunks).
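One common way to tolerate a truncated response like this is to retry the read. The sketch below is an assumption for illustration and is not taken from the issue or the test: read_with_retry and make_flaky_fetch are hypothetical helpers, and the fake fetch simulates a body cut off after 80 × 64 KB.

```python
import http.client


def read_with_retry(fetch, attempts=3):
    """Call fetch() until it returns a full body or the attempts run out.

    fetch is a hypothetical zero-argument callable that returns bytes.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except http.client.IncompleteRead as exc:
            # The connection closed before Content-Length bytes arrived.
            last_error = exc
    raise last_error


def make_flaky_fetch(failures, body):
    """Build a fake fetch that is truncated `failures` times, then succeeds."""
    state = {"remaining": failures}

    def fetch():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            partial = body[: 80 * 64 * 1024]  # cut off after 5 MB
            raise http.client.IncompleteRead(partial, len(body) - len(partial))
        return body

    return fetch


FULL_SIZE = 5242880 + 24576  # bytes read plus bytes still expected
payload = read_with_retry(make_flaky_fetch(1, b"x" * FULL_SIZE))
```

Retrying only hides the symptom. It does not explain why the connection closes at the 5 MB mark.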
func.array_agg(column, order_by=column) in SQLAlchemy silently drops the order_by keyword argument, generating array_agg(path_sig) instead of array_agg(path_sig ORDER BY path_sig). This means the HDCA signature arrays used for job search equivalence checking are unordered, causing non-deterministic comparison results on PostgreSQL.
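On PostgreSQL, SQLAlchemy provides aggregate_order_by for placing an ORDER BY inside an aggregate call. The sketch below shows that construct on a hypothetical signatures table. It is not necessarily the change made in Galaxy, and the table definition and ordered_agg helper are assumptions for illustration.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, func, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import aggregate_order_by

metadata = MetaData()

# Hypothetical table standing in for whatever the real query selects from.
signatures = Table(
    "signatures",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("path_sig", String),
)


def ordered_agg(column):
    """Aggregate a column into an array, ordered by the column itself."""
    return func.array_agg(aggregate_order_by(column, column))


stmt = select(ordered_agg(signatures.c.path_sig))

# Compile against the PostgreSQL dialect to inspect the generated SQL.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

Compiling the statement like this is also a cheap way to check in a unit test that the ORDER BY clause actually reaches the generated SQL.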
In lib/galaxy/managers/jobs.py, the agg_expression method:
"""Demonstrate the N+1 fix for workflow download (GALAXY-MAIN-14JQ).

Creates a workflow with many steps and connections, then measures
the number of SQL statements emitted when accessing input_connections
with and without the selectinload fix.
"""
import logging
import threading
import time
"""Benchmark for UrlBuilder.url_path_for caching.

Demonstrates the performance difference when serializing N history items,
each requiring up to 2 url_path_for calls. Without caching, each call
does a linear scan through all registered routes. With caching, only the
first call per route name scans; subsequent calls go directly to the
matching route.
"""
import sys
Nested dataset collection queries in Galaxy degrade from ~8ms to 843 seconds
when PostgreSQL's planner chooses hash/merge joins for dataset_collection_element
self-joins. The fix: replace DCE-to-DCE joins with nested ARRAY(subquery) expressions.
{
  "$defs": {
    "BatchDataInstance": {
      "additionalProperties": false,
      "properties": {
        "src": {
          "enum": [
            "hda",
            "ldda",
            "hdca"
When metadata_strategy: directory_celery (or celery_extended) is configured, if the Celery process is interrupted (OOM killed, process restart, etc.) while executing a set_job_metadata task, jobs become permanently stuck in a non-terminal state (running) with no recovery mechanism.
The handler blocks forever on .get() when a worker dies.
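A bounded wait is one way to avoid blocking forever. Celery's AsyncResult.get accepts a timeout argument, but the sketch below uses concurrent.futures.Future as a standard-library stand-in so it runs without a broker. The wait_for_metadata helper and the "ok"/"error" labels are hypothetical, not Galaxy code.

```python
from concurrent.futures import Future
from concurrent.futures import TimeoutError as FutureTimeout


def wait_for_metadata(result, timeout_seconds):
    """Wait a bounded time for a task result instead of blocking forever.

    Returns ("ok", value) on success, or ("error", None) if the result never
    arrives, so the caller can move the job to a terminal state.
    """
    try:
        return "ok", result.result(timeout=timeout_seconds)
    except FutureTimeout:
        return "error", None


# A result that is never fulfilled simulates a worker that died mid-task.
dead_worker = Future()
print(wait_for_metadata(dead_worker, timeout_seconds=0.05))

# A fulfilled result simulates a task that completed normally.
finished = Future()
finished.set_result("metadata written")
print(wait_for_metadata(finished, timeout_seconds=0.05))
```

A timeout alone cannot tell a dead worker from a slow one, so a production version would likely need a generous limit or a separate liveness check.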
diff --git a/lib/galaxy/model/migrations/alembic/env.py b/lib/galaxy/model/migrations/alembic/env.py
index 98091912b80..c9db41cc84d 100644
--- a/lib/galaxy/model/migrations/alembic/env.py
+++ b/lib/galaxy/model/migrations/alembic/env.py
@@ -1,7 +1,7 @@
 import logging
 import re
+from collections.abc import Callable
 from typing import (
-    Callable,