@TomAugspurger
Created November 7, 2025 13:13
(cudf-polars-bench) root@gpu-h100-0468:/app# for query in {1..22}; do
> python -m cudf_polars.experimental.benchmarks.pdsh \
> --executor="streaming" \
> --runtime="rapidsmpf" \
> --path="/data/tpch-rs/scale-1000" \
> --suffix="" \
> --stream-policy="pool" \
> --n-workers 1 \
> --no-print-results \
> --no-summarize \
> --iterations=2 \
> --rmm-async \
> --blocksize 2_000_000_000 \
> -o /data/profiles/rapidsmpf-sf1k-q$query.ndjson \
> "${query}"
> done
Query 1 - Iteration 0 finished in 29.8400s
Exception in callback Future.set_result(None)()
handle: <Handle Future.set_result(None)()>
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 255, in runner
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 684, in _wrap_awaitable
return await awaitable
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 174, in run
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/io.py", line 401, in scan_node
await asyncio.gather(*tasks)
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/io.py", line 382, in _producer
await read_chunk(
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/io.py", line 250, in read_chunk
df = await asyncio.to_thread(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/dsl/ir.py", line 787, in do_evaluate
chunk = reader.read_chunk()
^^^^^^^^^^^^^^^^^^^
File "pylibcudf/io/parquet.pyx", line 366, in pylibcudf.io.parquet.ChunkedParquetReader.read_chunk
File "pylibcudf/io/parquet.pyx", line 385, in pylibcudf.io.parquet.ChunkedParquetReader.read_chunk
MemoryError: std::bad_alloc: out_of_memory: CUDA error (failed to allocate 1041233288 bytes) at: /tmp/pip-build-env-6sp37l_w/normal/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_async_view_memory_resource.hpp:87: cudaErrorMemoryAllocation out of memory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
asyncio.exceptions.InvalidStateError: invalid state
Exception in callback Future.set_result(None)()
handle: <Handle Future.set_result(None)()>
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 255, in runner
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 684, in _wrap_awaitable
return await awaitable
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 174, in run
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/io.py", line 401, in scan_node
await asyncio.gather(*tasks)
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/io.py", line 382, in _producer
await read_chunk(
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/io.py", line 250, in read_chunk
df = await asyncio.to_thread(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/dsl/ir.py", line 787, in do_evaluate
chunk = reader.read_chunk()
^^^^^^^^^^^^^^^^^^^
File "pylibcudf/io/parquet.pyx", line 366, in pylibcudf.io.parquet.ChunkedParquetReader.read_chunk
File "pylibcudf/io/parquet.pyx", line 385, in pylibcudf.io.parquet.ChunkedParquetReader.read_chunk
MemoryError: std::bad_alloc: out_of_memory: CUDA error (failed to allocate 1041233288 bytes) at: /tmp/pip-build-env-6sp37l_w/normal/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_async_view_memory_resource.hpp:87: cudaErrorMemoryAllocation out of memory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
asyncio.exceptions.InvalidStateError: invalid state
[gpu-h100-0468:634117:0:634628] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x64)
Segmentation fault (core dumped)
Query 2 - Iteration 0 finished in 3.4007s
Query 2 - Iteration 1 finished in 0.6447s
Query 3 - Iteration 0 finished in 15.5810s
Query 3 - Iteration 1 finished in 6.8035s
Query 4 - Iteration 0 finished in 27.1455s
Exception in callback Future.set_result(None)()
handle: <Handle Future.set_result(None)()>
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 255, in runner
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 684, in _wrap_awaitable
return await awaitable
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 174, in run
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/io.py", line 401, in scan_node
await asyncio.gather(*tasks)
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/io.py", line 382, in _producer
await read_chunk(
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/io.py", line 250, in read_chunk
df = await asyncio.to_thread(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/dsl/ir.py", line 787, in do_evaluate
chunk = reader.read_chunk()
^^^^^^^^^^^^^^^^^^^
File "pylibcudf/io/parquet.pyx", line 366, in pylibcudf.io.parquet.ChunkedParquetReader.read_chunk
File "pylibcudf/io/parquet.pyx", line 385, in pylibcudf.io.parquet.ChunkedParquetReader.read_chunk
MemoryError: std::bad_alloc: out_of_memory: CUDA error (failed to allocate 1728253400 bytes) at: /tmp/pip-build-env-6sp37l_w/normal/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_async_view_memory_resource.hpp:87: cudaErrorMemoryAllocation out of memory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
asyncio.exceptions.InvalidStateError: invalid state
[gpu-h100-0468:635669:0:636156] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x64)
Segmentation fault (core dumped)
Query 5 - Iteration 0 finished in 15.4438s
Query 5 - Iteration 1 finished in 7.6821s
Query 6 - Iteration 0 finished in 2.7370s
Query 6 - Iteration 1 finished in 2.5784s
Query 7 - Iteration 0 finished in 9.9120s
Query 7 - Iteration 1 finished in 9.0191s
Query 8 - Iteration 0 finished in 20.5095s
Query 8 - Iteration 1 finished in 8.3281s
Query 9 - Iteration 0 finished in 22.3638s
Query 9 - Iteration 1 finished in 20.4289s
Query 10 - Iteration 0 finished in 13.6276s
Query 10 - Iteration 1 finished in 8.3870s
Query 11 - Iteration 0 finished in 1.4619s
Query 11 - Iteration 1 finished in 0.6590s
Query 12 - Iteration 0 finished in 6.0584s
Query 12 - Iteration 1 finished in 5.6534s
Query 13 - Iteration 0 finished in 43.9872s
[gpu-h100-0468:640347:0:640878] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1)
Segmentation fault (core dumped)
Query 14 - Iteration 0 finished in 6.5882s
Query 14 - Iteration 1 finished in 6.1999s
Query 15 - Iteration 0 finished in 6.1214s
Query 15 - Iteration 1 finished in 5.2628s
Query 16 - Iteration 0 finished in 2.1549s
Query 16 - Iteration 1 finished in 1.7197s
[gpu-h100-0468:642437:0:642863] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x157)
==== backtrace (tid: 642863) ====
0 /app/.venv/lib/python3.12/site-packages/libucx/lib/libucs.so(ucs_handle_error+0x294) [0x15547e3b3b14]
1 /app/.venv/lib/python3.12/site-packages/libucx/lib/libucs.so(+0x34cca) [0x15547e3b3cca]
2 /app/.venv/lib/python3.12/site-packages/libucx/lib/libucs.so(+0x34f7e) [0x15547e3b3f7e]
3 /usr/lib64/libpthread.so.0(+0x12990) [0x155555116990]
4 python(PyType_IsSubtype+0) [0x1a2c75c]
5 /app/.venv/lib/python3.12/site-packages/rapidsmpf/streaming/core/utilities.cpython-312-x86_64-linux-gnu.so(+0x3766) [0x15547d20c766]
6 /app/.venv/lib/python3.12/site-packages/rapidsmpf/streaming/core/channel.cpython-312-x86_64-linux-gnu.so(+0x12500) [0x15546a5e7500]
7 /app/.venv/lib/python3.12/site-packages/librapidsmpf/lib64/librapidsmpf.so(_ZN4coro11thread_pool8executorEm+0xdc) [0x15547c11ddec]
8 /usr/lib64/libstdc++.so.6(+0xc2b23) [0x155547c2db23]
9 /usr/lib64/libpthread.so.0(+0x81ca) [0x15555510c1ca]
10 /usr/lib64/libc.so.6(clone+0x43) [0x1555543d48d3]
=================================
Segmentation fault (core dumped)
Exception in callback Future.set_result(None)()
handle: <Handle Future.set_result(None)()>
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 255, in runner
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 684, in _wrap_awaitable
return await awaitable
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 174, in run
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/nodes.py", line 65, in default_node_single
df = await asyncio.to_thread(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/dsl/ir.py", line 1768, in do_evaluate
group_keys, raw_tables = grouper.aggregate(requests, stream=df.stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pylibcudf/groupby.pyx", line 163, in pylibcudf.groupby.GroupBy.aggregate
File "pylibcudf/groupby.pyx", line 198, in pylibcudf.groupby.GroupBy.aggregate
MemoryError: std::bad_alloc: out_of_memory: CUDA error (failed to allocate 2303887224 bytes) at: /tmp/pip-build-env-6sp37l_w/normal/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_async_view_memory_resource.hpp:87: cudaErrorMemoryAllocation out of memory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
asyncio.exceptions.InvalidStateError: invalid state
Exception in callback Future.set_result(None)()
handle: <Handle Future.set_result(None)()>
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 255, in runner
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 684, in _wrap_awaitable
return await awaitable
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 174, in run
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/nodes.py", line 65, in default_node_single
df = await asyncio.to_thread(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/dsl/ir.py", line 1768, in do_evaluate
group_keys, raw_tables = grouper.aggregate(requests, stream=df.stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pylibcudf/groupby.pyx", line 163, in pylibcudf.groupby.GroupBy.aggregate
File "pylibcudf/groupby.pyx", line 198, in pylibcudf.groupby.GroupBy.aggregate
MemoryError: std::bad_alloc: out_of_memory: CUDA error (failed to allocate 2303887224 bytes) at: /tmp/pip-build-env-6sp37l_w/normal/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_async_view_memory_resource.hpp:87: cudaErrorMemoryAllocation out of memory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
asyncio.exceptions.InvalidStateError: invalid state
Exception in callback Future.set_result(<rapidsmpf.st...x1554766c1b50>)()
handle: <Handle Future.set_result(<rapidsmpf.st...x1554766c1b50>)()>
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 255, in runner
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 684, in _wrap_awaitable
return await awaitable
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 174, in run
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/nodes.py", line 65, in default_node_single
df = await asyncio.to_thread(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/dsl/ir.py", line 1768, in do_evaluate
group_keys, raw_tables = grouper.aggregate(requests, stream=df.stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pylibcudf/groupby.pyx", line 163, in pylibcudf.groupby.GroupBy.aggregate
File "pylibcudf/groupby.pyx", line 198, in pylibcudf.groupby.GroupBy.aggregate
MemoryError: std::bad_alloc: out_of_memory: CUDA error (failed to allocate 2303887224 bytes) at: /tmp/pip-build-env-6sp37l_w/normal/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_async_view_memory_resource.hpp:87: cudaErrorMemoryAllocation out of memory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
asyncio.exceptions.InvalidStateError: invalid state
Exception in callback Future.set_result(None)()
handle: <Handle Future.set_result(None)()>
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 255, in runner
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 684, in _wrap_awaitable
return await awaitable
^^^^^^^^^^^^^^^
File "rapidsmpf/streaming/core/node.pyx", line 174, in run
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/experimental/rapidsmpf/nodes.py", line 65, in default_node_single
df = await asyncio.to_thread(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/cudf_polars/dsl/ir.py", line 1768, in do_evaluate
group_keys, raw_tables = grouper.aggregate(requests, stream=df.stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pylibcudf/groupby.pyx", line 163, in pylibcudf.groupby.GroupBy.aggregate
File "pylibcudf/groupby.pyx", line 198, in pylibcudf.groupby.GroupBy.aggregate
MemoryError: std::bad_alloc: out_of_memory: CUDA error (failed to allocate 2303887224 bytes) at: /tmp/pip-build-env-6sp37l_w/normal/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_async_view_memory_resource.hpp:87: cudaErrorMemoryAllocation out of memory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
asyncio.exceptions.InvalidStateError: invalid state
[gpu-h100-0468:642939:0:643371] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1)
Segmentation fault (core dumped)
Query 19 - Iteration 0 finished in 7.6374s
Query 19 - Iteration 1 finished in 7.2134s
Query 20 - Iteration 0 finished in 6.7641s
Query 20 - Iteration 1 finished in 6.2747s
Query 21 - Iteration 0 finished in 84.7858s
Query 21 - Iteration 1 finished in 81.3950s
Query 22 - Iteration 0 finished in 1.2861s
Query 22 - Iteration 1 finished in 0.9246s