I hereby claim:
- I am mlin on github.
- I am mlin (https://keybase.io/mlin) on keybase.
- I have a public key ASCO-NadYMiwqGxb9_4cD-VFjMbVqrk7ors-n9seZl_A5wo
To claim this, I am signing this object:

#!/usr/bin/env python3
import sys
import time
import docker
import multiprocessing
from argparse import ArgumentParser, REMAINDER

def swarmsub(image, command=None, cpu=1, mounts=None):
    client = docker.from_env()
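
The preview cuts off at the top of swarmsub. For orientation, here is a minimal sketch (not the gist's actual code, and reusing the imports above) of how a function like this could submit the command as a one-shot Docker Swarm service with the docker SDK; the CPU-reservation math and polling loop are illustrative assumptions:

# Illustrative sketch only, not the original gist: run `command` in `image` as a
# one-shot Swarm service reserving `cpu` CPUs, then wait for its task to finish.
def swarmsub_sketch(image, command=None, cpu=1, mounts=None):
    client = docker.from_env()
    service = client.services.create(
        image,
        command=command,
        mounts=mounts or [],  # e.g. ["/data:/data:ro"]
        resources=docker.types.Resources(cpu_reservation=int(cpu * 1e9)),  # nanoCPUs
        restart_policy=docker.types.RestartPolicy(condition="none"),
    )
    try:
        while True:
            tasks = service.tasks()
            if tasks and tasks[0]["Status"]["State"] in ("complete", "failed", "rejected"):
                return tasks[0]["Status"]["State"] == "complete"
            time.sleep(1)
    finally:
        service.remove()
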
version 1.0

task split_vcf_for_spark {
    # Quickly split a large .vcf.gz file into a specified number of compressed partitions.
    #
    # Motivation: calling SparkContext.textFile on a single large vcf.gz can be painfully slow,
    # because it's decompressed and parsed in ~1 thread. Use this to first split it up (with a
    # faster multithreaded pipeline); then tell Spark to parallel-load the data using textFile on a
    # glob pattern.
    #
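
A minimal PySpark sketch of the second half of that recipe, assuming the split files land somewhere like part-*.vcf.gz (the path, app name, and partition naming here are placeholders):

# Illustrative only: once the VCF has been split into many .vcf.gz partitions,
# textFile on a glob lets Spark decompress/parse them in parallel, one task per file.
from pyspark import SparkContext

sc = SparkContext(appName="load_split_vcf")
lines = sc.textFile("hdfs:///tmp/my_cohort_split/part-*.vcf.gz")
records = lines.filter(lambda line: not line.startswith("#"))  # skip VCF header lines
print(records.count())
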
#!/bin/bash
# Running inside a docker container, periodically read the container's CPU/memory usage counters
# and log them to standard error. Fields:
#
#   cpu_pct       average user %CPU usage over the most recent period
#   mem_MiB       container's current RSS (excludes file cache), in mebibytes (= 2**20 bytes)
#   cpu_total_s   container's user CPU time consumption since this script started, in seconds
#   elapsed_s     wall time elapsed since this script started, in seconds
#
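
A rough Python rendering of the same sampling loop, assuming the cgroup v1 accounting files as Docker typically mounts them; the paths, the 60-second period, and the USER_HZ handling are assumptions to verify on the host in question:

# Illustrative sketch: sample the container's user CPU time and RSS from cgroup v1
# and log the same four fields to standard error once per period.
import os
import sys
import time

HZ = os.sysconf("SC_CLK_TCK")  # cpuacct.stat reports USER_HZ ticks (usually 100/s)
PERIOD = 60  # seconds between samples

def read_stat(path):
    with open(path) as f:
        return dict(line.split() for line in f)

def user_cpu_s():
    return int(read_stat("/sys/fs/cgroup/cpuacct/cpuacct.stat")["user"]) / HZ

def rss_mib():
    return int(read_stat("/sys/fs/cgroup/memory/memory.stat")["rss"]) / 2**20

t0 = time.time()
cpu0 = prev_cpu = user_cpu_s()
prev_t = t0
while True:
    time.sleep(PERIOD)
    now, cpu = time.time(), user_cpu_s()
    print(
        f"cpu_pct={100 * (cpu - prev_cpu) / (now - prev_t):.0f}"
        f" mem_MiB={rss_mib():.0f}"
        f" cpu_total_s={cpu - cpu0:.0f}"
        f" elapsed_s={now - t0:.0f}",
        file=sys.stderr,
    )
    prev_t, prev_cpu = now, cpu
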
#!/usr/bin/env python3
"""
Generate a standalone WDL document from a given workflow that uses imported tasks. Requires: miniwdl

    python3 paste_wdl_imports.py [-o STANDALONE.wdl] WORKFLOW.wdl

For each "call imported_namespace.task_name [as alias]" in the workflow, appends the task's source
code with the task name changed to "imported_namespace__task_name", and rewrites the call to refer
to this new name (keeping the original alias). Also blanks out the import statements.
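
As a toy illustration of the renaming it describes (the real script goes through miniwdl's parser rather than regexes), a sketch:

# Illustrative only: rewrite "call ns.task" to "call ns__task" in WDL source text;
# the actual gist resolves calls via miniwdl's AST instead of pattern matching.
import re

def rewrite_calls(wdl_source: str) -> str:
    return re.sub(r"\bcall\s+(\w+)\.(\w+)", r"call \1__\2", wdl_source)

print(rewrite_calls("call lib.say_hello as hi { input: name = name }"))
# -> call lib__say_hello as hi { input: name = name }
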
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf8" />
    <title>htsget</title>
    <!-- needed for adaptive design -->
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <style>
      body {

#!/usr/bin/env python3
# run this script using LD_LIBRARY_PATH to manipulate the SQLite3 library version
import os
import random
import time
import sqlite3

N = 100000
random.seed(42)
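
The preview ends before the timing loop. A hypothetical continuation (reusing the imports and N above) just to show the kind of workload one might time while switching SQLite builds via LD_LIBRARY_PATH; the table name and statement mix are made up:

# Hypothetical continuation, not the original gist: time N random upserts so the
# same script can be rerun under different LD_LIBRARY_PATH settings and compared.
print("SQLite library version:", sqlite3.sqlite_version)
con = sqlite3.connect("bench.db")
con.execute("CREATE TABLE IF NOT EXISTS bench(k INTEGER PRIMARY KEY, v REAL)")
t0 = time.time()
with con:
    con.executemany(
        "INSERT OR REPLACE INTO bench(k, v) VALUES(?, ?)",
        ((random.randrange(N * 10), random.random()) for _ in range(N)),
    )
print(f"{N} inserts in {time.time() - t0:.2f}s")
con.close()
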
Context: static.wiki and Show HN post
We downloaded static.wiki's 40.3 GiB SQLite database of English Wikipedia and created a compressed version of it with sqlite_zstd_vfs, our read/write Zstandard compression layer for SQLite3. The compressed version is 10.4 GiB (26%), and the VFS supports HTTP random access in the spirit of the original (although we don't yet have a WebAssembly build; it's a library for CLI & desktop apps for now). You can try it out on Linux or macOS x86-64:
pip3 install genomicsqlite
genomicsqlite https://f000.backblazeb2.com/file/mlin-public/static.wiki/en.zstd.db \
"select text from wiki_articles where title = 'SQLite'"
FROM ubuntu:20.04

RUN apt-get -qq update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
        wget curl python3-pip python-is-python3
RUN pip3 install --system miniwdl==1.4.2

ENV UDOCKER_VERSION=1.3.1
WORKDIR /usr/local
RUN wget -nv https://github.com/indigo-dc/udocker/releases/download/v${UDOCKER_VERSION}/udocker-${UDOCKER_VERSION}.tar.gz \
    && tar zxf udocker-${UDOCKER_VERSION}.tar.gz \
    && rm udocker-${UDOCKER_VERSION}.tar.gz

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <meta name="generator" content="pandoc" />
    <meta http-equiv="X-UA-Compatible" content="IE=EDGE" />