```python
#!/usr/bin/env python
"""
This script processes SAM (Sequence Alignment/Map format) input from standard
input and extracts alignment information into a tab-separated table. The
following fields are produced: query, target, query_length, query_start,
query_end, target_start, target_end, alignment_length, alignment_identity.
This script was designed for use with SAM files produced by minimap2. However,
it will work with any SAM data that:
```
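The docstring lists the output columns, but the preview stops before the script's requirements and body. The following is a minimal sketch of how those columns can be derived from one SAM record; it is not the script's own code. It assumes well-formed records with a CIGAR string, skips header lines and unmapped reads, counts clipped bases toward `query_length`, reports 0-based half-open coordinates, and computes identity from the `NM` tag (NaN when `NM` is absent). The helper names `parse_sam_line` and `write_table` are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, TextIO

# One CIGAR operation: a length followed by an operation code, e.g. "90M"
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

FIELDS = [
    "query", "target", "query_length", "query_start", "query_end",
    "target_start", "target_end", "alignment_length", "alignment_identity",
]


def parse_sam_line(line: str) -> Optional[Dict[str, object]]:
    """Turn one SAM record into the output fields, or None if it is skipped."""
    if not line.strip() or line.startswith("@"):
        return None  # blank line or header line
    cols = line.rstrip("\n").split("\t")
    qname, flag, rname, pos, cigar = cols[0], int(cols[1]), cols[2], int(cols[3]), cols[5]
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read or no CIGAR
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]

    # Leading soft/hard clips give the 0-based query start.
    lead = 0
    for n, op in ops:
        if op not in "SH":
            break
        lead += n
    query_length = sum(n for n, op in ops if op in "MIS=XH")  # includes clips
    query_aligned = sum(n for n, op in ops if op in "MI=X")
    target_span = sum(n for n, op in ops if op in "MDN=X")
    aln_len = sum(n for n, op in ops if op in "MID=X")  # matches, mismatches, gaps

    q_start, q_end = lead, lead + query_aligned
    if flag & 0x10:
        # Reverse strand: SAM stores the reverse-complemented read, so flip the
        # coordinates back onto the original read orientation.
        q_start, q_end = query_length - q_end, query_length - q_start

    # Optional tags look like "NM:i:3"; NM = mismatches + inserted + deleted
    # bases, so aln_len - NM is the number of matching bases.
    tags = {t[:2]: t[5:] for t in cols[11:]}
    identity = float("nan")
    if "NM" in tags and aln_len:
        identity = (aln_len - int(tags["NM"])) / aln_len

    return {
        "query": qname, "target": rname, "query_length": query_length,
        "query_start": q_start, "query_end": q_end,
        "target_start": pos - 1, "target_end": pos - 1 + target_span,
        "alignment_length": aln_len, "alignment_identity": identity,
    }


def write_table(lines: Iterable[str], out: TextIO) -> None:
    """Write a header plus one tab-separated row per usable SAM record."""
    out.write("\t".join(FIELDS) + "\n")
    for line in lines:
        row = parse_sam_line(line)
        if row is not None:
            out.write("\t".join(str(row[f]) for f in FIELDS) + "\n")
```

To mirror the stdin-to-table behavior the docstring describes, call `write_table(sys.stdin, sys.stdout)` after importing `sys`.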
```python
from pathlib import Path
from typing import Iterator, Optional, Union

import polars as pl
from needletail import parse_fastx_file
from polars.io.plugins import register_io_source


def scan_fastx(fastx_file: Union[str, Path]) -> pl.LazyFrame:
    schema = pl.Schema(
```
```python
#!/usr/bin/env python
import json
import re
import sys
from typing import Generator, Dict, Any, Optional

import requests
from tqdm import tqdm
```
```shell
duckdb -c "
INSTALL httpfs;
LOAD httpfs;
INSTALL parquet;
LOAD parquet;
COPY (
    SELECT *
    FROM read_parquet('s3://sra-pub-metadata-us-east-1/sra/metadata/*')
) TO STDOUT WITH (FORMAT CSV, DELIMITER E'\t', HEADER);"
```
```python
from typing import Sequence
import math


def find_cutoff(values: Sequence[float]) -> float:
    """
    Determine the cutoff point in a biphasic distribution curve by identifying
    the "bending point" where the curve transitions from slowly growing values
    to rapidly growing values, using the maximum perpendicular distance method.
```
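The maximum perpendicular distance method named in the docstring can be sketched as follows; this is not the truncated `find_cutoff` body. It assumes the curve is the values sorted ascending and plotted against their rank, and it rescales both axes to [0, 1] so the result does not depend on the units of the values (a design choice the original may or may not make). The hypothetical function `perpendicular_distance_cutoff` draws the chord from the first point to the last and returns the value whose point lies farthest from it.

```python
import math
from typing import Sequence


def perpendicular_distance_cutoff(values: Sequence[float]) -> float:
    """Return the value at the bending point of the sorted values.

    Sketch of the maximum perpendicular distance (knee) method: the points are
    (rank, value) for the values sorted ascending, both axes rescaled to [0, 1].
    """
    ys = sorted(values)
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three values to locate a bend")
    lo, hi = ys[0], ys[-1]
    if hi == lo:
        return lo  # flat curve: no bend, every value is the same
    best_i, best_d = 0, -1.0
    for i, y in enumerate(ys):
        x_norm = i / (n - 1)
        y_norm = (y - lo) / (hi - lo)
        # After rescaling, the chord runs from (0, 0) to (1, 1), i.e. the line
        # y = x, and the perpendicular distance to it is |x - y| / sqrt(2).
        d = abs(x_norm - y_norm) / math.sqrt(2)
        if d > best_d:
            best_i, best_d = i, d
    return ys[best_i]
```

For example, `perpendicular_distance_cutoff([1, 2, 3, 4, 5, 6, 50, 100, 150])` returns `6`, the last value before the rapid growth begins.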
```python
#!/usr/bin/env python
import shutil
from pathlib import Path
from typing import Literal

import pyarrow as pa
import pyarrow.parquet as pq
```