Created
July 16, 2020 16:29
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "To make things easy, I used one of the test arrays included with the repo." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import tiledbvcf\n", | |
| "uri = \"libtiledbvcf/test/inputs/arrays/v2/ingested_2samples_GT_DP_PL\"\n", | |
| "ds = tiledbvcf.Dataset(uri, mode = \"r\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Here I extract the `fmt` attribute for a single variant whose VCF record looks like this:\n", | |
| "\n", | |
| "```\n", | |
| "1\t13354\t.\tT\t<NON_REF>\t.\tLowQual\tEND=13374\tGT:GQ:MIN_DP:DP:PL\t0/0:42:14:15:0,24,360\n", | |
| "```\n", | |
| "\n", | |
| "Reading the `fmt` attribute returns the raw data as an array of integers (one per byte), similar to what you saw." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "0 [2, 0, 0, 0, 71, 81, 0, 1, 0, 0, 0, 1, 0, 0, 0...\n", | |
| "Name: fmt, dtype: object" | |
| ] | |
| }, | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "df = ds.read(attrs = [\"pos_start\", \"fmt\"], regions=['1:13354-13354'], samples = [\"HG00280\"])\n", | |
| "df.fmt" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Converting it back to bytes reveals the names of the two FORMAT fields that are not stored as dedicated attributes, `GQ` and `MIN_DP`, and the values that follow them (42 and 14) match the VCF record.\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 4, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "b'\\x02\\x00\\x00\\x00GQ\\x00\\x01\\x00\\x00\\x00\\x01\\x00\\x00\\x00*\\x00\\x00\\x00MIN_DP\\x00\\x01\\x00\\x00\\x00\\x01\\x00\\x00\\x00\\x0e\\x00\\x00\\x00'" | |
| ] | |
| }, | |
| "execution_count": 4, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "df.fmt[0].tobytes()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Accessing those attributes directly returns the expected values:\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>pos_start</th>\n", | |
| " <th>fmt_GQ</th>\n", | |
| " <th>fmt_MIN_DP</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>13354</td>\n", | |
| " <td>42</td>\n", | |
| " <td>14</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " pos_start fmt_GQ fmt_MIN_DP\n", | |
| "0 13354 42 14" | |
| ] | |
| }, | |
| "execution_count": 2, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "ds = tiledbvcf.Dataset(uri, mode = \"r\")\n", | |
| "ds.read(attrs = [\"pos_start\", \"fmt_GQ\", \"fmt_MIN_DP\"], regions=['1:13354-13354'], samples = [\"HG00280\"])" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.7.6" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |
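
For readers who want to see how the `fmt` blob maps onto `GQ` and `MIN_DP`, here is a minimal sketch that decodes the byte string printed by `df.fmt[0].tobytes()` in the notebook above. The layout it assumes is inferred only from those bytes and is not taken from the TileDB-VCF documentation: a little-endian uint32 field count, then for each field a NUL-terminated name, two int32 values read here as a type code and a value count, and then that many int32 values. The helper `decode_fmt` is hypothetical and is not part of `tiledbvcf`. In this sample both int32 values are 1, so their order cannot be confirmed from the data.

```python
import struct

# Byte string copied from the `df.fmt[0].tobytes()` output in the notebook.
raw = (
    b"\x02\x00\x00\x00"
    b"GQ\x00\x01\x00\x00\x00\x01\x00\x00\x00*\x00\x00\x00"
    b"MIN_DP\x00\x01\x00\x00\x00\x01\x00\x00\x00\x0e\x00\x00\x00"
)


def decode_fmt(buf):
    """Decode a `fmt` blob under an ASSUMED layout (inferred from the sample bytes)."""
    # First 4 bytes: number of fields packed into the blob (assumed uint32, little-endian).
    (n_fields,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    fields = {}
    for _ in range(n_fields):
        # Field name runs up to the next NUL byte.
        end = buf.index(b"\x00", offset)
        name = buf[offset:end].decode("ascii")
        offset = end + 1
        # Assumed order: type code first, then number of values.
        type_code, n_values = struct.unpack_from("<ii", buf, offset)
        offset += 8
        # Only the type code seen in the sample (1, read here as int32) is handled.
        if type_code != 1:
            raise ValueError(f"unhandled type code {type_code} for field {name}")
        values = struct.unpack_from(f"<{n_values}i", buf, offset)
        offset += 4 * n_values
        fields[name] = list(values)
    return fields


decoded = decode_fmt(raw)
print(decoded)  # {'GQ': [42], 'MIN_DP': [14]}
```

In practice, requesting `fmt_GQ` and `fmt_MIN_DP` directly through `ds.read`, as the last notebook cell does, avoids any manual decoding. The sketch is only meant to make the byte layout easier to follow.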