Created June 4, 2018 11:53
Parse the 10x Chromium sparse matrix output into a single file, inserting the ENSEMBL gene ID and the cell barcode
#!/bin/bash
# The following script parses the 10x Chromium sparse matrix.
# It replaces the first column with the ENSEMBL gene ID and, if needed,
# the second column with the cell barcode (just uncomment the second awk command).
# It needs the three 10x Chromium outputs as follows:
# 1. genes.tsv
# 2. matrix.mtx
# 3. barcodes.tsv
# How does it do that?
# The first awk builds a hash table (h) that maps each line number of
# genes.tsv to the gene ID found on that line.
# Next, it goes through each line of the matrix.mtx file and replaces the first column
# (a row index into genes.tsv) with the corresponding ENSEMBL gene ID.
# If the second awk statement is uncommented, the same is done with the barcodes.tsv file,
# replacing the second column in matrix.mtx (a row index into barcodes.tsv) with the cell barcode.
# Finally, tail -n +4 drops the first three lines of matrix.mtx (header and dimensions).
# The output is saved in the file parsed_sparse.mtx
awk 'NR == FNR {h[NR] = $1; next} {print h[$1],$2,$3}' genes.tsv matrix.mtx | \
# awk 'NR == FNR {h[NR] = $1; next} {print $1,h[$2],$3}' barcodes.tsv - | \
tail -n +4 \
> parsed_sparse.mtx
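
To see what the pipeline does, it may help to run it on a tiny toy dataset. The following is only a sketch: the gene IDs, barcodes, and counts are invented for illustration, and it assumes matrix.mtx starts with the usual three lines (two header lines plus the dimensions line) that `tail -n +4` is meant to drop. Both awk steps are enabled here, so the barcode replacement is shown as well.

```shell
#!/bin/bash
# Toy demonstration of the parsing pipeline.
# All file contents below are made up; real 10x outputs are much larger.
set -euo pipefail

# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# genes.tsv: ENSEMBL-style gene ID in column 1, gene symbol in column 2 (hypothetical values).
printf 'ENSG0000000001\tGENEA\nENSG0000000002\tGENEB\nENSG0000000003\tGENEC\n' > genes.tsv

# barcodes.tsv: one cell barcode per line (hypothetical values).
printf 'AAAC-1\nAAAG-1\n' > barcodes.tsv

# matrix.mtx: two header lines, one dimensions line, then "gene_index cell_index count".
cat > matrix.mtx <<'EOF'
%%MatrixMarket matrix coordinate integer general
%
3 2 3
1 1 5
3 1 2
2 2 7
EOF

# Step 1: map column 1 (line number in genes.tsv) to the gene ID.
# Step 2: map column 2 (line number in barcodes.tsv) to the cell barcode.
# Step 3: drop the first three lines, which came from the header and dimensions.
awk 'NR == FNR {h[NR] = $1; next} {print h[$1],$2,$3}' genes.tsv matrix.mtx |
awk 'NR == FNR {h[NR] = $1; next} {print $1,h[$2],$3}' barcodes.tsv - |
tail -n +4 > parsed_sparse.mtx

cat parsed_sparse.mtx
```

With these toy inputs, parsed_sparse.mtx contains three lines: `ENSG0000000001 AAAC-1 5`, `ENSG0000000003 AAAC-1 2`, and `ENSG0000000002 AAAG-1 7`. The `NR == FNR` condition is true only while awk reads its first input file, which is why the lookup table is filled from genes.tsv (or barcodes.tsv) before any matrix line is printed.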