Antonio Fernandez-Guerra genomewalker
@genomewalker
genomewalker / miniprot.md
Last active August 5, 2024 07:47
Evaluate miniprot results

Let's process the PAF output from miniprot to get some stats:

for i in *.paf; do python ../paf-stats.py -i "${i}" -o "${i%.paf}.tsv"; done
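The contents of paf-stats.py are not shown here, so the following is only a minimal sketch of the kind of per-alignment statistics that can be derived from the 12 mandatory PAF columns. The function name `paf_stats` and the two reported metrics (query coverage and identity) are assumptions chosen for illustration, not the script's actual output.

```python
def paf_stats(lines):
    """Compute simple per-alignment stats from PAF lines (hypothetical helper).

    Uses only the 12 mandatory PAF columns:
    qname, qlen, qstart, qend, strand, tname, tlen, tstart, tend,
    matches, block_len, mapq.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            # Skip blank or malformed lines
            continue
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        matches, block_len = int(fields[9]), int(fields[10])
        rows.append(
            {
                "query": fields[0],
                "target": fields[5],
                # Fraction of the query covered by the alignment
                "query_coverage": (qend - qstart) / qlen if qlen else 0.0,
                # Matching residues over alignment block length
                "identity": matches / block_len if block_len else 0.0,
            }
        )
    return rows


if __name__ == "__main__":
    example = ["prot1\t100\t0\t80\t+\tctg1\t5000\t100\t400\t210\t240\t60\n"]
    for row in paf_stats(example):
        print(row)
```

The function takes any iterable of lines, so an open file handle can be passed directly.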
@genomewalker
genomewalker / get-rna.py
Last active August 4, 2024 06:29
Get xRNA from Prokka gff
import argparse
import gzip
from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
def extract_rrna_trna_features(input_file, output_file):
    # Determine if the input file is gzipped
    if input_file.endswith(".gz"):
@genomewalker
genomewalker / get-pub-data.R
Last active June 28, 2024 09:34
Code for making publication lists
library(scholar) # to get publications and impact factors
library(stringr) # to modify text
library(cowplot) # for plotting
library(ggplot2)
library(ggrepel)
library(lemon)
library(dplyr)
# Set variables
Scholar_ID <- "wA7Hrk8AAAAJ"
@genomewalker
genomewalker / Snakefile
Created June 19, 2024 19:23
map-by-node_workflow
@genomewalker
genomewalker / assembly-stats.md
Last active June 20, 2023 04:54
NCBI assembly stats

In our workflow, we use the distinct groups into which NCBI organizes its data. These groups are listed in column 25 of the assembly_summary.txt file, as described here. The groups are as follows:

  • archaea
  • bacteria
  • fungi
  • invertebrate
  • metagenomes
  • other
  • plant
  • protozoa
  • vertebrate_mammalian
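As a rough illustration of how the group column can be used, the sketch below tallies assemblies per group from a tab-separated assembly_summary.txt. The function name `count_groups` is hypothetical, and the code assumes that header lines start with `#` and that the group sits in column 25 (1-based), as stated above.

```python
from collections import Counter


def count_groups(lines, group_col=25):
    """Count assemblies per NCBI group (hypothetical helper).

    `group_col` is 1-based, matching the column number quoted in the text.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            # Header/comment lines in assembly_summary.txt
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < group_col:
            # Skip rows too short to contain the group column
            continue
        counts[fields[group_col - 1]] += 1
    return counts


if __name__ == "__main__":
    # Assumed local copy of the file; the path is a placeholder
    with open("assembly_summary.txt", encoding="utf-8") as handle:
        for group, n in count_groups(handle).most_common():
            print(group, n, sep="\t")
```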
@genomewalker
genomewalker / taxdb-comb-data.R
Last active June 15, 2023 21:49
Combine taxdb data
library(tidyverse)
# Read in the data
setwd("/maps/projects/lundbeck/scratch/taxDB/v6/metadata/src_files")
ncbi_assm_stats <- list.files(".", pattern = "genome_metadata.txt", full.names = TRUE)
ncbi_assm_stats <- map_dfr(ncbi_assm_stats, function(X) {
  read_tsv(X, col_names = TRUE)
}) %>%
  select(-filename) %>%

Differences in bowtie2 alignment time between concatenated and non-concatenated genomes

  • Number of references (concatenated): 52,517
  • Number of references (non-concatenated): 8,086,857
  • Number of reads: 164,369,171
# CONCAT
$ bowtie2-build --seed 42 --threads 24 genomes-concat.fa genomes-concat

...
@genomewalker
genomewalker / compare-ar-fastq.py
Last active March 31, 2023 22:26
Find LCS in read pairs
from tqdm import tqdm
from collections import defaultdict
import argparse
from concurrent.futures import ThreadPoolExecutor, as_completed
from multiprocessing import Pool
import gzip
from itertools import zip_longest
from mimetypes import guess_type
from functools import partial
from cdifflib import CSequenceMatcher
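The preview above is cut off before the matching logic, so the following is only a sketch of one way to find the longest common substring of two reads. It uses the standard-library `difflib.SequenceMatcher`, which `CSequenceMatcher` from cdifflib is designed to replace as a faster drop-in. The function name `longest_common_substring` is an assumption for illustration, not the gist's own code.

```python
from difflib import SequenceMatcher


def longest_common_substring(read1, read2):
    """Return the longest contiguous substring shared by two reads.

    autojunk=False disables difflib's heuristic that ignores frequent
    elements, which would otherwise misbehave on long sequences drawn
    from a four-letter alphabet.
    """
    matcher = SequenceMatcher(None, read1, read2, autojunk=False)
    match = matcher.find_longest_match(0, len(read1), 0, len(read2))
    return read1[match.a : match.a + match.size]


if __name__ == "__main__":
    print(longest_common_substring("ACGTTGCA", "TTTGCAAA"))
```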
@genomewalker
genomewalker / jamf.md
Created November 3, 2022 19:51 — forked from a7ul/jamf.md
Removing all restrictions on a Jamf-managed macOS device, provided you have root access.

REMOVE JAMF RESTRICTIONS ON MAC

REMOVE ONLY RESTRICTIONS

Run sudo jamf removeMDMProfile to remove all restrictions.

Run sudo jamf manage to bring back all restrictions and profiles.

REMOVE ALL RESTRICTIONS AND DISABLE JAMF BINARIES WHILE KEEPING YOUR ACCESS TO VPN AND OTHER SERVICES

Run sudo jamf removeMDMProfile to remove all restrictions.

@genomewalker
genomewalker / get-constituent-genomes.jpg
Last active April 26, 2024 08:59
Get constituent genomes from metagenomes
get-constituent-genomes.jpg