Antonio Fernandez-Guerra genomewalker

@genomewalker
genomewalker / assembly-stats.md
Last active June 20, 2023 04:54
NCBI assembly stats

In our workflow, we use the groups into which NCBI organizes its data. The group of each assembly is given in column 25 of the assembly_summary.txt file, as described here. The groups are as follows:

  • archaea
  • bacteria
  • fungi
  • invertebrate
  • metagenomes
  • other
  • plant
  • protozoa
  • vertebrate_mammalian
genomewalker / Snakefile
Created June 19, 2024 19:23
map-by-node_workflow
genomewalker / get-pub-data.R
Last active June 28, 2024 09:34
Code for making publication lists
library(scholar) # to get publications and impact factors
library(stringr) # to modify text
library(cowplot) # for plotting
library(ggplot2)
library(ggrepel)
library(lemon)
library(dplyr)
# Set variables
Scholar_ID <- "wA7Hrk8AAAAJ"
genomewalker / get-rna.py
Last active August 4, 2024 06:29
Get xRNA from Prokka gff
import argparse
import gzip
from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
def extract_rrna_trna_features(input_file, output_file):
    # Determine if the input file is gzipped
    if input_file.endswith(".gz"):
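
The preview stops at the gzip check. A common way to handle that kind of check is sketched below, assuming the file should be read as text either way. The helper name `open_text` is hypothetical and is not taken from get-rna.py.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file in text mode (hypothetical helper)."""
    if path.endswith(".gz"):
        # "rt" makes gzip.open return decoded text rather than bytes
        return gzip.open(path, "rt")
    return open(path, "r")
```

The returned handle can be used in a `with` block and passed to any parser that accepts an open text handle.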
genomewalker / miniprot.md
Last active August 5, 2024 07:47
Evaluate miniprot results

Let's process the PAF output from miniprot to get some stats:

for i in *.paf; do python ../paf-stats.py -i "${i}" -o "${i%.paf}.tsv"; done
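
The contents of paf-stats.py are not shown here. As a rough illustration of the kind of summary one could compute from a PAF file, the sketch below reads the twelve mandatory PAF columns and reports the number of alignments and the mean ratio of column 10 (matches) to column 11 (alignment block length). The function name `paf_summary` and the two example records are hypothetical, and miniprot may define these two columns slightly differently from the generic PAF description.

```python
import io


def paf_summary(handle):
    """Return the alignment count and mean match fraction for a PAF stream."""
    n = 0
    total_fraction = 0.0
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # PAF has 12 mandatory columns; skip anything shorter
        if len(fields) < 12:
            continue
        matches = int(fields[9])     # column 10: number of matches
        block_len = int(fields[10])  # column 11: alignment block length
        if block_len == 0:
            continue
        n += 1
        total_fraction += matches / block_len
    mean_fraction = total_fraction / n if n else 0.0
    return {"alignments": n, "mean_match_fraction": mean_fraction}


# Two synthetic records (illustration only, not real miniprot output)
example = io.StringIO(
    "q1\t100\t0\t100\t+\tt1\t1000\t0\t300\t270\t300\t0\n"
    "q2\t100\t0\t100\t-\tt1\t1000\t400\t700\t150\t300\t0\n"
)
print(paf_summary(example))
```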