Antonio Fernandez-Guerra genomewalker

Our workflow uses the groups into which NCBI organizes its data. The group of each assembly is listed in column 25 of the assembly_summary.txt file, as described here. The groups are:

archaea
bacteria
fungi
invertebrate
metagenomes
other
plant
protozoa
vertebrate_mammalian
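
As a rough illustration of how the group column can be used, the sketch below tallies assemblies per group. It assumes a tab-separated assembly_summary.txt whose header lines start with `#`, with the group in column 25 as stated above; the function name `count_groups` is hypothetical and not part of the workflow itself.

```python
from collections import Counter

GROUP_COLUMN = 25  # 1-based column index of the group, as stated in the text


def count_groups(lines):
    """Count assemblies per NCBI group from assembly_summary.txt lines.

    Hypothetical helper: `lines` is any iterable of tab-separated rows.
    """
    counts = Counter()
    for line in lines:
        # Skip header/comment lines (assumed to start with '#') and blank lines
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < GROUP_COLUMN:
            continue  # skip short or malformed rows
        counts[fields[GROUP_COLUMN - 1]] += 1
    return counts
```

To run it on a downloaded file, pass an open file handle, for example `count_groups(open("assembly_summary.txt"))`, and print the resulting counter.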

Let's process the PAF output from miniprot to get some stats:

for i in *paf; do python ../paf-stats.py -i "${i}" -o "${i%paf}tsv"; done
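
The contents of paf-stats.py are not shown here, so the following is only a hedged sketch of the kind of per-alignment statistics that can be derived from the 12 mandatory PAF columns, not a reconstruction of that script. The function name `paf_record_stats` and the choice of statistics (query coverage and identity) are assumptions made for illustration; check miniprot's documentation for the units it uses in columns 10 and 11.

```python
def paf_record_stats(line):
    """Compute simple stats for one PAF line (hypothetical helper).

    Uses the mandatory PAF columns: query name/length/start/end (1-4),
    strand (5), target name (6), matches (10), block length (11), MAPQ (12).
    """
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("PAF lines need at least 12 tab-separated columns")
    qlen, qstart, qend = int(f[1]), int(f[2]), int(f[3])
    matches, block_len = int(f[9]), int(f[10])
    return {
        "query": f[0],
        "target": f[5],
        "strand": f[4],
        # Fraction of the query spanned by the alignment
        "query_coverage": (qend - qstart) / qlen if qlen else 0.0,
        # Matching positions divided by alignment block length
        "identity": matches / block_len if block_len else 0.0,
        "mapq": int(f[11]),
    }
```

Applied to every non-empty line of a PAF file, the returned dictionaries can be written out as rows of a TSV table.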

	library(scholar) # to get publications and impact factors
	library(stringr) # to modify text
	library(cowplot) # for plotting
	library(ggplot2)
	library(ggrepel)
	library(lemon)
	library(dplyr)

	# Set variables
	Scholar_ID <- "wA7Hrk8AAAAJ"

	import argparse
	import gzip
	from Bio import SeqIO
	from Bio.SeqFeature import SeqFeature, FeatureLocation
	from Bio.Seq import Seq
	from Bio.SeqRecord import SeqRecord

	def extract_rrna_trna_features(input_file, output_file):
	    # Determine if the input file is gzipped
	    if input_file.endswith(".gz"):