@epaule
epaule / elixir snippet
Created December 9, 2022 11:48
comments on CSV parsing using Elixir & NimbleCSV & Flow from https://www.poeticoding.com/processing-large-csv-files-with-elixir-streams/
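NimbleCSV.define(MyParser, separator: ";", escape: "\"")

# Counts the lines of a CSV file, parsing each line in parallel with Flow.
def load_stream(file) do
  file
  |> File.stream!()
  |> Flow.from_enumerable()
  # skip_headers: false, because parse_string/2 otherwise drops the first
  # row of its input, which here is the whole single-line string
  |> Flow.map(&MyParser.parse_string(&1, skip_headers: false))
  |> Enum.count()
end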
@epaule
epaule / cobiont_stats.cr
Last active November 23, 2022 13:06
quick thing to parse two cobiont files
#!/bin/env crystal
# cobiont_stats.cr contamination_file1 contamination_file_2

# returns an Array of String
def read_contamination_file(file : String)
  ids = Array(String).new
  File.each_line(file) { |line|
    ids << $1 if /REMOVE\s+(\S+)/.match(line)
  }
  return ids.uniq
end
@epaule
epaule / filter_merged.cr
Last active March 17, 2023 16:55
rough parser for an ASG contamination file
#!/bin/env crystal
require "option_parser"

phylum = "Arthropoda|insect"
dir = "20230226_qqAmaFero1.20230225.haplotigs.fa_asg_cobiont_check_run/collected_tables/"

OptionParser.parse do |parser|
  parser.banner = "Usage: filter_merged --phylum xyz --infile <in.merged>"
  parser.on("-p PHYLUM", "--phylum=PHYLUM", "Specifies the phylum(s) of the host separated by | [default=#{phylum}]") { |p| phylum = p }
  parser.on("-d DIR", "--directory=DIR", "merged ASG directory [default=#{dir}]") { |d| dir = d }
@epaule
epaule / filter_fasta.rb
Last active July 8, 2022 16:09
filter fasta file by size (less than)
#!/usr/bin/env ruby
# usage: ruby filter_fasta.rb size fasta.file
require 'bio'

s = ARGV.shift.to_i
Bio::FlatFile.auto(ARGF) do |ff|
  ff.each do |entry|
    if entry.seq.length < s
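The size filter described above can also be sketched without BioRuby. This is only an illustration, not the gist's own code: the helper names `each_fasta_record` and `shorter_than` are hypothetical, and because the preview is cut off before it shows what happens to a matching entry, the sketch assumes that records strictly shorter than the threshold are the ones kept.

```ruby
# Plain-Ruby sketch (no BioRuby). Yields [header, sequence] pairs read from
# FASTA text; multi-line sequences are joined into one string.
def each_fasta_record(io)
  return enum_for(:each_fasta_record, io) unless block_given?

  header = nil
  seq = +""
  io.each_line do |line|
    line = line.chomp
    if line.start_with?(">")
      yield header, seq if header   # emit the previous record
      header = line[1..-1]
      seq = +""                     # fresh buffer for the next record
    elsif header
      seq << line.strip
    end
  end
  yield header, seq if header       # emit the last record
end

# ASSUMPTION: "less than" means records with length < max_len are kept.
def shorter_than(io, max_len)
  each_fasta_record(io).select { |_header, seq| seq.length < max_len }
end
```

With a file handle, something like `shorter_than(File.open("fasta.file"), 1000).each { |h, s| puts ">#{h}", s }` would print the short records again in FASTA form.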
# SUPER_1_1 is 1085289420bp
# SUPER_1_2 joins and both become SUPER_1
samtools view -h split_1.mapped.bam |perl -pne 's/SUPER_1_1/SUPER_1/g' |perl -ne 'chomp;@F=split(/\t/,$_);next if $F[1] eq "SN:SUPER_1";if(/SN:SUPER_1_2/){$F[1]="SN:SUPER_1";$F[2]="LN:2170579067"};if($F[2] eq "SUPER_1_2"){$F[2]="SUPER_1";$F[3]+=1085289420;$F[7]+=1085289420 if $F[6] eq "="};if($F[6] eq "SUPER_1_2"){$F[6]="SUPER_1";$F[7]+=1085289420};print join "\t", @F;print "\n"' | /software/grit/conda/envs/snake_env/bin/PretextMap --sortby nosort --mapq 0 -o fixed.pretext --highRes
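The coordinate shift in the one-liner above is easier to follow when spelled out. The following Ruby sketch mirrors its logic under stated assumptions: the function name `merge_super_1` is hypothetical, the offset and merged length are the numbers from the one-liner, and names are compared exactly rather than by substring, so it is an illustration of the idea and not a drop-in replacement.

```ruby
OFFSET    = 1_085_289_420   # length of SUPER_1_1, from the comment above
MERGED_LN = 2_170_579_067   # LN value the one-liner writes for SUPER_1

# Rewrites one SAM line so that SUPER_1_1 and SUPER_1_2 become SUPER_1.
# Returns the rewritten line, or nil when the line should be dropped.
def merge_super_1(line)
  f = line.chomp.split("\t")

  if f[0] == "@SQ"
    return nil if f[1] == "SN:SUPER_1_1"   # replaced by the merged record
    if f[1] == "SN:SUPER_1_2"
      f[1] = "SN:SUPER_1"
      f[2] = "LN:#{MERGED_LN}"
    end
    return f.join("\t")
  end
  return f.join("\t") if f[0].start_with?("@")   # other header lines

  rname = f[2]
  rnext = f[6]

  # PNEXT moves when the mate sits on SUPER_1_2, named either directly
  # or via "=" (same reference as the read itself).
  if rnext == "SUPER_1_2" || (rnext == "=" && rname == "SUPER_1_2")
    f[7] = (f[7].to_i + OFFSET).to_s
  end
  f[6] = "SUPER_1" if %w[SUPER_1_1 SUPER_1_2].include?(rnext)

  # POS moves only for reads on the second half.
  f[3] = (f[3].to_i + OFFSET).to_s if rname == "SUPER_1_2"
  f[2] = "SUPER_1" if %w[SUPER_1_1 SUPER_1_2].include?(rname)

  f.join("\t")
end
```

It could be used as a filter between `samtools view -h` and PretextMap, for example with `ARGF.each_line { |l| out = merge_super_1(l); puts out if out }`.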
GitHub Repositories
=======================
contamination files: https://github.com/epaule/btk_sequences_to_remove
blast scripts: https://github.com/Aquatic-Symbiosis-Genomics-Project/BLAST-scripts
decon blast
===========
bash decon_blastBTK.sh <FASTA-file> <CSV file with ticks> <output directory>
Useful one-liners:
@epaule
epaule / rc_release_bump.rb
Last active January 24, 2022 13:46
create the new files for an RC release
#!/usr/bin/env ruby
require "fileutils"

class Assembly
  def initialize(id, dir)
    @id = id
    @dir = dir
  end
@epaule
epaule / fix_braker_gtf.pl
Last active October 8, 2021 15:55
fix the braker GTF so it can be used with HTSeq
#!/usr/bin/env perl
while (<>) {
    chomp;
    @F = split(/\s\s+/);
    if ($F[2] eq 'gene') {
        my $t = "gene_id \"$F[-1]\";";
        $F[-1] = $t;
    } elsif ($F[2] eq 'transcript') {
        my $t = "transcript_id \"$F[-1]\";";
@epaule
epaule / filter_tpf.pl
Created July 22, 2021 08:12
remove scaffolds from a TPF based on a decon file
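#!/usr/bin/env perl
# filter_tpf.pl decon_file TPF
# * will leave gaps/etc in the file
my %ids;
# stop early if the decon file cannot be read
open(IN, '<', $ARGV[0]) or die "cannot open $ARGV[0]: $!";
while (<IN>) {
    # remember every ID flagged as REMOVE
    $ids{$1} = 1 if /^REMOVE\s+(\w+)/;
}
close IN;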
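The preview stops before the TPF itself is filtered, so the rest of the approach can only be sketched under assumptions. The Ruby version below collects IDs with the same `REMOVE` pattern, then drops TPF lines that name one of them. The helper names `removal_ids` and `filter_tpf` are hypothetical, and the sketch assumes a TPF line carries the scaffold name in a tab-separated field, optionally followed by `:start-end`; gap lines are left in place, as the comment above says.

```ruby
require "set"

# Collect IDs from lines of the form "REMOVE <id>" (same pattern as the Perl).
def removal_ids(decon_lines)
  decon_lines.each_with_object(Set.new) do |line, ids|
    id = line[/^REMOVE\s+(\w+)/, 1]
    ids << id if id
  end
end

# ASSUMPTION: a scaffold is named in some tab-separated field of a TPF line,
# optionally with a ":start-end" suffix. Lines naming a removed scaffold are
# dropped; everything else, including gap lines, passes through unchanged.
def filter_tpf(tpf_lines, ids)
  tpf_lines.reject do |line|
    line.chomp.split("\t").any? do |field|
      ids.include?(field.sub(/:\d+-\d+\z/, ""))
    end
  end
end
```

A call such as `puts filter_tpf(File.readlines(tpf), removal_ids(File.readlines(decon)))` would then print the filtered TPF, with `tpf` and `decon` standing in for the two file paths.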
@epaule
epaule / change_mrna_gff3.pl
Last active February 20, 2019 11:29
fiddles with the GFF3 mRNA spans based on CDSes
#!/usr/bin/env perl
my $inf = shift;
open IN, $inf;
my %cds;
# slurpy block
while (<IN>){