Skip to content

Instantly share code, notes, and snippets.

View pgonzale60's full-sized avatar

Pablo Gonzalez pgonzale60

View GitHub Profile
@pgonzale60
pgonzale60 / error_profile_simulation_for_tidk.sh
Created June 5, 2024 13:29
Tests tidk explore v0.2.41 on different error rates and telomere lengths
#!/bin/bash
# Define the parameters
sequence_lengths=(600 12000 30000)
error_rates=(0.1 0.05 0.01)
num_sequences=1000
motif="TTAGGG"
simulation_dir="simulation"
tidk_output_dir="tidk_output"
summary_file="tidk_explore_summary_results.tsv"
@pgonzale60
pgonzale60 / trf2tsv.sh
Created March 7, 2024 20:51
Run tandem repeat finder and convert the output to TSV
ln -s $assembly ${strain}.fasta
$trf_cmd ${strain}.fasta 2 7 7 80 10 50 2000 -f -h -d -m
rm ${strain}.fasta
mv ${strain}.fasta.2.7.7.80.10.50.2000.dat ${strain}.trf.dat
mv ${strain}.fasta.2.7.7.80.10.50.2000.mask ${strain}.trf.mask
grep -P '^[S0-9]' ${strain}.trf.dat | cut -d ' ' -f 1,2,3,4,6,7,8,13,14 | awk '{if(\$1 ~ /Sequence/){chr=\$2} else {print chr, \$0}}' | tr [:blank:] '\t' > ${strain}.trf.tsv
rm ${strain}.fasta
@pgonzale60
pgonzale60 / softclipped_telo_coords.pl
Created September 21, 2023 10:29
extract positions of softclipped telomeres from BAM
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;
use File::Basename;
use Data::Dumper;
# globals
@pgonzale60
pgonzale60 / hifi2pairedend.py
Created August 3, 2023 21:00
Script to convert HiFi reads to paired-end fastq reads. This script will extract substrings from moving windows per long read, with each read being 150bp long and spaced by 300bp.
import argparse
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
# define the window size, step size and the distance between paired reads
window_size = 150
step_size = 100
paired_distance = 300
phred_score = 41 # maximum score in Illumina 1.8+ fastq format
@pgonzale60
pgonzale60 / get_miltel_coords.R
Created June 28, 2023 18:35
This program gets coordinates in TSV file, after filtering the output of the miltel program from the bleties package
# Function to remove alphabetic and special characters from a string
get_integers <- function(x) {
sub(".+[^0-9]([0-9]+)", "\\1", x)
}
# Print script usage and information
print_usage <- function() {
cat("------------------------------------------------------------------------\n")
cat("Script Usage:\n")
cat("This program processes a TSV file generated by the miltel program from the bleties package.\n")
@pgonzale60
pgonzale60 / explore_miltel_threshold.R
Created June 28, 2023 18:32
This program processes a TSV file generated by the miltel program from the bleties package
# Function to remove alphabetic and special characters from a string
get_integers <- function(x) {
sub(".+[^0-9]([0-9]+)", "\\1", x)
}
# Print script usage and information
print_usage <- function() {
cat("------------------------------------------------------------------------\n")
cat("Script Usage:\n")
cat("This program processes a TSV file generated by the miltel program from the bleties package.\n")
@pgonzale60
pgonzale60 / multispecies_synteny_with_diminution_freeze.R
Created March 13, 2023 14:04
Oxford plot with sequence coordinates transformed to genome wide coordinates
library(tidyverse)
# requires(gtools) # for mixedsort function with roman numerals
### Funcions ####
# Function for loading buscos
read_busco <- function(buscoFile){
read_tsv(buscoFile,
col_names = c("Busco_id", "Status", "Sequence",
"start", "end", "strand", "Score", "Length",
"OrthoDB_url", "Description"),
@pgonzale60
pgonzale60 / bam_hic2mcool.sh
Created April 6, 2022 14:37
Going from a BAM of mapped Hi-C reads to a balanced multiresulltion cool file and displayed in higlass
# singularity pull docker://4dndcic/4dn-hic:v42.2
dn4="/software/singularity-v3.6.4/bin/singularity run -B /lustre:/lustre /lustre/scratch123/tol/teams/blaxter/projects/tol-nemotodes/sw/.singularity/4dn-hic_v42.sif"
$dn4 run-bam2pairs.sh merge.mkdup.sorted.bam nxOscSper1.1
basename=nxOscSper1.1.4dn
cut -f1,2 $ref.fai > $ref.fai_sizes # vim adjust order of chr
@pgonzale60
pgonzale60 / multispecies_oxford_plot.R
Last active August 25, 2022 13:10
Compares oxford plots of multiple species using shared coordinates rather than facets. Designed for BUSCO's full table.
library(tidyverse)
# User input ####
x_species <- c("nxOscDolc1.1", "nxOscOnir1.2")
y_species <- c("nxOscSper1.1", "nxOscSpeu1.1")
# path to BUSCO table directory
# path must end in "/" (e.g. busco_dir/)
busco_dir <- "analyses/paper1/prelim_to_20220220/nemaChromQC/"
@pgonzale60
pgonzale60 / busco_to_synteny_plot.R
Last active August 25, 2022 13:07
Take in two BUSCO full table and generate a dot plot of colinearity from hits in long sequences
library(tidyverse)
library(scales)
library(gtools)
min_seq_size <- 5e5 # Sequences shorter than this will not be plotted (the size of the sequence is inferred from the maximum coordinate of a BUSCO)
ref_busco <- "~/Downloads/CABPSW02.yahs_scaffolds_final_nematoda_odb10_full_table.tsv" # path to reference file
query_busco <- "~/Downloads/APS7_sophie.v4_nopipe.fasta.yahs_scaffolds_final_nematoda_odb10_full_table.tsv" # path to query file
ref_species <- "A. rhodensis" # text in x axis of plot
query_species <- "A. freiburgensis" # text in y axis of plot