Sina Majidian sinamajidian

sinamajidian / README.MD
Last active July 20, 2025 20:49
A simple prompt with an LLM in Python

The code is from the DeepLearning.AI course "How Transformer LLMs Work".

Set up

conda create -n llm python=3.12
conda activate llm
conda install conda-forge::transformers
conda install conda-forge::jupyterlab  
conda install pytorch::pytorch
sinamajidian / run_metabuli.MD
Last active July 20, 2025 20:43
How we ran metabuli for the Movi color study

In our Movi color study, we ran Metabuli using the following script. We used two read datasets from CAMI (one long-read and one short-read) for metagenomic classification.

Install

conda create -n metab python=3.12
conda activate metab

wget https://mmseqs.com/metabuli/metabuli-linux-avx2.tar.gz
sinamajidian / README.MD
Last active July 20, 2025 20:43
calculate completeness score
import pyham
import logging

treeFile=fastoma_out+'/species_tree_checked.nwk'
orthoxmlFile=fastoma_out+'/FastOMA_HOGs.orthoxml'

logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s', level=logging.INFO, datefmt='%Y-%m-%d %H:%M:%S')
sinamajidian / README.md
Last active February 15, 2025 20:11
PhylogeneticTree using FastOMA

PhylogeneticTree with FastOMA

Updating the code from the F1000 paper on PhylogeneticTree (on GitHub) to work with FastOMA

sinamajidian / README.md
Last active July 20, 2025 20:44
Impute-first pipeline for variant calling

Impute-first pipeline for variant calling

This bash script is based on the Impute-First GitHub repository and the preprint.

Inputs:

  • Reference genome
  • HGSVC2 Reference panel
  • PLINK genetic map
  • NovaSeq HG002 sequencing reads
sinamajidian / README.md
Created December 5, 2024 17:50
cannot find -lcurl

I have run into this error several times, and searching the web only turns up suggestions to use sudo apt-get.

/usr/bin/ld: cannot find -lcurl
/usr/bin/ld: cannot find -lbz2
collect2: error: ld returned 1 exit status
sinamajidian / ead_chop.py
Created March 29, 2022 10:46
Chop each read in a FASTQ file into a few chunks
#!/usr/bin/python3
import numpy as np
from sys import argv
file_fq_input_addrss = argv[1]
file_fq_output_addrss = argv[2]
file_fq_input= open(file_fq_input_addrss,'r');
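The preview above is cut off, so the following is only a minimal sketch of the chopping step, not the gist's actual implementation. It assumes standard four-line FASTQ records and splits each read into roughly equal contiguous pieces; the function names `chop_read` and `chop_fastq`, the `_chunk<i>` naming scheme, and the `n_chunks` parameter are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def chop_read(name: str, seq: str, qual: str, n_chunks: int) -> List[Tuple[str, str, str]]:
    """Split one read into at most n_chunks contiguous (name, seq, qual) pieces."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Ceiling division, so the final chunk holds the remainder (assumed policy).
    size = max(1, -(-len(seq) // n_chunks))
    return [
        (f"{name}_chunk{i}", seq[start:start + size], qual[start:start + size])
        for i, start in enumerate(range(0, len(seq), size))
    ]


def chop_fastq(lines: Iterable[str], n_chunks: int) -> Iterator[str]:
    """Yield the lines of a chopped FASTQ, assuming four lines per record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        name = header.strip()[1:].split()[0]
        for chunk_name, chunk_seq, chunk_qual in chop_read(name, seq, qual, n_chunks):
            yield f"@{chunk_name}\n{chunk_seq}\n+\n{chunk_qual}\n"
```

With input and output paths taken from the command line as in the script above, usage could look like `out.writelines(chop_fastq(open(in_path), 3))`.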
sinamajidian / README.md
Last active July 20, 2025 20:44
Create a diploid genome using SURVIVOR

A diploid genome using SURVIVOR

Using the following bash code, you can create a diploid genome with SURVIVOR. At the end, you will have three files:

  • Two FASTA files: sim1.fasta and sim2.fasta.
  • A truly phased VCF file: sim_e_merg.vcf.

Some lines of the intermediate files:

sinamajidian / depth_mean_stddev.py
Created October 10, 2020 07:27 — forked from williamrowell/depth_mean_stddev.py
Calculate the mean and standard deviation of a coverage depth distribution as if it were normally distributed.
#!/usr/bin/env python3
"""Coverage mean and standard deviation of autosomes
Estimate the mean and standard deviation from a mosdepth coverage BED for
positions with coverage in the range (0, 2 * non-zero mode). This estimate
behaves well for PacBio HiFi WGS of human germline aligned to either hs37d5 or
GRCh38, and may be useful for other situations as well.
$ bash mosdepth --threads 3 --no-per-base --by 500 -m "${BAM%.*}.median" "${BAM}"
$ tabix ${BAM%.*}.median.regions.bed.gz {1..22} | python depth_mean_stddev.py
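The rest of the script is not shown in this preview. As a rough sketch of the estimate the docstring describes, and not the forked script's actual code, the function below finds the non-zero mode of the depths, keeps values in the open range (0, 2 * mode), and reports their mean and standard deviation. Rounding depths to integers before taking the mode and using the population standard deviation are both assumptions; the function name `depth_mean_stddev` is borrowed from the file name for illustration.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Iterable, Tuple


def depth_mean_stddev(depths: Iterable[float]) -> Tuple[float, float]:
    """Mean and stddev of depths within (0, 2 * non-zero mode)."""
    depths = list(depths)
    # Assumption: bin depths to the nearest integer before taking the mode.
    binned = [round(d) for d in depths if round(d) > 0]
    if not binned:
        raise ValueError("no non-zero depths to estimate from")
    mode = Counter(binned).most_common(1)[0][0]
    # Keep only depths strictly between 0 and twice the non-zero mode.
    kept = [d for d in depths if 0 < d < 2 * mode]
    # Assumption: population standard deviation (pstdev), not the sample one.
    return mean(kept), pstdev(kept)
```

Cutting off at twice the mode discards high-depth outliers such as collapsed repeats, which would otherwise inflate both estimates.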
sinamajidian / README.md
Last active March 31, 2022 22:33
Extracting the high coverage region (as bed file) from a BAM file

A simple script for extracting regions with high coverage (>depth_treshold) from a BED file. The input is a BED file in which each row is a region followed by its depth, in the following format:

21	9424000	9424500	69.00
21	9424500	9425000	69.00
21	9425000	9425500	67.00
21	9425500	9426000	66.00
21	9426000	9426500	65.00
21	9426500	9427000	66.00
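The filtering step for rows like these can be sketched as follows. This is a minimal illustration, not the gist's own code: it assumes whitespace-separated columns in the order chromosome, start, end, depth, and the names `high_coverage_regions` and `depth_threshold` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Region = Tuple[str, int, int, float]


def high_coverage_regions(lines: Iterable[str], depth_threshold: float) -> Iterator[Region]:
    """Yield (chrom, start, end, depth) for rows with depth > depth_threshold."""
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        chrom, start, end, depth = fields[0], int(fields[1]), int(fields[2]), float(fields[3])
        if depth > depth_threshold:
            yield chrom, start, end, depth


rows = [
    "21\t9424000\t9424500\t69.00",
    "21\t9424500\t9425000\t69.00",
    "21\t9425000\t9425500\t67.00",
    "21\t9425500\t9426000\t66.00",
    "21\t9426000\t9426500\t65.00",
    "21\t9426500\t9427000\t66.00",
]
kept = list(high_coverage_regions(rows, depth_threshold=66))
```

With a threshold of 66 and a strict comparison, only the first three rows above are kept; the rows at exactly 66.00 are excluded.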