Skip to content

Instantly share code, notes, and snippets.

View sinamajidian's full-sized avatar

Sina Majidian sinamajidian

View GitHub Profile
@sinamajidian
sinamajidian / run_metabuli.MD
Last active July 20, 2025 20:43
How we ran metabuli for the Movi color study

In our Movi color study, we ran metabuli using the following script. We used two read datasets from CAMI (long and short read) for metagenomic classification.

Install

conda create -n metab python=3.12
conda activate metab

wget https://mmseqs.com/metabuli/metabuli-linux-avx2.tar.gz
@sinamajidian
sinamajidian / README.MD
Last active July 20, 2025 20:49
A simple prompt with LLM in python

Code is from Deep Learning AI course How Transformer LLMs Work

Set up

conda create -n llm python=3.12
conda activate llm
conda install conda-forge::transformers
conda install conda-forge::jupyterlab  
conda install pytorch::pytorch
@sinamajidian
sinamajidian / README.MD
Last active September 4, 2025 17:21
Convert Bird genotype data to BED PoMoCNV

The input is the genotype data from Supplementary file S13 from the study by Edwards et al. There are 14112 genes for the 94 samples. I assume the first part of the sample name AA_SRR23446543#1 is the population ID and the last letter is the haplotype ID. So there are 15 samples for AI (A. insularis), 15 for AW (A. woodehouseii), 14 for AC (A. coerulescens), and 1 for each of AA (A. californica), CY (Cyanocorax yucatanicus), and CS (Cyanocitta cristata).

Note that coordiantes are arbitary, 100 bases per each gene. If we have the gene length, we could make it accurate, probably shouldn't matter. We could double check whether CNV length matters in the PoMoCNV framework.

f= "data_S13.csv"
f_read= open(f,'r')
@sinamajidian
sinamajidian / eval_kraken.py
Last active September 5, 2025 13:38
code for comparing kraken output two column with truth and calculate F1 when true reads defined at species level
import os
import sys
import pandas as pd
import sys
import numpy as np
"""
python eval_kraken kraken_out.1st4th.csv true.csv nodes.dmp taxa.ids > out .stat
For comparing kraken output with truth and calculate F1 when true reads defined at species level