The code is from the DeepLearning.AI course "How Transformer LLMs Work".
conda create -n llm python=3.12
conda activate llm
conda install conda-forge::transformers
conda install conda-forge::jupyterlab
conda install pytorch::pytorch
In our Movi color study, we ran Metabuli using the following script. We used two CAMI read datasets (one long-read and one short-read) for metagenomic classification.
conda create -n metab python=3.12
conda activate metab
wget https://mmseqs.com/metabuli/metabuli-linux-avx2.tar.gz
import pyham
import logging
treeFile=fastoma_out+'/species_tree_checked.nwk'
orthoxmlFile=fastoma_out+'/FastOMA_HOGs.orthoxml'
logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s', level=logging.INFO, datefmt='%Y-%m-%d %H:%M:%S')
This bash script is based on the Impute-First GitHub repository and the preprint.
Inputs:
I have run into this error several times, and searching the web only turns up suggestions to use sudo apt-get.
/usr/bin/ld: cannot find -lcurl
/usr/bin/ld: cannot find -lbz2
collect2: error: ld returned 1 exit status
#!/usr/bin/python3
import numpy as np
from sys import argv
file_fq_input_addrss = argv[1]
file_fq_output_addrss = argv[2]
file_fq_input = open(file_fq_input_addrss, 'r')
Using the following bash code, you can create a diploid genome with SURVIVOR. At the end, you will have three files: sim1.fasta, sim2.fasta, and sim_e_merg.vcf.
Some lines of the intermediate files:
#!/usr/bin/env python3
"""Coverage mean and standard deviation of autosomes
Estimate the mean and standard deviation from a mosdepth coverage BED for
positions with coverage in the range (0, 2 * non-zero mode). This estimate
behaves well for PacBio HiFi WGS of human germline aligned to either hs37d5 or
GRCh38, and may be useful for other situations as well.
$ bash mosdepth --threads 3 --no-per-base --by 500 -m "${BAM%.*}.median" "${BAM}"
$ tabix ${BAM%.*}.median.regions.bed.gz {1..22} | python depth_mean_stddev.py
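The estimate described in the docstring above could be sketched roughly as follows. This is a minimal illustration, not the original script: the function name `depth_mean_stddev`, the rounding of depths to integers before taking the mode, and the use of the population standard deviation are all assumptions. It expects mosdepth-style BED rows (chrom, start, end, depth) and weights every window equally.

```python
import math
from collections import Counter


def depth_mean_stddev(lines):
    """Return (mean, stddev) of depths in the open range (0, 2 * non-zero mode).

    Hypothetical helper for illustration. `lines` is an iterable of BED rows
    of the form 'chrom start end depth' (whitespace separated).
    """
    depths = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        depths.append(float(fields[3]))

    # Non-zero mode, computed on depths rounded to integers (assumption).
    counts = Counter(round(d) for d in depths if round(d) > 0)
    if not counts:
        raise ValueError("no non-zero depths found")
    mode = counts.most_common(1)[0][0]

    # Keep only windows whose depth lies strictly between 0 and 2 * mode.
    kept = [d for d in depths if 0 < d < 2 * mode]
    mean = sum(kept) / len(kept)
    variance = sum((d - mean) ** 2 for d in kept) / len(kept)  # population variance
    return mean, math.sqrt(variance)
```

Restricting the range to twice the mode drops both zero-coverage windows and extreme outliers (for example collapsed repeats), so a handful of very deep windows does not inflate the estimate.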
A simple script for extracting the regions with high coverage (> depth_treshold) from a BED file.
The input is a BED file in which each row is a region followed by its depth, in the following format:
21 9424000 9424500 69.00
21 9424500 9425000 69.00
21 9425000 9425500 67.00
21 9425500 9426000 66.00
21 9426000 9426500 65.00
21 9426500 9427000 66.00
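A filter over rows in this format might look like the sketch below. It is not the original script: the function name `high_coverage_regions`, the parameter name `depth_threshold`, and the threshold value of 66 are chosen here purely for illustration.

```python
def high_coverage_regions(lines, depth_threshold):
    """Yield (chrom, start, end, depth) for rows whose depth exceeds the threshold.

    Hypothetical helper for illustration. `lines` is an iterable of BED rows
    of the form 'chrom start end depth' (whitespace separated).
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        chrom = fields[0]
        start, end = int(fields[1]), int(fields[2])
        depth = float(fields[3])
        if depth > depth_threshold:
            yield chrom, start, end, depth


# The six example rows shown above.
rows = [
    "21 9424000 9424500 69.00",
    "21 9424500 9425000 69.00",
    "21 9425000 9425500 67.00",
    "21 9425500 9426000 66.00",
    "21 9426000 9426500 65.00",
    "21 9426500 9427000 66.00",
]

# Print every region with depth strictly greater than 66 as tab-separated text.
for chrom, start, end, depth in high_coverage_regions(rows, 66.0):
    print(f"{chrom}\t{start}\t{end}\t{depth:.2f}")
```

The comparison is strict, so a window whose depth equals the threshold is not reported. To read from a file instead of a list, pass an open file handle as `lines`.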