#The LCS (longest common substring) problem is to find the longest string that is a substring of two or more strings.
#Unlike a subsequence, a substring must be contiguous.
#A k-mer is any substring of length k in a string. K-mers are widely used in sequence assembly.
#Comparing two sequences basically amounts to finding all the k-mers they have in common.
#LCS does not extend well to multi-sequence alignment, because the longest substring shared by two strings might not exist in the next string.
#Here DP is used to list all common substrings between two sequences, which are then compared with the other sequences.
#data_set_is_on_the_bottom
from Bio import SeqIO
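The approach described in the comments above can be sketched as follows. This is a minimal illustration, not the original script: the function names `common_substrings` and `shared_with_all` and the `min_len` parameter are assumptions made for the example. The DP table stores, for each pair of positions, the length of the longest common suffix, and a match is recorded only when it cannot be extended to the right.

```python
def common_substrings(a, b, min_len=1):
    """List the maximal common substrings of a and b that have length >= min_len.

    cur[j] is the length of the longest common suffix of a[:i] and b[:j].
    """
    prev = [0] * (len(b) + 1)
    found = set()
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                # Record the match only if it cannot grow to the right.
                at_end = i == len(a) or j == len(b) or a[i] != b[j]
                if at_end and cur[j] >= min_len:
                    found.add(a[i - cur[j]:i])
        prev = cur
    return found


def shared_with_all(candidates, others):
    """Keep the candidates that occur as a substring in every other sequence."""
    return {s for s in candidates if all(s in seq for seq in others)}


pairwise = common_substrings("ACGTAC", "TACGT", min_len=3)
shared = shared_with_all(pairwise, ["GGTACC"])
```

Only two rows of the table are kept at a time, so memory grows with the length of the second sequence rather than with the product of both lengths.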
#Blast_blastp shell command multi2multi comparison
#formatdb (-p T builds a protein database, which blastp requires)
formatdb -i xx -p T
#blastp
for k in ./*.faa; do blastall -p blastp -i xx -d "$k" -e 1e-3 -o "xx_${k##*/}.txt"; done
#.faa.txt_rename
rename -v 's/\.faa\.txt$/.txt/' *
#all together (all-vs-all)
for k in ../faa_prokka/*.faa; do m=${k##*/}; for j in ../faa_prokka/*.faa; do n=${j##*/}; if [ "$k" != "$j" ]; then blastall -p blastp -i "$k" -d "$j" -e 1e-5 -o "${m%.*}_${n%.*}.txt"; fi; done; done
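The all-vs-all loop above can also be sketched in Python. This is only an illustration under assumptions: the helper name `all_vs_all_commands` is hypothetical, the commands are built but not executed, and each database is assumed to have been formatted with `formatdb` beforehand.

```python
from itertools import permutations
from pathlib import Path


def all_vs_all_commands(faa_files, evalue="1e-5"):
    """Build one blastall command per ordered pair of distinct FASTA files."""
    commands = []
    for query, db in permutations(faa_files, 2):
        # Output name mirrors the shell loop: <query stem>_<db stem>.txt
        out = f"{Path(query).stem}_{Path(db).stem}.txt"
        commands.append([
            "blastall", "-p", "blastp",
            "-i", str(query), "-d", str(db),
            "-e", evalue, "-o", out,
        ])
    return commands


commands = all_vs_all_commands(["a.faa", "b.faa", "c.faa"])
```

Each entry is an argument list that could be passed to `subprocess.run(cmd, check=True)` on a machine where legacy BLAST is installed.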
k = 15
n = 9838
f0 = open('a.txt', 'r')
f = open('c.txt', 'w')
s = [line.strip('\n') for line in f0.readlines()]
def sum3(ss, nn):
    # ss[i] + ss[j] for every pair j < i
    oo = [[ss[i] + ss[j] for j in range(i)] for i in range(nn)]
    for i in range(nn):
        probe = 0
#Parse multiple BLAST alignment results
#query/alignment/identity/positive/coverage are collected
#Only the best hit of each result is kept
from Bio.Blast import NCBIStandalone
import os, sys
path='/.../.../...'
for i in os.listdir(path):
    # os.listdir returns bare file names, so join them with the directory
    result_handle = open(os.path.join(path, i))
    blast_parser = NCBIStandalone.BlastParser()
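The fields named in the header comments could be derived from a best hit roughly as follows. This is a sketch under stated assumptions, independent of any parser: the function `summarize_best_hit` and the dictionary keys are hypothetical, the best hit is taken to be the one with the lowest e-value, and coverage is taken to be the aligned query span divided by the query length.

```python
def summarize_best_hit(query_name, query_length, hits):
    """Summarize the best hit (lowest e-value) for one query.

    Each hit is a plain dict with hypothetical keys: title, expect,
    identities, positives, align_length, query_start, query_end.
    """
    if not hits:
        return None
    best = min(hits, key=lambda h: h["expect"])
    aligned = best["align_length"]
    # Assumed definition: aligned query span over the full query length.
    span = best["query_end"] - best["query_start"] + 1
    return {
        "query": query_name,
        "alignment": best["title"],
        "identity": best["identities"] / aligned,
        "positive": best["positives"] / aligned,
        "coverage": span / query_length,
    }


example_hits = [
    {"title": "hit_A", "expect": 1e-20, "identities": 45, "positives": 72,
     "align_length": 90, "query_start": 11, "query_end": 100},
    {"title": "hit_B", "expect": 1e-5, "identities": 30, "positives": 40,
     "align_length": 80, "query_start": 1, "query_end": 80},
]
summary = summarize_best_hit("query_1", 200, example_hits)
```

Returning `None` for a query without hits keeps the caller free to decide whether such queries are skipped or reported.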