(defn nussinov
  "Uses the Nussinov algorithm to compute an optimal RNA structure by
  maximizing base pairs in the structure. The function requires an
  input string s. The output is a list of base pair locations [i
  j]. It will also print out the sequence and the structure so that it
  can be visually inspected. An example sequence of 'GGGAAAUCC' will
  give the answer ([2 6] [1 7] [0 8]). Locations are 0-based (i.e. seq
  goes from 0 to n-1)."
  [s]
(defn sto->aln
  "Convert a Stockholm format alignment file into its ClustalW
  equivalent ALN format. STOIN is the filespec for the Stockholm
  format file and ALNOUT is the filespec for the resulting
  conversion (it is overwritten if it already exists!)"
  [stoin alnout]
  (let [seq-lines (second (join-sto-fasta-lines stoin ""))
        seq-lines (map (fn [[nm [uid sl]]]
                         [nm [uid (map #(str/join "" %) (partition-all 60 (str/replace-re #"\." "-" sl)))]])
(let [[nm sq] (cond
                (.startsWith l "#=GC SS_cons")
                [(str/join " " (butlast (str/split #"\s+" l))) (last (str/split #"\s+" l))] ;; splits the line apart, intended to yield the vector ["#=GC SS_cons" structure]
                (.startsWith l "#")
                (str/split #"\s{2,}+" l)
                :else
                (str/split #"\s+" l))
      prev (get m nm [(gen-uid) ""])]
  (assoc m nm [(first prev)
               (str (second prev) sq)]))
(ns smith-waterman)
(defn- array-keys
  "positions of the array to work on"
  [s1 s2]
  (for [i (range (count s1)) ; initialize scoring array, similar to a sparse matrix
        j (range (count s2))]
    [i j]))
#!/usr/bin/perl
my $p = 'AAMAAT-ATAMAAAT-AT'; # example protein
my @l = find_orfs($p); # calls the find_orfs subroutine
my ($longest, $longest_aa) = longest_orf($p, @l); # calls the longest_orf subroutine
print "protein = $longest_aa longest = $longest\n";
sub find_orfs {
    my ($protein) = @_;
    print "protein $protein\n";
    my $len = length ($protein);
(defn join-sto-fasta-lines [infilespec origin]
  (let [[seq-lines gc-lines] (sto-GC-and-seq-lines infilespec)
        gc-lines (if (not= origin "")
                   (concat (take 1 gc-lines) [origin] (drop 1 gc-lines))
                   gc-lines)
        recombined-seqs (sort-by
                         #(-> % second first)
                         (vec
                          (reduce