@ship561
ship561 / gist:2876766
Created June 5, 2012 18:32
Implementation of the Nussinov RNA folding algorithm in Clojure. The algorithm finds an optimal structure, meaning one with the maximum number of base pairs.
(defn nussinov
"Uses the Nussinov algorithm to compute an optimal RNA structure by
maximizing base pairs in the structure. The function requires an
input string s. The output is a list of base pair locations [i
j]. It will also print out the sequence and the structure so that it
can be visually inspected. An example sequence of 'GGGAAAUCC' will
give the answer ([2 6] [1 7] [0 8]). Locations are 0-based (i.e., the
sequence runs from 0 to n-1)."
[s]
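The preview above stops at the argument vector, so the body of `nussinov` is not shown. As an illustration only, the recurrence that the docstring describes could be sketched as below. The names `pairing-rules` and `max-base-pairs` are hypothetical, and the pairing rules (Watson-Crick plus G-U wobble, with no minimum loop length) are an assumption chosen to be consistent with the docstring's example answer. This sketch returns only the number of pairs, not the list of pair locations.

```clojure
(def pairing-rules
  "ASSUMPTION: Watson-Crick pairs plus the G-U wobble pair."
  #{[\A \U] [\U \A] [\G \C] [\C \G] [\G \U] [\U \G]})

(defn max-base-pairs
  "Hypothetical helper: maximum number of nested base pairs in RNA string s.
  No minimum hairpin-loop length is enforced (an assumption)."
  [s]
  (let [s      (vec s)
        n      (count s)
        ;; Score of the span i..j; empty or single-base spans score 0.
        lookup (fn [table i j] (if (< i j) (get table [i j] 0) 0))
        ;; Fill one cell using only cells that cover strictly shorter spans.
        fill   (fn [table [i j]]
                 (let [unpaired (lookup table (inc i) j)   ; base i stays unpaired
                       paired   (for [k (range (inc i) (inc j))
                                      :when (pairing-rules [(s i) (s k)])]
                                  ;; base i pairs with base k, splitting the span
                                  (+ 1
                                     (lookup table (inc i) (dec k))
                                     (lookup table (inc k) j)))]
                   (assoc table [i j] (apply max unpaired paired))))
        ;; Visit spans in order of increasing length.
        cells  (for [len (range 1 n)
                     i   (range (- n len))]
                 [i (+ i len)])]
    (lookup (reduce fill {} cells) 0 (dec n))))

(max-base-pairs "GGGAAAUCC") ;; => 3
```

Under these assumptions the result of 3 agrees with the three pairs listed in the docstring. Recovering the pair locations themselves would need a traceback over the filled table, which is left out here.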
ship561 / gist:1780967
Created February 9, 2012 16:37
Convert a Stockholm file to ClustalW format.
(defn sto->aln
"Convert a Stockholm format alignment file into its ClustalW
equivalent ALN format. STOIN is the filespec for the Stockholm
format file and ALNOUT is the filespec for the resulting
conversion (it is overwritten if it already exists!)"
[stoin alnout]
(let [seq-lines (second (join-sto-fasta-lines stoin ""))
seq-lines (map (fn [[nm [uid sl]]]
[nm [uid (map #(str/join "" %) (partition-all 60 (str/replace-re #"\." "-" sl)))]])
ship561 / gist:1584616
Created January 9, 2012 19:55
let statement in joining sto lines
(let [[nm sq] (cond
(.startsWith l "#=GC SS_cons")
[(str/join " " (butlast (str/split #"\s+" l))) (last (str/split #"\s+" l))] ;; split the line so that it yields the vector ["#=GC SS_cons" structure]
(.startsWith l "#")
(str/split #"\s{2,}+" l)
:else
(str/split #"\s+" l))
prev (get m nm [(gen-uid) ""])]
(assoc m nm [(first prev)
(str (second prev) sq)]))
ship561 / gist:1456807
Last active January 15, 2020 23:52
Smith-Waterman and Needleman-Wunsch in Clojure. Sample code to solve a simple alignment; returns a map of all top-scoring alignments.
(ns smith-waterman)
(defn- array-keys
"positions of the array to work on"
[s1 s2]
(for [i (range (count s1)) ;initialize scoring array. similar to a sparse matrix
j (range (count s2))]
[i j]))
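The rest of this gist is not shown in the preview. As a rough illustration of how a map keyed by `[i j]` positions, like the ones `array-keys` produces, can hold a Smith-Waterman scoring table, here is a minimal sketch. The function name `sw-score`, the linear gap penalty, and the default scores (match 2, mismatch -1, gap -1) are all assumptions, not taken from the gist. It returns only the best local score, not the alignments.

```clojure
(defn sw-score
  "Hypothetical helper: best local alignment score of s1 and s2.
  ASSUMPTION: linear gap penalty; default scores are illustrative only."
  [s1 s2 & {:keys [match mismatch gap]
            :or   {match 2 mismatch -1 gap -1}}]
  (let [s1   (vec s1)
        s2   (vec s2)
        ;; Fill one cell; missing neighbours act as the zero boundary.
        cell (fn [table [i j]]
               (let [sub (if (= (s1 i) (s2 j)) match mismatch)]
                 (assoc table [i j]
                        (max 0
                             (+ (get table [(dec i) (dec j)] 0) sub) ; align i with j
                             (+ (get table [(dec i) j] 0) gap)       ; gap in s2
                             (+ (get table [i (dec j)] 0) gap)))))   ; gap in s1
        ;; Row-major order guarantees the three neighbours are filled first.
        table (reduce cell {}
                      (for [i (range (count s1))
                            j (range (count s2))]
                        [i j]))]
    ;; A local alignment may end anywhere, so take the largest cell.
    (apply max 0 (vals table))))

(sw-score "TTACGTT" "GGACGGG") ;; => 6
```

Storing the table as a map keeps the code close to the "sparse matrix" idea mentioned in the comment above; clamping each cell at 0 is what makes the alignment local rather than global.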
ship561 / gist:1455761
Created December 10, 2011 17:56
Find the longest ORF in an amino acid sequence.
#!/usr/bin/perl
my $p = 'AAMAAT-ATAMAAAT-AT'; #example protein
my @l = find_orfs($p); #calls subfunction
my ($longest, $longest_aa) = longest_orf($p, @l); #calls subfunction
print "protein = $longest_aa longest = $longest\n";
sub find_orfs {
my ($protein) = @_;
print "protein $protein\n";
my $len = length ($protein);
ship561 / gist:1411240
Created November 30, 2011 22:06
update on sto reading
(defn join-sto-fasta-lines [infilespec origin]
(let [[seq-lines gc-lines] (sto-GC-and-seq-lines infilespec)
gc-lines (if (not= origin "")
(concat (take 1 gc-lines) [origin] (drop 1 gc-lines))
gc-lines)
recombined-seqs (sort-by
#(-> % second first)
(vec
(reduce