@chaoxu
Created August 22, 2011 15:57
Solution for the Dyalog competition in 2011
:Namespace Problem1
⎕IO ⎕ML ⎕WX ⎕RL←1 0 0 1093457855
adjacency{
a([;1]) edge indexed by numbers
mat(2([;1]))0 create an empty matrix
mat[aa]1 set adjacent edges to 1
mat
}
allTrips{
(n m) find the dimensions
mat((n).=(n)) add one to the diagonal of the adjacency matrix
mat((.))mat repeatedly generate the connectivity matrix until a fixed point
}
distance{
(a x b y)((1÷180))×dms2dd¨, radian mode. a,b are latitudes; x,y are longitudes
6371×(-2)(((2(x-y))×(2a)×(2b))+(1a)×(1b)) the distance formula
}
dms2dd{
val(360 60 60[1 2 3])÷3600 convert to degrees
val+([4]'SW')×-2×val if it contains 'S' or 'W', the degree is negative: -val = (-2×val)+val
}
reduceSegments{
airports
segments
m(segments)
closureallTrips airports adjacency segments
dis{
(a b)airports[;2 3]
a distance b
}
weightdis¨(airports[;1]segments) Find the weight of all edges
oldsegments,weight
indexweight order by weight
rindexindex this allows one to change back to the original order
edges(segments)[index]
mstm0
The reverse-delete algorithm greedily deletes edges without changing connectivity
There are other algorithms; this is the easiest to code in APL
Kruskal's algorithm should perform better by a constant factor
reverseDelete{
zallTrips airports adjacency(1,mst/edges)
mst[m-(1)]~zclosure
1
}
z(reverseDelete{0=})edges apply reverse-delete until all edges are tested
mstmst[rindex] change to original order
edgesedges[rindex]
new(mst/edges),mst/weight
(old new)
}
:EndNamespace
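
For readers who do not speak APL, here is a rough Python sketch of what adjacency and allTrips compute: a 0/1 adjacency matrix, and its connectivity (transitive closure) matrix obtained by adding the diagonal and repeatedly taking the boolean matrix product until a fixed point. The use of numpy and all names below are my own, not from the gist.

import numpy as np

def adjacency(n, edges):
    # edges are pairs of 1-based vertex numbers; both directions are set,
    # assuming the segments are undirected
    mat = np.zeros((n, n), dtype=int)
    for a, b in edges:
        mat[a - 1, b - 1] = mat[b - 1, a - 1] = 1
    return mat

def all_trips(adj):
    # add the diagonal, then square the boolean matrix until it stops changing
    mat = ((adj + np.eye(len(adj), dtype=int)) > 0).astype(int)
    while True:
        nxt = ((mat @ mat) > 0).astype(int)
        if (nxt == mat).all():
            return mat
        mat = nxt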
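distance converts degree/minute/second coordinates to radians and, as far as the garbled line can be read, applies the spherical law of cosines with an Earth radius of 6371 km. A hedged Python equivalent; the names and the exact formula reading are my assumptions.

from math import sin, cos, acos, radians

EARTH_RADIUS_KM = 6371

def dms_to_dd(deg, minutes, seconds, hemisphere):
    # decimal degrees; 'S' and 'W' make the value negative, as in dms2dd
    dd = (3600 * deg + 60 * minutes + seconds) / 3600
    return -dd if hemisphere in 'SW' else dd

def great_circle_km(lat1, lon1, lat2, lon2):
    # spherical law of cosines on coordinates given in decimal degrees
    a, b, x, y = map(radians, (lat1, lat2, lon1, lon2))
    c = sin(a) * sin(b) + cos(a) * cos(b) * cos(x - y)
    return EARTH_RADIUS_KM * acos(max(-1.0, min(1.0, c)))   # clamp for float safety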
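reduceSegments then builds a minimum spanning tree with the reverse-delete algorithm the comments describe: visit edges from heaviest to lightest and delete each one whose removal keeps the graph connected. A small self-contained Python sketch of that idea, using a plain depth-first search where the gist uses the allTrips closure matrix (all names are illustrative).

def connected(n, edges):
    # vertices are 0..n-1; simple DFS reachability check from vertex 0
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def reverse_delete_mst(n, weighted_edges):
    # weighted_edges: (weight, a, b) triples; returns the edges that remain
    kept = sorted(weighted_edges, reverse=True)           # heaviest first
    for edge in list(kept):
        trial = [e for e in kept if e is not edge]
        if connected(n, [(a, b) for _, a, b in trial]):   # still connected without it?
            kept = trial                                   # then deleting it is safe
    return kept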
:Namespace Problem2
⎕IO ⎕ML ⎕WX ⎕RL←1 0 0 21902271
addNoise{
vec ← pct addNoise vec
w
u(2*31)-1 largest number that can be generated by the random generator
r(?()u)<×u produce a list of random numbers; see how many of them are < the percentage
w[r/r]'X' put 'X' in the positions that are < the percentage
w
}
patFind{ hits←text patFind(pattern tolerance)
wtoLower[1] search only lower case
t0(w)-[2]
m
l(w),/toLower n-wise reduce to get a list of substrings
z(+/¨({w=}¨l))t find positions above a certain tolerance
zz/z
s(z)(w)+z-1 select substrings above the tolerance
z,¨()¨[s]
}
toLower{
w
low⎕UCS 96+26
i⎕Aw
j(i<⎕A)
w[j/(j)]low[j/i]
w
}
:EndNamespace
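
addNoise draws one random number per character and overwrites with 'X' the positions whose draw falls below the requested percentage. A minimal Python sketch, assuming the left argument is a fraction between 0 and 1 (the gist compares integer draws against that fraction of the generator's maximum, 2*31-1).

import random

def add_noise(pct, vec):
    # replace each position with 'X' with probability pct
    return ''.join('X' if random.random() < pct else ch for ch in vec)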
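patFind lower-cases the text, forms every substring of the pattern's length with an n-wise reduction, counts position-wise character matches, and keeps the substrings whose match count clears the tolerance, returning their positions. A Python sketch of the same windowed comparison; whether the original uses > or >= for the tolerance is not readable from the garbled source, so >= is assumed here.

def pat_find(text, pattern, tolerance):
    # returns (1-based position, substring) pairs, mirroring the gist's hits
    text_l, pat_l = text.lower(), pattern.lower()
    n, hits = len(pat_l), []
    for i in range(len(text_l) - n + 1):
        matches = sum(a == b for a, b in zip(text_l[i:i + n], pat_l))
        if matches >= tolerance:
            hits.append((i + 1, text[i:i + n]))
    return hits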
:Namespace Problem3
⎕IO ⎕ML ⎕WX ⎕RL←1 0 0 1093457855
convolve{
r←cm convolve pattern
C
normal+/+/C produce the value for normalization
normalnormalnormal0
(py px) produce useful numbers for index calculation
(m n)
(hm hn)(m n)÷2
w
the convolution algorithm for a single block
c{
wrap{1+|-1} indices get wrapped around
y(m)+(-1)-hm
x(n)+(-1)-hn
+/+/w[py wrap y;px wrap x]×C
}
r((py).c(px))÷normal Do convolution on every position
1r0
}
convolveRGB{
r←cm convolveRGB pattern
Basically, split the image into components and use convolve on each,
then combine the results
C
p(256 256 256)÷255
p{C convolve }¨((p[1;;])(p[2;;])(p[3;;]))
(65536 256 1)+.×p×255
}
gaussianMat{ mat←gaussianMat sigma
Apply the formula directly
s2××
n1+2×3×
c{((-((n-1)÷2))-1)*2}
v{(1÷s)×(*(-((c )+(c ))÷s))}
(n)(.v)n
}
scale{ r←hv scale pattern
simply apply expand
(m n)
(y x)
(xn)\(ym)
}
toGray{
m256 256 256
+m÷3×255
}
toRGB{
n×255
+/n.×65536 256 1
}
:EndNamespace
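
convolve slides the kernel C over the image with wrap-around (toroidal) indexing, divides by the sum of the kernel entries when that sum is non-zero, and appears to clamp the result to [0,1]. A direct, unoptimised Python/numpy sketch of that loop; numpy, the names, and the clamping reading are my assumptions.

import numpy as np

def convolve_wrap(image, kernel):
    m, n = image.shape
    km, kn = kernel.shape
    hm, hn = km // 2, kn // 2
    total = kernel.sum()
    out = np.empty((m, n), dtype=float)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for a in range(km):
                for b in range(kn):
                    # indices wrap around the edges of the image
                    acc += kernel[a, b] * image[(i + a - hm) % m, (j + b - hn) % n]
            out[i, j] = acc / total if total != 0 else acc
    return np.clip(out, 0, 1)   # the gist appears to clamp the result to [0,1]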
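gaussianMat samples the two-dimensional Gaussian exp(-(x²+y²)÷(2σ²)) scaled by 1÷(2πσ²) on an n×n grid with n = 1 + 2×⌈3×σ⌉ (as far as the garbled line can be read), so the kernel spans roughly three standard deviations in every direction. A Python sketch; treat the normalisation factor as an assumption, since it cancels anyway once convolve divides by the kernel sum.

import numpy as np

def gaussian_mat(sigma):
    half = int(np.ceil(3 * sigma))        # cover about three standard deviations
    coords = np.arange(-half, half + 1)
    x, y = np.meshgrid(coords, coords)
    s = 2 * sigma * sigma
    return np.exp(-(x * x + y * y) / s) / (np.pi * s)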
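convolveRGB, toGray and toRGB all rely on the same pixel encoding: each pixel is a single integer 65536×R + 256×G + B, which the code decodes into three base-256 digits and re-encodes with an inner product against 65536 256 1. A Python sketch of that packing and of the per-channel convolution, reusing the convolve_wrap sketch above (names are mine).

import numpy as np

def split_rgb(packed):
    # decode 0xRRGGBB integers into three planes scaled to [0,1]
    r = (packed // 65536) % 256
    g = (packed // 256) % 256
    b = packed % 256
    return r / 255, g / 255, b / 255

def join_rgb(r, g, b):
    # re-encode [0,1] planes into packed integers
    def to_byte(p):
        return np.rint(np.asarray(p) * 255).astype(int)
    return 65536 * to_byte(r) + 256 * to_byte(g) + to_byte(b)

def to_gray(packed):
    # toGray: average the three channels, giving a [0,1] gray plane
    r, g, b = split_rgb(packed)
    return (r + g + b) / 3

def convolve_rgb(packed, kernel):
    # convolve each colour plane separately, then recombine (as convolveRGB does)
    return join_rgb(*(convolve_wrap(p, kernel) for p in split_rgb(packed)))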
:Namespace Problem4
⎕IO ⎕ML ⎕WX ⎕RL←1 0 0 1093457855
freqMat{ f←freqMat train
f returns the frequency as a vector
f{+/¨4()¨'A' 'C' 'G' 'T'}
f¨,
}
makePWM{(+1)÷(4++/[;1])}
score{ s←pwm score seq
just applying the formula
n()[2]
p,/
f{
+/+/¨(,¨4()¨'A' 'C' 'G' 'T')/¨p
}
f¨n,/
}
top5{ r←pwm top5 seq
apply score to all substrings and figure out the best
n()[2]
z( score )[5]
[(z)(n)+z-1]
}
:EndNamespace
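
Problem4 builds a position weight matrix: freqMat counts how often each of A, C, G, T appears at each position of the training sequences, makePWM turns the counts into probabilities with add-one pseudocounts, score evaluates a window against the matrix, and top5 keeps the five best-scoring substrings. A Python sketch of that pipeline; the garbled score looks like a sum of matched PWM entries, so that reading is used here rather than the more common product or log-sum.

BASES = 'ACGT'

def freq_mat(train):
    # 4 x L count matrix: rows follow BASES, columns follow sequence positions
    length = len(train[0])
    return [[sum(seq[j] == base for seq in train) for j in range(length)]
            for base in BASES]

def make_pwm(freq):
    # add-one pseudocounts: (count + 1) / (4 + number of training sequences)
    n_train = sum(row[0] for row in freq)
    return [[(c + 1) / (4 + n_train) for c in row] for row in freq]

def score(pwm, window):
    # sum the PWM entries picked out by the window's letters
    return sum(pwm[BASES.index(ch)][j] for j, ch in enumerate(window))

def top5(pwm, seq):
    # score every substring of the motif length and keep the five best
    n = len(pwm[0])
    scored = [(score(pwm, seq[i:i + n]), seq[i:i + n]) for i in range(len(seq) - n + 1)]
    return sorted(scored, reverse=True)[:5]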
:Namespace Problem5
⎕IO ⎕ML ⎕WX ⎕RL←1 0 0 1093457855
cosine{ c←v1 cosine v2
⎕DIV←1 make sure 0/0=0
a0.5*+.×
b0.5*+.×
c+.×
c÷a×b
}
dictCount{ counts←dict dictCount string
tokentokenize
word(token) produce a list of words in the text and dict, this way
freq+token.word it is faster than doing an outer product of the text and the dict
r()0
r[word]freq
r
}
makeDict{,/(tokenize)¨}
search{ r←titles search query
Here I demonstrate a fast algorithm (a slower but more intuitive version,
slowsearch, also exists in this namespace)
The algorithm depends on the following properties
a cosine b = 0 whenever a[i]=0 ∨ b[i]=0 for all i
Therefore one can filter out items without any word in the query.
There is no need to calculate the cosine with the entire dictionary
Only the dictionary of the titles that contain the query is required, because
if c ← (a=0 ⍲ b=0)/a and d ← (a=0 ⍲ b=0)/b
It's easy to show that a cosine b = c cosine d
q
qd(tokenize q) the dictionary for the query
z{/(tokenize )qd}¨
tz/ t contains the titles that contain a word in the query
basis(makeDict t)
qv(basis dictCount q)
z{qv cosine basis dictCount }¨t z contains the cosine of each title with the query
t[(10z)z] show the top 10 that are above 0
}
slowsearch{ identical to search, but slower
dmakeDict
sd dictCount
z{s cosine d dictCount }¨ find cosine of query with each title
iz
z(z>0)/z
[(10z)i] show the top 10 that are above 0
}
toLower{
w
low⎕UCS 96+26
i⎕Aw
j(i<⎕A)
w[j/(j)]low[j/i]
w
}
tokenize{ tokens←tokenize string
char⎕A,⎕D,⎕UCS 96+26 list of useful characters
texttoLower make all characters lower case
ttextchar
⎕ML←3 use IBM's APL, I don't see an easier
zttext way to do this with Dyalog's APL
⎕ML←0
z
}
:EndNamespace
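
tokenize, dictCount and cosine are the building blocks of the search: lower-case the text, split it on anything that is not a letter or digit, count how often each dictionary word occurs, and compare count vectors by cosine similarity (with 0 divided by 0 treated as 0, as ⎕DIV←1 does). A Python sketch of those three pieces; the regular-expression split is my substitute for the ⎕ML←3 partition trick.

import re

def tokenize(s):
    # lower-case, then split on anything outside the "useful characters"
    return [t for t in re.split(r'[^a-z0-9]+', s.lower()) if t]

def dict_count(dictionary, s):
    # how often each dictionary word occurs in s
    tokens = tokenize(s)
    return [sum(t == word for t in tokens) for word in dictionary]

def cosine(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = (sum(a * a for a in v1) * sum(b * b for b in v2)) ** 0.5
    return dot / norm if norm else 0.0   # mimic ⎕DIV←1: 0 divided by 0 gives 0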
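search then applies the filtering argument from the comments in Problem5: titles sharing no token with the query have cosine 0, so they can be discarded up front, and the dictionary only needs to be built from the surviving titles. A Python sketch of that fast path, using the tokenize, dict_count and cosine helpers sketched above (the slow path simply scores every title against the full dictionary).

def search(titles, query, limit=10):
    # keep only titles that share at least one token with the query
    q_tokens = set(tokenize(query))
    candidates = [t for t in titles if q_tokens & set(tokenize(t))]
    # build the dictionary from the surviving titles alone
    basis = sorted({w for t in candidates for w in tokenize(t)})
    qv = dict_count(basis, query)
    scored = [(cosine(qv, dict_count(basis, t)), t) for t in candidates]
    # top matches with a strictly positive similarity, best first
    return [t for s, t in sorted(scored, reverse=True) if s > 0][:limit]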