I would like to be able to cross-tabulate vectors or, more generally, factors in Julia. It may be that the capability already exists but, if so, I haven't been able to discover the function name.
Assuming that the capability needs to be created, I started by defining the types:
abstract table
type crosstab <: table
    levels::Tuple          # one vector of sorted levels per tabulated factor
    table::Array{Uint32}   # counts, in the same order as the levels
end
and a constructor from a single abstract array as
function crosstab{T}(A::AbstractArray{T})
    dd = Dict{T, Uint32}()
    # count the occurrences of each distinct value
    for a in A dd[a] = has(dd, a) ? dd[a] + 1 : 1 end
    levs = sort!(keys(dd))
    # store the counts in the same order as the sorted levels
    counts = similar(levs, Uint32)
    for i in 1:numel(levs) counts[i] = dd[levs[i]] end
    crosstab((levs,), counts)
end
I think this does what I want:
julia> srand(1234321)
julia> rr = randi(411, (8252,))
8252-element Int64 Array:
269
248
337
324
180
86
43
242
143
44
:
281
311
405
32
141
64
102
241
192
371
julia> crosstab(rr)
crosstab(([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 ... 405, 406, 407, 408, 409, 410, 411],),[24, 14, 22, 26, 23, 15, 11, 23, 14, 12 ... 31, 24, 23, 20, 17, 19, 21, 16, 20])
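For anyone reading this on a current Julia release, here is a minimal sketch of the same counting step. It is not the code I ran above: the name `tabulate` is made up for illustration, the counts are plain `Int` rather than `Uint32`, and `get` with a default stands in for the `has` test. It also assumes that `keys` returns an iterator, which has to be collected before sorting.

```julia
# Sketch for current Julia; `tabulate` is a hypothetical name, not a Base function.
function tabulate(A::AbstractArray{T}) where {T}
    counts = Dict{T,Int}()
    for a in A
        # get(...) supplies 0 for values not seen yet
        counts[a] = get(counts, a, 0) + 1
    end
    levs = sort!(collect(keys(counts)))    # sorted distinct values
    return levs, [counts[l] for l in levs] # counts in the same order as levs
end

levs, cnts = tabulate([3, 1, 3, 2, 3, 1])
```

Here `levs` holds the sorted distinct values and `cnts` the matching frequencies, which is the information the `crosstab` object above stores in its two fields.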
Now I would like to scale up to multiple arrays of the same underlying type. The first phase is to find the unique elements in each array. I did this with both a Set and a Dict. The version using a Set
function uniqueS{T}(A::AbstractArray{T}, sorted::Bool)
    ss = Set{T}()
    # a Set keeps only one copy of each value
    for a in A add(ss, a) end
    ans = [s for s in ss]
    sorted ? sort!(ans) : ans
end
# default to unsorted output
uniqueS(col) = uniqueS(col, false)
produces an array with element type Any:
julia> typeof(uniqueS(rr, true))
Array{Any,1}
whereas the version using a Dict
function uniqueD{T}(A::AbstractArray{T}, sorted::Bool)
    dd = Dict{T, Bool}()
    # only the keys matter; the Bool values are placeholders
    for a in A dd[a] = true end
    sorted ? sort!(keys(dd)) : keys(dd)
end
# default to unsorted output
uniqueD(A) = uniqueD(A, false)
returns a Vector{T}:
julia> typeof(uniqueD(rr, true))
Array{Int64,1}
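The Any element type in the Set version presumably comes from the comprehension over the Set, not from the Set itself. As a hedged sketch of how the Set approach could keep the element type in current Julia, the version below uses `push!` where the code above uses `add`, and `collect` in place of the comprehension. The name `unique_set` and the keyword argument are my own choices for illustration.

```julia
# Sketch for current Julia; `unique_set` is a hypothetical name.
function unique_set(A::AbstractArray{T}; sorted::Bool=false) where {T}
    ss = Set{T}()
    for a in A
        push!(ss, a)       # duplicates are ignored by the Set
    end
    out = collect(ss)      # collect keeps the element type T
    return sorted ? sort!(out) : out
end

u = unique_set([3, 1, 3, 2]; sorted=true)
```

With integer input, `u` is a `Vector{Int}` and no longer an array of Any.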
It turns out that uniqueD is just a bit faster than uniqueS.
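Coming back to the goal of tabulating several arrays at once: once the unique levels of each array are known, the second phase is to map each level to an index and count the pairs. The following is only a sketch for two vectors in current Julia, under my own assumptions. `crosstab2` is a hypothetical name, Base's `unique` stands in for the uniqueD/uniqueS functions above, and the counts are `Int` rather than `Uint32`.

```julia
# Sketch for current Julia; `crosstab2` is a hypothetical name.
function crosstab2(A::AbstractVector, B::AbstractVector)
    length(A) == length(B) || throw(ArgumentError("inputs must have equal length"))
    rowlevs = sort!(unique(A))   # phase 1: sorted unique levels of each input
    collevs = sort!(unique(B))
    # map each level to its row or column index
    rowidx = Dict(l => i for (i, l) in enumerate(rowlevs))
    colidx = Dict(l => j for (j, l) in enumerate(collevs))
    tab = zeros(Int, length(rowlevs), length(collevs))
    for (a, b) in zip(A, B)      # phase 2: count each observed pair
        tab[rowidx[a], colidx[b]] += 1
    end
    return (rowlevs, collevs), tab
end

levs, tab = crosstab2([1, 2, 1, 2, 1], ["x", "y", "y", "y", "x"])
```

The returned tuple of level vectors plays the role of the `levels` field in the `crosstab` type, and `tab` the role of its `table` field. Extending this to more than two arrays would need an N-dimensional count array, which the sketch does not attempt.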
What packages do you have loaded? I tried running your function, but "add" was not defined.