@dmbates
Created May 31, 2012 16:00
Cross-tabulation in Julia

I would like to be able to cross-tabulate vectors or, more generally, factors in Julia. It may be that the capability already exists but, if so, I haven't been able to discover the function name.

Assuming that the capability needs to be created, I started by defining the types:

abstract table
type crosstab <: table
    levels::Tuple          # sorted levels of each classifying factor
    table::Array{Uint32}   # counts, one dimension per factor
end

and a constructor from a single abstract array as

function crosstab{T}(A::AbstractArray{T})
    dd = Dict{T, Uint32}()                 # count for each distinct value
    for a in A dd[a] = has(dd, a)? dd[a] + 1: 1 end
    levs = sort!(keys(dd))                 # sorted distinct values (the levels)
    counts = similar(levs, Uint32)
    for i in 1:numel(levs) counts[i] = dd[levs[i]] end   # counts in level order
    crosstab((levs,), counts)
end

I think this does what I want.

julia> srand(1234321)

julia> rr = randi(411, (8252,))
8252-element Int64 Array:
 269
 248
 337
 324
 180
  86
  43
 242
 143
  44
   :
 281
 311
 405
  32
 141
  64
 102
 241
 192
 371

julia> crosstab(rr)
crosstab(([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12  ...  405, 406, 407, 408, 409, 410, 411],),[24, 14, 22, 26, 23, 15, 11, 23, 14, 12  ...  31, 24, 23, 20, 17, 19, 21, 16, 20])
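The definitions above use pre-1.0 Julia syntax (`abstract`, `type`, `Uint32`, `has`, `numel`, and `f{T}(...)` method parameters), which later releases replaced with `abstract type`, `struct`, `UInt32`, `haskey`, `length`, and `where` clauses. A minimal sketch of the same one-way tabulation in Julia 1.x syntax follows; it is not the author's code, and the capitalized names `Table` and `CrossTab` are hypothetical choices that follow current naming conventions.

```julia
# Sketch in Julia 1.x syntax; Table and CrossTab are hypothetical names.
abstract type Table end

struct CrossTab{N} <: Table
    levels::Tuple               # sorted levels for each dimension
    table::Array{UInt32,N}      # counts, one dimension per tabulated array
end

# One-way tabulation: count the occurrences of each distinct value in A.
function CrossTab(A::AbstractArray{T}) where {T}
    dd = Dict{T,UInt32}()
    for a in A
        dd[a] = get(dd, a, zero(UInt32)) + one(UInt32)
    end
    levs = sort!(collect(keys(dd)))       # keys(dd) is a KeySet; collect before sorting
    counts = UInt32[dd[l] for l in levs]  # counts in the same order as levs
    CrossTab{1}((levs,), counts)
end

ct = CrossTab([3, 1, 3, 2, 3])
ct.levels[1]   # [1, 2, 3]
ct.table       # counts 1, 1, 3 (stored as UInt32)
```

In current Julia the session above would begin with `using Random` and `Random.seed!(1234321)`, with `rand(1:411, 8252)` in place of `randi(411, (8252,))`; the generated values need not match the 2012 output.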

Now I would like to scale up to multiple arrays of the same underlying type. The first phase is to find the unique elements in each array, which I did in two ways, once with a Set and once with a Dict. The version using a Set

function uniqueS{T}(A::AbstractArray{T}, sorted::Bool)
    ss = Set{T}()
    for a in A add(ss, a) end
    ans = [s for s in ss]
    sorted? sort!(ans): ans
end
uniqueS(col) = uniqueS(col, false)

produces an array whose element type is Any:

julia> typeof(uniqueS(rr, true))
Array{Any,1}

whereas the version using a Dict

function uniqueD{T}(A::AbstractArray{T}, sorted::Bool)
    dd = Dict{T, Bool}()
    for a in A dd[a] = true end
    sorted? sort!(keys(dd)): keys(dd)
end
uniqueD(A) = uniqueD(A, false)

returns a Vector{T}

julia> typeof(uniqueD(rr, true))
Array{Int64,1}

It turns out that uniqueD is also slightly faster than uniqueS.
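Once the levels of each vector are known, one way to build the multi-array table is to map every observation to its position in each level list and increment the matching cell of an N-dimensional count array. The following is a minimal sketch in Julia 1.x syntax, not the author's code: `crosstab_n` is a hypothetical name, the levels come from Base's `unique` rather than `uniqueS` or `uniqueD`, and the result is a plain `(levels, counts)` tuple rather than the `crosstab` type.

```julia
# Sketch assuming Julia 1.x; crosstab_n is a hypothetical name.
# Cross-tabulates one or more equal-length vectors into an N-dimensional count array.
function crosstab_n(cols::AbstractVector...)
    n = length(first(cols))
    all(c -> length(c) == n, cols) ||
        throw(DimensionMismatch("all vectors must have the same length"))
    # Phase 1: sorted unique levels of each vector.
    levs = map(c -> sort!(unique(c)), cols)
    # Map each value to its position in the level list of its own vector.
    pos = map(l -> Dict(v => i for (i, v) in enumerate(l)), levs)
    # Phase 2: one array dimension per vector; bump the cell for each observation.
    counts = zeros(UInt32, map(length, levs)...)
    for row in zip(cols...)
        counts[map((d, v) -> d[v], pos, row)...] += one(UInt32)
    end
    return levs, counts
end

a = [1, 2, 1, 1]
b = ["x", "y", "y", "x"]
levs, counts = crosstab_n(a, b)
# levs   == ([1, 2], ["x", "y"])
# counts == UInt32[2 1; 0 1]
```

The dictionaries in `pos` make each lookup a hash probe instead of a search through the level list. The sketch does not require the vectors to share an element type, which is a little more general than the case described above.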

@mikewojnowicz

What packages do you have loaded? I tried running your function, but "add" was not defined.
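The missing `add` is most likely a version issue: the functions above were written for a 2012 release of Julia. In Julia 1.x, elements are added to a `Set` with `push!`, and `keys` on a `Dict` returns a `KeySet` rather than an array, so it has to be collected before sorting. A minimal sketch of both approaches, assuming Julia 1.x; `unique_set` and `unique_dict` are hypothetical names.

```julia
# Sketch assuming Julia 1.x; unique_set / unique_dict are hypothetical names.

# Set-based version: push! takes the place of the old add(ss, a).
function unique_set(A::AbstractArray{T}, sorted::Bool=false) where {T}
    ss = Set{T}()
    for a in A
        push!(ss, a)
    end
    v = collect(ss)              # Vector{T}
    sorted ? sort!(v) : v
end

# Dict-based version: keys(dd) is a KeySet, so collect it before sorting.
function unique_dict(A::AbstractArray{T}, sorted::Bool=false) where {T}
    dd = Dict{T,Bool}()
    for a in A
        dd[a] = true
    end
    v = collect(keys(dd))        # Vector{T}
    sorted ? sort!(v) : v
end

x = [5, 2, 5, 9, 2]
typeof(unique_set(x, true))      # Vector{Int64} on a 64-bit machine
unique_dict(x, true)             # [2, 5, 9]
```

In Julia 1.x both versions return a `Vector{T}`. Base also provides `unique`, so `sort!(unique(A))` gives the sorted levels directly.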
