    alias VegaLite, as: Vl

    # a helper to plot labeled data
    mk_data_layer = fn labeled_data ->
      Vl.new()
      |> Vl.data_from_series(
        x: Nx.to_flat_list(labeled_data[y: 0]),
        y: Nx.to_flat_list(labeled_data[y: 1]),
        label: Nx.to_flat_list(labeled_data[y: 2])
      )
      |> Vl.mark(:point)
      |> Vl.encode_field(:x, "x", type: :quantitative, title: "X")
      |> Vl.encode_field(:y, "y", type: :quantitative, title: "Y")
      |> Vl.encode_field(:color, "label", type: :nominal)
    end

    Vl.new(title: "Raw Data w/ True Labels", width: 700, height: 700)
    |> Vl.layers([mk_data_layer.(labeled)])
Clustering - Initialization
    k = 2

    # the unlabeled data
    data = labeled[y: 0..1]

    # calculate initial centroids randomly uniformly in the space that the data spans
    initial_centroids =
      0..(n_dims - 1)
      |> Enum.reduce(nil, fn ix, acc ->
        pos = [
          x_min + (x_max - x_min) * :rand.uniform(),
          y_min + (y_max - y_min) * :rand.uniform()
        ]

        case acc do
          nil -> Nx.tensor([pos ++ [ix]], names: [:x, :y])
          _ -> Nx.concatenate([acc, Nx.tensor([pos ++ [ix]])])
        end
      end)
    # helper function to calculate the distance from data to centroids (unlabeled)
    dist_fn = fn d, centroids ->
      c = Nx.new_axis(centroids, 1)

      Nx.subtract(d, c)
      |> Nx.power(2)
      |> Nx.sum(axes: [2])
      |> Nx.sqrt()
    end

    # helper function to find labels
    find_labels = fn d, centroids ->
      dist_fn.(d, centroids)
      |> Nx.argmin(axis: 0)
    end

    new_labels = find_labels.(data, initial_centroids[y: 0..(n_dims - 1)])
    alg_labeled = Nx.concatenate([data, Nx.new_axis(new_labels, 1)], axis: 1)
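To make the shapes in `dist_fn` and `find_labels` concrete: `Nx.new_axis(centroids, 1)` turns a `{k, 2}` tensor into `{k, 1, 2}`, so subtracting it from the `{n, 2}` data broadcasts to `{k, n, 2}`. Summing over axis 2 leaves a `{k, n}` distance matrix, and `Nx.argmin(axis: 0)` picks the closest centroid for each point. Below is a minimal sketch of the same computation on plain Elixir lists (no Nx, so it runs without dependencies). The module name `NearestCentroid` and the sample points are made up for illustration and are not part of the original notebook.

```elixir
defmodule NearestCentroid do
  # Hypothetical helper module, not from the original post.

  # Euclidean distance between two points given as equal-length lists.
  def distance(p, q) do
    Enum.zip(p, q)
    |> Enum.map(fn {a, b} -> (a - b) * (a - b) end)
    |> Enum.sum()
    |> :math.sqrt()
  end

  # k x n list of lists: row i holds the distances from centroid i to every point.
  # This mirrors the {k, n} tensor that dist_fn returns.
  def distances(points, centroids) do
    for c <- centroids, do: Enum.map(points, &distance(&1, c))
  end

  # For each point, the index of the nearest centroid
  # (the list equivalent of an argmin over the centroid axis).
  def labels(points, centroids) do
    Enum.map(points, fn p ->
      centroids
      |> Enum.with_index()
      |> Enum.min_by(fn {c, _ix} -> distance(p, c) end)
      |> elem(1)
    end)
  end
end

# Made-up sample data: two points near the first centroid, one near the second.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
centroids = [[0.0, 0.5], [9.0, 9.0]]

IO.inspect(NearestCentroid.labels(points, centroids))
```

On ties, `Enum.min_by/2` keeps the first minimum it encounters, so the lower centroid index wins.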
    Vl.new(title: "Result of First Iteration", width: 700, height: 700)
    |> Vl.layers([mk_data_layer.(alg_labeled), mk_centroid_layer.(new_centroids)])
Clustering - N Iterations
    n_iters = 10

    # rename some variables
    centroids = new_centroids
    labels = new_labels

    {final_centroids, final_labels} =
      Enum.reduce(1..n_iters, {centroids, labels}, fn _ix, {pvs_centroids, pvs_labels} ->
        new_centroids = calc_centroids_map.(data, pvs_labels, pvs_centroids)
        new_centroids = label_centroids.(new_centroids)
        new_labels = find_labels.(data, new_centroids[y: 0..(n_dims - 1)])
        {new_centroids, new_labels}
      end)
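The loop above alternates two steps: recompute the centroids from the current labels, then relabel every point against the new centroids. The definitions of `calc_centroids_map` and `label_centroids` are not shown in this excerpt, so the update step in the sketch below is an assumption (the usual k-means mean-of-members update), not necessarily what the notebook does. It uses plain Elixir lists rather than Nx so it runs as a standalone script; `KMeansSketch` and all of its function names are hypothetical.

```elixir
defmodule KMeansSketch do
  # Hypothetical module; every name here is illustrative.

  # Squared Euclidean distance (enough for picking the nearest centroid).
  defp sq_dist(p, q) do
    Enum.zip(p, q)
    |> Enum.map(fn {a, b} -> (a - b) * (a - b) end)
    |> Enum.sum()
  end

  # Assignment step: index of the nearest centroid for each point.
  def assign(points, centroids) do
    Enum.map(points, fn p ->
      centroids
      |> Enum.with_index()
      |> Enum.min_by(fn {c, _ix} -> sq_dist(p, c) end)
      |> elem(1)
    end)
  end

  # Update step (assumed): each centroid moves to the mean of its members.
  # A centroid with no members keeps its previous position.
  def update(points, labels, centroids) do
    grouped =
      Enum.zip(points, labels)
      |> Enum.group_by(fn {_p, label} -> label end, fn {p, _label} -> p end)

    centroids
    |> Enum.with_index()
    |> Enum.map(fn {old, ix} ->
      case Map.get(grouped, ix) do
        nil -> old
        members -> mean(members)
      end
    end)
  end

  # Coordinate-wise mean of a non-empty list of points.
  defp mean(members) do
    n = length(members)

    members
    |> Enum.zip()
    |> Enum.map(fn coords -> Enum.sum(Tuple.to_list(coords)) / n end)
  end

  # Same reduce shape as the loop above: carry {centroids, labels} through n_iters rounds.
  def run(points, centroids, n_iters) do
    labels = assign(points, centroids)

    Enum.reduce(1..n_iters, {centroids, labels}, fn _ix, {pvs_centroids, pvs_labels} ->
      new_centroids = update(points, pvs_labels, pvs_centroids)
      {new_centroids, assign(points, new_centroids)}
    end)
  end
end

# Made-up data: two well-separated pairs of points.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]

{final_centroids, final_labels} =
  KMeansSketch.run(points, [[1.0, 1.0], [8.0, 8.0]], 10)

IO.inspect({final_centroids, final_labels})
```

Keeping an empty cluster's centroid in place is one possible design choice; re-sampling it at random is another, and the original helpers may handle that case differently.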
    alg_labeled = Nx.concatenate([data, Nx.new_axis(final_labels, 1)], axis: 1)

    true_labels_layer =
      Vl.new()
      |> Vl.data_from_series(
        x: Nx.to_flat_list(labeled[y: 0]),
        y: Nx.to_flat_list(labeled[y: 1]),
        label: Nx.to_flat_list(labeled[y: 2])
      )
      |> Vl.mark(:point, size: 200)
      |> Vl.encode_field(:x, "x", type: :quantitative, title: "X")
      |> Vl.encode_field(:y, "y", type: :quantitative, title: "Y")
      |> Vl.encode_field(:color, "label", type: :nominal)

    Vl.new(title: "Result of N Iterations", width: 700, height: 700)
    |> Vl.layers([
      mk_data_layer.(alg_labeled),
      true_labels_layer,
      mk_centroid_layer.(final_centroids)
    ])
Thanks @polvalente! This all makes sense. Re: (3), the outer circle is actually a "true" label: I generated the data at the outset from two distributions, and the color shows which distribution each point came from. It's a little contrived, but it was a helpful way for me to check whether the algorithm was doing what I thought it should.