Michael Chein michaelChein

Generating a Euclidean distance matrix using numpy's broadcasting

This is an algorithm for computing a Euclidean distance matrix, that is, the distance from each point in a data set to every other point, using numpy's broadcasting. It was written as part of an agglomerative clustering exercise, but the same approach can be applied to other calculations on multidimensional data sets.

The basic concept is that when two arrays of shapes (m,1) and (1,m) are added or multiplied, numpy broadcasts them (virtually duplicating each one along its length-1 axis) to a common shape (m,m) so that the element-wise calculation can proceed. For example, multiplying the vector [1,2,3,4,...10], shaped as a column, by a transposed version of itself yields the multiplication table.
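
The multiplication-table case can be sketched in a few lines. The variable names `col`, `row` and `table` are illustrative choices, not part of the original write-up:

```python
import numpy as np

# Column vector of shape (10, 1) and its transpose, a row vector of shape (1, 10).
col = np.arange(1, 11).reshape(-1, 1)
row = col.T

# Broadcasting stretches both operands to shape (10, 10) before multiplying,
# so table[i, j] == (i + 1) * (j + 1).
table = col * row

print(table.shape)   # (10, 10)
print(table[2, 3])   # 3 * 4 = 12
```

No data is actually copied here: numpy treats each length-1 axis as if it were repeated, which is why this is usually cheaper than building the two full (10, 10) operands by hand.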

The same technique can be used for matrices. In this case, I was looking to generate a Euclidean distance matrix for the iris data set.

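	import numpy as np
	from sklearn.datasets import load_iris

	# Load data:
	data, _ = load_iris(return_X_y=True)

	def euc_matrix(A):
	    # Generate the distance matrix with broadcasting.
	    # A has shape (n, d); [:, :, None] is needed to add a dimension, giving (n, d, 1).
	    B = np.rot90(A[:, :, None], 1, (1, 2))  # shape (n, 1, d)
	    C = np.rot90(B, 1, (0, 1))              # shape (1, n, d)
	    # B - C broadcasts to shape (n, n, d); summing over the last axis gives (n, n).
	    return np.sqrt(np.sum((B - C) ** 2, axis=2))

	# The same calculation written with explicit loops:
	n = len(data)
	result = np.zeros((n, n))
	for i in range(n):
	    for j in range(n):
	        result[i, j] = np.sqrt(np.sum((data[i, :] - data[j, :]) ** 2))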
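
As a rough cross-check of the idea, the sketch below builds the same (n, 1, d) and (1, n, d) views with `None` indexing instead of `np.rot90` and compares the result against a double loop. The helper name `pairwise_euclidean` and the random test data are assumptions made for illustration; they are not part of the original code and this is not the iris data set.

```python
import numpy as np

def pairwise_euclidean(A):
    """Hypothetical helper: Euclidean distance between every pair of rows of A."""
    # A[:, None, :] has shape (n, 1, d); A[None, :, :] has shape (1, n, d).
    # Their difference broadcasts to shape (n, n, d).
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Small made-up data set: 5 points in 3 dimensions.
rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))

D = pairwise_euclidean(points)

# Reference result from an explicit double loop.
expected = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        expected[i, j] = np.sqrt(np.sum((points[i] - points[j]) ** 2))

print(D.shape)                   # (5, 5)
print(np.allclose(D, expected))  # True
```

Note that the broadcast version holds an intermediate array of shape (n, n, d) in memory, so for very large data sets the loop (or a chunked variant) may be the more practical choice.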

	def backprop(current_layer=1):
	    if current_layer is output_layer:
	        ∂_L = current_layer - labels
	        current_layer.∆w = ∂_L * g'(Z(current_layer.nodes)) * (current_layer-1).nodes
	        return ∂_L
	    else:
	        ∂_L = backprop(current_layer+1)
	        ∂_L = ∂_L * g'(Z((current_layer+1).nodes)) * (current_layer+1).W
	        current_layer.∆w = ∂_L * g'(Z(current_layer.nodes)) * (current_layer-1).nodes
	        return ∂_L

	def train_network(network, iterations, alpha):
	    for i in range(iterations):
	        # forward
	        for layer in network[1:]:
	            layer.nodes = activation_function((layer-1).nodes @ layer.weights)
	        # backward
	        for layer in network.reverse(): # We iterate our network in reverse order.
	            if layer is output_layer: # Calculate the loss
	                ∂_L = network[L].nodes - labels
	            elif layer is (output_layer-1): # These are the first weights to be updated (W100 in the diagrams)