# Visualizing Large-scale and High-dimensional Data
- LargeVis - a technique to visualize large-scale and high-dimensional data in low-dimensional space.
- The problem relates to both the information visualization and the machine learning (and data mining) domains.
- [Link to the paper](https://arxiv.org/abs/1602.00370)
- t-SNE is the current state-of-the-art method for this visualization problem.
- It starts by constructing a similarity structure from the data and then projects that structure into 2- or 3-dimensional space.
- An accelerated version of t-SNE has been proposed that uses a K-Nearest-Neighbor (KNN) graph as the similarity structure.
- Limitations
- Constructing the KNN graph for high-dimensional data is computationally expensive.
- The efficiency of the graph visualization techniques breaks down for large datasets (O(N log N) complexity).
- The parameters are sensitive to the dataset.
- LargeVis constructs the KNN graph in a more efficient manner than t-SNE.
- It then uses a principled probabilistic model for graph visualization.
- Let us say the input consists of N data points in d-dimensional space.
- Random projection trees are used for nearest-neighbor search in the high-dimensional space, with Euclidean distance as the distance metric.
- A tree is constructed by recursively partitioning the entire space; the nodes of the tree correspond to subspaces.
- For every non-leaf node of the tree, a random hyperplane is selected that splits the node's subspace into two children.
- This is done until the number of points in a subspace falls below a threshold.
- Once the tree is constructed, each data point is assigned to a leaf node, and the points in that leaf's subspace are the candidate nearest neighbors of that data point.
- For high accuracy, a large number of trees should be constructed (which increases the computational cost).
- The paper counters this bottleneck with the intuition that "a neighbor of my neighbor is also likely to be my neighbor" to improve the accuracy of the constructed graph.
- Basically, construct an approximate KNN graph using random projection trees and then, for each node, also search its neighbors' neighbors (a minimal sketch of the whole procedure follows this item).
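As an illustration of these two steps, here is a minimal, hypothetical Python sketch (not the paper's implementation): candidate neighbors are gathered from a few random projection trees, the k closest candidates are kept, and the graph is refined by exploring neighbors' neighbors. The splitting rule (median of a random projection) and the parameters `n_trees`, `leaf_size`, and `k` are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code) of the approximate KNN graph
# construction: candidate neighbors from random projection trees, keep the
# k closest, then refine via "a neighbor of my neighbor is also likely to
# be my neighbor". Splitting rule and parameters are illustrative.
import numpy as np

def rp_tree_leaves(points, indices, leaf_size, rng):
    """Recursively split `indices` with random hyperplanes; return the leaves."""
    if len(indices) <= leaf_size:
        return [indices]
    direction = rng.normal(size=points.shape[1])       # random hyperplane normal
    proj = points[indices] @ direction
    mask = proj <= np.median(proj)
    if mask.all() or not mask.any():                    # degenerate split: stop
        return [indices]
    return (rp_tree_leaves(points, indices[mask], leaf_size, rng)
            + rp_tree_leaves(points, indices[~mask], leaf_size, rng))

def approximate_knn(points, k=10, n_trees=10, leaf_size=25, n_refine=1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(points)
    # 1. Candidate neighbors = leaf co-members across several trees.
    candidates = [set() for _ in range(n)]
    for _ in range(n_trees):
        for leaf in rp_tree_leaves(points, np.arange(n), leaf_size, rng):
            for i in leaf:
                candidates[i].update(int(j) for j in leaf if j != i)

    def k_closest(i, pool):
        pool = list(pool)
        dists = np.linalg.norm(points[pool] - points[i], axis=1)  # Euclidean metric
        return [pool[t] for t in np.argsort(dists)[:k]]

    knn = [k_closest(i, candidates[i]) for i in range(n)]
    # 2. Refinement: also consider each neighbor's neighbors.
    for _ in range(n_refine):
        knn = [k_closest(i, set(knn[i]) | {m for j in knn[i] for m in knn[j]})
               for i in range(n)]
    return knn
```

With `n_trees` kept small, the refinement pass recovers accuracy that would otherwise require many more trees, which is the trade-off the paper exploits.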
- Edges of the KNN graph are assigned weights just like in t-SNE (written out below).
- Given a pair of vertices, the probability of observing a binary edge between them is a function of the distance between their projections in the lower-dimensional space.
- The probability of observing an edge with weight w is the w-th power of the probability of observing a binary edge.
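Written out (my hedged reconstruction of the paper's formulation; the exact symmetrization/normalization of the weights and the choice of f may differ):

```latex
% High-dimensional edge weights: t-SNE-style conditional probabilities over KNN edges
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{(i,k)\in E} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad w_{ij} \propto p_{j|i} + p_{i|j}

% Probability of a binary edge between low-dimensional points y_i and y_j,
% with, e.g., f(x) = 1/(1 + a x^2)
P(e_{ij} = 1) = f\left(\lVert y_i - y_j \rVert\right)

% Probability of an edge with weight w_{ij}
P(e_{ij} = w_{ij}) = P(e_{ij} = 1)^{\,w_{ij}}
```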
- Given a weighted graph G, the log-likelihood of the graph is modeled as the sum of the log-likelihoods of the observed edges and of the negative edges (vertex pairs without edges).
- To simplify the objective function, only a few negative edges are sampled for each vertex using a noise distribution.
- The positive edges are sampled with probability proportional to their weights and then treated as binary edges.
- The resulting objective function can be optimized using asynchronous stochastic gradient descent (very effective on sparse graphs).
- The overall complexity is O(sMN), where s is the dimension of the lower-dimensional space, M is the number of negative samples, and N is the number of vertices.
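A toy sketch of one stochastic update under this sampled objective, assuming f(x) = 1/(1 + x^2) and a degree^0.75 noise distribution; the constants `gamma`, `lr`, and `M` are illustrative, not taken from the paper's experiments:

```python
# Hypothetical sketch of LargeVis-style optimization: sample a positive edge
# proportionally to its weight (then treat it as binary), draw M negative
# vertices from a noise distribution, and take one gradient-ascent step on
# the sampled log-likelihood. Serial stand-in for the asynchronous SGD used
# in the paper; gamma, lr, M and f(x) = 1/(1 + x^2) are assumptions.
import numpy as np

def sgd_step(Y, edges, weights, degrees, M=5, gamma=7.0, lr=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Positive edge, sampled proportionally to its weight.
    e = rng.choice(len(edges), p=weights / weights.sum())
    i, j = edges[e]
    # Noise distribution P_n(v) proportional to degree(v)^0.75.
    pn = degrees ** 0.75
    negatives = rng.choice(len(Y), size=M, p=pn / pn.sum())

    def grad_wrt_yi(i, j, positive):
        # With f(x) = 1/(1 + x^2):
        #   d/dy_i log f(||y_i - y_j||)              = -2 (y_i - y_j) / (1 + d2)
        #   d/dy_i gamma * log(1 - f(||y_i - y_j||)) =  2 gamma (y_i - y_j) / (d2 (1 + d2))
        d2 = np.sum((Y[i] - Y[j]) ** 2)
        if positive:
            coef = -2.0 / (1.0 + d2)
        else:
            coef = 2.0 * gamma / ((1.0 + d2) * max(d2, 1e-8))
        return coef * (Y[i] - Y[j])

    g = grad_wrt_yi(i, j, True) + sum(grad_wrt_yi(i, k, False) for k in negatives)
    Y[i] += lr * g                        # gradient ascent on the log-likelihood
    Y[j] -= lr * grad_wrt_yi(i, j, True)  # symmetric move for the positive endpoint
    return Y
```

In the actual implementation this kind of update runs asynchronously across threads over many edge samples, which is what yields the O(sMN) overall cost.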
- In the experiments, the data is first preprocessed with embedding-learning techniques like SkipGram and LINE and brought down to 100-dimensional space.
- One limitation is that the data is preprocessed to reduce the number of dimensions to 100, which seems to go against the claim of scaling to hundreds of dimensions.
- KNN graph construction is fastest for LargeVis, followed by random projection trees, NN-Descent, and vantage-point trees (used in t-SNE).
- LargeVis requires very few iterations to create highly accurate KNN graphs.
- A KNN classifier on the low-dimensional embedding is used to measure the quality of the visualization quantitatively (a small sketch of this evaluation is at the end of these notes).
- Compared to t-SNE, LargeVis is much more stable with respect to learning rate across datasets.
- LargeVis benefits from its linear complexity (compared to t-SNE's log-linear complexity) and, for the default learning rate, outperforms t-SNE on larger datasets.
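As a concrete illustration of the KNN-classifier evaluation mentioned above (my sketch using scikit-learn; the paper's exact protocol, k, and train/test split are assumptions here):

```python
# Sketch of the quantitative evaluation: train a KNN classifier on the
# low-dimensional coordinates; higher accuracy means class structure was
# better preserved by the visualization. k and the split are assumptions.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(embedding_2d, labels, k=5, test_size=0.2, seed=0):
    X_train, X_test, y_train, y_test = train_test_split(
        embedding_2d, labels, test_size=test_size, random_state=seed)
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    return clf.score(X_test, y_test)
```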