t-SNE

Overview

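• t-SNE (t-distributed Stochastic Neighbor Embedding) takes a high-dimensional dataset and maps it to a low-dimensional space (usually 2 or 3 dimensions, for visualization) while preserving much of its local structure: points that are close in the original space stay close in the map.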

Architecture

• Place every point at a random position in the low-dimensional space (the space you’re converting to). t-SNE will move these points around later.
• Next, determine the similarity between every pair of points in the original high-dimensional space.
• For every point x, calculate its distance to every other point. Draw a normal (Gaussian) curve centered on x and, for each other point, read off the height of the curve at that point's distance. This height is called the “unscaled similarity”.
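• Scale the unscaled similarities so that, for each x, they add up to 1 (divide each one by their sum). This is the same as applying a softmax to −d²/(2σ²), where d is the distance to x and σ is the width of x's Gaussian.
• Close points end up with high similarity values and far-away points with low ones.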
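• Perplexity parameter: sets the width σ of each point's Gaussian and can be read as roughly the expected number of close neighbors of each point (the expected density around it). For every point, σ is tuned so that its similarity distribution has the requested perplexity (2 raised to the distribution's entropy in bits), so points in dense regions get narrow Gaussians and points in sparse regions get wide ones. Typical values are 5 to 50.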
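• Put the similarities into a matrix. Because each point has its own σ, the (i,j) similarity is generally not the same as the (j,i) similarity, so average the two (and divide by the number of points so that the whole matrix sums to 1).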
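• The similarity matrix from the high-dimensional space is computed once and stays fixed. At every iteration, compute the corresponding matrix for the current low-dimensional positions and adjust the points so that it looks more like the fixed one (formally, minimize the Kullback-Leibler divergence between the two).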
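• For the low-dimensional similarities, use a Student's t-distribution with one degree of freedom instead of a Gaussian: it is not as tall in the middle and has heavier tails. With a Gaussian, clusters would all be clumped together in the middle of the map and hard to see (the “crowding problem”).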
• t-SNE moves the points a little bit at a time (gradient descent), and at each step the low-dimensional matrix looks more like the high-dimensional one.
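The similarity steps in the bullets above (unscaled Gaussian heights, then scaling so they sum to 1) can be sketched in plain Python. This is a minimal sketch, assuming Euclidean distance and a single σ picked by hand; the function names are hypothetical. The Gaussian's normalizing constant is dropped because it cancels when the heights are divided by their sum.

```python
import math


def unscaled_similarities(points, i, sigma):
    """Height of a Gaussian centered on point i at each other point's distance (0 for i itself)."""
    heights = []
    for j, xj in enumerate(points):
        if j == i:
            heights.append(0.0)  # a point is not its own neighbor
        else:
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], xj))
            heights.append(math.exp(-sq_dist / (2 * sigma ** 2)))
    return heights


def scaled_similarities(points, i, sigma):
    """Divide by the total so that point i's similarities sum to 1 (often written p_{j|i})."""
    heights = unscaled_similarities(points, i, sigma)
    total = sum(heights)
    return [h / total for h in heights]


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (6.0, 0.0)]
print(scaled_similarities(points, 0, sigma=1.0))
```

For point 0, most of the similarity goes to its nearest neighbor (about 0.82), some to the next one (about 0.18), and almost none to the point at distance 6.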
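Tuning σ from the perplexity is usually done with a per-point search. The sketch below bisects σ on a log scale, relying on the fact that perplexity grows as σ grows. The function names, search bounds (1e-4 to 1e4) and toy data are hypothetical choices for illustration, not taken from any particular implementation.

```python
import math


def similarity_row(points, i, sigma):
    """Scaled similarities for point i as a numerically stable softmax over -d^2 / (2 sigma^2)."""
    logits = {}
    for j, xj in enumerate(points):
        if j != i:
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], xj))
            logits[j] = -d2 / (2 * sigma ** 2)
    top = max(logits.values())  # subtracting the max keeps exp() from underflowing to all zeros
    weights = {j: math.exp(v - top) for j, v in logits.items()}
    total = sum(weights.values())
    return [weights.get(j, 0.0) / total for j in range(len(points))]


def perplexity(row):
    """Perplexity = 2 ** (Shannon entropy of the row, in bits)."""
    entropy = -sum(p * math.log2(p) for p in row if p > 0)
    return 2 ** entropy


def sigma_for_perplexity(points, i, target, lo=1e-4, hi=1e4, iters=100):
    """Bisect on a log scale for the sigma that gives point i the target perplexity."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if perplexity(similarity_row(points, i, mid)) > target:
            hi = mid  # too many effective neighbors: narrow the Gaussian
        else:
            lo = mid  # too few effective neighbors: widen the Gaussian
    return math.sqrt(lo * hi)


# 1-D toy data: a dense group near 0 and a sparse group further out
points = [(0.0,), (0.1,), (0.3,), (10.0,), (13.0,), (17.0,)]
sigma_dense = sigma_for_perplexity(points, 0, target=1.5)
sigma_sparse = sigma_for_perplexity(points, 5, target=1.5)
print(sigma_dense, sigma_sparse)
```

With the same perplexity target, the point in the dense group gets a much narrower Gaussian than the point in the sparse group (roughly 0.15 versus 3), which is the "expected density" behavior described in the perplexity bullet.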
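The averaging step from the matrix bullet is short once the per-point similarities are stored row by row. The 3×3 input below is made-up illustration data and `symmetrize` is a hypothetical name.

```python
def symmetrize(cond):
    """Joint similarities p_ij = (p_{j|i} + p_{i|j}) / (2n) from rows cond[i][j] = p_{j|i}."""
    n = len(cond)
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


# Hypothetical per-point similarities for 3 points (each row sums to 1, diagonal is 0)
cond = [
    [0.0, 0.9, 0.1],
    [0.5, 0.0, 0.5],
    [0.2, 0.8, 0.0],
]
P = symmetrize(cond)
print(P[0][1], P[1][0])  # both equal (0.9 + 0.5) / 6
```

Dividing by 2n rather than 2 is what makes the whole matrix sum to 1, since each of the n rows of the input already sums to 1.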
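One rough way to see why the heavy-tailed t-distribution helps is to ask how far apart two points must be for each kernel to produce the same similarity value. This is only an illustration: it ignores the normalization t-SNE applies, fixes σ = 1 for the Gaussian, and the helper names are hypothetical.

```python
import math


def gaussian_kernel(d):
    """Unscaled Gaussian similarity with sigma = 1 (used in the high-dimensional space)."""
    return math.exp(-d ** 2 / 2)


def student_t_kernel(d):
    """Unscaled Student's t similarity, one degree of freedom (used in the low-dimensional map)."""
    return 1.0 / (1.0 + d ** 2)


def gaussian_distance(s):
    """Distance at which the Gaussian kernel equals s, for 0 < s <= 1."""
    return math.sqrt(-2 * math.log(s))


def student_t_distance(s):
    """Distance at which the Student's t kernel equals s, for 0 < s <= 1."""
    return math.sqrt(1 / s - 1)


for s in (0.9, 0.5, 0.1, 0.01):
    print(f"similarity {s}: Gaussian distance {gaussian_distance(s):.2f}, "
          f"t distance {student_t_distance(s):.2f}")
```

High similarities map to slightly shorter distances under the t kernel, while low similarities map to much longer ones (a similarity of 0.01 needs a distance of about 3.0 under the Gaussian but about 9.9 under the t kernel). That is how the heavier tails pull clusters apart instead of clumping them in the middle.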
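The iterate-and-adjust loop from the last bullets can be sketched as plain gradient descent on the Kullback-Leibler divergence, using the standard t-SNE gradient 4 Σ_j (p_ij − q_ij)(y_i − y_j)(1 + ‖y_i − y_j‖²)⁻¹. This is a simplified sketch under stated assumptions: a hand-built high-dimensional matrix P instead of one computed from data, small random 2-D starting positions, and a fixed learning rate of 0.5. It leaves out the momentum and early exaggeration that practical implementations add, and all function names are hypothetical.

```python
import math
import random


def low_dim_q(Y):
    """Student-t similarities q_ij (normalized over all pairs) plus the raw kernel 1 / (1 + d^2)."""
    n = len(Y)
    kern = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                kern[i][j] = 1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
    total = sum(map(sum, kern))
    return [[k / total for k in row] for row in kern], kern


def kl_divergence(P, Q):
    """KL(P || Q) = sum of p_ij * log(p_ij / q_ij); pairs with p_ij = 0 contribute nothing."""
    return sum(p * math.log(p / q)
               for P_row, Q_row in zip(P, Q) for p, q in zip(P_row, Q_row) if p > 0)


def tsne_step(P, Y, lr):
    """One plain gradient-descent step that makes Q look a little more like P."""
    Q, kern = low_dim_q(Y)
    n, dim = len(Y), len(Y[0])
    new_Y = []
    for i in range(n):
        grad = [0.0] * dim
        for j in range(n):
            if i == j:
                continue
            # Attraction when p_ij > q_ij, repulsion when p_ij < q_ij
            coeff = 4.0 * (P[i][j] - Q[i][j]) * kern[i][j]
            for d in range(dim):
                grad[d] += coeff * (Y[i][d] - Y[j][d])
        new_Y.append([y - lr * g for y, g in zip(Y[i], grad)])
    return new_Y


# Hand-built P: points 0-1 and 2-3 are close neighbors in the original space (entries sum to 1)
P = [
    [0.0, 0.24, 0.005, 0.005],
    [0.24, 0.0, 0.005, 0.005],
    [0.005, 0.005, 0.0, 0.24],
    [0.005, 0.005, 0.24, 0.0],
]
random.seed(0)
Y = [[random.gauss(0.0, 1e-2) for _ in range(2)] for _ in range(4)]  # random 2-D start
kl_start = kl_divergence(P, low_dim_q(Y)[0])
for _ in range(1000):
    Y = tsne_step(P, Y, lr=0.5)
kl_end = kl_divergence(P, low_dim_q(Y)[0])
print(kl_start, kl_end)
```

The divergence drops as the loop runs, and the map ends up with points 0 and 1 together, points 2 and 3 together, and a gap between the two pairs.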