t-SNE

Nov 09, 2019 clustering

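t-SNE (t-distributed stochastic neighbor embedding) takes a high-dimensional dataset and reduces it to a lower-dimensional space (usually 2 or 3 dimensions, for visualization) while keeping much of the neighborhood structure: points that are close together in the original space tend to stay close together.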

Start by placing every point at a random position in the lower-dimensional space (the space you’re converting to).
Before anything moves, determine how similar the points are to each other in the original high-dimensional space.
For every point x, calculate the distance to every other point. Center a normal (Gaussian) curve on x and place each other point along it according to its distance from x. The height of the curve at that point is called the “unscaled similarity”.
Normalize the unscaled similarities so that they sum to 1 for each point x. This is a softmax over the negative squared distances, scaled by the variance of the Gaussian.
Close points end up with high similarity values and far away points with low similarity values.
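As a rough illustration of the similarity step, here is a minimal pure-Python sketch. It assumes one fixed Gaussian width `sigma` for every point (in practice the width is tuned per point through the perplexity), and the function name `conditional_similarities` and the toy `points` are made up for this example.

```python
import math

def conditional_similarities(points, sigma=1.0):
    """Return P where P[i][j] is the similarity of point j as seen from point i."""
    n = len(points)
    P = []
    for i in range(n):
        # Unscaled similarity: height of a Gaussian centered on point i,
        # evaluated at the distance to point j. A point is not its own neighbor.
        unscaled = [
            0.0 if j == i
            else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            for j in range(n)
        ]
        # Normalize so that the similarities seen from point i sum to 1.
        total = sum(unscaled)
        P.append([u / total for u in unscaled])
    return P

# Toy data (assumed): two nearby points and one far away point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
P = conditional_similarities(points)
```

Seen from the first point, the nearby second point gets almost all of the similarity and the distant third point gets almost none.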
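Perplexity parameter: roughly the expected number of close neighbors around each point. It sets the width of each point’s Gaussian, so points in dense regions get narrower curves than points in sparse regions.
Collect the similarities in a matrix. Remember that the (i, j) similarity might not be the same as the (j, i) similarity, because each point has its own Gaussian width. Just average the two of them to make the matrix symmetric.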
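The two ideas above can be sketched in a few lines of plain Python. The helper names `row_perplexity` and `symmetrize` and the example matrix are assumptions for illustration. Perplexity is computed here as 2 raised to the Shannon entropy of a row, which is the usual definition; a full implementation would search for the Gaussian width that makes each row hit the requested perplexity, and would also divide the averaged matrix by the number of points so that all entries together sum to 1.

```python
import math

def row_perplexity(row):
    """Perplexity of one row of similarities: 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in row if p > 0)
    return 2 ** entropy

def symmetrize(P):
    """Average the (i, j) and (j, i) entries so the matrix becomes symmetric."""
    n = len(P)
    return [[(P[i][j] + P[j][i]) / 2 for j in range(n)] for i in range(n)]

# Assumed example: each row sums to 1, but the matrix is not symmetric.
P = [
    [0.0, 0.9, 0.1],
    [0.6, 0.0, 0.4],
    [0.5, 0.5, 0.0],
]
S = symmetrize(P)
perplexity_of_last_row = row_perplexity(P[2])
```

A row that splits its similarity evenly over two neighbors, like the last one, has a perplexity of 2, which matches the reading of perplexity as an effective neighbor count.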
The high-dimensional similarity matrix is calculated once. At every iteration, calculate the same kind of matrix for the current low-dimensional positions, and adjust the positions so that the low-dimensional matrix looks more like the high-dimensional one.
For the low-dimensional similarities, use a t-distribution instead of a Gaussian (lower in the middle, with heavier tails). With a Gaussian, the clusters would all be clumped together in the middle and hard to tell apart.
t-SNE moves the points a little bit at a time, and each step brings the low-dimensional matrix closer to the high-dimensional one.
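The iteration described above can be sketched as plain gradient descent on the KL divergence between the two matrices. This is a simplified sketch, not a full implementation: it leaves out the momentum and early exaggeration that real implementations use, and the matrix `P`, the learning rate, the iteration count, and all function names are assumptions chosen for a tiny example.

```python
import math
import random

def low_dim_similarities(Y):
    """Student-t (one degree of freedom) similarities between embedded points."""
    n = len(Y)
    W = [[0.0 if i == j else 1.0 / (1.0 + math.dist(Y[i], Y[j]) ** 2)
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, W))
    Q = [[w / total for w in row] for row in W]
    return W, Q

def kl_divergence(P, Q):
    """How far the low-dimensional matrix Q is from the target matrix P."""
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0)

def gradient_step(P, Y, learning_rate):
    """Move every point a little so that Q looks more like P."""
    n, dim = len(Y), len(Y[0])
    W, Q = low_dim_similarities(Y)
    new_Y = []
    for i in range(n):
        grad = [0.0] * dim
        for j in range(n):
            if j == i:
                continue
            # Positive when the pair should be closer, negative when further apart.
            coeff = 4.0 * (P[i][j] - Q[i][j]) * W[i][j]
            for d in range(dim):
                grad[d] += coeff * (Y[i][d] - Y[j][d])
        new_Y.append([Y[i][d] - learning_rate * grad[d] for d in range(dim)])
    return new_Y

# Assumed target: points 0 and 1 are similar, points 2 and 3 are similar.
# The matrix is symmetric and all entries together sum to 1.
P = [
    [0.000, 0.200, 0.025, 0.025],
    [0.200, 0.000, 0.025, 0.025],
    [0.025, 0.025, 0.000, 0.200],
    [0.025, 0.025, 0.200, 0.000],
]

random.seed(0)
Y = [[random.gauss(0.0, 1e-2) for _ in range(2)] for _ in range(4)]  # random start
kl_before = kl_divergence(P, low_dim_similarities(Y)[1])
for _ in range(300):
    Y = gradient_step(P, Y, learning_rate=0.2)
kl_after = kl_divergence(P, low_dim_similarities(Y)[1])
```

After the loop, the two similar pairs should sit close together in the 2D layout with a visible gap between the pairs, and `kl_after` should be smaller than `kl_before`.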
