t-SNE
Overview
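- t-SNE (t-distributed Stochastic Neighbor Embedding) takes a high-dimensional dataset and reduces it to a lower-dimensional space (usually 2 or 3 dimensions for visualization) while keeping points that are close together in the original space close together in the result.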
Architecture
- Start by placing every point at a random position in the lower-dimensional space (the space you’re converting to). These positions are adjusted in the later steps.
- Next, determine the similarity of every pair of points in the original high-dimensional space.
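- For every point x, calculate the distance to every other point. Center a Gaussian (normal) distribution on x and read off the height of the curve at each of those distances. That height is called the “unscaled similarity” of the other point to x.
- Normalize the unscaled similarities for x so that they sum to 1 (divide each one by their total). This is equivalent to a softmax over the negative scaled squared distances, and it makes similarities comparable between dense and sparse regions.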
- Close points have high similarity values and far away points have low similarity values.
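- Perplexity parameter: roughly the number of effective neighbors each point is expected to have. It sets the width of the Gaussian for each point, so points in dense regions get a narrower curve than points in sparse regions. Typical values are between 5 and 50.
- Collect the similarities in a matrix. Because each point has its own Gaussian width, the (i,j) similarity might not be the same as the (j,i) similarity, so average the two of them to make the matrix symmetric.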
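- Calculate the similarity matrix for the original high-dimensional points once, then calculate the similarity matrix for the low-dimensional points at every iteration. Adjust the low-dimensional points so that the latter looks like the former. The mismatch between the two is measured with the Kullback-Leibler divergence.
- For the low-dimensional similarities, use a t-distribution (Student's t with one degree of freedom) instead of a Gaussian. It is lower in the middle and has heavier tails. With a Gaussian, the clusters would all be clumped together in the middle and hard to tell apart (the crowding problem).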
- t-SNE moves each point a little bit at a time (gradient descent), and each step makes the low-dimensional matrix look more like the high-dimensional one.
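For reference, the quantities described in the steps above can be written out as follows. The symbols are assumptions of this sketch and do not appear in the notes: x_i are the original points, y_i their low-dimensional positions, sigma_i the Gaussian width chosen for point i from the perplexity, and N the number of points.

```latex
% Normalized Gaussian similarity of point j to point i (high-dimensional space)
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}

% Symmetrized similarity matrix
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Student-t similarity (low-dimensional space)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Cost minimized by moving the y_i
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```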
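The similarity steps can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes one fixed Gaussian width `sigma` for every point (real t-SNE picks a width per point from the perplexity), and the function names and toy `points` are made up for the example.

```python
import math

def conditional_similarities(points, sigma=1.0):
    """Row i holds the normalized similarity of every point j to point i."""
    n = len(points)
    rows = []
    for i in range(n):
        # Unscaled similarity: height of a Gaussian centered on point i,
        # evaluated at the distance to point j (a point is not its own neighbor).
        unscaled = [
            0.0 if i == j
            else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(unscaled)
        rows.append([u / total for u in unscaled])  # each row now sums to 1
    return rows

def symmetrize(cond):
    """Average the (i,j) and (j,i) entries; dividing by n makes the matrix sum to 1."""
    n = len(cond)
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

# Toy data (assumed): two tight pairs that are far away from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
cond = conditional_similarities(points)
joint = symmetrize(cond)
```

Here `cond[0][1]` is much larger than `cond[0][2]`, which matches the rule that close points get high similarity and far away points get low similarity.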
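The perplexity step can be sketched as a search over the Gaussian width. The sketch below assumes the usual definition of perplexity as 2 raised to the Shannon entropy (in bits) of one row of normalized similarities, and uses a simple bisection; the helper names, search bounds, and example distances are hypothetical.

```python
import math

def perplexity(probs):
    """2 ** entropy of one row of normalized similarities."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

def row_for_sigma(sq_dists, sigma):
    """Normalized Gaussian similarities for one point, given squared distances to the others."""
    nearest = min(sq_dists)  # shifting by the minimum avoids dividing by an underflowed total
    weights = [math.exp(-(d - nearest) / (2 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def find_sigma(sq_dists, target, lo=1e-3, hi=1e3, steps=60):
    """Bisect on sigma: a wider Gaussian spreads the similarities and raises the perplexity."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)  # geometric midpoint, since the bounds span many orders of magnitude
        if perplexity(row_for_sigma(sq_dists, mid)) > target:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)

# Squared distances from one point to five others (assumed values).
sq_dists = [1.0, 4.0, 9.0, 100.0, 121.0]
sigma = find_sigma(sq_dists, target=2.0)
```

With a target of 2, the resulting row behaves as if the point had about two effective neighbors, so most of the similarity mass sits on the closest points.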
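The optimization loop can be sketched as plain gradient descent on the Kullback-Leibler divergence between the two matrices. This is a simplified illustration under stated assumptions: the target matrix `P` is hand-written rather than computed from data, the learning rate and step count are arbitrary, and refinements used by real implementations (momentum, early exaggeration) are left out.

```python
import math
import random

def student_t_similarities(Y):
    """Return (Q, W): normalized and unnormalized Student-t similarities of the low-dimensional points."""
    n = len(Y)
    W = [[0.0 if i == j else 1.0 / (1.0 + math.dist(Y[i], Y[j]) ** 2)
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, W))
    Q = [[w / total for w in row] for row in W]
    return Q, W

def kl_divergence(P, Q):
    """How far the low-dimensional matrix Q is from the target matrix P."""
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0)

def gradient_step(P, Y, learning_rate):
    """Move every point a little bit along the negative gradient of the KL divergence."""
    Q, W = student_t_similarities(Y)
    n, dim = len(Y), len(Y[0])
    moved = []
    for i in range(n):
        grad = [0.0] * dim
        for j in range(n):
            # Positive when j should be closer to i (p > q), negative when it should be farther.
            coeff = 4.0 * (P[i][j] - Q[i][j]) * W[i][j]
            for d in range(dim):
                grad[d] += coeff * (Y[i][d] - Y[j][d])
        moved.append([Y[i][d] - learning_rate * grad[d] for d in range(dim)])
    return moved

# Hypothetical target: points 0 and 1 are similar, points 2 and 3 are similar.
near, far = 0.24, 0.005
P = [[0.0, near, far, far],
     [near, 0.0, far, far],
     [far, far, 0.0, near],
     [far, far, near, 0.0]]

random.seed(0)
Y = [[random.gauss(0.0, 0.01) for _ in range(2)] for _ in range(4)]  # random starting positions
kl_before = kl_divergence(P, student_t_similarities(Y)[0])
for _ in range(500):
    Y = gradient_step(P, Y, learning_rate=0.1)
kl_after = kl_divergence(P, student_t_similarities(Y)[0])
```

After the loop, points 0 and 1 sit close together and far from points 2 and 3, and `kl_after` is lower than `kl_before`.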
References
Kaushik Rangadurai