I’m a huge fan of UMAP, but this [0] paper suggests that t-SNE can be tuned to produce UMAP-like results (the algorithms are extremely similar—you can recover t-SNE with certain UMAP parameter choices). One of the insights is to use PCA first to better preserve the global structure.
For example, see figure 9 in the paper: the plot on the left is the typical result of default t-SNE (distance between global structures not well-represented, since everything is jammed together), and the plot on the right is very UMAPish.
Basically, there are a lot of preprocessing and parameter choices involved in producing these embedding plots, so it’s advisable to try to understand the effects of these choices regardless of which algorithm you choose.
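For instance, one of the choices the paper highlights (PCA initialization) is a one-line option in scikit-learn; a minimal sketch, with illustrative parameter values and a digits subsample just to keep it quick:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X = load_digits().data[:500]  # 500 samples, 64 features

# init="pca" anchors each point near its coarse global position, so
# between-cluster distances in the final plot are more meaningful than
# with random initialization.
emb = TSNE(n_components=2, init="pca", perplexity=30,
           random_state=42).fit_transform(X)
```

Perplexity is another of those choices worth sweeping, since it trades off local versus global structure.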
I thought UMAP's main advantage was being able to project new data without recomputing the embedding, whereas t-SNE still requires a full refit, which makes persistent plots difficult.
If you have a machine learning model and you want to see which things it considers similar, you can use t-SNE to visualize that by rendering similar points close together in two or three dimensions. UMAP is another method used for similar purposes.
It's an algorithm for projecting data to a lower dimension. Say you have an Excel sheet with 20,000 rows (representing customers, for example) and 200 columns (representing blood pressure, height, weight, etc.).

What you want to do is "visualise" those 20,000 points in 2D or 3D so you can get an idea of how the data is distributed. So you use t-SNE to "compress" those 200 columns down to 2 or 3, and you display that.

Traditionally you would use Principal Component Analysis, but that is a purely linear projection, and it won't capture non-linear relationships in the data.
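To make the contrast concrete, here's a sketch in scikit-learn using a synthetic curved manifold as a stand-in for that spreadsheet (the dataset and parameters are just for illustration): PCA flattens it linearly, while t-SNE can unfold it.

```python
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# 1000 points lying on a curved ("S"-shaped) 3-D manifold.
X, color = make_s_curve(n_samples=1000, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)    # linear projection
X_tsne = TSNE(n_components=2, perplexity=30,
              random_state=0).fit_transform(X)  # nonlinear embedding
```

Scatter-plotting both embeddings colored by `color` shows the difference at a glance.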
Another algorithm, sometimes more powerful and scalable, is LargeVis.
UMAP has largely replaced t-SNE in our toolkit as one of our go-to viz pipelines. Unlike most examples out there, we post-process with k-NN to expose the graph of correlations over arbitrary data sets -- bank account fraud scores, cancer protein mutations, Twitter bots, malware files, etc. -- and then investigate. Algorithms like UMAP infer this connectivity anyway (see also: TDA), and it's useful for guiding subsequent explorations. If you're doing an interactive analysis, like looking at data in a Jupyter notebook, it's super powerful to expose that inferred connectivity and make it interactive (on-the-fly filtering, clustering, etc.).
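The k-NN post-processing step can be sketched with scikit-learn (random points stand in for a real UMAP embedding; `n_neighbors=5` is arbitrary):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 2))  # stand-in for a 2-D UMAP embedding

# Sparse k-NN adjacency over the embedded points: each row has k nonzero
# entries marking that point's nearest neighbors, ready to hand to a
# graph visualization tool.
A = kneighbors_graph(emb, n_neighbors=5, mode="connectivity")
```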
Tool-wise, we do it in a few lines over tables with many rows/columns via end-to-end GPU acceleration using https://www.RAPIDS.ai (GPU dataframes + UMAP) + Graphistry (GPU viz, which we make).
Do you mean principal component analysis? My naive understanding after reading the original paper is that the algorithm trains a transformation that projects the high-dimensional data into low dimensions while best preserving both global and local proximity, so that samples that are similar in the high-dimensional space are also close in the low-dimensional one. It makes some assumptions about the distribution of the data in low dimensions, so it isn't a random guess: it uses the t-distribution in the low-dimensional space, hence the name t-SNE. Correct me if I've made any mistakes.
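Right, the low-dimensional similarities use a Student-t (Cauchy) kernel, which is where the "t" comes from. A minimal numpy sketch of those affinities (the function name is mine, and this covers only the low-dim side, not the full algorithm):

```python
import numpy as np

def lowdim_affinities(Y):
    # Student-t kernel with one degree of freedom: q_ij proportional to
    # (1 + ||y_i - y_j||^2)^-1, normalized over all pairs. The heavy
    # tails let dissimilar points sit far apart without a huge penalty.
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    w = 1.0 / (1.0 + d2)
    np.fill_diagonal(w, 0.0)  # q_ii is defined to be zero
    return w / w.sum()

Y = np.random.default_rng(0).normal(size=(5, 2))
Q = lowdim_affinities(Y)
```

The optimizer then moves the low-dimensional points to make this Q match the high-dimensional affinities under a KL divergence.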
I've built an implementation of t-SNE in Go (https://github.com/danaugrs/go-tsne) and really like the fact that your visualization has a short Z dimension. Very interesting effect.
https://github.com/lmcinnes/umap
It's much faster and usually results in better clustering / representation.