What is t-SNE?
High-dimensional single-cell technologies, such as multicolor flow cytometry, mass cytometry, and image cytometry, can measure dozens of parameters at the single-cell level. FCS Express integrates t-Distributed Stochastic Neighbor Embedding (t-SNE), a tool that maps high-dimensional cytometry data onto a two-dimensional plot while preserving the original high-dimensional structure, helping you visualize and analyze high-dimensional data.
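For readers who want the mechanics, the standard t-SNE formulation (van der Maaten and Hinton, 2008) is summarized below. This is the textbook objective, not a description of FCS Express internals, which this page does not document. Here x_i is event i in the original parameter space, y_i is its position on the 2D plot, and N is the number of events in the calculation.

```latex
\begin{aligned}
p_{j\mid i} &= \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N} \\
\mathrm{Perp}(P_i) &= 2^{H(P_i)}, \qquad H(P_i) = -\sum_{j} p_{j\mid i} \log_2 p_{j\mid i} \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}} \\
C &= \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
\end{aligned}
```

Each bandwidth sigma_i is chosen so that Perp(P_i) matches the requested perplexity, and the 2D positions y_i are found by gradient descent on C. The heavy-tailed Student-t kernel in q_ij lets well-separated clusters spread apart on the plot.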
Using t-SNE Transformations in FCS Express
The final result of the algorithm in FCS Express is a 2D plot in which the positions of cells reflect their proximity in the original high-dimensional space. Plots can be further colored by density or by a heat map of each parameter, making populations easy to visualize.
Because t-SNE is a highly demanding algorithm, the De Novo Software team made a great effort to improve its speed within FCS Express.
To give you an idea of how t-SNE performs within FCS Express, we ran speed tests showing how the two methods used to calculate t-SNE in version 6 compare with each other.
The table below shows the elapsed time (in seconds) for t-SNE calculation using the Barnes-Hut Approximation (Amount of Approximation = 0.5) with different combinations of the number of events and the number of parameters considered. In these tests, t-SNE was not estimated for unsampled events. Please refer to the Defining a t-SNE Transformation section of the software reference manual for more information about the methods and options available in FCS Express.
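The scaling behind these timings can be illustrated by counting how many event pairs each method considers. The sketch below is a back-of-the-envelope model, not FCS Express code. It assumes Exact t-SNE compares all N(N-1)/2 pairs, and that a Barnes-Hut-style implementation keeps about three times the perplexity as nearest neighbors per event (the convention in van der Maaten's reference bhtsne code). It covers the neighbor stage only; real timings also depend on the number of parameters, the iterations, and the Amount of Approximation. The perplexity of 30 is an illustrative value within the typical 5 to 50 range.

```python
def exact_pair_count(n_events: int) -> int:
    """Distinct event pairs compared by Exact t-SNE: N * (N - 1) / 2."""
    return n_events * (n_events - 1) // 2


def neighbor_pair_count(n_events: int, perplexity: float = 30.0) -> int:
    """Input similarities kept when each event keeps only its nearest neighbors.

    ASSUMPTION: about 3 * perplexity neighbors per event, as in the bhtsne
    reference code; FCS Express's exact neighbor count is not documented here.
    """
    k = min(n_events - 1, int(3 * perplexity))
    return n_events * k


if __name__ == "__main__":
    for n in (3_000, 10_000, 50_000):
        exact = exact_pair_count(n)
        approx = neighbor_pair_count(n)
        print(f"{n:>7,} events: exact {exact:>15,} pairs, "
              f"neighbors {approx:>11,} pairs, ratio {exact / approx:,.0f}x")
```

For 3,000 events (the default sample size) this model gives 4,498,500 exact pairs versus 270,000 neighbor pairs; at 50,000 events the gap widens to roughly 278-fold, because the exact count grows with the square of the number of events while the neighbor count grows linearly.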
We then repeated the previous tests with the Estimate t-SNE for Unsampled Events option enabled. With this option, events that did not participate in the t-SNE calculation are mapped to the nearest point that did participate in the t-SNE mapping. With the file used for this test, the estimation was performed on almost half a million events. The results of these tests can be seen in the table and scatter plots below.
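The Estimate t-SNE for Unsampled Events option can be pictured as a nearest-neighbor lookup. The sketch below is a hypothetical illustration, not FCS Express's implementation: it assumes "nearest" means Euclidean distance in the original parameter space, and the helpers `split_sample` and `estimate_unsampled` are invented for this example. A seed is used for the random sample only so the example is reproducible; this page does not say whether the FCS Express Seed Value affects sampling.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Event = Sequence[float]      # one event's values across the selected parameters
Coord = Tuple[float, float]  # (tSNE X, tSNE Y)


def split_sample(n_events: int, sample_size: int, seed: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: choose a reproducible random subset of event indices."""
    rng = random.Random(seed)
    sampled = sorted(rng.sample(range(n_events), min(sample_size, n_events)))
    chosen = set(sampled)
    unsampled = [i for i in range(n_events) if i not in chosen]
    return sampled, unsampled


def estimate_unsampled(events: Sequence[Event], sampled: Sequence[int],
                       embedding: Sequence[Coord],
                       unsampled: Sequence[int]) -> Dict[int, Coord]:
    """Hypothetical helper: copy the t-SNE coordinates of the nearest sampled event.

    embedding[k] holds the t-SNE result for event sampled[k]. Brute force is used
    for clarity; a real implementation would likely use a spatial index.
    """
    estimates: Dict[int, Coord] = {}
    for i in unsampled:
        nearest = min(range(len(sampled)),
                      key=lambda k: math.dist(events[i], events[sampled[k]]))
        estimates[i] = embedding[nearest]
    return estimates


if __name__ == "__main__":
    events = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
    sampled, unsampled = [0, 2], [1, 3]
    embedding = [(-10.0, 0.0), (10.0, 0.0)]  # pretend t-SNE output for events 0 and 2
    print(estimate_unsampled(events, sampled, embedding, unsampled))
```

In the usage example, event 1 lands on the coordinates of event 0 and event 3 on those of event 2, since each is closest to that sampled event in the original three-parameter space.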
The chart below shows how t-SNE in FCS Express compares to t-SNE as performed in R:
- t-SNE stands for T-Distributed Stochastic Neighbor Embedding.
- t-SNE is a nonlinear data reduction algorithm that takes multidimensional data and represents it in two dimensions, while preserving the local neighborhood structure of the data points in the original high-dimensional space.
- For a better understanding and more effective use of t-SNE, please click here for an excellent overview.
- t-SNE is a nonlinear method.
- It differs from other methods in that it focuses on unrolling the native data into a low-dimensional (in FCS Express, two-dimensional) space and emphasizing the clusters found within the native high-dimensional data set.
- Exact t-SNE calculates the pairwise distance between every pair of data points.
- The Barnes-Hut Approximation approximates the Exact t-SNE result: rather than computing the full pairwise distributions, it calculates the distance between each data point and its closest neighboring points only.
- If the Amount of Approximation is set to 0, the Barnes-Hut method is virtually identical to the Exact t-SNE method.
- Because the embedding is randomly initialized, the same data set can produce different results with each analysis.
- However, FCS Express lets the user specify a Seed Value.
- The Seed Value is a numerical constant used to standardize the initialization, so that repeated analyses with the same seed and settings start from the same point.
- The Amount of Approximation sets how much approximation the Barnes-Hut method performs.
- This value can range from 0 to 1.0, where 0 is No Approximation, and 1 is the maximum amount of approximation. If the amount of approximation is set to 0, the Barnes-Hut method will be virtually identical to the Exact t-SNE method.
- In FCS Express, the default Amount of Approximation is 0.5.
- Perplexity is an information-theoretic measure. In t-SNE, the perplexity effectively sets the number of nearest neighbors considered for each point.
- In t-SNE, typical values for the perplexity range between 5 and 50.
- By default, FCS Express uses a sample size of 3000. However, a user can specify any number for the subset of cells to be included in the transformation.
- Note that the larger the sample size, the longer the transformation takes to calculate.
- The maximum number for the sample size will be the total number of cells within the specified gate of the transformation.
- Exact t-SNE is a resource-intensive algorithm because it inspects every single data point and measures the distance between every pair of points, so the number of distance calculations grows with the square of the number of events.
- The Barnes-Hut method measures the distance between every point and a subset of points.
- Currently, the result of a t-SNE transformation is not saved within the layout or with the .FCS file itself; instead, it is recalculated, automatically or manually on demand, when a layout is opened.
- If your t-SNE transformation is computationally intensive, we suggest exporting the tSNE X and tSNE Y parameters together with your original data as a new FCS file. The resulting file can then be opened and examined quickly without recalculating the transformation. The tSNE X and tSNE Y parameters can be exported via the Export Data dialog. We also recommend saving the original layout in which the transformation was created so you can review the settings used in the transformation at any time.
- GitHub repository by Laurens van der Maaten, co-author of the original t-SNE implementation.
- Distill article discussing the effective use of t-SNE
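The Perplexity setting described in the list above is usually enforced per event by searching for the Gaussian bandwidth whose neighbor distribution has the requested perplexity (2 raised to the Shannon entropy in bits). The sketch below shows that binary search, the approach used in van der Maaten's reference implementations. It is an illustration under that assumption, not FCS Express's internal code; `calibrate_beta` is a hypothetical name, and `beta` stands for 1 / (2 * sigma**2).

```python
import math
from typing import List, Sequence, Tuple


def conditional_probs(sq_dists: Sequence[float], beta: float) -> List[float]:
    """p_{j|i} for one event i from its squared distances to the other events.

    The minimum distance is subtracted before exponentiating to avoid underflow;
    it cancels out in the normalization.
    """
    d_min = min(sq_dists)
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs: Sequence[float]) -> float:
    """Perplexity = 2 ** H, where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def calibrate_beta(sq_dists: Sequence[float], target: float,
                   tol: float = 1e-5, max_iter: int = 200) -> Tuple[float, float]:
    """Hypothetical helper: binary-search beta so the perplexity matches target."""
    lo, hi = 0.0, math.inf
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(sq_dists, beta))
        if abs(perp - target) < tol:
            return beta, perp
        if perp > target:
            # Too many effective neighbors: sharpen the Gaussian (raise beta).
            lo = beta
            beta = beta * 2.0 if math.isinf(hi) else (beta + hi) / 2.0
        else:
            # Too few effective neighbors: widen the Gaussian (lower beta).
            hi = beta
            beta = (beta + lo) / 2.0
    return beta, perplexity(conditional_probs(sq_dists, beta))


if __name__ == "__main__":
    sq_dists = [float(d) for d in range(1, 11)]  # toy squared distances to 10 neighbors
    beta, perp = calibrate_beta(sq_dists, target=5.0)
    print(f"beta = {beta:.4f}, perplexity = {perp:.4f}")
```

A larger beta narrows the Gaussian and lowers the perplexity, so the search raises beta when the perplexity is too high and lowers it otherwise. The target must be smaller than the number of neighbors passed in, since a uniform distribution over n neighbors has a perplexity of exactly n.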
Learn more about t-SNE in FCS Express via a recorded webinar.