Using t-SNE Transformations in FCS Express
High-dimensional single-cell technologies, such as multicolor flow cytometry, mass cytometry, and image cytometry, can measure dozens of parameters for each cell. FCS Express integrates t-Distributed Stochastic Neighbor Embedding (t-SNE), a tool that maps high-dimensional cytometry data onto a two-dimensional plot while preserving the neighborhood structure of the original data, so that you can visualize and analyze high-dimensional data.
The final result of the algorithm in FCS Express is a 2D plot in which the positions of cells reflect their proximity in the original high-dimensional space. Plots can also be colored by density or by a heat map of any parameter, which makes populations easy to see.
Because t-SNE is a highly demanding algorithm, the De Novo Software team has put considerable effort into improving its speed within FCS Express.
To give you an idea of how t-SNE performs in FCS Express 6, we ran speed tests comparing the two methods FCS Express uses to calculate t-SNE: Exact t-SNE and the Barnes-Hut approximation.
The table below shows the elapsed time (in seconds) for t-SNE calculation using the Barnes-Hut approximation (Amount of Approximation = 0.5) for different combinations of the number of events and the number of parameters considered. In these tests, t-SNE was not estimated for unsampled events. Please refer to the Defining a t-SNE Transformation section in the software reference manual for more information about the methods and options available in FCS Express.
We then repeated the previous tests with the Estimate t-SNE for Unsampled Events option enabled. With this option, each event that did not participate in the t-SNE calculation is mapped to the nearest point that did participate in the t-SNE mapping. For the file used in this test, the estimation was performed on almost half a million events. The results of these tests can be seen in the table and scatter plots below.
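The nearest-point mapping described above can be sketched in a few lines of NumPy. This is a minimal illustration, not FCS Express's implementation: it assumes "nearest" means Euclidean distance in the original parameter space, and the function name `estimate_unsampled` is hypothetical. Distances are computed in chunks so that half a million unsampled events do not require one enormous distance matrix.

```python
import numpy as np

def estimate_unsampled(X_sampled, Y_sampled, X_unsampled, chunk=1024):
    """Hypothetical helper: give each unsampled event the t-SNE coordinates
    of its nearest sampled event (assumed Euclidean, original parameter space)."""
    Y_est = np.empty((X_unsampled.shape[0], Y_sampled.shape[1]))
    sq_sampled = np.sum(X_sampled ** 2, axis=1)
    for start in range(0, X_unsampled.shape[0], chunk):
        block = X_unsampled[start:start + chunk]
        # Squared distances from every event in this block to every sampled event
        d = (np.sum(block ** 2, axis=1)[:, None] + sq_sampled[None, :]
             - 2.0 * block @ X_sampled.T)
        nearest = np.argmin(d, axis=1)           # index of the closest sampled event
        Y_est[start:start + chunk] = Y_sampled[nearest]
    return Y_est

# Usage: two sampled events with known t-SNE positions, two unsampled events
X_s = np.array([[0.0, 0.0], [10.0, 10.0]])
Y_s = np.array([[1.0, 1.0], [-1.0, -1.0]])
X_u = np.array([[1.0, 0.0], [9.0, 9.0]])
print(estimate_unsampled(X_s, Y_s, X_u))
```

Each unsampled event simply inherits the map position of the sampled event it lies closest to, so no further optimization is needed for those events.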
The chart below shows how t-SNE in FCS Express compares to t-SNE as performed in R:
1. What is t-SNE?
- t-SNE stands for t-Distributed Stochastic Neighbor Embedding.
- t-SNE is a nonlinear dimensionality reduction algorithm that takes multidimensional data and represents it in 2 dimensions, while preserving local neighborhoods: points that are close together in the original high-dimensional space stay close together in the 2D map.
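To make "preserving neighborhoods" concrete, the sketch below computes the quantity standard t-SNE minimizes: the Kullback-Leibler divergence between pairwise affinities P in the original space (Gaussian kernel) and Q in the 2D map (Student t kernel). t-SNE then moves the 2D points by gradient descent to lower this value. One loudly stated simplification: a single fixed bandwidth `sigma` is used for every event, whereas standard t-SNE tunes a per-event bandwidth to match a target perplexity. All function names are illustrative.

```python
import numpy as np

def squared_distances(Z):
    """Pairwise squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.maximum(D, 0.0)  # clip tiny negative values caused by round-off

def high_dim_affinities(X, sigma=1.0):
    """Joint probabilities P between events in the original parameter space.

    Simplification: one fixed Gaussian bandwidth (sigma) for every event;
    standard t-SNE tunes a bandwidth per event to match a target perplexity.
    """
    D = squared_distances(X)
    P = np.exp(-D / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)                # an event is not its own neighbor
    P /= P.sum(axis=1, keepdims=True)       # conditional probabilities p(j|i)
    return (P + P.T) / (2.0 * X.shape[0])   # symmetric joint probabilities p_ij

def low_dim_affinities(Y):
    """Joint probabilities Q between events in the 2D map (Student t, 1 d.o.f.)."""
    W = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl_divergence(P, Q):
    """The cost t-SNE minimizes: KL(P || Q), summed over pairs with p_ij > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # 200 simulated events, 10 parameters
Y = rng.normal(size=(200, 2))    # an arbitrary, not yet optimized 2D map
print(kl_divergence(high_dim_affinities(X), low_dim_affinities(Y)))
```

The heavy-tailed Student t kernel in the map is what lets distinct clusters spread apart in 2D while neighbors stay together.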
2. How does t-SNE differ when compared to other transformations, like PCA or K-Means?
- t-SNE is a nonlinear method, whereas PCA is a linear projection.
- It differs from other methods in that it focuses on unrolling the native data into a two-dimensional space and emphasizing the clusters found within the native high-dimensional data set.
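The linear/nonlinear distinction can be seen directly in code. In the sketch below (names are illustrative), PCA places every event with one fixed projection matrix `W`, so new events can be projected instantly and straight lines in the data stay straight. t-SNE has no such matrix: each event's 2D position is the result of an optimization, which is why a separate step such as Estimate t-SNE for Unsampled Events is needed to place events that were not part of the calculation.

```python
import numpy as np

def pca_2d(X):
    """Project events onto their first two principal components (a linear map)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing explained variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:2].T                   # (n_parameters x 2) projection matrix
    return Xc @ W, W, mean

def pca_project_new(X_new, W, mean):
    """Any new event is placed by the same fixed matrix; no re-fitting is needed."""
    return (X_new - mean) @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # 100 simulated events, 5 parameters
Y, W, mean = pca_2d(X)
print(Y.shape, W.shape)
```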
3. What is the difference between Exact t-SNE and the Barnes - Hut approximation?
- Exact t-SNE calculates the pairwise distance between every pair of data points.
- The Barnes-Hut approximation approximates the probability distributions that Exact t-SNE computes in full: it calculates the distance between each data point and its closest neighboring points only.
- If the Amount of Approximation is set to 0, the Barnes-Hut method is virtually identical to the Exact t-SNE method.
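The "closest neighboring points only" idea can be sketched as a k-nearest-neighbor search. Here `k` is a hypothetical parameter: the number of neighbors FCS Express keeps is not stated above, and reference Barnes-Hut implementations typically derive it from the perplexity. For brevity this version is brute force and still forms the full distance matrix; production implementations use a search tree so that all pairs are never stored. The payoff is in what is kept afterwards: N × k distances instead of N(N-1)/2.

```python
import numpy as np

def nearest_neighbors(X, k):
    """Indices and distances of each event's k nearest neighbors (excluding itself).

    Brute force for brevity: the full distance matrix is still computed here.
    """
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(D, np.inf)                      # exclude self-matches
    idx = np.argpartition(D, k - 1, axis=1)[:, :k]   # k smallest, unordered
    rows = np.arange(X.shape[0])[:, None]
    order = np.argsort(D[rows, idx], axis=1)         # sort those k by distance
    idx = idx[rows, order]
    return idx, np.sqrt(D[rows, idx])

n, k = 100_000, 90
print("exact pairs:", n * (n - 1) // 2, "| neighbor pairs kept:", n * k)
```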
4. Why do my t-SNE results differ each time I run t-SNE on the same data set?
- Because t-SNE starts from a random initialization, the same data set can produce different results with each analysis.
- However, in FCS Express 6 v6.1.0009, we introduced the option to specify a Seed Value.
- The Seed Value is a numerical constant used to initialize the random number generator, so the initialization is the same every time the same Seed Value is used.
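A minimal sketch of how a seed standardizes the initialization, using NumPy's random generator. The function name and the small spread of the starting positions are illustrative assumptions, not FCS Express internals. With no seed, each call draws fresh random starting positions; with the same integer seed, the starting positions are identical, so the final map is reproducible as long as the rest of the calculation is deterministic.

```python
import numpy as np

def initial_embedding(n_events, seed=None):
    """Hypothetical helper: random starting 2D positions for the t-SNE optimization.

    seed=None draws fresh randomness on every call; an integer seed makes the
    starting positions reproducible.
    """
    rng = np.random.default_rng(seed)
    # Small Gaussian spread around the origin (scale chosen for illustration)
    return rng.normal(loc=0.0, scale=1e-4, size=(n_events, 2))

a = initial_embedding(1000, seed=42)
b = initial_embedding(1000, seed=42)
print(np.array_equal(a, b))
```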
5. What is the Amount of Approximation?
- This is the amount of approximation that is performed with the Barnes-Hut method.
- This value can range from 0 to 1.0, where 0 means no approximation and 1 is the maximum amount of approximation. If the Amount of Approximation is set to 0, the Barnes-Hut method will be virtually identical to the Exact t-SNE method.
- In FCS Express, the default Amount of Approximation is 0.5.
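In the published Barnes-Hut t-SNE method, the approximation is controlled by a parameter θ: a whole cell of the 2D space-partitioning tree may be treated as a single point when (cell size) / (distance from the point to the cell's center of mass) < θ. The sketch below assumes, without confirmation from this document, that FCS Express's Amount of Approximation plays the role of θ; the function name is hypothetical.

```python
import math

def can_summarize(cell_size, cell_center_of_mass, point, theta):
    """Barnes-Hut test: may a whole tree cell be treated as one point?

    cell_size: width (or diagonal) of the quadtree cell in the 2D map.
    theta: approximation parameter (assumed to match Amount of Approximation).
    """
    distance = math.dist(point, cell_center_of_mass)
    if distance == 0.0:
        return False  # the point sits at the center of mass: open the cell
    return cell_size / distance < theta

# A unit-size cell whose center of mass is 4 units away from the point
for theta in (0.0, 0.5, 1.0):
    print(theta, can_summarize(1.0, (0.0, 0.0), (4.0, 0.0), theta))
```

With θ = 0 the condition can never hold, so every cell is opened down to individual points and the calculation matches the exact method, consistent with the behavior described above. Larger values summarize more distant cells, trading accuracy for speed.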
6. How large of a sample size should I use to analyze my data with t-SNE?
- By default, FCS Express uses a sample size of 100; however, you can specify how many cells are included in the transformation.
- The larger the sample size, the longer the transformation takes to calculate.
- The maximum number for the sample size will be the total number of cells within the specified gate of the transformation.
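The sampling rule above can be sketched as follows. This assumes simple random sampling without replacement, which this document does not confirm, and the function name is hypothetical. The requested sample size is capped at the number of events in the gate, and the remaining events are returned separately because they are the ones an option like Estimate t-SNE for Unsampled Events would place afterwards.

```python
import numpy as np

def split_events(n_gated, sample_size, seed=None):
    """Hypothetical helper: choose which gated events enter the t-SNE calculation.

    Returns (sampled, unsampled) index arrays; the sample is capped at the
    number of events in the gate.
    """
    n = min(sample_size, n_gated)
    perm = np.random.default_rng(seed).permutation(n_gated)
    return np.sort(perm[:n]), np.sort(perm[n:])

sampled, unsampled = split_events(n_gated=5000, sample_size=100, seed=0)
print(len(sampled), len(unsampled))
```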
7. Why does t-SNE take so long to calculate?
- Exact t-SNE is resource-intensive because it inspects every data point and measures the distance between every pair of points, so the work grows with the square of the number of events.
- The Barnes-Hut method instead measures the distance between each point and only a subset of points, which greatly reduces the calculation time.
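A quick order-of-growth comparison shows why the difference matters as event counts rise. The sketch counts pairs for the exact method and uses the N log N growth commonly cited for Barnes-Hut t-SNE; constants are ignored, so these are relative work estimates, not predicted run times.

```python
import math

def work_estimate(n_events):
    """Order-of-growth work per optimization iteration (constants ignored).

    Exact t-SNE touches every pair of events; Barnes-Hut is commonly
    described as growing like N log N.
    """
    exact_pairs = n_events * (n_events - 1) // 2
    barnes_hut = n_events * math.log2(n_events)
    return exact_pairs, barnes_hut

for n in (10_000, 20_000, 100_000):
    exact, bh = work_estimate(n)
    print(f"{n:>7} events: exact ~{exact:.2e}, Barnes-Hut ~{bh:.2e}")
```

Doubling the number of events roughly quadruples the exact work but only slightly more than doubles the Barnes-Hut estimate.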