How can I do pre-processing for high-dimensional data analysis?

Data Pre-Processing aims to prepare the data for dimensionality reduction and clustering algorithms, and is a crucial part of the entire high-dimensional data analysis workflow.

Data Pre-Processing includes many different types of steps, such as data cleaning, scaling, gating, downsampling, normalization, and merging. Some of these steps are highly recommended, while others are optional. Which pre-processing steps should be run, and in which order, depends on the scientific question, on the dataset, and on the algorithms that will be run downstream of the pre-processing.

Creating a merged file with cleaned, gated, and equally sized downsampled data from each sample

The instructions on this page outline an example data pre-processing pipeline.

In this example, data will be:

  • Cleaned with FlowCut (*)
  • Gated on a gate of interest
  • Downsampled using Interval downsampling (*)
  • Merged
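
For readers who want to reason about these four steps outside FCS Express, below is a minimal, hypothetical numpy sketch of the same workflow. It is not FCS Express's implementation: the cleaning and gating rules are deliberately simplistic stand-ins (a closer sketch of FlowCut-style cleaning follows in the Important Note below), and only the interval-downsampling and merging logic mirror the steps above.

```python
# Hypothetical numpy prototype of the clean -> gate -> downsample -> merge
# pipeline. All rules below are simplified stand-ins, not FCS Express code.
import numpy as np

rng = np.random.default_rng(0)
# Three fake samples with different event counts and two channels.
samples = [rng.normal(size=(n, 2)) for n in (12_000, 9_500, 15_000)]

def clean(events):
    """Crude stand-in for cleaning: drop extreme outlier events."""
    return events[np.all(np.abs(events) < 4.0, axis=1)]

def gate(events):
    """Stand-in for a gate of interest: a rectangular region in ch0/ch1."""
    return events[(events[:, 0] > -1.0) & (events[:, 1] > -1.0)]

def interval_downsample(events, target):
    """Interval downsampling: keep `target` evenly spaced events."""
    if len(events) <= target:
        return events
    idx = np.linspace(0, len(events) - 1, target).astype(int)
    return events[idx]

target = 5_000  # the same target size for every sample
processed = [interval_downsample(gate(clean(s)), target) for s in samples]
merged = np.vstack(processed)  # one merged file of equally sized samples
print(merged.shape)  # (15000, 2) if every sample still has >= 5000 events
```

Because every sample is downsampled to the same target before merging, no single sample dominates the merged file that is fed to dimensionality reduction or clustering downstream.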

When the instructions below are completed, the pipeline should resemble the example shown at the right.

Users can customize this workflow at their convenience (e.g., to generate a pipeline that creates a cleaned-only, gated-only, or downsampled-only merged file).

(*) Important Note

Multiple cleaning algorithms and multiple downsampling methods are available in FCS Express. FlowCut and Interval Downsampling are used in this example.
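
For intuition, FlowCut-style cleaning operates on time segments rather than on individual events: the acquisition is split into segments, and segments whose signal is unstable are removed. Below is a hedged sketch of that idea only; the segment count, the MAD-based threshold, and the single-channel input are illustrative assumptions, not the published flowCut parameters.

```python
# Hypothetical sketch of segment-based cleaning in the spirit of FlowCut:
# split events (assumed ordered by acquisition time) into segments and drop
# segments whose median signal drifts from the file-wide behavior.
import numpy as np

def clean_segments(signal, n_segments=100, max_dev=3.0):
    """Return a boolean keep-mask over events; thresholds are illustrative."""
    keep = np.ones(len(signal), dtype=bool)
    bounds = np.linspace(0, len(signal), n_segments + 1).astype(int)
    medians = np.array([np.median(signal[lo:hi])
                        for lo, hi in zip(bounds[:-1], bounds[1:])])
    center = np.median(medians)
    spread = np.median(np.abs(medians - center)) + 1e-9  # MAD, avoids /0
    for i, (lo, hi) in enumerate(zip(bounds[:-1], bounds[1:])):
        if abs(medians[i] - center) > max_dev * spread:
            keep[lo:hi] = False
    return keep

# Usage: a stable signal with one unstable stretch in the middle.
rng = np.random.default_rng(2)
signal = rng.normal(size=20_000)
signal[9_000:10_000] += 5.0          # simulated clog / flow disturbance
print(clean_segments(signal).sum())  # most events kept, bad stretch dropped
```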

If FlowAI is used as the cleaning algorithm instead of FlowCut, please carefully read the FlowAI chapter of the User Manual and fulfill all the requirements to run it properly (i.e., run it on uncompensated, linearly scaled data).

If a density-dependent downsampling method (i.e., Target Density Downsampling or Weighted Density Downsampling) is used, we suggest running it on scaled data rather than on linear data (please refer to the Scaling pipeline step).
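
The reason for this suggestion is that, roughly speaking, raw linear intensities span several decades, so density estimates computed on them are dominated by the crowded low end of the scale; an arcsinh-type scaling spreads the populations out so that "dense" and "sparse" become meaningful. Below is a hedged sketch of a weighted-density-style thinning on scaled data; the arcsinh cofactor of 150 and the Gaussian KDE are illustrative assumptions, not the FCS Express Target/Weighted Density implementations.

```python
# Hypothetical sketch of density-dependent downsampling on scaled data.
# The arcsinh cofactor (150) and scipy's gaussian_kde are illustrative
# assumptions, not FCS Express's density-downsampling algorithms.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
linear = rng.lognormal(mean=4.0, sigma=1.5, size=(5_000, 2))  # fake raw data

scaled = np.arcsinh(linear / 150.0)  # scale first, then estimate density

density = gaussian_kde(scaled.T)(scaled.T)                 # density per event
keep_prob = np.minimum(1.0, np.median(density) / density)  # thin dense areas
kept = scaled[rng.random(len(scaled)) < keep_prob]
print(f"kept {len(kept)} of {len(scaled)} events")
```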

 

Let's start!
