How can I do pre-processing for high-dimensional data analysis?
Data Pre-Processing aims to prepare the data for dimensionality reduction and clustering algorithms, and is a crucial part of the entire high-dimensional data analysis workflow.
Data Pre-Processing includes many different types of steps, such as data cleaning, scaling, gating, downsampling, normalization, and merging. Some of these steps are highly recommended, others are optional. Which pre-processing steps should be run, and in which order, depends on the scientific question, on the dataset, and on the algorithms that will be run downstream of the pre-processing.
Creating a merged file with cleaned, gated and equal-sized downsampled data from each sample
The instructions on this page outline an example data pre-processing pipeline.
In this example, data will be cleaned, gated on the population of interest, scaled, downsampled, and merged into a single file.
When the instructions below are completed, the pipeline should resemble the example on the right.
Users can customize this workflow at their convenience (e.g. to generate a pipeline that creates a cleaned-only, gated-only, or downsampled-only merged file).
(*) Important Note
Multiple cleaning algorithms and multiple downsampling methods are available in FCS Express. FlowCut and Interval Downsampling are used in this example.
If FlowAI is used as the cleaning algorithm instead of FlowCut, please carefully read the FlowAI chapter of the User Manual and fulfil all the requirements to run it properly (i.e. run it on un-compensated and linearly-scaled data).
If a density-dependent downsampling method (i.e. Target Density Downsampling or Weighted Density Downsampling) is used, we suggest running it on scaled data and not on linear data (please refer to the Scaling pipeline step).
1. Load one of the data files into the layout. This is a single file, not a merged dataset.
2. Gate on the population of interest. This is a good way to gate out unwanted events (e.g. dead cells, doublets, dump channels,...) and to identify the population of interest. The population defined at this stage will be the one exported from each sample. Please note that Data Specific gates will NOT be applied at export.
3. Create an FCS Express Pipeline (Tools > Transformation > + > Pipeline). In the root pipeline step, be sure to select No Gate as input gate, and to select all the parameters (unselected parameters will not be exported) as input parameters.
FlowCut allows the user to perform quality control on flow cytometry data to improve both manual and automated downstream analysis. For more details on FlowCut, please refer to the FlowCut chapter in the User Manual.
As suggested by the authors of FlowCut, we will run FlowCut on compensated and scaled data. Compensation is usually applied by default when data is loaded into FCS Express (see the Layout Options chapter of the User Manual for more details), so, unless users have changed this default setting, no action is needed in this regard. However, a Scaling pipeline step is still required and must be created upstream of the FlowCut pipeline step.
4. Create one or multiple Scaling steps and scale all the parameters that have to be cleaned.
Tip: the "Suffix for Transformed Parameters" field can be used to customize the names of the scaled parameters and thus to distinguish parameters that are scaled using different Scaling steps.
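Conceptually, the kind of scaling typically applied to cytometry intensities can be sketched with an arcsinh transform. This is an illustrative Python sketch of the idea, not FCS Express code; the cofactor value is an assumption, not a product default:

```python
import numpy as np

def arcsinh_scale(values, cofactor=150.0):
    """Arcsinh transform, a common way to scale cytometry intensities.

    Values small relative to the cofactor stay roughly linear, while
    large values are compressed logarithmically; the cofactor sets where
    that transition happens. (150.0 is illustrative, not a default.)
    """
    return np.arcsinh(np.asarray(values, dtype=float) / cofactor)

raw = np.array([0.0, 150.0, 1500.0, 15000.0])
scaled = arcsinh_scale(raw)
```

Because the transform is monotonic, event ordering within each parameter is preserved while the dynamic range is compressed.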
5. Create a FlowCut step and set all the options as needed (please refer to the FlowCut chapter in the User Manual for more details). Scaled parameters have to be used as input parameters for the FlowCut pipeline step. In the picture below, all non-scatter and non-time scaled parameters are selected as input parameters for the FlowCut pipeline step.
FlowCut's authors do not specify whether a gate can or must be applied prior to running FlowCut. In this example, we do not apply any gate prior to it, so we set the Gate dropdown menu in the root pipeline step to No Gate. Since there are no downsampling steps between the root step and the FlowCut step, FlowCut will thus be run on the whole dataset.
To gate on the population of interest defined in Step #2, a Gate Downsampling step is added downstream of FlowCut.
6. Create a Gate Downsampling step and select the gate of interest. This is the gate created in Step #2.
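A gate applied as a downsampling step can be thought of as a boolean filter on the event matrix: events inside the gate are kept, everything else is dropped. A minimal Python sketch of that idea (the parameter layout and thresholds are made up for illustration and have nothing to do with any real gate):

```python
import numpy as np

# events: rows = cells, columns = two parameters (illustrative data)
events = np.array([[120.0, 3400.0],
                   [ 80.0,  150.0],
                   [200.0, 5000.0],
                   [ 95.0,   60.0]])

# a rectangular "gate of interest" on the two parameters
# (thresholds are hypothetical, chosen only for this example)
in_gate = (events[:, 0] > 90) & (events[:, 1] > 100)
gated = events[in_gate]
```

Real gates in FCS Express can of course be arbitrary regions (polygons, ellipses, boolean combinations), but the net effect on the event table is the same: a subset of rows passes downstream.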
Although a Scaling step has already been run at the very beginning of the pipeline, that scaling step was performed on the entire dataset (i.e. the uncleaned and ungated dataset) and thus might not perfectly fit the cleaned and gated dataset (e.g. outliers might have biased the choice of the scale in the first Scaling step). To account for this scenario, a second Scaling step can easily be added at this stage to properly scale the cleaned and gated dataset.
Note: the scaled parameters obtained by this additional Scaling step might be used as input parameters for the high-dimensional data analysis that will be run after data merging.
7. Create a Scaling pipeline step and scale all the parameters that will be used by the downstream dimensionality reduction and clustering algorithms. The input parameters for this second Scaling step are the raw parameters, NOT the scaled parameters generated by the first Scaling step.
Tip: the "Suffix for Transformed Parameters" field can be used to customize the names of the scaled parameters and thus to distinguish parameters generated by different Scaling steps.
In this example, an Interval Downsampling step is used.
8. Create an Interval Downsampling step and select the number of events to sample. The Gate Downsampling step used upstream narrows the events down to the gate of interest, while the Interval Downsampling step selects the user-defined number of data points from said gate.
Note: if a density-dependent downsampling method (i.e. Target Density Downsampling or Weighted Density Downsampling) is used, we suggest running it on scaled data and not on linear data (please refer to the Scaling pipeline step). If an additional Scaling step has been performed on the cleaned and gated data (see Step #7 above), we suggest using the parameters resulting from that step; if not, the parameters resulting from Step #4 can be used.
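The idea behind interval downsampling is to keep a fixed number of events spread evenly across the dataset, preserving acquisition order. This is a conceptual Python sketch of that idea, not the FCS Express implementation:

```python
import numpy as np

def interval_downsample(n_events, n_target):
    """Return indices of n_target events taken at (approximately) even
    intervals across n_events rows, preserving the original order.

    A conceptual sketch of interval downsampling, not the actual
    FCS Express algorithm.
    """
    if n_target >= n_events:
        return np.arange(n_events)
    return np.linspace(0, n_events - 1, n_target).round().astype(int)

# e.g. keep 10 evenly spaced events out of 1,000 gated events
idx = interval_downsample(1000, 10)
```

Because the indices are spread uniformly over the whole file, events from the beginning and the end of the acquisition are equally represented, unlike a simple "first N events" cut.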
If an additional Scaling step has been performed on the cleaned and gated data (i.e. Step #7 above), the scaled parameters generated by the first Scaling step (i.e. the scaling for FlowCut) can be removed at the end of the pipeline so that they will neither be listed in the parameter list on plots, nor be exported in the subsequent Batch Export.
9. Create a Parameter Removal step and select the parameters to remove.
Note: if a second Scaling step has been added to the pipeline after the Gate Downsampling step, the scaled parameters generated by it might be appropriate for the downstream dimensionality reduction and clustering, and thus can be kept rather than removed.
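In terms of the event table, a Parameter Removal step simply drops the unwanted columns while keeping every event. A small Python sketch of the idea (the parameter names and suffixes here are hypothetical, chosen only for this example):

```python
import numpy as np

# illustrative parameter list: two raw parameters plus two versions of a
# scaled parameter, produced by two different Scaling steps (names made up)
param_names = ["FSC-A", "SSC-A", "CD3_scaled_v1", "CD3_scaled_v2"]
events = np.zeros((5, 4))  # 5 events x 4 parameters of dummy data

# drop the scaled parameter from the first Scaling step; keep the rest
to_remove = {"CD3_scaled_v1"}
keep = [i for i, name in enumerate(param_names) if name not in to_remove]
events = events[:, keep]
param_names = [param_names[i] for i in keep]
```

Only the parameter list changes; the number of events is untouched, which is why this step is safe to place at the very end of the pipeline.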
Now that the pipeline is ready, we can merge the files of interest and run the pipeline on each of them on the fly during the merging process, so that only cleaned-gated-scaled-downsampled data will be merged. This can be done using the Batch Export tool.
In the Batch Export dialog, be sure to select the appropriate file format (e.g. DNS), the appropriate compensation and the pipeline created in the previous steps.
Do not select any gate in the Batch Export dialog (the gate selected in the pipeline, if any, will be automatically used).
The exported files will contain the same number of cleaned events from the gate of interest.
A merged file created via the procedure above is depicted below (10,000 events have been taken from each of the 9 files of interest).
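The merge itself amounts to stacking the equal-sized, pre-processed event tables on top of each other, while keeping track of which file each event came from. A Python sketch of that structure, using random numbers as stand-ins for the nine exported files (sizes taken from the example above):

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_file = 10_000  # events exported from each sample, as in the example

# stand-ins for nine pre-processed samples, each already cleaned, gated,
# scaled, and downsampled to the same number of events (4 dummy parameters)
files = [rng.normal(size=(n_per_file, 4)) for _ in range(9)]

# stack the samples into one merged event table, and tag each event with
# its file of origin so the samples remain distinguishable after merging
merged = np.vstack(files)
origin = np.repeat(np.arange(9), n_per_file)
```

Because every file contributes the same number of events, no single sample dominates the merged dataset when dimensionality reduction or clustering is run on it.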