How can I do pre-processing for high-dimensional data analysis?
Data Pre-Processing aims to prepare the data for dimensionality reduction and clustering algorithms, and is a crucial part of the entire High-Dimensional data analysis workflow.
Data Pre-Processing includes many different types of steps, such as data cleaning, scaling, gating, downsampling, normalization and merging. Some of these steps are highly recommended, others are optional. Which pre-processing steps should be run, and in which order, depends on the scientific question, on the dataset and on the algorithms that will be run downstream of the pre-processing.
Creating a merged file with cleaned, gated and equal-sized downsampled data from each sample
The instructions on this page outline an example of a data pre-processing pipeline.
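In this example, data will be cleaned (*), gated, downsampled (*) to the same number of events per sample, and merged into a single file.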
When the instructions below are completed, the pipeline should resemble the example at the right.
Users can customize this workflow at their convenience (e.g. to generate a pipeline that creates a cleaned-only, gated-only or downsampled-only merged file).
(*) Important Note
Multiple cleaning algorithms and multiple downsampling methods are available in FCS Express. FlowCut and Interval Downsampling are used in this example.
If FlowAI is used as the cleaning algorithm instead of FlowCut, please carefully read the FlowAI chapter of the User Manual and fulfil all the requirements to run it properly (i.e. run it on un-compensated and linearly-scaled data).
If a density-dependent downsampling method (i.e. Target Density Downsampling or Weighted Density Downsampling) is used, we suggest running it on scaled data and not on linear data (please refer to the Scaling pipeline step).
1. Load one of the data files into the layout. This is a single file, not a merged dataset.
2. Gate on the population of interest. This is a good way to gate out unwanted events (e.g. dead cells, doublets, dump channels) and to identify the population of interest. The population defined at this stage will be the one exported from each sample. Please note that Data Specific gates will NOT be applied at export.
3. Create an FCS Express Pipeline (Tools > Transformation > + > Pipeline). In the root pipeline step, be sure to select No Gate as input gate, and to select all the parameters (unselected parameters will not be exported) as input parameters.
FlowCut allows the user to perform quality control on flow cytometry data to improve both manual and automated downstream analysis. For more details on FlowCut, please refer to the FlowCut chapter in the User Manual.
As suggested by the Authors of FlowCut, we will run FlowCut on Compensated and Scaled data. Compensation is usually applied by default when data is loaded into FCS Express (see the Layout Options chapter of the User Manual for more details) so, unless users changed this default setting, no action is needed in this regard. However, a Scaling pipeline step is still required and must be created upstream of the FlowCut pipeline step.
4. Create one or more Scaling steps and scale all the parameters that have to be cleaned.
Tip: the "Suffix for Transformed Parameters" field can be used to customize the name of the scaled parameters and thus to distinguish parameters that are scaled using different Scaling steps.
5. Create a FlowCut step and set all the options as needed (please refer to the FlowCut chapter in the User Manual for more details). Scaled-parameters have to be used as input parameters for the FlowCut pipeline step. In the picture below, all non-scatter and non-time scaled-parameters are selected as input parameters for the FlowCut pipeline step.
FlowCut's Authors do not specify whether a gate can or must be applied prior to running FlowCut. In this example, we do not apply any gate prior to it, so we set the Gate dropdown menu in the root pipeline step to No Gate. Since there are no downsampling steps between the root step and the FlowCut step, FlowCut will thus be run on the whole dataset.
To gate on the population of interest defined in Step #2, a Gate Downsampling step is added downstream of FlowCut.
6. Create a Gate Downsampling step and select the gate of interest. This is the gate created in Step #2.
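Conceptually, a Gate Downsampling step keeps only the events that fall inside the selected gate and discards the rest. The sketch below illustrates that idea outside of FCS Express, using NumPy and a simple rectangular gate on two parameters. The function name `gate_downsample`, the column indices and the gate boundaries are hypothetical and chosen only for illustration; this is not how FCS Express implements gating.

```python
import numpy as np

def gate_downsample(events, x_col, y_col, x_range, y_range):
    """Keep only the events that fall inside a rectangular gate.

    events  : 2D array, one row per event, one column per parameter
    x_range : (min, max) of the gate on column x_col
    y_range : (min, max) of the gate on column y_col
    """
    x, y = events[:, x_col], events[:, y_col]
    inside = (
        (x >= x_range[0]) & (x <= x_range[1])
        & (y >= y_range[0]) & (y <= y_range[1])
    )
    return events[inside]

# Toy data: four events measured on two parameters (made-up values).
events = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 2.0], [4.0, 6.0]])
gated = gate_downsample(events, 0, 1, x_range=(3.0, 6.0), y_range=(4.0, 7.0))
```

Only the second and fourth toy events lie inside the gate, so `gated` contains two rows.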
Although a Scaling step has already been run at the very beginning of the pipeline, that step was performed on the entire dataset (i.e. the uncleaned and ungated dataset) and thus might not perfectly fit the cleaned and gated dataset (e.g. outliers might have biased the choice of the scale in the first Scaling step). To account for this scenario, a second Scaling step can be added at this stage to properly scale the cleaned and gated dataset.
Note: The scaled parameters obtained by this additional Scaling step can be used as input parameters for the high-dimensional data analysis that will be run after data merging.
7. Create a Scaling pipeline step and scale all the parameters that will be used for the downstream dimensionality reduction and clustering algorithms. The input parameters for this second Scaling step are the raw parameters, NOT the scaled-parameters generated by the first Scaling step.
Tip: the "Suffix for Transformed Parameters" field can be used to customize the name of the scaled parameters and thus to distinguish parameters generated by different Scaling steps.
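To make the idea of two independent Scaling steps more concrete, the sketch below scales the same raw parameter twice and uses a different suffix each time, so that the two sets of scaled parameters can be told apart. It is only an illustration written with NumPy: the arcsinh transform, the cofactor values, the suffixes and the function name `scale_parameters` are all assumptions made for this example, and the scale actually applied by the FCS Express Scaling step depends on the options chosen there.

```python
import numpy as np

def scale_parameters(data, names, cofactor=150.0, suffix="_scaled"):
    """Return arcsinh-scaled copies of the requested raw parameters.

    data     : dict mapping parameter name -> 1D array of raw values
    names    : raw parameters to scale
    cofactor : illustrative arcsinh cofactor (an assumption, not a default
               taken from FCS Express)
    suffix   : appended to each name to label the scaled parameter
    """
    scaled = {}
    for name in names:
        values = np.asarray(data[name], dtype=float)
        scaled[name + suffix] = np.arcsinh(values / cofactor)
    return scaled

# Toy raw values for one hypothetical parameter.
raw = {"CD3": np.array([0.0, 150.0, 1500.0])}

# First scaling (for cleaning) and second scaling (for downstream analysis)
# both start from the RAW values, not from each other's output.
for_cleaning = scale_parameters(raw, ["CD3"], cofactor=150.0, suffix="_FlowCut")
for_analysis = scale_parameters(raw, ["CD3"], cofactor=500.0, suffix="_HD")
```

Both calls read from `raw`, mirroring the point made in Step #7 that the second Scaling step takes the raw parameters as input rather than the output of the first one.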
In this example, an Interval Downsampling step is used.
8. Create an Interval Downsampling step and select the number of events to sample. The Gate Downsampling step used upstream narrows down events to the gate of interest, while the Interval Downsampling step selects the user-defined number of events from that gate.
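As a rough illustration of what sampling a fixed number of events at regular intervals can look like, the sketch below picks evenly spaced events from an array with NumPy. It assumes that interval downsampling means taking events at a constant stride through the file; the exact selection rule used by FCS Express may differ, and the function name `interval_downsample` is hypothetical.

```python
import numpy as np

def interval_downsample(events, n_target):
    """Pick n_target events at (approximately) regular intervals.

    If the sample has no more than n_target events, it is returned unchanged.
    """
    n_events = len(events)
    if n_target >= n_events:
        return events
    # Evenly spaced positions from the first event towards the last one.
    idx = np.floor(np.arange(n_target) * n_events / n_target).astype(int)
    return events[idx]

# Toy example: ten events, keep five of them.
small = interval_downsample(np.arange(10), 5)

# Larger example: 25,000 gated events reduced to 10,000.
large = interval_downsample(np.arange(25_000), 10_000)
```

Because the positions depend only on the number of events, running the same function on every sample yields the same number of events per sample whenever each sample contains at least `n_target` events.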
Note: If a density-dependent downsampling method (i.e. Target Density Downsampling or Weighted Density Downsampling) is used, we suggest running it on scaled data and not on linear data (please refer to the Scaling pipeline step). If an additional Scaling step has been performed on cleaned-gated data (see Step #7 above), we suggest using the parameters resulting from that step. If not, the parameters resulting from Step #4 can be used.
If an additional Scaling step has been performed on cleaned-gated data (i.e. Step #7 above), the scaled-parameters generated by the first Scaling step (i.e. Scaling for FlowCut) can be removed at the end of the pipeline so that they will neither be listed in the parameter list on plots, nor be exported in the subsequent Batch Export.
9. Create a Parameter Removal step and select the parameters to remove.
Note: if a second Scaling step has been added to the pipeline after the Gate Downsampling step, scaled-parameters generated by it might be appropriate for the downstream dimensionality reduction and clustering, and thus can be maintained and not removed.
Now that the pipeline is ready, we can merge the files of interest and run the pipeline on each of them on the fly during the merging process, so that only cleaned-gated-scaled-downsampled data will be merged. This can be done using the Batch Export tool.
In the Batch Export dialog, be sure to select the appropriate file format (e.g. DNS), the appropriate compensation and the pipeline created in the previous steps.
Do not select any gate in the Batch Export dialog (the gate selected in the pipeline, if any, will be automatically used).
The exported files will contain the same number of cleaned events from the gate of interest.
A merged file created via the procedure above is depicted below (10,000 events have been taken from each of the 9 files of interest).
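The arithmetic behind the merged file can be sketched as follows: nine samples of 10,000 events each are stacked into one table of 90,000 events, and a sample index is kept so that each event can be traced back to its source file. This is a NumPy illustration only; the random values stand in for real events, the four-parameter layout is arbitrary, and the function name `merge_samples` is hypothetical. In FCS Express the merge itself is performed by the Batch Export tool.

```python
import numpy as np

def merge_samples(samples):
    """Stack per-sample event tables and record each event's sample of origin."""
    merged = np.vstack(samples)
    sample_id = np.repeat(np.arange(len(samples)), [len(s) for s in samples])
    return merged, sample_id

# Placeholder data: 9 samples, 10,000 events each, 4 parameters per event.
rng = np.random.default_rng(0)
samples = [rng.normal(size=(10_000, 4)) for _ in range(9)]

merged, sample_id = merge_samples(samples)
```

Keeping every sample at the same size means that no single file dominates the merged dataset used for dimensionality reduction and clustering.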