When should I use clusters?

Use clustering when dividing your dataset into groups of similar instances would help you. Clusters let you analyze and explore each group individually, filter your data before training a model, or both. The groups are computed from a distance measure between the instances. Each cluster is represented by a centroid, computed using the mean of each numeric field and the mode of each categorical field.
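
To make the centroid description concrete, here is a minimal Python sketch of how a single cluster's centroid could be derived from its members: the mean for numeric fields, the mode for categorical ones. It is an illustration only, not the service's actual implementation. The function name `centroid`, the field names `age` and `plan`, and the sample rows are all hypothetical, and the sketch assumes every instance has a value for every field.

```python
from statistics import mean, mode


def centroid(instances):
    """Summarize a cluster (a list of dicts with identical keys) as one row.

    Hypothetical helper for illustration: numeric fields are averaged,
    all other fields take their most common value.
    """
    result = {}
    for field in instances[0]:
        values = [row[field] for row in instances]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        # Mean for numeric fields, mode for categorical fields.
        result[field] = mean(values) if is_numeric else mode(values)
    return result


# Made-up members of one cluster.
cluster = [
    {"age": 30, "plan": "basic"},
    {"age": 40, "plan": "premium"},
    {"age": 50, "plan": "basic"},
]

print(centroid(cluster))
```

The sketch covers only the summary step. How instances are assigned to clusters depends on the distance measure, which is not shown here.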

Sample use cases include:

  • Unusual Instance Discovery or Item Discovery
  • Fraud Detection
  • Identifying Incorrect Data
  • Removing Outliers
  • Customer and Market Segmentation
  • Portfolio Management
  • Active Learning for Disease Diagnoses
