
How does AI Studio calculate the distance between instances to create clusters?

The AI Studio clustering algorithm groups data points by their proximity to one another. How this proximity is computed depends on the field type.

  • For numeric fields, proximity is measured with the Euclidean distance, and the total distance from each data point to its assigned centroid is minimized.
  • For categorical fields, AI Studio uses a special binary distance (0 or 1) function where:

    if valA == valB then
        distance = 0
    else
        distance = 1 or user-defined scale value
    endif

AI Studio also sets the centroid's value for a categorical field to the most common category among the cluster's member instances, and then computes the Euclidean distance as usual.
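
The numeric and categorical rules above can be sketched in Python as follows. This is a minimal illustration, not AI Studio's actual implementation: the function names are hypothetical, field scaling is ignored, and it assumes that each categorical field contributes its 0-or-scale distance as one squared term of the Euclidean sum.

```python
import math
from collections import Counter


def categorical_distance(val_a, val_b, scale=1.0):
    """Binary distance: 0 when the categories match, otherwise `scale`."""
    return 0.0 if val_a == val_b else scale


def most_common_category(values):
    """Centroid value for a categorical field: the most frequent category."""
    return Counter(values).most_common(1)[0][0]


def mixed_distance(point, centroid, categorical_fields, scale=1.0):
    """Euclidean distance over all fields of `point`.

    Assumption: a categorical field contributes its binary distance as the
    per-field difference; numeric fields contribute the plain difference.
    """
    total = 0.0
    for field, value in point.items():
        if field in categorical_fields:
            diff = categorical_distance(value, centroid[field], scale)
        else:
            diff = value - centroid[field]
        total += diff ** 2
    return math.sqrt(total)


# Hypothetical instance and centroid with two numeric fields and one categorical field.
point = {"age": 30.0, "income": 4.0, "color": "red"}
centroid = {"age": 27.0, "income": 0.0, "color": "blue"}
print(mixed_distance(point, centroid, categorical_fields={"color"}))
```

With matching categories the categorical term drops out, so only the numeric differences remain; raising `scale` makes a category mismatch weigh more against the numeric fields.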

  • For text and items fields, AI Studio takes a different approach and uses cosine similarity to compute the distance. The terms chosen for a centroid are those that minimize the average cosine distance between the centroid and the points in its neighborhood.
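
As a rough illustration of the cosine-based distance for text and items fields, the Python sketch below treats each value as a bag of terms. The function names are hypothetical, and the use of raw term counts as weights is an assumption; the article does not say how AI Studio weights terms.

```python
import math
from collections import Counter


def cosine_distance(terms_a, terms_b):
    """Return 1 - cosine similarity between two bags of terms."""
    counts_a, counts_b = Counter(terms_a), Counter(terms_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention chosen for this sketch when an input is empty
    return 1.0 - dot / (norm_a * norm_b)


def average_cosine_distance(centroid_terms, members):
    """Average distance from a candidate centroid to its member instances."""
    return sum(cosine_distance(centroid_terms, m) for m in members) / len(members)


# Hypothetical neighborhood of three short documents.
members = [["cheap", "flight"], ["cheap", "hotel"], ["cheap", "flight", "deal"]]
print(average_cosine_distance(["cheap", "flight"], members))
print(average_cosine_distance(["hotel"], members))
```

A candidate set of centroid terms with a lower average distance represents the neighborhood better, which is the quantity the centroid's terms are chosen to minimize.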