What types of models does AI Studio work with?
AI Studio supports many proven Machine Learning techniques, including classification and regression models, clustering, time series forecasting, anomaly detection (anomalies, using Isolation Forest), and association discovery (associations).
Logistic Regression:
Known as the workhorse of Machine Learning algorithms, logistic regression is a supervised Machine Learning method for solving classification problems. Available through AI Studio Canvas and the API, it learns the coefficient values from the training data using non-linear optimization techniques. Thanks to its simplicity and fast training speed, it commonly serves as a benchmark for other techniques.
You can create a logistic regression model by selecting the fields from your dataset to use as inputs (predictors) and the categorical field you want to predict (the objective field). You can access the documentation on how to use Logistic Regression with AI Studio Canvas.
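To make "learning the coefficient values" concrete, here is a minimal standalone sketch in plain Python. It is not AI Studio's API or implementation: it assumes plain batch gradient descent on log-loss as the optimization technique, and the names `fit_logistic` and `predict_proba`, the learning rate, the epoch count, and the toy data are all illustrative choices.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Learn one coefficient per input field plus an intercept by minimizing
    log-loss with batch gradient descent (an assumed optimizer, for illustration)."""
    n, n_fields = len(X), len(X[0])
    w, b = [0.0] * n_fields, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_fields, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # derivative of log-loss with respect to the linear score
            for j in range(n_fields):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability that the objective field equals the positive class (1)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Toy data: one input field, binary objective field encoded as 0/1
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [0.8]), predict_proba(w, b, [3.8]))
```

The first probability falls below 0.5 and the second above it, since the learned decision boundary sits between the two groups of training values.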
Time Series:
Time Series is a supervised ML method for analyzing time-based data when historical patterns can explain future behavior. It is commonly used for sales forecasting, churn prediction, web traffic, production and inventory analysis, and many other high-value use cases.
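As an illustration of how historical patterns can be turned into a forecast, the sketch below uses simple exponential smoothing, one of the most basic forecasting methods. This is an assumption for illustration only; it is not necessarily the model AI Studio fits, and `forecast`, `alpha`, and the `weekly_sales` data are hypothetical.

```python
def simple_exponential_smoothing(series, alpha=0.5):
    """Return the smoothed level after processing the whole series."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # The new level blends the latest observation with the previous level
        level = alpha * value + (1 - alpha) * level
    return level

def forecast(series, horizon, alpha=0.5):
    """Simple exponential smoothing gives a flat forecast: every future step equals the last level."""
    level = simple_exponential_smoothing(series, alpha)
    return [level] * horizon

weekly_sales = [100, 104, 98, 110, 107, 112]  # illustrative data
print(forecast(weekly_sales, horizon=3, alpha=0.5))  # → [109.0, 109.0, 109.0]
```

A larger `alpha` weights recent observations more heavily; richer models add trend and seasonality terms on top of this level.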
Clusters or Clustering:
Clustering splits your data into groups, or clusters, of similar instances so that you can better analyze, explore, and filter it. You can use it before training a model (membership in each cluster can become another input field) or simply to get a different overview of your data. Clustering also supports several strategies for handling missing values.
You can choose between two clustering techniques: K-means, when you want to specify the number of clusters in advance, and G-means, which lets AI Studio learn the number of clusters by iteratively testing whether each existing cluster’s neighborhood appears Gaussian in its distribution and splitting the clusters that do not.
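The K-means idea of fixing the number of clusters up front can be sketched with Lloyd's algorithm in plain Python. This is a generic illustration, not AI Studio's implementation; the function name, the random initialization, and the sample points are assumptions. G-means would wrap a loop like this with a statistical normality test on each cluster and split clusters that fail it, which is not shown here.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means: the number of clusters k is fixed in advance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k randomly chosen input points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if empty)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))  # → [(1.5, 1.5), (8.5, 8.5)]
```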
Anomalies or Anomaly Detector:
The Anomaly Detector identifies the instances in a dataset that do not conform to an expected pattern. AI Studio uses the Isolation Forest algorithm, which builds an ensemble of randomized trees to generate anomaly scores. The basic idea is to deliberately overfit decision tree models and score each instance by how many splits are needed to isolate it from the rest of the data points: anomalous instances are isolated in fewer splits. As a result, the algorithm does not need labeled data, unlike some less versatile anomaly detection methods.
Anomaly Detectors, also called anomalies, are scalable, competitive, and almost parameter-free. They handle missing data and categorical fields, and they explain which fields contributed most to each anomaly. They require neither data rescaling nor a distance metric.
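The sketch below illustrates the "fewer splits means more anomalous" idea with a small Isolation Forest in plain Python, following the generic formulation (random field, random split point, height limit, and the score 2^(-mean path / c(n))). It is not AI Studio's internal implementation; all function names, the default tree count and sample size, and the grid data are illustrative assumptions, and it assumes at least two rows.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points,
    used to normalize path lengths (0 for n <= 1)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Grow an isolation tree using random fields and random split points."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    field = rng.randrange(len(rows[0]))
    lo = min(r[field] for r in rows)
    hi = max(r[field] for r in rows)
    if lo == hi:  # a constant field cannot be split
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "field": field,
        "split": split,
        "left": build_tree([r for r in rows if r[field] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[field] >= split], depth + 1, max_depth, rng),
    }

def path_length(x, node, depth=0):
    """Splits needed to isolate x, plus a correction for leaves that were not fully split."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["field"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

def isolation_forest(data, n_trees=100, sample_size=256, seed=0):
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(data, size), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, size

def anomaly_score(x, trees, size):
    """Score in (0, 1]; values close to 1 indicate likely anomalies."""
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c(size))

# Illustrative data: a tight 5x5 grid of normal points plus one far-away outlier
normal = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
outlier = (5.0, 5.0)
trees, size = isolation_forest(normal + [outlier])
print(round(anomaly_score(outlier, trees, size), 2))
print(round(max(anomaly_score(p, trees, size) for p in normal), 2))
```

No labels are used anywhere: the outlier scores higher than every grid point simply because random splits separate it from the rest almost immediately.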
Associations or Association Discovery:
AI Studio is the first Machine Learning service to offer Association Discovery in the cloud. Association Discovery, also called associations, is a well-known method for finding interesting associations between values, rather than variables, in high-dimensional datasets. It can discover meaningful relations among values across thousands of variables, which traditional statistical methods cannot handle. Association Discovery is used for a wide variety of purposes, such as market basket analysis, web usage patterns, intrusion detection, fraud detection, and bioinformatics (for example, analyzing public genomic and proteomic databases).
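For a market basket analysis, associations between values are usually judged with support, confidence, and lift. The following plain-Python sketch computes these measures for rules between pairs of items. It is only an illustration of the measures, not the search algorithm AI Studio uses; `pair_rules`, its thresholds, and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence, lift) rules between item pairs, highest lift first."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(set(b)), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence < min_confidence:
                continue
            lift = confidence / (item_counts[rhs] / n)  # > 1 means positive association
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: -r[4])

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "butter", "jam"},
    {"beer", "chips", "bread"},
]
for rule in pair_rules(baskets):
    print(rule)
```

Lift compares how often the right-hand item appears with the left-hand item against how often it appears overall, so it highlights relations between specific values rather than whole variables.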
Models or Decision Tree Models:
AI Studio's binary decision tree models predict the value of the objective field based on the input fields. You can use them for both classification and regression problems. At each node, the tree chooses the split that best separates the data: classification splits maximize information gain, and regression splits minimize squared error. For text fields, each word is treated as a separate value, or token.
The main advantage of AI Studio's decision tree models is that they are much easier to understand than many other Machine Learning techniques. They express human-readable rules that can be exported to make new predictions. They can handle redundant or irrelevant variables and offer multiple strategies for handling missing data. And because these models are simple, they are easy to tune.
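The split criteria described above can be sketched for a single numeric input field: information gain for classification and squared-error reduction for regression. This is a minimal plain-Python illustration, not AI Studio's code; `best_threshold`, the midpoint thresholds, and the sample data are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(labels, left, right):
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted

def squared_error(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_threshold(xs, ys, task="classification"):
    """Try a split halfway between each pair of consecutive distinct values of one field."""
    pairs = sorted(zip(xs, ys))
    all_ys = [y for _, y in pairs]
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left, right = all_ys[:i], all_ys[i:]
        if task == "classification":
            score = information_gain(all_ys, left, right)  # maximize information gain
        else:
            score = -(squared_error(left) + squared_error(right))  # minimize squared error
        if best is None or score > best[1]:
            best = (threshold, score)
    return best

ages = [22, 25, 30, 35, 40, 45]
bought = ["no", "no", "no", "yes", "yes", "yes"]
prices = [10, 11, 10, 30, 31, 29]
print(best_threshold(ages, bought))                     # → (32.5, 1.0)
print(best_threshold(ages, prices, task="regression")[0])  # → 32.5
```

A full tree applies this search recursively to each child, over every input field, which is also why the resulting rules ("age <= 32.5 → no") are easy to read and export.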
Ensembles:
An ensemble combines several individual models built from different subsamples of your data. Ensembles are a robust method that usually reduces overfitting and improves model performance. Random Decision Forests, in particular, are among the top-performing Machine Learning algorithms.
Currently, you can build ensembles using three basic Machine Learning techniques: Bagging, Random Decision Forests, and Boosting. Please read the ensembles chapter of this document to see how you can use these techniques through AI Studio Canvas.
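The core of Bagging, training each model on a different bootstrap subsample and combining them by majority vote, can be sketched in plain Python. This is an illustration under stated assumptions, not AI Studio's implementation: the weak learner is a hypothetical one-rule "stump" rather than a full decision tree, and the names and data are made up. Random Decision Forests additionally pick a random subset of fields at each split, and Boosting trains models sequentially so that later ones focus on earlier errors; neither variation is shown.

```python
import random
from collections import Counter

def fit_stump(rows, labels):
    """Illustrative weak learner: the single 'field <= threshold' rule with the best training accuracy."""
    best = None
    for field in range(len(rows[0])):
        for threshold in sorted({r[field] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[field] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[field] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            correct = left.count(left_label) + right.count(right_label)
            if best is None or correct > best[0]:
                best = (correct, field, threshold, left_label, right_label)
    if best is None:  # no usable split: fall back to the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return lambda x: majority
    _, field, threshold, left_label, right_label = best
    return lambda x: left_label if x[field] <= threshold else right_label

def bagging(rows, labels, n_models=25, seed=0):
    """Train each model on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models

def predict(models, x):
    """Combine the individual models by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

rows = [(1,), (2,), (3,), (4,), (6,), (7,), (8,), (9,)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
models = bagging(rows, labels)
print(predict(models, (1.5,)), predict(models, (8.5,)))
```

Because each model sees a slightly different sample, individual mistakes tend to be outvoted, which is where the reduction in overfitting comes from.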