What is a dataset?

A dataset is the collection of data required to build your AI Studio Pipelines, and it is the starting point for any modeling procedure. In AI Studio, a dataset is a structured version of your data. AI Studio computes general statistics for the dataset as a whole and individual statistics for each field. The general statistics are the counts of valid instances, missing values, and errors. For each field, AI Studio computes statistics such as the minimum, mean, median, maximum, standard deviation, kurtosis, skewness, and terms count. The statistics provided depend on the type of field (numeric, categorical, text and items, or date-time).
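To make the per-field statistics concrete, here is a minimal Python sketch of how such a summary could be computed for one numeric field. It is not AI Studio's implementation: the function name `numeric_field_summary`, the use of `None` for missing values, and the choice of population moments with excess kurtosis are all assumptions made for illustration, and AI Studio may use different conventions.

```python
import math
import statistics


def numeric_field_summary(values):
    """Summarize one numeric field (hypothetical helper, not an AI Studio API).

    Missing values are represented as None, which is an assumption.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    mean = statistics.fmean(present)
    # Population central moments (dividing by n), used below for the
    # standard deviation, skewness, and kurtosis.
    m2 = sum((v - mean) ** 2 for v in present) / n
    m3 = sum((v - mean) ** 3 for v in present) / n
    m4 = sum((v - mean) ** 4 for v in present) / n
    return {
        "count": n,
        "missing": len(values) - n,
        "minimum": min(present),
        "mean": mean,
        "median": statistics.median(present),
        "maximum": max(present),
        "standard_deviation": math.sqrt(m2),
        # A constant field has zero variance; report 0.0 instead of dividing by zero.
        "skewness": m3 / m2 ** 1.5 if m2 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3 if m2 else 0.0,  # excess kurtosis
    }


summary = numeric_field_summary([1, 2, None, 3, 4, 5])
for name, value in summary.items():
    print(f"{name}: {value}")
```

Categorical, text, and date-time fields would need different summaries (for example, category or term counts), which this sketch does not cover.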

The main goal of datasets is to enable effective wrangling of your data so that you can build the right AI Studio model for your problem. This is a key step toward achieving the best results from your Machine Learning Pipelines.

Datasets can be built from an existing:

  • Source
  • Dataset, to sample your data
  • Dataset, to split it
  • Dataset, to filter it
  • Dataset, to extend it
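
The four dataset-to-dataset operations in the list above can be sketched in plain Python on a small in-memory table. This only illustrates the ideas and does not use any AI Studio API: the function names, the row format (a list of dictionaries), the example fields, and the reading of "extend" as adding a derived field are all assumptions.

```python
import random

# A toy "dataset": ten rows with hypothetical fields.
rows = [
    {"id": i, "age": 20 + i, "plan": "pro" if i % 2 else "free"}
    for i in range(10)
]


def sample(rows, rate, seed=0):
    """Return a random subset containing roughly `rate` of the rows."""
    return random.Random(seed).sample(rows, round(len(rows) * rate))


def split(rows, train_rate=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into two disjoint parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_rate)
    return shuffled[:cut], shuffled[cut:]


def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]


def extend(rows, name, compute):
    """Return new rows with one extra field derived from each row."""
    return [{**row, name: compute(row)} for row in rows]


sampled = sample(rows, rate=0.5)
train, test = split(rows, train_rate=0.8)
adults_25_plus = filter_rows(rows, lambda row: row["age"] >= 25)
extended = extend(rows, "is_pro", lambda row: row["plan"] == "pro")

print(len(sampled), len(train), len(test), len(adults_25_plus))
print(extended[1])
```

Each function returns a new list and leaves the input untouched, which mirrors the idea that a new dataset is built from an existing one rather than modified in place.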

To learn more about datasets, watch this video to see how you can build AI use cases with your datasets on HyperSense.
