Creating a new split

Before running experiments in Model Playground, you need to create a split.

To do so, open Model Playground from the hamburger menu on the left and click the Create split button.

In general, we divide your data into three sets:

• Train set - data used to train the model;
• Validation set - data used to evaluate the model during the training process;
• Test set - data used for the final evaluation of the model.

Partitioning the data is essential: without it, the model would be evaluated on the same data it was trained on. In that case, you would get a biased metric value that does not reflect the model's generalization capabilities.
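The three-way split described above can be sketched in a few lines of generic Python (an illustration of the concept, not Hasty's internal implementation; function and parameter names here are made up for the example):

```python
import random

def three_way_split(items, train=0.7, test=0.2, val=0.1, seed=42):
    """Randomly partition items into disjoint train/test/validation sets."""
    assert abs(train + test + val - 1.0) < 1e-9, "ratios must sum to 1"
    shuffled = items[:]                     # copy so the input stays untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for reproducibility
    n = len(shuffled)
    n_train = int(n * train)
    n_test = int(n * test)
    # Slicing guarantees each item lands in exactly one set
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])

train_set, test_set, val_set = three_way_split(list(range(100)))
print(len(train_set), len(test_set), len(val_set))  # 70 20 10
```

Because the three slices never overlap, the metrics you compute on the test set are measured on data the model has never seen during training.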

Parameters used for creating a new split

  • Split strategy - a strategy used to divide your dataset. The list includes:
    • Random sampling - the data is randomly allocated to the Train, Validation, and Test sets according to the specified proportions. Each data point is assigned to exactly one set, without repetition;
    • Stratification - this strategy preserves the proportion of each label in each set. It may be helpful if the dataset has a class imbalance;
    • Dataset partitioning - lets you assign each dataset to a particular partition. If you select less than 100%, the algorithm randomly samples the data from the selected datasets. Dataset partitioning is appropriate when you have already split the data yourself and uploaded the partitions as separate datasets.
  • Semi-Supervised Learning - toggle this on to enable semi-supervised learning. The feature is available for Object Detection and Instance Segmentation tasks.
  • Part of a project to be used for split creation:
    • Image Statuses - select the files based on their status;
    • Datasets - select the datasets required for your experiment;
    • Classes - include particular classes in the split (optional);
    • Tags - include particular tags in the split (optional).
  • Split percentage ratio - adjust the percentage of images allocated to the Train, Test, and Validation sets. The defaults are 70%, 20%, and 10%, respectively.
Please note that if you want to run experiments faster at the cost of statistical accuracy, you can reduce the size of your split. We recommend doing so if your project contains large datasets.
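To make the difference between the strategies concrete, the Stratification strategy can be sketched as follows: the data is first grouped by label, and the percentage ratio is then applied within each group, so every set ends up with the same class proportions as the whole dataset. This is a generic illustration under assumed names, not the exact algorithm Model Playground runs:

```python
import random
from collections import defaultdict

def stratified_split(labeled_items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split (item, label) pairs so that each partition preserves
    the label proportions of the full dataset."""
    by_label = defaultdict(list)
    for item, label in labeled_items:
        by_label[label].append(item)

    rng = random.Random(seed)
    parts = ([], [], [])  # train, test, validation
    for label, items in by_label.items():
        rng.shuffle(items)
        n = len(items)
        cut1 = int(n * ratios[0])
        cut2 = cut1 + int(n * ratios[1])
        # Apply the same ratio inside every label group
        parts[0].extend((i, label) for i in items[:cut1])
        parts[1].extend((i, label) for i in items[cut1:cut2])
        parts[2].extend((i, label) for i in items[cut2:])
    return parts

# Imbalanced toy dataset: 80% "cat", 20% "dog"
data = [(i, "cat") for i in range(80)] + [(i, "dog") for i in range(20)]
train, test, val = stratified_split(data)
print(len(train), len(test), len(val))  # 70 20 10
```

With plain random sampling, a small validation set could easily end up with no "dog" images at all; stratification keeps the 80/20 class balance in every partition, which is why it is the safer choice for imbalanced datasets.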
Last updated on Sep 25, 2022
