
comments: true
description: >-
  Explore various computer vision datasets supported by Ultralytics for object
  detection, segmentation, pose estimation, image classification, and
  multi-object tracking.
keywords: >-
  computer vision, datasets, Ultralytics, YOLO, object detection, instance
  segmentation, pose estimation, image classification, multi-object tracking

Datasets Overview

Ultralytics supports a variety of datasets for computer vision tasks such as detection, instance segmentation, pose estimation, classification, oriented bounding box (OBB) detection, and multi-object tracking. Below is a list of the main Ultralytics datasets, grouped by task: each section starts with a short summary of the task, followed by its datasets.

Detection Datasets

Bounding box object detection is a computer vision technique that involves detecting and localizing objects in an image by drawing a bounding box around each object.
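As a minimal sketch of how such a bounding box is typically stored, the snippet below assumes the YOLO-style label format used by Ultralytics detection datasets: one `class x_center y_center width height` line per object, with coordinates normalized to the 0-1 range relative to the image size. The function name `yolo_to_xyxy` is hypothetical; it only converts one label line into pixel corner coordinates.

```python
def yolo_to_xyxy(label_line: str, img_w: int, img_h: int):
    """Convert one YOLO-style label line into (class_id, (x1, y1, x2, y2)) in pixels.

    Assumes the format "class x_center y_center width height" with
    coordinates normalized to [0, 1] relative to the image size.
    """
    parts = label_line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    # Shift from the box center to its top-left and bottom-right corners,
    # then scale the normalized values back to pixels.
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return class_id, (x1, y1, x2, y2)


# A box centered in a 640x480 image, covering 25% of its width and 50% of its height.
print(yolo_to_xyxy("0 0.5 0.5 0.25 0.5", 640, 480))
# → (0, (240.0, 120.0, 400.0, 360.0))
```

Keeping labels normalized means the same annotation file stays valid if the image is resized; the pixel values are only recovered when they are needed.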

Argoverse: A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations.
COCO: A large-scale dataset designed for object detection, segmentation, and captioning with over 200K labeled images.
COCO8: A small 8-image subset of COCO (4 training and 4 validation images), suitable for quick tests.
Global Wheat 2020: A dataset of wheat head images collected from around the world for object detection and localization tasks.
Objects365: A high-quality, large-scale dataset for object detection with 365 object categories and over 600K annotated images.
OpenImagesV7: A comprehensive dataset by Google with 1.7M training images and 42K validation images.
SKU-110K: A dataset featuring dense object detection in retail environments with over 11K images and 1.7 million bounding boxes.
VisDrone: A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
VOC: The Pascal Visual Object Classes (VOC) dataset for object detection and segmentation with 20 object classes and over 11K images.
xView: A dataset for object detection in overhead imagery with 60 object categories and over 1 million annotated objects.

Instance Segmentation Datasets

Instance segmentation is a computer vision technique that identifies and localizes objects in an image at the pixel level, assigning a separate mask to each individual object instance.
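Ultralytics segmentation labels describe each object instance as a polygon (a class index followed by normalized `x y` vertex pairs). The sketch below shows, under simplifying assumptions, how such a polygon becomes a pixel-level mask: the vertices are assumed to be already scaled to pixel coordinates, and a pixel counts as inside when its center passes an even-odd ray-casting test. `polygon_to_mask` is a hypothetical helper; rasterizers such as OpenCV's `cv2.fillPoly` may treat boundary pixels differently.

```python
def polygon_to_mask(points, width, height):
    """Rasterize a polygon into a binary mask (list of rows of 0/1).

    `points` is a list of (x, y) vertices in pixel coordinates. A pixel is
    marked 1 when its center lies inside the polygon (even-odd rule).
    """
    mask = [[0] * width for _ in range(height)]
    n = len(points)
    for row in range(height):
        cy = row + 0.5
        for col in range(width):
            cx = col + 0.5
            inside = False
            j = n - 1  # previous vertex, so (j, i) walks every edge once
            for i in range(n):
                xi, yi = points[i]
                xj, yj = points[j]
                # Count crossings of a ray cast from the pixel center toward +x.
                if (yi > cy) != (yj > cy):
                    x_cross = xj + (cy - yj) * (xi - xj) / (yi - yj)
                    if cx < x_cross:
                        inside = not inside
                j = i
            mask[row][col] = int(inside)
    return mask


# A 3x2-pixel rectangle drawn inside a 5x4 image.
mask = polygon_to_mask([(1, 1), (4, 1), (4, 3), (1, 3)], width=5, height=4)
for r in mask:
    print(r)
# [0, 0, 0, 0, 0]
# [0, 1, 1, 1, 0]
# [0, 1, 1, 1, 0]
# [0, 0, 0, 0, 0]
```

The six set pixels match the rectangle's 3 x 2 area, which is a quick sanity check that the mask covers exactly the labeled region.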

COCO: A large-scale dataset designed for object detection, segmentation, and captioning tasks with over 200K labeled images.
COCO8-seg: A smaller dataset for instance segmentation tasks, containing a subset of 8 COCO images with segmentation annotations.

Pose Estimation

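Pose estimation is a technique used to determine the pose of an object relative to the camera or the world coordinate system, typically by locating a set of keypoints; for human pose, these are body joints such as shoulders, elbows, and knees.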
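For keypoint-based pose datasets such as the COCO pose data listed below, Ultralytics pose labels extend the detection format: after the class index and the normalized box, each keypoint contributes normalized `x y` coordinates plus, optionally, a visibility value. The parser below is a hypothetical sketch that assumes three values (`x y visibility`) per keypoint; `PoseLabel`, `parse_pose_label`, and `count_labeled` are illustrative names, not Ultralytics APIs.

```python
from typing import List, NamedTuple, Tuple


class PoseLabel(NamedTuple):
    class_id: int
    box: Tuple[float, ...]                      # normalized (x_center, y_center, width, height)
    keypoints: List[Tuple[float, ...]]          # normalized (x, y, visibility) per keypoint


def parse_pose_label(line: str, num_kpts: int) -> PoseLabel:
    """Parse one pose label line, assuming 3 values (x, y, visibility) per keypoint."""
    values = line.split()
    class_id = int(values[0])
    box = tuple(float(v) for v in values[1:5])
    flat = [float(v) for v in values[5:]]
    if len(flat) != num_kpts * 3:
        raise ValueError(f"expected {num_kpts * 3} keypoint values, got {len(flat)}")
    # Group the flat list into (x, y, visibility) triples.
    keypoints = [tuple(flat[k * 3:(k + 1) * 3]) for k in range(num_kpts)]
    return PoseLabel(class_id, box, keypoints)


def count_labeled(label: PoseLabel) -> int:
    """Count keypoints with a non-zero visibility flag.

    In the COCO convention, 0 = not labeled, 1 = labeled but occluded,
    2 = labeled and visible.
    """
    return sum(1 for _, _, v in label.keypoints if v > 0)


# Two keypoints: the first is labeled and visible (2), the second is not labeled (0).
label = parse_pose_label("0 0.5 0.5 0.2 0.4 0.45 0.35 2 0.55 0.35 0", num_kpts=2)
print(label.keypoints)       # → [(0.45, 0.35, 2.0), (0.55, 0.35, 0.0)]
print(count_labeled(label))  # → 1
```

Human-pose COCO data uses 17 keypoints per person, so a real label line would pass `num_kpts=17`; the two-keypoint line here just keeps the example short.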

COCO: A large-scale dataset with human pose annotations designed for pose estimation tasks.
COCO8-pose: A smaller dataset for pose estimation tasks, containing a subset of 8 COCO images with human pose annotations.

Classification

Image classification is a computer vision task that involves categorizing an image into one or more predefined classes or categories based on its visual content.
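A classifier usually outputs one raw score (logit) per class. In the common single-label case these scores are turned into probabilities with a softmax, and the highest-ranked classes are reported. The pure-Python sketch below illustrates that step; the function names and the example class names are hypothetical. For multi-label classification (an image belonging to "one or more" classes), a per-class sigmoid with a threshold is the usual alternative to a softmax.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, class_names, k=1):
    """Return the k most likely (class_name, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(class_names[i], probs[i]) for i in ranked[:k]]


names = ["cat", "dog", "bird"]
for name, p in top_k([2.0, 0.5, 1.0], names, k=2):
    print(f"{name}: {p:.3f}")
# cat: 0.629
# bird: 0.231
```

Top-1 and top-5 accuracy on datasets such as ImageNet are computed from exactly this kind of ranking: a prediction counts as correct if the true class appears among the first k entries.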

Caltech 101: A dataset containing images of 101 object categories for image classification tasks.
Caltech 256: An extended version of Caltech 101 with 256 object categories and more challenging images.
CIFAR-10: A dataset of 60K 32x32 color images in 10 classes, with 6K images per class.
CIFAR-100: An extended version of CIFAR-10 with 100 object categories and 600 images per class.
Fashion-MNIST: A dataset consisting of 70,000 grayscale images of 10 fashion categories for image classification tasks.
ImageNet: A large-scale dataset for object detection and image classification with over 14 million images and 20,000 categories.
ImageNet-10: A smaller subset of ImageNet with 10 categories for faster experimentation and testing.
Imagenette: A smaller subset of ImageNet that contains 10 easily distinguishable classes for quicker training and testing.
Imagewoof: A more challenging subset of ImageNet containing 10 dog breed categories for image classification tasks.
MNIST: A dataset of 70,000 grayscale images of handwritten digits for image classification tasks.

Oriented Bounding Boxes (OBB)

Oriented bounding box (OBB) detection is a computer vision method that detects angled objects in images using rotated bounding boxes; it is often applied to aerial and satellite imagery.
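A rotated box is often parameterized by its center, width, height, and rotation angle, while Ultralytics OBB labels (like DOTA annotations) store the box as four corner points. The sketch below converts the first form into the second by rotating the corner offsets with a standard 2D rotation matrix. `xywhr_to_corners` is a hypothetical helper, and the angle convention (radians, y axis pointing down as in image coordinates) is an assumption; libraries differ on angle ranges and corner order.

```python
import math


def xywhr_to_corners(cx, cy, w, h, angle):
    """Return the 4 corners of a w x h box centered at (cx, cy),
    rotated by `angle` radians about its center.

    In image coordinates (y axis pointing down) a positive angle appears
    as a clockwise rotation on screen.
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    corners = []
    # Corner offsets from the center before rotation, in a consistent winding order.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        corners.append((x, y))
    return corners


# Unrotated 4x2 box: the corners are axis-aligned.
print(xywhr_to_corners(10, 20, 4, 2, 0.0))
# → [(8.0, 19.0), (12.0, 19.0), (12.0, 21.0), (8.0, 21.0)]

# The same box rotated by 90 degrees now spans 2 pixels in x and 4 in y.
rotated = xywhr_to_corners(10, 20, 4, 2, math.pi / 2)
print([(round(x, 6), round(y, 6)) for x, y in rotated])
# → [(11.0, 18.0), (11.0, 22.0), (9.0, 22.0), (9.0, 18.0)]
```

Storing explicit corners, as the four-point label format does, sidesteps the angle-convention ambiguity at the cost of a few more numbers per object.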

DOTAv2: A popular OBB aerial imagery dataset with 1.7 million instances and 11,268 images.

Multi-Object Tracking

Multi-object tracking is a computer vision technique that involves detecting and tracking multiple objects over time in a video sequence.
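To make the "tracking over time" part concrete, the sketch below links detections across video frames by greedily matching each new box to the previous frame's box with the highest intersection over union (IoU). This is a deliberately simplified, hypothetical tracker (`GreedyIoUTracker` is not an Ultralytics class). Ultralytics' built-in trackers, BoT-SORT and ByteTrack, add motion prediction and handling of temporarily lost tracks, which this sketch omits.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Hypothetical minimal tracker: greedy IoU matching between consecutive frames."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}  # track_id -> box from the previous frame
        self.next_id = 1

    def update(self, detections):
        """Return one track ID per detection box in the current frame."""
        # Score every (previous track, new detection) pair, best overlap first.
        pairs = sorted(
            ((iou(box, det), tid, d)
             for tid, box in self.tracks.items()
             for d, det in enumerate(detections)),
            reverse=True,
        )
        assigned, used = {}, set()
        for score, tid, d in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little
            if tid in used or d in assigned:
                continue  # each track and each detection is matched at most once
            assigned[d] = tid
            used.add(tid)
        ids, new_tracks = [], {}
        for d, det in enumerate(detections):
            tid = assigned.get(d)
            if tid is None:  # an unmatched detection starts a new track
                tid = self.next_id
                self.next_id += 1
            new_tracks[tid] = det
            ids.append(tid)
        self.tracks = new_tracks  # unmatched old tracks are dropped immediately
        return ids


tracker = GreedyIoUTracker()
print(tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)]))  # → [1, 2]
print(tracker.update([(51, 51, 61, 61), (1, 1, 11, 11)]))  # → [2, 1]
print(tracker.update([(200, 200, 210, 210)]))              # → [3]
```

In the second frame the detections arrive in a different order, yet each keeps its identity because matching is based on overlap, not list position. A lost-track buffer or motion model, as in the trackers named above, would keep IDs stable through missed detections and fast motion.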

Argoverse: A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations for multi-object tracking tasks.
VisDrone: A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.