ARKitScenes is a diverse real-world dataset for understanding 3D indoor scenes using mobile RGB-D data.

ARKitScenes is not only the first RGB-D dataset captured with a now widely available depth sensor, but also the largest indoor scene understanding dataset ever collected. In addition to raw and processed data, ARKitScenes includes high-resolution depth maps captured with a stationary laser scanner, and manually labeled 3D oriented bounding boxes for a large taxonomy of furniture.

Helper scripts are also provided for two downstream tasks: 3D object detection and RGB-D guided upsampling.

This repository contains data, scripts for visualizing and manipulating assets, and training code described in the paper.

main features

• ARKitScenes is the first RGB-D dataset captured using the widely used Apple LiDAR scanner. In addition to raw data, camera poses and surface reconstructions are provided for each scene.

• ARKitScenes is the largest indoor 3D dataset, consisting of 5,047 captures of 1,661 unique scenes.

• ARKitScenes provides high-quality ground truth in two forms: registered RGB-D frames and oriented bounding boxes of room-defining objects.
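To make the term "oriented bounding box" concrete, here is a minimal sketch of how such a box can be expanded into its eight corner points. It assumes a box described by a center, a size, and a single rotation angle (yaw) about the vertical axis. That parameterization and the function name `obb_corners` are assumptions chosen for illustration and are not necessarily the annotation format used in ARKitScenes.

```python
import math

def obb_corners(center, size, yaw):
    """Return the 8 corners of a box rotated by `yaw` radians about the z-axis.

    center: (x, y, z) of the box center (hypothetical parameterization)
    size:   (length, width, height) along the box's own axes
    """
    cx, cy, cz = center
    hx, hy, hz = (extent / 2.0 for extent in size)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-1, 1):
        for sy in (-1, 1):
            for sz in (-1, 1):
                # Corner offset in the box's local frame.
                lx, ly, lz = sx * hx, sy * hy, sz * hz
                # Rotate the offset about z, then translate to the center.
                corners.append((cx + cos_y * lx - sin_y * ly,
                                cy + sin_y * lx + cos_y * ly,
                                cz + lz))
    return corners

corners = obb_corners(center=(1.0, 2.0, 0.5), size=(2.0, 1.0, 1.0), yaw=math.pi / 2)
```

With a yaw of 90 degrees the box's long side ends up along the y-axis rather than the x-axis, which is the behavior that distinguishes an oriented box from an axis-aligned one.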

Below is an overview of existing RGB-D datasets and their ground-truth assets compared with ARKitScenes. HR and LR denote high and low resolution, respectively, and are available for a subset of 2,257 captures of 841 unique scenes.


data collection

The image below shows the iPad Pro scanning setup, a grid overlay that assists data collection with the iPad Pro, and an example of one of the scan patterns captured with the iPad Pro. The red markers show the locations in the room where the stationary laser scanner was placed.

