CV · Jul 4, 2024

Slice-100K: A Multimodal Dataset for Extrusion-based 3D Printing

arXiv:2407.04180v3 · 4 citations · h-index: 48 · Has Code
Originality: Synthesis-oriented
AI Analysis

This dataset addresses a gap for researchers and practitioners in digital manufacturing, enabling development of multimodal foundation models, though it is incremental as it builds on existing datasets.

The authors tackled the lack of a large curated dataset for extrusion-based 3D printing by creating Slice-100K, a multimodal dataset of over 100,000 G-code files with CAD models and metadata, and demonstrated its utility by finetuning GPT-2 for G-code translation between formats.

G-code (Geometric code) or RS-274 is the most widely used computer numerical control (CNC) and 3D printing programming language. G-code provides machine instructions for the movement of the 3D printer, especially for the nozzle, stage, and extrusion of material for extrusion-based additive manufacturing. Currently, there does not exist a large repository of curated CAD models along with their corresponding G-code files for additive manufacturing. To address this issue, we present Slice-100K, a first-of-its-kind dataset of over 100,000 G-code files, along with their tessellated CAD model, LVIS (Large Vocabulary Instance Segmentation) categories, geometric properties, and renderings. We build our dataset from triangulated meshes derived from Objaverse-XL and Thingi10K datasets. We demonstrate the utility of this dataset by finetuning GPT-2 on a subset of the dataset for G-code translation from a legacy G-code format (Sailfish) to a more modern, widely used format (Marlin). Our dataset can be found at https://github.com/idealab-isu/Slice-100K. Slice-100K will be the first step in developing a multimodal foundation model for digital manufacturing.
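To make the abstract's description of G-code concrete, here is a minimal sketch of how a single G-code line breaks down into a command and its parameters. The function name `parse_gcode_line` and the sample lines are hypothetical and written for illustration; they are not taken from Slice-100K or the authors' code. The parser handles only `;` comments and whole-number command codes, so it is not a full RS-274 implementation.

```python
import re

# One G-code "word": a letter followed by a signed decimal number.
WORD_RE = re.compile(r"([A-Za-z])\s*([-+]?\d*\.?\d+)")


def parse_gcode_line(line):
    """Return (command, params) for one line, or None for blank/comment-only lines.

    Assumptions: ';' starts a comment, and the first word is the command.
    Sub-codes such as G92.1 are truncated to G92 in this sketch.
    """
    code = line.split(";", 1)[0].strip()  # drop trailing ';' comment
    if not code:
        return None
    words = WORD_RE.findall(code)
    if not words:
        return None
    letter, number = words[0]
    command = f"{letter.upper()}{int(float(number))}"
    params = {key.upper(): float(value) for key, value in words[1:]}
    return command, params


# Illustrative input only: home the axes, then move while extruding.
sample = """\
; illustrative lines, not taken from Slice-100K
G28 ; home all axes
G1 X10.5 Y20 E0.42 F1800
"""

parsed = [p for p in map(parse_gcode_line, sample.splitlines()) if p is not None]
for command, params in parsed:
    print(command, params)
```

In the `G1` line, `X` and `Y` give the target nozzle position, `E` the extrusion amount, and `F` the feed rate, which corresponds to the movement and extrusion instructions the abstract describes.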

