SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization

University of Science and Technology of China
Teaser Image


SkeletonGaussian enables high-fidelity 4D generation from diverse inputs (text, image, and video) and allows for direct motion editing through explicit skeleton-driven deformation.

Abstract

4D generation has made remarkable progress in synthesizing dynamic 3D objects from input text, images, or videos. However, existing methods often represent motion as an implicit deformation field, which limits direct control and editability.

To address this, we propose SkeletonGaussian, a novel framework for generating editable, dynamic 3D Gaussians from monocular video input. Our approach introduces a hierarchical, articulated representation that decomposes motion into sparse rigid motion, driven explicitly by a skeleton, and fine-grained non-rigid motion. Concretely, we extract a robust skeleton and drive rigid motion via linear blend skinning, then refine non-rigid deformations with a hexplane-based field, which improves both interpretability and editability.

Experimental results show that SkeletonGaussian surpasses existing methods in generation quality while enabling intuitive motion editing, establishing a new paradigm for editable 4D generation.

Motion Editing Pipeline

Our method allows users to directly edit the skeleton of the generated 3D Gaussians, enabling real-time pose manipulation.
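To make the idea of skeleton editing concrete, here is a minimal sketch of how changing one joint rotation propagates down a kinematic chain. This page does not describe the editing interface, so everything here is an assumption for illustration: the joint layout, the `forward_kinematics` helper, and the convention that each joint stores a rest-pose offset from its parent are hypothetical, not the authors' implementation.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def rot_z(theta):
    """3x3 rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul3(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec3(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def forward_kinematics(parents, offsets, local_rotations):
    """Compute world-space rotation and position for every joint.

    parents[i] is the parent index (-1 for the root); parents must come
    before their children. offsets[i] is the rest-pose offset from the
    parent joint (for the root, its world position).
    """
    world_rot, world_pos = [], []
    for i, parent in enumerate(parents):
        if parent < 0:
            world_rot.append(local_rotations[i])
            world_pos.append(list(offsets[i]))
        else:
            # A child's rotation composes with its parent's rotation.
            world_rot.append(matmul3(world_rot[parent], local_rotations[i]))
            # The child's position is carried along by the parent's rotation.
            moved = matvec3(world_rot[parent], offsets[i])
            world_pos.append([p + m for p, m in zip(world_pos[parent], moved)])
    return world_rot, world_pos


# Hypothetical three-joint chain along the x-axis: shoulder -> elbow -> wrist.
parents = [-1, 0, 1]
offsets = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

_, rest_positions = forward_kinematics(parents, offsets, [IDENTITY] * 3)

# The "edit": bend the elbow by 90 degrees about z; the wrist follows.
edited_rotations = [IDENTITY, rot_z(math.pi / 2), IDENTITY]
_, edited_positions = forward_kinematics(parents, offsets, edited_rotations)
```

In this toy chain the wrist moves from (2, 0, 0) to approximately (1, 1, 0) while the shoulder and elbow stay in place. The per-joint world transforms produced this way are the kind of input a skinning step would consume to move the Gaussians attached to each bone.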

Generated Results

Method

Our 4D generation pipeline consists of three stages: static 3D Gaussian generation, rigid motion modeling, and non-rigid motion refinement. Any existing image-to-3D method can be used for initialization. We extract a robust skeleton using UniRig and drive rigid motion via Linear Blend Skinning (LBS). Finally, fine-grained details are modeled using a HexPlane-based deformation field. This decomposition allows for explicit control over the object's pose.
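The rigid stage can be illustrated with the standard linear blend skinning formula, in which each point is moved by a weighted sum of per-bone rigid transforms. The sketch below is a dependency-free toy under stated assumptions: it deforms only the Gaussian centers (a full pipeline would presumably also update orientations), and the function names, the two-bone setup, and the hand-picked weights are hypothetical rather than taken from the paper.

```python
import math


def rotation_z(theta):
    """3x3 rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_rigid(rotation, translation, point):
    """Return rotation @ point + translation for a 3D point."""
    return [sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
            for i in range(3)]


def linear_blend_skinning(centers, weights, bone_transforms):
    """Deform rest-pose centers with linear blend skinning.

    centers: N rest-pose 3D points (e.g. Gaussian means).
    weights: N rows of B skinning weights, each row summing to 1.
    bone_transforms: B (rotation, translation) pairs mapping the rest
        pose to the target pose.
    """
    deformed = []
    for point, weight_row in zip(centers, weights):
        blended = [0.0, 0.0, 0.0]
        for w, (rot, trans) in zip(weight_row, bone_transforms):
            moved = apply_rigid(rot, trans, point)
            # Accumulate the weighted contribution of this bone.
            blended = [b + w * m for b, m in zip(blended, moved)]
        deformed.append(blended)
    return deformed


# Bone 0 is fixed; bone 1 rotates 90 degrees about z around a joint at (1, 0, 0).
identity = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
R = rotation_z(math.pi / 2)
pivot = [1.0, 0.0, 0.0]
# Rotation about a pivot: x -> R (x - pivot) + pivot = R x + (pivot - R pivot).
rotated_pivot = apply_rigid(R, [0.0, 0.0, 0.0], pivot)
t = [p - rp for p, rp in zip(pivot, rotated_pivot)]
bones = [identity, (R, t)]

centers = [[0.5, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # hand-picked for illustration

posed = linear_blend_skinning(centers, weights, bones)
```

Here the first point follows the fixed bone and does not move, the point at the joint stays put because both bones agree there, and the last point swings to approximately (1, 1, 0). Because the pose is carried entirely by the per-bone transforms, changing a bone transform re-poses every attached point without touching the learned weights, which is what makes this stage directly editable.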

Method Overview

Results and Comparison

We compare SkeletonGaussian with state-of-the-art 4D generation methods. Our method achieves superior visual quality and motion fidelity.

Comparison Results

Additional Visualizations

More Visualizations

Front View Results

Failure Cases & Analysis

We discuss limitations such as incorrect skeleton extraction on objects with complex topology and on non-articulated objects.

Failure Case 1
Failure Case 2

BibTeX

@inproceedings{wu2026skeletongaussian,
  title={SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization},
  author={Wu, Lifan and Zhu, Ruijie and Ai, Yubo and Zhang, Tianzhu},
  booktitle={AAAI},
  year={2026}
}