This work explores 2D Gaussian splats as a new primitive for representing videos. We propose GSVC, an approach to learning a set of 2D Gaussian splats that can effectively represent and compress video frames.
GSVC incorporates the following techniques (a minimal sketch of the resulting per-frame loop follows below): (i) To exploit temporal redundancy among adjacent frames, which speeds up training and improves compression efficiency, we predict the Gaussian splats of each frame from those of the previous frame; (ii) To control the trade-off between file size and quality, we prune Gaussian splats that contribute little to video quality; (iii) To capture dynamics in the video, we randomly add Gaussian splats to fit content with large motion or newly appearing objects; (iv) To handle significant scene changes, we detect key frames based on loss differences during the learning process.
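To make the interplay of these four techniques concrete, here is a minimal, self-contained sketch of one possible per-frame optimization loop. It is illustrative only: the isotropic additive renderer, the energy-based pruning score, the thresholds, and the function names (`render`, `fit_frame`, `encode_video`) are our simplifying assumptions, not the actual GSVC implementation, which operates on full 2D Gaussian splats.

```python
# Illustrative sketch only: isotropic Gaussians, additive blending, and
# heuristic thresholds stand in for GSVC's actual renderer and schedules.
import torch

def render(mu, sigma, color, H, W):
    """Additively splat N isotropic 2D Gaussians onto an HxW RGB canvas.
    mu: (N, 2) centers in [0, 1]^2; sigma: (N,) scales; color: (N, 3).
    Naive O(N * H * W) renderer; intended for small test resolutions."""
    ys = torch.linspace(0, 1, H)
    xs = torch.linspace(0, 1, W)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1)                   # (H, W, 2)
    d2 = ((grid[None] - mu[:, None, None]) ** 2).sum(-1)   # (N, H, W)
    w = torch.exp(-0.5 * d2 / sigma[:, None, None] ** 2)   # Gaussian weights
    return torch.einsum("nhw,nc->hwc", w, color)           # (H, W, 3)

def fit_frame(frame, prev=None, steps=300, n_init=512, n_new=64,
              prune_thresh=1e-4):
    """Fit one frame; warm-start from the previous frame's splats if given."""
    H, W, _ = frame.shape
    if prev is None:
        # Key frame: start from randomly initialized splats.
        mu = torch.rand(n_init, 2)
        sigma = torch.full((n_init,), 0.02)
        color = torch.rand(n_init, 3)
    else:
        # Predictive frame: reuse the previous splats (temporal redundancy),
        # then add random splats to cover motion and newly appearing content.
        mu, sigma, color = (p.clone() for p in prev)
        mu = torch.cat([mu, torch.rand(n_new, 2)])
        sigma = torch.cat([sigma, torch.full((n_new,), 0.02)])
        color = torch.cat([color, torch.rand(n_new, 3)])
    params = [p.requires_grad_() for p in (mu, sigma, color)]
    opt = torch.optim.Adam(params, lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.mean((render(mu, sigma, color, H, W) - frame) ** 2)  # L2
        loss.backward()
        opt.step()
    with torch.no_grad():
        # Pruning: drop splats with low contribution to the rendered frame.
        # A splat's total energy (color magnitude x footprint) is a stand-in
        # for the paper's actual contribution measure.
        energy = color.abs().sum(-1) * sigma ** 2
        keep = energy > prune_thresh
        splats = tuple(p.detach()[keep] for p in params)
    return splats, loss.item()

def encode_video(frames, key_ratio=5.0):
    """Encode frames in order, re-initializing at detected key frames."""
    splats, prev_loss, encoded = None, None, []
    for frame in frames:
        splats, loss = fit_frame(frame, prev=splats)
        # Key-frame detection: a large loss jump relative to the previous
        # frame signals a scene change, so refit this frame from scratch.
        if prev_loss is not None and loss > key_ratio * prev_loss:
            splats, loss = fit_frame(frame, prev=None)
        encoded.append(splats)
        prev_loss = loss
    return encoded
```

Under this structure, predictive frames start close to a good solution and typically converge in far fewer steps than key frames, which is where the training speedup from technique (i) comes from.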
Experimental results show that GSVC achieves good rate-distortion trade-offs, comparable to state-of-the-art video codecs such as AV1 and HEVC, and renders at 1500 fps for 1920x1080 video.
GSVC achieves a significant improvement over the I-frame-only method (Ours-GI), which uses GaussianImage to encode each frame independently, demonstrating the contributions of predictive frames, pruning, augmentation, and key-frame detection to the representation.
Notice the floating dots in the upper-left video for the I-frame-only method.
The neural-based approaches achieve significant improvements over GSVC, and even over some state-of-the-art codecs such as VVC, in terms of MS-SSIM.
These neural-based approaches, however, are pre-trained on the UVG dataset, so their superior performance is not surprising. The standard MPEG/H.26x video codecs and GSVC do not require any pre-training and generalize well to any video.
GSVC achieves better or comparable performance in terms of MS-SSIM and VMAF (which account for image structure and human perception) against all codecs except VVC. GSVC yields a lower PSNR in certain scenarios, but note that we use \(L_2\) as the loss function; we can tune the convergence condition to further improve PSNR, without increasing \(N\), at the cost of longer encoding time.
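Since the \(L_2\) loss is exactly the per-pixel mean squared error, PSNR is a direct function of the converged loss. A quick sanity check (plain Python; no project code assumed):

```python
import math

def psnr(mse, max_val=1.0):
    """PSNR in dB from mean squared error (pixels normalized to [0, max_val]).
    Minimizing the L2 loss therefore directly maximizes PSNR."""
    return 10.0 * math.log10(max_val ** 2 / mse)

print(psnr(1e-3))   # 30.0 dB
print(psnr(5e-4))   # ~33.0 dB: halving the converged MSE buys ~3 dB
```

This is why tightening the convergence condition (running more optimization steps per frame) trades encoding time for PSNR without changing \(N\).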
@article{wang2025gsvc,
  title={{GSVC}: Efficient Video Representation and Compression Through {2D Gaussian} Splatting},
  author={Wang, Longan and Shi, Yuang and Ooi, Wei Tsang},
  journal={arXiv preprint arXiv:2501.12060},
  year={2025}
}