GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting


National University of Singapore
The 35th Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV'25).

Gaussian-based video rendered by GSVC



TL;DR: We introduce GSVC, an approach to learning a set of 2D Gaussian splats that can effectively represent and compress video frames.

Abstract


This work explores 2D Gaussian splats as a new primitive for representing videos. We propose GSVC, an approach to learning a set of 2D Gaussian splats that can effectively represent and compress video frames.
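To make the primitive concrete, the following minimal numpy sketch (our illustration, not the authors' implementation) renders a frame as an opacity-weighted sum of 2D Gaussian splats, each parameterized by a center, per-axis scales, a rotation angle, a color, and an opacity:

import numpy as np

def render_frame(mu, scale, theta, color, opacity, H, W):
    # mu: (N, 2) centers in (x, y) pixel coordinates; scale: (N, 2)
    # per-axis standard deviations; theta: (N,) rotation angles;
    # color: (N, 3) RGB; opacity: (N,). Returns an (H, W, 3) image.
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    img = np.zeros((H, W, 3))
    for i in range(len(mu)):
        c, s = np.cos(theta[i]), np.sin(theta[i])
        R = np.array([[c, -s], [s, c]])
        # Covariance from scale and rotation: Sigma = R diag(scale^2) R^T
        inv = np.linalg.inv(R @ np.diag(scale[i] ** 2) @ R.T)
        dx, dy = xs - mu[i, 0], ys - mu[i, 1]
        # Gaussian falloff exp(-0.5 * d^T Sigma^{-1} d), evaluated densely
        q = inv[0, 0] * dx * dx + 2 * inv[0, 1] * dx * dy + inv[1, 1] * dy * dy
        img += (opacity[i] * np.exp(-0.5 * q))[..., None] * color[i]
    return np.clip(img, 0.0, 1.0)

Encoding a frame then amounts to optimizing these parameters until the rendered image matches the frame; decoding is just this render pass, which is why playback can be fast.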

GSVC incorporates the following techniques: (i) to exploit temporal redundancy among adjacent frames, which speeds up training and improves compression efficiency, we predict the Gaussian splats of a frame from those of the previous frame; (ii) to control the trade-off between file size and quality, we remove Gaussian splats that contribute little to video quality; (iii) to capture dynamics in videos, we randomly add Gaussian splats to fit content with large motion or newly appearing objects; (iv) to handle significant changes in the scene, we detect key frames based on loss differences during the learning process.
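The four techniques compose into a per-frame encoding loop. Below is a hedged control-flow sketch of our reading of that loop; fit, prune, and spawn stand in for the actual optimization, pruning, and augmentation steps, and the loss-ratio key-frame test (key_ratio) is an illustrative stand-in for the paper's loss-difference criterion, not its exact rule:

from typing import Callable, List

def encode_video(frames: List, n_splats: int,
                 fit: Callable, prune: Callable, spawn: Callable,
                 key_ratio: float = 2.0) -> List:
    # fit(splats, frame) -> (splats, loss); prune(splats, frame) -> splats;
    # spawn(splats, n) -> splats with n new random splats appended.
    splats = spawn([], n_splats)          # first frame starts from scratch
    prev_loss, encoded = None, []
    for frame in frames:
        # (i) warm-start from the previous frame's splats (predictive frame)
        splats, loss = fit(splats, frame)
        # (iv) a large jump in loss signals a scene change: treat this
        # frame as a key frame and re-fit from randomly initialized splats
        if prev_loss is not None and loss > key_ratio * prev_loss:
            splats, loss = fit(spawn([], n_splats), frame)
        # (ii) drop splats that contribute little to the rendered quality
        splats = prune(splats, frame)
        # (iii) re-spawn the freed budget at random positions to capture
        # large motion and newly appearing content
        splats = spawn(splats, n_splats - len(splats))
        prev_loss = loss
        encoded.append(list(splats))
    return encoded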

Experimental results show that GSVC achieves a good rate-distortion trade-off, comparable to state-of-the-art video codecs such as AV1 and HEVC, and renders at 1500 fps for 1920x1080 video.

Gaussian Center Distribution


Comparison Results


Comparison with I-frame Only Method

GSVC achieves a significant improvement over the I-frame-only method (Ours-GI), which uses GaussianImage to encode each frame independently, demonstrating the contributions that predictive frames, pruning, augmentation, and key-frame detection bring to the representation.

Notice the floating dots in the top-left of the video for the I-frame-only method.

I-frame Only GSVC (Ours)

Comparison with Neural-based Methods

Neural-based approaches achieve significant improvements over GSVC, and even over some state-of-the-art codecs such as VVC, in terms of MS-SSIM.

These neural-based approaches, however, are pre-trained on the UVG dataset, so their superior performance on it is not surprising. Standard MPEG/H.26x video codecs and GSVC do not require any pre-training and generalize well to any video.

DCVC-DC GSVC (Ours)

Comparison with Standardized Video Codec

GSVC achieves better or comparable performance in terms of MS-SSIM and VMAF (metrics that account for image structure and human perception) against all codecs except VVC. GSVC yields a lower PSNR in certain scenarios. Note, however, that we use \(L_2\) as the loss function, so the convergence condition can be tuned to further improve PSNR without increasing the number of Gaussian splats \(N\), at the cost of longer encoding time.
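This works because PSNR is a monotone function of the mean squared error that the \(L_2\) loss minimizes, so a tighter convergence threshold translates directly into a higher PSNR; here \(\mathrm{MAX}\) denotes the peak pixel value (e.g., 255 for 8-bit video):

\[
\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}},
\qquad
\mathrm{MSE} = \frac{1}{HW} \sum_{p} \big( \hat{I}(p) - I(p) \big)^2
\]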

AVC GSVC (Ours)
HEVC GSVC (Ours)
VVC GSVC (Ours)
AV1 GSVC (Ours)

BibTeX

@article{wang2025gsvc,
      title={{GSVC}: Efficient Video Representation and Compression Through {2D Gaussian} Splatting},
      author={Wang, Longan and Shi, Yuang and Ooi, Wei Tsang},
      journal={arXiv preprint arXiv:2501.12060},
      year={2025}
}