Journal article

Discrete diffusion model with contrastive learning for music to natural and long dance generation

Abstract

With the deepening integration of culture and technology, digital research on cultural content such as music and dance continues to evolve. This paper focuses on music-to-dance generation, aiming to promote cultural dissemination. The core challenge of this task is to generate natural dance sequences that match the duration of the given music. To address it, we propose a discrete diffusion model with contrastive learning. First, a dance VQ-VAE model is introduced and pre-trained to learn the mapping between dance data and discrete token sequences. Second, a Music-conditioned Contrastive Learning loss is designed to enhance the training of the discrete diffusion model, enabling it to predict discrete token sequences conditioned on musical features. The predicted token sequences are then decoded into dance sequences by the dance VQ-VAE decoder. Finally, temporal consistency across multiple generated sequences is enforced through time constraints, yielding long dance sequences.
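To make the pipeline concrete, the following is a minimal illustrative sketch (in PyTorch) of two components named in the abstract: a music-conditioned contrastive loss, rendered here as a standard InfoNCE objective, and an overlap cross-fade for stitching fixed-length dance windows into a long sequence. All function names, tensor shapes, and hyperparameters are assumptions for illustration; the paper's exact loss formulation and time constraints may differ.

import torch
import torch.nn.functional as F

def music_conditioned_contrastive_loss(music_emb, token_emb, temperature=0.07):
    # Hedged sketch of the paper's "Music-conditioned Contrastive Learning"
    # loss as an InfoNCE objective (assumed form, not the paper's exact
    # definition).
    # music_emb: (B, D) pooled music features
    # token_emb: (B, D) pooled embeddings of the predicted dance tokens
    music_emb = F.normalize(music_emb, dim=-1)
    token_emb = F.normalize(token_emb, dim=-1)
    logits = music_emb @ token_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(music_emb.size(0), device=music_emb.device)
    # Matched (music, dance) pairs lie on the diagonal; all other batch
    # entries act as negatives.
    return F.cross_entropy(logits, targets)

def stitch_windows(windows, overlap):
    # One plausible reading of the abstract's "time constraints": adjacent
    # fixed-length dance windows share `overlap` frames, which are linearly
    # cross-faded so the concatenated long sequence stays temporally
    # consistent.
    # windows: list of (T, J) pose tensors with T > overlap
    out = windows[0]
    for w in windows[1:]:
        alpha = torch.linspace(0.0, 1.0, overlap, device=w.device).unsqueeze(-1)
        blended = (1 - alpha) * out[-overlap:] + alpha * w[:overlap]
        out = torch.cat([out[:-overlap], blended, w[overlap:]], dim=0)
    return out

Example usage (random tensors in place of real music features and poses):

    loss = music_conditioned_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
    long_dance = stitch_windows([torch.randn(120, 72) for _ in range(4)], overlap=16)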

Authors

Wang H; Jiang Y; Zhou X; Jiang W

Journal

Heritage Science, Vol. 13, No. 1

Publisher

Springer Nature

Publication Date

April 17, 2025

DOI

10.1038/s40494-025-01668-0

ISSN

3059-3220