Audio Samples of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus"

Authors: Detai Xin, Shinnosuke Takamichi, Ai Morimatsu, Hiroshi Saruwatari

The University of Tokyo, Tokyo, Japan

Accepted by INTERSPEECH 2023

[Paper] [Corpus] [Code]


Abstract:

We present a large-scale in-the-wild Japanese laughter corpus and a laughter synthesis method. Previous work on laughter synthesis lacks not only data but also proper ways to represent laughter. To solve these problems, we first propose an in-the-wild corpus comprising 3.5 hours of laughter, which is, to the best of our knowledge, the largest laughter corpus designed for laughter synthesis. We then propose pseudo phonetic tokens (PPTs) to represent laughter as a sequence of discrete tokens, which are obtained by training a clustering model on features extracted from laughter by a pretrained self-supervised model. Laughter can then be synthesized by feeding PPTs into a text-to-speech system. We further show that PPTs can be used to train a language model for unconditional laughter generation. Results of comprehensive subjective and objective evaluations demonstrate that the proposed method significantly outperforms a baseline method and can generate natural laughter unconditionally.
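As a rough illustration of how PPTs can be obtained, the sketch below extracts frame-level features with a HuBERT-style self-supervised model and clusters them with k-means. The specific SSL model, feature layer, cluster count, and file names here are illustrative assumptions, not the exact setup used in the paper.

```python
# Minimal sketch of pseudo phonetic token (PPT) extraction: SSL features + k-means.
# The SSL backbone, feature layer, number of clusters, and file names are
# assumptions for illustration only.
import torch
import torchaudio
from sklearn.cluster import KMeans

bundle = torchaudio.pipelines.HUBERT_BASE      # assumed SSL backbone
ssl_model = bundle.get_model().eval()

def extract_layer_features(wav_path: str, layer: int = 8) -> torch.Tensor:
    """Return frame-level SSL features (frames, dim) for one laughter clip."""
    waveform, sr = torchaudio.load(wav_path)
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.no_grad():
        feats, _ = ssl_model.extract_features(waveform)  # list of layer outputs
    return feats[layer - 1].squeeze(0)

# Fit k-means on features pooled over training clips (hypothetical file list) ...
train_feats = torch.cat([extract_layer_features(p) for p in ["laugh_001.wav"]])
kmeans = KMeans(n_clusters=64, n_init=10).fit(train_feats.numpy())

# ... then a clip's PPT sequence is its frame-wise cluster indices,
# which can be fed to the TTS model in place of phonemes.
ppt = kmeans.predict(extract_layer_features("laugh_002.wav").numpy())
print(ppt[:20])
```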

You can find more samples here.

Laughter synthesis

The samples in this section were synthesized from existing PPTs (discrete codes) or phonemes of the test set, which was not used during training.

[Audio samples: Ground Truth | HiFi-GAN | L5 | L8 | L12 | baseline]

Unconditional laughter generation

To synthesize non-existing laughter, we trained a language model on PPTs and sampled laughter PPTs from it unconditionally.
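A minimal sketch of this sampling step is shown below: a PPT sequence is drawn token by token from a trained autoregressive language model and then passed to the TTS model. The LM interface, the BOS/EOS token ids, and the sampling hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of unconditional laughter generation by sampling PPTs from an
# autoregressive token language model.  The LM is assumed to map a token-id
# tensor of shape (1, T) to logits of shape (1, T, vocab); BOS/EOS ids and
# hyperparameters are assumptions.
import torch

@torch.no_grad()
def sample_ppts(lm: torch.nn.Module, bos_id: int, eos_id: int,
                max_len: int = 200, temperature: float = 1.0) -> list[int]:
    tokens = [bos_id]
    for _ in range(max_len):
        logits = lm(torch.tensor([tokens]))[0, -1]           # next-token logits
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = int(torch.multinomial(probs, 1))
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]   # drop BOS; the result is the generated PPT sequence

# Usage with a trained language model (hypothetical):
# ppt_seq = sample_ppts(trained_lm, bos_id=0, eos_id=1)
```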

The Ground Truth samples in this section are provided only for reference; their content is, and should be, different from that of the synthesized samples.

[Audio samples: Ground Truth | L5 | L8 | L12]