DEMO

Improving conversion and attack capability for any-to-many voice conversion with post-processing

Authors:

Ziyi Chen, Hua Hua, Yuxiang Zhang, Ming Li, Pengyuan Zhang

Affiliations:

Key Laboratory of Speech Acoustics & Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, China

University of Chinese Academy of Sciences, Beijing, China

Data Science Research Center, Duke Kunshan University, Kunshan, China

Email:

chenziyi@hccl.ioa.ac.cn

1. Audio examples of the target speakers. baker denotes audio from the Databaker open-source Mandarin dataset. MM denotes audio of the male speaker in the M2VoC Dev set, and MF denotes audio of the female speaker in the M2VoC Dev set.

baker:

	Example 11	Example 12	Example 13	Example 14
baker

MM:

	Example 21	Example 22	Example 23	Example 24
MM

MF:

	Example 31	Example 32	Example 33	Example 34
MF

2. Voice conversion results. GT denotes the ground truth. ppgfast denotes Conformer PPG + modified FastSpeech + HiFi-GAN. ppgtaco denotes Conformer PPG + modified Tacotron 2 + HiFi-GAN. proposed denotes the proposed method.

baker:

	Chat style utterance 1	Chat style utterance 2	Narrative style utterance 1	Narrative style utterance 2
GT
ppgfast
ppgtaco
proposed

MM:

	Chat style utterance 1	Chat style utterance 2	Narrative style utterance 1	Narrative style utterance 2
GT
ppgfast
ppgtaco
proposed

MF:

	Chat style utterance 1	Chat style utterance 2	Narrative style utterance 1	Narrative style utterance 2
GT
ppgfast
ppgtaco
proposed