Toward Degradation-Robust Voice Conversion
Authors:
Chien-yu Huang*, Kai-Wei Chang*, Hung-yi Lee
National Taiwan University, Taiwan
* These authors contributed equally.
Robust-VC Code
Table 1: VC with Speech Enhancement (SE) Concatenation and Denoising Training
Source speaker: VCTK - p246 (Male)
Target speaker: VCTK - p343 (Female)
Transcription: “Rangers can expect a physical battle in the national stadium tonight.”
Speaker | Clean utterance | Degraded utterance
Source | [audio] | [audio]
Target | [audio] | [audio]
Converted Result
Scenarios | AdaIN-VC | S2VC | S2VC-W2V
Clean + baseline model | [audio] | [audio] | [audio]
Degraded + baseline model | [audio] | [audio] | [audio]
Degraded + DEMUCS | [audio] | [audio] | [audio]
Degraded + MetricGAN+ | [audio] | [audio] | [audio]
Degraded + Conv-TasNet | [audio] | [audio] | [audio]
Degraded + Denoising training | [audio] | [audio] | [audio]
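The page does not say how the degraded utterances were produced or how denoising training was configured. As a minimal sketch only, assuming additive noise mixed at a chosen signal-to-noise ratio (SNR) as the degradation, the Python below builds a denoising-training pair: the model would receive the degraded waveform as input while its reconstruction target stays clean. The helpers `mix_at_snr` and `denoising_training_pair` and the toy sine-plus-Gaussian signals are hypothetical, not taken from the Robust-VC code.

```python
import math
import random


def power(signal):
    """Mean power of a waveform given as a list of floats."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Hypothetical helper: add `noise` to `clean`, scaled to the requested SNR (dB).

    Assumes both signals have the same length and the noise is not all zeros.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    # Pick gain g so that power(clean) / power(g * noise) == 10 ** (snr_db / 10).
    gain = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def denoising_training_pair(clean, noise, snr_db):
    """Hypothetical helper: (model input, reconstruction target).

    The input is degraded; the target is the untouched clean utterance.
    """
    return mix_at_snr(clean, noise, snr_db), clean


# Toy data: one second of a 440 Hz tone at 16 kHz plus Gaussian noise at 5 dB SNR.
rng = random.Random(0)
sr = 16000
clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sr)]
degraded, target = denoising_training_pair(clean, noise, snr_db=5.0)

# Check the SNR actually achieved by the mixture.
residual = [d - c for d, c in zip(degraded, clean)]
measured_snr = 10 * math.log10(power(clean) / power(residual))
print(f"measured SNR: {measured_snr:.2f} dB")
```

Other degradation types would need their own generators; only the pairing of a degraded input with a clean target carries over.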
Table 2: Speech Enhancement Model Performance
Utterance 1: VCTK - p246 (Male), “Rangers can expect a physical battle in the national stadium tonight.”
Utterance 2: VCTK - p343 (Female), “Then came the crunch.”
Condition | Utterance 1 | Utterance 2
Clean utterance | [audio] | [audio]
Degraded utterance | [audio] | [audio]
Enhancement Result
SE model | Utterance 1 | Utterance 2
DEMUCS | [audio] | [audio]
MetricGAN+ | [audio] | [audio]
Conv-TasNet | [audio] | [audio]
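This table compares the SE models by ear only. If an objective check of the same comparison were wanted, scale-invariant SNR (SI-SNR) is one widely used measure for enhancement output; the page does not state which metric, if any, accompanies this table, so the sketch below is illustrative. The `si_snr` function and the simulated "enhanced" signal (the noise simply shrunk tenfold) are assumptions, not results from DEMUCS, MetricGAN+ or Conv-TasNet.

```python
import math
import random


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB.

    Both signals are zero-meaned; the estimate is split into a part
    parallel to the reference (target) and a residual (noise).
    """
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and equally long")
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference direction.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))


# Toy comparison: "enhanced" is simulated by shrinking the noise,
# not produced by any real SE model.
rng = random.Random(1)
sr = 16000
clean = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr // 4)]
noise = [rng.gauss(0.0, 0.5) for _ in clean]
degraded = [c + v for c, v in zip(clean, noise)]
enhanced = [c + 0.1 * v for c, v in zip(clean, noise)]

improvement = si_snr(enhanced, clean) - si_snr(degraded, clean)
print(f"SI-SNR improvement: {improvement:.1f} dB")
```

Because the estimate is projected onto the reference, rescaling an SE output does not change its score; that is why SI-SNR is often preferred over plain SNR for comparing enhancement models.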
Table 6: VC Under Embedding Attack and Defense
Source speaker: VCTK - p246 (Male)
Target speaker: VCTK - p343 (Female)
Transcription: “Rangers can expect a physical battle in the national stadium tonight.”
Speaker | Clean utterance | + Adversarial Noise (Attack)
Source | [audio] | -
Target | [audio] | [audio]
Attack and Defense Result on S2VC
Scenarios | S2VC
Clean + baseline model | [audio]
Attack + baseline model | [audio]
Attack + DEMUCS (Defense) | [audio]
Attack + Denoising Training (Defense) | [audio]
Attack + Denoising Training + Adversarial Training (Defense) | [audio]
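The page does not spell out the attack objective. One simple way to picture an embedding attack on the target utterance is projected gradient descent (PGD) under an L-infinity budget: a perturbation bounded by `eps` per sample that pulls the speaker embedding toward another speaker's embedding. The sketch assumes a toy linear encoder in place of a real speaker encoder; `embed`, `embedding_attack`, `eps`, `step` and `iters` are hypothetical names and values, not the authors' implementation.

```python
import random


def embed(W, x):
    """Toy linear speaker encoder, e = W x (hypothetical stand-in)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def embedding_attack(W, x, e_other, eps=0.05, step=0.01, iters=50):
    """L-infinity PGD: find delta with |delta_j| <= eps that pulls
    embed(W, x + delta) toward e_other by minimising
    ||W (x + delta) - e_other||^2."""
    delta = [0.0] * len(x)
    for _ in range(iters):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        diff = [e - o for e, o in zip(embed(W, x_adv), e_other)]
        # Gradient of the squared distance w.r.t. the input: 2 * W^T diff.
        grad = [2 * sum(W[k][j] * diff[k] for k in range(len(W)))
                for j in range(len(x))]
        # Signed gradient step downhill, then projection back into [-eps, eps].
        delta = [max(-eps, min(eps, d - step * ((g > 0) - (g < 0))))
                 for d, g in zip(delta, grad)]
    return delta


rng = random.Random(0)
dim, emb_dim = 64, 8
W = [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(emb_dim)]
x_target = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # toy "target speaker" input
e_other = embed(W, [rng.uniform(-1.0, 1.0) for _ in range(dim)])  # another speaker

delta = embedding_attack(W, x_target, e_other)
before = sq_dist(embed(W, x_target), e_other)
after = sq_dist(embed(W, [x + d for x, d in zip(x_target, delta)]), e_other)
print(f"distance to other speaker: {before:.3f} -> {after:.3f}")
```

Adversarial training, in its usual sense, would feed perturbed utterances like `x_target + delta` back into training; that outer loop is not sketched here.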
© NTU Speech Processing Laboratory (NTU Speech Lab)