-
Notifications
You must be signed in to change notification settings - Fork 1.9k
【Hackathon 8th No.9】在 PaddleSpeech 中复现 DAC 训练需要用到的 loss #3988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for your contribution! |
paddlespeech/t2s/modules/losses.py
Outdated
"DDSP: Differentiable Digital Signal Processing." | ||
International Conference on Learning Representations. 2019. | ||
|
||
Implementation copied from: https://github.com/descriptinc/lyrebird-audiotools/blob/961786aa1a9d628cca0c0486e5885a457fe70c1a/audiotools/metrics/spectral.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't use develop branch?
tests/unit/tts/test_losses.py
Outdated
|
||
|
||
def test_multi_scale_stft_loss(): | ||
a = np.linspace(0, 199.98, 10000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest use true wav to test loss.
tests/unit/tts/test_losses.py
Outdated
|
||
|
||
def test_sisdr_loss(): | ||
a = np.linspace(0, 199.98, 10000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same with above
tests/unit/tts/test_losses.py
Outdated
def __call__(self, x): | ||
return x * (-0.2) | ||
|
||
a = np.linspace(0, 199.98, 10000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same with above
paddlespeech/t2s/modules/losses.py
Outdated
weight : float, optional | ||
Weight of this loss, defaults to 1.0. | ||
|
||
Implementation copied from: https://github.com/descriptinc/lyrebird-audiotools/blob/961786aa1a9d628cca0c0486e5885a457fe70c1a/audiotools/metrics/distance.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same with above
paddlespeech/t2s/modules/losses.py
Outdated
pow: float=2.0, | ||
weight: float=1.0, | ||
match_stride: bool=False, | ||
window_type: str=None, ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when default value is None
, suggest use Optional
type
paddlespeech/t2s/modules/losses.py
Outdated
|
||
def __init__( | ||
self, | ||
scaling: int=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int=True
seems somewhat ambiguous
paddlespeech/t2s/modules/losses.py
Outdated
x: Union[AudioSignal, paddle.Tensor], | ||
y: Union[AudioSignal, paddle.Tensor]): | ||
eps = 1e-8 | ||
# nb, nc, nt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you mean tensor shape, suggest use (B, C, T)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have modified it as requested. Please check it.
Are you interested in participating in the training of the DAC model (Hackathon 8th No.5) as well? |
我看一下? |
任务五似乎是要先把任务9完成才能做的?那我是先等任务9合进去再开始嘛? @zxcd |
是的,请先完成任务9 |
啊,任务9已经完成可以检查了的。 |
并不知道你都改完了,上面的评论请都回复下,请看下 Code Review 注意事项。 |
tests/unit/tts/test_losses.py
Outdated
|
||
|
||
def get_input(): | ||
x = np.array([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if audio file not have special request, you can use our demo audio, such as https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
or https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/tests/unit/audiotools/test_audiotools.sh
If you have special request, pin the audio link to me, I will upload it to server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if audio file not have special request, you can use our demo audio, such as
https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
or https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/tests/unit/audiotools/test_audiotools.shIf you have special request, pin the audio link to me, I will upload it to server.
It doesn't have a file for the test loss function, so I'm using the wav
file you provided. Now that I've finished revising, please check
e1c9ed6
to
0afc86d
Compare
我觉得应该找一下做audiotools这个pr的人复核一下,他的导包似乎有问题。 @luotao1 |
也请 @DrRyanHuang 有空看下 |
@cchenhaifeng This is demonstrated in the testing of I previously encountered the issue of cyclic imports as well. In this PR, you can fix it using deferred imports; I see that you’ve already done so. If there are any other issues, we can communicate at any time. |
413c466
to
d92c45e
Compare
Okay, I'll get back to you if I have any other questions |
|
Well, it's not normal. Does the code pass the unit tests for |
I haven't tried this, I'm testing |
The local results are correct |
The CI environment uses:
Have you aligned your PaddlePaddle version with the CI environment? |
My Python is 3.12 |
No problem locally |
|
All the test code has been verified, please check @zxcd |
@@ -26,3 +24,5 @@ | |||
from .audio_signal import AudioSignal | |||
from .audio_signal import STFTParams | |||
from .loudness import Meter | |||
from paddlespeech.t2s.modules import fft_conv1d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unified use of relative paths or absolute paths in one file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/unit/ci.sh
Outdated
@@ -1,6 +1,7 @@ | |||
function main(){ | |||
set -ex | |||
speech_ci_path=`pwd` | |||
pip install ffmpeg flatten_dict ffmpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_audiotools.sh
will install these pkg, I think this don't need to add it twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/unit/tts/test_losses.py
Outdated
@@ -0,0 +1,61 @@ | |||
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/unit/tts/test_losses.py
Outdated
x, y = get_input() | ||
loss = MultiScaleSTFTLoss() | ||
pd_loss = loss(x, y) | ||
np.allclose(pd_loss.numpy(), 7.5622) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the accuracy be aligned to 1e-6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddlespeech/t2s/modules/losses.py
Outdated
noise = (e_res**2).sum(axis=1) | ||
sdr = -10 * paddle.log10(signal / noise + eps) | ||
|
||
if self.clip_min is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whether should also add self.clip_min != 'None'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ci It has all passed, and there will be errors in the loop of packet routing, so the method of delaying packet routing is used. @zxcd |
tests/unit/tts/test_losses.py
Outdated
x, y = get_input() | ||
loss = MultiScaleSTFTLoss() | ||
pd_loss = loss(x, y) | ||
np.allclose(pd_loss.numpy(), 7.562150, rtol=1e-06) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put the code generated using audiotools for 7.562150
in the note for easy verification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are the code comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
tests/unit/tts/test_losses.py
Outdated
x, y = get_input() | ||
loss = GANLoss(My_discriminator0()) | ||
pd_loss0, pd_loss1 = loss(x, y) | ||
np.allclose(pd_loss0.numpy(), -0.102722, rtol=1e-06) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use self.assertEqual
or assert
include np.allclose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use
self.assertEqual
orassert
includenp.allclose
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
hi, @cchenhaifeng
|
PR types
New features
PR changes
APIs
Describe
PaddlePaddle/community#1062