Skip to content

【Hackathon 8th No.9】在 PaddleSpeech 中复现 DAC 训练需要用到的 loss #3988

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 26, 2025

Conversation

cchenhaifeng
Copy link
Contributor

@cchenhaifeng cchenhaifeng commented Feb 14, 2025

PR types

New features

PR changes

APIs

Describe

PaddlePaddle/community#1062

Copy link

paddle-bot bot commented Feb 14, 2025

Thanks for your contribution!

@CLAassistant
Copy link

CLAassistant commented Feb 14, 2025

CLA assistant check
All committers have signed the CLA.

@mergify mergify bot added the T2S label Feb 14, 2025
@cchenhaifeng
Copy link
Contributor Author

image
这是我在本地测试过后,拿到的数值结果,请检查

@mergify mergify bot added the Test label Feb 14, 2025
@cchenhaifeng
Copy link
Contributor Author

cchenhaifeng commented Feb 15, 2025

image
原先的代码仓是不是出问题了?我并没有修改过这里的代码 @zxcd @luotao1

"DDSP: Differentiable Digital Signal Processing."
International Conference on Learning Representations. 2019.

Implementation copied from: https://github.com/descriptinc/lyrebird-audiotools/blob/961786aa1a9d628cca0c0486e5885a457fe70c1a/audiotools/metrics/spectral.py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't use develop branch?



def test_multi_scale_stft_loss():
a = np.linspace(0, 199.98, 10000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest use true wav to test loss.



def test_sisdr_loss():
a = np.linspace(0, 199.98, 10000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with above

def __call__(self, x):
return x * (-0.2)

a = np.linspace(0, 199.98, 10000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with above

weight : float, optional
Weight of this loss, defaults to 1.0.

Implementation copied from: https://github.com/descriptinc/lyrebird-audiotools/blob/961786aa1a9d628cca0c0486e5885a457fe70c1a/audiotools/metrics/distance.py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with above

pow: float=2.0,
weight: float=1.0,
match_stride: bool=False,
window_type: str=None, ):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when default value is None, suggest use Optional type


def __init__(
self,
scaling: int=True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int=True seems somewhat ambiguous

x: Union[AudioSignal, paddle.Tensor],
y: Union[AudioSignal, paddle.Tensor]):
eps = 1e-8
# nb, nc, nt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you mean tensor shape, suggest use (B, C, T)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified it as requested. Please check it.

@zxcd
Copy link
Collaborator

zxcd commented Feb 17, 2025

Are you interested in participating in the training of the DAC model (Hackathon 8th No.5) as well?

@cchenhaifeng
Copy link
Contributor Author

Are you interested in participating in the training of the DAC model (Hackathon 8th No.5) as well?

我看一下?

@cchenhaifeng
Copy link
Contributor Author

任务五似乎是要先把任务9完成才能做的?那我是先等任务9合进去再开始嘛? @zxcd

@luotao1
Copy link
Collaborator

luotao1 commented Feb 20, 2025

是的,请先完成任务9

@cchenhaifeng
Copy link
Contributor Author

是的,请先完成任务9

啊,任务9已经完成可以检查了的。

@luotao1
Copy link
Collaborator

luotao1 commented Feb 20, 2025

任务9已经完成可以检查了的。

并不知道你都改完了,上面的评论请都回复下,请看下 Code Review 注意事项



def get_input():
x = np.array([
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if audio file not have special request, you can use our demo audio, such as https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav or https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/tests/unit/audiotools/test_audiotools.sh

If you have special request, pin the audio link to me, I will upload it to server.

Copy link
Contributor Author

@cchenhaifeng cchenhaifeng Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if audio file not have special request, you can use our demo audio, such as https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav or https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/tests/unit/audiotools/test_audiotools.sh

If you have special request, pin the audio link to me, I will upload it to server.

image
It doesn't have a file for the test loss function, so I'm using the wav file you provided. Now that I've finished revising, please check

@cchenhaifeng
Copy link
Contributor Author

我觉得应该找一下做audiotools这个pr的人复核一下,他的导包似乎有问题。 @luotao1

@luotao1
Copy link
Collaborator

luotao1 commented Feb 21, 2025

可以进如流群讨论
image

我觉得应该找一下做audiotools这个pr的人复核一下,他的导包似乎有问题

也请 @DrRyanHuang 有空看下

@DrRyanHuang
Copy link
Contributor

DrRyanHuang commented Feb 21, 2025

我觉得应该找一下做audiotools这个pr的人复核一下,他的导包似乎有问题。

@cchenhaifeng
Currently, since audiotools exists independently within PaddleSpeech, its installation and testing are also conducted separately. If you wish to use audiotools, you can install it yourself by following the requirements specified in paddlespeech/audiotools/requirements.txt.

This is demonstrated in the testing of audiotools:
https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/tests/unit/audiotools/test_audiotools.sh#L1

I previously encountered the issue of cyclic imports as well. In this PR, you can fix it using deferred imports; I see that you’ve already done so. If there are any other issues, we can communicate at any time.

@cchenhaifeng
Copy link
Contributor Author

我觉得应该找一下做audiotools这个pr的人复核一下,他的导包似乎有问题。

@cchenhaifeng Currently, since audiotools exists independently within PaddleSpeech, its installation and testing are also conducted separately. If you wish to use audiotools, you can install it yourself by following the requirements specified in paddlespeech/audiotools/requirements.txt.

This is demonstrated in the testing of audiotools: https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/tests/unit/audiotools/test_audiotools.sh#L1

I previously encountered the issue of cyclic imports as well. In this PR, you can fix it using deferred imports; I see that you’ve already done so. If there are any other issues, we can communicate at any time.

Okay, I'll get back to you if I have any other questions

@cchenhaifeng
Copy link
Contributor Author

我觉得应该找一下做audiotools这个pr的人复核一下,他的导包似乎有问题。

@cchenhaifeng Currently, since audiotools exists independently within PaddleSpeech, its installation and testing are also conducted separately. If you wish to use audiotools, you can install it yourself by following the requirements specified in paddlespeech/audiotools/requirements.txt.

This is demonstrated in the testing of audiotools: https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/tests/unit/audiotools/test_audiotools.sh#L1

I previously encountered the issue of cyclic imports as well. In this PR, you can fix it using deferred imports; I see that you’ve already done so. If there are any other issues, we can communicate at any time.

image
Is it normal for the test to get stuck here all the time?

@DrRyanHuang
Copy link
Contributor

Is it normal for the test to get stuck here all the time?

Well, it's not normal. Does the code pass the unit tests for utils.py on your local machine? If it does, then the issue might be with the CI system.

@cchenhaifeng
Copy link
Contributor Author

cchenhaifeng commented Feb 21, 2025

Is it normal for the test to get stuck here all the time?

Well, it's not normal. Does the code pass the unit tests for utils.py on your local machine? If it does, then the issue might be with the CI system.

I haven't tried this, I'm testing test_losses.py , but I'm just modifying the packet method, not the logic diamagnetic, and theoretically it should pass.

@cchenhaifeng
Copy link
Contributor Author

Is it normal for the test to get stuck here all the time?

Well, it's not normal. Does the code pass the unit tests for utils.py on your local machine? If it does, then the issue might be with the CI system.

The local results are correct

@DrRyanHuang
Copy link
Contributor

The local results are correct

The CI environment uses:

  • Python 3.8
  • paddlepaddle-gpu==2.5

Have you aligned your PaddlePaddle version with the CI environment?
We need to support both Paddle 2.5 and Paddle 2.6.

@cchenhaifeng
Copy link
Contributor Author

The local results are correct

The CI environment uses:

* Python 3.8

* paddlepaddle-gpu==2.5

Have you aligned your PaddlePaddle version with the CI environment? We need to support both Paddle 2.5 and Paddle 2.6.

My Python is 3.12

@cchenhaifeng
Copy link
Contributor Author

The local results are correct

The CI environment uses:

* Python 3.8

* paddlepaddle-gpu==2.5

Have you aligned your PaddlePaddle version with the CI environment? We need to support both Paddle 2.5 and Paddle 2.6.

No problem locally

@cchenhaifeng
Copy link
Contributor Author

image
All the test code has been verified, I'm getting it to run again, please check @zxcd

@cchenhaifeng
Copy link
Contributor Author

All the test code has been verified, please check @zxcd

@@ -26,3 +24,5 @@
from .audio_signal import AudioSignal
from .audio_signal import STFTParams
from .loudness import Meter
from paddlespeech.t2s.modules import fft_conv1d
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unified use of relative paths or absolute paths in one file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

tests/unit/ci.sh Outdated
@@ -1,6 +1,7 @@
function main(){
set -ex
speech_ci_path=`pwd`
pip install ffmpeg flatten_dict ffmpy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_audiotools.sh will install these pkg, I think this don't need to add it twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,61 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2025

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

x, y = get_input()
loss = MultiScaleSTFTLoss()
pd_loss = loss(x, y)
np.allclose(pd_loss.numpy(), 7.5622)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the accuracy be aligned to 1e-6?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

noise = (e_res**2).sum(axis=1)
sdr = -10 * paddle.log10(signal / noise + eps)

if self.clip_min is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whether should also add self.clip_min != 'None'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@cchenhaifeng
Copy link
Contributor Author

ci It has all passed, and there will be errors in the loop of packet routing, so the method of delaying packet routing is used. @zxcd

x, y = get_input()
loss = MultiScaleSTFTLoss()
pd_loss = loss(x, y)
np.allclose(pd_loss.numpy(), 7.562150, rtol=1e-06)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put the code generated using audiotools for 7.562150 in the note for easy verification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the code comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
Here

@cchenhaifeng
Copy link
Contributor Author

image
As you can see, the test_util is in a loop after the deletion. @zxcd

x, y = get_input()
loss = GANLoss(My_discriminator0())
pd_loss0, pd_loss1 = loss(x, y)
np.allclose(pd_loss0.numpy(), -0.102722, rtol=1e-06)
Copy link
Collaborator

@zxcd zxcd Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use self.assertEqual or assert include np.allclose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use self.assertEqual or assert include np.allclose

Done

Copy link
Collaborator

@zxcd zxcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zxcd zxcd merged commit d7bf915 into PaddlePaddle:develop Feb 26, 2025
5 checks passed
@luotao1
Copy link
Collaborator

luotao1 commented Mar 4, 2025

hi, @cchenhaifeng

  • 非常感谢你对飞桨的贡献,我们正在运营一个PFCC组织。PFCC是飞桨开源的贡献者俱乐部,只有给飞桨合入过代码的开发者才能加入,俱乐部里每两周会有一次例会(按兴趣参加),也会时不时办线下meetup面基,详情可见 https://github.com/luotao1 主页说明。
  • 如果你对PFCC有兴趣,请发送邮件至 [email protected],我们会邀请你加入~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants