Controlling strength of ApplyImpulseResponse #388

Open
worldveil opened this issue Apr 21, 2025 · 3 comments

@worldveil
Contributor

When an ApplyImpulseResponse gets applied, the effect is sometimes quite strong, to the point that you can't really hear the original audio.

The silliest thing I can think of is adding a min_snr_db/max_snr_db argument pair, sampling a target SNR from that range, and then adding the dry (original) and wet (convolved) signals together in such a way that the target SNR is satisfied.
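
A minimal sketch of one way to read this idea, assuming "SNR" means the RMS ratio between the dry signal and the scaled wet signal that gets added to it. The names `mix_at_snr` and `sample_and_mix` and the default dB range are made up for illustration and are not part of audiomentations. It handles mono 1-D arrays only.

```python
import numpy as np


def rms(x):
    """Root mean square of a 1-D signal, computed in float64."""
    return float(np.sqrt(np.mean(np.asarray(x, dtype=np.float64) ** 2)))


def mix_at_snr(dry, wet, snr_db):
    """Return dry + gain * wet, with gain chosen so that
    20 * log10(rms(dry) / rms(gain * wet)) equals snr_db.

    Assumption: the wet signal plays the role of "noise" in the SNR.
    """
    n = min(len(dry), len(wet))  # tolerate a wet signal of different length
    dry, wet = dry[:n], wet[:n]
    dry_rms, wet_rms = rms(dry), rms(wet)
    if dry_rms == 0.0 or wet_rms == 0.0:
        return dry.copy()  # nothing meaningful to scale
    target_wet_rms = dry_rms / (10 ** (snr_db / 20))
    gain = target_wet_rms / wet_rms
    return (dry + gain * wet).astype(dry.dtype)


def sample_and_mix(dry, wet, min_snr_db=3.0, max_snr_db=30.0, rng=None):
    """Sample a target SNR uniformly in dB, then mix. Returns (mix, snr_db)."""
    if rng is None:
        rng = np.random.default_rng()
    snr_db = float(rng.uniform(min_snr_db, max_snr_db))
    return mix_at_snr(dry, wet, snr_db), snr_db
```

A high sampled SNR leaves the dry signal dominant and a low one lets the reverberant signal take over. Because the mix is a plain sum, the output can be louder than the input and may need gain compensation afterwards.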

What do you think @iver56 ?

@iver56
Owner

iver56 commented Apr 21, 2025

That's a legit question! I imagine two ways of getting less prominent perturbations:

1. Mix the input audio with the output audio (as you suggested)
This could be done (I have thought about it before) with a wrapper class that inputs the transform instance and the output amount (as a fraction) that you want in your mix. This class is not implemented yet.

Note that in the case of ApplyImpulseResponse, the input audio and the output audio are typically not time-aligned, depending on the chosen RIR, so you might end up with unexpected/unwanted coloration artifacts (comb filtering!) or even flanging. In other words, I would not recommend this approach in your case.
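
To illustrate the comb filtering concern, here is a small sketch that stands in for a RIR with a single delayed copy of the signal. That is a deliberate simplification (a real RIR contains many reflections), and the sample rate and delay are arbitrary example values. Summing a signal with a copy delayed by `d` samples cancels frequencies near `sample_rate / (2 * d)` and doubles those near `sample_rate / d`.

```python
import numpy as np

SAMPLE_RATE = 16000
DELAY_SAMPLES = 8  # 0.5 ms; stands in for the misalignment a RIR can introduce


def rms(x):
    return float(np.sqrt(np.mean(x ** 2)))


def mix_with_delayed_copy(dry, delay):
    """Sum a signal with a copy of itself delayed by `delay` samples."""
    delayed = np.concatenate([np.zeros(delay), dry[:-delay]])
    return dry + delayed


def comb_gain(freq_hz):
    """RMS gain of a sine at freq_hz after mixing it with its delayed copy."""
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
    dry = np.sin(2 * np.pi * freq_hz * t)
    mixed = mix_with_delayed_copy(dry, DELAY_SAMPLES)
    # Skip the first samples, where the delayed copy is still silent
    return rms(mixed[DELAY_SAMPLES:]) / rms(dry[DELAY_SAMPLES:])


notch_hz = SAMPLE_RATE / (2 * DELAY_SAMPLES)  # delayed copy arrives in antiphase
peak_hz = SAMPLE_RATE / DELAY_SAMPLES  # delayed copy arrives in phase

for freq in (notch_hz, peak_hz):
    print(f"{freq:.0f} Hz: gain {comb_gain(freq):.3f}")
```

The tone at the notch frequency is almost fully cancelled while the one at the peak frequency roughly doubles in amplitude, which is the kind of coloration a naive dry/wet sum can produce.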

2. Change the RIRs
A more realistic-sounding solution to your problem is to use less extreme RIRs. One way to achieve that is to massage your dataset of RIRs, e.g. by removing long RIRs or by modifying them (e.g. tapering the end of each one somehow). Alternatively, you can find a different dataset of RIRs.

I guess it could be possible to do any kind of RIR modification on the fly, but it would be for advanced users. Maybe I could add a rir_transform argument that is a callable that can modify the RIR before it gets used. What do you think?
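
As a sketch of what "tapering the end" could look like, the function below truncates a RIR and fades out its tail with a half-Hann window. The name `shorten_rir` and the default durations are hypothetical. Note that `rir_transform` is only an idea floated in this thread, not an existing argument, so the safe assumption is to apply something like this offline to the RIR files before pointing ApplyImpulseResponse at them.

```python
import numpy as np


def shorten_rir(rir, sample_rate, max_duration_s=0.3, fade_duration_s=0.05):
    """Truncate a mono RIR to max_duration_s and fade out its tail.

    Returns a new float32 array; the input is left untouched.
    RIRs shorter than max_duration_s are not truncated, but still get the fade.
    """
    max_len = int(round(max_duration_s * sample_rate))
    out = np.array(rir[:max_len], dtype=np.float32)  # np.array makes a copy
    fade_len = min(int(round(fade_duration_s * sample_rate)), len(out))
    if fade_len > 1:
        # Second half of a Hann window: starts near 1 and ends at exactly 0
        fade = np.hanning(2 * fade_len)[fade_len:].astype(np.float32)
        out[-fade_len:] *= fade
    return out


def drop_long_rirs(rirs, sample_rate, max_duration_s=1.0):
    """Alternative: keep only the RIRs that are already short enough."""
    return [r for r in rirs if len(r) / sample_rate <= max_duration_s]
```

Truncating removes most of the late reverberation tail, which tends to make the effect less prominent, and the fade avoids an abrupt cut at the new end of the RIR.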

@worldveil
Contributor Author

RE: (1) time alignment is not necessarily the goal, diversity of the training set is :)

I may just do this more manually, but do you have an example in the code of a wrapper class that takes the transform instance, so that if I do find it works well, the pattern is likely PR'able back to the repo?

@iver56
Owner

iver56 commented Apr 21, 2025

You could have a look at PostGain, a class that I've been toying with, but which is not officially released/exposed yet: https://github.com/iver56/audiomentations/blob/main/audiomentations/core/post_gain.py
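
For readers who want a starting point, here is a minimal sketch of the wrapper pattern discussed in this thread. It is not the PostGain code and not an audiomentations class: `DryWetMix` and `wet_fraction` are hypothetical names. It assumes the wrapped transform can be called as `transform(samples=..., sample_rate=...)` and returns an array with the same layout as its input, with time on the last axis.

```python
import numpy as np


class DryWetMix:
    """Hypothetical wrapper that blends a transform's output with its input."""

    def __init__(self, transform, wet_fraction=0.5):
        if not 0.0 <= wet_fraction <= 1.0:
            raise ValueError("wet_fraction must be in [0, 1]")
        self.transform = transform
        self.wet_fraction = wet_fraction

    def __call__(self, samples, sample_rate):
        wet = self.transform(samples=samples, sample_rate=sample_rate)
        n = samples.shape[-1]
        # Match the wet length to the dry length along the time axis
        if wet.shape[-1] < n:
            pad = [(0, 0)] * (wet.ndim - 1) + [(0, n - wet.shape[-1])]
            wet = np.pad(wet, pad)
        wet = wet[..., :n]
        mixed = (1.0 - self.wet_fraction) * samples + self.wet_fraction * wet
        return mixed.astype(samples.dtype)
```

Usage would be along the lines of `DryWetMix(some_transform, wet_fraction=0.3)(samples=audio, sample_rate=16000)`, where `some_transform` could be an ApplyImpulseResponse instance. The time-alignment caveat raised earlier in the thread still applies to whatever this wrapper mixes.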


2 participants