Unnatural pitch gliding in the converted voice #74

Open
TheTrustedComputer opened this issue Jul 19, 2024 · 2 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed), question (Further information is requested)

Comments

@TheTrustedComputer

Describe the bug
I'm not sure whether this is a feature or a bug, but I hear strange pitch glides from RVC models when pitch guidance is enabled during inference; real-time conversion is affected too. Interestingly, this quirk is virtually absent in RVC-Boss's repo.

To Reproduce

  1. Open the web GUI.
  2. Select any RVC model.
  3. Create an input audio file containing tones with a large pitch difference and convert it.
  4. Listen to the output audio; you'll hear the pitch glides.

Expected behavior
The converted voice shouldn't exhibit pitch gliding, or at least the gliding should be minimal.

Screenshots
Not applicable.

Desktop (please complete the following information):

  • OS and version: Arch Linux (version N/A)
  • Python version: 3.10.13
  • Commit/Tag with the issue: Latest

Additional context
Every pitch detection algorithm except FCPE tracked the tones poorly, with RMVPE being the worst offender. The hop length appears to be set too high.

@fumiama added the bug, help wanted, and question labels on Jul 24, 2024
@fumiama
Owner

fumiama commented Jul 24, 2024

Maybe this is because this repo uses a unified F0 extraction class, which introduces interpolation, but I'm not 100% sure. If you don't mind, could you share the audio you used so we can test with it? Thanks!
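
To illustrate the hypothesis above, here is a minimal sketch of how resampling a per-frame F0 track could turn an abrupt pitch jump into a glide. It is not code from this repo: the function names `resample_linear` and `resample_nearest`, the upsampling factor, and the four-frame track are all hypothetical, and the actual F0 class may behave differently.

```python
# Hypothetical illustration only; not taken from this repository's F0 code.


def resample_linear(f0, factor):
    """Upsample a per-frame F0 track by linear interpolation between frames."""
    out = []
    for i in range(len(f0) - 1):
        for k in range(factor):
            w = k / factor  # blend weight toward the next frame
            out.append((1 - w) * f0[i] + w * f0[i + 1])
    out.append(f0[-1])
    return out


def resample_nearest(f0, factor):
    """Upsample a per-frame F0 track by repeating the nearest source frame."""
    n_out = (len(f0) - 1) * factor + 1
    # Integer arithmetic rounds j / factor to the nearest index (halves round up).
    return [f0[(2 * j + factor) // (2 * factor)] for j in range(n_out)]


# A coarse track with an abrupt jump from 120 Hz to 960 Hz (assumed values).
coarse = [120.0, 120.0, 960.0, 960.0]

glided = resample_linear(coarse, 4)    # passes through intermediate pitches
stepped = resample_nearest(coarse, 4)  # only ever holds 120 Hz or 960 Hz
```

With linear interpolation the upsampled track contains pitches between the two tones, which would be heard as a glide; with nearest-frame repetition the jump stays abrupt. A coarser hop length would stretch the interpolated region over more time.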

@TheTrustedComputer
Author

https://drive.google.com/file/d/1EufMz1hMB5M_HMJoIuS7pB9wh8aslnyE/view
The audio contains sawtooth waves at 120 Hz and 960 Hz, each lasting half a second.
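
For anyone who can't access the file, a similar test signal can be generated with the standard library alone. This is only a sketch of the description above: the sample rate, amplitude, output file name, and the helper names `sawtooth`, `build_test_signal`, and `write_wav` are assumptions, and the waveform is a naive (non-band-limited) sawtooth that may differ from the linked audio.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # assumed; the sample rate of the linked file is not stated


def sawtooth(freq_hz, duration_s, sample_rate=SAMPLE_RATE, amplitude=0.5):
    """Return a naive sawtooth as floats in [-amplitude, amplitude)."""
    n = int(duration_s * sample_rate)
    samples = []
    for i in range(n):
        phase = math.fmod(i * freq_hz / sample_rate, 1.0)  # ramp from 0 to 1
        samples.append(amplitude * (2.0 * phase - 1.0))
    return samples


def build_test_signal():
    """Half a second at 120 Hz followed by half a second at 960 Hz."""
    return sawtooth(120.0, 0.5) + sawtooth(960.0, 0.5)


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write the samples as mono 16-bit PCM."""
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


if __name__ == "__main__":
    # Hypothetical output name; use the result as the conversion input.
    write_wav("pitch_jump_test.wav", build_test_signal())
```

Converting the resulting file with pitch guidance enabled should make any glide at the 120 Hz to 960 Hz boundary easy to hear.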
