
pulseaudio: too high quality resample-method
Closed, Resolved, Public


Please note the problem below.
In the file
I saw the following:
resample-method = soxr-mq
This is suitable for listening to music or other audio content, but not for blind users, because it causes significant delays. While typing, and while navigating quickly through a list of files, the speech synthesizer noticeably lags behind (that is, a significant amount of time passes between pressing a key and the synthesizer's response).
In the case of Espeak-NG I also notice a specific overtone (something close to wheezing).
I also connected a speech synthesizer to Speech-dispatcher using the sd_generic module (a module that lets you connect speech synthesizers that have only a command-line interface). I couldn't work with it because the endings of words simply started to disappear.
Eventually I commented out this line, and after that the problems disappeared.
Could you change the default resample-method? I already had experience setting up Pulseaudio, so I figured out what the problem was. But other blind users, especially Linux beginners, may not guess that Pulseaudio is the reason for this behavior.
I suggest using either the default resample-method or ffmpeg, although I'm not sure whether ffmpeg will suit absolutely all users.
I will be very grateful for your attention to this problem.
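For readers who want to reproduce the workaround described above (commenting out the resample-method line so that Pulseaudio falls back to its built-in default), here is a minimal Python sketch. The function name `comment_out_option` is hypothetical, and the sketch works on the configuration text rather than on a file, since the report does not name the file path. It assumes that a leading `;` marks a comment line in the Pulseaudio configuration format.

```python
import re


def comment_out_option(conf_text: str, option: str = "resample-method") -> str:
    """Return conf_text with every active `option = ...` line commented out.

    Assumption: a leading ';' marks a comment line in the config format.
    Lines that are already commented out are left untouched.
    """
    pattern = re.compile(
        r"^([ \t]*)(" + re.escape(option) + r"[ \t]*=.*)$",
        re.MULTILINE,
    )
    # Keep the original indentation and prefix the setting with '; '.
    return pattern.sub(r"\1; \2", conf_text)


example = "resample-method = soxr-mq\n"
print(comment_out_option(example), end="")
```

Pulseaudio has to be restarted before a changed setting takes effect.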

Event Timeline

bmivzkrp created this task. Oct 24 2021, 2:13 PM
JoshStrobl renamed this task from Solus-MATE: too high quality resample-method to pulseaudio: too high quality resample-method. Oct 24 2021, 5:02 PM
JoshStrobl triaged this task as Needs More Info priority. Oct 24 2021, 5:05 PM
JoshStrobl added a subscriber: JoshStrobl.

While I am not strictly opposed to switching to something else besides soxr-mq, you need to provide objective data that proves it is soxr-mq that is specifically causing this and not a symptom of any of your hardware or poor performance in other places like the synthesizer. Otherwise while the cause is noble, it would actively diminish a broader user experience of higher audio fidelity for a smaller subset of users.

@JoshStrobl This situation has long been known to everyone, which is why other distributions do not use such settings.
Speech synthesizers do nothing special. They simply generate an audio stream in Raw or Wav format and play it either through Pulseaudio or through a user-selected tool (such as Play from Sox).
When listening to long audio tracks this effect is not noticeable; it appears only when playing very short sound events.
Blind users depend on sound, so they have tried configuring audio in different distributions, and the effect was the same. As a rule, they therefore settled on ffmpeg. I know several such blind users.
I still suggest not simply siding with the majority, but finding a reasonable compromise that works for both blind and sighted users.
This is one hundred percent not the synthesizer, and not my hardware. But I can't show you a spectrogram of the audio output.
Please note the following detail: finding information on how to set up Pulseaudio is very simple, but it will be difficult for a blind user to understand what the problem is. They will try to contact the Orca mailing list or the Espeak-NG developers, get the answer that everything works fine for everyone, and decide that Solus somehow breaks accessibility.
That is, it is easier for a sighted person to set up Pulseaudio for themselves than for a blind user to find the cause of sound problems.

I'm gonna step in here and say this has nothing to do with the pulseaudio resampling method. Latency is very low there and we have always had this kind of delay with speech-dispatcher, even with the default speex resampling settings. There may be a reduction in that latency with speex, but it absolutely is still there and not the root cause in the slightest. I *specifically* researched all of this before we switched resamplers, and soxr-mq had markedly better quality without sacrificing huge CPU time.

DataDrake added a comment. Edited Oct 24 2021, 7:22 PM

I personally just tested both resamplers on Budgie and could not hear any noticeable latency apart from speech-dispatcher stuttering or delaying to start and that was the case with both. In both cases, once speech-dispatcher "warmed up", the responses were fluid.

Edit: This was with espeak from command line and orca while interacting with both firefox and a terminal.

@DataDrake First of all, thank you so much for taking such things into account. I assumed that it had not even been tested or researched (I mean the Accessibility aspect).

  1. I managed to get Espeak-NG to work. I can't describe exactly how I did it, but I restarted Pulseaudio a few times. That is, you are one hundred percent right.
  2. To play the raw audio stream that ru_tts generates, I use the Play utility. Your words made me investigate the problem in more detail. I found that the endings of words disappear only if the speech rate exceeds a certain limit. But this does not happen if the default resample-method is used.

In any case, I fear that users who are forced to use voices connected via sd_generic will run into this. It would be very good if you could investigate why this is happening. But in general, we can consider that you have convinced me. )
After some of my changes, the delays disappeared even for ru_tts.

OK. Sorry for the false alarm, and thank you for helping to clarify the situation.
Since I am the only one currently reporting this issue, I'm closing this task. I like Sox-based resampling techniques, so I slowed down the speech synthesizer to avoid sound distortion.
Also, I now see that Pulseaudio in Solus is configured better than in other distributions I've used. So please accept my most sincere apologies.

bmivzkrp closed this task as Invalid. Oct 24 2021, 8:52 PM
bmivzkrp added a comment. Edited Oct 28 2021, 3:39 PM

It seems I have solved the problem after all. To avoid sound distortion, you need to convert the Raw output to Wav and then play the sound with Paplay.
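The Raw-to-Wav step can be sketched with Python's standard `wave` module. This is only an illustration: the function name `raw_pcm_to_wav` is hypothetical, and the sample rate, channel count and sample width below are placeholder assumptions that must be replaced with whatever the synthesizer actually emits.

```python
import io
import wave


def raw_pcm_to_wav(pcm: bytes, sample_rate: int = 22050,
                   channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap headerless PCM samples in a WAV container and return the bytes.

    The defaults (22050 Hz, mono, 16-bit) are assumptions, not values
    taken from any particular synthesizer.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)             # header sizes are fixed up on close
    return buf.getvalue()


# Example: 0.1 s of silence, written out as a WAV file.
silence = b"\x00\x00" * 2205
with open("speech.wav", "wb") as out:
    out.write(raw_pcm_to_wav(silence))
```

The resulting `speech.wav` can then be passed to Paplay, as described above.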

bmivzkrp changed the task status from Invalid to Resolved. Oct 28 2021, 3:40 PM