Thursday, 15 December 2022

How to get complete fundamental frequency (f0) extraction with librosa.pyin in Python?

I am running librosa.pyin on a speech audio clip, and it doesn't seem to be extracting all the fundamentals (f0) from the first part of the recording.

librosa documentation: https://librosa.org/doc/main/generated/librosa.pyin.html

sr: 22050

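import librosa

# Assumes the linked quick_fox.wav is saved locally; librosa.load resamples to 22050 Hz by default
y, sr = librosa.load('quick_fox.wav')

fmin = librosa.note_to_hz('C0')
fmax = librosa.note_to_hz('C7')

f0, voiced_flag, voiced_probs = librosa.pyin(y,
                                             fmin=fmin,
                                             fmax=fmax,
                                             pad_mode='constant',
                                             n_thresholds=10,
                                             max_transition_rate=100,
                                             sr=sr)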
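
To make the search range in this call concrete, here is a minimal sketch in plain Python (librosa is not needed to run it). The helper `midi_to_hz` is hypothetical and written only for illustration. It assumes standard A4 = 440 Hz tuning, with C0 as MIDI note 12 and C7 as MIDI note 96, and it assumes the pyin default of `frame_length=2048`, since the call does not override it.

```python
def midi_to_hz(midi_number, a4_hz=440.0):
    """Equal-tempered frequency for a MIDI note number (A4 is MIDI 69)."""
    return a4_hz * 2.0 ** ((midi_number - 69) / 12.0)


sr = 22050           # sampling rate from the question
frame_length = 2048  # assumed: the librosa.pyin default

fmin = midi_to_hz(12)  # C0, the lower search bound
fmax = midi_to_hz(96)  # C7, the upper search bound

# Length of one full cycle, in samples, at each bound
longest_period = sr / fmin
shortest_period = sr / fmax

print(f"fmin = {fmin:.2f} Hz, fmax = {fmax:.2f} Hz")
print(f"period at fmin: {longest_period:.0f} samples "
      f"({longest_period / frame_length:.2f} of one frame)")
print(f"period at fmax: {shortest_period:.1f} samples")
```

The printout shows how many samples one cycle spans at each end of the requested range, which can be compared against the analysis frame length.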

Raw audio:

[Image: raw audio]

Spectrogram with fundamental tones, onsets, and onset strength. The first part of the recording has no fundamental tones extracted.

Link to audio file: https://jasonmhead.com/wp-content/uploads/2022/12/quick_fox.wav

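# Onset strength envelope used for the onset markers in the plot
o_env = librosa.onset.onset_strength(y=y, sr=sr)
times = librosa.times_like(o_env, sr=sr)
onset_frames = librosa.onset.onset_detect(onset_envelope=o_env, sr=sr)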
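
To pin down exactly where pitch was and was not detected, the voiced flags can be turned into time spans. This is a minimal sketch rather than part of librosa: `voiced_spans` is a hypothetical helper, and it assumes the default `hop_length=512` together with the sampling rate of 22050 Hz stated above. Frame indices are converted to seconds as `frame * hop_length / sr`, the same arithmetic `librosa.frames_to_time` uses.

```python
def voiced_spans(voiced_flag, sr=22050, hop_length=512):
    """Return (start_sec, end_sec) for each run of consecutive voiced frames.

    The end time is the time of the first unvoiced frame after the run
    (or of the frame just past the end of the sequence).
    """
    spans = []
    start = None
    for i, voiced in enumerate(voiced_flag):
        if voiced and start is None:
            start = i  # a voiced run begins here
        elif not voiced and start is not None:
            spans.append((start * hop_length / sr, i * hop_length / sr))
            start = None
    if start is not None:  # the sequence ended while still voiced
        spans.append((start * hop_length / sr,
                      len(voiced_flag) * hop_length / sr))
    return spans


# Toy flags standing in for the voiced_flag array returned by librosa.pyin
flags = [False, False, False, True, True, False, True]
spans = voiced_spans(flags)
for start_sec, end_sec in spans:
    print(f"voiced from {start_sec:.3f} s to {end_sec:.3f} s")
```

With real data, passing the `voiced_flag` array from the pyin call in place of `flags` lists the voiced regions, so the missing stretch at the start of the clip shows up as a late first span.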

[Image: spectrogram with fundamental tones, onsets, and onset strength]

Another view with power spectrogram:

[Image: power spectrogram]

I tried compressing the audio, but that didn't seem to work.

Are there parameters I can adjust, or audio pre-processing I can apply, so that fundamental tones are extracted from all words?

What factors affect how well fundamental frequency extraction works?


