I am running librosa.pyin on a speech audio clip, and it does not seem to extract the fundamental frequency (f0) for the first part of the recording.
librosa documentation: https://librosa.org/doc/main/generated/librosa.pyin.html
sr: 22050
import librosa

# y is the audio signal, loaded at the sample rate sr given above
fmin = librosa.note_to_hz('C0')
fmax = librosa.note_to_hz('C7')
f0, voiced_flag, voiced_probs = librosa.pyin(
    y,
    fmin=fmin,
    fmax=fmax,
    pad_mode='constant',
    n_thresholds=10,
    max_transition_rate=100,
    sr=sr,
)
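As a rough sanity check on these settings, the sketch below works out what the chosen fmin and fmax are in Hz and how many cycles of the lowest candidate pitch fit in one analysis frame. It is only back-of-the-envelope arithmetic, not a description of how pyin works internally. The helpers `c_note_hz` and `periods_per_frame` are hypothetical names made up for this illustration, the 2048-sample frame is the documented default `frame_length` of librosa.pyin, and the 65 Hz comparison value is an assumed lower bound for speech rather than anything measured from this clip.

```python
def c_note_hz(octave, a4=440.0):
    """Frequency in Hz of the note C in the given octave (equal temperament, A4 = 440 Hz)."""
    midi = 12 * (octave + 1)  # C0 is MIDI note 12, C7 is MIDI note 96
    return a4 * 2.0 ** ((midi - 69) / 12.0)


def periods_per_frame(f_hz, sr, frame_length):
    """Number of cycles (possibly fractional) of a tone at f_hz in one frame."""
    return frame_length * f_hz / sr


sr = 22050
frame_length = 2048  # documented default frame_length for librosa.pyin

fmin = c_note_hz(0)  # same note as librosa.note_to_hz('C0')
fmax = c_note_hz(7)  # same note as librosa.note_to_hz('C7')

cycles_at_fmin = periods_per_frame(fmin, sr, frame_length)
cycles_at_65hz = periods_per_frame(65.0, sr, frame_length)  # assumed speech lower bound

print(f"fmin = {fmin:.2f} Hz, fmax = {fmax:.2f} Hz")
print(f"cycles per frame at fmin: {cycles_at_fmin:.2f}")
print(f"cycles per frame at 65 Hz: {cycles_at_65hz:.2f}")
```

With these numbers a 2048-sample frame holds only about 1.5 cycles of C0, compared with about 6 cycles at 65 Hz, so the very wide search range may be one of the things worth narrowing when experimenting.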
Raw audio:
Spectrogram with fundamental tones, onsets, and onset strength; the first part has no fundamental tones extracted:
link to audio file: https://jasonmhead.com/wp-content/uploads/2022/12/quick_fox.wav
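# Assumption: o_env is the onset strength envelope; its definition was not included in the post
o_env = librosa.onset.onset_strength(y=y, sr=sr)
times = librosa.times_like(o_env, sr=sr)
onset_frames = librosa.onset.onset_detect(onset_envelope=o_env, sr=sr)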
Another view with power spectrogram:
I tried compressing the audio, but that didn't seem to work.
Any suggestions on what parameters I can adjust, or audio pre-processing that can be done to have fundamental tones extracted from all words?
What kinds of factors affect how well fundamental tone extraction works?
from How to get complete fundamental (f0) frequency extraction with python lib librosa.pyin?