I have about 30 sound clips, each a preset from a synthesizer. I want to compare these sounds to find out which ones are similar, and then sort them so that each sound sits adjacent in the list to 2 sounds that are similar to it. Frequency is not the only thing I want to look at: I would rather 2 saw waves a tone apart be considered similar than a saw wave and a sine wave playing the same note.
These sounds, for example, would be considered similar.
Below is some code I have written to separate a song clip into individual sounds. At the end I have an array named transientSamples which holds the ~30 sounds from the sound file. I would like to sort this list by sound similarity. This code works and I do not need help with it; I am wondering how to do the next step, which is sorting the list of sounds. If you answer with code, please explain your code and/or fill it with lots of comments. I am new to audio analysis and have little experience with the concepts described in the librosa documentation, but I feel like this page of functions on feature extraction will be useful here.
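A minimal sketch of one possible approach to the sorting step, assuming an (n, n) NumPy matrix dist of pairwise distances between the sounds is already available (smaller distance = more similar); the function name order_by_similarity is illustrative only, and one way to build such a distance matrix from librosa features is sketched near the end of this post:

import numpy as np

def order_by_similarity(dist):
    """Greedily chain sounds so that neighbours in the result are similar.

    dist is an (n, n) matrix of pairwise distances (smaller = more similar).
    Returns a list of indices into the original list of sounds.
    """
    n = dist.shape[0]
    order = [0]                      # start from an arbitrary sound
    remaining = set(range(1, n))
    while remaining:
        last = order[-1]
        # pick the unused sound closest to the most recently placed one
        nearest = min(remaining, key=lambda j: dist[last, j])
        order.append(nearest)
        remaining.remove(nearest)
    return order

This greedy chaining does not guarantee a globally optimal arrangement (that would be a travelling-salesman-style problem), but for ~30 clips it usually places each sound between reasonably similar neighbours.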
This is the sound file I am using
import librosa
import numpy as np
import os
import soundfile as sf
def transients_from_onsets(onset_samples):
    """Takes a list of onset positions (in samples) for an audio file and returns the list of start and stop positions for that audio file

    Args:
        onset_samples ([int]): Onset positions, in samples

    Returns:
        [(int, int)]: A list of (start, stop) positions for each sound change
    """
    starts = onset_samples[0:-1]
    stops = onset_samples[1:]
    transients = []
    for s in range(len(starts)):
        transients.append((starts[s], stops[s]))
    return transients

def transient_samples_from_times(transientTimes, y):
    """Slices the audio signal y into one array per (start, stop) pair."""
    transientSamples = []
    for (start, stop) in transientTimes:
        transientSamples.append(y[start:stop])
    return transientSamples

def transients_from_sound_file(fileName, sr=44100):
    """Takes the path to an audio file and returns the start/stop positions (in samples) and the audio slice for each detected sound

    Args:
        fileName (string): The path to an audio file
        sr (int, optional): The sample rate of the audio file. Defaults to 44100.

    Returns:
        ([(int, int)], [np.ndarray]): The (start, stop) positions and the corresponding audio slices
    """
    y, sr = librosa.load(fileName, sr=sr)
    # A constant-Q spectrogram drives the onset detection
    C = np.abs(librosa.cqt(y=y, sr=sr))
    o_env = librosa.onset.onset_strength(sr=sr, S=librosa.amplitude_to_db(C, ref=np.max))
    onset_frames = librosa.onset.onset_detect(onset_envelope=o_env, sr=sr)
    onset_samples = librosa.frames_to_samples(onset_frames)
    # Append the end of the signal so the last sound also gets a stop position
    onset_samples = np.concatenate([onset_samples, [len(y)]])
    transientTimes = transients_from_onsets(onset_samples)
    transientSamples = transient_samples_from_times(transientTimes, y)
    return transientTimes, transientSamples

def main():
    soundFile = "first-four-seconds.wav"
    transientTimes, transientSamples = transients_from_sound_file(soundFile)

I need this explained at a low level of abstraction.
I made spectrograms for all the sounds
S = [np.abs(librosa.stft(y)) for y in transientSamples[0:3]]
This is the spectrogram array. Since there are only 3, I would want the sound which sounds most like the other 2 to be in the middle. I don't know how to compare these arrays to each other (one possible approach is sketched after the feature code below).
[
array([[5.4624897e-01, 2.4768096e-01, 1.4556460e-01, ..., 5.7867712e-01,
4.5554334e-01, 5.4170746e-01],
[2.2452552e+00, 1.5994334e+00, 1.1147156e+00, ..., 2.4819484e-01,
3.9017826e-01, 2.6280621e-01],
[4.1671777e+00, 3.8175552e+00, 3.0202076e+00, ..., 6.3510722e-01,
6.0488397e-01, 8.2328826e-01],
...,
[4.5242737e-04, 2.0710655e-04, 2.2738833e-04, ..., 1.3031939e-04,
2.1871932e-04, 3.3146524e-04],
[5.5690703e-04, 3.5128437e-04, 3.1929201e-04, ..., 2.3604959e-04,
4.4018662e-04, 5.8832398e-04],
[5.9645117e-04, 2.4223674e-04, 3.1842748e-04, ..., 2.6871334e-04,
2.3753567e-04, 6.7172351e-04]], dtype=float32),
array([[5.28261960e-01, 3.40773761e-01, 3.64203081e-02, ...,
4.94968325e-01, 3.40660214e-02, 6.10241592e-02],
[1.47714257e+00, 8.36813867e-01, 6.81992233e-01, ...,
1.36160457e+00, 7.20932424e-01, 5.77513456e-01],
[2.13315511e+00, 2.37118530e+00, 2.42295027e+00, ...,
2.90257120e+00, 1.71677089e+00, 1.24927975e-01],
...,
[2.86732073e-04, 2.23182797e-04, 1.68425191e-04, ...,
4.66386628e-05, 9.50504182e-05, 1.01365811e-04],
[2.26811855e-04, 1.57420873e-04, 2.92457495e-04, ...,
7.80835835e-05, 1.15517389e-04, 9.55912183e-05],
[3.52372968e-04, 2.86036695e-04, 6.16093748e-05, ...,
7.29371095e-05, 8.64024987e-05, 1.62979442e-04]], dtype=float32),
array([[1.2992793e-01, 4.9643394e-02, 5.8306026e-01, ..., 8.2840703e-02,
2.5342130e-03, 2.2889185e-01],
[1.6738531e-01, 1.6829827e-01, 5.0865650e-01, ..., 3.2662424e-01,
3.1669825e-01, 4.0731630e-01],
[1.2706667e-01, 1.0827064e-01, 4.0436751e-01, ..., 1.7236206e-01,
3.3250493e-01, 5.8282650e-01],
...,
[3.9454570e-04, 2.7970120e-04, 6.7032655e-05, ..., 5.1820884e-04,
4.5072302e-04, 2.3226207e-04],
[7.6335639e-04, 4.5884529e-04, 1.4658972e-04, ..., 5.8171444e-04,
6.4832438e-04, 2.1206622e-04],
[8.9842471e-04, 4.6509632e-04, 1.3202771e-04, ..., 7.2093250e-04,
6.4046000e-04, 2.2259250e-04]], dtype=float32)
]

I also computed rms, mfcc, and centroids using:
rms = [librosa.feature.rms(S=s) for s in S]
centroids = [librosa.feature.spectral_centroid(y=y, sr=sr) for y in transientSamples]
mfccs = [librosa.feature.mfcc(y=y, sr=sr) for y in transientSamples]
But again, I don't know how to compare the arrays to each other.
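A minimal sketch of one common way to compare the arrays, assuming transientSamples from the code above: collapse each variable-length feature matrix into a fixed-length vector by averaging over its time frames (MFCC means are used here as a timbre descriptor), then measure distances between those vectors. The helper name feature_vector, the choice of 20 MFCCs, and SciPy's cosine distance are illustrative assumptions, not the only options.

import numpy as np
import librosa
from scipy.spatial.distance import cosine

sr = 44100  # same sample rate used when loading the file above

def feature_vector(y, sr):
    """Summarise one clip as a fixed-length vector.

    MFCCs describe the overall spectral shape (timbre) frame by frame;
    averaging over the frames removes the dependence on clip length,
    so clips of different durations become directly comparable.
    """
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # shape (20, n_frames)
    return mfcc.mean(axis=1)                            # shape (20,)

# One vector per extracted sound.
vectors = [feature_vector(y, sr) for y in transientSamples]

# Pairwise distance matrix: a small value means similar timbre.
n = len(vectors)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dist[i, j] = cosine(vectors[i], vectors[j])

# For the three sounds whose spectrograms are shown above: the sound whose
# total distance to the other two is smallest belongs in the middle.
middle = int(np.argmin(dist[:3, :3].sum(axis=1)))

The same dist matrix is what the order_by_similarity sketch near the top of this post expects, so order_by_similarity(dist) would give an ordering of all ~30 sounds in which neighbours are timbrally similar.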