Hemant Vishwakarma: How to extract all timestamps of badminton shot sound in an audio clip using Neural Networks?

Sunday, 27 November 2022

I am trying to find every instant in an audio file from a badminton match at which either player hits a shot. To do this, I have labelled a set of timestamps (in seconds) as positive (hit sound) or negative (no hit sound: commentary, crowd noise, etc.), like so:

shot_timestamps = [0, 6.5, 8, 11, 18.5, 23, 27, 29, 32, 37, 43.5, 47.5, 52, 55.5, 63, 66, 68, 72, 75, 79, 94.5, 96, 99, 105, 122, 115, 118.5, 122, 126, 130.5, 134, 140, 144, 147, 154, 158, 164, 174.5, 183, 186, 190, 199, 238, 250, 253, 261, 267, 269, 270, 274]
shot_labels = ['no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no', 'no', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'no']
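
A quick sanity check on these two lists can be sketched as follows. It is only an illustration: the names `y`, `label_counts` and `out_of_order` are assumptions introduced here and are not part of the original code. It confirms that the lists have the same length, converts the labels to 0/1 targets, counts each class, and flags timestamps that are not in ascending order.

```python
from collections import Counter

shot_timestamps = [0, 6.5, 8, 11, 18.5, 23, 27, 29, 32, 37, 43.5, 47.5, 52, 55.5, 63, 66, 68, 72, 75, 79, 94.5, 96, 99, 105, 122, 115, 118.5, 122, 126, 130.5, 134, 140, 144, 147, 154, 158, 164, 174.5, 183, 186, 190, 199, 238, 250, 253, 261, 267, 269, 270, 274]
shot_labels = ['no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no', 'no', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'no']

# Every timestamp needs exactly one label
assert len(shot_timestamps) == len(shot_labels)

# Binary targets for the classifier: 1 = hit sound, 0 = no hit sound
y = [1 if label == 'yes' else 0 for label in shot_labels]

# Class balance: with two roughly equal classes, ~50% accuracy is chance level
label_counts = Counter(shot_labels)

# Adjacent pairs where the later entry is smaller than the earlier one
out_of_order = [(a, b) for a, b in zip(shot_timestamps, shot_timestamps[1:]) if b < a]

print(label_counts)
print(out_of_order)
```

On the lists as posted, the last check flags the pair 122, 115, which suggests that one of those entries may be mistyped.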

I have been taking 1-second windows starting at each of these timestamps, like so:

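import math
from scipy.io import wavfile

# `source` holds the path to the WAV file here and is rebound to the sample array
rate, source = wavfile.read(source)

def get_audio_snippets(shot_timestamps):
    shot_snippets = []  # One audio snippet per timestamp above
    last_index = source.shape[0] - 1

    for timestamp in shot_timestamps:
        # Each window starts at the timestamp and lasts 1 second.
        # Both indices are clamped to the last sample, so windows near the
        # end of the file come out shorter than 1 second (or even empty).
        start = min(math.ceil(timestamp * rate), last_index)
        end = min(math.ceil((timestamp + 1) * rate), last_index)
        shot_snippets.append(source[start:end])

    return shot_snippets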
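
One possible variant, shown here only as a sketch, centers each window on its timestamp and zero-pads at the file boundaries so that every snippet has exactly the same length, which fixed-size spectrogram inputs usually require. The function name `get_centered_snippets` and the `window_s` parameter are assumptions made for illustration; this is not the code used above.

```python
import numpy as np

def get_centered_snippets(samples, rate, timestamps, window_s=1.0):
    """Return one fixed-length snippet per timestamp, centered on it."""
    n = int(round(window_s * rate))  # samples per window
    half = n // 2
    snippets = []
    for t in timestamps:
        center = int(round(t * rate))
        start = center - half
        end = start + n
        # Clip the requested range to the samples that actually exist
        lo, hi = max(start, 0), min(end, len(samples))
        # Start from silence, then copy the valid part into place
        snippet = np.zeros((n,) + samples.shape[1:], dtype=samples.dtype)
        if hi > lo:
            snippet[lo - start:hi - start] = samples[lo:hi]
        snippets.append(snippet)
    return snippets

# Tiny synthetic example: 10 samples at 4 Hz, 1-second (4-sample) windows
demo = get_centered_snippets(np.arange(10), rate=4, timestamps=[0, 1, 2.5])
print([s.tolist() for s in demo])
```

Centering matters if the labelled time marks the hit itself: a window that starts at the timestamp may cut off the onset of the sound when the label is slightly late.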

I then convert each snippet into a spectrogram image for the model. The model doesn't seem to be learning anything: its accuracy stays around 50%, which is roughly chance level for two classes. What can I do to improve the model?
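
For reference, the spectrogram step could be sketched roughly as below, assuming a plain NumPy short-time Fourier transform with a Hann window. The function name `log_spectrogram` and the `frame_len`, `hop` and `eps` values are illustrative assumptions, not the settings used for the model described above.

```python
import numpy as np

def log_spectrogram(snippet, frame_len=1024, hop=256, eps=1e-10):
    """Return a log-power spectrogram of shape (frame_len // 2 + 1, n_frames)."""
    x = np.asarray(snippet, dtype=np.float64)
    if x.ndim == 2:
        # Stereo input: average the channels to get a mono signal
        x = x.mean(axis=1)
    if len(x) < frame_len:
        # Pad very short snippets so at least one frame fits
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping, windowed frames
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame, then decibels (eps avoids log(0))
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + eps).T

# Synthetic check: a 1 kHz tone sampled at 8 kHz for one second
rate = 8000
t = np.arange(rate) / rate
spec = log_spectrogram(np.sin(2 * np.pi * 1000 * t), frame_len=256, hop=128)
print(spec.shape)
```

A log scale is used here because the racket impact is short and broadband, and raw power values are dominated by the loudest frames. Whether this representation suits the model is an open question.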
