audio - Detecting Small Sound Effects In C++
I'm trying to detect small sound effects (1-3 seconds in length). I'm using FMOD to capture the sounds (which play in the program) using the loopback technique.
I've been researching this for the past few days: how can I compare a captured sound effect against the database of 50 that I have stored? I know comparing each binary byte won't work, because slight interference will change it. The sounds are the exact same audio files being captured each time, so their characteristics should be dead on every time.
I can't use the fingerprinting libraries out there, because they require recording at least 10-90 seconds of audio.
As the sounds are short and few in number, I'm guessing one of the gurus out there knows a simple solution. I wanted to try using an FFT and comparing the frequencies etc., but I can't get the KISS FFT library working, as there are absolutely no docs.
I've also created a function to split the channels. Here it is:
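// Splits an interleaved stereo FMOD sound into left/right byte buffers.
// Assumes 16-bit stereo PCM, i.e. 4 bytes per frame: L(2 bytes) R(2 bytes).
int seperatechannels(FMOD::Sound *sound)
{
    void *ptr1 = nullptr, *ptr2 = nullptr;
    unsigned int lenbytes = 0, len1 = 0, len2 = 0;

    sound->getLength(&lenbytes, FMOD_TIMEUNIT_PCMBYTES);
    sound->lock(0, lenbytes, &ptr1, &ptr2, &len1, &len2);

    const unsigned char *src = static_cast<const unsigned char *>(ptr1);
    unsigned char *bufferleft  = new unsigned char[len1 / 2];
    unsigned char *bufferright = new unsigned char[len1 / 2];

    // i walks the interleaved source (4 bytes per frame),
    // j walks each output buffer (2 bytes per frame).
    for (unsigned int i = 0, j = 0; i + 3 < len1; i += 4, j += 2)
    {
        bufferleft[j]      = src[i];
        bufferleft[j + 1]  = src[i + 1];
        bufferright[j]     = src[i + 2];
        bufferright[j + 1] = src[i + 3];
    }

    sound->unlock(ptr1, ptr2, len1, len2);

    // kiss fft????

    delete[] bufferleft;
    delete[] bufferright;
    return 1;
}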
Any help is appreciated. -que
If the problem is to determine which of a pre-defined set of sounds has been recorded, I can think of two options: "compare" the recording against each of the sounds in the database, or perform a "lookup" based on general characteristics of the sound (usually called "descriptors" in the audio analysis literature). By descriptors I'm thinking of things like the spectral centroid.
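As an illustration of a descriptor, here is a minimal sketch of the spectral centroid: the magnitude-weighted mean frequency of one frame's spectrum. It assumes you already have the magnitude spectrum (bins 0..fftSize/2) from some FFT; the function name and parameters are hypothetical, not part of FMOD or KISS FFT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Spectral centroid: sum(f_k * |X_k|) / sum(|X_k|).
// `magnitudes` holds bins 0..fftSize/2 of a magnitude spectrum;
// bin k corresponds to k * sampleRate / fftSize Hz.
double spectralCentroid(const std::vector<double>& magnitudes,
                        double sampleRate, std::size_t fftSize)
{
    assert(fftSize > 0 && sampleRate > 0.0);
    double weighted = 0.0;
    double total = 0.0;
    for (std::size_t k = 0; k < magnitudes.size(); ++k) {
        const double freq = static_cast<double>(k) * sampleRate / static_cast<double>(fftSize);
        weighted += freq * magnitudes[k];
        total += magnitudes[k];
    }
    return total > 0.0 ? weighted / total : 0.0;  // silent frame -> 0 Hz
}
```

A bright sound (energy in high bins) gets a high centroid, a dull one a low centroid, so one number already separates many effects.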
For the "compare" case you could work either in the time domain using correlation, or in the frequency domain by computing a spectral magnitude difference. For the time-domain comparison you need to perform the correlation at multiple offsets, since you don't know where the sound starts. In the frequency-domain case you need to convert the raw FFT data into some kind of spectral envelope, e.g. take the average of the magnitude spectra of a set of (windowed) overlapping frames.
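The time-domain "compare" can be sketched as a brute-force normalized cross-correlation: slide the stored effect over the captured signal and score every offset. This assumes both signals are mono floats at the same sample rate; `bestCorrelation` and `Match` are hypothetical names. Normalizing by the energies makes the score insensitive to playback volume.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Match {
    std::size_t offset;  // sample offset in `haystack` where `needle` fits best
    double score;        // normalized correlation in [-1, 1]; 1 = same shape
};

// Brute force O(N*M): fine for 1-3 s clips; an FFT-based correlation is faster.
Match bestCorrelation(const std::vector<float>& haystack,
                      const std::vector<float>& needle)
{
    assert(!needle.empty() && haystack.size() >= needle.size());
    double needleEnergy = 0.0;
    for (float s : needle) needleEnergy += double(s) * s;

    Match best{0, -2.0};
    for (std::size_t off = 0; off + needle.size() <= haystack.size(); ++off) {
        double dot = 0.0, windowEnergy = 0.0;
        for (std::size_t i = 0; i < needle.size(); ++i) {
            const double h = haystack[off + i];
            dot += h * needle[i];
            windowEnergy += h * h;
        }
        const double denom = std::sqrt(needleEnergy * windowEnergy);
        const double score = denom > 0.0 ? dot / denom : 0.0;
        if (score > best.score) best = {off, score};
    }
    return best;
}
```

Run it once per database entry and pick the entry with the highest score; a threshold (say, above 0.9) can reject captures that match nothing.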
For the "lookup" case you compute a set of descriptors on the corpus and on the candidate input, and then find the element of the corpus whose descriptors are closest to those computed from the input. You can also do this on a sequence of frames: perform the same kind of correlation analysis as in the time-domain "compare" case, but instead of computing the difference of each sample, compute the difference of each descriptor. This should work better for comparing evolving sounds than using a single descriptor.
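A minimal sketch of the lookup step, assuming each stored sound has already been reduced offline to a fixed-length descriptor vector (e.g. centroid plus RMS level). `Entry` and `nearestEntry` are hypothetical; the distance is plain squared Euclidean.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// One database entry: a name plus its precomputed descriptor vector.
struct Entry {
    std::string name;
    std::vector<double> descriptors;
};

// Index of the corpus entry whose descriptors are closest
// (squared Euclidean distance) to the descriptors of the captured input.
std::size_t nearestEntry(const std::vector<Entry>& corpus,
                         const std::vector<double>& query)
{
    assert(!corpus.empty());
    std::size_t bestIndex = 0;
    double bestDist = std::numeric_limits<double>::infinity();
    for (std::size_t e = 0; e < corpus.size(); ++e) {
        const auto& d = corpus[e].descriptors;
        assert(d.size() == query.size());
        double dist = 0.0;
        for (std::size_t i = 0; i < d.size(); ++i) {
            const double diff = d[i] - query[i];
            dist += diff * diff;
        }
        if (dist < bestDist) { bestDist = dist; bestIndex = e; }
    }
    return bestIndex;
}
```

In practice you would scale each descriptor to a similar range first; otherwise a descriptor measured in Hz swamps one measured in [0, 1].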
If you intend to use an FFT you need to work out not only how to apply the FFT, but also how to compute magnitude spectra, and have some idea of the data structures you're dealing with. Getting a result requires a number of steps beyond performing the FFT. There are also a bunch of ways the matching can be optimised, especially if the sound set is fixed (I'm thinking of group testing approaches, for example).
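The steps beyond "performing the FFT" can be sketched as follows: window each overlapping frame, take its magnitude spectrum, and average the spectra into an envelope. To stay self-contained this uses a naive O(N^2) DFT as a stand-in for a real FFT library such as KISS FFT; it gives the same magnitudes, just more slowly. Function names, the Hann window and the frame/hop parameters are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude spectrum (bins 0..N/2) of one frame via a naive DFT.
std::vector<double> magnitudeSpectrum(const std::vector<double>& frame)
{
    const std::size_t n = frame.size();
    const double pi = std::acos(-1.0);
    std::vector<double> mags(n / 2 + 1);
    for (std::size_t k = 0; k <= n / 2; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t)
            sum += frame[t] * std::polar(1.0, -2.0 * pi * double(k) * double(t) / double(n));
        mags[k] = std::abs(sum);
    }
    return mags;
}

// Spectral envelope: average magnitude spectrum of Hann-windowed frames of
// `frameSize` samples, advancing `hop` samples per frame (hop < frameSize overlaps).
std::vector<double> averageSpectrum(const std::vector<double>& mono,
                                    std::size_t frameSize, std::size_t hop)
{
    assert(frameSize > 1 && hop > 0 && mono.size() >= frameSize);
    const double pi = std::acos(-1.0);
    std::vector<double> avg(frameSize / 2 + 1, 0.0);
    std::vector<double> frame(frameSize);
    std::size_t frames = 0;
    for (std::size_t start = 0; start + frameSize <= mono.size(); start += hop) {
        for (std::size_t i = 0; i < frameSize; ++i) {
            const double w = 0.5 - 0.5 * std::cos(2.0 * pi * double(i) / double(frameSize - 1));
            frame[i] = mono[start + i] * w;  // Hann window reduces spectral leakage
        }
        const std::vector<double> mags = magnitudeSpectrum(frame);
        for (std::size_t k = 0; k < avg.size(); ++k) avg[k] += mags[k];
        ++frames;
    }
    for (double& v : avg) v /= double(frames);
    return avg;
}
```

Two envelopes of the same length can then be compared bin by bin, for example by summing absolute differences.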
For a simpler approach, look at the way DTMF touch-tone decoding is done. By performing a pre-analysis of the source sounds you might be able to determine a set of non-overlapping frequencies that can be used to fingerprint each sound.
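DTMF decoders commonly measure the power at a handful of fixed frequencies with the Goertzel algorithm, which is cheaper than a full FFT when only a few bins matter. A minimal sketch, assuming mono double samples; `matchesFingerprint`, the per-sound frequency list and the threshold are hypothetical and would come from your own pre-analysis.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Goertzel algorithm: power |X_k|^2 of `samples` at the DFT bin nearest `targetHz`.
double goertzelPower(const std::vector<double>& samples,
                     double targetHz, double sampleRate)
{
    assert(!samples.empty() && sampleRate > 0.0);
    const double pi = std::acos(-1.0);
    const double n = static_cast<double>(samples.size());
    const double k = std::round(n * targetHz / sampleRate);  // nearest bin
    const double coeff = 2.0 * std::cos(2.0 * pi * k / n);
    double s1 = 0.0, s2 = 0.0;
    for (double x : samples) {
        const double s0 = x + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// True if every signature frequency chosen for one sound carries at least
// `threshold` power in the captured block.
bool matchesFingerprint(const std::vector<double>& samples, double sampleRate,
                        const std::vector<double>& signatureHz, double threshold)
{
    for (double hz : signatureHz)
        if (goertzelPower(samples, hz, sampleRate) < threshold) return false;
    return true;
}
```

Because the frequencies are chosen not to overlap between sounds, each test is independent, which is what makes the DTMF-style approach simple.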
In all cases I'd work in mono by summing the left and right channels. Stereo won't give you much unless you're sure the input has the same panning as the output.
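Instead of splitting the channels, the locked buffer can be downmixed straight to mono. This sketch assumes the FMOD sound is 16-bit signed interleaved PCM (L, R, L, R, ...); it sums the two channels and halves the result so it stays in [-1, 1). `stereo16ToMono` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// `frames` is the number of L/R sample pairs in `interleaved`.
std::vector<float> stereo16ToMono(const std::int16_t* interleaved, std::size_t frames)
{
    assert(interleaved != nullptr || frames == 0);
    std::vector<float> mono(frames);
    for (std::size_t i = 0; i < frames; ++i) {
        const int sum = int(interleaved[2 * i]) + int(interleaved[2 * i + 1]);
        mono[i] = static_cast<float>(sum) / (2.0f * 32768.0f);  // average, then scale
    }
    return mono;
}
```

With the buffer from `Sound::lock`, the call would be `stereo16ToMono(reinterpret_cast<const std::int16_t*>(ptr1), len1 / 4)`, since each stereo 16-bit frame is 4 bytes.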