How to choose an audio fingerprint algorithm to create a cooperative music database?


I need to create a cooperative music identification service. Every user will have the option to fingerprint a song and send the fingerprint to the server together with the song's metadata. At the beginning the service database will be empty, and every time a fingerprint is received the metadata for that song will be updated (if different users send different information for the same fingerprint, the server will assign the metadata by majority choice).
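A minimal sketch of the majority-choice step, assuming fingerprints can be used as exact dictionary keys and metadata arrives as a hashable tuple. The class name `MetadataStore` and its methods are hypothetical, chosen only for illustration:

```python
from collections import Counter, defaultdict


class MetadataStore:
    """Hypothetical in-memory store mapping a fingerprint to metadata votes."""

    def __init__(self):
        # fingerprint -> Counter of submitted metadata tuples
        self._votes = defaultdict(Counter)

    def submit(self, fingerprint, metadata):
        """Record one user's submission, e.g. metadata = (title, artist)."""
        self._votes[fingerprint][metadata] += 1

    def lookup(self, fingerprint):
        """Return the most frequently submitted metadata, or None if unknown."""
        votes = self._votes.get(fingerprint)
        if not votes:
            return None
        # On a tie, most_common keeps the metadata that was submitted first.
        return votes.most_common(1)[0][0]


store = MetadataStore()
store.submit("fp-1", ("Song A", "Artist X"))
store.submit("fp-1", ("Song A", "Artist X"))
store.submit("fp-1", ("Sogn A", "Artist X"))  # a user's typo is outvoted
print(store.lookup("fp-1"))
```

This only works as written if the same song always yields an identical fingerprint. If fingerprints are merely similar, the lookup would need a nearest-match step first.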

I need to calculate a fingerprint for the whole song; I do not need to identify a song from just a fraction of it.

The fingerprint does not need to be 100% accurate. It is enough if two files receive the same fingerprint when they are the same song encoded at different compression rates. A small degree of robustness to noise would be a plus.

Silence at the beginning or the end of the song will not be a problem: I will remove it with a standard silence-suppression algorithm (and here too I do not need a very precise result).
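A rough sketch of the silence removal meant here, assuming the audio is already decoded to a list of samples normalized to the range -1.0 to 1.0. The function name and the default threshold are illustrative assumptions, not a standard algorithm's parameters:

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose absolute amplitude is below threshold."""
    start = 0
    end = len(samples)
    # Advance past quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back past quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


print(trim_silence([0.0, 0.001, 0.5, -0.3, 0.0, 0.2, 0.0]))
```

Quiet samples inside the song are kept; only the edges are trimmed. A more robust version would probably look at the energy of short windows rather than single samples.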

I know there are some open-source libraries, such as http://echoprint.me/ and https://acoustid.org/, but they are excessive for my needs: if I understood correctly, they can identify a song from just a part of it, which would make the database heavy. I need an algorithm that gives me a reasonably small fingerprint (a few kB) for the whole song.

Which is the simplest and fastest algorithm I can use?

Thanks to all


There is 1 answer below

Gfy

I suggest you use the AcoustID project. Your description matches it on many points; only some of its approaches differ from what you propose. In particular, it does not identify songs from short snippets, as its FAQ states:

Can the service identify short audio snippets?

No, it can't. The service has been designed for identifying full audio files. We would like to eventually support also this use case, but it's not a priority at the moment. Note that even when this will be implemented, it will be still intended for matching the original audio (e.g. for the purpose of tracklisting a long audio stream), not audio with background noise recorded on a phone.

Have a look at their mailing list for some better explanations: https://groups.google.com/forum/#!forum/acoustid
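As a rough illustration of comparing two whole-file fingerprints: Chromaprint, the library behind AcoustID, produces raw fingerprints that are sequences of 32-bit integers, and two of them can be compared by counting differing bits. The sketch below assumes the fingerprints are already decoded to lists of integers and aligned at the start. The function names and the `max_error` threshold are assumptions for illustration, not values taken from AcoustID:

```python
def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits over the overlapping part of two fingerprints."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 1.0  # nothing to compare, treat as a complete mismatch
    # XOR leaves a 1 bit wherever the two 32-bit items disagree.
    differing = sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b)
    )
    return differing / (32 * n)


def same_recording(fp_a, fp_b, max_error=0.15):
    """Hypothetical decision rule; max_error would need tuning on real data."""
    return bit_error_rate(fp_a, fp_b) <= max_error


original = [0x12345678, 0x0F0F0F0F, 0xFFFF0000]
reencoded = [0x12345679, 0x0F0F0F0F, 0xFFFF0000]  # one bit flipped
print(bit_error_rate(original, reencoded), same_recording(original, reencoded))
```

A tolerance like this is what lets the same song at different compression rates still match, since the fingerprints will rarely be bit-for-bit identical.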