How do you find music that you love?
Music discovery involves many variables, each of which can take values from an enormous range.
Take genre, for example. As seen in the awesome exercise Every Noise At Once by The Echo Nest’s principal engineer Glenn McDonald, there are well over 1,400 genres to select from. Not only that: as McDonald explains, it’s not easy even for a group of people to agree on how a genre like, say, “Rock” is defined, or which songs belong in a Rock playlist. By using intelligent algorithms, they have managed to make their system produce very reasonable output.
Besides genre, many other dimensions can be used for music discovery: artist, location, moment in time, influences, popularity, mood, activity, user preferences, social metrics, acoustics, and others.
At Stereotheque, we believe the concept of music scenes can help us reduce the dimensionality of the music exploration space and highlight certain features that are valuable for music lovers. A music scene can serve as a starting point from which to draw out connections and measure similarity using some of the other variables.
From Sounds To Notes
In this post series I will explore only one of these variables, namely acoustics, going from sound physics to chord recognition. The intention is to express these technical concepts in a simple way that can help us understand how more complex systems work. We will see that each variable can have many sub-dimensions in itself, which add complexity but also give us more options for designing custom search and recommendation algorithms.
Music can have many definitions but a somewhat reasonable one is structured sound.
When a guitar string is plucked, it vibrates at a certain frequency or speed. That vibration travels through the air to our ears, which turn it into nerve impulses; these are then processed by the brain, letting us experience the subjective sensation of hearing.
The actual physical vibration of the strings and the surrounding air can be measured, recorded, and analyzed. The figure below shows the vibration over time. (Click on the Guitar – 82Hz – E2 button)
These vibrations can be decomposed into simpler ones called sine waves. In other words, the sound of the guitar string, or any other sound, can be thought of as many simple sine waves “sounding” at the same time. Furthermore, if you had at your disposal an infinite number of tunable sine wave generators, you could compose any imaginable sound by switching them on at certain times and adjusting their volume and speed of vibration (phase, amplitude, and frequency). This is the basis for additive sound synthesis, and an illustration can be seen in the image below.
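If you want to play with additive synthesis in code, here is a minimal sketch in Python using NumPy. The partial frequencies and amplitudes are just illustrative values (the first three harmonics of the low E string), not taken from any real guitar recording:

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second, a standard audio rate

def sine_wave(frequency, amplitude, phase, duration):
    """Generate one sine wave from the three parameters mentioned above."""
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    return amplitude * np.sin(2 * np.pi * frequency * t + phase)

# Additive synthesis: sum the output of several "tuned generators".
# (frequency in Hz, amplitude, phase) -- illustrative values only.
partials = [(82.41, 1.0, 0.0), (164.82, 0.5, 0.0), (247.23, 0.25, 0.0)]
mix = sum(sine_wave(f, a, p, 1.0) for f, a, p in partials)
```

Writing `mix` to a WAV file or an audio buffer would let you hear the combined tone; each tuple plays the role of one sine wave generator.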
How does a sine wave sound?
This is how a sine wave sounds and looks. You can vary the amplitude and frequency parameters mentioned before.
To illustrate the sound decomposition idea even further, try to compose a “square wave” sound using only 3 harmonics. (Click on the circles below)
The Frequency Domain
Amplitude is directly related to the perception of volume, and frequency to the perception of pitch.
If you set the sine wave frequency to 440 Hz, you will notice that the pitch corresponds to the musical note A above middle C, or A4.
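The mapping between frequency and note name can be made precise with the standard 12-tone equal temperament formula, where each semitone multiplies the frequency by 2^(1/12) and A4 is pinned to 440 Hz. A small sketch (the helper name is my own, not from any library):

```python
import math

def frequency_to_note(freq, a4=440.0):
    """Map a frequency in Hz to the nearest equal-tempered note name."""
    names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    # MIDI convention: note 69 is A4; 12 semitones per octave.
    midi = round(69 + 12 * math.log2(freq / a4))
    return names[midi % 12] + str(midi // 12 - 1)
```

For example, `frequency_to_note(440.0)` gives `'A4'` and `frequency_to_note(82.41)` gives `'E2'`, the low E guitar string from the earlier figure.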
You may ask yourself why, in the previous square wave example, the sound is still perceived as a single pitch even though it is made up of sine waves with different frequencies. The reason is that a sound associated with a pitch (as opposed to percussive or other non-pitched sounds) is made up of sines whose frequencies are integer multiples of a main frequency, called the fundamental frequency, which is normally the perceived pitch. You can actually see this in the graph because the waveform repeats itself in time. In the real world, a pitched sound can also have components that are not multiples of the fundamental frequency, but these have low amplitudes and thus contribute less to the waveform.
Up until now we have visualized waves with the horizontal axis representing time and the vertical axis representing amplitude, but we can also set the horizontal axis to represent frequency. In the case of a pure sine wave, since it vibrates at a single specified frequency, it should take the form of an impulse function, as shown below, where all the energy is concentrated at one point.
In the real world, impulse functions don’t exist and sine waves have a limited duration, so the visualization in the frequency domain looks a little spread out, as shown below. It is clear that there is a peak around a narrow frequency band, 329 Hz by default.
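This time-to-frequency view is exactly what the Fast Fourier Transform computes. Here is a minimal sketch with NumPy that builds a 329 Hz sine wave (the default frequency from the demo above) and finds the peak of its spectrum:

```python
import numpy as np

SAMPLE_RATE = 44100
freq = 329.0  # Hz, matching the demo's default

# One second of a pure sine wave.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
signal = np.sin(2 * np.pi * freq * t)

# The real FFT gives the magnitude spectrum; rfftfreq maps bins to Hz.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / SAMPLE_RATE)

peak_hz = freqs[np.argmax(spectrum)]  # the strongest frequency component
```

With a one-second window the frequency resolution is 1 Hz, so `peak_hz` lands right on 329 Hz; shorter windows spread the peak out, just like the figure shows.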
Now let’s see how the original guitar string E4 is decomposed into sine components.
As expected, the highest peak in both sounds lies in the same frequency band, but for the guitar we can also see higher-order harmonics at frequencies that are multiples of 329 Hz.
Now that we know that any sound can be generated from a series of sines with different frequencies and amplitudes, and that we can visualize sound as a function of frequency instead of time, we can start exploring how this relates to auditory perception and musical elements.