ART 103
Art as System - Advanced Projects


Aaron Siegel
Project One - Audio Prototyping


Going into this project, my goal was a seamless interface that turns vocalized percussive beats and basic tones into a dynamically workable data form for transcription and resynthesis. This is a personal goal of mine: to extend my own musical scope from my more natural beatboxing and vocalization abilities to something that can be overlapped and even associated with predetermined samples. By creating profiles of the characteristics of audio events (an audio event being an instance in time, its characteristics being the major frequencies and amplitudes over that time), each event can trigger audio samples or an ideal resynthesis. This is what gave me the idea to call it audio prototyping.

In order to accomplish the goal of audio prototyping, there are several phases to undertake: raw input analysis, input stripping, transcription protocol, and a choice between characteristic association and resynthesis of key characteristics.

Below is a time-based spectrum frequency analysis visualization. The horizontal axis records instances in time where audio events occur; the vertical axis runs from 0 Hz at the bottom to 20,000 Hz at the top. Finally, the amplitude of audio events is recorded in color, blue being lowest and red being highest. The horizontal parallel lines are multi-frequency harmonic tones generated by my voice. Note how they stay below the halfway mark, 10,000 Hz; the human voice produces most of its range (at least in normal speech) within 0-1,000 Hz.
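For reference, a plot like this can be built from nothing more than short-window FFTs stacked side by side. Below is a minimal sketch, assuming a mono 16-bit PCM wave file, with matplotlib standing in for the dislin color plot used in the project (the filename is hypothetical):

import wave
import numpy as np
import matplotlib.pyplot as plt  # stand-in for dislin's 3D color plot

# Hypothetical input file; any mono 16-bit PCM wave file will do.
wav = wave.open("vocal_tones.wav", "rb")
rate = wav.getframerate()
frames = wav.readframes(wav.getnframes())
samples = np.frombuffer(frames, dtype=np.int16).astype(np.float64)
wav.close()

# Slice the signal into short windows and take the FFT magnitude of each;
# stacking the columns gives the time-by-frequency image described above.
window = 1024
hop = 512
columns = []
for start in range(0, len(samples) - window, hop):
    chunk = samples[start:start + window] * np.hanning(window)
    columns.append(np.abs(np.fft.rfft(chunk)))
spectrogram = np.array(columns).T  # rows = frequency bins, cols = time

# A log scale keeps the quiet harmonics visible; "jet" maps low
# amplitudes to blue and high amplitudes to red, as described above.
plt.imshow(np.log1p(spectrogram), origin="lower", aspect="auto",
           extent=[0, len(samples) / rate, 0, rate / 2], cmap="jet")
plt.xlabel("time (s)")
plt.ylabel("frequency (Hz)")
plt.show()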



Below is another time-based spectrum frequency analysis in the same range. This is a recording of me performing a fairly simple beatboxing pattern. There were three "instruments" I used: a snare sound (using the back of my tongue on the roof of my mouth to make a "KA" sound), a bass drum (using my lips and a quick release of air to make a "PUH" sound), and a vocalized bass drum (which uses the same "PUH" technique but also produces a low tone in the voice box to provide a subtle melody). All of these distinct sounds can be spotted in this visualization. The snare is the easiest, since it creates a white-noise sound that spreads energy across the whole spectrum (as seen by the four equally spaced long yellow lines). The un-vocalized bass drum sounds show up as thin green spikes on the lower half of the visualization. The vocalized versions have the same height, but are stretched out much further and dissipate in intensity until they disappear.
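Those spectral signatures suggest a simple numerical handle on the instruments: the snare spreads its energy across the whole spectrum, while both bass drums concentrate it low. A sketch of that idea, assuming a per-event magnitude spectrum like the columns computed above (the 2,000 Hz split and 0.4 threshold are illustrative guesses, not measured values):

import numpy as np

def band_energy_ratio(spectrum, rate, split_hz=2000.0):
    # Fraction of spectral energy above split_hz for one event's
    # magnitude spectrum (as produced by np.fft.rfft).
    freqs = np.linspace(0, rate / 2, len(spectrum))
    total = np.sum(spectrum ** 2)
    if total == 0:
        return 0.0
    return np.sum(spectrum[freqs >= split_hz] ** 2) / total

def label_event(spectrum, rate):
    # A broadband "KA" snare spreads energy across the spectrum, so its
    # high-band ratio is large; both bass drums keep their energy low.
    return "snare" if band_energy_ratio(spectrum, rate) > 0.4 else "bass drum"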

This is where the genius part comes in, and this is where I fell short on time. In order to analyze the data being shown, one must construct an algorithm for (a) detecting audio events by amplitude, (b) recording the key characteristics of each event's major frequencies by their energy levels in decibels, and (c) associating similar, but not identical, occurrences with the same characteristic prototype. That last step is the key to letting a program recognize the four snare instances as the same instrument, although they vary in intensity and spectrum usage.
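For step (c), one workable approach (a sketch, not the project's actual implementation) is to collapse each event's spectrum into a short, normalized feature vector and assign it to the nearest stored prototype, creating a new prototype when nothing is close enough:

import numpy as np

def event_features(spectrum, bands=8):
    # Collapse an event's magnitude spectrum into a few band energies
    # and normalize, so loud and quiet hits of one instrument match.
    chunks = np.array_split(spectrum ** 2, bands)
    energies = np.array([c.sum() for c in chunks])
    norm = np.linalg.norm(energies)
    return energies / norm if norm > 0 else energies

def assign_prototype(features, prototypes, tolerance=0.3):
    # Return the index of the nearest stored prototype, creating a new
    # one when nothing is within tolerance (Euclidean distance; the
    # tolerance value is an illustrative guess).
    for i, proto in enumerate(prototypes):
        if np.linalg.norm(features - proto) < tolerance:
            # Nudge the prototype toward the new example so it tracks
            # the instrument's average signature.
            prototypes[i] = 0.9 * proto + 0.1 * features
            return i
    prototypes.append(features)
    return len(prototypes) - 1

Because the features are normalized, a quiet snare and a loud snare produce nearly the same vector, which is exactly the "similar but not identical" association described above.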



Below is a recording of the actual version of the song I was beatboxing, "Jam on It" by Newcleus. I placed the microphone on my mousepad about a foot away from my speaker. This shows the complexity that arises when denser, audibly cohesive music is introduced to the program's analysis. Separate audio events that the human brain can interpret are indistinguishable to the program, causing general disarray. It's for this reason that beatboxing provides an ideal candidate for input: all of its instrumentation comes from one instrument that can only make one noise at a time (even though that noise can sometimes sound like a bass and snare together, which leads to a myriad of mathematical problems of its own). (Note: most research in this field avoids human vocalization entirely, since its natural vibrato makes it difficult to pinpoint a frequency; pianos or flutes are often used instead.)



In order to develop an algorithm for parsing the desired data out of the raw wave input, massive amounts of numerical output must be monitored alongside their visual and/or audible equivalents. The tool at its current stage allows the user to monitor all of these for development of the analysis. It does not yet do any analysis of the spectrum amplitude data, but I hope in the future to add the ability to first distinguish between the beginning and end of audio events, and then analyze the characteristics of each of those data sets (most likely stored as wave frames).
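The first of those steps, finding the beginning and end of audio events, can be sketched as a simple amplitude threshold over blocks of wave frames (the block size and threshold here are illustrative guesses; a real threshold should be calibrated against the microphone's noise floor):

import wave
import numpy as np

def find_events(path, block=512, threshold=500.0):
    # Return (start_sample, end_sample) pairs for spans whose RMS
    # amplitude exceeds threshold, in raw 16-bit sample units.
    wav = wave.open(path, "rb")
    samples = np.frombuffer(wav.readframes(wav.getnframes()),
                            dtype=np.int16).astype(np.float64)
    wav.close()

    events, start = [], None
    for i in range(0, len(samples) - block, block):
        rms = np.sqrt(np.mean(samples[i:i + block] ** 2))
        if rms >= threshold and start is None:
            start = i                      # event begins
        elif rms < threshold and start is not None:
            events.append((start, i))      # event ends
            start = None
    if start is not None:
        events.append((start, len(samples)))
    return events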

Watch the sample video of animated graph and audio playback. (35 MB)

Description:
The program is to act as a transparent interface between performance and composition, allowing the user to "jam" out a musical piece while the program does all the transcription and resynthesis.

Dependencies:
Dislin: high-level plotting library for data analysis.
PyMedia: audio and video processing modules.
NumPy: numeric Python; SciPy should work too.

Python Progress:
001: records from sound device to wave file.
002: unpacks binary frames from wave file, analyzes numerical data, begins attempt to plot results.
003: generates sine wave of user-specified frequencies and plots result.
004: cycles through and plots frequencies that are multiples of 5 between 5 and 10,000.
005: plots overlapped values of time instances of frequency magnitude by frequency range.
006: generates audible result of sine wave of given frequency, but produces semi-white noise (gray noise?) along with it.
007: generates audible result of sine wave of user-designated frequency or harmonic of frequencies (still has static issue).
008: generates clean audible tone of user frequency through PC speaker (doesn't generate sine wave data for saving to file, though, and only works on Windows; see the sketch after this list).
009: opens a user-specified wave file and plots the time and spectrum analysis in dislin using a 3D color plot.
010: reads data from microphone, saves to temporary wave file, plots spectrum analysis with dislin, and refreshes at user-specified rate.
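Entries 003 through 008 wrestle with sine synthesis and static. As an illustration only (not the project's actual code), here is a minimal sketch of generating a clean tone and saving it to a file with the standard wave module; static like that described above often comes from integer clipping, so the sketch scales the tone below full 16-bit range:

import wave
import numpy as np

def write_sine(path, freq=440.0, seconds=2.0, rate=44100):
    # Synthesize a sine tone and save it as mono 16-bit PCM. Scaling
    # to 80% of full range avoids the clipping that can read as static.
    t = np.arange(int(seconds * rate)) / rate
    tone = 0.8 * np.sin(2 * np.pi * freq * t)
    data = (tone * 32767).astype(np.int16)

    wav = wave.open(path, "wb")
    wav.setnchannels(1)      # mono
    wav.setsampwidth(2)      # 16-bit samples
    wav.setframerate(rate)
    wav.writeframes(data.tobytes())
    wav.close()

write_sine("tone_440.wav")

Writing the whole tone as one continuous array also avoids the discontinuities between buffers that can come through as clicks or noise.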
 
Research & Reference

Analyzing Sound Files
Awesome code introduction to using Python libraries to interface with and analyze wave files.

Scipy Plotting Tutorial
Easy-to-understand documentation of the plotting methods in SciPy.

Quadrature Signals: Complex, but Not Complicated
Excellent math lesson in quadrature signal and complex number processing.

Harmonic Phasors and Fourier Series
Physics and math lesson on complex sinusoids.

struct.unpack question
Quick explanation of the format argument for the struct.unpack method.

Transpose wav to another key
Script I used for reference in proper handling of struct.unpack data from wav files.

Note-A-Rific: Frequency, Wavelength, Amplitude
Physics lesson on measuring dimensions of wave signals.

Physlink Q&A
Physics lesson to find musical note from wavelength.

Signal Processing for Melody Transcription
Report regarding project results on melody transcription system via audio prototyping.

Automatic Extraction of Drum Tracks from Polyphonic Music Signals
Report regarding project results on percussion transcription and resynthesis from mixed instrument audio source.

UCSB MAT 202: Intro to Mathematics for Signal Processing
Graduate course for media art and technology students at UCSB; contains handwritten lecture notes from the professor.

Reliable Software Frequency Analyzer
Real-time or wave-file-based spectrum frequency analyzer; open-source Visual C++ program.

Sound, Synthesis, and Audio Reproduction
Thorough introduction to all aspects of hearing, sound physics, sound synthesis, and processing.

Automatic Transcription of Music
Master's and PhD research in the field of transcription and resynthesis of solid tones (i.e., piano or flute).

International Symposium on Music Information Retrieval
The global community of sound-oriented information parsers.

 
The Whiteboard

2-26-05 - 8:34pm - made room for development of the transvergence model.

2-22-05 - 2:52am - mapping out strong frequencies and multiples of known notes to move on to audible sine wave synthesis.