MPEG-4 Structured Audio: Research Publications

By John Lazzaro and John Wawrzynek, CS Division, UC Berkeley.

Below are PostScript and PDF versions of papers we have written about our work on MPEG 4 Structured Audio and related topics.

To learn how to write programs using Structured Audio, see our online MPEG 4 Structured Audio Book. For information about implementing decoders and encoders for Structured Audio, see the relevant appendices of the book for the latest standards documents. You may also wish to view this multimedia presentation on Structured Audio.

Paper Titles

2012

John Lazzaro and John Wawrzynek (2012). A Tilt Filter in a Servo Loop. The 133rd Convention of the Audio Engineering Society (AES), October 26-29, 2012, San Francisco, CA, USA.

Download aes133.pdf (941KB). Prints out 14 pages on A4 paper.

To listen to audio examples, view the source code, and download the plug-in, click here.

Abstract. Tone controls based on the tilt filter first appeared in 1982, in the Quad 34 Hi-Fi preamp. More recently, tilt filters have found a home in specialist audio processors, such as the Elysia mpressor. This paper describes a novel dynamic filter design based on a tilt filter. A control system sets the tilt slope of the filter, in order to servo the spectral median of the filter output to a user-specified target. Users also specify a tracking time. Potential applications include single-instrument processing (in the spirit of envelope filters) and mastering (for subtle control of tonal balance). Although we have prototyped the design as an AudioUnit plug-in, the architecture is also a good match for analog circuit implementation.
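
The servo idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the spectral median is the frequency below which half of the spectral power lies, and it assumes a simple proportional update of the tilt slope. The names spectral_median, servo_step, and gain are hypothetical and chosen only for illustration.

```python
import math

def spectral_median(freqs, power):
    """Return the frequency at which cumulative power first reaches half the total.

    Assumes freqs is sorted ascending and power holds non-negative values.
    """
    total = sum(power)
    if total <= 0:
        return None  # silent frame: no meaningful median
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= total / 2:
            return f
    return freqs[-1]

def servo_step(slope, median_hz, target_hz, gain=0.5):
    """One hypothetical control update: nudge the tilt slope toward the target.

    The error is measured in octaves. A positive slope is assumed to boost
    high frequencies, so a median below the target raises the slope.
    """
    error_octaves = math.log2(target_hz / median_hz)
    return slope + gain * error_octaves

# Example: a flat four-bin spectrum, with the target one octave above the median.
median = spectral_median([100, 200, 400, 800], [1.0, 1.0, 1.0, 1.0])
new_slope = servo_step(0.0, median, 2 * median)
```

In a real design the gain would presumably be derived from the user-specified tracking time; here it is just a fixed constant.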

2011

Lazzaro, J. P., Wawrzynek, J. (2011).  RTP Payload Format for MIDI. RFC 6295, Internet Engineering Task Force (IETF) Proposed Standard Protocol (obsoletes RFC 4695) [download].

Abstract. This memo describes a Real-time Transport Protocol (RTP) payload format for the MIDI (Musical Instrument Digital Interface) command language. The format encodes all commands that may legally appear on a MIDI 1.0 DIN cable. The format is suitable for interactive applications (such as network musical performance) and content-delivery applications (such as file streaming). The format may be used over unicast and multicast UDP and TCP, and it defines tools for graceful recovery from packet loss. Stream behavior, including the MIDI rendering method, may be customized during session setup. The format also serves as a mode for the mpeg4-generic format, to support the MPEG 4 Audio Object Types for General MIDI, Downloadable Sounds Level 2, and Structured Audio. This document obsoletes RFC 4695.

2006

Lazzaro, J. P., Wawrzynek, J. (2006).  RTP Payload Format for MIDI. RFC 4695, Internet Engineering Task Force (IETF) Proposed Standard Protocol [download].

Abstract. This memo describes a Real-time Transport Protocol (RTP) payload format for the MIDI (Musical Instrument Digital Interface) command language. The format encodes all commands that may legally appear on a MIDI 1.0 DIN cable. The format is suitable for interactive applications (such as network musical performance) and content-delivery applications (such as file streaming). The format may be used over unicast and multicast UDP and TCP, and it defines tools for graceful recovery from packet loss. Stream behavior, including the MIDI rendering method, may be customized during session setup. The format also serves as a mode for the mpeg4-generic format, to support the MPEG 4 Audio Object Types for General MIDI, Downloadable Sounds Level 2, and Structured Audio.

Lazzaro, J. P., Wawrzynek, J. (2006). An Implementation Guide for RTP MIDI. RFC 4696, Internet Engineering Task Force (IETF) Standards Track (Informational) [download].

Abstract. This memo offers non-normative implementation guidance for the Real-time Protocol (RTP) MIDI (Musical Instrument Digital Interface) payload format. The memo presents its advice in the context of a network musical performance application. In this application two musicians, located in different physical locations, interact over a network to perform as they would if located in the same room. Underlying the performances are RTP MIDI sessions over unicast UDP. Algorithms for sending and receiving recovery journals (the resiliency structure for the payload format) are described in detail. Although the memo focuses on network musical performance, the presented implementation advice is relevant to other RTP MIDI applications.

J. Lazzaro (2006). Framing RTP and RTCP Packets over Connection-Oriented Transport. RFC 4571, Proposed Standard, Internet Engineering Task Force [download].

Abstract. This memo defines a method for framing Real Time Protocol (RTP) and Real Time Control Protocol (RTCP) packets onto connection-oriented transport (such as TCP). The memo also defines how to specify the framing method in a session description.
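
As a rough illustration of the framing method, here is a minimal Python sketch. It assumes the RFC 4571 layout in which each RTP or RTCP packet on the byte stream is preceded by a 16-bit unsigned length field in network byte order; consult the RFC itself for the normative definition. The function names frame_packet and deframe are hypothetical.

```python
import struct

def frame_packet(packet: bytes) -> bytes:
    """Prefix one packet with its length as a 16-bit big-endian integer."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length field")
    return struct.pack("!H", len(packet)) + packet

def deframe(buffer: bytes):
    """Split a received byte stream into complete packets.

    Returns (packets, remainder), where remainder holds the bytes of a
    trailing frame that has not fully arrived yet.
    """
    packets = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # incomplete frame: wait for more data
        start = offset + 2
        packets.append(buffer[start:start + length])
        offset = start + length
    return packets, buffer[offset:]
```

A receiver reading from a TCP socket would typically append each new chunk to the remainder and call deframe again, since TCP does not preserve packet boundaries.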

2004

John Lazzaro and John Wawrzynek (2004). An RTP Payload Format for MIDI. The 117th Convention of the Audio Engineering Society (AES), October 28-31, 2004, San Francisco, CA, USA.

Download aes117.pdf (234KB). Prints out 16 pages on A4 paper.

Abstract. The Real-Time Protocol (RTP) is an extensible transport for sending media streams over Internet Protocol packet networks. We describe a new payload format that extends RTP to transport MIDI (the Musical Instrument Digital Interface command language). The payload format encodes all commands that may legally appear on a MIDI 1.0 DIN cable. The format is suitable for interactive applications (such as the remote operation of musical instruments) and content-delivery applications (such as file streaming). The format defines tools for graceful recovery from packet loss, to support use over lossy unicast and multicast networks (including wireless networks). Stream behavior, including the MIDI rendering method, may be specified during session setup. Rendering methods are specified using the extensible Multipurpose Internet Mail Extensions (MIME) registry.

John Lazzaro and John Wawrzynek (2004). Subtractive Synthesis without Filters. In Audio Anecdotes II, edited by Ken Greenebaum and Ronen Barzel, A. K. Peters.

Download buzz.ps (217KB) buzz.ps.gz (86KB) or buzz.pdf (148KB) versions. Prints out 9 pages on A4 paper.

Abstract. This book chapter describes an efficient implementation of the buzz SAOL core opcode. This method is based on the summation series techniques developed by Moorer and by Winham and Steiglitz. The book is aimed at a wide audience; this chapter uses the buzz core opcode as an example to teach basic concepts in subtractive synthesis.
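
The summation series idea can be sketched in Python as follows. This is an illustration of the general closed-form technique, not the chapter's implementation and not the exact definition of the SAOL buzz opcode: it assumes n cosine partials whose amplitudes form a geometric sequence with ratio a, and the names buzz_direct and buzz_closed are hypothetical.

```python
import cmath
import math

def buzz_direct(theta, n, a):
    """Sum n harmonics one by one: sum over k=1..n of a**(k-1) * cos(k*theta)."""
    return sum(a ** (k - 1) * math.cos(k * theta) for k in range(1, n + 1))

def buzz_closed(theta, n, a):
    """Evaluate the same sum in constant time via a geometric series.

    With z = a * exp(i*theta), the sum is the real part of
    exp(i*theta) * (1 - z**n) / (1 - z).
    """
    z = a * cmath.exp(1j * theta)
    denom = 1 - z
    if abs(denom) < 1e-6:
        # Near the singular point (a close to 1, theta close to a multiple
        # of 2*pi) fall back to the direct sum to avoid dividing by ~0.
        return buzz_direct(theta, n, a)
    return (cmath.exp(1j * theta) * (1 - z ** n) / denom).real
```

The cost of buzz_closed does not grow with the number of harmonics, which is what makes this style of band-limited pulse generation attractive compared with summing each partial separately.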

2001

John Lazzaro and John Wawrzynek (2001). A Case for Network Musical Performance. The 11th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2001), June 25-26, 2001, Port Jefferson, New York.

Download nossdav01.ps (301KB) nossdav01.ps.gz (136KB) or nossdav01.pdf (202KB) versions. Prints out 10 pages on A4 paper.

Abstract. A Network Musical Performance (NMP) occurs when a group of musicians, located at different physical locations, interact over a network to perform as they would if located in the same room. We present a case for NMP as a practical Internet application, and describe a method to ameliorate the effect of late and lost packets on NMP. We describe an NMP system that combines several existing standards (MIDI, MPEG 4 Structured Audio, RTP/AVP, and SIP) with a new RTP packetization for MIDI performance. We analyze NMP experiments performed on CalREN2 hosts on the UC Berkeley, Stanford, and Caltech campuses.

John Lazzaro and John Wawrzynek (2001). Compiling MPEG 4 Structured Audio into C. Proceedings of the Second IEEE MPEG-4 Workshop and Exhibition (WEMP), June 18-20, 2001, San Jose, CA.

Download wemp01.ps (67KB) wemp01.ps.gz (26KB) or wemp01.pdf (202KB) versions. Prints out 4 pages on A4 paper.

Abstract. Structured Audio (SA) is an MPEG 4 Audio standard for algorithmic sound encoding, using the programming language SAOL. The paper describes an SA decoder, sfront, that translates a SAOL program into a C program, which is then compiled and executed to create audio. Performance data shows a 7.6x to 20.4x speedup compared to the SA reference MPEG decoder.

Copyright 2001 John Lazzaro and John Wawrzynek.