From The MPEG-4 Structured Audio Book by John Lazzaro and John Wawrzynek.

Part IV/3: Signal Processing Core Opcodes

Sections

Core Opcodes:

balance compressor decimate downsamp fft gain ifft port rms samphold sblock upsamp

Wavetable Generator:

window

Introduction

In this chapter, we complete our description of the core opcode library.

We describe opcodes that perform signal processing operations on a buffer of a-rate signal values. These opcodes perform operations such as gain control, sample-rate conversion, and Fourier analysis on the buffer.

We describe the specialop semantics that govern several of the opcodes described in the chapter. A specialop opcode computes at the a-rate, but returns values at the k-rate.

We describe the core wavetable generator window, which computes popular windowing functions used in block-based signal processing. Several of the opcodes in this chapter have wavetable parameters that window buffer data.

 

Level Matching

The a-rate core opcodes gain and balance act as simple automatic gain control systems. See the right panel for header syntax.

On each call, these opcodes return the scaled copy a*x of the input signal parameter x, where a is an internal variable that sets the attenuation of the system.

The attenuation variable is initialized to 1 during the first opcode call, and is updated as specified by the gain control algorithm for each opcode.

gain

The gain opcode returns a signal whose RMS power approximates the power level specified by the parameter g.

To perform this task, gain periodically recalculates the attenuation variable, using a formula (shown on the right panel) that measures the power level of recent values of the signal parameter x.

By default, the attenuation is updated once every control period (the inverse of the k-rate). The optional i-rate parameter length (units of seconds) overrides the default value for the attenuation period.

During the first call to gain, a buffer is created of sufficient size to hold the x values for an entire attenuation period, and the current x value is placed at the start of the buffer. Subsequent calls to gain fill successive positions in the buffer.

On the gain call that fills the buffer, a new attenuation value is computed, using the equation shown on the right panel. The buffer is cleared, and future calls to gain refill the buffer in preparation for the next attenuation update.

balance

The balance opcode returns a scaled copy of the parameter x. The returned signal has an RMS power level that approximates the power of the signal parameter ref.

To achieve this behavior, the opcode creates two buffers, to hold recent values of ref and x. The opcode periodically updates the attenuation parameter, to reflect the energy of the signals in the two buffers.

The control period (1/k_rate) sets the default length of balance's buffers, which may be overridden by the optional i-rate parameter length.

During the first call to balance, buffers are created for ref and x parameters, and the current values for ref and x are placed at the start of the buffers. Subsequent calls fill successive positions in the ref and x buffers.

On the opcode call that fills the buffers, a new attenuation value is computed, using the equation shown on the right panel. The buffers are cleared, and future calls to balance refill the buffers in preparation for the next attenuation update.

gain

aopcode gain(asig x, ksig g
             [, ivar length]) 


on every call, return a*x.

on first call:

    set internal variable a 
    to 1, and create buffer
    xh. the optional parameter
    length (units of seconds)
    sets the buffer size. if
    this parameter is not given,
    set length to 1/k_rate.

    xh contains 

    L = floor(length*s_rate)

    samples. insert x into xh[0].

on subsequent calls:

    put x into next position
    in xh. if xh is filled,
    compute new value of a:

             g*sqrt(L)
a = -------------------------------
    sqrt(xh[0]^2 + ... + xh[L-1]^2)

    this completes one cycle of
    the algorithm. on the next
    call, insert x into xh[0],
    starting the next cycle.
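
The gain algorithm above can be modeled outside SAOL. The following is a minimal Python sketch, not SAOL code: the class name Gain, the method tick, and the choice to keep the old value of a when the buffer holds only zeros are assumptions made for illustration. The sketch also assumes that the call that fills the buffer still returns a*x using the old value of a.

```python
import math

class Gain:
    """Python model of the gain opcode; call tick() once per a-rate sample."""

    def __init__(self, length_samples):
        self.L = length_samples      # L = floor(length * s_rate)
        self.a = 1.0                 # attenuation starts at 1
        self.xh = []                 # buffer of recent x values

    def tick(self, x, g):
        y = self.a * x               # every call returns a*x
        self.xh.append(x)            # put x into the next position of xh
        if len(self.xh) == self.L:   # buffer filled: compute the new a
            energy = sum(v * v for v in self.xh)
            if energy > 0.0:         # assumption: keep the old a for a silent buffer
                self.a = g * math.sqrt(self.L) / math.sqrt(energy)
            self.xh = []             # the next call starts a new cycle at xh[0]
        return y

gain = Gain(4)
out = [gain.tick(2.0, 0.5) for _ in range(8)]
```

With a constant input of 2.0 and g set to 0.5, the first four outputs pass through unscaled, and the remaining outputs sit at the requested level of 0.5.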

balance

aopcode balance(asig x,
                asig ref
             [, ivar length]) 


on every call, return a*x.

on first call:

    set internal variable a
    to 1. create buffers xh
    and rh. optional parameter
    length (units of seconds)
    sets the buffer sizes. if
    this parameter is not given,
    set length to 1/k_rate.

    xh and rh each contain

    L = floor(length*s_rate)

    samples. insert x into xh[0] 
    and ref into rh[0].

on subsequent calls:

    put x into next position
    in xh, and ref into next
    position of rh. if buffers
    filled, compute the new
    value of a:


    sqrt(rh[0]^2 + ... + rh[L-1]^2)
a = -------------------------------
    sqrt(xh[0]^2 + ... + xh[L-1]^2)

    this completes one cycle of
    the algorithm. on the next
    call, insert x into xh[0],
    and ref into rh[0], starting
    the next cycle.
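
As a worked check of the balance update, the short Python sketch below evaluates the attenuation formula for two filled buffers. The function name balance_attenuation and the fallback value for a silent xh buffer are assumptions, not part of the opcode definition.

```python
import math

def balance_attenuation(xh, rh):
    """Return a = sqrt(sum(rh^2)) / sqrt(sum(xh^2)) for two filled buffers."""
    x_energy = sum(v * v for v in xh)
    r_energy = sum(v * v for v in rh)
    if x_energy == 0.0:
        return 1.0                   # assumption: no update when xh is silent
    return math.sqrt(r_energy) / math.sqrt(x_energy)

# x has an RMS level of 1.0, ref has an RMS level of 0.5
a = balance_attenuation([1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5])
```

Here a evaluates to 0.5, so the scaled signal a*x has the same RMS level as ref.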

Specialops

The gain opcode, if called without the optional length parameter, fills its buffer by accepting a new x value on each a-rate call, and measures the signal power of its buffer once per control period.

The rms opcode performs the same measurement, but returns the result (the RMS power of the buffer) as its k-rate return value. See the right panel for the header syntax and exact semantics for the rms opcode.

The rms opcode is an example of a SAOL specialop opcode, which has aspects of both aopcode and kopcode semantics.

Like an a-rate opcode, the rms opcode runs at the a-rate in order to fill the buffer. But like a k-rate opcode, it also runs at k-rate, and returns a k-rate value.

The rules below set the semantics of specialop opcodes:

  1. A specialop returns values at the k-rate. For the purpose of evaluating the rate of expressions, a specialop is considered to be a kopcode.
  2. A specialop is evaluated at both the a-rate and the k-rate. However, the expression returns a value, and the statement containing it executes, at the k-rate.
  3. A specialop may appear in an a-rate statement. If so, its k-rate return semantics work in the same way as a normal k-rate opcode call.

Specialop calls may only appear in instrument code, and in aopcode user-defined opcodes (described in Part IV/4). Specialop calls are also restricted in these ways:

  1. An expression containing a specialop opcode is considered a specialop expression.
  2. A specialop expression may not appear in the code block or guard expression of a while statement.
  3. A specialop expression may only appear in the code block of an if or if-else statement if the guard expression of the statement is also specialop.

The right panel shows several examples of specialop semantics, using the rms opcode.

rms

specialop rms(asig x
            [, ivar length]) 

as a specialop, it runs at the
a-rate and k-rate, but only 
returns values at the k-rate.

k-rate, first call:

   create the buffer xh, and
   initialize values to zero.
   the optional parameter length
   (units of seconds, must be > 0)
   sets the buffer size. if this
   parameter is not given, set 
   length to 1/k_rate. 

   xh contains:

   L = floor(length*s_rate)

   samples. create buffer index,
   set it to zero (first element).


k-rate, all calls:

   return the value

sqrt(xh[0]^2 + ... + xh[L-1]^2)
-------------------------------
           sqrt(L)


a-rate, all calls:

   place the x value into the 
   buffer xh, at the position
   of the buffer index. then
   increment buffer index. if
   index has value L, reset
   the index to 0.
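
The split between a-rate and k-rate behavior can be sketched in Python by giving each pass its own method. This is an illustrative model only; the names Rms, a_pass, and k_pass are assumptions, and the scheduling of the two passes is left to the caller.

```python
import math

class Rms:
    """Python model of the rms specialop."""

    def __init__(self, length_samples):
        self.xh = [0.0] * length_samples   # buffer initialized to zero
        self.index = 0                     # buffer index starts at 0

    def a_pass(self, x):
        """Runs once per audio sample: store x and advance the circular index."""
        self.xh[self.index] = x
        self.index += 1
        if self.index == len(self.xh):
            self.index = 0

    def k_pass(self):
        """Runs once per control period: return the RMS value of the buffer."""
        L = len(self.xh)
        return math.sqrt(sum(v * v for v in self.xh)) / math.sqrt(L)

meter = Rms(4)
before = meter.k_pass()        # buffer is still all zeros
for _ in range(4):
    meter.a_pass(3.0)
after = meter.k_pass()         # buffer now holds four samples of 3.0
```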

Examples

asig x;
ksig y;

// legal, rms runs at a-rate
// and k-rate, but returns
// a value at k-rate that is
// assigned to y.

y = rms(x);

// legal, both rms run at a-rate
// and k-rate, if condition is
// true at k-rate, assignment
// is made. rms in assignment
// returns same value as rms in
// conditional 

if (rms(x) > y)
 {
   y = rms(x);
 }

Sample Rate Conversion

The rms opcode converts the information carried by an a-rate parameter to a k-rate return value. In this sense, it performs a type of sample-rate conversion.

In this section, we describe other core opcodes that perform sample-rate conversion.

Downsampling Opcodes

Three other simple opcodes make a-rate signal information available at the k-rate. These opcodes are all specialop opcodes.

The decimate opcode returns (during its k-pass) one of the in parameter values that it received in the preceding set of a-pass calls. The opcode definition does not specify which in value is chosen.

The downsamp opcode buffers the in values of the last s_rate/k_rate opcode calls at the a-rate. At the k-rate call following the a-rate calls, it returns the mean of the buffer.

If the downsamp call includes the optional table parameter win, the wavetable values are multiplied with the buffer values point by point, and the opcode returns the sum of all multiplication results. If the win table is shorter than the buffer, zeros are used for the extra window values.

The sblock opcode buffers in values of the last s_rate/k_rate opcode calls at the a-rate. During the k-rate call, it places these buffer values in the table provided by parameter t, which must have at least s_rate/k_rate table values. The opcode always returns zero.
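
The two return modes of downsamp can be compared with a small Python sketch that operates on one control period of buffered samples. The function below is an illustration, not the opcode itself; truncating a win table that is longer than the buffer is an assumption, since the text only covers the shorter case.

```python
def downsamp(buf, win=None):
    """Model of the downsamp k-pass for one buffer of s_rate/k_rate samples."""
    if win is None:
        return sum(buf) / len(buf)         # no window: mean of the buffer
    # Pad a short window with zeros; truncating a long one is an assumption.
    padded = list(win[:len(buf)]) + [0.0] * (len(buf) - len(win))
    return sum(b * w for b, w in zip(buf, padded))

mean_value = downsamp([1.0, 2.0, 3.0, 4.0])
windowed = downsamp([1.0, 2.0, 3.0, 4.0], [1.0, 1.0])
```

Note that the windowed form returns a sum rather than a mean, so the window values must carry any normalization that is wanted.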

Upsampling Opcodes

The simplest way to upsample control information to the audio rate is to assign a k-rate value to an a-rate variable. The upsamp and samphold core opcodes offer more sophisticated methods of upsampling.

The upsamp opcode upsamples the k-rate parameter in to a-rate via a shift-and-add technique. An optional table parameter win controls the spectral properties of the upsampling. The upsamp opcode reduces the aliasing artifacts produced by assigning k-rate values to a-rate variables directly. See the right panel for a complete explanation of this opcode.

The polymorphic samphold opcode performs a sample-and-hold operation on the polymorphic input parameter in, under the control of the k-rate parameter gate. It acts as an upsampling system if the in parameter is a-rate.

The samphold opcode returns the value of an internal state variable that is initialized to zero at the start of the first call to the opcode. If the gate parameter is non-zero, the internal state variable is updated to the value of the in parameter.
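
A minimal Python sketch of the sample-and-hold behavior follows. The class and method names are assumptions, and so is the ordering: the sketch updates the state before returning it, which the description above does not state explicitly.

```python
class SampHold:
    """Python model of samphold: hold the last value seen while gate was non-zero."""

    def __init__(self):
        self.state = 0.0               # internal state starts at zero

    def tick(self, x, gate):
        if gate != 0:                  # non-zero gate: capture the input
            self.state = x
        return self.state              # assumption: return the updated state

sh = SampHold()
held = [sh.tick(0.7, 0), sh.tick(0.7, 1), sh.tick(0.2, 0)]
```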

Downsampling Opcodes


specialop decimate(asig in)

specialop downsamp(asig in
                   [,table win])

specialop sblock(asig in,
		 table t)

see left panel for algorithms.

Upsampling Opcodes


opcode samphold(xsig in, 
                ksig gate)

see left panel for algorithm.


aopcode upsamp(ksig in
            [,table win])

This opcode upsamples the
k-rate in parameter to
a-rate, using a smoothing
buffer. In the interesting
case, the buffer size is
the size of the table win,
and is several times greater
than a_rate/k_rate in length.
On the first call to upsamp, 
the buffer buf[] is created, 
and initialized to zeros.

On the first a-pass call to
upsamp in a given execution
cycle, the contents of buf[]
are shifted forward by 
a_rate/k_rate samples. The last
a_rate/k_rate buf[] values are
set to zero. Then, all buf[]
values are updated using this
formula:

buf[i] = buf[i] + 
         in*win[i]

This first a-pass call returns
buf[0]; future a-pass calls in
the execution cycle return buf[1],
buf[2], ...

If the win table has fewer than
a_rate/k_rate elements, the buf[]
has a size a_rate/k_rate, and zeros
are used for the extra win values in
the formula.

If no win table is provided, a win of
size a_rate/k_rate is used, with all
samples of value 1. The buf[] is also
a_rate/k_rate.
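
The shift-and-add procedure can be modeled in Python, processing one execution cycle at a time. This is a sketch under stated assumptions: the names Upsamp and cycle are hypothetical, n stands for the number of a-rate samples per control period, and each call to cycle returns all n samples of that cycle at once.

```python
class Upsamp:
    """Python model of upsamp; cycle() returns the n samples of one control period."""

    def __init__(self, n, win=None):
        self.n = n
        if win is None:
            win = [1.0] * n                    # default: window of n ones
        size = max(len(win), n)                # buf is never shorter than n
        self.win = list(win) + [0.0] * (size - len(win))
        self.buf = [0.0] * size                # buf[] starts at zero

    def cycle(self, value):
        n = self.n
        self.buf = self.buf[n:] + [0.0] * n    # shift forward, zero the last n
        for i, w in enumerate(self.win):
            self.buf[i] += value * w           # buf[i] = buf[i] + in*win[i]
        return self.buf[:n]                    # buf[0], buf[1], ... for this cycle

plain = Upsamp(4).cycle(2.0)
tri = Upsamp(4, [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25])
ramp_up = tri.cycle(1.0)
steady = tri.cycle(1.0)
ramp_down = tri.cycle(0.0)
```

With the triangular window, a step in the k-rate input becomes a linear ramp at the a-rate, while a constant input is reproduced exactly because the overlapping windows sum to one.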

Window Wavetables

Several of the opcodes in the previous section let the programmer specify a windowing function as a wavetable of window values.

The core wavetable generator window simplifies the creation of windowing wavetables. The right panel shows the declaration syntax and algorithm for this wavetable generator.

The size parameter sets the number of samples in the window table, and must be greater than zero. The type parameter is an integer that sets the window type.

The window generator produces six window types.

  1. Hamming window.
  2. Hanning window.
  3. Bartlett window.
  4. Gaussian window.
  5. Kaiser window.
  6. Boxcar window.

The numbering of the list indicates the value of the type parameter that produces the associated window shape.

The Kaiser window algorithm creates a family of windows, controlled by the optional parameter p.

window

table t(window, size, type[,p]);


Type parameter is an integer that
codes the window shape. Listing
below shows algorithm, for samples
that lie in range 0 <= x <= size-1.


[1] Hamming window. 

0.54 - 0.46*cos(2*pi*x/(size-1))

[2] Hanning window.

0.5*(1 - cos(2*pi*x/(size-1)))

[3] Bartlett window (triangle).

     2*fabs(x - ((size-1)/2))
1 -  ------------------------
            (size-1)

[4] Gaussian window:

        exp(-((m-x)^2)/a)

  where

  m = size/2   a = (size*size)/18

[5] Kaiser window

   a = (size-1)/2

   Io[p*sqrt(a^2 - (x-a)^2)]
   -------------------------
           Io[p*a]

[6] Boxcar window -- all table
    values are 1.


Slib defines the constants
WINDOW_HAMMING, WINDOW_HANNING,
WINDOW_BARTLETT, WINDOW_GAUSSIAN,
WINDOW_KAISER, and WINDOW_BOXCAR
to use as the type parameter in 
the window wavetable generator.
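
The window formulas can be checked numerically with a Python sketch. The function names are assumptions, the sketch requires size > 1 to avoid dividing by zero, it requires p for the Kaiser case, and it uses the conventional coefficient 0.5 for the Hanning window. The Bessel function Io is evaluated with its power series.

```python
import math

def bessel_i0(x):
    """Zeroth-order modified Bessel function, summed from its power series."""
    term, total, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total

def window(size, wtype, p=None):
    """Return the window table for type 1..6; assumes size > 1."""
    table = []
    for x in range(size):
        if wtype == 1:      # Hamming
            v = 0.54 - 0.46 * math.cos(2 * math.pi * x / (size - 1))
        elif wtype == 2:    # Hanning, conventional coefficient 0.5
            v = 0.5 * (1 - math.cos(2 * math.pi * x / (size - 1)))
        elif wtype == 3:    # Bartlett (triangle)
            v = 1 - 2 * abs(x - (size - 1) / 2) / (size - 1)
        elif wtype == 4:    # Gaussian
            m, a = size / 2, (size * size) / 18
            v = math.exp(-((m - x) ** 2) / a)
        elif wtype == 5:    # Kaiser, shape set by p
            if p is None:
                raise ValueError("Kaiser window needs p in this sketch")
            a = (size - 1) / 2
            arg = max(0.0, a * a - (x - a) ** 2)
            v = bessel_i0(p * math.sqrt(arg)) / bessel_i0(p * a)
        elif wtype == 6:    # Boxcar
            v = 1.0
        else:
            raise ValueError("type must be in the range 1..6")
        table.append(v)
    return table

bartlett = window(5, 3)
kaiser = window(5, 5, p=2.0)
```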

Gain Control

The compressor opcode implements a complete gain control system. The opcode may be configured to perform gain control functions such as compression, expansion, noise-gating, and limiting. The right panel shows the header syntax and algorithm for this opcode.

The opcode returns a scaled version of the a-rate input signal parameter x, with a latency set by the parameter look. The scaling depends on the loudness of the a-rate signal parameter comp. For most uses, comp and x are set to the same value.

The opcode measures the loudness of comp, expressed in terms of decibels (dB), and changes the scaling of x in response to this loudness. The loudness is not computed as an instantaneous value, but by evaluating the signal over a short analysis window (set by the parameter look).

In this scale, 90 dB corresponds to a signal with a peak waveform value of 1, 70 dB corresponds to a signal with a peak waveform value of 0.1, etc. The noise floor of the system is set by the k-rate parameter nfloor, in units of dB.

The parameters att and rel set the attack and release times (in seconds) for the loudness measurement of comp. Short attack and release times let the loudness track quick signal transients, while longer attack and release times result in a smoother loudness estimate.

Given the loudness measurement of comp, the opcode calculates the scaling factor for the delayed version of x using the table shown on the right panel. The k-rate parameters nfloor, thresh, loknee, hiknee, and ratio control this scaling. All of these parameters have units of dB.

Noise gating

The nfloor and thresh parameters control noise gating. If the loudness of comp is above thresh, the noise gate is open, and the opcode returns a delayed replica of the x signal. If the loudness of comp is below nfloor, the noise gate is closed, and the opcode returns zero. Non-normative interpolation occurs in the transition regime between nfloor and thresh.

To turn off noise gating, both nfloor and thresh should be set to the noise floor of this system (for most applications, a value of -40 dB yields good results).

Compression/expansion

If the loudness of comp is above hiknee, the opcode acts as a compressor or expander. The value of ratio determines the exact behavior in this regime. If the loudness of comp increases by ratio dB, the opcode returns a delayed version of x whose loudness has increased by 1 dB. Thus, ratio values greater than 1 dB result in compression, and ratio values between 0 and 1 dB result in expansion. Negative ratio values are prohibited.

If the loudness of comp is below loknee, the opcode performs as a "wire with latency", returning a replica of parameter x delayed by the analysis window time look. Non-normative interpolation occurs in the transition regime between loknee and hiknee.

Cross-signal effects

By choosing a comp signal that differs from the x signal, the opcode produces a version of the x signal whose dynamics are shaped by the comp signal.

compressor


aopcode compressor(asig x, 
        asig comp, ksig nfloor,
        ksig thresh, ksig loknee,
        ksig hiknee, ksig ratio,
        ksig att, ksig rel,
        ivar look)


The compressor opcode delays
the signal parameter x for 
look seconds, and returns
the delayed value after
weighting it by R.

R is determined by measuring
the dB level of the signal
parameter comp, as shown
by the table. The parameters
nfloor, thresh, loknee,
hiknee, and ratio are all in
units of dB (90 dB corresponds
to a signal amplitude of 1.0).

 comp (dB) |     R
-------------------------
less than  |     0
nfloor     | (noise gate:
           |    closed)
-------------------------
between    |  0 < R < 1   
nfloor and | (noise gate:
thresh     |  transition)
-------------------------
between    |     1
thresh and | (noise gate:
loknee     |     open)
-------------------------
between    |  transition
loknee and |    regime
hiknee     |
-------------------------
greater    | R is set so  
than       | that a ratio
hiknee     | dB increase
           | in comp
           | yields a 1 
           | dB increase
           | in x.
-------------------------

given that:

nfloor <= thresh 
thresh <= loknee
loknee <= hiknee
ratio  >  0

If ratio is < 1 dB, the 
opcode acts as an expander.
If ratio > 1 dB, the opcode
acts as a compressor.

To compute comp dB value, 
the opcode keeps a buffer of
instantaneous dB values of 
the comp signal, using the
equation:

90 + 20*log_10(abs(comp))

This buffer length is set by
the parameter look. The comp dB
signal is computed by extrapolating
signal trends in this buffer, 
under the guidance of the 
attack and release times of
the opcode, set by parameters
att and rel (which have units of
seconds). Short att and rel values
produce quick changes in R,
longer att and rel produce slower
changes in R.
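
Two pieces of the compressor arithmetic are simple enough to check in Python: the instantaneous dB level, and the slope above hiknee. The function names are assumptions, as is the choice to report silence as negative infinity. The sketch does not model the attack and release smoothing or the non-normative transition regimes.

```python
import math

def level_db(sample):
    """Instantaneous level 90 + 20*log10(abs(sample)) of one comp sample."""
    if sample == 0.0:
        return float("-inf")       # assumption: silence reported as -infinity
    return 90.0 + 20.0 * math.log10(abs(sample))

def output_rise_db(comp_rise_db, ratio):
    """Above hiknee, a rise of ratio dB in comp yields a rise of 1 dB in x."""
    return comp_rise_db / ratio

full_scale = level_db(1.0)                 # 90 dB
tenth = level_db(0.1)                      # about 70 dB
compressed = output_rise_db(8.0, 4.0)      # ratio of 4: 8 dB in gives 2 dB out
expanded = output_rise_db(1.0, 0.5)        # ratio of 0.5: 1 dB in gives 2 dB out
```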

Fourier Analysis

The fft opcode computes a windowed and overlapped complex-valued Discrete Fourier Transform (DFT) on the a-rate parameter signal in. It stores the results in the wavetables re and im.

The complementary opcode ifft computes a windowed and overlapped Inverse Discrete Fourier Transform on the wavetable pair re and im, and returns samples of the resulting audio waveform.

These opcodes are designed to be used together to implement sound synthesis algorithms that use spectral modification techniques. If a boxcar window is used for both fft and ifft, an fft-ifft pair has unity gain. See the right panel for the header syntax of fft and ifft.

fft

The fft opcode is a specialop that executes at the a-rate and k-rate, but returns a value at the k-rate.

The fft opcode returns a 1 if a new DFT has been calculated since the last k-pass, and 0 otherwise. If a new DFT has been computed, the real components are placed in the wavetable parameter re, and the imaginary components are placed in the wavetable parameter im.

The optional parameters len, shift, and size control the operation of the fft opcode.

The len parameter sets the size of the holding buffer for new audio samples. In most cases, len is also the size of the DFT.

The shift parameter controls the number of audio samples to add to the holding buffer before computing a new DFT. For a simple, non-overlapped DFT, shift is set to the same value as len. For an overlapped DFT, shift is set to a value smaller than len. For example, if len is 1024 and shift is 128, the opcode computes a new 1024-point DFT every 128 samples.

On the first call to fft, a buffer hbuf of size len is created and zeroed, and the in parameter is placed in position hbuf[len - shift].

Subsequent calls fill hbuf[len - shift + 1], hbuf[len - shift + 2] ... until the buffer is filled, and then the DFT computation begins. The optional size parameter may be used to set the DFT size; if size is not used, the len parameter is used. The DFT size may be no larger than 8192, and must be a power of 2.

The table win may be supplied to window the audio samples prior to computing the DFT. If it is not supplied, a boxcar window is used. When hbuf is filled for the first time, a buffer new with size values is created. Each buffer variable new[i] takes the value win[i]*hbuf[i]. If size is greater than len, the extra values of new[i] are set to zero.

Once new is filled, a DFT is performed on the buffer, and the real and imaginary results placed in the wavetables re and im respectively, which must be able to hold size values. The first position in each table holds the DC DFT value, the size/2 position holds the Nyquist frequency coefficient value, and the positions after size/2 hold values that code the reflection of the spectrum above the Nyquist frequency.

The shift parameter controls the data overlap between successive DFT calculations. After the first DFT is computed, the hbuf buffer is shifted forward by shift values. The shift spaces at the end of the buffer are the place where future calls to fft place in values. Once the hbuf buffer is refilled, the new is refilled, and a new DFT is performed.
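
The buffering scheme described above can be sketched in Python. The model below is illustrative: the names dft and fft_frames are assumptions, the DFT size is taken to be len (the size parameter is omitted), the forward transform uses the common negative-exponent convention, and a naive DFT stands in for a fast algorithm. It returns one (re, im) pair for each DFT that would be computed.

```python
import cmath

def dft(x):
    """Naive DFT with the negative-exponent convention (an assumption)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft_frames(samples, length, shift, win=None):
    """Model the hbuf fill, window, transform, and shift cycle of fft."""
    if win is None:
        win = [1.0] * length                   # boxcar default
    hbuf = [0.0] * length                      # zeroed on the first call
    pos = length - shift                       # first sample goes to hbuf[len - shift]
    frames = []
    for x in samples:
        hbuf[pos] = x
        pos += 1
        if pos == length:                      # buffer filled: window and transform
            new = [w * h for w, h in zip(win, hbuf)]
            spectrum = dft(new)
            frames.append(([c.real for c in spectrum],
                           [c.imag for c in spectrum]))
            hbuf = hbuf[shift:] + [0.0] * shift    # shift forward by shift values
            pos = length - shift
    return frames

frames = fft_frames([1.0, -1.0, 1.0, -1.0], 4, 2)
re, im = frames[1]     # second DFT sees the full alternating signal
```

In this example the input alternates at the Nyquist frequency, so the second frame has all of its energy at position size/2 of the re table and none at the DC position.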

The right panel describes the default values and legal ranges for the fft parameters len, shift, size, and win.

ifft

The ifft opcode runs at a-rate, and returns audio samples created from the complex DFT values in the re and im tables. The opcode assumes these tables are in the format created by the fft opcode.

The optional parameters len, shift, and size control the operation of the ifft opcode.

The len parameter sets the size of the holding buffer for output audio samples. Since in most cases len is also the size of the IDFT, the size parameter defaults to len. The IDFT size may be no larger than 8192, and must be a power of 2.

During the first call to ifft, the opcode computes the IDFT of the re and im tables. If the re and im tables hold more than size elements, only the first size elements of the wavetables are used to compute the IDFT.

The first len components of the IDFT result are multiplied point-by-point by the windowing table win, and placed in an output buffer out of length len.

The value out[0] is returned on this first call, the next call returns out[1], etc. Each sample is scaled by shift/len, so that an fft-ifft pair using boxcar windows has unity signal gain.

On the call where out[shift-1] is returned, the next IDFT is calculated, in the following way.

The contents of the out buffer are shifted forward shift elements, and the last shift values of out are set to zero. A new IDFT is computed, and the first len components of the result are multiplied point-by-point with the win table, and added into the out buffer. Values from out[0] to out[shift-1] are returned as described above, and the cycle repeats.
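
The overlap-add cycle and the shift/len scaling can be checked with a Python sketch. To keep the example short, it takes time-domain frames as input, standing in for the IDFT results; that substitution, the function name, and the list-based interface are all assumptions of the sketch.

```python
def ifft_overlap_add(time_frames, length, shift, win=None):
    """Model the out buffer of ifft, given one IDFT result per frame."""
    if win is None:
        win = [1.0] * length                   # boxcar default
    out = [0.0] * length
    scale = shift / length                     # each sample is scaled by shift/len
    samples = []
    for frame in time_frames:
        out = out[shift:] + [0.0] * shift      # shift forward, zero the last shift values
        for i in range(length):
            out[i] += win[i] * frame[i]        # window and add the new IDFT result
        samples.extend(scale * v for v in out[:shift])   # out[0] .. out[shift-1]
    return samples

# Frames as produced by a len=4, shift=2 analysis of the signal 1, 2, 3, 4, 5, 6.
frames = [[0.0, 0.0, 1.0, 2.0], [1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
result = ifft_overlap_add(frames, 4, 2)
```

With boxcar windows, each input sample is summed len/shift times and then scaled by shift/len, so the result is the original signal delayed by len - shift samples.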

The right panel describes the default values and legal ranges for the ifft parameters len, shift, size, and win.

Example

The right panel shows a simple example, using fft and ifft together in a simple spectral modification algorithm.

FFT and IFFT


specialop fft(asig in,
              table re,
              table im [,
              ivar len,
              ivar shift,
              ivar size,
              table win])

See left panel for algorithm
details. Characteristics of
parameters described below.

in: audio input signal that
is processed by the opcode.

re: table that holds the real
portion of the DFT. Must have
at least size samples.

im: table that holds the imaginary
portion of the DFT. Must have
at least size samples.

len: optional parameter that
sets the number of samples to
use. may not be negative. if
zero or not provided, it is
the next power of two greater
than a_rate/k_rate.

shift: optional parameter that
sets the shift amount of the
analysis window. may not be
negative. if not provided or
zero, set to len.

size: optional parameter that
sets the DFT size. may not
be negative. if zero, set to
len. must be a power of 2,
and no greater than 8192.

win: windowing table for
analysis. if not provided,
a boxcar of length len. 
may not have fewer than
len samples.


aopcode ifft(table re,
             table im [,
             ivar len,
             ivar shift,
             ivar size,
             table win])

See left panel for algorithm
details. Descriptions of
parameter limits for fft also
hold for ifft.

Example


// hanning window table

table win(window, 1024, 2); 

// space for fft

table re(empty, 1024);
table im(empty, 1024);
table re_m(empty, 1024);
table im_m(empty, 1024);

// signal new fft done

ksig flag;

// signal to process

asig in;




flag = fft(in, re, im, 1024, 128,
	   1024, win);

if (flag)
 {
  // modify re and im here
  // put results in re_m and im_m
 }

output(ifft(re_m, im_m, 1024, 128,
	   1024, win));

Portamento

The core opcode port is a k-rate filter that converts a step transition of the k-rate parameter ctrl into a smooth transition with an exponential trajectory. When applied to a pitch control signal (in Hertz), it confers a portamento effect on pitch changes.

The right panel shows the header syntax and algorithm for the port opcode. The k-rate parameter htime sets the time the output signal takes to traverse one half of its total excursion.

This section concludes our descriptions of the SAOL core opcode library. In the final chapter in this section, we describe how users may write new opcodes in SAOL.

Next section: Part IV/4: User-Defined Opcodes

port


kopcode port(ksig ctrl,
             ksig htime)


ctrl: input k-rate signal
       to be filtered.

htime: half-transition
       time, in seconds.
       the time for the return
       value of port to
       cover one half of a
       step change in ctrl.

port returns the value:

o + (n - o)*(1 - 2^(-t/htime))

where o is the old value of
ctrl and n is the new value
of ctrl. o and n are updated
whenever ctrl and n are not
equal (o = n, n = ctrl).

t is set to zero at each
ctrl transition, and incremented
by 1/k_rate on each call.

on first call, both o and n
are set to ctrl.
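
The port algorithm can be modeled in a few lines of Python. The sketch uses a negative exponent, so that the output starts at o and converges on n, reaching the halfway point after htime seconds. The class and method names are assumptions, as is the choice to leave t at zero on the call that detects a transition.

```python
class Port:
    """Python model of port; call tick() once per k-rate cycle."""

    def __init__(self, k_rate):
        self.dt = 1.0 / k_rate
        self.first = True

    def tick(self, ctrl, htime):
        if self.first:                     # first call: o and n both set to ctrl
            self.o = self.n = ctrl
            self.t = 0.0
            self.first = False
        elif ctrl != self.n:               # transition: o = n, n = ctrl, t = 0
            self.o, self.n = self.n, ctrl
            self.t = 0.0
        else:
            self.t += self.dt              # otherwise advance t by 1/k_rate
        return self.o + (self.n - self.o) * (1.0 - 2.0 ** (-self.t / htime))

port = Port(k_rate=10.0)
start = port.tick(0.0, 0.5)                        # first call
step = port.tick(1.0, 0.5)                         # ctrl steps from 0 to 1
trail = [port.tick(1.0, 0.5) for _ in range(5)]    # 0.5 seconds later
```

After htime seconds (five calls at a k-rate of 10 Hz), the last value in trail is halfway between the old and new ctrl values.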

Copyright 1999 John Lazzaro and John Wawrzynek.