Music 209 Week 2: Splicing

January 26, 2006.


Evaluating the quality of a candidate concatenation, by comparing pitch, loudness, spectrum, and tempo across the concatenation. Methods for performing splices in the time domain and the spectral domain (morphing). Guest appearance by Eric Lindemann of Synful.


Links from Lecture

At the start of the lecture, we introduced the flowchart of a concatenative synthesis system. The flowchart was derived from Diemo Schwarz's work, referenced in Lecture 1.

We then introduced transparency splicing, by discussing classical techniques for hardware samplers. The tutorial discussion in this part of the lecture was adapted from this tutorial from Harmony Central, and this review on the Tweakheadz.net website.

We then discussed crossfading as a technique for doing a transparent splice. This article discusses the art and science of crossfade splices.

We then moved on to non-transparent splices. To introduce the topic, we discussed the Roland D-50 synthesis system. This article is a good introduction to the D-50. The D-50 sounds we played in class can be auditioned here.

We ended the first part of the class with a discussion of the Roger Dannenberg trumpet synthesis system that follows on from the D-50 synthesizer architecture, and solves the fusion problem in a more modern way. The sound examples from class may be heard here, and the key paper on this work may be downloaded here.

We noted how the Dannenberg system naturally leads to a discussion of spectrally morphing two samples (his system morphs a sample into a synthesis engine). This product is a good example of a stand-alone tool for spectral morphing.

The rest of the lecture featured a guest appearance by Eric Lindemann of Synful. His company website is a good place to look for further information on the topics he discussed in class.


The text below consists of notes we made while preparing the lecture slides. They may be helpful in reviewing the slides.

Introduction

The term concatenative synthesis refers to combining separate audio recordings (hereafter called samples) to produce a composite output. These recordings might be notes of a single monophonic instrument, or the sound of a thunderstorm, or the sound of the Beatles song Love Me Do -- anything.

At the highest level of abstraction, these samples can be combined in two ways:

We begin with the technically simplest kind of splicing: the case where we place sample A next to sample B, and no click occurs because sample A falls to silence at its end, and sample B begins with silence (defined here as samples of value 0.0).

Technically, this is easy: play A, then play B, and perhaps shrink or grow the silence region to fit the needs of an application. An example of a system that concatenates in this manner, while keeping a unitary percept, is Liquid Saxophone. The samples are complete phrases of saxophone solos -- the necessity for wind players to breathe means that these phrases begin and end with silence.
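As a concrete illustration, here is a minimal sketch of this kind of splice, assuming NumPy and mono samples stored as 1-D arrays; the function name and gap handling are ours for illustration, not Liquid Saxophone's:

    import numpy as np

    def splice_with_silence(a, b, sr, gap_seconds=0.25, threshold=0.0):
        """Abut sample A and sample B, resizing the silent region between them.

        a, b: 1-D NumPy arrays of mono audio samples (floats).
        sr: sample rate in Hz.
        gap_seconds: desired length of silence between the two samples.
        threshold: absolute value at or below which a sample counts as
                   silence (0.0 matches the strict definition used above).
        """
        # Find the last non-silent sample of A and the first of B.
        loud_a = np.nonzero(np.abs(a) > threshold)[0]
        loud_b = np.nonzero(np.abs(b) > threshold)[0]
        a_trimmed = a[: loud_a[-1] + 1] if loud_a.size else a
        b_trimmed = b[loud_b[0] :] if loud_b.size else b

        # Grow or shrink the silence region to fit the application's needs.
        gap = np.zeros(int(round(gap_seconds * sr)))
        return np.concatenate([a_trimmed, gap, b_trimmed])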

Liquid Saxophone achieves a unitary percept when its user chooses A and B so that it sounds like two phrases a real saxophonist would play back to back -- this selection includes picking phrases whose notes follow well, and whose playing volume and timbres are compatible when heard back to back. This level of "making a good splice" is a topic for later in the course -- today, we focus on making abutted samples that do not begin or end with silence create the illusion of a unitary audio experience.

Monophonic Instrument Voices

We start by thinking of the simple case: a monophonic instrument, like a trumpet or a saxophone. In general, when samples A and B are joined to create the illusion of the instrument playing through the join, we have one of these goals in mind: a transparent transition, in which the listener cannot tell that a splice occurred, or fusion, in which the two samples blend into a single new composite sound (as in the D-50 and Dannenberg examples above).

In the sections that follow, we discuss transparent transitions. Earlier on this webpage, we provided pointers to discussions of fusion.

Choosing Transparent Transitions

In this lecture, we assume that we choose sample A from the database (example: sample A is a sustained E-flat note from a saxophone). We assume that we have some criteria for choosing a sample B to abut to it (example: we want a sustained E-flat that then smoothly segues into a D-flat).

We assume we have some search mechanism for going through the database to find the sample we want. However, this search mechanism needs "metrics" for comparing the sound at the end of sample A with the sound at the beginning of sample B, so that playing one after the other yields the same percept on either side of the transition region (we later discuss how to do artifact-free "splices"). We also assume that a near-perfect match is available somewhere in the database -- all we need are metrics to find it.

Given these assumptions, we need simple metrics for comparing the pitch, loudness, and spectrum of the audio at the end of sample A and the beginning of sample B; one possible sketch of such metrics appears below.
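Here is a hedged sketch of what such metrics might look like, assuming NumPy, frame-based analysis, and a crude autocorrelation pitch estimator; none of this code is from the lecture:

    import numpy as np

    def rms_loudness(frame):
        """Root-mean-square amplitude: a simple loudness metric."""
        return np.sqrt(np.mean(frame ** 2))

    def spectral_centroid(frame, sr):
        """Center of mass of the magnitude spectrum: a crude timbre metric."""
        mags = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
        return np.sum(freqs * mags) / (np.sum(mags) + 1e-12)

    def pitch_autocorrelation(frame, sr, fmin=50.0, fmax=2000.0):
        """Estimate the fundamental frequency from the autocorrelation peak.

        Crude but serviceable for clean monophonic material; the frame
        must be longer than sr / fmin samples.
        """
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]
        lo, hi = int(sr / fmax), int(sr / fmin)
        lag = lo + np.argmax(ac[lo:hi])
        return sr / lag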

Once we have these simple metrics, we can use them to build a complete metric for transparency. This complete metric will probably use local changes in each property in addition to absolute values (examples of phenomena that need change detection: pitch bend, vibrato, amplitude envelopes).
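Building on the metric functions sketched above, one way such a complete metric might combine absolute differences with frame-to-frame changes is shown below; the weights are placeholders, not values from the lecture:

    def transparency_cost(a_frames, b_frames, sr, w_abs=1.0, w_delta=1.0):
        """Score a candidate splice: lower is more transparent.

        a_frames: the last two analysis frames of sample A.
        b_frames: the first two analysis frames of sample B.
        Comparing deltas as well as values catches pitch bends, vibrato,
        and amplitude envelopes in progress across the join.
        """
        def features(frame):
            return np.array([
                pitch_autocorrelation(frame, sr),
                rms_loudness(frame),
                spectral_centroid(frame, sr),
            ])

        fa0, fa1 = features(a_frames[0]), features(a_frames[1])
        fb0, fb1 = features(b_frames[0]), features(b_frames[1])

        # Values should match at the join, and so should their trends.
        abs_cost = np.sum(np.abs(fa1 - fb0))
        delta_cost = np.sum(np.abs((fa1 - fa0) - (fb1 - fb0)))
        return w_abs * abs_cost + w_delta * delta_cost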

Linear Splicing

Once we've selected two samples to splice to make a transparent transition, all that's left to do is the splice itself. This article discusses the art and science of cross-fading splices.
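As a minimal sketch of the splice itself, again assuming NumPy arrays, a crossfade overlaps the end of sample A with the beginning of sample B; the equal-power curve below is one common choice, not necessarily the one the article recommends:

    import numpy as np

    def crossfade_splice(a, b, sr, fade_seconds=0.05, equal_power=True):
        """Splice A into B by overlapping and crossfading the boundary region.

        A linear fade keeps the summed amplitude constant; an equal-power
        (sine/cosine) fade keeps the summed power constant, which often
        sounds smoother on uncorrelated material.
        """
        n = int(round(fade_seconds * sr))
        n = max(1, min(n, len(a), len(b)))
        t = np.linspace(0.0, 1.0, n, endpoint=False)
        if equal_power:
            fade_out, fade_in = np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)
        else:
            fade_out, fade_in = 1.0 - t, t
        overlap = a[-n:] * fade_out + b[:n] * fade_in
        return np.concatenate([a[:-n], overlap, b[n:]])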

Phase Bashing

An alternative to doing a linear splice is to use phase bashing. We will discuss this technique in detail later in the semester.


Questions on this web page? Contact: john [dot] lazzaro [at] gmail [dot] com