Computer simulation of the orchestra has now reached a new stage: Brute force memory and computing speed are now adequate for high quality work. However, as the quality, variety and size of available sample libraries increase, the main limitations have become musical:

  • lack of skill in selecting and controlling the many available sounds
  • inadequate knowledge about acoustic instruments
  • inadequate knowledge about musical performance
  • lack of easy-to-use, sufficiently refined control interfaces

Attentive listening to the demos given with even the best sound libraries now available shows that most of them suffer from flagrant musical defects, most of which could be avoided with better musical knowledge. The purpose of this little guide is to supply some of the most essential musical knowledge needed to improve the current level of orchestral simulation. (N.B. there will be no discussion here of the human voice: The problems of vocal simulation are not yet well solved, due particularly to the problem of the words, and the much greater variety of range and timbre in human voices than in instruments.)
The Purpose(s) of a Simulation
Not all simulations have the same goal. One may be made only to try out ideas during composition; another may be for showing off one’s work to other musicians. The most advanced simulation aims to sound authentic even to a professional performer of the instrument in question. The kind of attention to detail needed for this last type would be a waste of time for a composer who just needs a quick sketch. Another way to think of this classification is that it corresponds, at least to some extent, to the varying skill levels of players. For example, achieving really sensitive control of intonation on a string instrument is a long and arduous job, and therefore the aural sensitivity needed to distinguish between subtleties of intonation is not as common as that required to notice more elementary characteristics, like whether or not the player is using vibrato. In a sense, therefore, the purpose of this text is to make orchestrators, especially synthesizer orchestrators, more sensitive to the refinements that real instrumentalists work to achieve during their years of training. One cannot do a convincing simulation of any instrument without knowing (which includes hearing) the refinements in its acoustic performance.
The Musical Issues
Notation and performance
Musical scores are imprecisely notated at best. Even a score full of performance indications, when played exactly as written (for example by a computer program), sounds mechanical and uninteresting. Despite some recent improvements, such as automatically accenting strong beats and peaks, realizing crescendos and diminuendos, applying some rhythmic swing, and moderate overall randomization, automatic computer performance is still only a very rough approximation of what a well-trained performer would do. Although no human can equal a computer for sheer velocity, the real goal of advanced performance training is more than just speed: It is to achieve control of the instrument in a musical way. This requires both subtle musical judgment and refined physical skills.
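To make the automatic improvements mentioned above concrete, here is a minimal Python sketch of two of them: accenting strong beats with mild randomization, and realizing a crescendo as a velocity ramp. The function names, the MIDI-style velocity range of 1 to 127, and all default amounts are assumptions chosen for illustration; they are not taken from any particular sequencer.

```python
import random

def humanize(velocities, beats_per_bar=4, accent=12, jitter=4, seed=0):
    """Accent the first beat of each bar and add mild random variation.

    Assumes one velocity per beat; accent and jitter sizes are illustrative.
    """
    rng = random.Random(seed)
    out = []
    for i, v in enumerate(velocities):
        if i % beats_per_bar == 0:
            v += accent                      # strong beat gets a fixed boost
        v += rng.randint(-jitter, jitter)    # small random deviation
        out.append(max(1, min(127, v)))      # clamp to the MIDI velocity range
    return out

def crescendo(start, end, n):
    """Return n velocities rising (or falling) linearly from start to end."""
    if n < 2:
        return [start]
    return [round(start + (end - start) * i / (n - 1)) for i in range(n)]
```

Rules of this kind are exactly what the text calls a very rough approximation: they treat every bar and every instrument alike, which a trained performer would not.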
An instrument is not just a physical, sound producing object. It is an object whose design has evolved over many years to control sound in specific, refined ways. While gross differences in timbre are audible immediately, the subtler nuances make a good live performance genuinely artistic. This is why musical performers spend years mastering the specific expressive possibilities of their individual instruments. This is an important point: The way a clarinetist phrases is quite different from the way an organist phrases.  No one phrasing algorithm can possibly fit all instruments. It must be altered according to what the player of the acoustic instrument can control, and how. Therefore, as pointed out above, to substantially improve computer simulation from its current level requires a hefty dose of instrumental ear training, along with musical insight, and sustained physical practice to master the means of control.
Achieving Realistic Simulations
The main objective
Every performing musician, no matter what his instrument, has the same basic goal: to communicate expressively. Starting from the music’s structure and character, he will use every means available on his instrument to communicate with the listener.  The highest artistic aim, achieved only by the best performers, is to make every controllable detail musically meaningful, reinforcing the music’s character. As with musical composition, an artistic performance requires both a clear overall conception and the coordination of all relevant details into that design in a meaningful way. This requires both musical knowledge (common to all instruments: harmony, counterpoint, form etc.) and physical training (mastery of the skills specific to the instrument). In developing a musical conception, like an actor trying out different ways of speaking his lines, the performer will experiment until he finds the most expressive use of the available resources. (There is also the issue of stylistic conventions: Mozart is not played with the same conventions as Chopin.) Although there is no one single “correct” version, some versions will be definitely better than others.  In all cases however, controllable details of performance left haphazard distract the listener, and weaken the effect.
Most of the performer’s musical goals are aspects of phrasing. A phrase is not a democracy: Not all elements are equally important. The performer must place the elements of the phrase in proper relation to one another, and in the context of the whole piece. The performer will therefore explore:

  • punctuation: finding musically meaningful subdivisions (i.e. letting the music breathe)  to allow the listener to make sense of the musical flow;
  • emphasis: highlighting certain points in the phrase to bring out important moments. Examples include: a melodic peak, a modulation, a cadence, etc. Depending on the instrument, such highlighting may be achieved by stronger accent (e.g. on the piano), wider vibrato (strings), subtle rhythmic shifts and differences of articulation (organ), etc.;
  • pacing: maintaining clear overall momentum while allowing brief moments of relaxation where appropriate. Even within a given basic tempo, a musician always applies a certain mild elasticity of rhythm; rigidity of tempo within the phrase is one of the most obvious defects of most computer simulations. (Some programs allow for metric flexibility, e.g. swing, but the ebb and flow of a whole phrase is much harder to automate. It is best simply performed in real time.)
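
As a rough illustration of the elasticity described under pacing, the following Python sketch computes beat onset times for one phrase whose tempo relaxes slightly at both ends and moves forward in the middle. The parabolic shape, the function name and the default depth are assumptions made for the example; a real performer's rubato follows the music, not a formula, which is why the text recommends performing it in real time.

```python
def phrase_onsets(n_beats, bpm=60.0, depth=0.1):
    """Onset times in seconds for the beats of one phrase.

    Each beat is stretched by a parabola: longer by up to `depth` near the
    phrase edges, shorter by up to `depth` in the middle. depth=0 gives a
    rigid, metronomic grid.
    """
    base = 60.0 / bpm                     # nominal beat length in seconds
    onsets, t = [], 0.0
    for i in range(n_beats):
        onsets.append(t)
        x = (i + 0.5) / n_beats           # position of this beat in the phrase, 0..1
        stretch = 1.0 + depth * (2.0 * (2.0 * x - 1.0) ** 2 - 1.0)
        t += base * stretch
    return onsets
```

Comparing the output for depth=0.0 and depth=0.1 shows how small the deviations are in absolute terms, even though their absence is immediately audible.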

Choosing the right kind of expressivity
As pointed out above, instruments are expressive in varying ways. Applying vibrato to a piano sound is obviously unidiomatic. But for instruments with many variables, there can be legitimate decisions about which one to use in a given situation. Should the violinist emphasize a given note with increased bow pressure, or by a faster vibrato, or perhaps a wider vibrato?
How many gradations are meaningful?
Not only must the performer supply the appropriate kind of expressivity, but also the appropriate degree of expressivity. For an instrument to sound really expressive, it must allow for more than just a few, primitive gradations. The question of how many gradations to allow, and how to readily control them, is critical to convincing simulations. If there are not enough gradations available, the musical effect will be crude; if too many, control will be needlessly complex. For example, vibrato is not a binary, off/on decision. That’s why sampled vibrato, say on a flute, is often a dead giveaway for an amateurish simulation. Real vibrato varies in depth and speed, in real time, according to musical considerations: important notes in the phrase, the character desired, the style of playing. In addition there is a small amount of randomness involved.
All acoustic instruments have distinctive attacks; indeed, it has long been known that removing an instrument’s attack often makes its timbre downright unrecognizable. The number of useful gradations, however, varies greatly. A pipe organ allows for little or no variation of attack, either in speed or in loudness. On the other hand, given that the piano’s main tool for expressive playing is control of key velocity, a good simulated piano can require over 50 dynamic levels per note. (Note that for the piano, speed and force of attack are identical: It is impossible simultaneously to attack very slowly and very loudly.) Wind and string instruments can vary their attacks both in speed and in loudness: at least 6-7 speeds of attack are meaningful, and a lot more for loudness. Although both strings and winds must achieve a certain speed of attack to make the note sound at all, there is still a substantial audible range between a gentle attack and an aggressive one.
Finally, and perhaps most importantly, sustained notes in strings and winds are never completely static. The players may aim for smooth playing, but in practice, there is often at least mild variation in loudness, and often individual notes are intentionally swelled or made softer. In any such variation occurring over the course of one note, many degrees will be required: enough to make the changes sound continuous and not discrete.
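The difference between a fixed, sampled vibrato and one that evolves during the note can be sketched in a few lines of Python. The function below returns a pitch deviation in cents over time, with depth and rate each moving between a start and an end value, plus a little randomness in the rate. The name, the control rate of 100 values per second, and all numeric defaults are assumptions for illustration only; actual depths and speeds depend on the instrument, the player and the style.

```python
import math
import random

def vibrato_cents(duration, sr=100, rate=(5.0, 6.0), depth=(0.0, 30.0),
                  jitter=0.05, seed=0):
    """Pitch offset in cents, sampled sr times per second.

    rate and depth are (start, end) pairs interpolated linearly over the
    note; jitter is the relative random deviation applied to the rate.
    """
    rng = random.Random(seed)
    n = int(duration * sr)
    phase, out = 0.0, []
    for i in range(n):
        x = i / max(1, n - 1)                        # progress through the note, 0..1
        r = rate[0] + (rate[1] - rate[0]) * x        # current speed in Hz
        d = depth[0] + (depth[1] - depth[0]) * x     # current width in cents
        r *= 1.0 + rng.uniform(-jitter, jitter)      # slight irregularity
        phase += 2.0 * math.pi * r / sr              # accumulate phase so rate changes stay smooth
        out.append(d * math.sin(phase))
    return out
```

With the defaults the note starts straight and the vibrato gradually widens and quickens, a shape that an on/off sampled vibrato cannot produce. In practice the depth and rate curves would come from a real-time controller and not from a linear ramp.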
In general, the more things the performer of a real, physical instrument has to control in real time, the harder a realistic simulation will be: The meaningful coordination of multiple, simultaneous elements in real time  is always a complex skill. This is why a beginner on the cello takes much longer to sound even mildly respectable than a beginner on the piano.  This remains true even when the interface to the instrument is changed. Although the control required may be accomplished differently from that on the acoustic instrument – e.g. a mod wheel instead of breath control – really refined work still requires practice. The good news is that, at least within groups of instruments, control is similar enough that once one learns to successfully play, say, a trumpet sound, playing a horn sound convincingly does not require very different skills. Here are some more detailed comparisons between instruments and their control possibilities.
Harpsichord and organ
The harpsichord has no direct dynamic accent for single notes. Although the player chooses when to let go of the note, the note decays at a fixed rate, so beyond a certain point, there is no more pitched sound (the harpsichord does create a very characteristic mechanical release sound as the plectrum disengages, but this is not significantly controllable by the player). So, apart from registration, which cannot be changed from note to note, the player’s control is limited entirely to rhythm. Even for instruments which may seem superficially similar, like harpsichord and organ, there are often important distinctions. The organ’s sounds do not die away, and therefore an overlapping legato is annoying. On the harpsichord, such a legato is a useful option, since it allows the player to somewhat camouflage the release sound of one note with the start of another. On the other hand, harpsichord chords are commonly slightly rolled, to make the attack richer and less percussive. On the organ, this is not needed, and indeed it is very rare in practice. The organ also includes a pedalboard, whose mechanics are different from the keyboard, since the player uses two feet instead of ten fingers, which makes alternating passages much easier than scales.
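The slightly rolled harpsichord chord described above amounts to staggering the note onsets by a small, regular delay. A minimal Python sketch follows; the function name, the bottom-to-top order and the 30 ms default spacing are illustrative assumptions, not measured values.

```python
def roll_chord(pitches, start=0.0, spread=0.03):
    """Return (onset_seconds, pitch) pairs, lowest note first.

    Each successive note starts `spread` seconds after the previous one;
    spread=0 gives the simultaneous attack typical of the organ.
    """
    return [(start + i * spread, p) for i, p in enumerate(sorted(pitches))]
```

For example, roll_chord([60, 64, 67]) spreads a C major triad over 60 ms, while roll_chord([60, 64, 67], spread=0) leaves all three notes together.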
The violinist (or other bowed string player) can control the following elements: which string most notes are played on (and the choice of open or stopped notes for the notes corresponding to the three upper open strings), its basic pitch (intonation), and (especially for some twentieth century repertoire) change of pitch within the note; vibrato around the basic pitch (both width and speed); dynamic shape over the whole note (attack, cresc. and dim. during the note, release), which is intimately related to bowing decisions; pitch movement between notes – portamento – (rate, subdivision, according to which string(s) is (are) chosen). All these can be applied with or without mute. In addition, the violinist can play chords in various ways, can also play pizzicato, col legno, etc. The number of combinations of all these elements is enormous. Controlling a violin sound with sensitivity comparable to that of a good violinist with a different interface (usually a keyboard) is no simple matter!
If we now turn to wind instruments, the most important element in any wind instrument’s natural sound is breath. Although players aim at evenness, a wind phrase is naturally shaped by breath. The feeling is entirely different from keyboard playing, where fingering and arm movements create grouping, and from strings, where more or less symmetrical, alternating bow movements create a physical, rhythmic sense. In addition, tonguing controls subtle articulation, rather in the way consonants punctuate sound in speech. A good wind player has many degrees of tongue articulation, ranging from a hard attack to a soft one. Apart from breathing and articulation, wind players also have some real-time control over intonation. They may also use vibrato to varying degrees; the flute, for example, is virtually always played with vibrato, whereas the other winds vary in their use of vibrato according to the style of the music, and even to some extent the nationality of the player.
Control differences between ensembles and soloists
Things which are very noticeable in a soloist’s performance sometimes lose importance in a group. For example, a solo string player’s legato is very complex, since it is affected by string choice and bowing technique. 16 violins playing a legato line in unison will always overlap somewhat, since no two will ever attack and release a given note at the same time. The same is true of vibrato. Likewise, a fast run by a group of instruments in unison is always slightly blurred. Therefore, successfully simulating a section of violins requires somewhat different priorities and control methods than simulating a soloist. In a real orchestra the result of extended divisi is fewer instruments per note. For example, a section of 16 violins divided into 4 gives 4 violins per note. Composers often use such substantial divisi to create a thinner, more transparent sound. Simply playing the four-note chord using samples of a 16-player string section gives the equivalent of 64 (4×16) instruments, not at all the same sound. By the same token, when a string section does play chords (multiple stops), the attacks of the notes are never together, either for the individual instruments or within the whole group. This creates an unusually rich attack. When real instruments play together in unison, they are never totally in tune. Further, the intonation evolves subtly even within a sustained note, as the players try to adjust to each other. In the same manner, there is a constant subtle balancing of loudness and tone going on in a good orchestra, where the players are always listening to each other, and the conductor makes continuous adjustments. Such details may sometimes sound mildly random, but usually they tend towards more refined intonation, tone or balance.
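The divisi arithmetic above, and the idea that no two players in a section attack or tune identically, can be sketched in Python as follows. The function names and the spread amounts (a few cents of detuning, a few tens of milliseconds of onset scatter) are assumptions for illustration; the text gives no specific figures for them.

```python
import random

def players_per_note(section_size, parts):
    """Players on each note when a real section divides evenly into parts."""
    return section_size // parts

def stacked_sample_players(section_size, parts):
    """Apparent number of players when a full-section sample is used per note."""
    return section_size * parts

def section_spread(n_players, detune_cents=8.0, onset_ms=25.0, seed=0):
    """One (detune_cents, onset_delay_ms) pair per player in a unison.

    Detuning is spread around zero; onset delays fall after the nominal start.
    """
    rng = random.Random(seed)
    return [(rng.uniform(-detune_cents, detune_cents), rng.uniform(0.0, onset_ms))
            for _ in range(n_players)]
```

Here players_per_note(16, 4) gives 4, whereas stacked_sample_players(16, 4) gives 64, which is the mismatch the text warns about. Note that section_spread models only static scatter: it does not capture the way real players continuously adjust intonation and balance towards each other during a note.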
Special Note: Dynamics
For sampled instruments, dynamics can create special problems. Acoustic instruments each have a natural dynamic curve, which varies according to register. For example, the flute is always softer in its low register than in its high register. While the specifics vary for each instrument, it is important to realize that recording technology can have the effect of eliminating or minimizing the normal dynamic differences between instrumental registers.
Potential for the Future
Apart from currently evolving improvements in control interfaces, there are also some situations where synthetic instruments can actually improve on acoustic instruments (and not just for cost reasons) e.g.

  • making the harp genuinely chromatic
  • making the guitar more fully polyphonic
  • playing full polyphony on percussion instruments
  • allowing figuration which would be impossible or extremely awkward on the acoustic instrument. This last point is somewhat contentious, as one could argue that the kind of arpeggio which is idiomatic to a keyboard instrument sounds wrong on the violin. However, if used in a musically convincing way, there seems no good reason to avoid such things. And in fact this sort of borrowing from other instruments is nothing new in music history: Bach’s vocal writing is often clearly influenced by string instrument technique (e.g. in the B minor Mass, the Kyrie theme is derived from typical string crossing figures).

This last example points to an interesting question: What are the limits to changing the control interface of a musical instrument in this way? The possibilities are intriguing, as long as the changes make possible more expressive playing.