Lectures

Composing music with the idea of a sound recording

15 November 2022. International Musicology Symposium. Academy of Music in Ljubljana.

Dr. Žiga Stanič

Summary

At the beginning of the previous century, listening to live performances was the only means of perceiving music. As the present article notes, however, the rise of mass media has resulted in live concert events becoming a relatively small market niche. Internet platforms deliver a range of musical recordings to billions of people, including recordings that were difficult to access in the 20th century due to the limited distribution possibilities of physical data carriers.

Today, newly composed contemporary music usually receives very few live public performances, which is why audio recordings are so important to composers. These recordings extend a work’s life through specialised radio programmes or the digital environment of online music offerings. It is therefore extremely important that recordings of newly composed contemporary music are made in a way that is consistent with the composer’s aesthetic conception while also meeting technological and acoustic requirements.

After post-production, the final recording of a musical performance is no longer the same as the performance on stage at the time of recording. Furthermore, performing in front of an audience is no longer necessary for the musical work to come to life. It is at this point that the exchange of roles between technology and performing virtues begins, with the preference of the music producer often being more decisive than that of the conductor. Moreover, the power of advertising and marketing rises above the primary aesthetic power of the newly created musical composition, and its final product, the musical recording, can vary greatly depending on the level of technical processing. The lecture summarises the process of music production from the original musical idea to the distribution of the final product.

Live Music and the Role of Recording in the Modern Era

At the beginning of the last century, live music was the only way to experience musical performances. However, with the rise of mass media, it lost its dominance in the music market. Online platforms now allow billions of people to listen to musical works, including those that were once difficult to access due to the limited geographical distribution of vinyl records, cassettes, and CDs. Compared to the frequent performances of classical music’s established repertoire, newly composed serious art music is performed far less often, usually only at its premiere. As a result, it is typically recorded, and these recordings continue their existence through specialized radio programs or digital music streaming platforms.

For the presentation of music across all genres, it is crucial today that recordings are made in a way that aligns with both the composer’s aesthetic vision and the technological-acoustic requirements. Before the advent of recording technology, composers had to strictly adhere to natural acoustic principles to ensure their musical ideas were clearly conveyed to the audience. They had to be mindful of the natural balance of volume among instruments; otherwise, details of their compositions would be lost. Today, most popular music creators take into account the possibilities of microphone technology, multi-channel mixing, sound shaping, mastering, and even the correction of performance errors during the creative process. Even the quietest instrument can be amplified and positioned as the dominant element in the final sound mix.

Although composers of art music were pioneers in electroacoustic music, modern technologies have not yet been widely embraced by composers of acoustic art music. There remains an untapped niche in this creative process, particularly in the use of music production technologies. Nevertheless, it is already evident that the final version of a recorded piece—whether in popular or art music—rarely sounds exactly as it did during the live recording session. For a musical work to thrive in the market, a live performance before an audience is no longer necessary. This largely depends on the genre of recorded music. While classical music recordings adhere to established aesthetic criteria rooted in traditional acoustic settings, popular music follows more complex standards for sound capture and processing. These trends are gradually, yet steadily, influencing the production of serious art music as well.

At this intersection, the roles of technology and musical performance skills begin to shift—music producers gain prominence over conductors, and the power of advertising and marketing often surpasses the intrinsic aesthetic impact of newly composed works. The final product—a recorded piece—can vary significantly depending on the extent of technical processing. For music intended exclusively for audio recordings, the role of production and post-production is crucial. Technology allows for radical modifications of a performance: tempo and dynamics can be altered, errors corrected, tonal colors adjusted, volume balances modified, reverberation added, and many other acoustic parameters fine-tuned. In this regard, even the function of the conductor—who traditionally serves as the artistic leader responsible for controlling these performance and acoustic aspects—is somewhat diminished.

There are now more tools than ever for sound processing. In the early days of radio broadcasting, it was nearly impossible to manipulate sound without making the intervention obvious to listeners. However, with the development of magnetic tape recording, the technique of cutting and splicing together the best parts of a performance (audio editing) became more refined. The digital era has vastly expanded the complexity of sound processing, leading to extended post-production times. The degree of post-production depends on both the technical and performance quality of the recorded material available for editing. The standard for audio production continues to rise, as sound processing becomes increasingly comprehensive and sophisticated. Experience in the field of music production shows that more and more musicians are becoming aware of and utilizing these possibilities. Even recordings of live concerts—once considered unique and unalterable events—are now frequently post-produced, with significant modifications made to both performance quality and acoustic structure.

To better illustrate the process of recording and post-processing a musical recording, let’s draw a comparison between music and film production. In film editing, the average viewer understands, thanks to visual images, that a film is composed of short shots and sequences. They are aware that these shots were filmed multiple times and that the director selected and assembled the most suitable material into a cohesive whole. Similarly, in the creation of studio music recordings, segmented production processes are applied. However, for the average listener, these steps are not necessarily self-evident, as sound information is integrated into the final recording in a way that is less perceptible. Just as individual pieces of a film are assembled, in studio music production, segments of a musical work are joined together—musical phrases defined by the score and timing parameters, or even corrections of specific passages or individual notes. Thus, the composition is built from shorter segments throughout its timeline, which must be seamlessly combined so that the cuts are imperceptible, ensuring that the piece sounds like a unified, flawlessly performed whole.

A completed musical product released on the market is typically in stereo format, meaning that the sound is encoded on two channels—left and right—designed for playback on two speakers, one for each ear. This allows the listener to perceive the sound panorama (the location of each sound source—whether it is more to the left or right, closer or farther, etc.). While film footage is traditionally created using a single lens that captures a moment in time, music recording is significantly more complex, as it typically involves capturing at least a stereo image (using a minimum of two microphones), but most often, multiple microphones are used. By positioning these microphones in a space—taking into account factors such as directionality and distance from the sound sources—we shape the tonal color and spatial image of the recording. The captured signals are further processed on a mixing console, where the volume of individual microphone inputs is balanced into a coherent whole, and sound color is manipulated using overtone frequency spectrum adjustments, artificial reverb, and other effects. The recorded sound can then be mixed into a stereo track or retained in multichannel form, allowing for further refinement using computer software and/or a mixing console. In this case, compared to film editing, the processing of musical recordings not only involves a temporal aspect but also a complex multichannel structure, which sound engineers and music producers must skillfully manage by ear.
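The balancing of microphone inputs into a stereo image described above can be sketched in code. The following is a minimal illustration, not a production mixing engine: each mono track is given a hypothetical gain and pan position, and constant-power panning distributes it between the left and right channels.

```python
import numpy as np

def pan_gains(pan: float) -> tuple[float, float]:
    """Constant-power panning: pan in [-1 (hard left), +1 (hard right)].
    The left/right gains trace a quarter circle, so perceived loudness
    stays roughly constant as a source moves across the panorama."""
    angle = (pan + 1) * np.pi / 4          # maps [-1, 1] to [0, pi/2]
    return float(np.cos(angle)), float(np.sin(angle))

def mix_to_stereo(tracks, gains, pans):
    """Sum equal-length mono tracks into one stereo pair (left, right)."""
    n = len(tracks[0])
    left = np.zeros(n)
    right = np.zeros(n)
    for track, gain, pan in zip(tracks, gains, pans):
        gl, gr = pan_gains(pan)
        left += gain * gl * np.asarray(track, dtype=float)
        right += gain * gr * np.asarray(track, dtype=float)
    return left, right
```

A source panned to the center reaches both channels at about 0.707 (−3 dB) of its gain, which is the usual constant-power convention.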

As technological standards rise, new reference recordings constantly emerge, serving as benchmarks for composers and performers who strive to achieve the highest possible audiophile and performance quality. Within the framework of multichannel editing, an unwanted voice or instrument can be easily removed from the overall sound mix and replaced. However, for this to be possible, the sound material must be acoustically well-isolated during recording, ensuring that each track contains only the intended musical content. If a microphone inadvertently picks up other instruments (e.g., in classical recordings of a symphony orchestra, a clarinet microphone might also capture nearby musicians), then simply deleting the track is not an option. Microphone-isolated studio recordings are most commonly used in film music recording, popular music production, and other contexts where individual musicians or groups playing the same musical line can be acoustically separated without logistical challenges. Their synchronized performance is then coordinated using an auditory metronome known as the “click track,” which musicians hear through headphones to prevent it from being picked up by microphones. The click track also allows for overdubbing, making simultaneous ensemble playing unnecessary. During the recording process, layers of audio material are added track by track, gradually building a multichannel, editable whole. Composers and arrangers incorporate this approach into their creative process, as it enables them to produce music that would otherwise be impossible to perform in conventional acoustic settings on a live stage.
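The click track mentioned above is straightforward to generate digitally. The sketch below is a simplified illustration with assumed parameters (a 1 kHz blip of 10 ms, not any studio's actual standard); it places a short decaying sine burst at the start of every beat for a given tempo:

```python
import numpy as np

def click_track(bpm: float, beats: int, sr: int = 44100,
                click_hz: float = 1000.0, click_len: float = 0.01) -> np.ndarray:
    """Generate a mono click track: one short sine burst per beat.
    Assumes the burst is shorter than one beat interval."""
    spb = int(round(sr * 60.0 / bpm))                  # samples per beat
    out = np.zeros(spb * beats)
    t = np.arange(int(sr * click_len)) / sr
    # sine blip with a linear decay envelope, so each click dies out quickly
    burst = np.sin(2 * np.pi * click_hz * t) * np.linspace(1.0, 0.0, t.size)
    for b in range(beats):
        start = b * spb
        out[start:start + burst.size] = burst
    return out
```

Fed to musicians' headphones (never to the room microphones), such a track keeps overdubbed layers in sync.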

Percussion instruments are most frequently acoustically isolated from an ensemble, as their loudness can be picked up by all microphones in the space, compromising the precision of individual instrument recordings. To prevent this, percussion instruments are either separated on stage using acoustic panels or placed in a dedicated recording booth outside the main space. Unwanted recorded material on an individual microphone input, known as bleed, can also come from click tracks, excessive reverb, loud percussion, or other high-volume instruments. In live performances of popular music, stage monitors are used to provide performers with pitch and timing references for synchronized playing. Today, musicians are increasingly aware of these parameters and plan their work in accordance with music production technology, integrating it from the earliest stages of composition. As a result, music production now begins with the compositional process itself, incorporating musical arrangement and recording preparation. Initially associated primarily with popular music, this approach is now being adopted by an increasing number of contemporary classical composers in the 21st century.

Due to technological advancements, certain aspects of stage etiquette for large ensembles have also evolved. In the 20th century, absolute silence was required approximately ten seconds before a recording session began, as any disturbances would be permanently embedded in the recording and could not be removed. Today, such silence is required mainly for collective focus, as acoustic disturbances can now be eliminated. If an audio recording contains a noticeable error—such as an incorrect pitch, a timing mistake, an unwanted background noise, or an electronic artifact—it can be corrected or at least sonically minimized using a visual digital audio workstation (DAW) interface. Before the year 2000, such post-production corrections were extremely difficult to execute, requiring time-consuming and costly processes with a low likelihood of successfully resolving the issue.

Many of the technological inventions that today enable complex post-production modification of recorded material unfortunately encourage performers to rely excessively on the possibility of correcting flawed or weak performances. Since such corrections were not possible in the 20th century, performers were generally better prepared for recording sessions, and the level of musicianship had to be uncompromisingly higher than it is today. The quality of musicians is continuously improving, as is the standard by which we evaluate the technical perfection of a musical performance. Today, however, thanks to technological assistance, we can produce solid recordings even for performers who make more technical mistakes, which was not feasible in the past. The inability to technologically correct musical performances highlights the undeniable mastery of the great musicians of the 20th century. In the 21st century, by contrast, there is always a lingering doubt that a recording has been technologically corrected, which makes its performance authenticity questionable. This applies to both studio and live concert recordings, which, despite being captured in real time, can still undergo processing before publication.

Of course, today, many musicians possess flawless technical and musical performance potential. However, advanced digital music post-production also enables high-quality musical products for performers with more modest capabilities. The production of their recordings is more demanding and time-consuming due to a higher number of mistakes, making it more expensive as well. With proper recording management, post-production can elevate average or even below-average performances to a relatively high-quality level. Such results were unattainable for average performers in the 20th century, which is why the small number of outstanding musicians, at least in terms of serious classical music interpretation, were undeniably excellent. Today’s flood of technically flawless performances, made possible through sound modifications, diminishes the value of those that are truly exceptional in their own right.

Music creators and interpreters are increasingly aware of the new services available in the music production process and compare their work with competing recordings. Today, the global market for historical recordings of classical music is virtually saturated with performances of the core repertoire. With hundreds of recordings of fundamental works, such as Beethoven’s piano sonatas, a contemporary performer cannot afford a wrongly played note or an inadequately interpreted phrase when releasing an album online. The market is overflowing with excellent recordings, leaving no reason for anyone to purchase or listen to a lower-quality performance. From this perspective, there is a growing tendency toward technological retouching and correction of recorded material, provided that the number of technical and musical errors in the performance is not excessively high. Many performers enter recording sessions with their heads bowed, but become vocal when seeking correction options in the audio mastering process.

Listeners of radio music programs unconsciously compare recordings, especially when they are played consecutively. It is often noticeable that one recording is more present (Romih, 2016) than another—louder, with more audible details, making its musical message easier to follow without straining one’s ears or increasing the playback volume. Consequently, we unintentionally judge that a louder recording is technically better than a quieter one (Hodges & Sebald, 2011), as we miss many musical details in the quieter recording, making it seem of lesser quality.

In popular music recordings, the loudness level remains relatively consistent throughout the entire track, making them quite different from recordings of acoustic classical music, where the dynamic range between pianississimo (ppp) and fortississimo (fff) can be extremely wide, as in symphonic music. At its loudest, a symphony orchestra can reach levels between 100 and 110 dB, comparable to loud metal bands in popular music. The difference, however, is that the latter use amplified sound played through speakers, allowing their volume to be increased further. Unlike popular music, classical pieces do not maintain a constant loudness level but continually move between loud and quiet sections; such dramatic dynamic changes rarely occur in popular music. As a result, classical acoustic music, in its original concept of dynamic contrast, often sounds undermodulated and too quiet in comparison to compressed popular music. For this reason, the 21st century has also brought changes in the post-production of classical music. To raise the overall average loudness of classical acoustic pieces and bring them closer to the typical loudness levels of popular music recordings, the quieter sections must be amplified. This diminishes the expressive power of large dynamic contrasts, but it makes the quieter parts more distinct for radio broadcasting purposes. The dynamic range can be reduced with so-called compressor-limiters: tools that attenuate sounds above a chosen threshold and then raise the overall level with make-up gain, so that the quiet passages end up effectively louder relative to the loud ones.
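The basic behavior of such a compressor can be illustrated with a short sketch. This is a static, sample-by-sample gain computation with assumed threshold and ratio values; real compressors add attack and release smoothing over time, which is omitted here for clarity:

```python
import numpy as np

def compress(signal: np.ndarray, threshold_db: float = -30.0,
             ratio: float = 4.0, makeup_db: float = 0.0) -> np.ndarray:
    """Downward compression: for every dB the level exceeds the threshold,
    only 1/ratio dB passes through; make-up gain then raises everything."""
    eps = 1e-12                                      # avoid log10(0)
    level_db = 20 * np.log10(np.abs(signal) + eps)
    over = np.maximum(level_db - threshold_db, 0.0)  # dB above threshold
    gain_db = -over * (1.0 - 1.0 / ratio) + makeup_db
    return signal * 10 ** (gain_db / 20)
```

With a −30 dB threshold and a 4:1 ratio, a full-scale peak (0 dB) is pulled down by 22.5 dB, while material below the threshold passes unchanged; adding make-up gain then lifts the whole, now narrower, dynamic range.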

The decision on how much to emphasize quiet sounds and attenuate loud ones is left to the aesthetic judgment of the sound engineer, who balances sound quality with the desired final loudness of the musical sequence. Excessive digital or even analog compression of sound can lead to distortion and alter the fundamental characteristics of the natural acoustic timbre of instruments. Therefore, it is often more reasonable to adjust quiet sections through digital editing, manually modifying their loudness. In this case, we do not alter the tonal characteristics of the sound—only its volume. This approach results in exceptionally high loudness levels for quiet instruments, requiring the composer, music producer, or sound engineers responsible for modifying the sound to assess the aesthetic consistency of the technological intervention.

An important factor in compressing quiet sound information is the presence of unwanted ambient noise, such as the hum of air-conditioning units. This is the so-called analog silence, which, unlike digital silence, reaches our hearing threshold. The concept was most clearly introduced to a broader audience by composer John Cage in his piece 4’33”. When increasing the loudness of quiet sections in a recording, the level of analog silence—noise and other extraneous sound information—automatically rises as well, making it disproportionately loud within the musical sequence. To counter this post-production issue, noise reduction tools are used: the software is given a sample of the analog silence from the quietest parts of the recording, either before the performance begins or after it ends, and then subtracts this material from the musical recording. However, when defining analog silence, care must be taken that the sample does not include frequency ranges containing musical content, as removing them would cause undesirable changes to the timbre of the music.

With the rise of online digital media in the 21st century, a competitive race to increase the average loudness of recordings has emerged. Successive radio broadcasts of CDs from different record labels—especially when mixing older and newer recordings that adhere to different loudness standards—led to such significant variations in volume that, after a loud track, a quieter one would become almost inaudible at the same volume setting on a radio receiver. To address this issue, the European Broadcasting Union (EBU) adopted the EBU R 128 recommendation in 2010, standardizing loudness normalization at a target programme loudness of -23 LUFS. European radio stations use this standard to ensure relatively equal loudness levels, whereas loudness targets on online platforms tend to be much higher. For example, YouTube’s recommendation (Mastering The Mix, 2020) for mastered tracks is between -13 and -15 LUFS, with an upper limit of -9 LUFS. At these levels, the balance between quiet and loud parts of a piece is compressed to such an extent that meaningful dynamic contrasts are nearly eliminated. The compression of classical music under such parameters presents a serious aesthetic issue, as composers never intended such minimal loudness contrasts, which do not align with the natural laws of acoustics.
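Loudness normalization itself is conceptually simple once a loudness measurement exists: the required gain is the difference between the target and the measured value. The sketch below assumes the programme loudness has already been measured with a proper EBU R 128 meter (measuring LUFS involves K-weighting and gating per ITU-R BS.1770, which is beyond a short example):

```python
def normalization_gain_db(measured_lufs: float, target_lufs: float) -> float:
    """Gain in dB that moves a programme from its measured loudness
    to the target loudness (LUFS differences map 1:1 to dB of gain)."""
    return target_lufs - measured_lufs

def apply_gain_db(samples, gain_db):
    """Apply a dB gain to a sequence of linear sample values."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]
```

For instance, a recording measured at -18 LUFS must be attenuated by 5 dB to meet the EBU broadcast target of -23 LUFS, whereas the same recording would be pushed several dB upward for a typical streaming target.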

The choice of recording space depends on the genre of music and the number of musicians being recorded. It may be a concert hall, a church, an outdoor location, or a studio. However, the chosen space is not always ideal, and due to different microphone placements, it is not always possible to fully control the overall sound image in terms of tone color, reverberation, etc. In any case, the recording must be modified, sonically shaped, or mastered in a way that optimizes the sound image. The success of audio mastering depends significantly on the experience and individual aesthetic judgment of the sound engineer or whoever is responsible for the final sonic output. The criteria for a high-quality product are universal only to a certain extent, as they can vary greatly depending on the musical genre or ensemble.

Modern creators of musical works are increasingly aware of the process of music production and their role within it. They more or less skillfully adapt to the new paths leading to their final compositional product: a sound recording placed on the global music market. In recording practice, many acoustically inconsistent works appear during premiere performances of artistic acoustic music, which sound significantly different in live performance compared to the recording. In serious music, a common example is the frequent use of glissando in the harp, creating a magical atmosphere in a loud orchestral finale. If we can hear this detail on the recording, we must ask whether the harp’s volume is naturally sufficient to overpower the percussion and ten brass instruments, including four horns sitting next to the harp and blasting a pompous chord. The audience in the concert hall almost certainly cannot hear these details, but on the recording they are enhanced by raising the volume of the harp’s spot microphone at the mixing desk. Even solo concertos sometimes emerge with hidden compromises, for example in a concerto for flute and orchestra. A composer may write a technically demanding solo flute part in the low register, which under no circumstances can be sufficiently sonorous and penetrating to be heard alongside an orchestral tutti accompaniment. This could be considered a poor example of compositional practice. However, on a well-produced recording, it can be heard if the composer intended it and the music producer successfully realized their vision.

Even more complex are the expectations of contemporary composers of artistic music, who seek an acoustically equal simultaneous performance of, for instance, ten solo instruments, expecting that the listener will be able to distinguish each one equally well. This is a psychoacoustic phenomenon that can be illustrated in reverse—through the perception of an individual soloist within the orchestra. A horn player, for example, focuses on their own musical line, listens to the instruments that introduce their entrance, and those with whom they are connected for a particular phrase. The soloist in an orchestra does not pay attention to the overall orchestral sound, as they are not capable of tracking the performance details of all other instruments, nor is that their duty. Since they know the details of their part well, they would like to hear as much of it as possible in the final orchestral recording, with all nuances preserved. Another soloist, such as a cellist, would wish the same for themselves, following their own line in the recording and preferring the cello part to be more prominent. Everyone wants to hear as much of their own part as possible.

In practice, both composers and performers who rely on post-production mixing often wish to modify the volume and presence of individual instruments, but this is not always possible. Even if all instruments were recorded in isolated tracks and assembled in a balanced stereo image, we would not be able to perceive them all equally individually. A perfect acoustic democracy of orchestral instruments cannot exist—neither in the concert hall nor in the recording. A solid classical score allows instruments the space to resonate naturally, but in modern compositional practice, these principles are often disregarded in favor of technology. Since this is a matter of individual psychoacoustic perception, technology can only assist by reducing reverberation to enhance definition and clarity, assuming that each soloist has a microphone positioned close to their instrument, capturing minimal ambient sound. From this perspective, many composers have unrealistic expectations or insufficient knowledge of what can and cannot be achieved in music production.

Even more common are misconceptions among composers who assume that live concert conditions will be sufficient for high-quality recordings of crossover music, which blends multiple genres. If we combine a rock band with a symphony orchestra, we must understand that poor sound mixing can result in a single delicate guitar pluck overpowering the entire orchestral tutti. In past concert sound systems, it was all too common for a 70-piece orchestra to serve more as a visual backdrop for a four-member band. In such cases, the sound engineering team, responsible for regulating volume levels, carries an enormous responsibility and must possess both aesthetic judgment and technological expertise. Otherwise, the most expensive and ambitious projects can turn into disasters—unusable and poorly received by audiences. This is why recordings of complex crossover projects are often successfully realized only after extensive post-production, while live media broadcasts or amplified concert experiences rarely achieve the desired quality. Composers are generally aware that acoustically isolating instrumental and large vocal ensembles (choirs) for studio recordings is advantageous, making the technological and performance coordination of such projects crucial.

Recording orchestral accompaniment and subsequently overdubbing a solo vocal is standard practice in popular music. Once the orchestral accompaniment is recorded and edited—assembled from the best performed sections—the solo vocal part is synchronized and recorded separately. Before merging it into a final stereo mix, the vocal is fine-tuned for pitch, articulation, and volume. The volume of softer passages is adjusted to match louder sections, ensuring that the vocal remains clearly intelligible in the final mix. Throughout their careers, composers receive feedback on how their ideas are realized after technical processes are completed. With sufficient knowledge and proper work organization, their visions can be effectively implemented. However, technological execution can also fail if acoustic details are not properly conceived and managed throughout the entire music production process. For this reason, it is essential that music creators are well-educated in this field.
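The leveling of softer vocal passages described above can be sketched as segment-wise RMS matching. This is an illustrative simplification with hypothetical parameters, not how an engineer would actually ride a fader: each fixed-length segment of the vocal is measured and brought to a common target level.

```python
import numpy as np

def rms_db(x: np.ndarray) -> float:
    """RMS level of a segment in dB relative to full scale."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def level_segments(vocal: np.ndarray, seg_len: int, target_db: float) -> np.ndarray:
    """Bring each segment of the vocal up (or down) to a target RMS level,
    so quiet phrases remain intelligible against a loud accompaniment."""
    out = vocal.astype(float).copy()
    for start in range(0, len(out), seg_len):
        seg = out[start:start + seg_len]
        if seg.size == 0:
            continue
        gain_db = target_db - rms_db(seg)
        out[start:start + seg_len] = seg * 10 ** (gain_db / 20)
    return out
```

In practice an engineer would use overlapping windows, smooth the gain curve, and follow phrase boundaries by ear; the hard segment edges here are only for clarity.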

Dr. Žiga Stanič