Creating and Exporting Audio in Python

August 29, 2024 (updated August 31, 2024)

The following will demonstrate the basics of creating simple sounds in Python and using the wave and struct libraries to export them. It serves as a simple Python sine wave audio export tutorial.

Making the sound is as simple as creating an array of oscillations; exporting the sound is a touch more complex but still easy.

The first example will be in vanilla Python without the use of NumPy to make it slightly clearer. Where speed is a priority I would advise using NumPy; however, this initial example will be much easier to port to other imperative languages.

The first step is to import the required libraries:

import wave #we will require the wave library to export the finished sound,
import struct #the struct library to format each frame as binary
from math import pi, sin #and finally import pi and the sine wave to create an oscillation

For the demo we will be creating a pure sine wave at 440 Hz, the A above middle C (A4), which most modern tuning standards use as their reference pitch.
Additionally, we must assign a frame rate, the number of channels, and the sample width.

Frame Rate: the number of samples per second. There are a number of commonly used values, but 44,100 Hz (alternately written as 44.1 kHz) remains one of the most common, as it has been the standard for audio CDs since their introduction in the early 1980s.
Channels: how many separate audio streams play concurrently. The WAV format allows as many as 65,535 channels! However, the most common are 1 channel for mono, 2 for stereo, 6 for 5.1 surround sound, and 8 for 7.1 surround sound.
Sample Width: the number of bytes used to store each sample in each channel. 2 bytes (16-bit) is the CD standard.

FrameRate=44100 #Samples Per Second
Pitch=440 #Hz
Channels=2 #How many audio channels.  1 for mono, 2 for stereo
SampleWidth=2 #Bytes Per Sample
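These three settings together determine how much data the file holds. As a quick sanity check, here is a small sketch that multiplies them out; the helper name wav_data_size is hypothetical (it is not part of the wave library) and it ignores the file header. Each frame holds one sample per channel, so a frame is Channels*SampleWidth bytes.

```python
def wav_data_size(frame_rate, channels, sample_width, seconds):
    """Return (bytes per frame, bytes per second, bytes of audio data).

    Hypothetical helper for illustration; the WAV header is not included.
    """
    bytes_per_frame = channels * sample_width        # one sample for every channel
    bytes_per_second = frame_rate * bytes_per_frame  # frames per second times frame size
    total_bytes = int(bytes_per_second * seconds)    # raw audio data for the whole tone
    return bytes_per_frame, bytes_per_second, total_bytes

# CD-quality stereo settings, matching the values above
print(wav_data_size(44100, 2, 2, 1))  # (4, 176400, 176400)
```

So one second of CD-quality stereo is 176,400 bytes of audio data, and a three-minute tone at these settings is 31,752,000 bytes.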

The next step is to generate the sine wave. This could easily be done with a list comprehension mapping a range to a sine function (ToneLength, the duration in seconds, is defined in the next snippet):

MySound=[sin(i/FrameRate*2*pi*Pitch) for i in range(ToneLength*FrameRate)]

However, a major downside is that the frequency is baked into the formula. If Pitch changes partway through, the argument i/FrameRate*2*pi*Pitch suddenly jumps to a different point in the cycle, producing a discontinuity (an audible click), which rules out basically all frequency modulation. So while it would be fine for this demo, it's a bad habit to get into.
A much better way is to keep track of the current position of the oscillation (its phase) and, within a loop, increment it in proportion to the current frequency (which in this example is constant):

ToneLength=1 #Tone length is in seconds; as the frame rate may vary, it's best to use a fixed unit
MySound=[] #make an empty list to append the sound to
tau=2*pi #Each oscillation has a length of 2pi, so it saves us some computation to define 2pi as its own variable
OscillatorPosition=0 #This is the value we will increment; it is the argument passed to the sine function
for frame in range(int(ToneLength*FrameRate)): #int() also allows fractional tone lengths such as 0.5
    OscillatorPosition+=Pitch/FrameRate*tau #increment the oscillator position
    #We are incrementing by a constant amount each iteration, although that often won't be the case
    MySound.append(sin(OscillatorPosition)) #Append the sine of the oscillator position as the next frame
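To see why the accumulator matters, here is a sketch in which the pitch changes every frame, gliding linearly from 440 Hz up to 880 Hz. The function name glide and its parameters are illustrative assumptions, not part of the original code, and the `%= tau` wrap is my addition: it doesn't change the output of sin, but it keeps the phase small so floating-point precision doesn't degrade over long sounds. Because only the step size changes, the phase never jumps, so neighbouring samples stay close together.

```python
from math import pi, sin

def glide(start_hz, end_hz, seconds, frame_rate=44100):
    """Sine sweep built with a phase accumulator (illustrative sketch)."""
    tau = 2 * pi
    n_frames = int(seconds * frame_rate)
    samples = []
    position = 0.0
    for frame in range(n_frames):
        # Current frequency, interpolated linearly from start_hz towards end_hz
        pitch = start_hz + (end_hz - start_hz) * frame / n_frames
        position += pitch / frame_rate * tau  # step proportional to the current pitch
        position %= tau                       # wrap to keep the value small and precise
        samples.append(sin(position))
    return samples

sweep = glide(440, 880, 1)
# Largest change between neighbouring samples; it can never exceed one phase step
largest_jump = max(abs(b - a) for a, b in zip(sweep, sweep[1:]))
print(largest_jump < 2 * pi * 880 / 44100)  # True
```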

Now that we have our sound, all that's left is to export it.
I will cover the creation of .wav files in more depth in the future; however, Python has us covered with the wave library we imported at the beginning.

def SaveSound(l,name="default.wav",samplewidth=2,framerate=44100,channels=2):
    #The format codes start with "<" so values are always little-endian (as WAV requires) with standard sizes
    if samplewidth==1: #if the sample width is one byte this is called PCM8
        charcode="<B" #pack into an unsigned byte; 8-bit WAV samples are unsigned, with 128 as silence
    elif samplewidth==2: #if the sample width is two bytes this is called PCM16
        charcode="<h" #pack into a signed 16-bit short
    elif samplewidth==4: #if the sample width is four bytes this is called PCM32
        charcode="<i" #pack into a signed 32-bit integer (a native "l" can be 8 bytes on some platforms)
    else:
        raise ValueError("samplewidth must be 1, 2 or 4 bytes")
    Magnitude=2**(8*samplewidth-1) #2 to the power of (number of bits minus 1), predefined to avoid recalculating it each iteration
    Offset=Magnitude if samplewidth==1 else 0 #8-bit samples are shifted up so that -128..127 becomes 0..255
    frames=bytearray() #collect all the packed bytes so they can be written in one go
    for val in l:
        #As our values are normalised between -1 and 1, and we need them between -2^(8*samplewidth-1) and 2^(8*samplewidth-1)-1,
        #we must multiply each input value by the desired magnitude and then recast it to an integer
        val=int(val*Magnitude)
        val=-Magnitude if val<-Magnitude else val #if the value is less than -2^(8*samplewidth-1)
        val=Magnitude-1 if val>(Magnitude-1) else val #or greater than 2^(8*samplewidth-1)-1, then limit it to the extrema instead
        #The positive limit is one less than the negative one because, in two's complement,
        #zero takes up one of the non-negative values; this ensures it falls within the desired range
        sample=struct.pack(charcode,val+Offset) #pack the value into the desired number of bytes
        for channel in range(channels):
            frames+=sample #write the value into each channel in turn (channels are interleaved within a frame)
            #if you wanted to pan the sound you would have a different magnitude for each channel
    with wave.open(name,"wb") as f: #f is our output file; "with" closes it automatically
        #After opening the file we must set the parameters required for the WAV header
        f.setnchannels(channels) #how many channels; this demo uses 2 and is stereo
        f.setsampwidth(samplewidth) #in bytes
        f.setframerate(framerate) #set the frame rate
        f.writeframes(frames) #write every frame at once, which is much faster than one call per sample
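The comment about panning can be made concrete. Below is a sketch, assuming 16-bit stereo, that gives each channel its own gain before packing. It uses a constant-power pan law, a common choice that I'm substituting here (the text above only says each channel needs a different magnitude), and the function names pan_gains and pack_stereo are hypothetical.

```python
import struct
from math import cos, sin, pi

def pan_gains(pan):
    """Constant-power gains for pan in [-1, 1] (-1 = hard left, 1 = hard right)."""
    angle = (pan + 1) * pi / 4      # map -1..1 onto 0..pi/2
    return cos(angle), sin(angle)   # (left gain, right gain)

def pack_stereo(samples, pan):
    """Pack normalised floats as interleaved little-endian 16-bit stereo frames."""
    left_gain, right_gain = pan_gains(pan)
    magnitude = 2 ** 15
    frames = bytearray()
    for val in samples:
        for gain in (left_gain, right_gain):  # left sample first, then right
            v = int(val * gain * magnitude)
            v = max(-magnitude, min(magnitude - 1, v))  # clip to the 16-bit range
            frames += struct.pack("<h", v)
    return bytes(frames)
```

The resulting bytes could be passed straight to writeframes() on a file opened with two channels and a sample width of 2. With a constant-power law, the centre position gives each side about 0.707 rather than 0.5, so the perceived loudness stays roughly even as the sound moves.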

More about the wave library can be found here: https://docs.python.org/3/library/wave.html
And the struct library here: https://docs.python.org/3/library/struct.html
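The struct documentation explains why the format codes matter. Without a prefix, struct uses the platform's native sizes and byte order, and a native "l" (a C long) is 8 bytes on many 64-bit systems such as 64-bit Linux, which would silently corrupt a 32-bit WAV file. Prefixing the code with "<" forces little-endian byte order, which WAV uses, along with standard sizes. A quick check:

```python
import struct

# Standard sizes with the "<" prefix: little-endian, no padding
for code in ("<b", "<B", "<h", "<i"):
    print(code, struct.calcsize(code))
# <b 1
# <B 1
# <h 2
# <i 4

# The native size of a C long varies by platform (often 8 on 64-bit Linux, 4 on Windows)
print("native l:", struct.calcsize("l"))

# -2 as a little-endian signed short: least significant byte first
print(struct.pack("<h", -2))  # b'\xfe\xff'
```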

The final step is simply to take our sound and pass it as the first argument of SaveSound, along with the desired arguments as defined earlier.

SaveSound(MySound,name="default.wav",samplewidth=SampleWidth,framerate=FrameRate,channels=Channels)
#Export the list as a file
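To confirm the export worked, the same wave library can read the header back. This is a minimal sketch; the helper name describe_wav is hypothetical, and the demo writes its own one-second test file of silence to a temporary folder so that it runs on its own.

```python
import os
import struct
import tempfile
import wave

def describe_wav(path):
    """Return the main header fields of a WAV file as a dict."""
    with wave.open(path, "rb") as f:
        return {
            "channels": f.getnchannels(),
            "sample_width": f.getsampwidth(),
            "frame_rate": f.getframerate(),
            "frames": f.getnframes(),
            "seconds": f.getnframes() / f.getframerate(),
        }

# Write one second of silence in CD-quality stereo, then read it back
path = os.path.join(tempfile.mkdtemp(), "check.wav")
with wave.open(path, "wb") as f:
    f.setnchannels(2)
    f.setsampwidth(2)
    f.setframerate(44100)
    f.writeframes(struct.pack("<h", 0) * 2 * 44100)  # 2 channels x 44100 frames

print(describe_wav(path))
# {'channels': 2, 'sample_width': 2, 'frame_rate': 44100, 'frames': 44100, 'seconds': 1.0}
```

Running describe_wav("default.wav") after SaveSound should likewise report 2 channels, a sample width of 2, a frame rate of 44100 and 44100 frames for the one-second tone.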

Complete Code

import wave #we will require the wave library to export the finished sound,
import struct #the struct library to format each frame as binary
from math import pi, sin #and finally import pi and the sine wave to create an oscillation

FrameRate=44100 #Samples Per Second
Pitch=440 #Hz
Channels=2 #How many audio channels.  1 for mono, 2 for stereo
SampleWidth=2 #Bytes Per Sample



ToneLength=1 #Tone length is in seconds; as the frame rate may vary, it's best to use a fixed unit
MySound=[] #make an empty list to append the sound to
tau=2*pi #Each oscillation has a length of 2pi, so it saves us some computation to define 2pi as its own variable
OscillatorPosition=0 #This is the value we will increment; it is the argument passed to the sine function

#MySound=[sin(i/FrameRate*2*pi*Pitch) for i in range(ToneLength*FrameRate)]
#List Comprehension Example

for frame in range(int(ToneLength*FrameRate)): #int() also allows fractional tone lengths such as 0.5
    OscillatorPosition+=Pitch/FrameRate*tau #increment the oscillator position
    #We are incrementing by a constant amount each iteration, although that often won't be the case
    MySound.append(sin(OscillatorPosition)) #Append the sine of the oscillator position as the next frame

def SaveSound(l,name="default.wav",samplewidth=2,framerate=44100,channels=2):
    #The format codes start with "<" so values are always little-endian (as WAV requires) with standard sizes
    if samplewidth==1: #if the sample width is one byte this is called PCM8
        charcode="<B" #pack into an unsigned byte; 8-bit WAV samples are unsigned, with 128 as silence
    elif samplewidth==2: #if the sample width is two bytes this is called PCM16
        charcode="<h" #pack into a signed 16-bit short
    elif samplewidth==4: #if the sample width is four bytes this is called PCM32
        charcode="<i" #pack into a signed 32-bit integer (a native "l" can be 8 bytes on some platforms)
    else:
        raise ValueError("samplewidth must be 1, 2 or 4 bytes")
    Magnitude=2**(8*samplewidth-1) #2 to the power of (number of bits minus 1), predefined to avoid recalculating it each iteration
    Offset=Magnitude if samplewidth==1 else 0 #8-bit samples are shifted up so that -128..127 becomes 0..255
    frames=bytearray() #collect all the packed bytes so they can be written in one go
    for val in l:
        #As our values are normalised between -1 and 1, and we need them between -2^(8*samplewidth-1) and 2^(8*samplewidth-1)-1,
        #we must multiply each input value by the desired magnitude and then recast it to an integer
        val=int(val*Magnitude)
        val=-Magnitude if val<-Magnitude else val #if the value is less than -2^(8*samplewidth-1)
        val=Magnitude-1 if val>(Magnitude-1) else val #or greater than 2^(8*samplewidth-1)-1, then limit it to the extrema instead
        #The positive limit is one less than the negative one because, in two's complement,
        #zero takes up one of the non-negative values; this ensures it falls within the desired range
        sample=struct.pack(charcode,val+Offset) #pack the value into the desired number of bytes
        for channel in range(channels):
            frames+=sample #write the value into each channel in turn (channels are interleaved within a frame)
            #if you wanted to pan the sound you would have a different magnitude for each channel
    with wave.open(name,"wb") as f: #f is our output file; "with" closes it automatically
        #After opening the file we must set the parameters required for the WAV header
        f.setnchannels(channels) #how many channels; this demo uses 2 and is stereo
        f.setsampwidth(samplewidth) #in bytes
        f.setframerate(framerate) #set the frame rate
        f.writeframes(frames) #write every frame at once, which is much faster than one call per sample

SaveSound(MySound,name="default.wav",samplewidth=SampleWidth,framerate=FrameRate,channels=Channels)

And that’s everything for now.
If you need to visualise it, just add

from matplotlib import pyplot as plt

to the imports at the beginning, and then add

plt.plot(MySound)
plt.show()

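at the end, before or after SaveSound depending on whether you want to preview it before or after exporting the file. Plotting all 44,100 samples of a 440 Hz tone will look like a solid band; plotting a slice such as plt.plot(MySound[:200]) shows the individual cycles.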

In the next demo I will show how to apply envelopes to sounds and add overtones to create a simulated guitar or piano pluck.
I already have the code for it on GitHub here:
https://github.com/Troy-Osborne/PluckLab/blob/main/Pluck.py
however, it's not yet adequately commented.

In larger projects it's best to export the sound as you create it. This means you don't have to store the entire sound in memory, which usually isn't a good idea, although it often doesn't matter for smaller files. You'll definitely notice the difference when exporting an entire song. I will give examples of this soon.
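In the meantime, here is a rough sketch of that streaming approach, under my own assumptions rather than as the promised example: the tone is generated one chunk at a time with the same phase accumulator, and each chunk is packed and written immediately, so only one chunk is held in memory. The function name stream_tone and the default chunk size of 4096 frames are arbitrary choices, and only 16-bit samples are handled to keep it short. It uses writeframesraw, which skips rewriting the header after every chunk; declaring the frame count with setnframes up front means the header is correct from the start, and the wave library fixes it on close if the count differs.

```python
import struct
import wave
from math import pi, sin

def stream_tone(name, pitch=440, seconds=1, frame_rate=44100,
                channels=2, sample_width=2, chunk_frames=4096):
    """Write a sine tone to a 16-bit WAV file chunk by chunk (sketch)."""
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit samples")
    tau = 2 * pi
    magnitude = 2 ** 15
    total_frames = int(seconds * frame_rate)
    position = 0.0
    with wave.open(name, "wb") as f:
        f.setnchannels(channels)
        f.setsampwidth(sample_width)
        f.setframerate(frame_rate)
        f.setnframes(total_frames)  # declare the length so the header is right up front
        written = 0
        while written < total_frames:
            n = min(chunk_frames, total_frames - written)
            chunk = bytearray()
            for _ in range(n):
                position = (position + pitch / frame_rate * tau) % tau
                v = int(sin(position) * magnitude)
                v = max(-magnitude, min(magnitude - 1, v))  # clip to the 16-bit range
                chunk += struct.pack("<h", v) * channels     # same value in every channel
            f.writeframesraw(chunk)  # only this chunk is in memory at once
            written += n
```

For example, stream_tone("streamed.wav", seconds=180) writes a three-minute tone without first building a list of 7,938,000 samples.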

If you are planning on storing the entire sound in memory before exporting it, I recommend doing it as a NumPy array and vectorizing the process; you'll notice incredible performance increases. However, I wanted to ensure this could easily be ported to C or whatever other language you want to work with.
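For comparison, a vectorized NumPy version of the same tone and export might look like the sketch below. It assumes NumPy is installed, only handles 16-bit output, and the function name save_sound_numpy is hypothetical. The phase accumulator becomes np.cumsum over the per-frame phase steps, which, like the loop, increments before taking the sine; with a changing pitch you would simply fill the step array with different values.

```python
import wave
import numpy as np

def save_sound_numpy(samples, name="default.wav", frame_rate=44100, channels=2):
    """Export normalised float samples as 16-bit PCM using NumPy (sketch)."""
    magnitude = 2 ** 15
    samples = np.asarray(samples, dtype=float)
    # Scale, truncate toward zero like int(), and clip to the 16-bit range in one pass
    ints = np.clip(np.trunc(samples * magnitude), -magnitude, magnitude - 1)
    ints = ints.astype("<i2")  # little-endian signed 16-bit integers
    # Repeat each sample across the channels: shape (frames, channels), interleaved by row
    frames = np.repeat(ints[:, None], channels, axis=1)
    with wave.open(name, "wb") as f:
        f.setnchannels(channels)
        f.setsampwidth(2)
        f.setframerate(frame_rate)
        f.writeframes(frames.tobytes())

frame_rate, pitch, tone_length = 44100, 440, 1
# Per-frame phase steps summed cumulatively: the vectorized phase accumulator
phase = np.cumsum(np.full(int(tone_length * frame_rate), 2 * np.pi * pitch / frame_rate))
my_sound = np.sin(phase)
```

Calling save_sound_numpy(my_sound, "numpy.wav") then writes the same one-second stereo tone as SaveSound.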

I hope this has helped, thank you very much for your time. I wish you luck on all future audio programming endeavors.