Using Sox


  1. Introduction
  2. Copying the Input
  3. Effects
  4. Changing the Number of Channels
  5. Using Sox to Change Volume
  6. Using Sox to Obtain Information
  7. Using Sox to Extract Subparts of a File
  8. Using Sox to Concatenate Files
  9. Using Sox to Synthesize Sound
  10. Using Sox to Create Silence
  11. Using Sox to Combine Sound Files
  12. Using Sox to Play and Record
  13. MP3 Files

Introduction

Sox is a very useful program, but its command-line syntax is confusing, and it isn't always easy to figure out how to get it to do what you want. Under most circumstances, sox copies its input to its output, possibly making changes along the way. It therefore needs an input file name and an output file name, possibly together with information about each. To do anything other than copy the input to the output (possibly with a change in format), you must also specify what to do.


Copying the Input

The simplest use of sox therefore is with two filenames as arguments:

sox foo.aiff foo.wav

This command tells sox to copy the file foo.aiff, changing its format from aiff to wav. Sox will infer the type of a file from its extension. Since the header of the aiff file contains sufficient information about the file to convert it to wav format, no other information is necessary.
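
The same conversion can be applied to a whole directory with a shell loop. This is a sketch for a POSIX shell rather than anything specific to sox; it derives each output name by replacing the .aiff extension with .wav:

```shell
for f in *.aiff; do
    [ -e "$f" ] || continue        # skip if no .aiff files are present
    out="${f%.aiff}.wav"           # foo.aiff -> foo.wav
    sox "$f" "$out"
done
```

The original .aiff files are left in place.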

Sometimes you need to convert a file to pure PCM data so that it can be processed by programs that don't understand the various encodings and headers. Such pure PCM files are called raw files and are recognized by sox by the extension .raw. By using this extension you can use sox to convert a file to raw format. The command:

sox foo.wav foo.raw
converts the file foo.wav to raw format.

We can also use sox to convert a raw file to another format. In this case, we have to supply some information about the raw file:

sox -r 44100 -e signed-integer -b 16 foo.raw foo.wav

The options preceding the input file name tell sox that the input file has a sampling rate of 44,100 samples per second, that the data is encoded as signed integers, and that each sample is 16 bits (two bytes) wide. (Older versions of sox used the single-letter flags -s and -w for the last two.) With this information, sox can create a copy in wav format. The wav header must also include the number of channels, but the number of channels in the input file need not be specified, as sox assumes a default of mono.
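
One way to sanity-check these parameters is to compare the size of the raw file with what the parameters imply, since a raw file contains nothing but samples. A minimal sketch of the arithmetic, assuming a hypothetical 3-second mono recording (the duration is made up for illustration):

```shell
rate=44100        # samples per second (-r 44100)
bytes=2           # bytes per sample (two-byte words)
channels=1        # mono, the default that sox assumes
seconds=3         # hypothetical duration of the recording

expected=$((rate * bytes * channels * seconds))
echo "expected size of foo.raw: $expected bytes"
```

If the actual size (for example, as reported by ls -l) is half or double this figure, the sample size or channel count given to sox is probably wrong.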

It is also possible to change the representation of the data. For example, we can change the sampling rate by specifying the sampling rate for the output file:

sox foo.wav -r 22050 foonew.wav

This command changes the sampling rate to 22,050 samples per second.



Effects

Thus far we have used sox only to copy a file, possibly with a change in format. Sox can also transform its input in various ways. Some of these, such as reverb, are for musical use, but a number of effects, such as filtering, may be useful for phonetics. The name of the effect follows the name of the output file, and any parameters needed to specify the effect follow its name. For example, the command:

sox foo.wav bar.wav lowpass 1000.0
applies a low pass filter with cutoff at 1000 Hz to foo.wav and puts the result in bar.wav.



Changing the Number of Channels

Sox can also change the number of channels. For example, some sound cards insist on stereo data, so it may be useful to convert monaural sound files to stereo. This command does the job:

sox foo.wav -c 2 foostereo.wav

This command does not create true stereo: it creates a sound file with two duplicate channels.

If you have two audio files that you wish to use as the left and right channels of a single audio file, you can use sox's -M (merge) option to combine them into a single stereo file.

sox left.wav right.wav -c 2 stereo.wav -M

If you want to create artificial stereo from a mono source, have a look at Christopher Kissel's web site monotoSTEREO.info.

On the other hand, sometimes it is necessary to extract a single channel from a stereo recording. This may be because we want to process it using software that cannot deal with stereo input, or it may be because we are interested only in one channel. Sox can deal with monaural (1 channel), stereo (2 channel) and quadriphonic (4 channel) data.

There are two ways to reduce the number of channels. One is to select a particular channel. This is done by using the remix effect with an option indicating what channel to use. The channels are numbered, beginning with 1. For example, to extract the left channel give a command like this:

sox foo.wav foomono.wav remix 1

Another approach is to average the channels. To create a monaural file from a stereo file by averaging the two channels, give a command like this:

sox original.wav mono.wav channels 1



Using Sox to Change Volume

The general option -v is used to change the volume. The argument to this option is used as a multiplier:

sox -v 2.0 foo.wav bar.wav
places in bar.wav a copy of foo.wav with the volume doubled.

Counterintuitively, this is an input option, which is why it precedes the name of the input file.

You can use this together with the stat effect to maximize the volume of a file. The command sox foo.wav -n stat -v prints the multiplier that will maximize the volume without clipping. On Unix systems, the multiplier is written to the standard error stream (stderr). In a csh script you might do this:

sox foo.wav -n stat -v >& vc
sox -v `cat vc` foo.wav foo-maxed.wav

In a bash script you might do this:

sox foo.wav -n stat -v 2> vc
sox -v `cat vc` foo.wav foo-maxed.wav

The -n flag suppresses the normal output. In this case, we want the multiplier and are not interested in a copy of the audio file. (The -n flag can also be used in place of an input file. In that case, sox behaves as if it had been given an input file consisting of an infinite amount of silence.)
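
The temporary file vc can be avoided by capturing the multiplier directly with command substitution. The following is a sketch for a POSIX shell; the function name maximize_volume is made up for illustration, and it assumes that sox writes nothing else (such as warnings) to stderr:

```shell
# Hypothetical helper: copy $1 to $2 at the largest volume that avoids clipping.
maximize_volume() {
    # stat -v writes the multiplier to stderr, so fold stderr into the capture.
    mult=$(sox "$1" -n stat -v 2>&1) || return 1
    # -v is an input option, so it goes before the input file name.
    sox -v "$mult" "$1" "$2"
}
```

It would be called as maximize_volume foo.wav foo-maxed.wav.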

There is also a built-in option for normalizing amplitude. You specify the level to which you wish to normalize. To maximize amplitude, you will probably want to set this to -1 (decibels).

sox --norm=-1 <inputfile> <outputfile>
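
A level in decibels corresponds to a linear amplitude multiplier of 10^(dB/20), which is the kind of multiplier that the -v option takes. A small sketch of the conversion using awk (the variable names are illustrative):

```shell
db=-1
# 10^(dB/20) converts a decibel level to a linear amplitude ratio.
mult=$(awk -v db="$db" 'BEGIN { printf "%.3f", 10 ^ (db / 20) }')
echo "$db dB corresponds to a multiplier of $mult"
```

So normalizing to -1 dB leaves the loudest sample a little below full scale.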


Using Sox to Obtain Information

The "stat" effect produces statistical information about the audio data:

sox foo.wav -n stat 

The -n flag tells sox not to generate any output other than the statistical information.

If the stat effect is followed by the flag -v, all that is printed is the multiplier that will maximize the volume without clipping. This value can be used as the argument to the -v general option.



Using Sox to Extract Subparts of a File

The trim effect copies to the output the portion of the input that starts at start and ends at start plus length. Both parameters may be specified either as a number of samples, consisting of an integer followed by the letter s (e.g. "8700s"), or as a time value. Time values are of the form ((hh:)mm:)ss(.fs). A bare integer is therefore a time value in seconds.

For example, suppose that you have a recording 1 hour long and wish to cut it into two halves. The following two commands will leave the first half in Half1.wav and the second half in Half2.wav.

sox Input.wav  Half1.wav trim 0 30:00
sox Input.wav  Half2.wav trim 30:00 30:00

The original file is unaffected, so once you have confirmed that the two output files contain what they should, you may delete the original if you wish to.
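
The same pattern extends to more than two pieces. The loop below is a sketch that only prints the trim commands for cutting a one-hour recording into ten-minute parts, so that they can be checked before anything is run; the output names Part1.wav, Part2.wav, and so on are hypothetical:

```shell
total=3600      # length of Input.wav in seconds (1 hour)
chunk=600       # length of each part in seconds (10 minutes)
start=0
part=0
while [ "$start" -lt "$total" ]; do
    part=$((part + 1))
    # A bare integer is a time value in seconds.
    echo "sox Input.wav Part$part.wav trim $start $chunk"
    start=$((start + chunk))
done
```

Removing the echo, or piping the output to sh, would run the commands.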



Using Sox to Concatenate Files

You can concatenate two or more input files into a single file simply by giving multiple input file names. The following command concatenates Half1.wav and Half2.wav into Full.wav.

sox Half1.wav Half2.wav Full.wav

The files to be concatenated must have the same sampling rate, the same number of channels, and so forth.



Using Sox to Synthesize Sound

Sox can synthesize a number of standard waveforms and types of noise. These are specified by means of the synth effect. Even though sox creates the output from scratch, something must still be given in the input position: the -n flag takes the place of the input file name and tells sox that there is no input file.

sox -n sine.wav synth 1.0 sine  1000.0

This command synthesizes a 1000 Hz sine wave 1.0 seconds long, leaving the result in sine.wav. The types of sound that it can synthesize include sine, square, triangle, sawtooth, trapezium (trapezoidal), exp (exponential), whitenoise, pinknoise, and brownnoise.



Using Sox to Create Silence

Sox can create silences of specified duration using the -n flag for null input and the trim effect to specify the duration.

sox -n -r 48000 silence.wav trim 0.0 0.250

This command creates 250 ms of silence in the file silence.wav at a sampling rate of 48,000 samples per second.
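
Since trim also accepts sample counts, the same silence can be requested as an exact number of samples. A sketch of the arithmetic using shell integer math (the variable names are illustrative); it prints the resulting command rather than running it:

```shell
rate=48000                          # samples per second
ms=250                              # desired duration in milliseconds
samples=$((rate * ms / 1000))       # 48000 * 250 / 1000 = 12000
echo "sox -n -r $rate silence.wav trim 0 ${samples}s"
```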



Using Sox to Combine Sound Files

With the -m flag, sox adds two input files together to produce its output. For example, the command:

sox -m sine100.wav sine250.wav sine100-250.wav
adds sine100.wav and sine250.wav, leaving the result in sine100-250.wav. (Note that prior to version 13 there was no -m flag and that to obtain mixing behavior it was necessary to call Sox as soxmix.)
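
The input files for this example can themselves be made with the synth effect. The following sketch builds both sine waves and then mixes them, and does nothing if sox is not installed; the 8000 Hz sampling rate and the 1.0 second duration are arbitrary choices for illustration:

```shell
if command -v sox >/dev/null 2>&1; then
    # Two sine waves with the same rate and length, so they can be mixed.
    sox -n -r 8000 sine100.wav synth 1.0 sine 100.0 &&
    sox -n -r 8000 sine250.wav synth 1.0 sine 250.0 &&
    # -m adds the two inputs together sample by sample.
    sox -m sine100.wav sine250.wav sine100-250.wav
else
    echo "sox not found; nothing done" >&2
fi
```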



Using Sox to Play and Record

The play command takes the same arguments as sox, except that no output file is given. In particular, you can use the trim effect to specify a particular region to be played.

play foo.wav trim 10.0 5.0

will play the 5.0 seconds of the file beginning at 10.0 seconds into the file. You can also specify an end point rather than a duration by preceding the second argument with an equals sign.

play foo.wav trim 10.0 =15.0

will play the same region.

On many GNU/Linux systems, sox provides the usual means for playing and recording sound files. The play command is actually a shell script that calls sox. Playing a sound file is accomplished by copying the file to the device special file /dev/dsp. The following command plays the file foo.wav:

sox foo.wav -t ossdsp /dev/dsp
The -t flag specifies the type of the file /dev/dsp.

Some recent Linux systems, such as Ubuntu from Maverick Meerkat onward, no longer have /dev/dsp, as a result of which this will not work. You can get around this problem by preceding the sox command with the command padsp, e.g.:

padsp play foo.wav

or by setting the environment variable LD_PRELOAD to libpulsedsp.so before running sox, e.g.:

setenv LD_PRELOAD libpulsedsp.so

in csh or:

LD_PRELOAD=libpulsedsp.so; export LD_PRELOAD

in bash. These divert accesses to /dev/dsp to the PulseAudio server.

If you are just shifting to PulseAudio, there is a good chance that you have residual incorrect environment variable settings in your shell init file (e.g. .bashrc or .cshrc). You should remove any setting for the AUDIODEV variable and set AUDIODRIVER to "Pulseaudio". Note that changing your init file is not enough if you want the changes to take effect immediately: you also need to unset AUDIODEV and reset AUDIODRIVER in your current shell.
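
For a Bourne-style shell such as bash, the change in the current shell might look like the sketch below. The lowercase spelling pulseaudio is an assumption here: it is the name under which the driver appears in sox's list of audio device drivers, so check the output of sox --help on your system.

```shell
unset AUDIODEV                 # remove any stale device setting
AUDIODRIVER=pulseaudio         # assumed driver name; verify with sox --help
export AUDIODRIVER
```

The same three lines, minus the unset, can replace the old settings in .bashrc.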

MP3 Files

Sox does not understand mp3 files. If you want to convert data from mp3 to another format or extract information about an mp3 file, try ffmpeg. For example, ffmpeg -i ⟨filename⟩ will produce information about the file format, such as the number of bits per second. ffmpeg can also extract audio from video files, or combine your audio with a video track.







Revised 2020-06-22 based on SoX version 14.4.1. © William J. Poser.