Sox is a very useful program, but its command-line syntax is confusing, and it is not always easy to work out how to make it do what you want. Under most circumstances, sox copies its input to its output, possibly making changes along the way. It therefore needs an input file name and an output file name, possibly together with information about them. To do anything other than copy the input to the output (possibly with a change in format), you must also specify what to do.
The simplest use of sox therefore is with two filenames as arguments:
sox foo.aiff foo.wav
This command tells sox to copy the file foo.aiff, changing its format from aiff to wav. Sox will infer the type of a file from its extension. Since the header of the aiff file contains sufficient information about the file to convert it to wav format, no other information is necessary.
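When a whole directory of files needs the same conversion, the two-filename form can be wrapped in a shell loop. The following is a minimal sketch for a Bourne-style shell; the function name aiff_to_wav_all and its directory argument are made up for this example and are not part of sox.

```shell
# aiff_to_wav_all DIR
# Convert every .aiff file in DIR to a .wav file with the same base name.
# (Hypothetical helper; only the sox call itself comes from the text above.)
aiff_to_wav_all() {
    for f in "$1"/*.aiff; do
        [ -e "$f" ] || continue          # skip if the pattern matched nothing
        sox "$f" "${f%.aiff}.wav"        # output type is inferred from .wav
    done
}
```

For example, aiff_to_wav_all recordings would convert recordings/a.aiff to recordings/a.wav, and so on. The original files are left in place.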
Sometimes you need to convert a file to pure PCM data so that it can be processed by programs that don't understand the various encodings and headers. Such pure PCM files are called raw files and are recognized by sox by the extension .raw. By using this extension you can use sox to convert a file to raw format. The command:
sox foo.wav foo.raw
converts the file foo.wav to raw format.
We can also use sox to convert a raw file to another format. In this case, we have to supply some information about the raw file:
sox -r 44100 -e signed-integer -b 16 foo.raw foo.wav
The three flags preceding the input file name tell sox that the input file has a sampling rate of 44,100 samples per second, that the data is signed, and that each sample consists of a two byte word. With this information, sox can create a copy in wav format. The wav header also obligatorily includes the number of channels, but the number of channels in the input file need not be specified as sox assumes a default of mono.
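Since a raw file carries no header, it can be convenient to keep the description of the data in one place. The sketch below wraps the conversion in a shell function with the defaults discussed above and makes the channel count explicit with -c instead of relying on the mono default. It assumes a SoX release that spells the encoding and sample size as -e signed-integer and -b 16; the function name raw_to_wav and its argument order are invented for illustration.

```shell
# raw_to_wav RAWFILE WAVFILE [RATE] [BITS] [CHANNELS]
# Defaults: 44100 Hz, 16-bit signed samples, 1 channel (mono).
# (Hypothetical helper for illustration.)
raw_to_wav() {
    rate=${3:-44100}
    bits=${4:-16}
    channels=${5:-1}
    # Format options must come before the file they describe (the raw input).
    sox -r "$rate" -e signed-integer -b "$bits" -c "$channels" "$1" "$2"
}
```

For example, raw_to_wav foo.raw foo.wav uses the defaults, while raw_to_wav foo.raw foo.wav 48000 16 2 describes the input as 48,000 Hz stereo.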
It is also possible to change the representation of the data. For example, we can change the sampling rate by specifying the sampling rate for the output file:
sox foo.wav -r 22050 foonew.wav
This command changes the sampling rate to 22,050 samples per second.
Thus far we have used sox only to copy a file, possibly with a change in format. Sox can also transform its input in various ways. Some of these, such as reverb, are for musical use, but a number of effects, such as filtering, may be useful for phonetics. The name of the effect follows the name of the output file. Any further parameters necessary to specify the effect follow its name. For example, the command:
sox foo.wav bar.wav lowpass 1000.0
applies a low-pass filter with cutoff at 1000 Hz to foo.wav and puts the result in bar.wav.
Sox can also change the number of channels. For example, some sound cards insist on stereo data, so it may be useful to convert monaural sound files to stereo. This command does the job:
sox foo.wav -c 2 foostereo.wav
This command does not create true stereo: it creates a sound file with two duplicate channels.
If you have two audio files that you wish to use as the left and right channels of a single audio file, you can use sox's merge option, -M, to combine them into a single stereo file. It is a global option, so it precedes the input file names.
sox -M left.wav right.wav -c 2 stereo.wav
If you want to create artificial stereo from a mono source, have a look at Christopher Kissel's web site monotoSTEREO.info.
On the other hand, sometimes it is necessary to extract a single channel from a stereo recording. This may be because we want to process it using software that cannot deal with stereo input, or it may be because we are interested only in one channel. Sox can deal with monaural (1 channel), stereo (2 channel) and quadriphonic (4 channel) data.
There are two ways to reduce the number of channels. One is to select a particular channel. This is done by using the remix effect with an option indicating what channel to use. The channels are numbered, beginning with 1. For example, to extract the left channel give a command like this:
sox foo.wav foomono.wav remix 1
Another approach is to average the channels. To create a monaural file from a stereo file by averaging the two channels, give a command like this:
sox original.wav mono.wav channels 1
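If every channel is wanted as a separate file, the remix command can be repeated once per channel. The following is a rough sketch; it assumes that the soxi utility distributed with sox is installed (soxi -c prints the number of channels), and the function name split_channels and the output naming scheme are invented for this example.

```shell
# split_channels INPUT PREFIX
# Write channel N of INPUT to PREFIX-N.wav for every channel in the file.
# (Hypothetical helper; assumes soxi is installed alongside sox.)
split_channels() {
    n=$(soxi -c "$1") || return 1        # number of channels in the input
    i=1
    while [ "$i" -le "$n" ]; do
        sox "$1" "$2-$i.wav" remix "$i"  # keep only channel i
        i=$((i + 1))
    done
}
```

For a stereo file, split_channels foo.wav foo would leave the left channel in foo-1.wav and the right channel in foo-2.wav.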
The general option -v is used to change the volume. The argument to this option is used as a multiplier:
sox -v 2.0 foo.wav bar.wav
places in bar.wav a copy of foo.wav with the volume doubled.
Counterintuitively, this is an input option, which is why it precedes the name of the input file.
You can use this together with the stat effect to maximize the volume of a file. The command sox foo.wav -n stat -v prints the multiplier that will maximize the volume without clipping. On Unix systems, the multiplier is written on the stderr output. In a csh script you might do this:
sox foo.wav -n stat -v >& vc
sox -v `cat vc` foo.wav foo-maxed.wav
In a bash script you might do this:
sox foo.wav -n stat -v 2> vc
sox -v `cat vc` foo.wav foo-maxed.wav
The -n flag suppresses the normal output. In this case, we want the multiplier and are not interested in a copy of the audio file. (The -n flag can also be used in place of an input file. In that case, sox behaves as if it had been given an input file consisting of an infinite amount of silence.)
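The temporary file can be avoided by capturing the multiplier directly in a shell variable. This is only a sketch for a Bourne-style shell: the function name maximize_volume is invented, and it assumes that the multiplier is the only thing sox writes to stderr, so any warning printed by sox would end up in the variable as well.

```shell
# maximize_volume INPUT OUTPUT
# (Hypothetical helper for illustration.)
maximize_volume() {
    # stat -v writes the multiplier to stderr; 2>&1 routes it into the
    # command substitution so that no temporary file is needed.
    mult=$(sox "$1" -n stat -v 2>&1) || return 1
    # -v is an input option, so it goes before the input file name.
    sox -v "$mult" "$1" "$2"
}
```

For example, maximize_volume foo.wav foo-maxed.wav has the same effect as the two-step scripts above.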
There is also a built-in option for normalizing amplitude. You specify the level to which you wish to normalize. To maximize amplitude, you will probably want to set this to -1 (decibels).
sox --norm=-1 <inputfile> <outputfile>
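Note that the level here is given in decibels, whereas the -v option takes a linear amplitude multiplier. The two are related by multiplier = 10^(dB/20). A small sketch of the conversion using awk follows; the function name db_to_multiplier is made up for this example.

```shell
# db_to_multiplier DECIBELS
# Print the linear amplitude multiplier equivalent to a level in decibels,
# using multiplier = 10^(dB/20). (Hypothetical helper for illustration.)
db_to_multiplier() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", exp(db * log(10) / 20) }'
}
```

For example, db_to_multiplier -1 prints 0.8913 and db_to_multiplier 20 prints 10.0000.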
The "stat" effect produces statistical information about the audio data:
sox foo.wav -n stat
The -n flag tells sox not to generate any output other than the statistical information.
If the stat effect is followed by the flag -v, all that is printed is the multiplier that will maximize the volume without clipping. This value can be used as the argument to the -v general option.
The trim effect takes two parameters, start and length, and copies the portion of the input starting at start and ending at start plus length to the output. Both parameters may be specified either as a number of samples, consisting of an integer followed by the letter s (e.g. "8700s"), or as a time value. Time values are of the form ((hh:)mm:)ss(.fs). A bare integer is therefore a time value in seconds.
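To check what a time value means, or to do arithmetic on it in a script, it can be reduced to plain seconds. The sketch below does this with awk; the function name to_seconds is invented for illustration, and it handles only time values, not sample counts with the s suffix.

```shell
# to_seconds TIME
# Convert a time value of the form ((hh:)mm:)ss(.fs) to seconds.
# (Hypothetical helper; does not handle sample counts such as 8700s.)
to_seconds() {
    printf '%s\n' "$1" |
        awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}
```

For example, to_seconds 30:00 prints 1800 and to_seconds 1:02:03.5 prints 3723.5.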
For example, suppose that you have a recording 1 hour long and wish to cut it into two halves. The following two commands will leave the first half in Half1.wav and the second half in Half2.wav.
sox Input.wav Half1.wav trim 0 30:00
sox Input.wav Half2.wav trim 30:00 30:00
The original file is unaffected, so once you have confirmed that the two output files contain what they should, you may delete the original if you wish to.
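The same idea can be written as a small function that cuts a file at an arbitrary point. This is a sketch only; the name split_at is invented for the example. It relies on the fact that the length parameter of trim may be omitted, in which case everything from the start position to the end of the file is copied.

```shell
# split_at INPUT TIME FIRST SECOND
# Put everything before TIME in FIRST and everything from TIME on in SECOND.
# (Hypothetical helper for illustration.)
split_at() {
    sox "$1" "$3" trim 0 "$2" &&   # from the beginning, TIME long
    sox "$1" "$4" trim "$2"        # from TIME to the end of the file
}
```

For example, split_at Input.wav 30:00 Half1.wav Half2.wav produces the two halves of a one-hour recording.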
You can concatenate two or more input files into a single file simply by giving multiple input file names. The following command concatenates Half1.wav and Half2.wav into Full.wav.
sox Half1.wav Half2.wav Full.wav
The files to be concatenated must be of the same type, have the same sampling rate, and so forth.
Sox can synthesize a number of standard waveforms and types of noise. These are specified by means of the synth effect. Even though sox creates the output from scratch, an input file name must still be specified. The -n flag tells sox that there is no input file.
sox -n sine.wav synth 1.0 sine 1000.0
This command synthesizes a 1000 Hz sine wave 1.0 seconds long, leaving the result in sine.wav. The types of sound that it can synthesize are: sine, square, triangle, sawtooth, trapezium (trapezoidal), exp (exponential), whitenoise, pinknoise, and brownnoise.
Sox can create silences of specified duration using the -n flag for null input and the trim effect to specify the duration.
sox -n -r 48000 silence.wav trim 0.0 0.250
This command creates 250 ms of silence in the file silence.wav at a sampling rate of 48,000 samples per second.
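When an exact number of samples matters, the duration can be given as a sample count instead: 250 ms at 48,000 samples per second is 0.250 x 48000 = 12000 samples. The sketch below does this arithmetic in the shell; the function name make_silence and its argument order are invented for this example.

```shell
# make_silence OUTPUT MILLISECONDS [RATE]
# Create MILLISECONDS of silence in OUTPUT (default rate 48000 Hz).
# (Hypothetical helper for illustration.)
make_silence() {
    rate=${3:-48000}
    samples=$(( $2 * rate / 1000 ))      # e.g. 250 ms at 48000 Hz = 12000
    sox -n -r "$rate" "$1" trim 0 "${samples}s"
}
```

For example, make_silence silence.wav 250 runs sox -n -r 48000 silence.wav trim 0 12000s.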
With the -m flag, sox adds two input files together to produce its output. For example, the command:
sox -m sine100.wav sine250.wav sine100-250.wav
adds sine100.wav and sine250.wav, leaving the result in sine100-250.wav. (Note that prior to version 13 there was no -m flag and that to obtain mixing behavior it was necessary to call Sox as soxmix.)
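Synthesis and mixing can be combined to build a two-component test signal from scratch. The following is a sketch only: the function name mix_sines is invented, and it assumes that mktemp -d is available for creating a scratch directory.

```shell
# mix_sines OUTPUT SECONDS FREQ1 FREQ2
# Synthesize two sine waves and add them together into OUTPUT.
# (Hypothetical helper; the intermediate files are removed afterwards.)
mix_sines() {
    tmp=$(mktemp -d) || return 1
    sox -n "$tmp/a.wav" synth "$2" sine "$3" &&
    sox -n "$tmp/b.wav" synth "$2" sine "$4" &&
    sox -m "$tmp/a.wav" "$tmp/b.wav" "$1"
    status=$?
    rm -rf "$tmp"                        # clean up whether or not sox succeeded
    return "$status"
}
```

For example, mix_sines sine100-250.wav 1.0 100 250 creates one second of a 100 Hz sine wave added to a 250 Hz sine wave.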
The play command takes the same arguments as many other commands. In particular, you can specify a particular region to be played.
play foo.wav trim 10.0 5.0
will play the 5.0 seconds of the file beginning at 10.0 seconds into the file. You can also specify an end point rather than a duration by preceding the second argument with an equals sign.
play foo.wav trim 10.0 =15.0
will play the same region.
On many GNU/Linux systems, sox provides the usual means for playing and recording sound files. The play command is actually a shell script that calls sox. Playing a sound file is accomplished by copying the file to the device special file /dev/dsp. The following command plays the file foo.wav:
sox foo.wav -t ossdsp /dev/dsp
The -t flag specifies the type of the file /dev/dsp.
Some recent Linux systems, such as Ubuntu from Maverick Meerkat onward, no longer have /dev/dsp, as a result of which this will not work. You can get around this problem by preceding the sox command with the command padsp, e.g.:
padsp play foo.wav
or by setting the environment variable LD_PRELOAD to libpulsedsp.so before running sox, e.g.:
setenv LD_PRELOAD libpulsedsp.so
in csh or:
LD_PRELOAD=libpulsedsp.so; export LD_PRELOAD
in bash. These divert accesses to /dev/dsp to the PulseAudio server.
If you are just shifting to PulseAudio, there is a good chance that you have residual incorrect environment variable settings in your shell init file (e.g. .bashrc or .cshrc). You should remove any setting for the AUDIODEV variable and set AUDIODRIVER to "Pulseaudio". Note that changing your init file is not enough if you want the changes to take effect immediately: you also need to unset AUDIODEV and reset AUDIODRIVER in your current shell.
Sox does not understand mp3 files. If you want to convert data from mp3 to another format or extract information about an mp3 file, try ffmpeg. For example, ffmpeg -i 〈filename〉 will produce information about the file format, such as the number of bits per second. ffmpeg can also extract audio from video files, or combine your audio with a video track.
Revised 2020-06-22 based on SoX version 14.4.1. © William J. Poser.