Audio Recording and Encoding in Linux

Linux is a professional-class operating system, so if you plan to do audio recording and encoding, you have chosen the right system.

I use Pulseaudio on some boxes and just Alsa on other boxes. As of this writing (Summer 2011) Pulseaudio has all the major bugs worked out, so all the old web discussions about it causing issues are no longer relevant as far as I'm concerned.

When using Pulseaudio you are strongly advised to install the GUI management tools in order to control the audio streams and volumes and so forth. You can do it at the command line which is great, but it is a bit cumbersome unless you are using a recording script (which I will demonstrate later).

If only using Alsa, then there is usually an added step of specifying the audio device in the recording command. For example, arecord expects a "-D plughw:0,0" or "-D hw:0,0". You also need to use the alsamixer utility to control the selection of external sources and gain levels. Actually, gurus also use alsamixer on Pulseaudio systems when recording from sources such as line-in or microphones.
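
As a rough sketch, the card and device numbers that go into "-D plughw:X,Y" can be pulled out of the "arecord -l" listing. The helper name list_capture_devices is made up for this example, and it assumes the usual "card N: ..., device M: ..." line format that arecord -l prints.

```shell
#!/bin/bash
# Hypothetical helper: turn "arecord -l" output (read from stdin) into
# "plughw:CARD,DEVICE" strings, one per capture device.
# Assumes lines of the form "card 0: NAME [DESC], device 0: NAME [DESC]".
list_capture_devices() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/plughw:\1,\2/p'
}

# Typical use on a machine with a sound card:
#   arecord -l | list_capture_devices
#   arecord -D plughw:0,0 -f cd test.wav
```

Lines that do not describe a device (the banner and the "Subdevices" lines) are simply skipped.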

arecord examples
ffmpeg examples
speex format
voice activated recording
useful scripts

Examples of recording and piping the output to a compressed format:

$ arecord -v -f cd -t raw | lame -r - output.mp3

$ arecord -v -f cd -t raw | lame -r -b 192 - output.mp3

$ arecord -v -f cd -t raw | lame -r -m s -a -b 64 - output.mp3

$ arecord -v -f cd -t raw | lame -r -h -V 0 -b 128 -B 224 - output.mp3

$ arecord -v -f cd -t raw | lame -r --preset extreme - output.mp3

$ arecord -vv -f cd output.wav

$ arecord -f dat output.wav

$ arecord -f S16_LE -c1 -r22050 -t raw | lame -r -s 22.05 -m m -b 64 - Output.mp3

$ arecord -f S16_LE -c1 -r22050 -t raw | oggenc - -r -C 1 -R 22050 -q 2 -o Output.ogg

$ arecord -f S16_LE -c1 -r22050 -t raw -d xxxx | oggenc-aotuvb5d - -r -C 1 -R 22050 -o Output.ogg

(-d xxxx limits the recording time in seconds, useful for scripts)
(oggenc -q option is -1 to 10 where higher numbers mean higher quality/bitrate, the default is 3)

It's probably a good idea to read the man pages of arecord, lame, and oggenc (the aotuv version is just an enhanced version). Basically the "-f" in arecord allows you to select a pre-packaged format like CD (44.1 kHz, 16 bits little endian, 2 channels) or DAT (48 kHz, 16 bits little endian, 2 channels), or you can get specific and choose S16_LE (16 bit little endian) -c1 (mono) -r22050 (22.050 kHz sampling rate), and so forth.

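Whatever you pipe arecord into needs to match how you are recording with arecord. Raw audio carries no header, so the encoder has to be told the same sample rate and channel count (compare the -c1 -r22050 options with the matching lame and oggenc options above). Notice that when I record in "CD" or "DAT" formats I choose high quality options in lame. When I use lower quality options in arecord, I also use less demanding encoding parameters in lame and oggenc.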
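
One way to keep both sides of the pipe in agreement is to state the rate and channel count once and generate the command from them. This is only a sketch: matched_pipeline is a made-up helper name, it assumes 16 bit little endian samples, and it prints the command rather than running it so you can review it first.

```shell
#!/bin/bash
# Hypothetical helper: print an arecord | oggenc pipeline in which both
# sides use the same sample rate and channel count (16 bit little endian).
matched_pipeline() {
    local rate="$1" channels="$2" outfile="$3"
    printf 'arecord -f S16_LE -c%s -r%s -t raw | oggenc - -r -B 16 -C %s -R %s -o %s\n' \
        "$channels" "$rate" "$channels" "$rate" "$outfile"
}

# Print the command for review (file names with spaces are not handled).
matched_pipeline 22050 1 Output.ogg
```

The oggenc options used here are the raw input ones: -r (raw input), -B (bits), -C (channels), and -R (rate).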

Digital audio recording is all about specifying a quantizing method, sample rate, number of channels, bits per channel, and compression schemes and bitrates. Read the man pages.
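
To get a feel for what those numbers mean on disk, here is a small worked calculation of uncompressed (wav or raw) sizes. The function name pcm_bytes is just for illustration; the formula is rate x channels x bytes per sample x seconds.

```shell
#!/bin/bash
# Uncompressed PCM size in bytes = rate * channels * (bits / 8) * seconds.
pcm_bytes() {
    local rate="$1" channels="$2" bits="$3" seconds="$4"
    echo $(( $rate * $channels * $bits / 8 * $seconds ))
}

pcm_bytes 44100 2 16 1      # CD format, one second:      176400
pcm_bytes 48000 2 16 3600   # DAT format, one hour:       691200000
pcm_bytes 22050 1 16 3600   # 22.05 kHz mono, one hour:   158760000
```

For comparison, a 64 kbps compressed stream is 8000 bytes per second, or about 28.8 MB per hour.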

Using ffmpeg to record and encode:

$ ffmpeg -f alsa -ac 2 -ar 44100 -i pulse -acodec libvorbis -ab 160k OUTPUT.ogg

$ ffmpeg -f alsa -ac 2 -ar 44100 -i pulse -acodec libmp3lame -ab 160k OUTPUT.mp3

$ ffmpeg -f alsa -ac 2 -ar 44100 -i pulse OUTPUT.flac

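The placement of "-i pulse" and "-acodec xxx" is critical in order for ffmpeg to encode in 2 channels. ffmpeg applies each option to the next file named on the command line, so options before -i describe the input and options after it describe the output. If "-i pulse" were at the beginning of the command line, "-ac 2" would no longer apply to the capture device and ffmpeg would only see mono audio. Use the Pulseaudio tools to select the audio source to record, or specify an audio device instead (-i hw:0,0) if using Alsa only.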
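
The ordering can be made explicit by building the command from two groups of options. This is a sketch only: ffmpeg_capture_cmd is a made-up helper that prints the command instead of running it, and since the bitrate is an encoder setting it is placed with the output options after -i.

```shell
#!/bin/bash
# Hypothetical helper: print an ffmpeg capture command with the input options
# grouped before "-i" and the output (encoder) options grouped after it.
ffmpeg_capture_cmd() {
    local device="$1" codec="$2" bitrate="$3" outfile="$4"
    local input_opts="-f alsa -ac 2 -ar 44100"        # describe the capture device
    local output_opts="-acodec $codec -ab $bitrate"   # describe the encoded file
    echo "ffmpeg $input_opts -i $device $output_opts $outfile"
}

ffmpeg_capture_cmd pulse libvorbis 160k OUTPUT.ogg
ffmpeg_capture_cmd hw:0,0 libmp3lame 160k OUTPUT.mp3
```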

Some record/encode examples with the Speex format:

Example of transcoding an MP3 to 16 kHz speex:
$ sox INPUT.MP3 -t wav -r 16000 - | speexenc - --denoise OUTPUT.SPX

Example of encoding a wav file. Speex prefers input file sampling rates of 8 or 16 kHz:
$ speexenc --vbr --denoise INPUT.WAV OUTPUT.SPX

Piping arecord to speexenc (realtime speex encoding method):
$ arecord -f S16_LE -c1 -r8000 -t raw | speexenc - --vad OUTPUT.SPX

Sound Activated Recording, VOX recording, Sound Operated Recording:

This stuff goes by many names. Basically, it describes a method to only record incoming audio signals that meet or exceed a user-specified threshold level. It's useful when recording from police scanner radios or microphones that are set up to record intermittent audio.

On Winblows there are numerous programs to accomplish this. On Linux there are only two methods that I've discovered that work properly.

Audacity. Audacity now has an option under the Transport menu to do "Sound Activated Recording" and allows you to set an input threshold dB level. It appears to be stable: I've set it up to record from my motherboard's line-in for periods of up to 20 hours with no crashes or memory leaks.

VACR. This is a Python script that works well from the command line. It has limited options but the default threshold detection method works very well. There are a few places in the script to specify the sample rate and output file name. There is also a line to tweak the threshold level gate.

Update 11/23/11. I found the script below on a German language forum. This method uses sox (major points for that) and works very well. The big advantages of this script are that sox lets you encode how you want (fine-grained control) and that you can also do post-processing. In this case the creator included a normalizing section and also generated a nice spectrogram. I've been using this script in some form or another for about a month now.

#!/bin/bash
#source http://www.sis-germany.de/index.php?page=Thread&threadID=1444

NAME=$(date +%m-%d-%Y_%H-%M-%S)

#choose a method below, either the decibel or % of level method
#also choose number of channels, sample rate, encoding, etc..
#you need to find a level that works for your hardware

#rec -c 1 -r 22050 $NAME.wav silence 1 0 -22d -1 00:00:05 -22d
#rec -c 1 -r 22050 $NAME.wav silence 1 0 8% -1 00:00:05 8%
rec -c 1 -r 22050 $NAME.mp3 silence 1 0 8% -1 00:00:05 8%
#rec -c 1 -r 22050 $NAME.mp3 silence 1 0 25% -1 00:00:05 25%

#uncomment appropriate line below for normalizing when finished

#echo "Normalize..."
#sox $NAME.wav $NAME-norm.wav gain -n -1
#sox $NAME.mp3 $NAME-norm.mp3 gain -n -1

#uncomment appropriate line below for a spectral graph if desired

#echo "Calculating Spectrogram..."
#sox $NAME.wav -n spectrogram -x 1024 -y 768 -z 100 -t "$NAME.wav" -c '' -o $NAME.png
#sox $NAME.mp3 -n spectrogram -x 1024 -y 768 -z 100 -t "$NAME.mp3" -c '' -o $NAME.png
#echo "Done."
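
The decibel and percent thresholds in the script above are two ways of writing the same level. sox treats the percentage as a fraction of the maximum sample value, so the conversion is dB = 20 * log10(percent / 100). Here is a small sketch of that arithmetic (percent_to_db is a made-up helper name, and it needs awk):

```shell
#!/bin/bash
# Convert a sox threshold given in percent of full scale to decibels:
#   dB = 20 * log10(percent / 100)
percent_to_db() {
    awk -v p="$1" 'BEGIN { printf "%.1f\n", 20 * log(p / 100) / log(10) }'
}

percent_to_db 8     # prints -21.9
percent_to_db 25    # prints -12.0
```

So the 8% setting is essentially the same gate as -22d, and 25% is a much less sensitive gate at about -12 dB.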

Script methods:

Here is a script to record from the Pulseaudio monitor source. The monitor source is basically a copy of whatever audio is coming out of your speakers.

#!/bin/bash
# Records the PulseAudio monitor channel.
# Uncomment the wav or mp3 output extension for OUTFILE then specify if a particular bitrate is needed for mp3

if [ -n "$1" ]; then
    OUTFILE="$1"
else
    TIME=$(date +%d-%b-%y_%H%M-%Z)
    # Choose wav or mp3 output
    #OUTFILE="recording_$TIME.wav"
    OUTFILE="recording_$TIME.mp3"
fi

# Get sink monitor:
MONITOR=$(pactl list | grep -A2 '^Source #' | grep 'Name: .*\.monitor$' | awk '{print $NF}' | tail -n1)

# Record it raw and let sox convert it. Uncomment the method you need below.
echo "Recording. Ctrl-C or close window to stop"
#44.1 kHz, 16 bit little endian, 2 channels, line below is the deprecated method
#parec -d "$MONITOR" | sox -t raw -r 44100 -sLb 16 -c 2 - "$OUTFILE"
#Defaults to 128 kbps when encoding directly to mp3
#parec -d "$MONITOR" | sox -t raw -r 44100 -b 16 -L -e signed -c 2 - "$OUTFILE"
#-C192 uses 192 kbps when encoding directly to mp3
parec -d "$MONITOR" | sox -t raw -r 44100 -b 16 -L -e signed -c 2 - -C192 "$OUTFILE"
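
The monitor lookup in the script above can also be done against the shorter "pactl list short sources" listing, which prints one tab-separated line per source. This is a sketch under that assumption (the name is taken from the second column), and last_monitor is a made-up helper name:

```shell
#!/bin/bash
# Hypothetical helper: read "pactl list short sources" output on stdin
# (tab-separated: index, name, driver, sample spec, state) and print the
# name of the last source ending in ".monitor".
last_monitor() {
    awk -F '\t' '$2 ~ /\.monitor$/ { name = $2 } END { if (name != "") print name }'
}

# Typical use on a Pulseaudio system:
#   MONITOR=$(pactl list short sources | last_monitor)
```

If no monitor source is present the helper prints nothing, so it is worth checking that MONITOR is not empty before starting parec.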

------------------------------------------------------------------------
Here is an example script that grabs an Internet stream and converts it to ogg. This is a useful script for a cron job.

#!/bin/bash

# set up the variables below as needed, they're self-explanatory
DATE1=$(date +"%d%m%Y")
DATE2=$(date +"%d-%m-%Y")
STREAM=http://www.123.com/xxxxx.ram
DURATION=2.2h
MUSIC_DIR=$HOME/temp

cd "$MUSIC_DIR" || exit 1   # stop here if the directory is missing
# Download the stream and convert it to wave format:
mplayer -cache 2048 -playlist "$STREAM" \
        -vc null -vo null -ao pcm:fast:waveheader:file=output.wav &

sleep $DURATION # Let mplayer record in the background for the length of the program.
kill $!         # End the most recently backgrounded job = mplayer
wait $!         # Let mplayer finish writing output.wav before encoding

# Convert to ogg format and place the appropriate tags:

oggenc -q 6 output.wav \
         -t "title string: $DATE2" \
         -l "album string" \
         -a "artist string" \
         -o output_$DATE1.ogg

rm output.wav
------------------------------------------------------------------------
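
The background, sleep, kill pattern used in the stream script can be wrapped in a small reusable helper. This is only a sketch: run_for is a made-up name, and "sleep 30" stands in for the mplayer command so the example can be tried without a stream. It also waits for the command to exit, so that whatever file it was writing is complete before the next step runs.

```shell
#!/bin/bash
# Hypothetical helper: run a command in the background, stop it after the
# given duration (anything "sleep" accepts), and wait for it to exit.
run_for() {
    local duration="$1"
    shift
    "$@" &
    local pid=$!
    sleep "$duration"
    kill "$pid" 2>/dev/null   # the command may already have exited
    wait "$pid" 2>/dev/null
    return 0
}

# Stand-in for the mplayer command: "sleep 30" is stopped after 1 second.
run_for 1 sleep 30
echo "stopped"
```

In the stream script the call would look something like "run_for $DURATION mplayer ..." followed by the oggenc step.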