Audio Recording and Encoding in Linux
Linux is a professional-class operating system, so if you plan to do
audio recording and encoding, you have chosen the right system.
I use Pulseaudio on some boxes and just Alsa on other boxes. As
of this writing (Summer 2011) Pulseaudio has all the major bugs worked
out, so all the old web discussions about it causing issues are no
longer relevant as far as I'm concerned.
When using Pulseaudio you are strongly advised to install the GUI
management tools in order to control the audio streams and volumes and
so forth. You can do it at the command line which is great, but
it is a bit cumbersome unless you are using a recording script (which I
will demonstrate later).
If only using Alsa, then there is usually an added step of specifying
the audio device in the recording command. For example, arecord
expects a "-D plughw:0,0" or "-D hw:0,0". You also need to use
the alsamixer utility to control the selection of external sources and
gain levels. Gurus also use alsamixer on
Pulseaudio systems when recording from sources such as line-in
or microphones.
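To find the card and device numbers that go after -D, the listing printed by "arecord -l" can be turned into ready-made device strings. This is only a minimal sketch: it assumes the usual "card N: ..., device M: ..." line layout, and the function name list_capture_devices is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: read "arecord -l" style lines on stdin and print
# one plughw:CARD,DEVICE string per capture device found.
# Assumes lines shaped like "card 0: NAME [DESC], device 0: NAME [DESC]".
list_capture_devices() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/plughw:\1,\2/p'
}

# Typical use on a live system (not run here):
#   arecord -l | list_capture_devices
#   arecord -D plughw:0,0 -f cd output.wav
```

The strings it prints can be pasted directly after -D in the arecord examples.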
arecord examples
ffmpeg examples
speex format
voice activated recording
useful scripts
Examples of recording and piping the output to a compressed format:
$ arecord -v -f cd -t raw | lame -r - output.mp3
$ arecord -v -f cd -t raw | lame -r -b 192 - output.mp3
$ arecord -v -f cd -t raw | lame -r -m s -a -b 64 - output.mp3
$ arecord -v -f cd -t raw | lame -r -h -V 0 -b 128 -B 224 - output.mp3
$ arecord -v -f cd -t raw | lame -r --preset extreme - output.mp3
$ arecord -vv -f cd output.wav
$ arecord -f dat output.wav
$ arecord -f S16_LE -c1 -r22050 -t raw | lame -r -s 22.05 -m m -b 64 - Output.mp3
$ arecord -f S16_LE -c1 -r22050 -t raw | oggenc - -r -C 1 -R 22050 -q 2 -o Output.ogg
$ arecord -f S16_LE -c1 -r22050 -t raw -d xxxx | oggenc-aotuvb5d - -r -C 1 -R 22050 -o Output.ogg
(-d xxxx limits the recording time in seconds, useful for scripts)
(oggenc -q option is -1 to 10 where higher numbers mean higher quality/bitrate, the default is 3)
It's probably a good idea to read the man pages of arecord,
lame, and oggenc (the aotuv version is just an enhanced version).
Basically the "-f" in arecord allows you to select a pre-packaged
format like CD (44.1 kHz, 16 bits little endian, 2 channels) or DAT (48
kHz, 16 bits little endian, 2 channels), or you can get specific and
choose S16_LE (16 bit little endian) -c1 (mono) -r22050 (22.050 kHz
sampling rate), and so forth.
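To get a feel for what these format choices cost in disk space, the uncompressed data rate is just sample rate times channels times bytes per sample. The sketch below works that out for the three formats mentioned above; pcm_bytes_per_sec is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: uncompressed PCM data rate in bytes per second.
# Arguments: sample rate (Hz), number of channels, bits per sample.
pcm_bytes_per_sec() {
    local rate=$1 channels=$2 bits=$3
    echo $(( rate * channels * bits / 8 ))
}

pcm_bytes_per_sec 44100 2 16   # arecord -f cd                  -> 176400
pcm_bytes_per_sec 48000 2 16   # arecord -f dat                 -> 192000
pcm_bytes_per_sec 22050 1 16   # arecord -f S16_LE -c1 -r22050  -> 44100
```

At the CD setting that is roughly 10 MB per minute of raw audio, which is why the examples pipe straight into an encoder.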
What you pipe arecord into needs to match how you are
recording with arecord. Notice that when I record in "CD" or
"DAT" formats I choose high quality options in lame. When I use
lower quality options in arecord, I also use less demanding encoding
parameters in lame and oggenc.
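One easy mismatch is the sample rate: arecord -r takes Hz while lame -s takes kHz. A small sketch of keeping the two sides in step, assuming raw 16-bit input as in the examples above; the helper name lame_raw_flags is invented here, and "-m s" for two channels is my choice rather than anything the encoder requires.

```shell
#!/bin/bash
# Hypothetical helper: print lame options for raw input that match the
# sample rate (Hz) and channel count handed to arecord.
lame_raw_flags() {
    local rate=$1 channels=$2 khz mode
    # lame -s wants kHz (22.05), arecord -r wants Hz (22050)
    khz=$(LC_ALL=C awk -v r="$rate" 'BEGIN { printf "%g", r / 1000 }')
    if [ "$channels" -eq 1 ]; then mode=m; else mode=s; fi
    echo "-r -s $khz -m $mode"
}

lame_raw_flags 22050 1   # prints: -r -s 22.05 -m m
lame_raw_flags 44100 2   # prints: -r -s 44.1 -m s

# Typical use on a live system (not run here):
#   arecord -f S16_LE -c1 -r22050 -t raw | lame $(lame_raw_flags 22050 1) -b 64 - Output.mp3
```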
Digital audio recording is all about specifying a sample format (quantization),
sample rate, number of channels, bits per sample, and compression
schemes and bitrates. Read the man pages.
Using ffmpeg to record and encode:
$ ffmpeg -f alsa -ac 2 -ar 44100 -i pulse -acodec libvorbis -ab 160k OUTPUT.ogg
$ ffmpeg -f alsa -ac 2 -ar 44100 -i pulse -acodec libmp3lame -ab 160k OUTPUT.mp3
$ ffmpeg -f alsa -ac 2 -ar 44100 -i pulse OUTPUT.flac
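The placement of "-i pulse" and "-acodec xxx" is critical for
ffmpeg to encode in 2 channels. ffmpeg applies each option to the next
file named on the command line, so "-f alsa -ac 2 -ar 44100" must come
before "-i pulse" to describe the capture, and "-acodec xxx" must come
after it to describe the output file. If -i pulse were at the beginning
of the command line, ffmpeg would only see mono audio. Use
the Pulseaudio tools to select the audio source to record, or
specify an audio device (-i hw:0,0) if using Alsa only.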
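The ordering rule is easier to keep straight if the options are grouped by the file they belong to. The sketch below only prints the command line rather than running it. The function name ffmpeg_capture_cmd is hypothetical, and putting the bitrate in the output group is my assumption, based on ffmpeg treating it as an encoder setting.

```shell
#!/bin/bash
# Hypothetical helper: print an ffmpeg capture command with the options
# grouped by the file they apply to.
# Arguments: input device, audio codec, output file name.
ffmpeg_capture_cmd() {
    local device=$1 codec=$2 outfile=$3
    local input_opts="-f alsa -ac 2 -ar 44100"   # describe the capture device
    local output_opts="-acodec $codec -ab 160k"  # describe the encoded file
    echo "ffmpeg $input_opts -i $device $output_opts $outfile"
}

ffmpeg_capture_cmd pulse libvorbis OUTPUT.ogg
ffmpeg_capture_cmd hw:0,0 libmp3lame OUTPUT.mp3
```

Everything to the left of -i configures how the device is opened; everything between -i and the output name configures the encoder.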
Some record/encode examples with the Speex format:
Example of transcoding an MP3 to 16 kHz speex:
$ sox INPUT.MP3 -t wav -r 16000 - | speexenc - --denoise OUTPUT.SPX
Example encoding of wav file. Speex prefers input file sampling rates of 8 or 16 kHz:
$ speexenc --vbr --denoise INPUT.WAV OUTPUT.SPX
Piping arecord to speexenc (realtime speex encoding method):
$ arecord -f S16_LE -c1 -r8000 -t raw | speexenc - --vad OUTPUT.SPX
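Since Speex is built around a few fixed sampling rates, a script can pick the matching speexenc mode from the rate it records at. This is a sketch under assumptions: the helper name speex_mode_flag is invented, and the long option names are the ones I know from speexenc's help text, so check "speexenc --help" on your system before relying on them.

```shell
#!/bin/bash
# Hypothetical helper: map a recording sample rate (Hz) to the speexenc
# mode option that matches it. Fails for rates Speex has no mode for.
speex_mode_flag() {
    case "$1" in
        8000)  echo "--narrowband" ;;
        16000) echo "--wideband" ;;
        32000) echo "--ultra-wideband" ;;
        *)     echo "unsupported rate for speex: $1" >&2; return 1 ;;
    esac
}

# Typical use on a live system (not run here):
#   RATE=16000
#   arecord -f S16_LE -c1 -r$RATE -t raw | \
#       speexenc $(speex_mode_flag $RATE) --rate $RATE --vad - OUTPUT.SPX
```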
Sound Activated Recording, VOX recording, Sound Operated Recording:
This stuff goes by many names. Basically, it describes a method
to only record incoming audio signals that meet or exceed a
user-specified threshold level. It's useful when recording from
police scanner radios or microphones that are set up to record
intermittent audio.
On Winblows there are numerous programs to accomplish this. On
Linux there are only two methods that I've discovered that work
properly.
Audacity. Audacity now has an option under the Transport menu to do
"Sound Activated Recording" and lets you set an input threshold level in dB.
It appears to be stable: I've set it up to record from my motherboard's
line-in for up to 20 hours at a time with no crashes or memory leaks.
VACR. This is a Python script that works well from the command line. It has
limited options, but the default threshold detection method works very well.
There are a few places in the script to specify the sample rate and
output file name. There is also a line to tweak the threshold
level gate.
Update 11/23/11. I found the script below on a German language
forum. This method uses sox (major points for that) and works
very well. The big advantages of this script are that sox lets you
encode how you want (fine-grained control) and that you can also do
post-processing. In this case the creator included a normalizing section
and also generated a nice spectrogram. I've been using this
script in some form or another for about a month now.
#!/bin/bash
#source http://www.sis-germany.de/index.php?page=Thread&threadID=1444
NAME=`date +%m-%d-%Y_%H-%M-%S`
#choose a method below, either the decibel or % of level method
#also choose number of channels, sample rate, encoding, etc..
#you need to find a level that works for your hardware
#rec -c 1 -r 22050 $NAME.wav silence 1 0 -22d -1 00:00:05 -22d
#rec -c 1 -r 22050 $NAME.wav silence 1 0 8% -1 00:00:05 8%
rec -c 1 -r 22050 $NAME.mp3 silence 1 0 8% -1 00:00:05 8%
#rec -c 1 -r 22050 $NAME.mp3 silence 1 0 25% -1 00:00:05 25%
#uncomment appropriate line below for normalizing when finished
#echo "Normalize..."
#sox $NAME.wav $NAME-norm.wav gain -n -1
#sox $NAME.mp3 $NAME-norm.mp3 gain -n -1
#uncomment appropriate line below for a spectral graph if desired
#echo "Calculating Spectrogram..."
#sox $NAME.wav -n spectrogram -x 1024 -y 768 -z 100 -t "$NAME.wav" -c '' -o $NAME.png
#sox $NAME.mp3 -n spectrogram -x 1024 -y 768 -z 100 -t "$NAME.mp3" -c '' -o $NAME.png
#echo "Done."
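The decibel and percent thresholds in the script are two ways of writing nearly the same level. The sketch below converts a percentage to dB and rebuilds the silence arguments from two values. It assumes sox treats the percentage as a fraction of full-scale sample value, as its man page describes; both helper names are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: convert a percentage of full-scale amplitude to dB
# using dB = 20 * log10(percent / 100).
percent_to_db() {
    LC_ALL=C awk -v p="$1" 'BEGIN { printf "%.1f", 20 * log(p / 100) / log(10) }'
}

# Hypothetical helper: build the sox "silence" arguments used in the script.
# $1 = threshold (e.g. 8% or -22d), $2 = length of quiet that pauses recording
vox_silence_args() {
    # "1 0 T"  : start as soon as the input rises above threshold T
    # "-1 D T" : stop once the input stays below T for duration D; the
    #            negative count makes sox wait for sound and start again
    echo "silence 1 0 $1 -1 $2 $1"
}

percent_to_db 8;  echo    # -21.9, close to the -22d line
percent_to_db 25; echo    # -12.0
vox_silence_args 8% 00:00:05
```

So the 8% and -22d lines behave almost identically, while the 25% line is about 10 dB less sensitive.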
Script methods:
Here is a script to record from the Pulseaudio monitor source.
The monitor source is basically a copy of whatever audio is
coming out of your speakers.
#!/bin/bash
# Records the PulseAudio monitor channel.
# Uncomment the wav or mp3 output extension for OUTFILE then specify if a particular bitrate is needed for mp3
if [ -n "$1" ]; then
    OUTFILE="$1"
else
    TIME=$(date +%d-%b-%y_%H%M-%Z)
    #Choose wav or mp3 output
    #OUTFILE="recording_$TIME.wav"
    OUTFILE="recording_$TIME.mp3"
fi
# Get sink monitor:
MONITOR=$(pactl list | grep -A2 '^Source #' | grep 'Name: .*\.monitor$' | awk '{print $NF}' | tail -n1)
# Record it raw and let sox convert it to wav or mp3; uncomment the method you need below.
echo "Recording. Ctrl-C or close window to stop"
#44.1 kHz, 16 bit little endian, 2 channels, line below is the deprecated method
#parec -d "$MONITOR" | sox -t raw -r 44100 -sLb 16 -c 2 - "$OUTFILE"
#Defaults to 128 kbps when encoding directly to mp3
#parec -d "$MONITOR" | sox -t raw -r 44100 -b 16 -L -e signed -c 2 - "$OUTFILE"
#-C192 uses 192 kbps when encoding directly to mp3
parec -d "$MONITOR" | sox -t raw -r 44100 -b 16 -L -e signed -c 2 - -C192 "$OUTFILE"
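The grep and awk chain that finds the monitor depends on the layout of the long "pactl list" output. A possibly more robust variation, assuming your pactl supports the tab-separated "pactl list short sources" form (index, name, driver, sample spec, state), is sketched below; pick_monitor is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: read "pactl list short sources" style lines on stdin
# and print the name of the last source ending in ".monitor".
# Like the tail -n1 in the script, the last match wins.
pick_monitor() {
    awk -F '\t' '$2 ~ /\.monitor$/ { name = $2 } END { if (name != "") print name }'
}

# Typical use on a live system (not run here):
#   MONITOR=$(pactl list short sources | pick_monitor)
#   parec -d "$MONITOR" | sox -t raw -r 44100 -b 16 -L -e signed -c 2 - "$OUTFILE"
```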
------------------------------------------------------------------------
Here is an example script that grabs an Internet stream and converts it to ogg. This is a useful script for a cron job.
#!/bin/bash
# set up the variables below as needed, they're self-explanatory
DATE1=$(date +"%d%m%Y")
DATE2=$(date +"%d-%m-%Y")
STREAM=http://www.123.com/xxxxx.ram
DURATION=2.2h
MUSIC_DIR=$HOME/temp
cd "$MUSIC_DIR" || exit 1 # stop here if the directory is missing
# Download the stream and convert it to wave format:
mplayer -cache 2048 -playlist $STREAM \
-vc null -vo null -ao pcm:fast:waveheader:file=output.wav &
sleep $DURATION # Length of the program being recorded as background.
kill $! # End the most recently backgrounded job = mplayer
# Convert to ogg format and place the appropriate tags:
oggenc -q 6 output.wav \
-t "title string: $DATE2" \
-l "album string" \
-a "artist string" \
-o output_$DATE1.ogg
rm output.wav
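The DURATION=2.2h value relies on GNU sleep accepting a unit suffix and a fractional number. Tools that want plain seconds, such as the -d option of arecord, need it converted first. A minimal sketch of that conversion follows; duration_to_seconds is a hypothetical helper, and it only understands the s, m, h and d suffixes.

```shell
#!/bin/bash
# Hypothetical helper: convert a GNU sleep style duration (optional s, m, h
# or d suffix) to whole seconds, rounded to the nearest second.
duration_to_seconds() {
    LC_ALL=C awk -v d="$1" 'BEGIN {
        unit = substr(d, length(d))      # last character of the argument
        mult = 1                         # plain numbers and "s" mean seconds
        if (unit == "m") mult = 60
        else if (unit == "h") mult = 3600
        else if (unit == "d") mult = 86400
        # d + 0 reads the leading number and ignores any suffix
        printf "%.0f\n", (d + 0) * mult
    }'
}

duration_to_seconds 2.2h   # 7920
duration_to_seconds 90m    # 5400

# Typical use on a live system (not run here):
#   arecord -f cd -d "$(duration_to_seconds 2.2h)" output.wav
```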
------------------------------------------------------------------------