
scope

by Elie De Brauwer

Goal of this file

In this article we are going to visualise the information stored in a wav audio file. Along the way we'll use the scissor test and print text in the window.

History of this file

  • 11 march 2004: created
  • 12 march 2004: continued
  • 25 march 2004: finished

How this article is organized

The project consists of three files: Wav.h, which handles everything that has to do with wav file access; Scope.h, which draws the information it retrieves from the wav file through the Wav class; and sound.cpp, which opens the window, defines the callbacks and initialises the Scope class.
This article consists of two distinct parts. Part I discusses the wav file and Wav.h; if this is of no interest to you, just learn what the interface looks like and skip to part II, which discusses the OpenGL side.

Part 1: Using/Understanding the wav file

Material to use

One of my friends and classmates, Bart Crets, was kind enough to create a sample wav file which may be freely used and distributed. This file contains a sine with a frequency of 440 Hz. It is ten seconds long and sampled at 44100 Hz in stereo, using 16 bits per sample per channel. Hexdumps and references in this article always refer to this file.

What we will try to create

If you are a bit interested in science and want to know why and how it works, read the following section. If you don't care and just want to get to the example, skip it. The choice is yours.

Scientific approach

In this article we will create a visualisation of sound samples. We start from a .WAV file, which is created using so-called PCM (Pulse Code Modulation). This is needed because sound is a wave (analog) while a computer works with discrete (digital) data. So the sound is stored using ADC (analog to digital conversion), which is done by taking a sample at discrete, regular intervals. In a 44100 Hz file like the one we are using here, a sample is taken 44100 times a second and each value is stored in the WAV file.
There are various possible visualisations we could create. We could create a spectrum analyser, which gives you an idea of which frequencies the sound consists of, but this technique requires splitting the sound into elementary waves whose frequencies are known. That requires too much mathematics for this (easy) article, so here I have chosen to draw the oscillogram. This is the visual reconstruction of the actual wave that existed before the PCM occurred. So we start from the samples and try to recreate the wave from them, which resembles what a DAC (digital to analog converter) does, except that we aren't reconstructing a signal, we are merely visualising it. Below you can see two xmms screenshots: the first shows the spectrum analyser on a song (a sum of waves of multiple frequencies) and the second the oscillogram of another part of the same song.

Spectrum analyser
Oscillogram
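To make the sampling step concrete, here is a minimal sketch of how a tone like the one in test.wav could be sampled and quantised to 16 bit values. The function name sampleSine and the amplitude parameter are assumptions made for this illustration; this is not how test.wav was actually produced.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Sample a sine wave of `frequency` Hz at `sampleRate` samples per second
// and quantise every sample to a signed 16 bit value, as PCM does.
// `amplitude` is the peak value and has to fit in 16 bits.
// (Hypothetical helper, not part of the article's source files.)
std::vector<std::int16_t> sampleSine(double frequency, int sampleRate,
                                     int count, double amplitude) {
  assert(sampleRate > 0 && count >= 0 && amplitude <= 32767.0);
  const double pi = 3.14159265358979323846;
  std::vector<std::int16_t> samples;
  samples.reserve(count);
  for (int n = 0; n < count; ++n) {
    // time of sample n in seconds
    double t = static_cast<double>(n) / sampleRate;
    double value = amplitude * std::sin(2.0 * pi * frequency * t);
    // rounding to the nearest integer is the quantisation step
    samples.push_back(static_cast<std::int16_t>(std::lround(value)));
  }
  return samples;
}
```

Calling sampleSine(440.0, 44100, 44100, 10000.0) yields one second of samples; one period of the 440 Hz tone then spans roughly 100 samples (44100 / 440).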

Below you can see the spectrum analyser and the oscillogram of the WAV file we are examining. Our WAV file is a 10 second long sine wave with a frequency of 440 Hz. You can see in the spectrum analyser that it isn't a combined wave.

Spectrum analyser
Oscillogram showing a

Other examples of a spectrogram and an oscillogram can be seen at: http://www.clarku.edu/research/access/psychology/thompson/sound.html. If you don't believe me or don't trust me, go ask Google.

The 'sound is a wave ?' approach

Basically we read the header, draw a horizontal line in the center of the screen, read a sample, take its value as the Y coordinate and put it on the screen. Then we read the next sample, increment our X position, draw the new Y value, and so on.
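The idea can be sketched without any OpenGL at all. The helper below (toPoints is a hypothetical name, not part of the article's source) maps sample number i to an x coordinate between -1 and 1 and scales the sample value so that the chosen maximum amplitude ends up at y = 1. It assumes the samples have already been read into a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Turn a run of samples into (x, y) points:
//  - x walks from -1 towards 1 in equal steps, one step per sample
//  - y is the sample value divided by `limit`, so +/-limit maps to +/-1
// (Hypothetical helper for illustration only.)
std::vector<std::pair<float, float> > toPoints(const std::vector<int>& samples,
                                               float limit) {
  assert(limit > 0.0f);
  std::vector<std::pair<float, float> > points;
  const std::size_t count = samples.size();
  for (std::size_t i = 0; i < count; ++i) {
    float x = -1.0f + 2.0f * static_cast<float>(i) / static_cast<float>(count);
    float y = static_cast<float>(samples[i]) / limit;
    points.push_back(std::make_pair(x, y));
  }
  return points;
}
```

Connecting these points with line segments gives the oscillogram.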

The .wav file format

A .wav file contains raw pieces of data (samples) that are taken a certain number of times a second. In the sample file 44100 samples a second were taken. (For more information search Google for analog to digital conversion and for the Nyquist-Shannon sampling theorem.) For more information about the wav file format specification you might want to look at the following two sites:

Below you can find a small summary of the fields in the file header. This data is used in the Wav class.

Start byte (abs)  Start byte  End byte  Endian  Description
RIFF Chunk (12 bytes)
0                 0           3         Big     "RIFF" in ASCII
4                 4           7         Little  Length of the package to follow. This is also filesize - 8 in bytes
8                 8           11        Big     "WAVE" in ASCII
FORMAT Chunk (24 bytes)
12                0           3         Big     "fmt " in ASCII, please note the trailing space
16                4           7         Little  Length of the FORMAT chunk data that follows this field, always 0x10
20                8           9         Little  AudioFormat, always 0x01, other values indicate compression
22                10          11        Little  Number of channels, 0x01=Mono, 0x02=Stereo
24                12          15        Little  Sample rate in Hertz
28                16          19        Little  Bytes per second = bytes per sample * sample rate
32                20          21        Little  Bytes per sample: 1=8 bit mono, 2=8 bit stereo or 16 bit mono, 4=16 bit stereo
34                22          23        Little  Bits per sample
DATA Chunk (?? bytes)
36                0           3         Big     "data" in ASCII
40                4           7         Little  Length of the data to follow = bytes per second * time (in seconds)
44                8           end       Little  Data samples
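The relations between the fields in the table can be checked with a few lines of arithmetic. The sketch below uses a hypothetical WavFormat struct (it is not part of Wav.h) and assumes uncompressed PCM data.

```cpp
#include <cassert>

// The three values that determine all the other size fields.
// (Hypothetical struct for illustration, not part of Wav.h.)
struct WavFormat {
  int channels;       // 1 = mono, 2 = stereo
  int sampleRate;     // samples per second, in Hertz
  int bitsPerSample;  // 8 or 16
};

// Bytes needed for one sample of all channels together.
int bytesPerSample(const WavFormat& f) {
  return f.channels * f.bitsPerSample / 8;
}

// Bytes per second = bytes per sample * sample rate.
int bytesPerSecond(const WavFormat& f) {
  return bytesPerSample(f) * f.sampleRate;
}

// Length of the data = bytes per second * time, so time = length / bytes per second.
double durationSeconds(const WavFormat& f, int dataLength) {
  assert(bytesPerSecond(f) > 0);
  return static_cast<double>(dataLength) / bytesPerSecond(f);
}
```

For a 16 bit stereo file at 44100 Hz this gives 4 bytes per sample and 176400 bytes per second, so 1764000 bytes of data last 10 seconds.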

This is the header of a 16 bit stereo file, sampled at 44100 Hz. The file contains 10 seconds of data.

helios@qntal:~/vuilbak$ hexdump -n 48   -C  test.wav 
00000000  52 49 46 46 c4 ea 1a 00  57 41 56 45 66 6d 74 20  |RIFF....WAVEfmt |
00000010  10 00 00 00 01 00 02 00  44 ac 00 00 10 b1 02 00  |........D.......|
00000020  04 00 10 00 64 61 74 61  a0 ea 1a 00 00 00 00 00  |....data........|
00000030
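To see how the hexdump maps onto the header fields, the following sketch assembles the little-endian values byte by byte. The bytes are copied from the hexdump above; the names readLE and kHeader are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assemble a little-endian unsigned value from `size` bytes (1 to 4):
// the first byte is the least significant one.
// (Hypothetical helper, not part of Wav.h.)
std::uint32_t readLE(const unsigned char* bytes, std::size_t size) {
  assert(size >= 1 && size <= 4);
  std::uint32_t value = 0;
  for (std::size_t i = 0; i < size; ++i) {
    value |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
  }
  return value;
}

// The first 44 bytes of test.wav, copied from the hexdump.
const unsigned char kHeader[44] = {
  0x52, 0x49, 0x46, 0x46,  0xc4, 0xea, 0x1a, 0x00,   // "RIFF", filesize - 8
  0x57, 0x41, 0x56, 0x45,  0x66, 0x6d, 0x74, 0x20,   // "WAVE", "fmt "
  0x10, 0x00, 0x00, 0x00,  0x01, 0x00,  0x02, 0x00,  // chunk length, format, channels
  0x44, 0xac, 0x00, 0x00,  0x10, 0xb1, 0x02, 0x00,   // sample rate, bytes per second
  0x04, 0x00,  0x10, 0x00,                           // bytes per sample, bits per sample
  0x64, 0x61, 0x74, 0x61,  0xa0, 0xea, 0x1a, 0x00    // "data", data length
};
```

With these bytes readLE(kHeader + 24, 4) gives the sample rate 44100, readLE(kHeader + 28, 4) gives 176400 bytes per second and readLE(kHeader + 40, 4) gives 1764000 bytes of data. Unlike reading straight into an int, this approach does not depend on the byte order of the machine.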

The Wav class

Below you can find the content of the file Wav.h. The two most important public methods of the Wav class are string Wav::toString(), which returns the information about the wav file as a string, and int Wav::readSample(int,bool), which reads a sample from the file and returns its value. If you are not interested in the details, skip the code below and use the class as a black box. For the others, enjoy your read.

#ifndef __WAV_H
#define __WAV_H

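#include <cstdio>     // FILE, fopen, fread, fseek, rewind, fclose
#include <cstdlib>
#include <cstring>    // strlen, strcpy, strcmp
#include <iostream>   // cout
#include <sstream>
#include <string>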
using namespace std;

typedef unsigned char uchar;
typedef unsigned int uint;

class Wav{
 private:
  char * filename;
  FILE * wavfile;
  bool valid;             // True if it is a wav
  bool mono;              // True == mono, False == stereo
  int s_rate;             // Sample rate in Hz, e.g. 44100 or 22050
  int byte_sec;           // number of bytes for a second
  int byte_samp;          // 1 = 8 bit mono, 2 = 8 bit stereo or 16 bit mono, 4 = 16 bit stereo
  int bit_samp;           // number of bits in a sample
  int length;             // length = amount of data after the header in bytes
  double time;            // length/byte_sec
  Wav();// not available
  
 public:
  Wav(char *);
  ~Wav();
  //  String getInfo();
  string toString();
  int readSample(int,bool);
  bool isValid();
  int getSampleRate();
};

Above you can see the class definition; all the useful information is stored there. The default constructor is hidden, and the only valid constructor is Wav(char *), which expects a filename.

///////////////////////////////////////////////////
// Wav::Wav
Wav::Wav(char *c){
  valid = false;
  filename = new char[strlen(c)+1];
  strcpy(filename,c);
  // open the file read-only, in binary mode so no byte gets translated on any platform
  wavfile = fopen(filename,"rb");
  if(wavfile==NULL){
    cout << "Could not open " << filename << endl;
  }else{
    // declare a uchar buff to store some values in
    uchar *buff = new uchar[5];
    buff[4]='\0';
    // read the first 4 bytes
    fread((void *)buff,1,4,wavfile);
    // the first four bytes should be 'RIFF'
    if(strcmp((char *)buff,"RIFF")==0){
      
      // read byte 8,9,10 and 11
      fseek(wavfile,4,SEEK_CUR);
      fread((void *)buff,1,4,wavfile);
      // this should read "WAVE"
      if(strcmp((char *)buff,"WAVE")==0){
	// read byte 12,13,14,15
	fread((void *)buff,1,4,wavfile);
	// this should read "fmt "
	if(strcmp((char *)buff,"fmt ")==0){
	  fseek(wavfile,20,SEEK_CUR);
	  // final one read byte 36,37,38,39
	  fread((void *)buff,1,4,wavfile);
	  if(strcmp((char *)buff,"data")==0){
	    
	    valid=true;
	    // Now we know it is a wav file, rewind the stream
	    rewind(wavfile);
	    // now is it mono or stereo ?
	    fseek(wavfile,22,SEEK_CUR);
	    fread((void *)buff,1,2,wavfile);
	    if(buff[0]==0x02){
	      mono=false;
	    }else{
	      mono=true;
	    }
	    // read the sample rate
	    fread((void *)&s_rate,1,4,wavfile);
	    fread((void *)&byte_sec,1,4,wavfile);
	    byte_samp=0;
	    fread((void *)&byte_samp,1,2,wavfile);
	    bit_samp=0;
	    fread((void *)&bit_samp,1,2,wavfile);
	    fseek(wavfile,4,SEEK_CUR);
	    fread((void *)&length,1,4,wavfile);
	  }
	}
      }
    }
    delete [] buff;
  }
}

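Above you can see how the header is read and stored in the private data members. Note that the code assumes the standard 44 byte header shown earlier and a little-endian machine, since the integer fields are read straight into int variables.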

//////////////////////////
// Wav::toString()
string Wav::toString(){
  // we put all the data in a stringstream using 
  // only strings would result in int->string 
  // conversion errors
  ostringstream info;
  if(wavfile!=NULL){
    if(valid){
      info << "Opened WAV file: " << filename << endl;
      info << "This file is in ";
      if(byte_samp==1){
	info <<  "8 bit mono" << endl;
      }else if(byte_samp==4){
	info << "16 bit stereo" << endl;
      }else if(byte_samp==2){
	if(mono){
	  info << "16 bit mono" << endl;
	}else{
	  info << "8 bit stereo" << endl;
	}
      }
      info << "Sample rate " <<  s_rate << " samples each second" <<  endl;
      info << "There are "  << byte_sec << " bytes for a second data" << endl;
      info << "There are "  << byte_samp << " bytes for sample" << endl;
      info << "A sample consists of " <<  bit_samp << " bits" << endl;
      info << "There are " << length << " bytes of audio data in the file " << endl;
      info << "The file is " << length/(byte_sec*1.0) << " seconds long" << endl;
      info << "The file contains " <<  length/byte_samp << " samples" << endl;
    }else{
      info << filename << " is not a WAV file " << endl;
    }
  }else{
    info << "Could not open file" << endl ;
  }
  return info.str();
}

string Wav::toString() returns a string which contains all the useful information stored in the data members. It is not a vital part, but it is handy when debugging. The result for test.wav is:

Opened WAV file: test.wav
This file is in 16 bit stereo
Sample rate 44100 samples each second
There are 176400 bytes for a second data
There are 4 bytes for sample
A sample consists of 16 bits
There are 1764000 bytes of audio data in the file 
The file is 10 seconds long
The file contains 441000 samples

We use a stringstream from the C++ standard library here to store the data. The nice part is that the stringstream does all the nasty conversions (double to string, integer to string, ...) for us, so we don't have to care about them. Don't forget to include sstream.
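As a small standalone illustration of those conversions, the sketch below builds one line of text from two integers. The function name describe is an assumption for this example; only std::ostringstream itself comes from the standard library.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Build a line of text from numbers; the stream converts the
// double result of the division into text for us.
// (Hypothetical helper for illustration only.)
std::string describe(int length, int byteSec) {
  assert(byteSec > 0);
  std::ostringstream out;
  out << "The file is " << length / (byteSec * 1.0) << " seconds long";
  return out.str();
}
```

With the values of test.wav, describe(1764000, 176400) produces the same "The file is 10 seconds long" line as in the output above.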

//////////////////////////////////////////////////////
// Wav::readSample()
int Wav::readSample(int number,bool leftchannel=true){
  
  /*
    Reads sample number, returns it as an int, if
    this.mono==false we look at the leftchannel bool
    to determine which to return.
    
    number is in the range [0,length/byte_samp[
    
    returns 0xefffffff on failure
  */

  if(valid && number>=0 && number<length/byte_samp){
    // go to beginning of the file
    rewind(wavfile);
    
    // we start reading at sample_number * sample_size + header length 
    int offset = number * byte_samp + 44;
    
    // unless this is a stereo file and the rightchannel is requested.
    if(!mono && !leftchannel){
      offset += byte_samp/2;
    }

    // read this many bytes: the whole sample for mono, one channel (half) for stereo
    int amount = mono ? byte_samp : byte_samp/2;
    
    fseek(wavfile,offset,SEEK_CUR);
    short sample = 0;
    fread((void *)&sample,1,amount,wavfile);
    return sample;
  }else{
    // return 0xefffffff if failed
    return (int)0xefffffff;
  }
}

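This function reads a sample and returns its integer value, or returns 0xefffffff on failure, which allows us to use a simple while loop to step through the file. The sample is read into a short because the two bytes of a 16 bit sample, when read into a (4 byte) int, would not be sign extended: negative samples would come out as large positive values.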
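The two steps that matter in readSample(), finding the byte offset and recovering the sign, can be sketched in isolation as follows. The names sampleOffset and decodeSample16 are hypothetical, and the sketch assumes the 44 byte header described earlier and 16 bit little-endian samples.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset in the file of sample `number`, after a 44 byte header.
// For a stereo file the right channel follows the left one inside each sample.
// (Hypothetical helper mirroring the offset arithmetic of readSample().)
long sampleOffset(int number, int byteSamp, bool mono, bool leftChannel) {
  assert(number >= 0 && byteSamp > 0);
  long offset = 44L + static_cast<long>(number) * byteSamp;
  if (!mono && !leftChannel) {
    offset += byteSamp / 2;
  }
  return offset;
}

// Combine two little-endian bytes into a signed 16 bit sample.
int decodeSample16(unsigned char low, unsigned char high) {
  std::uint16_t raw = static_cast<std::uint16_t>(low | (high << 8));
  // values of 0x8000 and above represent negative numbers (two's complement)
  return raw >= 0x8000 ? static_cast<int>(raw) - 0x10000 : static_cast<int>(raw);
}
```

For the 16 bit stereo test file, the right channel of sample 10 starts at byte 44 + 10 * 4 + 2 = 86, and the byte pair ff ff decodes to -1 rather than 65535.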

//////////////////////////////////////////////////
// Wav::~Wav
Wav::~Wav(){
  delete []filename;
  if(wavfile!=NULL){
    fclose(wavfile);
  }
}

///////////////////
// Wav::isValid()
bool Wav::isValid(){
  return valid;
}

/////////////////////////
// Wav::getSampleRate()
int Wav::getSampleRate(){
  return s_rate;
}

#endif /* Wav.h included */

Now all that is left is the destructor, which closes the file, an isValid() function to know whether a valid wav file has been opened, and a getSampleRate() function which tells us the sample rate.

Part 2: Wav meets OpenGL

Part will be added soon

Scope.h

Below you can see the code from the Scope.h file. This file contains the Scope class, which fills the window with the data from the .wav file.

#ifndef __SCOPE_H
#define __SCOPE_H

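#include <cmath>       // fabs
#include <sstream>
#include <string>
#include <GL/glut.h>   // OpenGL and GLUT calls used in draw()
#include "Wav.h"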

class Scope{
 private:
  int height;                     // dimensions of the window
  int width;
  int num;                        // number of sample to draw
  double limit;                   // maximum amplitude
  int start;                      // start reading at this point
  string info;
  Wav *wavfile;
  string toString();
 public:
  Scope();
  Scope(char *);
  void draw();
  void setDimensions(int,int);
  void alterNum(int);
  void alterLimit(int);
  void alterStart(int);
};

This is the class definition. The scope contains some variables that define what data to display and how to display it. It also contains a Wav object (see above) from which we get our data. The private toString() function is used to produce some text that is drawn on the screen.

/////////////////////////
// Scope::Scope()
Scope::Scope(){
  wavfile=NULL;
}

/////////////////////////
// Scope::Scope()
Scope::Scope(char *file){
  wavfile=new Wav(file);
  if(!wavfile->isValid()){
    delete  wavfile;
    wavfile=NULL;
  }else{
    info=wavfile->toString();
    height=0;
    width=0;
    num=1000;
    limit=5000.0;
    start=0;
  }
}

The constructor creates a Wav object and sets the boundaries, or sets the Wav file pointer to NULL if the wav file could not be opened or is not valid.

////////////////////
// Scope::draw()
void Scope::draw(){
  glEnable(GL_SCISSOR_TEST);
  
  // left, the information
  glViewport(0,0,300,height);
  glScissor(0,0,300,height);
  glClear(GL_COLOR_BUFFER_BIT);
  int i=0;
  float y=0.90;

  glColor3f(1,1,0);

  // the position on the screen
  glRasterPos2f(-1,y);
  while(info[i]!='\0'){
    if(info[i]=='\n'){
      y-=0.08;
      glRasterPos2f(-1,y);
    }else{
      glutBitmapCharacter(GLUT_BITMAP_HELVETICA_12,info[i]);
    }
    i++;
  }
  
  // now show some information about what we see
  string info2 = toString();
  i=0;
  y-=0.16;
  glRasterPos2f(-1,y);
  while(info2[i]!='\0'){
    if(info2[i]=='\n'){
      y-=0.08;
      glRasterPos2f(-1,y);
    }else{
      glutBitmapCharacter(GLUT_BITMAP_HELVETICA_12,info2[i]);
    }
    i++;
  }
  
  // lower right
  glViewport(300,0,width-300,height);
  glScissor(300,0,width-300,height);
  glClear(GL_COLOR_BUFFER_BIT);

  // only draw it if we have a valid wavfile
  if(wavfile!=NULL){
    int i=0;
    glBegin(GL_LINE_STRIP);
    // keep the raw value in an int so the failure marker can be compared exactly
    int raw=wavfile->readSample(start+i);
    while(i<num && raw!=(int)0xefffffff){
      // scale the sample
      float sample=raw/limit;
      glColor3f(fabs(sample),1.0-fabs(sample),0.0);
      glVertex2f(-1+i*2.0/(num*1.0),sample);
      i++;
      raw=wavfile->readSample(start+i);
    }
    glEnd();
  }
}

The first thing you notice is that we are using the GL_SCISSOR_TEST to split the screen into a text part and a drawing part (the GL_SCISSOR_TEST was the main subject of another article which can be found here). So first we split the window, then we get the info from the Wav object and use the void glutBitmapCharacter(void *font, int character) function to put each letter on the screen. We do this character by character, but we have to set the position where the text starts manually, using the void glRasterPos2f( GLfloat x, GLfloat y ) function. Since we position the text manually we also have to interpret the newlines ourselves. The available fonts (these are bitmapped fonts so don't expect miracles) are listed in the glutBitmapCharacter manpage.
After we did this for the info from the Wav file we do the same for the info coming from our current Scope object. (There are other ways to draw fonts but these will be explained in a separate article.)
After we have drawn the text we draw the actual pattern coming from the wav file. This is a GL_LINE_STRIP in which we simply connect the points obtained from the samples, and we vary the colour as a function of the y coordinate.
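The newline handling can be looked at separately from OpenGL. The sketch below (layoutLines is a hypothetical helper, not part of Scope.h) splits a text at its newlines and computes the y coordinate at which each line would start, which is what draw() does implicitly with its glRasterPos2f calls.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Split `text` at newlines and pair every line with the y coordinate
// it would be drawn at: the first line at `top`, each next line `step` lower.
// (Hypothetical helper for illustration only.)
std::vector<std::pair<float, std::string> > layoutLines(const std::string& text,
                                                        float top, float step) {
  assert(step > 0.0f);
  std::vector<std::pair<float, std::string> > lines;
  std::string current;
  float y = top;
  for (std::string::size_type i = 0; i < text.size(); ++i) {
    if (text[i] == '\n') {
      // a newline ends the current line and moves the pen down
      lines.push_back(std::make_pair(y, current));
      current.clear();
      y -= step;
    } else {
      current += text[i];
    }
  }
  // keep a last line that is not terminated by a newline
  if (!current.empty()) {
    lines.push_back(std::make_pair(y, current));
  }
  return lines;
}
```

In a drawing routine each pair would then translate into one glRasterPos2f(-1, y) call followed by a glutBitmapCharacter call per character of the line.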


//////////////////////////////////////
// Scope::setDimensions(int,int)
void Scope::setDimensions(int x,int y){
  width=x;
  height=y;
}

///////////////////////
// Scope::alterNum()
void Scope::alterNum(int delta){
  if(num+delta>0){
    num+=delta;
  }
}

////////////////////////
// Scope::alterLimit()
void Scope::alterLimit(int delta){
  if(limit+delta>0){
    limit+=delta;
  }
}

////////////////////////
// Scope::alterStart()
void Scope::alterStart(int delta){
  if(start+delta>=0){
    start+=delta;
  }
}

These functions do nothing more than change some private data members; this way the user can determine what data is being displayed.

////////////////////////
// Scope::toString()
string Scope::toString(void){
  ostringstream info;
  if(wavfile == NULL){
    info << "No valid wav file opened" << endl;
  }else{
    info << "Drawing " << num << " samples" << endl;
    info << "Amplitude goes from " << -limit << " to " << limit << endl;
    info << "Starting at sample " << start << endl;
    info << "Beginning at " << (start*1.0)/wavfile->getSampleRate() << " seconds" << endl;
    info << "Ending at " << ((num+start)*1.0)/wavfile->getSampleRate() << " seconds" << endl;

  }
  return info.str();
}


#endif /* Scope.h included */

The toString() function uses a stringstream (from the C++ standard library) to create a string with information about what the user currently sees on the screen. The returned string is drawn by the draw() method (see above).
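The conversion from sample numbers to seconds used in toString() is a simple division by the sample rate. The sketch below isolates it and also shows how many periods of a tone fit in the visible window; both function names are assumptions made for this example.

```cpp
#include <cassert>

// Time in seconds at which sample number `sample` was taken.
// (Hypothetical helper, same arithmetic as Scope::toString().)
double sampleToSeconds(int sample, int sampleRate) {
  assert(sampleRate > 0);
  return static_cast<double>(sample) / sampleRate;
}

// Number of periods of a `frequency` Hz tone covered by `num` samples.
double periodsInWindow(int num, int sampleRate, double frequency) {
  return sampleToSeconds(num, sampleRate) * frequency;
}
```

With the default of 1000 samples at 44100 Hz the window covers about 0.0227 seconds, which for the 440 Hz test tone is just under 10 periods.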

The glue

The file sound.cpp creates the window and a scope object. It also defines some keybindings.

#include <iostream>
#include <GL/glut.h>
#include "Scope.h"
using namespace std;

typedef unsigned char uchar;

// callbacks 
void disp(void);
void keyb(uchar key, int x, int y);
void reshape(int x, int y);

Scope *scope;


////////////////////////////////
// main
int main(int argc, char **argv){

  glutInit(&argc, argv);
  glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
  glutInitWindowSize(600,600);
  glutInitWindowPosition(100,100);
  glutCreateWindow("Sound");

  // create a scope object 
  scope=new Scope("test.wav");

  glClearColor(0.0,0.0,0.0,0.0);
  glutDisplayFunc(disp);
  glutKeyboardFunc(keyb);
  glutReshapeFunc(reshape);

  glutMainLoop();
  return 0;
}

The parameter that is passed to the Scope class constructor is the filename of the Wav file to open.

////////////////////////////////////
// disp
void disp(void){
  scope->draw();
  glutSwapBuffers();
}

The Scope class can draw itself, so we only swap the buffers here.

////////////////////////////////////
// keyb
void keyb(uchar key, int x, int y ){
  if(key == 'q'){
    exit(0);
  }else if(key == 'n'){
    scope->alterNum(1);
  }else if(key == 'N'){
    scope->alterNum(-1);
  }else if(key == 'l'){
    scope->alterLimit(10);
  }else if(key == 'L'){
    scope->alterLimit(-10);
  }else if(key == 's'){
    scope->alterStart(1);
  }else if(key == 'S'){
    scope->alterStart(-1);
  }

  glutPostRedisplay();
}

We define 7 keybindings:

  • q: quit
  • n: increase the amount of samples drawn on the screen
  • N: decrease the amount of samples drawn on the screen
  • l: increase the maximum amplitude drawn on the screen
  • L: decrease the maximum amplitude drawn on the screen
  • s: increase the offset
  • S: decrease the offset

//////////////////////////
// reshape
void reshape(int x,int y){
  scope->setDimensions(x,y);
}

When the window is reshaped we store the new dimensions so that the maximum viewable area is used.

The result

An archive containing the source files and the test.wav file can be downloaded by clicking here

The result of our test.wav can be seen in the screenshot below:

A sine

And another view of our sine:

A sine

Until now we've only been using the (predictable) test.wav so I asked someone (Aergn) to provide me with another random wav file and below you can see the result of that file:

Another WAV file

Below you can see the result after zooming into the same wav file:

Zooming