|Introduction to Spectral Analysis and Matlab
IRIS Summer Intern Orientation, 2012
The object of this lab is to explore the relationship between the time domain and the frequency domain using MATLAB. You will first look at pure sine waves as a function of time and their representation in the frequency domain, and then examine some earthquake data.
MATLAB is a commonly used commercial package designed to manipulate and plot all sorts of data. The MATLAB introduction states:
MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include:
MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar noninteractive language such as C.
Today's lab will use only a few of the features offered by MATLAB, but should give you enough of an introduction to allow you to understand the basic syntax, input/output and plotting.
PART ONE - ARTIFICIAL DATA
First login to Fedora (not Windows) and open a terminal window.
Then create a new directory in your home directory to use for your MATLAB processing, cd to that directory and then copy all files from the system directory written on the board to that directory. Finally start up MATLAB by typing matlab in the terminal window.
The initial windows that appear include Command, Workspace, and Command history. Many operations can be performed either in the command window or via the drop down menus. In most cases the command window commands will be listed here.
MATLAB is designed for easy matrix manipulation. The matrices are like excel spreadsheets in that columns and rows can be manipulated as a unit or as individual elements.
To get a feel for the MATLAB matrix syntax, click on MATLAB help, go to getting started, then Matrices and Magic Squares.
Loading and listing data
In the text that follows, lines beginning with “ >> “ are commands to be entered into the command window.
Cell #1 – Playing with simple sine waves in the Time Domain
>>load sine_waves (loads the file sine_waves from the current directory)
>>who (lists the variables you have loaded or created)
>>whos (lists the variables you have loaded or created and their sizes)
The file sine_waves has 512 rows and 4 columns. The first column is the time in seconds.
Columns 2-4 are the amplitudes of 3 different sine waves, as sampled at the times listed in column 1.
To view the elements of a matrix or any variable, simply type its name.
or double click on the variable name in Workspace.
You can also select a single column:
or just a few elements:
What is the sampling interval of the data (ie. the time in seconds between successive samples)?
How many samples are there per second?
The maximum signal frequency that can be correctly observed is half the sampling frequency. This is called the Nyquist frequency. What is the Nyquist frequency in this case?
To plot data, use the plot command and select the columns you want to plot against each other:
>>plot(sine_waves(:,1),sine_waves(:,2)); (plots column 1 (time) vs column 2 (amplitude))
You can manipulate the plot using the plot menu items. For example, Edit, axis properties lets you change the x and y scales, and add axes labels and a title.
Or you can use the commands:
Figure 2 (Use these commands to get a zoom of figure 1)
>>axis([0 5 -1.5 1.5]);
To plot a second line on the plot:
>>hold on (hold axes on for later plots, hold off allows replotting of new data, the default is to clear the plot each time (hold off))
Figure 3 (Still zoomed in a per Figure 2):
>>plot(sine_waves(:,1),sine_waves(:,3),':'); (plots column 1 vs col 3 using a dotted line)
If you want to add a line at y=0, which makes it easier to determine frequency, create a new variable called zero_line, and fill it with zeros:
If you want to look at the plots independently:
>>clf (clears the graphics window)
then replot as desired.
To open multiple windows so you can look at plots separately select New Figure in the File plot menu
You can find the x,y value of any point on the graph by selecting data cursor in the Tools menu
What are the frequencies of the sine waves in column 2 and column 3?
What are their relative amplitudes (ie what is the ratio of their amplitudes)?
Comparing individual sine waves to their sum
Column 4 is the sum of the amplitudes of columns 2 and 3. To plot column 4:
Figure 4: Add the sum of the two sine waves to your previous plot and include a legend
>>plot(sine_waves(:,1),sine_waves(:,4),'--'); (plots a dashed line)
Note that while you can tell that the resulting wave contains more than one frequency, it is harder to estimate the relative amplitudes of the two frequencies when they are summed together.
Cell #2: Frequency Domain
To transform to the frequency domain, calculate the Fourier transform for the sine waves in columns 2-4 of sine_waves. You must give the Fourier transform the sampling frequency (in this case 25 Hz).
For each sine wave, a new matrix is created with frequency (Hz) in column 1, amplitude in column 2 and phase in column 3. Looking at the numbers in transformed2:
What is the sampling interval in the transformed data (in Hz)
What is the maximum frequency?
Does this agree with your determination of the Nyquist frequency?
Before plotting the spectra, consider what you might expect for the frequency response of each sine wave.
Now plot the amplitude spectrum for the sine wave that was in column 2 of sine_waves:
At what frequency is there a maximum?
Now plot the amplitude spectrum for the sine wave that was in column 3 of sine_waves.
What is the peak frequency of this sine wave?
What is the relative amplitude of the peaks for the 2 waves?
How does this ratio compare to your measured ratio of the sine wave amplitudes?
Figure 6: Plot the spectrum of the two sine waves as well as their sum. Use a different color for each line. Include a Legend.
Now plot the amplitude spectrum for the sine wave that was in column 4 of sine_waves, using a different line symbol.
How do the spectral amplitudes of the combined sine waves compare to the spectral amplitudes of the individual sine waves?
This shows that the combination of sine waves is a linear process, which leads to the concept that an arbitrarily shaped wave can be created by the addition of a sufficient number of sine waves.
Cell #3: Construction of a wave plot
Now load the file multi_sine.
The file multi_sine includes 10 different sine waves (in columns 2-11) which have been phase shifted so that there is one time when all the sine waves are at a maximum. Column 1 is time as in sine_waves. You can plot them individually in the time domain to see what they look like.
Figure 7: Plot all 10 sine waves. Use a google search (ex. “MATLAB colors”) to try and make each wave a different color.
To add all the sine waves together at each point in time, you can do it the long way:
or you can use MATLAB's sum utility (which sums columns) along with its transpose utility (which swaps rows and columns):
bigwave is now the sum of all 10 sine waves.
Figure 8: Make a plot of “bigwave” in the time domain
Plot bigwave in one window and in another window plot the 10 sine waves used to create it. Note how the sum of continuous sine waves results in a wave packet of finite duration.
Figure 9: Now calculate and plot the Fourier amplitude spectrum of bigwave (the sampling interval is the same as before).
What are the frequencies of the sine waves that make up the wave pulse in bigwave?
Cell #4 – Frequency of a spike
This example shows how many sine and cosine functions are needed to create a pulse-like waveform. To create a single pulse, an infinite series of sine and cosine functions have to be added together. A single pulse can be found in column 13 of multi_sine.
Figure 10: Use “subplot” to create a two-window plot. The top plot will show the spike in the time domain, and the bottom plot will show the fourier transform of the spike.
What is the frequency response of the spike in column 13 of multi_sine? Why?
Cell #5 – Real Data at last
For the following sections, please work in pairs and discuss your shared results.
Now you can examine earthquake data that were collected during an earthquake hazard assessment study of the Wellington, New Zealand region. The file quake_data includes 10 seconds of S wave recording from 3 different sites for the same earthquake. Column 2 is data from a rock site, column 3 is the recording from a sedimentary basin site and the column 4 seismograph was located on an old peat bog (now housing development). As before, time is in column 1.
Figure 11: Plot the 3 seismograms in the time domain and compare the signals. Use different colors for each site and include a legend.
What is the sampling rate?
What is the Nyquist frequency?
Transform the time series to the frequency domain as before (don't forget to include the new sampling rate).
Figure 12: Make a plot of the three seismograms in the frequency domain. Be sure to include a legend.
What is the maximum frequency now?
First look at the frequency response of the rock site (column 2).
What is the range of frequencies in the ground motion?
Now look at the frequency response of the ground motion after the seismic waves have traveled through the soft sediments below the sites in columns 3 and 4. You may want to change the axes so that you can focus on the low frequencies.
Note that there are clearly defined peaks in frequency for columns 3 and 4 but not for the rock site (column 2).
What is the frequency at which there is a maximum for columns 3 and 4?
These data were collected in sedimentary basins that can shake like a bowl of jello. A basin can resonate at particular frequencies just like a simple harmonic oscillator. The resonance continues long after the seismic energy has dissipated at the nearby rock site. The frequency of oscillation for a simple cylindrical basin is related to the velocity of the material and the depth of the basin. A wave whose wavelength is four times the thickness of the basin will resonate in the basin.
4 x thickness = wavelength = velocity / frequency
The surface shear wave velocity for the site in column 3 has been measured at 110 m/s, while for the site in column 4 it was 80 m/s. What is the approximate thickness of the two basins?
Continue with the rest of the exercise if you have time.
Inverse transform - Frequency domain to time domain
The Fourier transform can be used to go either from time to frequency or from frequency to time. To verify that this is true, see the optional explorations at the end of the lab.
It is possible to filter out frequencies that aren't of interest for a particular problem, or to select for particular frequencies. For example, local earthquakes include much higher frequency content than distant earthquakes.
We will now examine broadband data recorded in at station LKWY in Utah from a magnitude 7.9 earthquake on the Denali fault in Alaska in November 2002. First load the data using the m-file algorithm load_sac:
>>[sachdr,utahz] = load_sac(‘lkwyz.sac’);
This loads information about the data in sachdr and loads the amplitude data into a single column in utahz. The sampling frequency is 40 Hz.
Try plotting the data to see what phases are visible. You may also want to examine the spectra (the spectra calculation will take less time if you limit the calculation to the first 30000 points). To enhance the high frequency arrivals you can apply a bandpass filter:
where the arguments are: filbutt(Data,Samp_Rate,Low_Limit,High_Limit)
Compare the time series before and after filtering.
This particular filter doesn’t have a very sharp frequency cut off. One way to increase the sharpness of the filter is to filter the data a second or third time using the same parameters. What does this do to the waveform and the spectra?
After a 3rd round of filtering do you see phases you didn’t see in the raw data? What do you think might be causing the high frequency signals? Do you think they are from local or distant sources? Why?
If you have time - optional explorations
Effects of tapering and signal length
Two of the reasons that the spectral amplitude peak is spread out over a range of frequencies are the finite length of the sine wave in time and the abrupt truncation of the waves at the end of the file. The amplitude of the spurious frequencies can be reduced by tapering the wave so that its amplitude drops smoothly to zero at the end of the time series.
You can do this with the function taper:
>>tapered = taper(sine_waves(:,4), 10); (taper 10% of both ends of sine_waves column 4)
Plot the tapered signal and compare it to the untapered signal.
Now Fourier transform the data as before and compare the tapered and untapered signals.
Has the amplitude of the spurious frequencies decreased?
Calculate and plot the spectra for a sine wave that is truncated at row 130:
How does the spectrum compare to the previous example?
Inverse transform - Frequency domain to time domain
The Fourier transform can be used to go either from time to frequency or from frequency to time. To verify that this is true, take the frequency domain result from the New Zealand data for column 2 and transform it to the time domain. If qtrans2 is the output matrix from fourier, with frequency in column 1, amplitude in column 2 and phase in column 3, then you can calculate the inverse transform by:
where 100 is the resulting sample rate in Hz.
Plot the initial data (quake_data) and the twice transformed data (intrans2) and compare. Does any information appear to have been lost in the transformations?
Filtering a spike in the time domain
Try several different bandpass filters on the spike in multi_sine (column 13) in the previous section. What is the result in the time and frequency domains?
Lowpass filtering of the spike is a good analog of what happens to an impulsive seismic source as the sensor moves further away from the source.
Advanced plotting of seismograms
Try looking at the other components of station LKWY: lkwyn.sac, lkwye.sac.
Create a plot of the LKWY data with a proper time axis, either just with relative time using the digitization rate (40 Hz), or also using the start time in the header file (sachdr).
List of MATLAB commands used in lab
The syntax for all the commands can be found in the help pages except for those marked "not a basic matlab function". The syntax for these can be found in the files with a .m extension on the distributed CD.
axis - sets user defined axes
clf - clears graphics window
clear - clears all variables
clear name - clears just the variable name
exit - leave matlab
hold on - keeps plot from clearing for each new line
filbut - bandpass filter (not a basic matlab function)
fourier - calculate Fourier transform (not a basic matlab function)
load - loads in a data file
load_sac – loads in a data file in SAC format (not a basic matlab function)
plot - x vs y plot
print - saves current graphics plot to disk
save filename - saves all the current variables to disk in the file filename.mat
sum - sum the columns of a matrix
title - puts title on plot
xlabel - labels x axis
ylabel - labels y axis
zero - creates a file of zeros
; - at end of command: execute the command but don't print the result on the screen
\' - transpose of a matrix