Tutorial 10

Basic statistics

A look at descriptive statistics and linear regression

 

back to advanced topics

on to regression

Excel has a wide variety of built-in statistics functions that give, for instance, the slope and y-intercept of a line, the standard deviation of a data sample, and the mean, median and mode of a set of values. In this tutorial we will cover a few of the more useful and popular statistics functions, but there are many more built-in statistics functions that you can learn about via Excel's help files.


Basic built-in functions. (AVERAGE, MEAN, MODE, COUNT, MAX, MIN)

We will use the familiar example of a class's grades to illustrate the use of some of the more basic Excel functions, like AVERAGE( ), MODE( ) AND MAX( ). Assume a class's grade distribution is as follows: 3, 0, 4, 4, 4, 2, 4, 1, 4, 0, 3, 3, 1, 1, 3. These grades are based on a 4-point scale with 4=A and 0=F and are entered into an Excel worksheet shown below. Using the AVERAGE( ) function, we find the class's average (or arithmetic mean) grade is a disappointing 2.47, or a mid-C. The syntax for this common function is =AVERAGE(number1, number2, ...) and is displayed in the screen shot below. For more information using the AVERAGE function, see the arithmetic section in this tutorial.

However we don't get a clear picture of the class's performance by simply looking at its average. We can further analyze the data using the MEDIAN( ) function. The median gives the middle number in a set of numbers and its syntax is =MEDIAN(number1, number2,...). We see from the screen shot below that the median grade is 3.0, meaning that half of the grades are higher than 3.0, and half are lower. Therefore, despite the low class average, more students scored 3's and 4's than 2's, 1's and 0's.

Additionally, we can also analyze the grade distribution by using the MODE( ) function. The mode gives the most frequently occurring value of a set of numbers and its syntax is =MODE(number1, number2,...). From the screen shot, we see that the mode grade is 4, meaning that a score of 4 was the most common grade. Again, the instructor of the class can take heart that, despite the low class average, more students made A's than any other grade.

Without going into too much detail, we can also use some of Excel's built-in functions to determine the number of grades entered, and the maximum and minimum grades of the distribution. The syntax for these functions are shown below in the bulleted list and also in the screen shot.

 

 

 

 


Excel's statistical abilities are not limited to the built in functions discussed above. One of the add in packages available for Excel includes a Data analysis package located under the Tools section of the menu bar. Clicking Data analysis opens a drop down list of tests for determining the statistical significance of your data as well as presenting an overview of all of the common descriptive statistics. Using the Descriptive statistics package will present you with a table including the information you determined individually above.

 

 

 

 

Start by selecting Tools Data analysis Descriptive statistics. Remember that Excel expects your data to be organized into columns unless you tell it otherwise.

This will open a dialog box and allow you to

  1. Enter the input range by collapsing the dialog box and selecting the data directly from the data table as before
  2. Select column or row arrangement (remember excel expects columns)
  3. Inform Excel that you have entered a column label in the first cell of the column. You did do that; right?
  4. Enter the cell you want the the table to start in. Again click the collapse icon to the right and then select the cell directly from the worksheet.
  5. Select summary statistics by checking the box

 

 

 

 

 

 

 

That's it click ok. A descriptive statistics table will appear. You could calculate each of these values individually using built in functions or calculate all of them and present the results in a la bled table.

There is one important difference in the values calculated by the Data analysis package Descriptive statistics analysis. It will not be updated automatically.If you calculate each statistic from a built in function and you edit your original data table, the statistical value will be automatically. You must rerun the Data analysis Descriptive statistic if you change anything in your original data table.

This is true of any calculation performed by the Data Analysis package. Try not to forget this.

 

 

 

 

 

 

Now that you have a descriptive statistics table all you need to do is delete the rows that contain statistics that you are not going to discuss, format the numeric column as Numbers, and adjust the significant figures. Finally format the table according to the style manual and you done.

 

 

 

 

 

 

 

 



Linear regression equations. (SLOPE, INTERCEPT, CORREL)

Excel has three built-in functions that allow for a third method for determining the slope, y-intercept, correlation coefficient, and R-squared values of a set of data. The functions are SLOPE(), INTERCEPT(), CORREL() and RSQ(), and are also covered in the statistics section of this tutorial.

The syntax for each are as follows:

I use these functions for two reasons. One, using them is easier and faster than plotting the data and adding a trendline -- although a visual graph shows trends in the data better than any other tool. And, two, it is often necessary to operate on the slope and y-intercept, and using the SLOPE( ) and INTERCEPT( ) functions allows the scientist to automate this, rather than manually transcribing the values given by the graph.

As an example, let's examine the equation of motion, , for a car coming to a stop. If we measure the car's position and velocity we can determine its acceleration and its initial velocity with the use of the SLOPE( ) and INTERCEPT( ) functions. The equation of motion has the form of , so if the square of the car's velocity is plotted along the y-axis and its position along the x-axis, then the slope is , and the y-intercept is simply .

Note that in order to find the acceleration, we must divide the slope by 2 and to find the initial velocity, we must take the square root of the y-intercept.

Note that the CORREL( ) function was used to ensure that the data did display a linear trend -- otherwise, the slope and y-intercept values are meaningless! It is always a good idea to plot the data as well as use these statistics functions because sometimes trends are not obvious. Additionally, a plot of the data allows us to visualize the data and gross blunders and errant data points are easily detected. The graph below tells us immediately that our data appears reasonable.

You must always convince yourself of what the data is telling you.



Error analysis tools. (STDEV)

Let's assume we make a number of repetitive measurements of one quantity, say the speed of sound. We could find the average value, but this tells us nothing of the precision of our measurement. For this, we need to calculate the standard deviation of the measured values. To quickly determine the standard deviation of any measurement, use Excel's built-in STDEV( ) function. This is illustrated in the following example:

It should be noted here that to find the accuracy of the experimental value, you will need to determine its percent error from the accepted value of 343.0m/s.



Miscellany. (ABS)

To find the absolute value of a number, use the ABS( ) function. See the examples below.


See the complete list of Excel's built-in statistics functions and their descriptions.

back to advanced topics

on to regression

Copyright © 2000, St. Mary's College of Maryland. All Rights Reserved.

Please send comments, problems or request for topics to

Walter I. Hatch
wihatch@smcm.edu
August 11, 2005