SPSS Training | SPSS Courses

SPSS Introduction - Advanced Permutations

10/28/2014

1 Comment

 
Having introduced the concept of combinations and permutations in the last post I'd like to continue with this theme and touch on some more advanced concepts. As I said before, SPSS is fantastic and does a huge amount of statistical analysis work for you but you do need to understand the underlying statistical concepts well to get the most from it.

Last time we talked about combinations and permutations with groups of different items or, more formally, sets of n items each of which was unique. This time I'm going to explore what happens if the items in the groups we're working with aren't all different. A classic example of this is phone numbers, where the same digit can recur several times.

If you want to count the arrangements of a set where not all of the items are different then the formula that you use is

nCn1, n2, ... nk = n! / (n1! * n2! * ... * nk!)

where n1, n2, ... nk are the number of times each distinct item repeats.
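The calculation the post is heading towards is the standard multiset-permutation count: the factorial of the total length divided by the factorials of the repeat counts. A minimal Python sketch (the function name is my own):

```python
from math import factorial
from collections import Counter

def multiset_permutations(items):
    """Number of distinct orderings of a sequence that may contain repeats:
    n! divided by the product of the factorials of each item's repeat count."""
    total = factorial(len(items))
    for count in Counter(items).values():
        total //= factorial(count)
    return total

print(multiset_permutations("AAB"))   # 3!/2! = 3 orderings: AAB, ABA, BAA
print(multiset_permutations("ABCD"))  # all distinct, so just 4! = 24
```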


1 Comment

SPSS Introduction - Combinations & Permutations

9/29/2014

0 Comments

 
In a recent SPSS training course that we were running we got into a conversation about combinations and permutations.

Although this may sound a little off topic for an SPSS blog, these concepts come into play in sampling for survey research, significance testing and other statistical methods, so I thought it would be useful to take a slight detour and write a post or two about combinations, permutations and other basic statistical concepts.
 
Combinations and permutations are related concepts, both concerned with how many different ways things can be selected and sequenced.

Permutations count the number of ways that a given number of items from a given set can be put into different sequences. So, for example, if you have A, B, C and D (a set of 4 items) but only take 3 of them at any one time, how many different orderings are there?

Working through the list of possibilities gives us ABC, ACB, ABD, ADB, ACD, ADC, BAC, BCA, BDA, BAD, BCD, BDC, CAB, CBA, CBD, CDB, CAD, CDA, DAC, DCA, DAB, DBA, DBC, DCB. So that is 24 possible permutations in total.

When you are working with a set of n items and taking r items from that set the standard notation and also way of calculating the number of permutations is given by:

nPr = n! / (n-r)! 


This can obviously be simplified to nPn = n! where you are taking the same number of items from the set as there are items in the set. In this notation n! is n factorial, which is calculated as (1 * 2 * 3 * 4 * ... * (n-1) * n), so when taking 6 distinct items from a set of 6 the number of permutations is 1 * 2 * 3 * 4 * 5 * 6 = 720.


If you are taking 4 items from a set of 6 items then the formula gives you 720 / (1*2) = 360. And using our earlier example, taking 3 items from a set of 4 items gives (1*2*3*4) / (1*2) = 24 permutations.
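The nPr formula is easy to sketch in Python (from Python 3.8 the built-in math.perm does the same calculation; the function name here is my own):

```python
from math import factorial

def n_p_r(n, r):
    """Number of ordered sequences of r items taken from a set of n distinct items."""
    return factorial(n) // factorial(n - r)

print(n_p_r(4, 3))  # 24 - the A, B, C, D example above
print(n_p_r(6, 4))  # 360
print(n_p_r(6, 6))  # 720 - taking every item, i.e. 6!
```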

Combinations count the number of subsets that can be drawn from a given set of items. Combinations are independent of sequence which, as you will have seen above, is not the case with permutations, where ABC and ACB, for example, are counted separately. In our example above there are 6 orderings of the 3 letters A, B and D, so ABD accounts for 6 permutations but only one combination.

The relevant notation for combinations is shown below. As you can see it's very similar to the notation for permutations.

nCr = n! / ((n-r)!*r!) where again n is the number of items in the set and r is the number of items being taken from the set. So for our previous example with A, B, C and D the notation gives 4C3 = (1*2*3*4) / (1*(1*2*3)) = 4 


Over the next few blog posts I will explain a few more of the basics of statistics, as users of SPSS need to understand them. Although SPSS does a lot of the work for you, the more of a grounding you have in basic statistics the simpler and quicker it will be to use.


0 Comments

Introduction to SPSS - Linear Regression

9/9/2014

0 Comments

 
Although regression analysis is a huge topic for statisticians I want to just look at a simple case today to help people understand what it is and how it can be used in SPSS. 

Put simply, regression analysis looks for a line of best fit through a series of data points. This shows the relationship between the two variables very clearly and also allows you to predict new values.

Given that this method uses all data points it is worth remembering that it is not robust to outliers, so some care needs to be taken with the dataset being used. It is also worth remembering that it assumes normality in the data.

If you have a data set then linear regression seeks to find the equation in the form Y = A + BX that most closely fits the data, where Y is the value on the vertical axis, A is the point at which the line crosses the Y axis and B defines the slope of the line. So if B is 2 then the line will be steep (>45 degrees) as for every one unit that X increases by, Y will increase by 2.

Firstly you need to calculate the 5 key pieces of data for your dataset:
1. The mean of your x values - X
2. The mean of your y values - Y
3. The standard deviation of the x values - SDx
4. The standard deviation of the y values - SDy
5. The correlation between your x values and your y values - R

Once you have this data then getting your line of best fit is very simple. 

The slope of your line (B) is R * (SDy/SDx). It is worth remembering that if you get a negative number it is not necessarily wrong. An increase in the number of trains on a particular route is likely to lead to a decrease in the average waiting time for a train, or in the average overcrowding in a carriage.

You now need to figure out where the line crosses the Y axis. This you can calculate using the formula A = Y - BX, that is the mean of the y values minus the slope multiplied by the mean of the x values.

So once you have completed these steps you have your equation in the form Y = A + BX.
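The steps above can be sketched in Python from first principles (pure arithmetic on the five summary figures; the variable names and the tiny example dataset are my own):

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares line y = a + b*x via means, standard deviations and correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)
    b = r * (sd_y / sd_x)        # slope
    a = mean_y - b * mean_x      # intercept: where the line crosses the Y axis
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # data lying exactly on y = 1 + 2x
print(a, b)  # 1.0 2.0
```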

Obviously SPSS will allow you to do all of these calculations very quickly. You simply choose ANALYSE from the toolbar and then click REGRESSION > LINEAR. This brings up a dialogue box which will ask you to define which variable is dependent and which independent and it will then calculate the variables in your formula. These can be found in the 'Unstandardised Coefficients' table, under the first column which is labelled 'B'. 

The constant figure in this column is A, so where the line crosses the Y axis. Beneath that you will find the name of the independent variable which gives you B, the slope of the line. 

So there you have it, an introduction to using linear regression and also how you can calculate one in SPSS.



 
0 Comments

Intermediate SPSS - The Bell Curve - Normal or Gaussian Distributions

9/3/2014

0 Comments

 
In my previous introduction to SPSS post about the normal distribution I said that I would come back and explain a little more about Gaussian distributions or normal distributions, which are often known colloquially as 'The Bell Curve'.

Normal distributions are enormously useful and used very frequently. The normal is a continuous probability distribution and appears constantly in scientific research. The key idea that makes it so useful is the Central Limit Theorem. This states that the mean of a sample of variables drawn at random from the same distribution will be approximately normally distributed, regardless of the distribution of the underlying variable. The mean of this sampling distribution will be the same as that of the underlying population, while its variance will be equal to the population variance divided by the sample size. The approximation improves as the sample size gets larger.
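A quick illustration of the Central Limit Theorem using Python's random module (the uniform distribution, sample size and number of trials here are my own choices):

```python
import random
from math import sqrt

random.seed(1)

# Underlying variable: uniform on [0, 1] - definitely not normal.
# Its population mean is 0.5 and its variance is 1/12.
n, trials = 30, 5000
sample_means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

mean_of_means = sum(sample_means) / trials
sd_of_means = sqrt(sum((m - mean_of_means) ** 2 for m in sample_means) / trials)

print(round(mean_of_means, 3))  # close to 0.5, the population mean
print(round(sd_of_means, 3))    # close to sqrt((1/12)/30), about 0.053
```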

So the next question is - why is this so useful? The most useful thing about this is that it allows us to test hypotheses about data without knowing the underlying distribution of that data.

The second reason it's so useful is that normal distributions are everywhere, mainly because so many variables in nature are not driven by just one factor but are themselves the sum of many independent factors. Anything in nature (for example individual heights) is the sum of multiple different, and sometimes opposing, factors like genes, diet and so on.

While normal distributions are very useful, the key 'gotcha' to look out for is how likely you are to get an outlier. Under a normal distribution results far from the mean (many multiples of the standard deviation away) are exceedingly unlikely, so if you expect distant outliers with any regularity at all it is probably the wrong distribution to use.

Hopefully that adds a little more detail to your understanding of Normal or Gaussian distributions and how they are used.

0 Comments

Introduction To SPSS - Blog 2 On Standard Deviation

8/27/2014

0 Comments

 
So in our previous blog we introduced the concept of Standard Deviation and how you use it in SPSS. 

Standard Deviation is a very useful concept if used with care and so I'm going to write a couple more blogs to help you understand more about how it is used in practice. The first one is how to interpret a standard deviation figure.

Probably the most useful way of interpreting a standard deviation is this: assuming that your underlying data is sampled from a Gaussian distribution (for these purposes, a normal distribution), you expect approximately 68% of the values to lie within one standard deviation of the mean, and approximately 95% to lie within 2 standard deviations of the mean. By extension, roughly 27% of your population will lie between one and 2 standard deviations from the mean, and 5% will lie more than 2 standard deviations from the mean.

So for example, imagine a distribution where the mean is 20 and the standard deviation is 5.

That means that the range within one standard deviation of the mean is 20 + or - 5, so between 15 and 25, and the range within 2 standard deviations of the mean is 10 to 30.

Therefore if you then take another reading, the odds are 19 in 20 (95%) that the reading will be between 10 and 30.

Similarly this also helps you understand why a lower standard deviation implies a 'tighter' or more defined distribution. If the standard deviation of the above distribution were 2.5 rather than 5, then 95% of the values would lie between 20 + or - (2 * 2.5), so between 15 and 25. This is the same range within which only 68% of the values of our original distribution would lie.
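You can check the 68% and 95% figures empirically with a quick simulation (a sketch using Python's random.gauss; the mean of 20 and standard deviation of 5 follow the example above):

```python
import random

random.seed(42)
mean, sd = 20, 5
readings = [random.gauss(mean, sd) for _ in range(100_000)]

# Fraction of simulated readings within 1 and within 2 standard deviations.
within_1 = sum(mean - sd <= r <= mean + sd for r in readings) / len(readings)
within_2 = sum(mean - 2 * sd <= r <= mean + 2 * sd for r in readings) / len(readings)

print(round(within_1, 2))  # roughly 0.68
print(round(within_2, 2))  # roughly 0.95
```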

This can also be shown pictorially. The illustrations at the bottom of this post illustrate this point for you.

Don't worry, I will write a follow-up blog post to give you some more detail on Gaussian distributions, but for these purposes we are assuming that your data is sampled from a normal distribution.

That's it for this week's SPSS blog.







0 Comments

What is Standard Deviation And Where Do I Find It In SPSS

8/18/2014

0 Comments

 

Today brings another beginner's tip for people new to SPSS: what is the standard deviation of a dataset and how do I use SPSS to calculate it?

Standard Deviation is a measure of how widely dispersed a dataset is. It is a fairer and more comprehensive way of describing a dataset than a simple mean, median or mode alone, as it describes how widely the data is dispersed around its mean. This of course means that to be really useful you also need to know the units that your standard deviation is in and the mean of the dataset that it refers to. On its own a standard deviation figure is unlikely to be very useful. A low standard deviation implies a tightly clustered dataset and, conversely, a large standard deviation implies a widely dispersed one.

It is useful to know how standard deviation is calculated as well so here goes.

It is the square root of the mean of the squared differences of each value in the dataset from the dataset's mean. So to calculate it, take the sum of the squares of each piece of data's difference from the mean of the dataset, divide by the number of pieces of data to get the mean of those squares, and then take the square root of the result.

It is probably most easily illustrated by example. Imagine a dataset of 5 items - 9, 8, 7, 6, 5.

The mean of this data is 7, so the squared differences from the mean are 4 ((9-7)^2), 1 ((8-7)^2), 0, 1 and 4.

So the sum of the squared differences is 10. There are 5 items in the dataset, so the mean of the squared differences is 2 (10/5), and the square root of that is 1.414.

So for our very simple dataset the mean is 7 and the standard deviation is 1.414. Obviously it can be far more complicated to calculate for larger and more complex datasets.
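The worked example above can be reproduced in a few lines of Python (statistics.pstdev is the population standard deviation, which matches the divide-by-n calculation used here):

```python
from math import sqrt
from statistics import mean, pstdev

data = [9, 8, 7, 6, 5]
m = mean(data)                                # the mean is 7
squared_diffs = [(x - m) ** 2 for x in data]  # [4, 1, 0, 1, 4]
sd = sqrt(sum(squared_diffs) / len(data))     # sqrt(10 / 5), about 1.414

print(round(sd, 3))          # 1.414
print(round(pstdev(data), 3))  # 1.414 - the library function agrees
```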

In SPSS, calculating the standard deviation for a dataset is a very simple process. Select your variables, click STATISTICS, select Standard Deviation as well as Mean and click CONTINUE. SPSS will then very quickly and simply calculate the mean and standard deviation of your data.




I will write more about standard deviation in my next post, as it is an important concept in statistics and so for anyone using SPSS.

0 Comments

Calculating Mean and Median in SPSS

8/12/2014

0 Comments

 
In SPSS it is very quick and simple to calculate the mean and median of a sample.
 
You simply choose ANALYSE > DESCRIPTIVE STATISTICS > EXPLORE. You will then need to ensure that the dependent list contains the quantity that you are measuring and describing, and that the factor list contains the factor or quantity that you are exploring.

So for example the quantity might be people's ages and in the factor listing we would put where they come from. 

SPSS will then calculate the mean and median of your data set for you. 
0 Comments

Back To Basics - The Difference Between Mean And Median

8/4/2014

0 Comments

 
A quick one today. I thought we should just remind everyone of the difference between Mean and Median to avoid any confusion as the issue arose in one of our recent training classes for SPSS.

Both Mean and Median are types of average. The mean value is based on all of the values within the data and so will include outliers; in small datasets significant outliers can have a significant impact. It is calculated as the sum of all of the values in the dataset divided by the number of items in the dataset.

So for example imagine our dataset is 1, 11, 12, 13, 14. The sum of these is 51, divided by 5, which gives a mean value of 10.2. As this is smaller than 80% of the values in our dataset (chosen to illustrate the point!), in this case the outlier has made the mean a less useful statistic.

The Median however is the middle number in an ordered dataset. So imagine our dataset was actually 11, 13, 1, 14, 12; we would first order the data into 1, 11, 12, 13, 14 and then choose the middle number, so 12 is our median. As you can see, where you have an outlier in the data the median is a far more useful and representative figure as it removes the influence of the outlier.

The median is easy to calculate for datasets with odd numbers of pieces of data in them. For datasets with even numbers of pieces of data in them we calculate the mean of the middle 2 pieces of data. So assuming that our dataset had an additional piece of data in it, 15, our middle 2 pieces of data would be 12 and 13. Taking the mean of them would give a median of 12.5.  
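Both worked examples can be checked with Python's statistics module (a sketch using the datasets from this post):

```python
from statistics import mean, median

data = [1, 11, 12, 13, 14]
print(mean(data))    # 10.2 - dragged down by the outlier, 1
print(median(data))  # 12 - the middle value, unaffected by the outlier

# Even-sized dataset: the median is the mean of the middle two values.
print(median([1, 11, 12, 13, 14, 15]))  # 12.5
```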
0 Comments

Entering And Formatting Data - The One Key Rule 

7/28/2014

0 Comments

 
It is critical to get your data entered and set up correctly in SPSS to be sure that you will work quickly and efficiently.
When importing or entering data the rule to remember is as follows: all the information about one thing goes in one row. Or, to expand it a bit, information about different things goes in different rows (and the same column) whereas information about the same thing goes in different columns (and so the same row).

So if you had a list of people's names and their heights, weights and ages then in the first column you would put the first person's name, in the second their height, in the third their weight and in the fourth their age. So all information about the same person goes in the same row but in different columns.

Obviously there are exceptions to this rule when you get to using SPSS and you have a variable that defines a group of things but if you're not absolutely sure you should always stick with the rule above.
 
0 Comments

The Variable View window - A deeper Dive

7/23/2014

0 Comments

 
The Variable View window in SPSS contains information about the data that you can see in the Data View.

In this post I will go through each of the column titles to explain a little more about what that column is for. Remember that each row in the Variable View window refers to one variable, that is a column of data in the Data View.

Name - As you can probably guess, this is the name of the variable. In SPSS each name must be unique and must begin with a letter. You are not permitted to have any spaces in this field.

Type - This column contains details of what type of data the variable contains. Most data falls into one of three categories - String, Numeric or Date - although there are others, for example Dollar for currencies.

If you click on the Type box at the top of the column this will open a dialogue box that will allow you to quickly define the data type.

Width - This allows you to define the maximum number of characters that can be entered for the variable. SPSS's default setting for this is 8.

Decimals - This sets how many decimal places are displayed; in SPSS it is limited to 16 or fewer. SPSS's default setting for this is 2.

Label - The name setting has some limitations and also needs to be kept relatively short as long names do not show in column headings. The label setting can contain a much longer description of the data which can be very useful, especially when sharing data.

Values - This is used to link numbers to categories where the variable represents a category. For example in some data you may sort between pensioners and workers by their age, so that a pensioner is a 1 and a worker is a 2. Values then allows 1 to show as a pensioner and 2 to show as a worker.

Missing - This will tell SPSS what to do if there is a missing piece, or pieces, of data in the data set. Clicking on this column will bring up the Missing Values dialogue box which allows you to define the missing data. You may choose to have all missing pieces of data represented by one number - 999999 for example - or represent data that is missing for different reasons in different ways. An example of this would be if people have ignored or missed out a question they could be categorised as 999999, but if they've written Not Applicable they could be categorised as 888888.

Columns - This defines how wide the column is and so how many characters are displayed at the top of the column.

Align - This is the same as in Microsoft Office products and is left, centre or right.

Measure - Last but not least, measure is where the way that the data was measured is recorded - it's either nominal, ordinal or scale.

Hopefully you'll find that helpful next time you're working through a Variable View in SPSS.





0 Comments

    Author

    Written by the team at Acuity Training
