SPSS Training | SPSS Courses

SPSS Introduction - Combinations & Permutations

9/29/2014


 
In a recent SPSS training course that we were running we got into a conversation about combinations and permutations.

Although this may sound a little off topic for an SPSS blog, these ideas come up in survey sampling, significance testing and other statistical methods, so I thought it would be useful to take a slight detour and write a post or two about combinations, permutations and other basic statistical concepts.
 
Combinations and permutations are related concepts: both count the number of different ways in which items can be selected from a set.

The number of permutations is the number of different sequences that can be formed from a given number of items taken from a given set. For example, if you have A, B, C and D (a set of 4 items) but only take 3 of them at any one time, how many different ordered sequences could come up?

Working through the list of possibilities gives us ABC, ACB, ABD, ADB, ACD, ADC, BAC, BCA, BDA, BAD, BCD, BDC, CAB, CBA, CBD, CDB, CAD, CDA, DAC, DCA, DAB, DBA, DBC, DCB. So that is 24 possible permutations in total.
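
The same listing can be produced mechanically. The following is a minimal sketch in Python, which is not used in the original post and is chosen here purely for illustration; it relies only on the standard library's itertools module.

```python
from itertools import permutations

# Every ordered selection of 3 letters from the 4-letter set A, B, C, D.
items = "ABCD"
perms = ["".join(p) for p in permutations(items, 3)]

print(perms[:6])   # the six sequences that start with A
print(len(perms))  # total number of permutations
```

Counting the generated sequences is a quick way to check a hand-written list for omissions or duplicates.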

When you are working with a set of n items and taking r items from that set, the standard notation for the number of permutations, and the formula for calculating it, is:

nPr = n! / (n-r)! 


This simplifies to nPn = n! when you are taking the same number of items as there are in the set, because (n-n)! = 0! = 1. In this notation n! is n factorial, which is calculated as 1 * 2 * 3 * ... * (n-1) * n, so when taking 6 distinct items from a set of 6 the number of permutations is 1 * 2 * 3 * 4 * 5 * 6 = 720.


If you are taking 4 items from a set of 6 items then the formula gives you 720 / (1*2) = 360. Using our earlier example, taking 3 items from a set of 4 gives (1*2*3*4) / 1 = 24 permutations, because (4-3)! = 1! = 1. This matches the 24 sequences listed above.
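
As a rough illustration, the permutation formula can be written as a small Python function. The name n_p_r is hypothetical and introduced only for this sketch; math.factorial and math.perm are standard library functions (math.perm needs Python 3.8 or later).

```python
from math import factorial, perm

def n_p_r(n: int, r: int) -> int:
    """Number of permutations of r items taken from n: n! / (n - r)!."""
    if not 0 <= r <= n:
        raise ValueError("r must be between 0 and n")
    return factorial(n) // factorial(n - r)

# The three cases worked through in the text.
for n, r in [(6, 6), (6, 4), (4, 3)]:
    # Cross-check the hand-written formula against the built-in.
    assert n_p_r(n, r) == perm(n, r)
    print(f"{n}P{r} = {n_p_r(n, r)}")
```

Integer division (//) is used so that the result stays an exact whole number even for large n.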

The number of combinations is the number of subsets of a given size that can be drawn from a given set of items. Combinations ignore sequence which, as you will have seen above, is not the case with permutations, where ABC and ACB, for example, are separate items and counted as such. In our example above, the 3 letters A, B and D appear in 6 different orders (ABD, ADB, BAD, BDA, DAB, DBA), so there are 6 permutations of these letters but only one combination.

The relevant notation for combinations is shown below. As you can see, it is very similar to the notation for permutations.

nCr = n! / ((n-r)! * r!)

Again n is the number of items in the set and r is the number of items being taken from the set. So for our previous example with A, B, C and D the formula gives 4C3 = (1*2*3*4) / (1 * (1*2*3)) = 4. The four combinations are ABC, ABD, ACD and BCD.
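
The combination count can be checked in the same way. This is a sketch in Python using only the standard library; n_c_r is a hypothetical helper name used for illustration.

```python
from itertools import combinations
from math import comb, factorial

def n_c_r(n: int, r: int) -> int:
    """Number of combinations of r items taken from n: n! / ((n - r)! * r!)."""
    if not 0 <= r <= n:
        raise ValueError("r must be between 0 and n")
    return factorial(n) // (factorial(n - r) * factorial(r))

# List the subsets of size 3 directly; order within a subset is ignored.
subsets = ["".join(c) for c in combinations("ABCD", 3)]
print(subsets)
print(n_c_r(4, 3))

# Each combination of 3 letters can be ordered in 3! = 6 ways,
# so the permutation count is the combination count times r!.
assert n_c_r(4, 3) * factorial(3) == 24
assert n_c_r(4, 3) == comb(4, 3)
```

The final assertions tie the two formulas together: dividing the 24 permutations by the 6 orderings of each group of letters leaves 4 combinations.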


Over the next few blog posts I will explain a few more of the basics of statistics that SPSS users need to understand. Although SPSS does a lot of the work for you, the better your grounding in basic statistics, the simpler and quicker it will be to use.



Introduction to SPSS - Linear Regression

9/9/2014


 
Although regression analysis is a huge topic for statisticians, I just want to look at a simple case today to help people understand what it is and how it can be used in SPSS.

Put simply, regression analysis looks for a line of best fit through a series of data points. This line shows the relationship between the two variables clearly and also allows you to predict new values.

Because this method uses all of the data points, it is not robust to outliers, so some care needs to be taken with the dataset being used. It is also worth remembering that the usual significance tests on the results assume that the residuals (the vertical distances between the points and the line) are normally distributed.

If you have a dataset, then linear regression seeks the equation of the form Y = A + BX that most closely fits the data. Here Y is the value on the vertical axis, X is the value on the horizontal axis, A is the point at which the line crosses the Y axis and B defines the slope of the line. So if B is 2 the line will be steep (more than 45 degrees when both axes use the same scale), because for every one unit that X increases, Y increases by 2.

First you need to calculate 5 key pieces of data for your dataset:
1. The mean of your x values - mean(X)
2. The mean of your y values - mean(Y)
3. The standard deviation of the x values - SDx
4. The standard deviation of the y values - SDy
5. The correlation between your x values and your y values - R

Once you have this data then getting your line of best fit is very simple. 

The slope of your line (B) is R * (SDy / SDx). It is worth remembering that if you get a negative number it is not necessarily wrong. An increase in the number of trains on a particular route is likely to lead to a decrease in the average waiting time for a train, or in the average overcrowding in a carriage.

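You now need to work out where the line crosses the Y axis. You can calculate this using the formula A = mean(Y) - B * mean(X), where mean(X) and mean(Y) are the means of your x and y values and B is the slope calculated above.

So once you have completed these steps you have your equation in the form Y = A + BX.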
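
The steps above can be sketched in Python as follows. The function name fit_line and the sample data are hypothetical, made up only to show the arithmetic, and the sketch assumes sample standard deviations (dividing by n - 1), with the correlation computed on the same basis.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Return (A, B) for the line of best fit Y = A + B*X."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)  # sample standard deviations

    # Pearson correlation R: sample covariance divided by SDx * SDy.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    r = cov / (sd_x * sd_y)

    b = r * (sd_y / sd_x)    # slope
    a = mean_y - b * mean_x  # intercept: where the line crosses the Y axis
    return a, b

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
print(f"Y = {a:.3f} + {b:.3f} * X")
```

Feeding the function points that lie exactly on a known line, for example (0, 1), (1, 3), (2, 5), is a simple sanity check: it should return an intercept of 1 and a slope of 2.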

Obviously SPSS will allow you to do all of these calculations very quickly. Choose Analyze from the menu bar and then click Regression > Linear. This brings up a dialogue box which asks you to define which variable is dependent and which is independent; SPSS then calculates the coefficients in your formula. These can be found in the 'Coefficients' table in the output, in the column labelled 'B' under 'Unstandardized Coefficients'.

The figure in the '(Constant)' row of this column is A, the point where the line crosses the Y axis. Beneath it, in the row labelled with the name of the independent variable, you will find B, the slope of the line.

So there you have it, an introduction to using linear regression and also how you can calculate one in SPSS.



 

Intermediate SPSS - The Bell Curve - Normal or Gaussian Distributions

9/3/2014


 
In my previous introduction to SPSS post about the normal distribution I said that I would come back and explain a little more about Gaussian distributions or normal distributions, which are often known colloquially as 'The Bell Curve'.

Normal distributions are enormously useful and very frequently used. The normal distribution is a continuous probability distribution that is used a lot in scientific research. The key idea that makes it so useful is the Central Limit Theorem. This states that the mean of a sufficiently large sample of independent values drawn at random from the same distribution will be approximately normally distributed. This holds regardless of the distribution of the underlying variable, provided that distribution has a finite variance. The expected value of the sample mean is the same as the mean of the underlying population, while the variance of the sample mean equals the population variance divided by the sample size. The normal approximation improves as the sample size gets larger.

So the next question is: why is this so useful? The most useful thing is that it allows us to test hypotheses about data without knowing the underlying distribution of that data.
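
The Central Limit Theorem can be seen in a small simulation. This is a sketch only: the exponential distribution, the sample size of 50 and the 5,000 repetitions are all assumptions chosen for illustration, not values from the post. The exponential distribution with rate 1 is strongly skewed and has mean 1 and variance 1, so the sample means should centre on 1 with a variance close to 1 / 50 = 0.02.

```python
import random
from statistics import fmean, variance

random.seed(42)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50    # observations per sample (assumed value)
NUM_SAMPLES = 5000  # number of samples drawn (assumed value)

# Draw many samples from a skewed, non-normal distribution
# and record the mean of each one.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE
    for _ in range(NUM_SAMPLES)
]

mean_of_means = fmean(sample_means)
variance_of_means = variance(sample_means)

print(f"mean of sample means:     {mean_of_means:.4f}  (population mean is 1)")
print(f"variance of sample means: {variance_of_means:.4f}  (1 / {SAMPLE_SIZE} = {1 / SAMPLE_SIZE})")
```

Plotting a histogram of sample_means would show the familiar bell shape even though the individual values are far from normally distributed.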

The second reason is that normal distributions are everywhere. This is mainly because so many quantities in nature are not driven by just one factor but are themselves the sum of many independent factors. An individual's height, for example, is the result of many different and sometimes opposing factors such as genes and diet.

While normal distributions are very useful, the key 'gotcha' to look out for is how likely it is to get an outlier. Under a normal distribution, results far from the mean (many multiples of the standard deviation) are exceedingly unlikely, so if you expect outliers far from the mean with any regularity at all, it is probably the wrong distribution to use.
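
To put rough numbers on "exceedingly unlikely", the sketch below uses Python's standard library statistics.NormalDist (Python 3.8 or later) to compute the probability of a result falling more than k standard deviations from the mean, on either side. The helper name two_sided_tail is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def two_sided_tail(k: float) -> float:
    """P(|Z| > k): chance of landing more than k standard deviations from the mean."""
    return 2.0 * (1.0 - standard_normal.cdf(k))

for k in range(1, 7):
    print(f"more than {k} SD from the mean: {two_sided_tail(k):.2e}")
```

The probabilities fall away very quickly: roughly 5% of results lie beyond 2 standard deviations and about 0.3% beyond 3, so data that regularly produces values 5 or 6 standard deviations out is unlikely to be normally distributed.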

Hopefully that adds a little more detail to your understanding of Normal or Gaussian distributions and how 


    Author

    Written by the team at Acuity Training
