Although regression analysis is a huge topic for statisticians, I want to look at just one simple case today to help people understand what it is and how it can be used in SPSS.
Put simply, regression analysis looks for a line of best fit through a series of data points. This line shows the relationship between the two variables very clearly and also allows you to predict new values.
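Given that this method uses all data points, it is worth remembering that it is not robust to outliers, so some care needs to be taken with the dataset being used. It is also worth remembering that the usual significance tests assume normality, strictly speaking for the residuals (the vertical distances of the points from the line) rather than for the raw data.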
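To see how much a single outlier can matter, here is a minimal Python sketch. The data are made up purely for illustration, and `fit_line` is a hypothetical helper that uses the standard least-squares formulas, not anything from SPSS.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1, 2, 3, 4, 5]
clean_ys = [2, 4, 6, 8, 10]    # made-up data lying exactly on Y = 2X
outlier_ys = [2, 4, 6, 8, 30]  # the same data with one extreme value

_, clean_slope = fit_line(xs, clean_ys)
_, outlier_slope = fit_line(xs, outlier_ys)
print(clean_slope, outlier_slope)
```

With these numbers, changing one point out of five triples the slope, which is why it pays to look at a scatter plot before trusting the fitted line.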
If you have a dataset, linear regression seeks the equation of the form Y = A + BX that most closely fits the data. Here Y is the value on the vertical axis, X is the value on the horizontal axis, A is the point at which the line crosses the Y axis and B defines the slope of the line. So if B is 2 the line will be steep (more than 45 degrees when both axes use the same scale), because for every one unit that X increases by, Y increases by 2.
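As a quick sketch of what the equation means in practice, the snippet below plugs values into Y = A + BX. The values of `a` and `b` are hypothetical, chosen only to show that a one-unit step in X moves Y by exactly B.

```python
def predict(a, b, x):
    """Return the Y value on the line Y = A + BX for a given X."""
    return a + b * x


a, b = 1.0, 2.0  # hypothetical intercept and slope, for illustration only

y_at_3 = predict(a, b, 3)
y_at_4 = predict(a, b, 4)

# The difference between the two predictions equals the slope B
print(y_at_4 - y_at_3)
```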
First you need to calculate five key statistics for your dataset:
1. The mean of your x values - X̄
2. The mean of your y values - Ȳ
3. The standard deviation of the x values - SDx
4. The standard deviation of the y values - SDy
5. The correlation between your x values and your y values - R
Once you have this data then getting your line of best fit is very simple.
The slope of your line (B) is R * (SDy/SDx). It is worth remembering that if you get a negative number it is not necessarily wrong. An increase in the number of trains on a particular route is likely to lead to a decrease in the average waiting time for a train, or in the average amount of overcrowding in a carriage.
You now need to figure out where the line crosses the Y axis. You can calculate this using the formula A = Ȳ - B * X̄, where Ȳ and X̄ are the means of your y and x values and B is the slope you have just worked out.
Once you have completed these steps you have your equation in the form Y = A + BX.
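The hand calculation can be sketched in a few lines of Python using only the standard library. The data below are made up for illustration, and the variable names are my own; this is just one way to follow the five steps, not what SPSS does internally.

```python
import statistics

# Made-up example data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Steps 1-4: means and (sample) standard deviations
mean_x = statistics.mean(xs)
mean_y = statistics.mean(ys)
sd_x = statistics.stdev(xs)
sd_y = statistics.stdev(ys)

# Step 5: Pearson correlation, computed from its definition
n = len(xs)
cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
r = cross_products / ((n - 1) * sd_x * sd_y)

# Slope: B = R * (SDy / SDx); intercept: A = mean of y - B * mean of x
b = r * (sd_y / sd_x)
a = mean_y - b * mean_x

print(f"Y = {a:.2f} + {b:.2f} X")  # prints "Y = 2.20 + 0.60 X"
```

The same values of A and B should appear in the SPSS output if you enter these ten numbers as two variables and run a linear regression on them.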
Of course, SPSS will do all of these calculations for you very quickly. Simply choose ANALYZE from the menu bar and then click REGRESSION > LINEAR. This brings up a dialogue box which asks you to define which variable is dependent and which is independent, and SPSS then calculates the values of A and B in your formula. These can be found in the 'Coefficients' table, under 'Unstandardized Coefficients', in the first column, which is labelled 'B'.
The figure in the '(Constant)' row of this column is A, the point where the line crosses the Y axis. Beneath that you will find a row with the name of the independent variable, and the figure in that row is B, the slope of the line.
So there you have it, an introduction to using linear regression and also how you can calculate one in SPSS.