Introduction
Different types of regression suit different scenarios and conditions. The scenarios below, and the regression appropriate to each, are described as follows:
When There Is 1 Dependent Variable and 1 Independent Variable
When there is 1 dependent variable and 1 independent variable, we use simple linear regression. Suppose age is the independent variable and earnings is the dependent variable. The SPSS output of this simple linear regression is as follows:
Model Summary
Model  R  R Square  Adjusted R Square  Std. Error of the Estimate 
1  .374a  .140  .109  1303.23184 
a. Predictors: (Constant), Age
ANOVA^{a}
Model         Sum of Squares  df  Mean Square  F      Sig.
1 Regression  7730417         1   7730416.552  4.552  .042b
  Residual    47555570        28  1698413.218
  Total       55285987        29
a. Dependent Variable: Earnings
b. Predictors: (Constant), Age
Coefficients^{a}
              Unstandardized        Standardized
Model         B         Std. Error  Beta   t      Sig.
1 (Constant)  956.437   1480.319           .646   .523
  Age         57.724    27.057      .374   2.133  .042
a. Dependent Variable: Earnings
The R Square of .140 indicates that 14.0% of the variation in the dependent variable of this study, i.e., ‘earnings’, is explained by the independent variable of this model, ‘age’ (R = .374 is the absolute correlation between age and earnings, not the share of variance explained). The coefficient of age is 57.724. This implies that a 1-year rise in age is associated with a $57.724 rise in earnings, and this effect is significant at the 5% level (p = .042).
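The summary statistics above can be recomputed from the ANOVA table, which makes the difference between R and R Square concrete. The following Python sketch uses only the sums of squares, degrees of freedom and coefficient values copied from the SPSS output; small differences from SPSS come from rounding in the printed inputs.

```python
import math

# Values copied from the SPSS ANOVA and Coefficients tables (earnings on age)
ss_regression = 7730417.0
ss_residual = 47555570.0
df_regression, df_residual = 1, 28
b_age, se_age = 57.724, 27.057

ss_total = ss_regression + ss_residual
n = df_regression + df_residual + 1          # 30 observations
k = df_regression                            # number of predictors

r_squared = ss_regression / ss_total         # share of variance explained
r = math.sqrt(r_squared)                     # multiple correlation R
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
mean_sq_residual = ss_residual / df_residual
f_stat = (ss_regression / df_regression) / mean_sq_residual
std_error_estimate = math.sqrt(mean_sq_residual)
t_age = b_age / se_age                       # t statistic for the age slope

print(f"R = {r:.3f}, R Square = {r_squared:.3f}, Adjusted R Square = {adj_r_squared:.3f}")
print(f"F = {f_stat:.3f}, Std. Error of the Estimate = {std_error_estimate:.2f}, t(age) = {t_age:.3f}")
```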
When There Is 1 Dependent Variable and Multiple Independent Variables
When there is 1 dependent variable and multiple independent variables, we use multiple linear regression. Now suppose that ‘skill level’ and ‘experience’ also affect earnings, i.e., they are independent variables along with ‘age’. The SPSS output of this multiple linear regression is as follows:
Model Summary
Model  R  R Square  Adjusted R Square  Std. Error of the Estimate 
1  .230a  .053  -.105  1597.82217
a. Predictors: (Constant), Skill, Experience, Age
ANOVA^{a}
Model         Sum of Squares  df  Mean Square  F     Sig.
1 Regression  2569739.329     3   856579.776   .336  .800b
  Residual    45954642.489    18  2553035.694
  Total       48524381.818    21
a. Dependent Variable: Earnings
b. Predictors: (Constant), Skill, Experience, Age
Coefficients^{a}
              Unstandardized        Standardized
Model         B         Std. Error  Beta   t      Sig.
1 (Constant)  591.081   8790.423           .067   .947
  Age         48.324    172.219     .187   .281   .782
  Experience  4.792     15.184      .073   .316   .756
  Skill       62.696    1649.962    .025   .038   .970
a. Dependent Variable: Earnings
The R Square of .053 indicates that only 5.3% of the variation in the dependent variable of this study, i.e., ‘earnings’, is explained by the three independent variables of this model (R = .230); the Adjusted R Square is even negative. The coefficient of age is 48.324. This implies that a 1-year rise in age is associated with a $48.324 rise in earnings, all other factors remaining constant.
The coefficient of experience is 4.792. This implies that a 1-year rise in experience is associated with a $4.792 rise in earnings, all other factors remaining constant. The coefficient of skill is 62.696. This implies that a 1-unit rise in the skill score is associated with a $62.696 rise in earnings, all other factors remaining constant. However, none of these coefficients is statistically significant (all p > .75), and the model as a whole is not significant (F = .336, p = .800), so these estimates should be treated with caution.
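Because R Square is only .053 with 22 observations and 3 predictors, the penalty that the adjustment applies for each predictor outweighs the fit, and the Adjusted R Square comes out negative (about -.105). A minimal Python check using the ANOVA figures from the output:

```python
# Sums of squares and degrees of freedom from the SPSS ANOVA table above
ss_regression = 2569739.329
ss_residual = 45954642.489
df_regression, df_residual = 3, 18

ss_total = ss_regression + ss_residual
n = df_regression + df_residual + 1      # 22 observations
k = df_regression                        # 3 predictors: age, experience, skill

r_squared = ss_regression / ss_total
# Adjusted R Square penalizes each extra predictor and can drop below zero
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R Square = {r_squared:.3f}")
print(f"Adjusted R Square = {adj_r_squared:.3f}")
print(f"F = {f_stat:.3f}")
```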
When the Dependent Variable Is a Binary Variable
When the dependent variable is a binary variable, we can use three types of regression:
 OLS,
 Logistic regression and
 Probit regression.
Suppose the binary dependent variable is ‘default’, which takes a value of 1 if the person defaults on a loan and 0 if the person does not. Many factors affect whether a person defaults, e.g., income, family size and integrity level. The three SPSS regressions for this example are shown below; each produces different output and has a different interpretation.
OLS
Ordinary Least Squares (OLS) is the best known of all regression techniques. It is a linear modelling technique for a single response variable recorded on at least an interval scale, and it is a common starting point for regression analysis.
It can be applied with one or several explanatory variables, including appropriately coded categorical explanatory variables. OLS provides a global model of the variable or process you are trying to understand or predict: it creates a single regression equation to represent that process. When the response is a 0/1 variable such as ‘default’, OLS is known as the linear probability model; each coefficient is read as a change in the probability of the outcome, although the fitted values are not guaranteed to lie between 0 and 1.
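To make "a single regression equation" concrete, here is a minimal pure-Python sketch of how OLS estimates that equation: it solves the normal equations (X'X)b = X'y by Gaussian elimination. The data are hypothetical values made up for illustration, not the data behind the SPSS output, and a real analysis would use a statistics package.

```python
def ols(rows, y):
    """Return OLS coefficients [intercept, b1, ..., bk] for predictor rows and response y."""
    # Design matrix with a leading column of ones for the intercept
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Build the normal equations: (X'X) b = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    a = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b

# Hypothetical data with two explanatory variables, built so that y = 2 + 3*x1 - x2 exactly
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [2 + 3 * x1 - x2 for x1, x2 in rows]
print(ols(rows, y))
```

Because the hypothetical y is an exact linear function of x1 and x2, the recovered coefficients are (up to floating-point error) the intercept 2 and slopes 3 and -1.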
Model Summary
Model  R  R Square  Adjusted R Square  Std. Error of the Estimate 
1  .592a  .351  .276  .43279 
a. Predictors: (Constant), Integrity level, Family size, Income
ANOVA^{a}
Model         Sum of Squares  df  Mean Square  F      Sig.
1 Regression  2.630           3   .877         4.680  .010b
  Residual    4.870           26  .187
  Total       7.500           29
a. Dependent Variable: Default
b. Predictors: (Constant), Integrity level, Family size, Income
Coefficients^{a}
                   Unstandardized        Standardized
Model              B         Std. Error  Beta   t       Sig.
1 (Constant)       1.205     .218               5.520   .000
  Income           -.002     .002        -.388  -1.333  .194
  Family size      -.083     .080        -.187  -1.033  .311
  Integrity level  -.059     .156        -.118  -.381   .706
a. Dependent Variable: Default
A 1-unit rise in ‘income’ is associated with a 0.002 fall in the probability that the person defaults, all other factors remaining constant.
A 1-unit rise in ‘family size’ is associated with a 0.083 fall in the probability that the person defaults, all other factors remaining constant. A 1-unit rise in ‘integrity level’ is associated with a 0.059 fall in the probability that the person defaults, all other factors remaining constant. Although the model as a whole is significant (F = 4.680, p = .010), none of the individual coefficients is significant at the 5% level (p = .194, .311 and .706).
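In the linear probability model the fitted equation itself gives the predicted probability of default. A minimal sketch, assuming the negative slopes implied by the interpretation above ("a fall in the probability") and using hypothetical applicant values; it also shows the known drawback that a linear equation can produce "probabilities" outside 0 to 1 when it is extrapolated.

```python
# Coefficients from the SPSS OLS output; the negative slopes follow the
# interpretation in the text ("a fall in the probability")
INTERCEPT = 1.205
B_INCOME, B_FAMILY, B_INTEGRITY = -0.002, -0.083, -0.059

def lpm_probability(income, family_size, integrity):
    """Predicted probability of default from the linear probability model."""
    return INTERCEPT + B_INCOME * income + B_FAMILY * family_size + B_INTEGRITY * integrity

# Hypothetical applicants: (income, family size, integrity level)
typical = lpm_probability(200, 2, 2)
high_income = lpm_probability(700, 2, 4)   # income far above the sample range

print(f"typical applicant: {typical:.3f}")
print(f"high-income applicant: {high_income:.3f}")  # negative, i.e. not a valid probability
# One crude workaround is to clip predictions into [0, 1]
print(f"clipped: {min(max(high_income, 0.0), 1.0):.3f}")
```

Logistic and probit regression, described next, avoid this problem by passing the linear predictor through a function that always returns a value between 0 and 1.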
Logistic regression
Logistic regression is a statistical technique for analysing data in which one or more independent variables determine an outcome. The outcome is measured with a dichotomous variable, i.e., one with only two possible outcomes, and the model expresses the log of the odds of the outcome, log(p / (1 − p)), as a linear function of the independent variables.
The main aim of logistic regression is to find the best-fitting and most reasonable model describing the relationship between the dichotomous characteristic of interest and a set of independent variables.
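The link between the linear predictor and the probability can be illustrated with a short Python sketch. It uses a hypothetical intercept and slope (not taken from the SPSS output) and shows how a log-odds value is turned into a probability, and how exp(B) gives the odds ratio for a one-unit change in the predictor.

```python
import math

def logistic(log_odds):
    """Convert log odds, log(p / (1 - p)), into a probability p."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients, chosen for illustration only
B0, B1 = -1.0, 0.5

def odds(x):
    """Odds p / (1 - p) implied by the hypothetical model at predictor value x."""
    return math.exp(B0 + B1 * x)

for x in (0, 2, 4):
    log_odds = B0 + B1 * x
    print(f"x = {x}: log odds = {log_odds:+.2f}, probability = {logistic(log_odds):.3f}")

# A one-unit rise in x multiplies the odds by exp(B1), whatever the starting value of x
print(f"odds ratio exp(B1) = {math.exp(B1):.3f}")
print(f"odds(3) / odds(2) = {odds(3) / odds(2):.3f}")
```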
Model Fitting Information
                Model Fitting Criteria  Likelihood Ratio Tests
Model           -2 Log Likelihood       Chi-Square  df  Sig.
Intercept Only  35.233
Final           3.688                   31.545      7   .000
Pseudo R-Square
Cox and Snell  .651
Nagelkerke     .867
McFadden       .758
Likelihood Ratio Tests
The chi-square statistic is the difference in -2 log-likelihoods between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that effect are 0.
             Model Fitting Criteria              Likelihood Ratio Tests
Effect       -2 Log Likelihood of Reduced Model  Chi-Square  df  Sig.
Intercept    3.688a                              0.000       0
Income       3.688                               .000        2   1.000
Family size  9.923                               6.235       3   .101
Integrity    4.193                               .505        2   .777
a. This reduced model is equivalent to the final model because omitting the effect does not increase the degrees of freedom.
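The chi-square values in the two tables above are simple differences of -2 log-likelihoods, and the Cox and Snell pseudo R-square follows from the model chi-square and the sample size, as 1 - exp(-chi-square / n). A minimal check in Python; n = 30 is taken from the OLS ANOVA table for the same example (Total df = 29).

```python
import math

# -2 log-likelihood values from the SPSS Model Fitting Information and
# Likelihood Ratio Tests tables
minus2ll_intercept_only = 35.233
minus2ll_final = 3.688
minus2ll_reduced = {"Income": 3.688, "Family size": 9.923, "Integrity": 4.193}
n = 30  # sample size (OLS ANOVA table: Total df = 29)

# Model chi-square: improvement of the final model over the intercept-only model
model_chi2 = minus2ll_intercept_only - minus2ll_final
# Cox and Snell pseudo R-square: 1 - exp(-chi2 / n)
cox_snell = 1 - math.exp(-model_chi2 / n)
print(f"model chi-square = {model_chi2:.3f}, Cox and Snell R2 = {cox_snell:.3f}")

# Likelihood ratio test for each effect: drop it and compare -2LL with the final model
for effect, m2ll in minus2ll_reduced.items():
    print(f"{effect}: chi-square = {m2ll - minus2ll_final:.3f}")
```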
Parameter Estimates
                                                                        95% Confidence Interval for Exp(B)
Default a           B        Std. Error  Wald  df  Sig.   Exp(B)        Lower Bound  Upper Bound
Intercept           1.099    1.155       .905  1   .341
[Income=100.00]     .000     3287.265    .000  1   1.000  1.000         0.000        .b
[Income=200.00]     -.189    9421.005    .000  1   1.000  .828          0.000        .b
[Income=300.00]     0c                         0
[Family size=.00]   -.292    5117.214    .000  1   1.000  .747          0.000        .b
[Family size=1.00]  17.911   3287.265    .000  1   .996   60085080.228  0.000        .b
[Family size=2.00]  -.292    6133.191    .000  1   1.000  .747          0.000        .b
[Family size=3.00]  0c                         0
[Integrity=1.00]    -19.010  0.000             1          5.548E-09     5.548E-09    5.548E-09
[Integrity=2.00]    17.104   8968.264    .000  1   .998   26809918.315  0.000        .b
[Integrity=4.00]    0c                         0
a. The reference category is: 1.00.
b. Floating point overflow occurred while computing this statistic. Its value is therefore set to system missing.
c. This parameter is set to zero because it is redundant.
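Note that 3.688, 9.923 and 4.193 are the -2 log-likelihoods of the reduced models in the Likelihood Ratio Tests table, not regression coefficients, so they do not measure how a 1-unit rise in income, family size or integrity level changes the log odds. Because these three variables were entered as categorical factors, each B in the Parameter Estimates table compares one level of a factor with its last level (Income = 300, Family size = 3, Integrity = 4), whose parameter is set to zero, and gives the change in the log odds of the non-reference outcome (Default = 0) relative to the reference category (Default = 1.00), all other factors remaining constant. For example, B = 17.911 for [Family size=1.00] means that the log odds of not defaulting are 17.911 higher for a family size of 1 than for a family size of 3, which corresponds to an odds ratio of Exp(B) = 60,085,080. However, the likelihood ratio tests show that none of the three effects is significant at the 5% level (p = 1.000, .101 and .777), and the very large standard errors and the floating point overflow warnings suggest that the outcome is perfectly or almost perfectly predicted in some groups, so these estimates should not be relied on.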
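Exp(B) in the Parameter Estimates table is simply e raised to the power B, i.e., the odds ratio for that category relative to the factor's redundant (zero) level. The sketch below recomputes a few of the printed values; the negative B values for [Income=200.00] and [Family size=.00] follow from their Exp(B) being below 1.

```python
import math

# B coefficients from the multinomial logistic Parameter Estimates table
estimates = {
    "[Income=200.00]": -0.189,
    "[Family size=.00]": -0.292,
    "[Family size=1.00]": 17.911,
    "[Integrity=2.00]": 17.104,
}

for name, b in estimates.items():
    # Odds ratio relative to the redundant (zero) level of the same factor
    print(f"{name}: B = {b:+.3f}, Exp(B) = {math.exp(b):,.3f}")
```

Because B is printed to only three decimals, exp(B) for the very large coefficients agrees with the printed Exp(B) only to about three or four significant figures.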
The assumptions of the logistic model are as follows. The logistic model assumes that the data are case-specific; that is, each independent variable has a single value for each case. It also assumes that the response variable cannot be perfectly predicted from the independent variables for any case. As with other types of regression, the independent variables do not need to be statistically independent of each other (unlike, for example, in a naive Bayes classifier).
However, collinearity is assumed to be relatively low, as it becomes difficult to distinguish between the effects of several variables if this is not the case. If the logistic model is used to model choices, it relies on the assumption of independence of irrelevant alternatives, which is not always desirable. This assumption states that the odds of preferring one class over another do not depend on the presence or absence of other "irrelevant" alternatives.
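The requirement that the response cannot be perfectly predicted is worth checking before fitting. A simple screening step for a categorical predictor is to look for levels in which every case has the same outcome; such levels push the estimated coefficients towards infinity, which is one likely reason for huge standard errors and overflow warnings like those in the output above. The function name and the data below are hypothetical.

```python
from collections import defaultdict

def separated_levels(levels, outcomes):
    """Return levels of a categorical predictor whose cases all share one outcome."""
    seen = defaultdict(set)
    for level, y in zip(levels, outcomes):
        seen[level].add(y)
    return sorted(level for level, ys in seen.items() if len(ys) == 1)

# Hypothetical data: integrity level and default (1 = default) for ten borrowers
integrity = [1, 1, 1, 2, 2, 2, 2, 4, 4, 4]
default = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

print(separated_levels(integrity, default))
```

Here levels 1 and 4 each contain only one outcome, so a logistic model with integrity as a factor could not estimate finite coefficients for them.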
Probit regression
Probit Analysis
Parameter Estimates
                                                               95% Confidence Interval
Parameter                  Estimate  Std. Error  Z       Sig.  Lower Bound  Upper Bound
PROBIT a  Family size      -.051     .077        -.656   .512  -.202        .101
          Integrity level  -.347     .157        -2.207  .027  -.655        -.039
          Intercept        -2.059    .207        -9.961  .000  -2.266       -1.852
a. PROBIT model: PROBIT(p) = Intercept + BX
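In the probit model the linear predictor Intercept + BX is a z-score, and the fitted probability is the standard normal cumulative probability of that score. Read this way, a 1-unit rise in integrity level lowers the z-score by 0.347 (p = .027), while the family size effect is not significant (p = .512). A minimal Python sketch, using the estimates above, reproduces case 1 of the Cell Counts and Residuals table below (family size 1, integrity level 1, 100 subjects). SPSS's PROBIT procedure models the observed responses as a share of the Number of Subjects column, which is why the fitted probabilities are so small.

```python
import math

# Estimates from the SPSS PROBIT Parameter Estimates table
INTERCEPT = -2.059
B_FAMILY = -0.051
B_INTEGRITY = -0.347

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def probit_probability(family_size, integrity):
    """Fitted probability: Phi(Intercept + B_family * family + B_integrity * integrity)."""
    z = INTERCEPT + B_FAMILY * family_size + B_INTEGRITY * integrity
    return normal_cdf(z)

p = probit_probability(1, 1)
expected = 100 * p  # Number of Subjects x probability
print(f"probability = {p:.3f}, expected responses = {expected:.3f}")
```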
Covariances and Correlations of Parameter Estimates
                         Family size  Integrity level
PROBIT  Family size      .006         .482
        Integrity level  .006         .025
Covariances (below) and Correlations (above).
Chi-Square Tests
                                      Chi-Square  df a  Sig.
PROBIT  Pearson Goodness-of-Fit Test  28.625      27    .379
a. Statistics based on individual cases differ from statistics based on aggregated cases.
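The Pearson goodness-of-fit statistic sums, over the 30 cases, the squared difference between observed and expected responses divided by the binomial variance n × p × (1 − p), with 30 − 3 = 27 degrees of freedom for the 3 estimated parameters. The Python sketch below recomputes it from the model estimates and from the family size, integrity level, subject counts and observed responses listed in the Cell Counts and Residuals table that follows; because the printed estimates are rounded, the result is close to, but not exactly, 28.625.

```python
import math

INTERCEPT, B_FAMILY, B_INTEGRITY = -2.059, -0.051, -0.347

# (family size, integrity level, number of subjects, observed responses)
# for cases 1-30 of the Cell Counts and Residuals table
cases = (
    [(1, 1, 100, 1)] * 2 + [(0, 1, 300, 1)] + [(0, 1, 100, 1)] * 2
    + [(1, 1, 100, 0)] * 2 + [(0, 1, 100, 1)] + [(2, 1, 100, 1)] * 3
    + [(3, 1, 100, 1)] * 5 + [(3, 2, 200, 0)] * 5 + [(1, 2, 200, 0)]
    + [(3, 2, 200, 0)] * 3 + [(3, 2, 300, 0)] + [(3, 4, 300, 0)] * 3
    + [(3, 4, 300, 1)]
)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

pearson = 0.0
for family, integrity, n, observed in cases:
    p = normal_cdf(INTERCEPT + B_FAMILY * family + B_INTEGRITY * integrity)
    expected = n * p
    # Squared residual scaled by the binomial variance of the case
    pearson += (observed - expected) ** 2 / (n * p * (1 - p))

df = len(cases) - 3  # 30 cases minus 3 estimated parameters
print(f"Pearson chi-square = {pearson:.3f} on {df} df")
```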
Cell Counts and Residuals
        Number  Family size  Integrity level  Number of Subjects  Observed Responses  Expected Responses  Residual  Probability
PROBIT  1       1.000        1.000            100                 1                   .701                .299      .007
        2       1.000        1.000            100                 1                   .701                .299      .007
        3       0.000        1.000            300                 1                   2.420               -1.420    .008
        4       0.000        1.000            100                 1                   .807                .193      .008
        5       0.000        1.000            100                 1                   .807                .193      .008
        6       1.000        1.000            100                 0                   .701                -.701     .007
        7       1.000        1.000            100                 0                   .701                -.701     .007
        8       0.000        1.000            100                 1                   .807                .193      .008
        9       2.000        1.000            100                 1                   .608                .392      .006
        10      2.000        1.000            100                 1                   .608                .392      .006
        11      2.000        1.000            100                 1                   .608                .392      .006
        12      3.000        1.000            100                 1                   .527                .473      .005
        13      3.000        1.000            100                 1                   .527                .473      .005
        14      3.000        1.000            100                 1                   .527                .473      .005
        15      3.000        1.000            100                 1                   .527                .473      .005
        16      3.000        1.000            100                 1                   .527                .473      .005
        17      3.000        2.000            200                 0                   .367                -.367     .002
        18      3.000        2.000            200                 0                   .367                -.367     .002
        19      3.000        2.000            200                 0                   .367                -.367     .002
        20      3.000        2.000            200                 0                   .367                -.367     .002
        21      3.000        2.000            200                 0                   .367                -.367     .002
        22      1.000        2.000            200                 0                   .505                -.505     .003
        23      3.000        2.000            200                 0                   .367                -.367     .002
        24      3.000        2.000            200                 0                   .367                -.367     .002
        25      3.000        2.000            200                 0                   .367                -.367     .002
        26      3.000        2.000            300                 0                   .551                -.551     .002
        27      3.000        4.000            300                 0                   .048                -.048     .000
        28      3.000        4.000            300                 0                   .048                -.048     .000
        29      3.000        4.000            300                 0                   .048                -.048     .000
        30      3.000        4.000            300                 1                   .048                .952      .000
When the Data Is Panel Data
When the data on which the regression is to be applied are panel data (i.e., data on multiple cross-sectional units over a period of time), we can use two types of regression:
 Random effect model and
 Fixed effect model.
Suppose we have data on the education level and GDP of 5 countries over a 5-year period. This is panel data. To analyze the impact of the education level on GDP, we can run both a random-effects and a fixed-effects regression in Stata.
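The fixed-effects (within) estimator can be sketched in a few lines: subtract each country's own mean from its GDP and education values, then run OLS through the origin on the demeaned data, so that any time-invariant country effect drops out. The panel below is hypothetical (two countries, three years each) and is not the data behind the Stata output; in practice Stata's xtreg with the fe option does this for you. For contrast, the sketch also computes the pooled OLS slope, which ignores the country effects.

```python
from collections import defaultdict

def within_slope(country, x, y):
    """Fixed-effects (within) slope for a single regressor."""
    # Accumulate per-country sums of x and y and the number of periods
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for c, xi, yi in zip(country, x, y):
        s = sums[c]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    means = {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}
    # Demean within each country, then OLS through the origin
    num = den = 0.0
    for c, xi, yi in zip(country, x, y):
        mx, my = means[c]
        num += (xi - mx) * (yi - my)
        den += (xi - mx) ** 2
    return num / den

def pooled_slope(x, y):
    """Ordinary OLS slope that ignores which country each observation belongs to."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Hypothetical panel: GDP = country-specific level + 2 * education
country = ["A", "A", "A", "B", "B", "B"]
education = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
gdp = [10 + 2 * e for e in education[:3]] + [50 + 2 * e for e in education[3:]]

print(f"within (fixed-effects) slope: {within_slope(country, education, gdp):.3f}")
print(f"pooled OLS slope: {pooled_slope(education, gdp):.3f}")
```

The within slope recovers the true effect of 2, while the pooled slope is inflated because the country with more education also has a higher GDP level.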
Random effect model
Syntax: xtreg gdp education_level, re
(education_level is an assumed name for the education variable, since a Stata variable name cannot contain a space; the data must also have been declared as panel data with xtset, using country as the panel variable.)
Random-effects GLS regression    Number of obs      =  25
Group variable: country          Number of groups   =  5
R-sq:  within  = 0.8848          Obs per group: min =  5
       between = 1.0000                         avg =  5
       overall = 0.9948                         max =  5
                                 Wald chi2(1)       =  4438.08
corr(u_i, X) = 0 (assumed)       Prob > chi2        =  0
gdp              Coef.      Std. Error  z       P>|z|  [95% Conf.  Interval]
Education level  127.8054   1.918454    66.62   0.000  124.0453    131.5655
_cons            -14111.01  1208.535    -11.68  0.000  -16479.7    -11742.33
sigma_u          0
sigma_e          3043.5487
rho              0          (fraction of variance due to u_i)
A 1-unit rise in education level is associated with a 127.8054-unit rise in GDP, all other factors remaining constant.
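With a single regressor, the Wald chi2(1) statistic in the random-effects output is just the square of the z statistic, and the 95% confidence interval is the coefficient plus or minus about 1.96 standard errors. A quick check in Python using the printed coefficient and standard error (the critical value comes from the standard library's statistics.NormalDist):

```python
from statistics import NormalDist

coef, se = 127.8054, 1.918454  # education level, random-effects output

z = coef / se
wald_chi2 = z ** 2                       # Wald chi2(1) for a single coefficient
z_crit = NormalDist().inv_cdf(0.975)     # about 1.959964
lower, upper = coef - z_crit * se, coef + z_crit * se

print(f"z = {z:.2f}, Wald chi2(1) = {wald_chi2:.2f}")
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
```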
Fixed effect model
Syntax: xtreg gdp education_level, fe
(education_level is an assumed name for the education variable, since a Stata variable name cannot contain a space; the data must also have been declared as panel data with xtset, using country as the panel variable.)
Fixed-effects (within) regression  Number of obs      =  25
Group variable: country            Number of groups   =  5
R-sq:  within  = 0.8848            Obs per group: min =  5
       between = 1.0000                           avg =  5
       overall = 0.9948                           max =  5
                                   F(1,19)            =  145.9
corr(u_i, Xb) = 0.9803             Prob > F           =  0
gdp              Coef.      Std. Error  t       P>|t|  [95% Conf.  Interval]
Education level  129.1728   10.69394    12.08   0.000  106.7902    151.5555
_cons            -14876.79  6019.463    -2.47   0.023  -27475.67   -2277.909
sigma_u          449.94136
sigma_e          3043.5487
rho              .0213876   (fraction of variance due to u_i)
F test that all u_i = 0: F(4, 19) = 0.0; Prob > F = 1.0000
A 1-unit rise in education level is associated with a 129.1728-unit rise in GDP, all other factors remaining constant.
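Two numbers in the fixed-effects output can be checked by hand: rho is the share of the error variance due to the country effects, sigma_u² / (sigma_u² + sigma_e²), and the confidence interval uses the t distribution with 19 residual degrees of freedom (critical value about 2.093) rather than 1.96. A short Python check; the t critical value is hard-coded because the Python standard library has no t distribution.

```python
sigma_u, sigma_e = 449.94136, 3043.5487
coef, se = 129.1728, 10.69394  # education level, fixed-effects output
T_CRIT_19 = 2.093              # 97.5th percentile of t with 19 df (hard-coded)

rho = sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)   # fraction of variance due to u_i
t_stat = coef / se
lower, upper = coef - T_CRIT_19 * se, coef + T_CRIT_19 * se

print(f"rho = {rho:.7f}")
print(f"t = {t_stat:.2f}, F(1,19) = t^2 = {t_stat ** 2:.1f}")
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```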