
## Introduction

There are various types of regression used in different scenarios and conditions. The scenarios, and the appropriate regression for each of them, are described as follows:

## When There Is 1 Dependent Variable and 1 Independent Variable

When there is 1 dependent variable and 1 independent variable, we use a simple linear regression. The SPSS output of simple linear regression is shown as follows:

Suppose age is the independent variable and earnings are the dependent variable.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| --- | --- | --- | --- | --- |
| 1 | .374ᵃ | .140 | .109 | 1303.23184 |

a. Predictors: (Constant), Age

ANOVAᵃ

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Regression | 7730417 | 1 | 7730416.552 | 4.552 | .042ᵇ |
| | Residual | 47555570 | 28 | 1698413.218 | | |
| | Total | 55285987 | 29 | | | |

a. Dependent Variable: Earnings
b. Predictors: (Constant), Age

Coefficientsᵃ

| Model | | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (Constant) | -956.437 | 1480.319 | | -.646 | .523 |
| | Age | 57.724 | 27.057 | .374 | 2.133 | .042 |

a. Dependent Variable: Earnings

The R of .374 corresponds to an R-squared of .140, which indicates that 14.0% of the variation in the dependent variable of this study, i.e., ‘earnings’, is explained by the independent variable of this model. The coefficient of age is 57.724. This implies that a 1-year rise in age leads to a $57.724 rise in earnings, all other factors remaining constant.
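SPSS computes these estimates internally, but what simple linear regression actually does can be sketched in a few lines of stdlib-only Python using the closed-form least-squares formulas. The age/earnings pairs below are made-up toy data, not the article's sample:

```python
# Simple linear regression with one predictor:
# slope = covariance(x, y) / variance(x), intercept = mean(y) - slope * mean(x)

def simple_ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
            / sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    return intercept, slope

# Toy data constructed to lie exactly on the line earnings = -1000 + 60 * age
ages = [25, 30, 35, 40, 45]
earnings = [500, 800, 1100, 1400, 1700]

a, b = simple_ols(ages, earnings)
print(a, b)  # -1000.0 60.0: each extra year of age adds $60
```

The slope plays the same role as the "B" for Age in the coefficients table: the change in the dependent variable per 1-unit rise in the predictor.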

## When There Is 1 Dependent Variable and Multiple Independent Variables

When there is 1 dependent variable and multiple independent variables, we use multiple linear regression. The SPSS output of multiple linear regression is shown as follows. Now, suppose that ‘skill level’ and ‘experience’ also affect earnings, i.e., they are independent variables along with ‘age’.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| --- | --- | --- | --- | --- |
| 1 | .230ᵃ | .053 | -.105 | 1597.82217 |

a. Predictors: (Constant), Skill, Experience, Age

ANOVAᵃ

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Regression | 2569739.329 | 3 | 856579.776 | .336 | .800ᵇ |
| | Residual | 45954642.489 | 18 | 2553035.694 | | |
| | Total | 48524381.818 | 21 | | | |

a. Dependent Variable: Earnings
b. Predictors: (Constant), Skill, Experience, Age

Coefficientsᵃ

| Model | | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (Constant) | -591.081 | 8790.423 | | -.067 | .947 |
| | Age | 48.324 | 172.219 | .187 | .281 | .782 |
| | Experience | 4.792 | 15.184 | .073 | .316 | .756 |
| | Skill | 62.696 | 1649.962 | .025 | .038 | .970 |

a. Dependent Variable: Earnings

The R of .230 corresponds to an R-squared of .053, which indicates that only 5.3% of the variation in the dependent variable of this study, i.e., ‘earnings’, is explained by the three independent variables of this model. The coefficient of age is 48.324. This implies that a 1-year rise in age leads to a $48.324 rise in earnings, all other factors remaining constant.

The coefficient of experience is 4.792. This implies that a 1-year rise in experience leads to a $4.792 rise in earnings, all other factors remaining constant. The coefficient of skill is 62.696. This implies that a 1 unit rise in the skill score leads to a $62.696 rise in earnings, all other factors remaining constant. Note, however, that none of these coefficients is statistically significant at the 5% level, as all the Sig. values exceed .05.
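With several predictors, the least-squares coefficients solve the normal equations (XᵀX)b = Xᵀy. As a stdlib-only sketch of what SPSS computes (the design matrix and responses below are toy data, not the article's sample):

```python
# Multiple linear regression via the normal equations (X'X)b = X'y,
# solved with Gaussian elimination and partial pivoting.

def fit_multiple_ols(X, y):
    k = len(X[0])
    # Build the k x k matrix X'X and the vector X'y.
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Augmented matrix [X'X | X'y], then forward elimination.
    A = [XtX[i] + [Xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j]
                              for j in range(i + 1, k))) / A[i][i]
    return b

# Columns: constant, age, experience. Responses follow y = 10 + 2*age + 3*exp,
# so the fit should recover those coefficients exactly.
X = [[1, 30, 5], [1, 40, 10], [1, 25, 2], [1, 50, 20], [1, 35, 8]]
y = [10 + 2 * a + 3 * e for _, a, e in X]
print(fit_multiple_ols(X, y))  # ≈ [10.0, 2.0, 3.0]
```

Each returned coefficient is the partial effect of its column holding the other columns constant, which is exactly how the B values in the coefficients table are read.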

## When the Dependent Variable Is a Binary Variable

When the dependent variable is a binary variable, we can use three types of regression:

1. OLS,
2. Logistic regression and
3. Probit regression

Suppose the binary dependent variable is ‘default’ which takes a value of 1 if the person defaults and takes a value of 0 if the person does not default. There are many factors that affect whether a person would default on a loan or not, e.g., income, family size, integrity level, etc. The three SPSS regressions for this example are as follows. They all have different outputs and interpretations.

### OLS

Ordinary least squares (OLS) is the best known of all regression techniques. It is a linear modelling technique that can be used to model a single response variable recorded on at least an interval scale, and it is the appropriate starting point for most regression analyses.

This technique may be applied to single or multiple explanatory variables, including appropriately coded categorical explanatory variables. It provides a global model of the variable or process you are trying to understand or predict; it creates a single regression equation to represent that process.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| --- | --- | --- | --- | --- |
| 1 | .592ᵃ | .351 | .276 | .43279 |

a. Predictors: (Constant), Integrity level, Family size, Income

ANOVAᵃ

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Regression | 2.630 | 3 | .877 | 4.680 | .010ᵇ |
| | Residual | 4.870 | 26 | .187 | | |
| | Total | 7.500 | 29 | | | |

a. Dependent Variable: Default
b. Predictors: (Constant), Integrity level, Family size, Income

Coefficientsᵃ

| Model | | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (Constant) | 1.205 | .218 | | 5.520 | .000 |
| | Income | -.002 | .002 | -.388 | -1.333 | .194 |
| | Family size | -.083 | .080 | -.187 | -1.033 | .311 |
| | Integrity level | -.059 | .156 | -.118 | -.381 | .706 |

a. Dependent Variable: Default

A 1 unit rise in ‘income’ leads to a 0.002 unit fall in the probability that the person would default, all other factors remaining constant. Likewise, a 1 unit rise in ‘family size’ leads to a 0.083 unit fall in that probability, and a 1 unit rise in ‘integrity level’ leads to a 0.059 unit fall, all other factors remaining constant.
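Used this way, OLS on a binary outcome is a linear probability model: the fitted value is read as the probability of default. A sketch using the coefficients from the table above (the input values are illustrative, not observations from the sample):

```python
# Linear probability model: fitted value = predicted probability of default.
# Coefficients are taken from the OLS coefficients table above.

def predicted_default(income, family_size, integrity):
    return 1.205 - 0.002 * income - 0.083 * family_size - 0.059 * integrity

p = predicted_default(income=100, family_size=3, integrity=2)
print(round(p, 3))  # 0.638

# A known caveat of this approach: the linear fit is not bounded,
# so fitted "probabilities" can fall outside [0, 1].
print(predicted_default(income=0, family_size=0, integrity=0))  # 1.205
```

This unboundedness is one reason logistic and probit regression, discussed next, are usually preferred for binary outcomes.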

### Logistic regression

This is a statistical technique for data analysis in which there are one or several independent variables that determine an outcome. The outcome is quantified with a dichotomous variable where there are only two possible outcomes.

The main aim of logistic regression is to find the best fitting and biologically reasonable model to describe the relationship between the dichotomous characteristic of interest and a set of independent variables.

Model Fitting Information

| Model | -2 Log Likelihood | Chi-Square | df | Sig. |
| --- | --- | --- | --- | --- |
| Intercept Only | 35.233 | | | |
| Final | 3.688 | 31.545 | 7 | .000 |

Pseudo R-Square

| Measure | Value |
| --- | --- |
| Cox and Snell | .651 |
| Nagelkerke | .867 |
| McFadden | .758 |

Likelihood Ratio Tests

The chi-square statistic is the difference in -2 log-likelihoods between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that effect are 0.

| Effect | -2 Log Likelihood of Reduced Model | Chi-Square | df | Sig. |
| --- | --- | --- | --- | --- |
| Intercept | 3.688ᵃ | 0.000 | 0 | |
| Income | 3.688 | .000 | 2 | 1.000 |
| Family size | 9.923 | 6.235 | 3 | .101 |
| Integrity | 4.193 | .505 | 2 | .777 |

a. This reduced model is equivalent to the final model because omitting the effect does not increase the degrees of freedom.

Parameter Estimates

| Defaultᵃ | B | Std. Error | Wald | df | Sig. | Exp(B) | 95% CI for Exp(B), Lower | 95% CI for Exp(B), Upper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Intercept | 1.099 | 1.155 | .905 | 1 | .341 | | | |
| [Income=100.00] | .000 | 3287.265 | .000 | 1 | 1.000 | 1.000 | 0.000 | .ᵇ |
| [Income=200.00] | -.189 | 9421.005 | .000 | 1 | 1.000 | .828 | 0.000 | .ᵇ |
| [Income=300.00] | 0ᶜ | | | 0 | | | | |
| [Family size=.00] | -.292 | 5117.214 | .000 | 1 | 1.000 | .747 | 0.000 | .ᵇ |
| [Family size=1.00] | 17.911 | 3287.265 | .000 | 1 | .996 | 60085080.228 | 0.000 | .ᵇ |
| [Family size=2.00] | -.292 | 6133.191 | .000 | 1 | 1.000 | .747 | 0.000 | .ᵇ |
| [Family size=3.00] | 0ᶜ | | | 0 | | | | |
| [Integrity=1.00] | -19.010 | 0.000 | | 1 | | 5.548E-09 | 5.548E-09 | 5.548E-09 |
| [Integrity=2.00] | 17.104 | 8968.264 | .000 | 1 | .998 | 26809918.315 | 0.000 | .ᵇ |
| [Integrity=4.00] | 0ᶜ | | | 0 | | | | |

a. The reference category is: 1.00.
b. Floating-point overflow occurred while computing this statistic. Its value is therefore set to system missing.
c. This parameter is set to zero because it is redundant.

Each coefficient B in the Parameter Estimates table gives the change in the log of the odds that the person would default associated with that predictor, all other factors remaining constant, and Exp(B) gives the corresponding multiplicative change in the odds. In this example, none of the effects is statistically significant: the Sig. values of the likelihood ratio tests for ‘income’, ‘family size’ and ‘integrity’ are all above .05.

The assumptions of the logistic model are as follows. The logistic model assumes that the data are case-specific; that is, each independent variable has a single value for each case. It also assumes that the response variable cannot be perfectly predicted from the independent variables for any case. As with other types of regression, the independent variables need not be statistically independent of each other (unlike, for instance, in a naive Bayes classifier).

However, collinearity is assumed to be relatively low, as it becomes difficult to distinguish between the effects of several variables when this is not the case. If logistic regression is used to model choices, it relies on the assumption of independence of irrelevant alternatives, which is not always desirable. This assumption states that the odds of preferring one class over another do not depend on the presence or absence of other ‘irrelevant’ alternatives.
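The link between a logistic coefficient B, the log odds, and the odds ratio Exp(B) can be sketched directly. The coefficient values below are made up for illustration, not taken from the output above:

```python
import math

# In logistic regression, the linear predictor b0 + b1*x is the log of the
# odds of the outcome, so a 1-unit rise in x multiplies the odds by exp(b1).

def default_probability(b0, b1, x):
    log_odds = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-log_odds))  # inverse logit

b0, b1 = -2.0, 0.5  # hypothetical intercept and coefficient
p1 = default_probability(b0, b1, x=4)
p2 = default_probability(b0, b1, x=5)

# Ratio of the odds at x=5 to the odds at x=4 equals exp(b1) = Exp(B).
odds_ratio = (p2 / (1 - p2)) / (p1 / (1 - p1))
print(odds_ratio, math.exp(b1))  # both ≈ 1.6487
```

Note that the change in *probability* per unit of x is not constant (it depends on where on the curve you are); only the change in log odds is.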

### Probit regression

Probit Analysis

Parameter Estimates

| Parameter | Estimate | Std. Error | Z | Sig. | 95% CI Lower | 95% CI Upper |
| --- | --- | --- | --- | --- | --- | --- |
| Family size | -.051 | .077 | -.656 | .512 | -.202 | .101 |
| Integrity level | -.347 | .157 | -2.207 | .027 | -.655 | -.039 |
| Intercept | -2.059 | .207 | -9.961 | .000 | -2.266 | -1.852 |

a. PROBIT model: PROBIT(p) = Intercept + BX
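The model PROBIT(p) = Intercept + BX is turned into a predicted probability by pushing the linear predictor through the standard normal CDF, which can be written with `math.erf`. A stdlib-only sketch using the estimates above, evaluated at family size 1 and integrity level 1:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def default_probability(family_size, integrity):
    # Linear predictor with the coefficients from the parameter table.
    z = -2.059 - 0.051 * family_size - 0.347 * integrity
    return phi(z)

p = default_probability(family_size=1, integrity=1)
print(round(p, 3))  # 0.007
```

The result matches the Probability column of the first rows of the cell counts table below, where family size and integrity level are both 1.00.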

Covariances and Correlations of Parameter Estimates

| | Family size | Integrity level |
| --- | --- | --- |
| Family size | .006 | -.482 |
| Integrity level | -.006 | .025 |

a. Covariances (below the diagonal) and correlations (above the diagonal)

Chi-Square Tests

| | Chi-Square | dfᵃ | Sig. |
| --- | --- | --- | --- |
| Pearson Goodness-of-Fit Test | 28.625 | 27 | .379 |

a. Statistics based on individual cases differ from statistics based on aggregated cases.

Cell Counts and Residuals

| Number | Family size | Integrity level | Number of Subjects | Observed Responses | Expected Responses | Residual | Probability |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1.000 | 1.000 | 100 | 1 | .701 | .299 | .007 |
| 2 | 1.000 | 1.000 | 100 | 1 | .701 | .299 | .007 |
| 3 | 0.000 | 1.000 | 300 | 1 | 2.420 | -1.420 | .008 |
| 4 | 0.000 | 1.000 | 100 | 1 | .807 | .193 | .008 |
| 5 | 0.000 | 1.000 | 100 | 1 | .807 | .193 | .008 |
| 6 | 1.000 | 1.000 | 100 | 0 | .701 | -.701 | .007 |
| 7 | 1.000 | 1.000 | 100 | 0 | .701 | -.701 | .007 |
| 8 | 0.000 | 1.000 | 100 | 1 | .807 | .193 | .008 |
| 9 | 2.000 | 1.000 | 100 | 1 | .608 | .392 | .006 |
| 10 | 2.000 | 1.000 | 100 | 1 | .608 | .392 | .006 |
| 11 | 2.000 | 1.000 | 100 | 1 | .608 | .392 | .006 |
| 12 | 3.000 | 1.000 | 100 | 1 | .527 | .473 | .005 |
| 13 | 3.000 | 1.000 | 100 | 1 | .527 | .473 | .005 |
| 14 | 3.000 | 1.000 | 100 | 1 | .527 | .473 | .005 |
| 15 | 3.000 | 1.000 | 100 | 1 | .527 | .473 | .005 |
| 16 | 3.000 | 1.000 | 100 | 1 | .527 | .473 | .005 |
| 17 | 3.000 | 2.000 | 200 | 0 | .367 | -.367 | .002 |
| 18 | 3.000 | 2.000 | 200 | 0 | .367 | -.367 | .002 |
| 19 | 3.000 | 2.000 | 200 | 0 | .367 | -.367 | .002 |
| 20 | 3.000 | 2.000 | 200 | 0 | .367 | -.367 | .002 |
| 21 | 3.000 | 2.000 | 200 | 0 | .367 | -.367 | .002 |
| 22 | 1.000 | 2.000 | 200 | 0 | .505 | -.505 | .003 |
| 23 | 3.000 | 2.000 | 200 | 0 | .367 | -.367 | .002 |
| 24 | 3.000 | 2.000 | 200 | 0 | .367 | -.367 | .002 |
| 25 | 3.000 | 2.000 | 200 | 0 | .367 | -.367 | .002 |
| 26 | 3.000 | 2.000 | 300 | 0 | .551 | -.551 | .002 |
| 27 | 3.000 | 4.000 | 300 | 0 | .048 | -.048 | .000 |
| 28 | 3.000 | 4.000 | 300 | 0 | .048 | -.048 | .000 |
| 29 | 3.000 | 4.000 | 300 | 0 | .048 | -.048 | .000 |
| 30 | 3.000 | 4.000 | 300 | 1 | .048 | .952 | .000 |

## When the Data Is Panel Data

When the data on which the regression is to be applied are panel data (i.e., data on multiple cross-sectional units over a period of time), we can use two types of regression:

1. Random effect model
2. Fixed effect model

Suppose we have data about the education level and GDP of various countries over a 5-year period. This is panel data. To analyze the impact of the education level on GDP, we would have to run random effect regression and fixed effect regression in STATA.

### Random effect model

```
. xtreg gdp educationlevel, re

Random-effects GLS regression               Number of obs      =        25
Group variable: country                     Number of groups   =         5

R-sq:  within  = 0.8848                     Obs per group: min =         5
       between = 1.0000                                    avg =         5
       overall = 0.9948                                    max =         5

                                            Wald chi2(1)       =   4438.08
corr(u_i, X)   = 0 (assumed)                Prob > chi2        =    0.0000

---------------------------------------------------------------------------
           gdp |     Coef.   Std. Err.      z    P>|z|  [95% Conf. Interval]
---------------+-----------------------------------------------------------
educationlevel |  127.8054   1.918454    66.62   0.000   124.0453  131.5655
         _cons | -14111.01   1208.535   -11.68   0.000   -16479.7 -11742.33
---------------+-----------------------------------------------------------
       sigma_u |         0
       sigma_e | 3043.5487
           rho |         0   (fraction of variance due to u_i)
---------------------------------------------------------------------------
```

A 1 unit rise in education level leads to a 127.8054 unit rise in GDP, all other factors remaining constant.

### Fixed effect model

```
. xtreg gdp educationlevel, fe

Fixed-effects (within) regression           Number of obs      =        25
Group variable: country                     Number of groups   =         5

R-sq:  within  = 0.8848                     Obs per group: min =         5
       between = 1.0000                                    avg =         5
       overall = 0.9948                                    max =         5

                                            F(1,19)            =    145.90
corr(u_i, Xb)  = -0.9803                    Prob > F           =    0.0000

---------------------------------------------------------------------------
           gdp |     Coef.   Std. Err.      t    P>|t|  [95% Conf. Interval]
---------------+-----------------------------------------------------------
educationlevel |  129.1728   10.69394    12.08   0.000   106.7902  151.5555
         _cons | -14876.79   6019.463    -2.47   0.023  -27475.67 -2277.909
---------------+-----------------------------------------------------------
       sigma_u | 449.94136
       sigma_e | 3043.5487
           rho |  .0213876   (fraction of variance due to u_i)
---------------------------------------------------------------------------
F test that all u_i = 0:  F(4, 19) = 0.00             Prob > F = 1.0000
```

A 1 unit rise in education level leads to a 129.1728 unit rise in GDP, all other factors remaining constant.
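The idea behind the fixed-effects estimator can be sketched in stdlib-only Python: demeaning each variable within its group removes the group-specific intercept u_i, and OLS on the demeaned data recovers the common slope. The two "countries" below are toy data, not the article's panel:

```python
# Fixed-effects ("within") estimator: subtract each group's mean from its
# observations, then pool the demeaned data and compute the OLS slope.

def within_slope(groups):
    """groups: list of (x_list, y_list) pairs, one pair per group."""
    num = den = 0.0
    for xs, ys in groups:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den += sum((x - mx) ** 2 for x in xs)
    return num / den

# Two toy countries with very different intercepts but a common slope of 2.
country_a = ([1, 2, 3], [12, 14, 16])   # y = 10 + 2x
country_b = ([1, 2, 3], [52, 54, 56])   # y = 50 + 2x
print(within_slope([country_a, country_b]))  # 2.0
```

Pooled OLS on these raw data would be distorted by the different country intercepts; the within transformation is what lets the fixed-effects model estimate the slope free of those time-invariant country effects.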
