Measurement of data refers to quantifying an attribute of a data set. In statistics, measurement is organized into scales, and these scales determine how data can be categorized. Each scale has its own properties and should be used only when those properties match the properties of the data set, so choosing the right level of measurement is essential for analyzing a data set appropriately.
Variables
In statistics, a variable is any characteristic, quantity, or number that can be measured and that can take different values. Common examples of variables include class, grades, expenses, and the number of vehicles.
There are two types of variable:
- Continuous variable
Continuous variables are not countable because they can take any value within a range; the list of possible values is infinite. For example, the age of a person's pet might be 2 years, 2 months, 5 hours, 5 seconds, 2 milliseconds, and so on without end. Likewise, the weight of students in a class is an example of a continuous variable.
- Discrete variable
This type of variable is countable in a finite amount of time, for example the number of plates placed on a table or the grades of students in a class. No matter how long a list of discrete values is, it can in principle be counted and has a definite end.
Causality
Causality refers to a relationship between two variables in which a change in one variable produces a change in the other: the occurrence of one event results in the occurrence of a second event. A causal relationship is stronger than mere correlation, which only measures how closely two variables move together.
'Hours worked' and 'income earned' are two linked variables where a change in one produces a change in the other: if more hours are worked, the worker's income is higher, and vice versa. The same holds for price and purchasing power: if the prices of commodities in the market rise, the purchasing power of consumers falls.
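The association between two variables such as hours worked and income earned can be measured with the Pearson correlation coefficient. The sketch below uses made-up numbers purely for illustration:

```python
# Illustrative sketch: measuring the linear association between
# 'hours worked' and 'income earned'. The data values are hypothetical.

def pearson_correlation(xs, ys):
    """Compute the Pearson correlation coefficient of two sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

hours = [20, 25, 30, 35, 40]        # hours worked per week (hypothetical)
income = [400, 500, 600, 700, 800]  # income earned (hypothetical)

r = pearson_correlation(hours, income)
print(r)  # close to 1.0: more hours is associated with more income
```

Note that a high correlation alone does not prove causation; here we know independently that working more hours causes higher income.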
Levels of Measurement
There are four scales of data measurement used in statistics: nominal, ordinal, interval, and ratio. Choosing the appropriate level of measurement is what allows a data set to be analyzed correctly. The four levels are as follows:
Nominal Level
This is the first and lowest level of measurement. It deals only with the labels, names, and categories of a data set. Data at this level is normally qualitative and simple in nature.
Suppose a survey is conducted to record how many people in a population have blue eyes. The survey generates simple yes-or-no responses, which is nominal measurement. Another example is the number printed on the back of a football player's shirt along with his name: the number identifies the player but carries no quantitative meaning.
Data at this level cannot make a meaningful contribution to the calculation of the mean or standard deviation of a data set.
Ordinal Level
The second and comparatively more advanced level of measurement is the ordinal level. Data at this level is arranged in a sequence and properly ordered, but despite this ordering it still cannot support meaningful calculation of a mean or standard deviation.
A list of the world's top ten football players gives a ranking of performance from one to ten. However, the list does not say how large the difference is between consecutively ranked players, so it is not possible to measure exactly how much better one player is than another. Like nominal data, ordinal data therefore does not support most meaningful statistical calculations.
Interval Level
This is a more advanced level of measurement: the data can not only be ranked, but the differences between values can also be measured. The distances between values on an interval scale are uniform, so averages of the data can be computed easily.
The most common examples of interval data are the Fahrenheit and Celsius temperature scales. A temperature of 20 degrees is exactly 60 degrees less than a temperature of 80 degrees. However, 0 degrees on either scale is an arbitrary point; it does not mean the absence of temperature. Interval data is suitable for calculating the mean, standard deviation, and other statistical measures, but not for meaningful ratios.
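The reason ratios are meaningless on an interval scale can be shown with a short sketch: a ratio of two Fahrenheit temperatures changes completely when the same temperatures are expressed in Celsius, because the zero point is arbitrary on both scales.

```python
# Sketch: on an interval scale, differences are meaningful but ratios
# are not, because the zero point is arbitrary. 40 degrees F is not
# "twice as hot" as 20 degrees F: convert both to Celsius and the
# apparent 2:1 ratio disappears.

def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

ratio_f = 40 / 20  # 2.0 on the Fahrenheit scale
ratio_c = fahrenheit_to_celsius(40) / fahrenheit_to_celsius(20)

# The "ratio" depends on which scale we happened to use,
# so it carries no real information.
print(ratio_f, ratio_c)
```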
Ratio Level
This is the most advanced form of data measurement. It has all the attributes of interval measurement plus a true zero point. Because zero is meaningful on this scale, comparisons such as twice or four times as much are possible at the ratio level.
Distance is the most common example of ratio-level measurement: 0 feet represents no distance, and 2 feet represents twice the distance of 1 foot. This level of data is both measurable and comparable. It supports not only sums and differences but also ratio analysis, which allows the most meaningful interpretation of the data.
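In contrast to the interval case, ratios on a ratio scale survive a change of units, precisely because zero is absolute. A minimal sketch (the conversion factor is the standard feet-to-metres constant):

```python
# Sketch: on a ratio scale the zero point is absolute, so ratios are
# preserved under a change of units. Twice the distance in feet is
# still twice the distance in metres.

FEET_TO_METRES = 0.3048

d1_ft, d2_ft = 1.0, 2.0
ratio_ft = d2_ft / d1_ft
ratio_m = (d2_ft * FEET_TO_METRES) / (d1_ft * FEET_TO_METRES)

print(ratio_ft, ratio_m)  # both 2.0: the comparison is unit-independent
```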
Frequency
Frequency is the number of times something occurs over a specific period. For example, if John played football on Wednesday morning, Wednesday afternoon, and Friday morning, the frequency of John's football games is 2 on Wednesday, 1 on Friday, and 3 for the whole week.
Frequency Distribution
A frequency distribution is a table that displays the frequency of each value or outcome. Suppose Sam is a high school football team player who has scored the following numbers of goals in recent games: 2, 3, 1, 2, 1, 3, 3, 2, 4. The frequency of each score is shown as follows:

| Goals scored | Frequency |
| --- | --- |
| 1 | 2 |
| 2 | 3 |
| 3 | 3 |
| 4 | 1 |
The above frequency distribution of goals indicates that Sam most often scores 2 or 3 goals. The frequency of scoring 1 goal is 2, and 4 goals is the least frequent outcome, with a frequency of 1.
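Tallying a frequency distribution like Sam's is exactly what `collections.Counter` in Python's standard library does; a minimal sketch:

```python
# Sketch: building a frequency distribution of Sam's goal counts
# with the standard library.
from collections import Counter

goals = [2, 3, 1, 2, 1, 3, 3, 2, 4]  # goals scored per game
freq = Counter(goals)

for value in sorted(freq):
    print(value, freq[value])
# 1 2
# 2 3
# 3 3
# 4 1
```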
Normal Distribution
The normal distribution is a well-known type of continuous probability distribution. It is useful in the natural and social sciences for modeling random variables and their relationships. Its importance has grown further because of the central limit theorem.
The curve of the normal distribution is called the bell curve because of its resemblance to the shape of a bell. Data in a normal distribution is spread symmetrically around its central value, with no bias to the left or right.
Central Limit Theorem
It is a statistical theory which states that if sufficiently large samples are taken from a population with finite variance, the distribution of the sample means will approximate a normal distribution, regardless of the shape of the population's own distribution. The mean of the sample means equals the mean of the whole population, and the variance of the sample means equals the population variance divided by the sample size.
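The theorem can be checked by simulation. The sketch below draws many samples from a decidedly non-normal population, the uniform distribution on [0, 1], and shows that the sample means cluster around the population mean with variance close to the population variance divided by the sample size (the sample size and counts chosen here are arbitrary):

```python
# Sketch: simulating the central limit theorem with samples from
# Uniform(0, 1), which has mean 1/2 and variance 1/12.
import random
import statistics

random.seed(42)

POP_MEAN = 0.5    # mean of Uniform(0, 1)
POP_VAR = 1 / 12  # variance of Uniform(0, 1)
n = 50            # size of each sample
num_samples = 5000

sample_means = [
    statistics.fmean(random.random() for _ in range(n))
    for _ in range(num_samples)
]

# The distribution of sample means is approximately normal, centred on
# the population mean, with variance close to POP_VAR / n.
print(statistics.fmean(sample_means))     # close to 0.5
print(statistics.variance(sample_means))  # close to (1/12)/50
```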
Null and Alternative Hypothesis
A hypothesis is a statement that predicts a relationship between the population parameters being studied. A null hypothesis (H0) states that no statistically significant relationship exists between the variables or parameters. It is usually the null hypothesis that a researcher attempts to reject in order to support the point of the study. The alternative hypothesis (HA), on the other hand, states that a statistically significant relationship does exist between the variables.
Type I and Type II Errors
In statistical hypothesis testing, a Type I error is the rejection of a true null hypothesis. It is also called a false positive: the test reports that a relationship between two variables exists when it actually does not. For example, a patient is declared to have a deadly disease when in reality the patient is healthy, or a fire alarm sounds when there is no actual fire.

The probability of a Type I error equals the significance level of the test and is denoted by alpha (α). It can be written as P(R | H0), where R denotes the event of landing in the rejection region even though the null hypothesis is true.
A Type II error, or false negative, is the opposite of a Type I error: the null hypothesis is not rejected even though it is false and the alternative hypothesis describes the true state. This error is a failure to detect a difference that really exists. Its probability is denoted by beta (β) and can be written as 1 - P(R | HA); the complementary quantity P(R | HA), the probability of correctly rejecting a false null hypothesis, is called the power of the test.
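Both error rates can be estimated by simulation. The sketch below uses a one-sided z-test of H0: μ = 0 against HA: μ > 0 with known σ = 1; the sample size, effect size, and significance level are illustrative choices, not values from the text above:

```python
# Sketch: estimating Type I and Type II error rates by simulation for a
# one-sided z-test of H0: mu = 0 vs HA: mu > 0, with sigma = 1 known.
# All numeric settings here are illustrative assumptions.
import random
import statistics

random.seed(0)

Z_CRIT = 1.645  # upper 5% point of the standard normal (alpha = 0.05)
N = 25          # sample size per experiment
TRIALS = 20000  # number of simulated experiments

def rejects(true_mean):
    """Run one experiment and report whether H0: mu = 0 is rejected."""
    sample = [random.gauss(true_mean, 1) for _ in range(N)]
    z = statistics.fmean(sample) * N ** 0.5  # (xbar - 0) / (sigma / sqrt(N))
    return z > Z_CRIT

# Type I error rate: H0 is true (mu = 0) but we reject it anyway.
type1 = sum(rejects(0.0) for _ in range(TRIALS)) / TRIALS

# Type II error rate: HA is true (mu = 0.5) but we fail to reject H0.
type2 = sum(not rejects(0.5) for _ in range(TRIALS)) / TRIALS

print(type1)  # close to alpha = 0.05
print(type2)  # beta; the power of the test is 1 - beta
```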