Measurement of data refers to the quantification of an attribute of a data set, and to the scales used for that purpose in statistics. These scales help categorize data. Each measurement scale has distinct properties, and a scale should be used only when its properties match those of the data set. Choosing the right level of measurement allows a data set to be analyzed appropriately.
A variable in statistics is a characteristic, quantity or number that can be measured and that takes different values as different factors change. Common examples of variables include class, grades, expenses and vehicles.
There are two types of variables, continuous and discrete:
Continuous variables cannot be counted because they can take any value within a range, so the list of possible values never ends. For example, the age of a person’s pet might be 2 years, 2 months, 5 hours, 5 seconds, 2 milliseconds, and so on to ever finer precision. Likewise, the weight of students in a class is a continuous variable.
Discrete variables can be counted in a finite amount of time, for example the number of plates placed on a table or the grades of students in a class. No matter how long the series of discrete values is, it is still logically countable and has an ending number.
Causality refers to a link between two variables in which one event occurs because of another, i.e. a cause-effect relationship between the two. A change in one variable causes a change in the other, so the occurrence of the first event results in the occurrence of the second. A causal relationship between two variables is what statistics means by causality.
‘Hours worked’ and ‘income earned’ are two interlinked factors where a change in one factor results in a change in the other. If more hours are worked, the worker's income rises, and if fewer hours are worked, it falls. The same holds for price and purchasing power: if the price of commodities in the market increases, the purchasing power of consumers goes down.
Causality should be differentiated from correlation, which is a statistical measure, expressed as a number, that describes the size and direction of a relationship between two or more variables. A correlation on its own does not show that one variable causes the other.
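As a rough illustration of correlation expressed as a single number, the sketch below computes the Pearson correlation coefficient for hours worked and income earned. The data values and the helper name `pearson_r` are assumptions made up for this example, not figures from any study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: weekly hours worked and income earned.
hours = [20, 25, 30, 35, 40]
income = [300, 370, 460, 520, 610]

r = pearson_r(hours, income)
print(round(r, 3))
```

A value close to +1 indicates a strong positive relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.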
Levels of Measurement
There are four scales of data measurement used in statistics: nominal, ordinal, interval and ratio. The last two levels of measurement are sometimes known as continuous or scale measurements.
|Level||Nominal||Ordinal||Interval||Ratio|
|Math permissible||count||count, rank||count, rank, add, subtract||count, rank, add, subtract, multiply, divide|
The different levels of measurement are described as follows:
Nominal measurement is the first level of data measurement. It is considered the lowest form of measurement because it refers only to the names of items in the data set. It deals with labels, names and categories. Data at this level is normally qualitative and simple in nature.
Suppose a survey records whether each person in a population has blue eyes. The survey generates simple yes or no responses, which are nominal measurements. Another example of nominal measurement is the number printed on the back of a football player's shirt: it identifies the player in the same way a name does, but it does not measure anything about him.
This type of measurement cannot contribute meaningfully to the calculation of the mean or standard deviation of a data set.
The second, comparatively more advanced level is ordinal measurement. Data at this level can be arranged in a meaningful order, but it still cannot contribute meaningfully to the calculation of a mean or standard deviation.
A list of the top ten football players in the world ranks the players' performance from one to ten. Still, the list does not say how large the differences are between the players ranked from 1 to 10, so it is not possible to measure how much better one player is than another. Like nominal data, ordinal data does not support meaningful calculation of means or standard deviations.
Interval measurement is a more advanced level where the data can not only be ordered but the differences between values can also be measured. This scale both ranks data and conveys how far apart the values are. The distance between adjacent values at the interval level is uniform, so averages of the data can be computed meaningfully.
The most common examples of interval data are the Fahrenheit and Celsius temperature scales. A temperature of 20 degrees is 60 degrees less than a temperature of 80 degrees. On the other hand, 0 degrees on either scale is simply a cold temperature rather than an absence of temperature, so 80 degrees cannot be called four times as hot as 20 degrees. This level of data supports calculation of the mean, standard deviation and other statistical measures.
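The temperature example can be checked with a short sketch. It converts the two Fahrenheit readings from the text to Celsius and compares differences and ratios on both scales. The function name `fahrenheit_to_celsius` is chosen here for illustration.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

low_f, high_f = 20, 80
low_c = fahrenheit_to_celsius(low_f)
high_c = fahrenheit_to_celsius(high_f)

# Differences are meaningful on an interval scale: the 60-degree gap in
# Fahrenheit always corresponds to the same gap in Celsius.
diff_f = high_f - low_f
diff_c = high_c - low_c

# Ratios are not meaningful: the "times as hot" figure depends on the scale,
# because neither scale has a true zero.
ratio_f = high_f / low_f
ratio_c = high_c / low_c

print(diff_f, round(diff_c, 2))
print(ratio_f, round(ratio_c, 2))
```

The Fahrenheit ratio is 4, while the same two temperatures give a ratio of about -4 in Celsius, which is why multiplication and division are not permissible at the interval level.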
Ratio measurement is the most advanced level of data measurement. It includes all the attributes of interval measurement along with a true zero value. Because zero is meaningful at this level, values can be compared as ratios, such as twice or four times as much.
Distance is the most common example of the ratio level. 0 feet represents no distance, and 2 feet is twice the distance of 1 foot. This form of data is measurable and comparable. Data at this level allows not only the calculation of sums and differences but also of ratios, which results in a meaningful interpretation of the data.
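The four levels can be put side by side in a small sketch that applies, at each level, a summary that the level supports. All data values below are hypothetical, and the choice of the mode and median as summaries for nominal and ordinal data is a common convention rather than something prescribed by the text above.

```python
import statistics

# Nominal: labels only, so counting (and the most frequent label) is what applies.
eye_colours = ["blue", "brown", "brown", "green", "blue", "brown"]
most_common_colour = statistics.mode(eye_colours)

# Ordinal: ordered ranks, so a middle rank is meaningful but a mean is not.
player_ranks = [1, 2, 2, 3, 5]
median_rank = statistics.median(player_ranks)

# Interval: equal spacing, so sums, differences and means are meaningful.
temperatures_f = [20, 50, 80]
mean_temperature = statistics.mean(temperatures_f)

# Ratio: true zero, so "twice as far" is a meaningful statement.
distances_ft = [1, 2]
times_as_far = distances_ft[1] / distances_ft[0]

print(most_common_colour, median_rank, mean_temperature, times_as_far)
```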
Frequency is the number of times something occurs over a specific period of time. For example, John played football on Wednesday morning, Wednesday afternoon and Friday morning. The frequency of football games played by John is 2 on Wednesday, 1 on Friday and 3 for the whole week.
Frequency distribution process:
A frequency distribution is a table that displays the frequency of the various values or outcomes in a data set. Suppose Sam is a high school football team player who has scored the following numbers of goals in recent games: 2, 3, 1, 2, 1, 3, 3, 2, 4. The frequency of each score is as follows:
|Goals scored||Frequency|
|1||2|
|2||3|
|3||3|
|4||1|
The frequency distribution of these goals indicates that Sam most often scores 2 or 3 goals, each of which occurred 3 times. A score of 1 goal occurred 2 times, and a score of 4 goals was the least frequent, occurring once.
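The tally for Sam's goals can be reproduced with a minimal sketch. It uses `collections.Counter` from the Python standard library, and the variable names are chosen here for illustration.

```python
from collections import Counter

# Goals scored by Sam in recent games, as listed in the text.
goals = [2, 3, 1, 2, 1, 3, 3, 2, 4]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(goals)

# Print the frequency distribution in order of goals scored.
for value in sorted(frequency):
    print(value, frequency[value])
# prints:
# 1 2
# 2 3
# 3 3
# 4 1
```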
The normal distribution is a well-known type of continuous probability distribution. It is widely used in the natural and social sciences to model random variables. Its importance stems largely from the central limit theorem.
The curve of a normal distribution is called a bell curve because of its resemblance to the shape of a bell. The data in a normal distribution are spread symmetrically around a central value, with no bias to the left or right side.
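The symmetry of the bell curve can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The mean of 100 and standard deviation of 15 are arbitrary values assumed for the example.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Symmetry: the area below (mean - d) equals the area above (mean + d).
d = 10
left_tail = dist.cdf(dist.mean - d)
right_tail = 1 - dist.cdf(dist.mean + d)

# Share of values that fall within one standard deviation of the mean.
within_one_sd = dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)

print(round(left_tail, 4), round(right_tail, 4), round(within_one_sd, 4))
```

The two tail areas match, and roughly 68% of values lie within one standard deviation of the mean, whatever mean and standard deviation are chosen.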
Central Limit Theorem
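The central limit theorem is a statistical theory which states that, when sufficiently large samples are taken from a population with finite variance, the mean of the sample means is approximately equal to the mean of the whole population. It also states that the distribution of the sample means approaches a normal distribution as the sample size grows, so a graph of the sample means looks more and more like a normal curve, whatever the shape of the population distribution. In the same way, the variance of the sample means is approximately equal to the variance of the whole population divided by the sample size.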
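A small simulation can make the theorem concrete. The sketch below draws repeated samples from a skewed population and compares the mean and variance of the sample means with the population values. The exponential population, the sample size of 50, the 5000 repetitions and the fixed random seed are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is repeatable

# Hypothetical skewed population: exponential with rate 1,
# which has mean 1 and variance 1.
population_mean = 1.0
population_variance = 1.0

sample_size = 50
num_samples = 5000

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1.0) for _ in range(sample_size)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

print(round(mean_of_means, 3), population_mean)
print(round(variance_of_means, 4), population_variance / sample_size)
```

The first printed pair should be close to each other, and so should the second, even though the population itself is far from bell-shaped.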
A hypothesis is a statement which predicts the relation between certain population parameters being studied. A null hypothesis (H0) is a hypothesis which states that no statistical relation or significance exists between the variables or parameters.
It is usually the null hypothesis (H0) which a researcher attempts to disprove or reject in a study in order to support the research claim. The alternative hypothesis (Ha), on the other hand, states that a statistically significant relation exists between the variables.
Type I and Type II Errors
In statistical hypothesis testing, a type I error is the rejection of a true null hypothesis. It is also called a false positive. It asserts that a supposed relationship between two variables exists when it actually does not.
For example, a patient is declared to have a deadly disease when in reality the patient does not have any disease. Another example of a type I error is a fire alarm sounding when there is no actual fire.
|Jury decision||H0 is true and he is truly innocent||Ha is true and he is actually guilty|
|Jury fails to reject H0 and man is found not guilty||Right decision||Type II error|
|Jury rejects H0 and man is found guilty||Type I error||Right decision|
A type I error is the rejection of a null hypothesis when the hypothesis is actually true. It refers to the observation of a difference when in reality there is none, a false hit in which a factor or thing is reported that is not actually present. It can occur only when the null hypothesis (H0) is true. The probability of a type I error, known as the significance level, is denoted by alpha (α) and can be written as P(R | H0), where R denotes the rejection region of the test.
A type II error, or false negative, is the opposite of a type I error. In this case, the null hypothesis is not rejected when it is actually false, and the alternative hypothesis is the true state of affairs. The error is a failure to accept the alternative hypothesis when it is right, so the test fails to identify a difference that actually exists. The probability of a type II error is denoted by beta (β) and equals 1 - P(R | Ha).
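Both error rates can be estimated by simulation. The sketch below runs a one-sided z-test many times, first with the null hypothesis true and then with the alternative true, and counts the wrong decisions. The test design, the means under H0 and Ha, the known standard deviation, the sample size and the random seed are all assumptions chosen for the example.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so that the run is repeatable

alpha = 0.05        # chosen significance level
sample_size = 25
sigma = 1.0         # population standard deviation, assumed known
mu_null = 0.0       # population mean under H0
mu_alt = 0.5        # hypothetical population mean under Ha
trials = 10000

# One-sided z-test: reject H0 when the z statistic exceeds this critical value.
z_crit = NormalDist().inv_cdf(1 - alpha)

def rejects_h0(true_mean):
    """Draw one sample with the given true mean; return True if H0 is rejected."""
    sample = [random.gauss(true_mean, sigma) for _ in range(sample_size)]
    z = (fmean(sample) - mu_null) / (sigma / sample_size ** 0.5)
    return z > z_crit

# Type I error rate: H0 is true, yet the test rejects it.
type_1_rate = sum(rejects_h0(mu_null) for _ in range(trials)) / trials

# Type II error rate: Ha is true, yet the test fails to reject H0.
type_2_rate = sum(not rejects_h0(mu_alt) for _ in range(trials)) / trials

print(round(type_1_rate, 3), round(type_2_rate, 3))
```

The estimated type I error rate should land near the chosen alpha, while the type II error rate depends on how far the true mean under Ha lies from the mean under H0 and on the sample size.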