Data Visualization: Categorical Variables

by 365 Careers

    Transcript

    00:00 Now that we've seen the different types of data and levels of measurement we can have, we are ready to explore different graphs and tables which will allow us to visually represent the data we are working with.

    00:12 Visualizing data is the most intuitive way to interpret it, so it's an invaluable skill. It is much easier to visualize data if you know its type and measurement level. As you may recall, there are two types of variables, categorical and numerical.

    00:28 In this video, we will focus on categorical variables.

    00:33 Some of the most common ways to visualize them are frequency distribution tables, bar charts, pie charts and Pareto diagrams.

    00:42 First, let's see what a frequency distribution table looks like.

    00:46 It has two columns, the category itself and the corresponding frequency.

    00:52 Imagine you own a car shop, and you sell only German cars.

    00:56 The table below shows the categories of cars (Audi, BMW and Mercedes) and their frequency, or in plain English, the number of units sold.

    01:07 By organizing your data in this way, you can compare the different brands and see that Audi has sold the most. So that is a frequency distribution table.
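
As a rough illustration of the table just described, here is a minimal Python sketch that builds a frequency distribution from raw records. The sales figures and the names `sales`, `frequency` and `best_seller` are assumptions made up for this example; the lecture itself works in Excel.

```python
from collections import Counter

# Hypothetical raw sales log: one entry per car sold (figures are made up).
sales = ["Audi"] * 124 + ["BMW"] * 98 + ["Mercedes"] * 113

# Counter tallies how often each category occurs, i.e. its frequency.
frequency = Counter(sales)

# Print the two-column table: category and corresponding frequency.
print(f"{'Brand':<10}{'Frequency':>10}")
for brand in ("Audi", "BMW", "Mercedes"):
    print(f"{brand:<10}{frequency[brand]:>10}")

# The category with the highest frequency is the best-selling brand.
best_seller, units = frequency.most_common(1)[0]
print(f"Best seller: {best_seller} ({units} units)")
```

With these assumed figures, `most_common(1)` picks out Audi, which matches the comparison made in the lecture.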

    01:18 However, tables aren't much fun, are they?

    01:20 Using the same table, we can construct a bar chart, also known as a column chart.

    01:27 The vertical axis shows the number of units sold, while each bar represents a different category indicated on the horizontal axis.

    01:35 In this way, it is much, much clearer that Audi is the best selling brand.
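
To make the idea concrete without a plotting library, the sketch below draws a text-only bar chart in Python. It is only an approximation of the chart in the lecture: the bars run horizontally rather than vertically, and the frequencies and the helper `text_bar_chart` are hypothetical.

```python
# Hypothetical frequencies (units sold per brand); figures are made up.
frequency = {"Audi": 124, "BMW": 98, "Mercedes": 113}


def text_bar_chart(freq, width=40):
    """Return one line per category; bar length is proportional to frequency."""
    tallest = max(freq.values())
    lines = []
    for category, count in freq.items():
        # Scale every bar relative to the largest category.
        bar = "#" * round(count / tallest * width)
        lines.append(f"{category:<10}{bar} {count}")
    return lines


chart = text_bar_chart(frequency)
print("\n".join(chart))
```

Because bar length is scaled to the largest category, the longest bar immediately identifies the best-selling brand.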

    01:41 Okay. Let's represent the same data as a pie chart.

    01:45 In order to build one, we need to calculate what percentage of the total each brand represents. In statistics, this is known as relative frequency.

    01:55 Naturally, all relative frequencies add up to 100%.
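
The relative frequency calculation can be sketched in a few lines of Python. The unit counts are assumed for illustration only; the point is that each share is the category's frequency divided by the total, so the shares sum to 1 (100%).

```python
# Hypothetical frequencies (units sold per brand); figures are made up.
frequency = {"Audi": 124, "BMW": 98, "Mercedes": 113}

total = sum(frequency.values())

# Relative frequency = frequency of the category / total number of observations.
relative = {brand: count / total for brand, count in frequency.items()}

for brand, share in relative.items():
    print(f"{brand:<10}{share:6.1%}")

# The shares are the slices of the pie chart and add up to 100%.
print(f"{'Total':<10}{sum(relative.values()):6.1%}")
```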

    01:59 Pie charts are especially useful when we want to not only compare items among each other, but also see their share of the total.

    02:07 Okay. This example could easily be transformed into a business example of market share. Market share is so predominantly represented by pie charts that if you search for market share on Google Images, you would get almost nothing but pie charts.

    02:22 Imagine that the data in our table is representing the sales of Audi, BMW and Mercedes in a single German city, say Bonn.

    02:32 The chart will show us the market share that each of these brands has.

    02:37 Lastly, we have the Pareto diagram.

    02:41 In fact, a Pareto diagram is nothing more than a special type of bar chart where categories are shown in descending order of frequency.

    02:49 By frequency, statisticians mean the number of occurrences of each item.

    02:53 As we said earlier, in our example that's exactly the number of units sold.

    03:00 Let's go back to our frequency distribution table and order the brands by frequency.

    03:05 Now we can create the bar chart based on the reordered table and voilà, we almost have a Pareto diagram.

    03:14 There is one last touch to make it one: a curve on the same graph showing the cumulative frequency.

    03:22 The cumulative frequency is the running sum of the relative frequencies.

    03:25 It starts as the frequency of the first brand, and then we add the second, the third and so on until it finishes at 100%.

    03:35 This polygonal line is measured by a different vertical axis on the right of the graph. At each of its vertices, it shows the subtotal of the categories to its left. See how the Pareto diagram combines the strong sides of the bar chart and the pie chart.

    03:50 It is easy to compare the data both between categories and as a part of the total.
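
The data behind a Pareto diagram can be sketched as follows: sort the categories in descending order of frequency, then accumulate their share of the total. The figures and the names `ordered` and `pareto` are assumptions for illustration; drawing the bars and the cumulative line is left to whatever charting tool is at hand.

```python
from itertools import accumulate

# Hypothetical frequencies (units sold per brand); figures are made up.
frequency = {"Audi": 124, "BMW": 98, "Mercedes": 113}

# Step 1: order the categories by descending frequency (the bars).
ordered = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

# Step 2: running subtotal of the ordered frequencies.
total = sum(count for _, count in ordered)
running = accumulate(count for _, count in ordered)

# Step 3: express each subtotal as a share of the total (the cumulative line).
pareto = [
    (brand, count, subtotal / total)
    for (brand, count), subtotal in zip(ordered, running)
]

for brand, count, cumulative in pareto:
    print(f"{brand:<10}{count:>5}{cumulative:>8.1%}")
```

Each cumulative value is the subtotal of that category and every category before it, so the last value is always 100%.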

    03:55 Furthermore, if this was a market share graph, you could easily see the market share of the top two or top five companies.

    04:02 Finally, it is named after Vilfredo Pareto.

    04:05 You may have heard of another idea of his, the Pareto principle, also known as the 80/20 rule.

    04:12 It states that 80% of the effects come from 20% of the causes.

    04:16 A real-life example is a statement by Microsoft that by fixing 20% of its software bugs, it managed to solve 80% of the problems customers experience. A Pareto diagram can reveal information like that. It is designed to show how subtotals change with each additional category and provide us with a better understanding of our data.
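
One way to check for an 80/20 pattern is to count how many of the largest categories are needed to reach 80% of the total. The sketch below does this in Python; the bug names, the report counts and the helper `categories_needed` are entirely hypothetical and were chosen so that the pattern shows up.

```python
from itertools import accumulate

# Hypothetical number of customer reports caused by each bug (made-up data).
reports = {
    "bug_A": 520, "bug_B": 280, "bug_C": 60, "bug_D": 45, "bug_E": 30,
    "bug_F": 25, "bug_G": 15, "bug_H": 12, "bug_I": 8, "bug_J": 5,
}


def categories_needed(freq, target=0.8):
    """Smallest number of top categories whose cumulative share reaches target."""
    counts = sorted(freq.values(), reverse=True)
    total = sum(counts)
    for n, subtotal in enumerate(accumulate(counts), start=1):
        if subtotal / total >= target:
            return n
    return len(counts)


needed = categories_needed(reports)
share_of_causes = needed / len(reports)
print(f"{needed} of {len(reports)} bugs ({share_of_causes:.0%}) "
      f"account for at least 80% of the reports")
```

Real data rarely splits exactly 80/20, so the result is best read as a rough indicator of how concentrated the effects are.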

    04:39 Okay. These are the main ways in which we can visually represent categorical data. In our next lecture, we will get acquainted with the most useful graphs and tables for numerical data.

    04:51 Thanks for watching.


    About the Lecture

    The lecture Data Visualization: Categorical Variables by 365 Careers is from the course Statistics for Data Science and Business Analysis (EN).


    Author of lecture Data Visualization: Categorical Variables

     365 Careers

