When studying data, one of the most important things to understand is the different types of data. Generally speaking, data can be divided into two broad categories: categorical data and quantitative data. How we analyze data depends on whether it is categorical (also known as qualitative) or quantitative (also known as numerical). In this blog post I’ll go over the differences between each type and some real-world examples.
Why do we care about types of data?
Statistical computing is becoming more common with the increasing accessibility of data. There are many fantastic tools available that help us crunch numbers and find insights with our data. However, these tools can be useless if we do not understand the data we are working with.
For example, if we want to perform linear regression using categorical data, the categories must first be encoded as numbers, typically as 0/1 indicator (dummy) variables. If our data is numerical, creating a linear regression model is much more straightforward. On the other hand, frequency charts and tables can be created easily with categorical data, but numerical data needs to be divided into classes before frequencies can be counted.
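To make the contrast above concrete, here is a minimal Python sketch using only the standard library. The sample values, the class width of 25,000 miles, and the helper names mileage_class and dummy_code are all assumptions made up for illustration; real statistical software offers its own tools for each step.

```python
from collections import Counter

# Hypothetical sample data, invented for illustration only.
car_colors = ["blue", "green", "silver", "blue", "silver", "blue"]
mileages = [12500, 48200, 30100, 75900, 51000, 8900]

# Categorical data: frequencies can be counted directly.
color_counts = Counter(car_colors)


def mileage_class(miles, width=25000):
    """Return a label for the class (bin) that a mileage falls into."""
    lower = (miles // width) * width
    return f"{lower}-{lower + width - 1}"


# Numerical data: assign each value to a class first, then count.
mileage_counts = Counter(mileage_class(m) for m in mileages)


def dummy_code(values):
    """Encode a categorical column as 0/1 indicator columns.

    The first category (alphabetically) is treated as the baseline and
    gets no column of its own, which is one common convention.
    """
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories[1:]} for v in values]


color_dummies = dummy_code(car_colors)

print(color_counts)
print(mileage_counts)
print(color_dummies[:2])
```

Counting the colors takes one step, while the mileages need a choice of class width before they can be counted at all. The indicator columns are what a regression routine would actually receive in place of the color words.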
It is important that before any statistical analysis is performed on your data, you first analyze the data itself. Understanding what kind of data you are working with will inform how you proceed with further analysis and creating models.
What is qualitative/categorical data?
To put it simply, qualitative data describes qualities. For example, the color of a car is a quality. You would describe it with a word, such as blue, green or silver. You can also think of these different colors as categories, which makes it easier to remember what this type of data is about. If you see data recorded as a word or label, such as the city someone lives in or the type of school they went to (private or public), the data is categorical.
Nominal or ordinal?
Let’s say you have taken a closer look at one of your variables and it’s categorical. Now, it can be further divided into two more categories. These categories are nominal or ordinal. Ordinal data means that there is a natural ordering. Nominal data does not have any such natural ordering.
This can be a little confusing, but I will clarify with some examples. A shirt size (small, medium or large) has a natural progression and is considered ordinal data. However, let’s consider the state someone lives in (Arizona, Minnesota, etc.). This data is nominal. But wait! Can’t you put states in alphabetical order? What about from smallest population to greatest? Or order them by the year they were founded? Great question. The fact that there are several ways you could order the states, and no single, obvious way, means they are nominal. The key to remember here is that ordinal data has one clear, obvious order, while nominal data does not.
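The same distinction can be sketched in a few lines of Python. The rank mapping SIZE_ORDER and the sample lists are assumptions for illustration; the statehood years are the years Arizona and Minnesota were admitted to the Union.

```python
# Ordinal: one natural order, which we write down once as a rank.
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}

shirts = ["large", "small", "medium", "small"]
sorted_shirts = sorted(shirts, key=SIZE_ORDER.get)

# Comparisons such as "small comes before large" are meaningful.
small_before_large = SIZE_ORDER["small"] < SIZE_ORDER["large"]

# Nominal: any ordering is a choice we impose from outside.
statehood_year = {"Arizona": 1912, "Minnesota": 1858}
states = list(statehood_year)

alphabetical = sorted(states)
by_statehood = sorted(states, key=statehood_year.get)

print(sorted_shirts)
print(alphabetical)
print(by_statehood)
```

The shirt sizes sort the same way no matter who does the sorting, while the two state orderings disagree with each other. That disagreement is the sign of nominal data.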
What is quantitative data?
This type of data is one that people may be most familiar with. Quantitative data describes numerical values. Going back to the car example, this type of data would describe the mileage of a car. It is a value that is described using numbers instead of words. Other common examples include business data such as salaries and sales, or demographic information like age and height.
Continuous or discrete?
Similarly to categorical data, quantitative data can be further divided into two main categories. These two categories are continuous and discrete. The main idea is that continuous variables can take on any value within an interval. For example, someone’s height can take on any value between the minimum human height and the maximum human height. Data that describes money is also generally treated as continuous, even though in practice amounts are not recorded in fractions of a penny.
Discrete data, on the other hand, cannot take on just any value in an interval. It has clear, separate values. An example of this is how many siblings a person has. The value will be some integer, but it cannot be any value in between. Note that discrete variables do not have to be integers. The sections of a textbook are numbered 1.1, 1.2, and so on, but these are still discrete because there are no values between 1.1 and 1.2. One way to keep these straight is to consider whether you could list the possible values. If the variable is discrete, then writing down the possible values probably won’t take long. However, if it’s a continuous variable, that would be an impossible task!
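The "can you list the values?" test can be illustrated with a short Python sketch. The upper bound of five siblings, the nine textbook sections, and the two heights in centimeters are assumptions chosen only to keep the example small. Exact fractions are used so that the arithmetic is not affected by floating-point rounding.

```python
from fractions import Fraction

# Discrete: the possible values can be written out as a list.
sibling_counts = list(range(0, 6))  # 0 through 5, an arbitrary cutoff
sections = [Fraction(11 + k, 10) for k in range(9)]  # 1.1 through 1.9


def midpoint(a, b):
    """Return the value exactly halfway between a and b."""
    return (a + b) / 2


# Continuous: between any two heights there is always another height,
# so no finite list could ever cover every possible value.
height_a = Fraction(170)  # centimeters, hypothetical
height_b = Fraction(171)
between = midpoint(height_a, height_b)
between_again = midpoint(height_a, between)

print(sibling_counts)
print([float(s) for s in sections])
print(float(between), float(between_again))
```

A value such as 1.15 never appears in the list of sections, whereas the midpoint step for heights can be repeated forever and always produces a new value.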
Categorical and quantitative data may appear confusing at first. However, there are some key concepts that help us understand what kind of data and variables we are dealing with. Identifying the type of each variable is an important step to take before using any statistical software or performing any time-consuming analysis. Hopefully this blog post was helpful. For more information, here is a free download. Please let me know if you have any questions or requests for more information.