Plotting a Bar Graph with Grouping on Multiple Columns
Introduction
In this article, we will explore how to plot a bar graph when grouping data by multiple columns. This is often referred to as a grouped bar chart or a multi-level bar chart. We’ll dive into the details of how to achieve this using popular Python libraries such as Pandas and Matplotlib.
We’ll start with an example scenario: a dataset with two columns of interest, ‘date’ and ‘modeofcommunication’. Each row represents a single data point. We want to group the data by date first, then by mode of communication within each date, and finally plot the count of each subgroup as a vertical bar under its parent group.
Background
To understand how to achieve this goal, let’s first review some fundamental concepts:
- Pandas: The Python library for data manipulation and analysis.
- DataFrames: Two-dimensional labeled data structures with columns of potentially different types. They are similar to Excel spreadsheets or SQL tables.
- Series: One-dimensional labeled arrays capable of holding any data type, including strings, integers, floats, and booleans. Each column of a DataFrame is a Series.
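The relationship between the two structures listed above can be sketched in a few lines. The two-row table and the names `frame` and `modes` are made up for illustration only.

```python
import pandas as pd

# Hypothetical two-row table, used only to illustrate the two structures
frame = pd.DataFrame({
    "date": ["2019-03-15", "2019-03-16"],
    "modeofcommunication": ["Chat", "Email"],
})

# Selecting one column by name returns a Series that shares the row index
modes = frame["modeofcommunication"]

print(type(frame).__name__)  # DataFrame
print(type(modes).__name__)  # Series
print(frame.shape)           # (2, 2): two rows, two columns
```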
Grouping Data
When grouping data by multiple columns, we use the groupby method in Pandas. Passing a list of column names creates one group for each unique combination of values in those columns.
Let’s take a look at an example DataFrame, df, with the columns ‘date’ and ‘modeofcommunication’ (a real dataset would carry additional feature columns):
import pandas as pd

# Create a sample dataset
data = {
    "date": ["2019-03-15", "2019-03-16", "2019-03-17", "2019-03-18"],
    "modeofcommunication": [
        'Chat',
        'Chat',
        'Email',
        'Facebook'
    ],
    # Additional features
}
df = pd.DataFrame(data)
print(df)
Output:
| date | modeofcommunication |
|---|---|
| 2019-03-15 | Chat |
| 2019-03-16 | Chat |
| 2019-03-17 | Email |
| 2019-03-18 | Facebook |
To group this data by ‘date’ and then by mode of communication, we pass both column names to groupby. Printing the resulting object only shows a DataFrameGroupBy placeholder, so we call size() to see the number of rows in each group:
# Grouping data by date and modeofcommunication
grouped = df.groupby(['date', 'modeofcommunication'])
print(grouped.size())
Output:
| date | modeofcommunication | size |
|---|---|---|
| 2019-03-15 | Chat | 1 |
| 2019-03-16 | Chat | 1 |
| 2019-03-17 | Email | 1 |
| 2019-03-18 | Facebook | 1 |
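With only one row per date, the sample above does not show much counting. The following sketch uses a slightly larger, invented ticket log (the name `tickets` and all values are hypothetical) to illustrate how grouping on two columns yields one count per (date, mode) pair.

```python
import pandas as pd

# Hypothetical ticket log: one row per ticket (values invented for illustration)
tickets = pd.DataFrame({
    "date": ["2019-03-15", "2019-03-15", "2019-03-15",
             "2019-03-16", "2019-03-16"],
    "modeofcommunication": ["Chat", "Chat", "Phone", "Chat", "Email"],
})

# One group per unique (date, modeofcommunication) pair
grouped = tickets.groupby(["date", "modeofcommunication"])

# size() counts the rows in each group and returns a Series
# indexed by a two-level MultiIndex (date, modeofcommunication)
counts = grouped.size()

print(grouped.ngroups)                         # 4 distinct pairs
print(list(counts.index.names))                # ['date', 'modeofcommunication']
print(counts.loc[("2019-03-15", "Chat")])      # 2
```

The two-level index is what makes the later unstack step possible: the outer level (date) stays on the rows, and the inner level can be moved to the columns.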
Plotting the Data
Now that we have our grouped data, let’s focus on plotting it. We want to display a bar graph where each subgroup (mode of communication) is represented as a vertical bar under its parent group (date).
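We can achieve this with the unstack method, which pivots the innermost index level of a MultiIndex Series (here, modeofcommunication) into columns. The result is a DataFrame with one row per date and one column per mode of communication.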
Both chart variants, side by side and stacked, start from the same unstacked table of group sizes:
# Count the rows per group, then move modeofcommunication into the columns
size_unstacked = grouped.size().unstack()
print(size_unstacked)
Output (combinations that do not occur in the data show up as NaN):
| date | Chat | Email | Facebook |
|---|---|---|---|
| 2019-03-15 | 1.0 | NaN | NaN |
| 2019-03-16 | 1.0 | NaN | NaN |
| 2019-03-17 | NaN | 1.0 | NaN |
| 2019-03-18 | NaN | NaN | 1.0 |
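If missing combinations should count as zero rather than appear as gaps, unstack accepts a fill_value argument. The sketch below shows this on an invented dataset; `tickets` and `wide` are hypothetical names, not part of the example above.

```python
import pandas as pd

# Hypothetical ticket log (values invented for illustration)
tickets = pd.DataFrame({
    "date": ["2019-03-15", "2019-03-15", "2019-03-15",
             "2019-03-16", "2019-03-16"],
    "modeofcommunication": ["Chat", "Chat", "Phone", "Chat", "Email"],
})
counts = tickets.groupby(["date", "modeofcommunication"]).size()

# Pivot the inner index level (modeofcommunication) into columns;
# fill_value=0 fills combinations that never occur and keeps integer counts
wide = counts.unstack(fill_value=0)

print(wide.columns.tolist())             # ['Chat', 'Email', 'Phone']
print(wide.loc["2019-03-15"].tolist())   # [2, 0, 1]
print(wide.loc["2019-03-16"].tolist())   # [1, 1, 0]
```

Filling with zero is a choice, not a requirement: bar plots simply draw nothing for missing cells, but integer counts are easier to read when the table is printed.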
# The stacked chart is drawn from the same unstacked table
size_stacked = grouped.size().unstack()
To plot these DataFrames, we can call plot with kind='bar' (equivalent to plot.bar) and then display the figure with Matplotlib:
# Plotting without stacking
import matplotlib.pyplot as plt

size_unstacked.plot(kind='bar')
plt.show()
Output: A bar graph in which each date forms a cluster on the x-axis, with one vertical bar per mode of communication inside the cluster.
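As an end-to-end illustration of the grouped chart, here is a minimal sketch on invented data. The `tickets` values, the axis labels, and the non-interactive "Agg" backend are assumptions made so the example runs without a display; in an interactive session you would drop the backend line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ticket log (values invented for illustration)
tickets = pd.DataFrame({
    "date": ["2019-03-15", "2019-03-15", "2019-03-15",
             "2019-03-16", "2019-03-16"],
    "modeofcommunication": ["Chat", "Chat", "Phone", "Chat", "Email"],
})
wide = (
    tickets.groupby(["date", "modeofcommunication"])
    .size()
    .unstack(fill_value=0)
)

# One cluster per date (row), one bar per mode of communication (column)
ax = wide.plot(kind="bar", rot=0)
ax.set_xlabel("date")
ax.set_ylabel("number of tickets")
ax.legend(title="modeofcommunication")

print(len(ax.containers))  # 3: one set of bars per mode of communication
```

Pandas returns the Matplotlib Axes object, so labels, legend, and other styling can be adjusted with the usual Matplotlib calls.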
Alternatively, if you want a single bar per date whose segments show the individual modes of communication, pass stacked=True:
# Plotting with stacking
import matplotlib.pyplot as plt

size_stacked.plot(kind='bar', stacked=True)
plt.show()
Output: A bar graph with one bar per date, in which the segments for each mode of communication are stacked on top of each other, so the total bar height equals the number of rows for that date.
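To make the effect of stacking concrete, the following sketch draws a stacked chart from invented data and checks that the top of each stacked bar matches the row total. The `tickets` values and the "Agg" backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ticket log (values invented for illustration)
tickets = pd.DataFrame({
    "date": ["2019-03-15", "2019-03-15", "2019-03-15",
             "2019-03-16", "2019-03-16"],
    "modeofcommunication": ["Chat", "Chat", "Phone", "Chat", "Email"],
})
wide = (
    tickets.groupby(["date", "modeofcommunication"])
    .size()
    .unstack(fill_value=0)
)

# One bar per date; each column contributes one segment to that bar
ax = wide.plot(kind="bar", stacked=True, rot=0)

# For each date, find the highest point reached by any of its segments
totals = []
for i in range(len(wide)):
    segments = [container[i] for container in ax.containers]
    totals.append(float(max(s.get_y() + s.get_height() for s in segments)))

print(totals)  # [3.0, 2.0], the number of tickets per date
```

A stacked chart emphasizes the total per date, while the side-by-side chart makes it easier to compare individual modes across dates.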
That’s it! We’ve successfully plotted a grouped bar chart from our dataset using Pandas and Matplotlib.
Last modified on 2025-05-07