Visual Data Analysis with Python in Excel: Using Boxplots

Emilie Lewis
Dave Langer
This is the fifth and final in a series of blog posts that teach you to analyze data using Python code in Microsoft Excel visually.
If you are new to Python in Excel, you should start with my Python for Excel Analysts blog series, which covers many concepts that will be assumed in this blog series.
This series will use the Microsoft Excel Labs Python Editor to write code. However, the Python Editor is not required. All code can be entered using the Formula Bar and the new PY() function.
Each post in the series has an accompanying Microsoft Excel workbook to download and use to build your skills. This post’s workbook is available for download here.
For convenience, here are links to all the blog posts in this series:
Note: To reproduce the examples in this post, install the Python in Excel trial. If you like this blog series, check out my self-paced certification program, Anaconda Certified: Data Analysis with Python in Excel.
This blog post will continue the hypothetical scenario covered in Part 3 and Part 4 of this series – analyzing the impact of promotion strategies.
The following monthly sales data table is included in this blog post’s workbook:
Fig 01 – Monthly Sales Data
The table in Fig 01 illustrates a very common scenario in business analytics – analyzing the performance of business processes over time (i.e., time-series analysis).
The go-to visualization for time-series analysis is a line chart.
Time-series analysis looks at the behavior of measurements over time. When performing time-series analysis, five patterns provide you with essential insights:
The Python seaborn library makes creating line charts quite easy. Only a few lines of code are required.
To craft the seaborn line charts needed to analyze data, the Excel table shown in Fig 01 must be loaded as a pandas DataFrame:
Fig 02 – Python Code for Loading the Data
Note: While the above Python code is written using the Excel Labs Python Editor, this is not required.
Clicking the disk icon tells the Python Code Editor to execute the Python code:
Fig 03 – Executing the Python Code
A logical place to start the analysis is to understand SalesAmounts over time. The following Python code uses the seaborn library to create a line chart:
Fig 04 – Python Code for a Line Chart
Here’s the explanation of the code shown in Fig 04:
Configuring the code cell with the Convert to Excel values option will render the line chart directly in the worksheet cell:
Fig 05 – Rendering the Line Chart Inside the Worksheet Cell
Clicking the disk icon in the Python Code Editor executes the code and produces the following line chart:
Fig 06 – Your First Line Chart
Let us apply the five aforementioned patterns of time-series analysis to Fig 06:
Note: Regarding the potentially exceptional values in a time series, process behavior charts offer a robust way to determine if any values in a time series are truly exceptional based on historical performance.
The DiscountAmount column of monthly_sales represents the aggregate promotional discounts given during a month. The following code creates a line chart of DiscountAmounts:
Fig 07 – Python Code for the DiscountAmount Line Chart
Executing the code cell shown in Fig 07 with the Convert to Excel values option produces the following line chart:
Fig 08 – The DiscountAmount Line Chart
Once again, let us apply the five patterns of time-series analysis to Fig 08:
Analyzing the line chart shown in Fig 08 provides a powerful piece of information regarding the behavior of promotions over time. Namely, there doesn’t appear to be a consistent promotion strategy during the three-year period depicted in Fig 08.
As evident from the line charts above, time-series analysis is a very useful technique, but there is more that can be done with line charts!
Visualizing multiple time series on the same chart can often lead to recognizing layered patterns in data. Luckily, using pandas, it is trivial to create multi-series line charts.
The first step is to transform the existing monthly_sales DataFrame from a “wide” format to a “long” format. A wide format is when a column exists for each time series, as the card for monthly_sales demonstrates:
Fig 09 – The Card for the reseller_sales DataFrame
A data frame in the long format has the following three columns:
The easiest way to transform a DataFrame from wide to long format is using the DataFrame melt() method. The following code creates a sales_melted DataFrame from monthly_sales:
Fig 10 – Melting Monthly Sales Data
Note: The code cell depicted in Fig 10 is executed with the Python object output option.
Here’s the explanation of the code shown in Fig 10:
After the code in Fig 10 is executed, looking at the sales_melted card in Excel makes the above code clear:
Fig 11 – The Card for the sales_melted DataFrame
When looking at Fig 11, notice how there are 108 rows in the sales_melted DataFrame. Compare this to the monthly_sales DataFrame depicted in Fig 09 with 36 rows.
The difference in row count is due to sales_melted having 3 times series each with 36 values; 3 times 36 equals 108.
With the sales_melted DataFrame created, it’s time to plot the line chart.
The following code creates a multi-series line chart:
Fig 12 – Python Code for a Multi-Series Line Chart
While the code shown in Fig 12 resembles previous code, there are some items to note:
Executing the code cell with the Convert to Excel values option produces the following line chart:
Fig 13 – A Multi-Series Line Chart
In addition to applying the five time-series patterns to Fig 13, examining the chart reveals the following:
Line charts like the one depicted in Fig 13 are often the early warning system in business analytics.
For example, line charts are common visualizations used in executive dashboards. A common scenario is for leaders to see something “bad” in a line chart and request that an analysis be conducted to explain what is being observed.
The analysis techniques covered in this blog series are commonly used to craft these explanations.
This post has demonstrated how line charts provide one of the most fundamental skills in business analytics – recognizing patterns in business metrics over time.
Line charts are particularly effective in analyzing key performance indicators (KPIs). If you want to have more impact using data with leaders, KPI analysis is a powerful tool in your toolbelt.
If you are interested in learning more about visualizing data using the seaborn library, be sure to check out “An introduction to seaborn.”
Until next time, stay healthy and happy data sleuthing!
Talk to one of our experts to find solutions for your AI journey.