An Introduction to the Seaborn Objects System
This article is a continuation of the blog post “An Introduction to the Seaborn Objects System” on Anaconda.com.
The Syntax of the Seaborn Objects System
In this section, I’ll explain the high-level syntax of the new Seaborn objects system. Before we dive in, I want to make a couple of points:
- First, this is an all-new way to create plots with Seaborn. Even if you’ve used Seaborn in the past, this syntax will be new (although some elements may be the same, or look familiar).
- Second, in this section, I’ll cover only the high-level syntax. There are some details that I will leave out (like the “theme” system, scales, etc.) for the sake of brevity.
The High-Level Syntax
As I noted above, this new Seaborn syntax is highly modular. That said, there are some elements that you’ll see in the syntax for almost any plot.
To create a visualization, you need to:
- Use the so.Plot() function to initialize plotting.
- Specify the dataframe or data that you want to use.
- Map dataframe columns to the aesthetic attributes of the plot.
- Use the .add() method to specify the type of mark that you want to draw (i.e., dots, lines, etc.).
The syntax looks something like this: a call to so.Plot() that names the data and maps columns to plot attributes, followed by one or more calls to .add(), with each method call on its own line.
You’ll notice that the entire expression is also enclosed inside of parentheses. This allows us to call separate methods (like the .add() method, the .label() method, etc.) on separate lines. This enhances readability and the ability to debug the code.
For those of you who have used R’s ggplot2, this probably seems familiar. That’s because ggplot2 and this new Seaborn system share the same conceptual design: the Grammar of Graphics.
Now that I’ve explained this system conceptually, let’s actually look at the elements that I listed above in a little more detail.
The Seaborn Plot Function
The so.Plot() function initializes plotting. As such, you’ll need to use it for essentially every plot that you make with Seaborn objects. You can even use this function by itself, in which case, it just creates an empty plot:
so.Plot()
OUT:
[Figure: an empty plot]
Again: You’ll need to call this function any time you want to create a visualization with Seaborn objects.
The Plot Function Has Parameters for the Data and Mappings
Typically, when we use the so.Plot function, we’ll use it with a few parameters. Most importantly, you’ll commonly use:
- the data parameter
- the x parameter
- the y parameter
- the color parameter
To be clear, most of these are optional. And, there are other parameters that I’m excluding here for the sake of brevity. That said, these are the parameters you’ll see most frequently. And exactly how you use them determines the plot that you create.
Let’s quickly examine what these parameters do.
Data
The data parameter allows you to specify the dataframe that holds the data that you want to plot. This parameter is commonly used, but it’s optional: instead of plotting data in a dataframe, you can also plot data in NumPy arrays or array-like objects (which I’ll discuss in a moment).
If you do use this parameter, the argument will be a pandas dataframe.
Ultimately, if the data you need to plot exists inside of a dataframe, you need to use the data parameter.
X and Y
As you might guess, the x and y parameters control the data that we plot on the x-axis and y-axis. Technically, both of these parameters are optional. However, it’s common to use at least one of these for most visualizations (and we’ll frequently use both).
If the data that you want to plot exists in a pandas dataframe, then the arguments to these parameters will be the names of the dataframe columns. When the arguments to these parameters are dataframe columns, then syntactically those column names will need to be inside of quotes.
Alternatively, if the data you want to plot is in a NumPy array or array-like object, you can provide the name of the NumPy array as the argument. In this case, the name of the NumPy array does not need to be in quotes.
As you’ll see in the examples, the data you map to these parameters controls the exact form of the plot.
Color
As you might assume, the color parameter controls the color of the marks in your plot. When it’s used inside the so.Plot function though, this parameter connects the colors of the marks that you draw to the values of the argument you provide.
Let me give you a quick example. Let’s say that you have a categorical variable called stock in your data. This categorical variable contains two values: “amzn” and “goog”. If you map this categorical variable to the color parameter, the data points for “amzn” will be one color, and the data points for “goog” will be a different color. Therefore, setting color = 'stock' and adding line marks will create a line chart with two lines of different colors, one for each value of the stock variable.
(so.Plot(data = stocks
,x = 'date'
,y = 'close'
,color = 'stock'
)
.add(so.Line())
)
OUT:
[Figure: line chart of close by date, with one differently colored line for each stock]
Although we commonly use this parameter with categorical data, as seen above, it’s also possible to use this parameter with numeric data.
As with the x and y parameters, the argument to this parameter can be a column in a dataframe (in which case, you’ll need to put the column name in quotes). Or, the data can be in a NumPy array or array-like object.
Adding Marks
As noted above, when you make a visualization with the Seaborn objects system, you need to specify the exact type of mark that you want to draw. There are many different mark types, like dots, lines, and bars.
There are also other types of more specialized marks, such as area marks (which we use with area plots) and range marks.
Notice that syntactically, to add a mark, we call a specific Seaborn objects function inside of the .add() method. Each different mark type has its own function: so.Dot() for dots, so.Line() for lines, and so on.
Importantly, the type of mark that you use, together with the way you map data to the plot parameters, determines the plot that gets created. For example, if you want to create a line chart, you’ll add line marks.
Adding Multiple Layers
Finally, let’s talk about layering.
One of the most powerful features of the Seaborn objects system is that it allows you to create plots with multiple layers.
Syntactically, to do this you simply call the .add() method multiple times. Each call adds one layer, and each layer can use a different mark type.
Note that by default, all of the layers will use the data-to-parameter mappings that you specify in your call to so.Plot(), although it is possible to override those mappings inside of a particular layer.
How to Use the Seaborn Objects System: Examples
Now that we’ve looked at the syntax, let’s take a look at some examples of how to create simple data visualizations with the Seaborn objects system.
To be clear, these examples represent a small fraction of the types of visualizations you can create with Seaborn objects. But they will help you see some of the possibilities, and will help you understand how this system works.
Run This Code First
Before we get started, we need to run some setup code.
Import Packages
First, we need to import some packages.
We obviously need to import the Seaborn objects sub-package. But we’ll also need NumPy and pandas. We’re going to use NumPy and pandas to create some scatter plot and bar chart data, and we’ll use pandas to retrieve some CSV data from an external source.
import seaborn.objects as so
import pandas as pd
import numpy as np
After you import these packages, you’ll be ready to run the examples.
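If the first import fails, one possible cause is an older Seaborn installation, since the seaborn.objects interface was introduced in version 0.12. A quick check, offered as a sketch and not as part of the original setup code:

```python
import seaborn as sns

# Print the installed version; seaborn.objects requires 0.12 or later
print(sns.__version__)

# Try the import and record whether the objects interface is available
try:
    import seaborn.objects as so
    has_objects = True
except ImportError:
    has_objects = False

print(has_objects)
```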
Example 1: Make a Scatter Plot
First, we’ll make a scatter plot. To do this, we’re first going to create a dataset, and then we’ll plot it with Seaborn objects.
Create Scatter Plot Data
Let’s start by making some scatter plot data.
To do this, we’re going to use np.random.seed to make the results reproducible, and np.random.uniform to create some uniformly distributed random values for the x-axis. We’ll call this x_data.
We’ll then create some y-axis data by taking the x-axis data and adding some random noise with np.random.normal.
Then, we’ll package those two NumPy arrays into a dataframe using the pd.DataFrame function.
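# Seed the random number generator so that the data is reproducible
np.random.seed(22)
x_data = np.random.uniform(low = 0, high = 100, size = 100)
# Re-seed, then discard 50 draws; this only advances the generator state
np.random.seed(22)
np.random.normal(size = 50)
# y is x plus normally distributed noise (mean 0, standard deviation 10)
y_data = x_data + np.random.normal(size = 100, loc = 0, scale = 10)
# Combine the two arrays into a dataframe
point_data = pd.DataFrame({'x_var':x_data
,'y_var':y_data
})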
We’ve called this new dataframe point_data, and we’ll plot it as a scatter plot.
Create Scatter Plot
Now, let’s plot point_data as a scatter plot using Seaborn objects. Let’s plot the data first, and then I’ll explain.
(so.Plot(data = point_data
,x = 'x_var'
,y = 'y_var'
)
.add(so.Dot())
)
OUT:
[Figure: scatter plot of y_var against x_var]
Explanation
Ok. Let me explain.
We initialized the plot with the so.Plot() function. Inside of the so.Plot() function, we’re using the data parameter to specify that we want to plot data in the point_data dataframe. Additionally, we’re using the x parameter to specify that we want to plot the x_var column on the x-axis and we’re using the y parameter to specify that we want to plot the y_var column on the y-axis. Notice that the column names are inside quotes. Then, we’re using the .add() method to add a layer of dots. To add the dots themselves, we’re calling so.Dot() inside of .add().
So to summarize the code, we:
- initialize plotting
- specify the data source
- map variables/columns to the appropriate parameters
- add a mark
That’s the rough structure for all Seaborn objects plots. It’s just that the details change from plot to plot.
Also, notice that we’ve enclosed the whole expression inside parentheses. That’s because this allows us to put different methods (like the .add() method) on different lines. We could, alternatively, keep everything on one line, like this:
so.Plot(data = point_data, x = 'x_var', y = 'y_var').add(so.Dot())
I strongly recommend against this though. It’s hard to read and hard to modify, and it will get worse if you need to call additional methods like the .theme() method, .scale() method, etc. Do yourself and your colleagues a favor: Enclose your Seaborn objects code inside parentheses, and put the different method calls on separate lines.
Example 2: Make a Scatter Plot With a Trend Line
Next, we’ll create a scatter plot with a trend line.
In this example, we’re going to re-use the point_data dataframe that we created in example 1, so if you need to, you can go back and run the code to create the dataset.
Create Plot
Let’s run the code to create the chart, and then I’ll explain.
(so.Plot(data = point_data
,x = 'x_var'
,y = 'y_var'
)
.add(so.Dot())
.add(so.Line(color = 'orange'), so.PolyFit())
)
OUT:
[Figure: scatter plot of y_var against x_var, with an orange trend line]
Explanation
I really love this chart. It’s simple, but it’s a good example of how we can use layering to create more complex charts.
Let’s break it down.
This example is almost identical to example 1. We call the so.Plot() function, specify the dataframe, map variables to parameters, and call the .add() method to add dots.
But notice that we’ve called .add() a second time. Here, inside the method, we’re using so.Line() and so.PolyFit(). The so.PolyFit() transform fits a polynomial to the data, and so.Line() draws the fitted curve in orange. That line of code essentially adds a new layer: the trend line.
This might seem simple, but it can be very powerful. Adding additional marks (like new lines, etc.) is as simple as calling the .add() method and specifying a new mark type. When done well, this enables you to create complex, multi-layered visualizations with a simple, easy-to-read syntax.
Example 3: Create a Bar Chart
Next we’ll create a bar chart.
For this example, we’re going to create a new dataset. Let’s do that first, and then we’ll visualize it.
Create Data
Here, we’ll create a dataframe with some dummy income data. The dataframe will have two variables:
- a categorical variable containing names
- a numeric variable with corresponding incomes
To create this data, we’re using the pd.DataFrame function, and using some lists to specify the data contained in the different variables.
bar_data = pd.DataFrame({'name':['Mark', 'Sofia', 'Arun']
,'income':[180000, 210000,150000]
})
Create Bar Chart
Now, let’s plot the bar chart.
(so.Plot(data = bar_data
,x = 'name'
,y = 'income'
)
.add(so.Bar())
)
OUT:
[Figure: bar chart of income by name]
Explanation
Notice how simple the code is, and how similar it is to previous examples.
We use the data parameter to specify the dataframe. We use the x parameter to specify the column that we want to put on the x-axis, and we use the y parameter to specify the column that we want to put on the y-axis. Finally, we use the .add() method to add a mark type. Aside from the different dataframe name and column names, the one real difference here is that we’re using so.Bar(). Why? As you’ve probably figured out, so.Bar() adds bars to a plot.
It’s so simple.
Example 4: Create a Line Chart With Multiple Lines
Now, let’s create a line chart. Specifically, we’ll create a line chart with multiple lines. To do this, we’ll get a special dataset, and then we’ll plot it.
Get Data
First, we need to get our data.
In this example, we’ll be plotting some historical stock data. To get it, we’ll use the pandas read_csv function to retrieve the data from a CSV file that lives at a URL. We’ll also use the pandas to_datetime function to convert the date variable to a proper datetime format.
stocks = pd.read_csv("https://www.sharpsightlabs.com/datasets/amzn_goog_2000-01-01_to_2020-12-05.csv")
stocks.date = pd.to_datetime(stocks.date)
Note that this dataset contains stock data for two companies: Amazon and Google.
Let’s quickly look at the unique values of the stock column:
stocks.stock.unique()
OUT:
array(['amzn', 'goog'], dtype=object)
As you can see, the stock column has two categorical values, amzn and goog. This will be relevant as we create our line chart.
Create Line Chart
Now that we have the dataset, we’ll use Seaborn objects to create the line chart.
(so.Plot(data = stocks
,x = 'date'
,y = 'close'
,color = 'stock'
)
.add(so.Line())
)
OUT:
[Figure: line chart of close by date, with one differently colored line for each stock]
Explanation
Ok. Let’s break this down.
Notice how similar the code is to the previous examples. We’re still using the so.Plot() function, as well as the data, x, and y parameters.
Obviously, one difference in this example is that we’re using a different dataset and column names as arguments to those parameters. But there are two other important differences:
- We’re using the color parameter to map a categorical variable (the stock column) to the color of the lines. This has the effect of creating a multi-line chart, with one differently colored line for each category of that column.
- We’re using the so.Line() function to specify that we want to draw lines.
It’s all very intuitive once you begin to understand how the system works. You initialize plotting, specify the dataset, map variables to plot attributes, and specify a type of mark that you want to add.
In this case, to change the plot type, you could actually use a different mark. Try replacing so.Line() with so.Dot() and see what happens.
Example 5: Make a Small Multiple Plot From a Line Chart
Let’s do one more quick example.
Here, we’re going to take the line chart that we made in example 4 and convert it to a small multiple chart. As you’ll see, this is as simple as calling one additional method.
Create Chart
Let’s run the code, and then I’ll explain.
(so.Plot(data = stocks
,x = 'date'
,y = 'close'
,color = 'stock'
)
.facet(col = 'stock')
.add(so.Line())
)
OUT:
[Figure: small multiple chart with one line chart panel for each stock]
Explanation
If you look carefully, you’ll notice that the code for this example is almost identical to the code to create a line chart that we used in example 4. The main difference is that we’ve added an additional line:
.facet(col = 'stock')
What does this line of code do? It facets the chart based on the stock column. Remember: A facet is just another word for a small multiple. So that line of code creates a different small multiple panel for each level of the categorical column, stock.
This shows how powerful the Seaborn objects system is. Small multiple charts are a powerful analytical tool, but they’re often difficult to create with other visualization toolkits. With Seaborn objects, it’s as simple as adding one additional method call. And there’s actually a lot more you can do with faceting and pair plotting. I’ve really just scratched the surface.
Continued Learning
Hopefully, this tutorial gave you a solid overview of the new Seaborn objects visualization system. We covered the high-level syntax, and I showed you some basic examples. But there’s still a lot more to learn about Seaborn objects, and I’m going to publish separate tutorials on:
- how to modify labels and titles
- how to create small multiple charts (i.e., faceting)
- how to create pair plots
- how to create multi-panel plots
- how to modify the “theme” properties of Seaborn objects plots
- and more
To learn more, keep an eye on both the Anaconda blog and the Sharp Sight blog for more tutorials about Seaborn and data visualization in Python. Thanks for reading!
About the Author
Joshua Ebner
Sharp Sight
Joshua Ebner is the Founder and Chief Data Scientist at Sharp Sight, a data science training company. Before founding Sharp Sight, Joshua did data science and analytics at Apple, Bank of America, and other Fortune 500 firms. He has a degree in physics from Cornell University.