Introduction to Python Data Visualization

What is data visualization? Data visualization presents data so that it reflects the underlying situation clearly, lets readers take it in efficiently, and highlights the patterns and key factors behind the data. If you use Python for data visualization, it is recommended to master the following four packages:

Pandas, Matplotlib, Seaborn, Pyecharts

Mastering these four packages is sufficient for most visualization work. The article is quite long, so please read patiently. The code examples are best typed out and run yourself, which will help you grasp the concepts better. Let's learn together~

01. Pandas

Official website: https://www.pypandas.cn/

Pandas is the core data analysis support library for Python, providing fast, flexible, and clear data structures aimed at handling relational and labeled data simply and intuitively. It is widely used in the field of data analysis and is suitable for handling tabular data similar to Excel tables, as well as ordered and unordered time series data.

The main data structures of Pandas are Series (one-dimensional data) and DataFrame (two-dimensional data). These two data structures are sufficient to handle most typical use cases in finance, statistics, social sciences, engineering, etc. The data analysis process using Pandas includes stages of data organization and cleaning, data analysis and modeling, data visualization, and tabulation.

  • Flexible grouping: groupby splits data into groups;
  • Intuitive merging: merge joins datasets;
  • Flexible reshaping: reshape data.
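
As a rough illustration of the two data structures and the three operations listed above, the sketch below uses two small made-up tables. The column names and values are assumptions invented for this example, and pivot is used as just one of several reshaping methods.

```python
import pandas as pd

# Hypothetical sales records (values invented for illustration)
sales = pd.DataFrame({
    "city": ["Beijing", "Beijing", "Shanghai", "Shanghai"],
    "year": [2020, 2021, 2020, 2021],
    "amount": [10, 12, 8, 15],
})
regions = pd.DataFrame({"city": ["Beijing", "Shanghai"],
                        "region": ["North", "East"]})

# A single column of a DataFrame is a Series
amount = sales["amount"]

# Grouping: total amount per city
totals = sales.groupby("city")["amount"].sum()

# Merging: attach the region of each city
merged = sales.merge(regions, on="city", how="left")

# Reshaping: one row per city, one column per year
wide = sales.pivot(index="city", columns="year", values="amount")
```

Each result is again a Series or a DataFrame, so it can be passed straight to the plotting methods shown in the rest of this section.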


Pandas can handle data cleaning and can also create plots, often with a single line of code. The plotting methods are explained in the code comments below.

# Import the pandas library
import pandas as pd

# Generate a Series
s=pd.Series([1,3,3,4], index=list('ABCD'))

# If no chart type is specified in parentheses, a line chart is generated by default
s.plot()


# Bar chart
s.plot(kind='bar')


# Horizontal bar chart
s.plot.barh()


# Pie chart
s.plot.pie()


# Histogram
s.plot.hist()


# Density plot (requires SciPy)
import numpy as np
s=pd.Series(np.random.randn(1000))  # Generate a column of random numbers
s.plot.kde()
# s.plot.density() is an alias of s.plot.kde() and draws the same curve


# Scatter plot
import numpy as np
# Generate a DataFrame
df=pd.DataFrame(np.random.randn(1000,2), columns=['X1','Y'])
df.plot.scatter(x='X1',y='Y')


# Hexbin plot
df.plot.hexbin(x='X1',y='Y',gridsize=8)


# Box plot
df=pd.DataFrame(np.random.rand(10,2),columns=['A','B'])
df.plot.box()


# Area plot
df=pd.DataFrame(np.random.randint(10,size=(4,4)),
                 columns=list('ABCD'),
                 index=list('WXYZ'))

df.plot.area()
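
Each of the pandas plotting calls above returns a Matplotlib Axes object, so a chart can be adjusted and saved after it is drawn. The following is a minimal sketch of that idea; the title, axis labels, and output file name are assumptions chosen for the example.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

s = pd.Series([1, 3, 3, 4], index=list("ABCD"))

# Series.plot returns a Matplotlib Axes, which can be customized further
ax = s.plot(kind="bar", title="Example bar chart")
ax.set_xlabel("Category")
ax.set_ylabel("Value")

# Save the figure that owns the Axes (file name chosen for this example)
out = Path("pandas_bar.png")
ax.get_figure().savefig(out, dpi=100)
```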


02. Matplotlib

Official website: https://www.matplotlib.org.cn/

Matplotlib is a 2D plotting library for Python that generates publication-quality graphics in various hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, Python and IPython Shells, Jupyter notebooks, web application servers, and four graphical user interface toolkits.

Matplotlib tries to make easy things easier and hard things possible, with just a few lines of code to generate charts, histograms, power spectra, bar charts, error charts, scatter plots, etc.

For simple plotting, the pyplot module provides an interface similar to MATLAB, especially when used in conjunction with IPython. Advanced users can take complete control of line styles, font properties, axis properties, etc., through an object-oriented interface or a set of functions familiar to MATLAB users.


The following introduces how to use matplotlib. Besides plotting, matplotlib lets you adjust chart parameters to make charts more attractive. It is advisable to build templates for common chart types; then you only need to change the data source in the code to generate a chart, without adjusting parameters one by one.

# Import module
import matplotlib.pyplot as plt

# Set style (this style is named 'seaborn-v0_8-white' on Matplotlib 3.6 and later)
plt.style.use('seaborn-white')

# Font for Chinese text. Without this line, Chinese characters are not displayed in the chart
plt.rcParams['font.sans-serif'] =['SimHei']

Here we first import the matplotlib library and apply the seaborn-white chart style. You can inspect plt.style.available to see the available styles and choose one you like. If Chinese characters are not displayed in the chart, setting font.sans-serif to a font that contains them, such as SimHei above, fixes the problem.
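
Because the set of style names depends on the installed Matplotlib version, one cautious option is to look a style up in plt.style.available before applying it. The sketch below assumes two candidate names, the older 'seaborn-white' and the newer 'seaborn-v0_8-white', and falls back to the built-in default style if neither is present.

```python
import matplotlib.pyplot as plt

# Candidate names: newer Matplotlib releases use the 'seaborn-v0_8-' prefix
candidates = ["seaborn-v0_8-white", "seaborn-white"]

# Pick the first candidate that this installation provides, else the default
style = next((name for name in candidates if name in plt.style.available),
             "default")
plt.style.use(style)
```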

# Build a DataFrame
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'X':[1,3,5,7]})
df['Y']=df['X']**3
df
# Set the size of the image
plt.figure(facecolor='white',figsize=(9,6),dpi=100)
plt.plot(df['X'],df['Y'])

# Set the title of the image
plt.title('Line Chart',fontsize=15,color='b')
# Set the X and Y axis title size, color, and distance from the axes
plt.xlabel('X-axis',fontsize=10,color='r',labelpad=15)
plt.ylabel('Y-axis',fontsize=10,color='g',rotation=0,labelpad=15)

# Set the axis ranges
plt.xlim([1,8])
plt.ylim([1,350])
# plt.xticks([1,2,3,4]) only shows 1,2,3,4
# plt.yticks([50,150,250,300]) only shows 50,150,250,300
# Set grid lines for the image
plt.grid(color='r', linestyle='-.')

Here we first set the size of the image, much like choosing the size of the paper before drawing, and then set the title, axis labels, axis ranges, grid lines, etc.
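
One way to act on the template suggestion made at the start of this section is to wrap the settings in a small function, so that only the data source changes between charts. The function name line_chart and its parameters are hypothetical, introduced only for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def line_chart(data, x, y, title):
    """Hypothetical helper: draw one styled line chart from two columns."""
    fig, ax = plt.subplots(facecolor="white", figsize=(9, 6), dpi=100)
    ax.plot(data[x], data[y])
    ax.set_title(title, fontsize=15, color="b")
    ax.set_xlabel(x, fontsize=10, labelpad=15)
    ax.set_ylabel(y, fontsize=10, rotation=0, labelpad=15)
    ax.grid(color="r", linestyle="-.")
    return fig, ax


# Reuse the template with the DataFrame from the example above
df = pd.DataFrame({"X": [1, 3, 5, 7]})
df["Y"] = df["X"] ** 3
fig, ax = line_chart(df, "X", "Y", "Line Chart")
```

Returning the figure and axes keeps the template open to further adjustment by the caller.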


Sometimes, you may want to draw multiple lines on a single chart.

# Data for plotting multiple lines
import numpy as np
import matplotlib.pyplot as plt
x=np.array([1,3,5])
y1=x
y2=x * 10
y3=x * 20
y4=x * 30

Calling plt.plot again after an earlier plt.plot call adds another line to the same chart.

plt.figure(facecolor='white')
plt.plot(x,y1,label='A')
plt.plot(x,y2,label='B')
plt.plot(x,y3,label='C')
plt.plot(x,y4,label='D')
plt.legend()  # Show legend


Using the plt.subplots command can also produce the same chart.

# Object-oriented plotting: draw through the Axes object returned by plt.subplots
fig,ax=plt.subplots(facecolor='white')
ax.plot(x,y1,label='A')
ax.plot(x,y2,label='B')
ax.plot(x,y3,label='C')
ax.plot(x,y4,label='D')
ax.legend()  # Show legend


Multiple Graphs

Next, we introduce how to plot different lines at different positions within a single figure, using the plt.subplot command to choose the plotting position first. For example, plt.subplot(223) selects the third position in a 2×2 grid, counting row by row; the remaining plotting commands work as before.

plt.figure(facecolor='white',figsize=(9,6))
plt.subplot(221)
plt.plot(x,y1,label='A',color='r')
plt.xticks(fontsize=15)
plt.legend()  # Show legend
plt.subplot(222)
plt.plot(x,y2,label='B',color='y')
plt.xticks(fontsize=15)
plt.legend()  # Show legend
plt.subplot(223)
plt.plot(x,y3,label='C',color='b')
plt.xticks(fontsize=15)
plt.legend()  # Show legend
plt.subplot(224)
plt.plot(x,y4,label='D',color='g')
plt.xticks(fontsize=15)
plt.legend()  # Show legend
plt.tight_layout()  # Compact display


Instead of calling plt.subplot for each plotting area, you can create all the areas at once with plt.subplots and index the returned axs array, which is the object-oriented approach.

# Object-oriented multi-plotting
fig,axs=plt.subplots(2,2,facecolor='white',figsize=(9,6))
axs[0,0].plot(x,y1,label='A',color='r')
axs[0,1].plot(x,y2,label='B',color='y')
axs[1,0].plot(x,y3,label='C',color='b')
axs[1,1].plot(x,y4,label='D',color='g')
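
Since axs is a NumPy array of Axes objects, the four calls can also be written as a loop, which makes it easy to add a legend to every panel. This is only a sketch of that variation; the dictionary and color list are assumptions that restate the data used above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 3, 5])
series = {"A": x, "B": x * 10, "C": x * 20, "D": x * 30}
colors = ["r", "y", "b", "g"]

fig, axs = plt.subplots(2, 2, facecolor="white", figsize=(9, 6))

# axs is a 2x2 array of Axes; axs.flat visits the panels row by row
for ax, (name, y), color in zip(axs.flat, series.items(), colors):
    ax.plot(x, y, label=name, color=color)
    ax.legend()  # Show legend in each panel

fig.tight_layout()  # Compact display
```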


Sometimes several charts need to share a coordinate axis. Pass sharex='all' to plt.subplots to share the X-axis.

# sharex='all' to share X-axis
fig,axs=plt.subplots(4,1,facecolor='white', figsize=(9,6), sharex='all')
axs[0].plot(x,y1,label='A',color='r')
axs[1].plot(x,y2,label='B',color='y')
axs[2].plot(x,y3,label='C',color='b')
axs[3].plot(x,y4,label='D',color='g')
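
To see what sharing means in practice, the short sketch below changes the X range on the first panel only and then reads the range back from the last one. The range 0 to 6 is an arbitrary value chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 3, 5])
fig, axs = plt.subplots(4, 1, facecolor="white", figsize=(9, 6), sharex="all")
for ax, factor in zip(axs, [1, 10, 20, 30]):
    ax.plot(x, x * factor)

# Changing the X range on one panel updates every panel that shares the axis
axs[0].set_xlim(0, 6)
shared_range = axs[3].get_xlim()
```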


Setting Global Variables

Use plt.rcParams to set global parameters, including minus-sign display, Chinese display, background color, title size, axis font size, line styles, etc.

# Import module
import matplotlib.pyplot as plt

# Set style (this style is named 'seaborn-v0_8-white' on Matplotlib 3.6 and later)
plt.style.use('seaborn-white')

# Set global variables
plt.rcParams['axes.unicode_minus'] = False  # Display minus signs correctly with non-default fonts
plt.rcParams['font.sans-serif'] =['SimHei']  # Chinese display
plt.rcParams['figure.facecolor'] = 'b'  # Set figure background color (blue)
plt.rcParams['axes.facecolor'] = (0.8,0.9,0.8)  # Set plot-area color as an RGB tuple
plt.rcParams['axes.titlesize'] = 20  # Set title size
plt.rcParams['axes.labelsize'] = 20  # Set axis label size
plt.rcParams['xtick.labelsize'] = 20  # Set X-axis tick label size
plt.rcParams['ytick.labelsize'] = 20  # Set Y-axis tick label size
plt.rcParams['lines.linestyle'] = '-.'  # Set line style

plt.plot(x,y1,label='A')
plt.plot(x,y2,label='B')
plt.plot(x,y3,label='C')
plt.plot(x,y4,label='D')
plt.title('Line Chart')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()  # Show legend

The chart produced by these global settings is, in my personal view, not very attractive. The settings can be explored further to create better-looking charts.
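
Settings written to plt.rcParams stay in effect for every later chart in the session. When a setting should apply to one chart only, plt.rc_context offers a temporary alternative; the sketch below is one possible use, with the two parameters picked arbitrarily from the list above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 3, 5])
before = plt.rcParams["lines.linestyle"]

# Settings passed to rc_context apply only inside the with-block
with plt.rc_context({"lines.linestyle": "-.", "axes.titlesize": 20}):
    fig, ax = plt.subplots()
    line, = ax.plot(x, x * 10, label="B")
    ax.set_title("Line Chart")
    ax.legend()  # Show legend

# Outside the block the previous value is restored
after = plt.rcParams["lines.linestyle"]
```

Calling plt.rcdefaults() is another option when all global settings should be reset at once.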


03. Seaborn

Official website: http://seaborn.pydata.org/

Seaborn is a Python data visualization library built on top of matplotlib and closely integrated with Pandas data structures, providing a high-level interface for drawing attractive and informative statistical graphics.

Seaborn can be used to explore data, and its plotting capabilities operate on data frames and arrays containing entire datasets, performing necessary semantic mapping and statistical aggregation internally to generate informative graphics. Its dataset-oriented declarative API allows focusing on the meaning of different elements of the plot rather than the details of how to draw them.

While matplotlib has a comprehensive and powerful API that allows changing almost any property of the graphics according to personal preferences, the combination of Seaborn’s high-level interface and matplotlib’s deep customizability makes it possible to quickly explore data and create graphics that can be customized into publication-quality final products.


Plotting Multi-line Graphs

A variable can be plotted as a stack of rows, one row per group, using the sns.FacetGrid command.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
rs = np.random.RandomState(1979)
x = rs.randn(500)
g = np.tile(list("ABCDEFGHIJ"), 50)
df = pd.DataFrame(dict(x=x, g=g))
m = df.g.map(ord)
df["x"] += m

pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)
g.map(sns.kdeplot, "x", bw_adjust=.5, clip_on=False, fill=True, alpha=1, linewidth=1.5)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw_adjust=.5)
g.refline(y=0, linewidth=2, linestyle="-", color=None, clip_on=False)
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")
g.figure.subplots_adjust(hspace=-.25)
g.set_titles("")
g.set(yticks=[], ylabel="")
g.despine(bottom=True, left=True)


Plotting Heatmaps

The magnitude of the values in a table can be shown with a heatmap, using the sns.heatmap command.

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

# Load the example flights dataset (long-form) and convert it to wide-form
flights_long = sns.load_dataset("flights")
flights = flights_long.pivot(index="month", columns="year", values="passengers")

# Draw a heatmap with the numeric values in each cell
f, ax = plt.subplots(figsize=(9, 6))
sns.heatmap(flights, annot=True, fmt="d", linewidths=.5, ax=ax)
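
The flights dataset is downloaded by sns.load_dataset, so the example needs a network connection. The same long-to-wide step can be tried offline with a small hand-made table, as in the sketch below; the passenger numbers are invented for illustration and are not taken from the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Long-form table: one row per (month, year) observation (invented values)
long_df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"],
    "year": [2020, 2021, 2020, 2021, 2020, 2021],
    "passengers": [112, 118, 115, 126, 132, 141],
})

# Wide-form table: months as rows, years as columns
wide = long_df.pivot(index="month", columns="year", values="passengers")
wide = wide.reindex(["Jan", "Feb", "Mar"])  # pivot sorts the index alphabetically

# Draw a heatmap with the integer value written in each cell
ax = sns.heatmap(wide, annot=True, fmt="d", linewidths=.5)
```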


04. Pyecharts

Official website: https://pyecharts.org/#/

Echarts is an open-source data visualization library developed by Baidu, recognized by many developers for its good interactivity and exquisite chart design. Python, being an expressive language, is well-suited for data processing. When data analysis meets data visualization, Pyecharts was born.

Pyecharts has a simple API that is smooth to use, supports chained calls, and includes more than 30 common chart types, covering a wide range of needs. It supports the mainstream notebook environments Jupyter Notebook and JupyterLab, and its highly flexible configuration options make it easy to create beautiful charts.

Pyecharts’ strong interactivity makes data presentation more vivid, and charts can be exported directly as HTML files, which makes the results easier to share and interact with.


Plotting Maps

Pyecharts offers a rich collection of chart types and supports chained calls. Below is an example that uses its map chart to display data geographically.

from pyecharts import options as opts
from pyecharts.charts import Map
from pyecharts.faker import Faker

c = (
    Map()
    .add("Vendor A", [list(z) for z in zip(Faker.provinces, Faker.values())], "china")
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Map-VisualMap (Segmented)"),
        visualmap_opts=opts.VisualMapOpts(max_=200, is_piecewise=True),
    )
    .render("map_visualmap_piecewise.html"))


Plotting Radar Charts

Use the Radar class to plot radar charts, which display multivariate data graphically.

from pyecharts import options as opts
from pyecharts.charts import Radar

v1 = [[4300, 10000, 28000, 35000, 50000, 19000]]
v2 = [[5000, 14000, 28000, 31000, 42000, 21000]]
c = (
    Radar()
    .add_schema(
        schema=[
            opts.RadarIndicatorItem(name="Sales", max_=6500),
            opts.RadarIndicatorItem(name="Management", max_=16000),
            opts.RadarIndicatorItem(name="Information Technology", max_=30000),
            opts.RadarIndicatorItem(name="Customer Service", max_=38000),
            opts.RadarIndicatorItem(name="Research and Development", max_=52000),
            opts.RadarIndicatorItem(name="Marketing", max_=25000),
        ]
    )
    .add("Budget Allocation", v1)
    .add("Actual Expenses", v2)
    .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
    .set_global_opts(
        legend_opts=opts.LegendOpts(selected_mode="single"),
        title_opts=opts.TitleOpts(title="Radar-Single Instance Mode"),
    )
    .render("radar_selected_mode.html"))


This article introduced how to plot charts with pandas, went further into the matplotlib library, including multi-plotting and setting global variables, and presented two more visualization libraries, Seaborn and Pyecharts. I believe this overview will give you a solid starting point for learning Python data visualization.

The End

For more content on Python data visualization, you can check out my new book “Comparing Excel, Easily Learning Python Data Visualization” (original price 89, full-color printing, now at 60% off).
