100 Questions on Python Data Analysis Part 2 (11-20)

import pandas as pd
import numpy as np

Date: May 8, 2025

Pandas and NumPy are the cornerstones of Python data analysis. Combined with SciPy, Seaborn, and Matplotlib, they provide a complete workflow from numerical computation to visualization. This article compiles and answers the 11th to 20th common questions, covering NumPy array operations, broadcasting, linear algebra, Seaborn visualization, SciPy optimization and integration, descriptive statistics, and Pandas time series functionality, with code examples to help readers master the core techniques of Python data analysis.

Questions and Answers

11. How to install and import the NumPy library?

NumPy is a Python library for efficient numerical computation and can be installed via package managers:

pip install numpy

or using Conda:

conda install numpy

After installation, import it as follows:

import numpy as np

Using np as an alias makes it easier to call NumPy’s array operations and mathematical functions.
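
A quick way to confirm the installation is to print the installed version and run a trivial computation. This is only a minimal sanity check; the version string depends on your environment.

```python
import numpy as np

# Print the installed version to confirm that the import succeeded.
print(np.__version__)

# A trivial computation as a smoke test: 0 + 1 + 2.
total = np.arange(3).sum()
print(total)  # Output: 3
```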

12. How to create NumPy arrays?

NumPy provides several methods to create arrays:

# Convert list to array
a = [1, 2, 3]
a = np.array(a)
print(a)  # Output: [1 2 3]

# Create a 3x4 array of zeros
zero_numpy = np.zeros((3, 4))
print(zero_numpy)
# Output:
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  [0. 0. 0. 0.]]

# Create a 3x4 array of ones
one_numpy = np.ones((3, 4))
print(one_numpy)
# Output:
# [[1. 1. 1. 1.]
#  [1. 1. 1. 1.]
#  [1. 1. 1. 1.]]

# Create an array with a range (0 to 15, step 4)
arange_array = np.arange(0, 16, 4)
print(arange_array)  # Output: [ 0  4  8 12]

# Create a 2x3 random array (uniform distribution between 0-1)
random_array = np.random.rand(2, 3)
print(random_array)
# Example output:
# [[0.30586899 0.42824434 0.47621228]
#  [0.15227711 0.06399198 0.74924127]]

array() converts a list to an array.
zeros() and ones() create arrays of specified shapes.
arange() generates a range of values, and random.rand() generates random values.
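
A few other creation functions are worth knowing alongside the ones above. The sketch below is an illustrative addition, not part of the original list: it uses linspace, full, eye, and the seeded random generator default_rng, which makes random output reproducible.

```python
import numpy as np

# 5 evenly spaced values from 0 to 1, both endpoints included
lin = np.linspace(0, 1, 5)
print(lin)  # Output: [0.   0.25 0.5  0.75 1.  ]

# 2x2 array filled with a constant value
full = np.full((2, 2), 7)
print(full)
# Output:
# [[7 7]
#  [7 7]]

# 3x3 identity matrix
eye = np.eye(3)

# Seeded generator: the same seed reproduces the same random values
rng = np.random.default_rng(seed=42)
random_array = rng.random((2, 3))  # uniform on [0, 1)
print(random_array.shape)  # Output: (2, 3)
```

Unlike arange, linspace takes the number of points rather than the step, which avoids floating-point surprises at the end of the range.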

13. How to perform element-wise operations on NumPy arrays?

NumPy arrays support element-wise operations, which differ from the concatenation behavior of Python lists:

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
print(a + b)  # Output: [1, 2, 3, 4, 5, 6, 7, 8]

array_a = np.array(a)
array_b = np.array(b)
# Element-wise addition, subtraction, multiplication, and division
add = array_a + array_b
print(add)  # Output: [ 6  8 10 12]
sub = array_a - array_b
mul = array_a * array_b
div = array_a / array_b

# Square, square root, and exponential operations
square = np.square(array_b)
sqrt = np.sqrt(array_b)
exp = np.exp(array_b)
print(exp)  # Output: [148.4131591  403.42879349 1096.63315843 2980.95798704]

# Comparison operations
print(array_b > 6)  # Output: [False False  True  True]

For lists, + is concatenation, while for NumPy arrays, + is element-wise addition.
Functions like square, sqrt, and exp operate element-wise, and comparison operations return boolean arrays.
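
The boolean arrays returned by comparisons are most useful as masks. The following sketch, which reuses the values of array_b from above, shows filtering, counting, and conditional replacement with np.where:

```python
import numpy as np

array_b = np.array([5, 6, 7, 8])

# Use the boolean array as a mask to keep only matching elements
mask = array_b > 6
print(array_b[mask])  # Output: [7 8]

# True counts as 1, so sum() counts the matches
print(mask.sum())  # Output: 2

# np.where chooses element-wise between two alternatives
capped = np.where(array_b > 6, 6, array_b)
print(capped)  # Output: [5 6 6 6]

# True division always yields floats, even for integer inputs
halves = array_b / 2
print(halves.dtype)  # Output: float64
```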

14. How to perform broadcasting in NumPy?

Broadcasting stretches a smaller array to match the shape of a larger one so that element-wise operations can proceed. The rules are: shapes are compared starting from the last (rightmost) dimension; two dimensions are compatible if they are equal or if one of them is 1; if the arrays have different numbers of dimensions, the shorter shape is padded with 1s on the left.

# Scalar broadcasting
a = np.array([1, 2, 3])  # Shape: (3,)
b = 2  # Scalar, shape: ()
print(a + b)  # Output: [3 4 5]

# Array broadcasting
a = np.array([[1, 2, 3], [4, 5, 6]])  # Shape: (2, 3)
b = np.array([10, 20, 30])  # Shape: (3,)
print(a + b)
# Output:
# [[11 22 33]
#  [14 25 36]]

# Multi-dimensional broadcasting
a = np.array([[1], [2], [3]])  # Shape: (3, 1)
b = np.array([[10, 20, 30]])  # Shape: (1, 3)
print(a + b)
# Output:
# [[11 21 31]
#  [12 22 32]
#  [13 23 33]]

Scalar b=2 is extended to (3,) and added to a.
b=(3,) is replicated to (2, 3) and added to a=(2, 3).
a=(3, 1) and b=(1, 3) are extended to (3, 3) for operations.
Incompatible dimensions (e.g., (3,) and (2,)) will raise a ValueError.
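
To check whether two shapes are compatible without building the arrays, np.broadcast_shapes can be used. The sketch below assumes NumPy 1.20 or later (where that function is available) and also shows the error raised for incompatible shapes:

```python
import numpy as np

# Resulting shape after broadcasting, computed from the shapes alone
shape_1 = np.broadcast_shapes((2, 3), (3,))
shape_2 = np.broadcast_shapes((3, 1), (1, 3))
print(shape_1)  # Output: (2, 3)
print(shape_2)  # Output: (3, 3)

# Shapes (3,) and (2,) differ in the last dimension and neither is 1
incompatible = False
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as exc:
    incompatible = True
    print("Broadcast error:", exc)
```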

15. How to perform linear algebra operations on NumPy arrays?

The linalg module of NumPy supports linear algebra operations:

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Element-wise multiplication
print(A * B)
# Output:
# [[ 5 12]
#  [21 32]]
# Matrix multiplication
print(np.dot(A, B))
print(A @ B)  # Equivalent to dot
# Output:
# [[19 22]
#  [43 50]]

# Solve linear equations Ax = b
A = np.array([[2, 3], [1, -1]])
b = np.array([5, 1])
print(np.linalg.solve(A, b))  # Output: [1.6 0.6]

# Eigenvalues and eigenvectors
A = np.array([[4, 2], [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)  # Output: [5. 2.]
print(eigenvectors)
# Output:
# [[ 0.89442719 -0.70710678]
#  [ 0.4472136   0.70710678]]

dot or @ performs matrix multiplication, while * is for element-wise multiplication.
solve solves linear equations, such as 2x₁ + 3x₂ = 5, x₁ - x₂ = 1.
eig returns eigenvalues and eigenvectors.
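
Results from solve and eig are easiest to trust when substituted back into the defining equations. A minimal verification sketch follows; the name M is used here only to avoid reusing A for two different matrices:

```python
import numpy as np

# Check the solution of Ax = b by substituting it back
A = np.array([[2, 3], [1, -1]])
b = np.array([5, 1])
x = np.linalg.solve(A, b)
solve_ok = np.allclose(A @ x, b)
print(solve_ok)  # Output: True

# Check the eigen-decomposition: each column v_i satisfies M @ v_i = lambda_i * v_i
M = np.array([[4, 2], [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(M)
eig_ok = np.allclose(M @ eigenvectors, eigenvectors * eigenvalues)
print(eig_ok)  # Output: True
```

Note that eig returns the eigenvectors as columns, so eigenvectors[:, i] pairs with eigenvalues[i]. The sign of each eigenvector and the order of the eigenvalues may vary between platforms.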

16. How to use Seaborn for data visualization?

Seaborn is a high-level visualization library based on Matplotlib, providing a beautiful and concise plotting interface. Installation:

pip install seaborn

Example:

import seaborn as sns
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
sns.barplot(x=x, y=y)
plt.title('Example Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

barplot creates bar charts. Other plot types include scatterplot (scatter plots) and histplot (histograms). Seaborn has no pie chart function, so pie charts are drawn with Matplotlib's pie.
plt methods set titles and axis labels to enhance plot readability.
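
Since pie charts go through Matplotlib directly, a minimal sketch might look like the following. The labels, sizes, and output file name are made-up illustration values, and the Agg backend is selected only so the script also runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

# Hypothetical category shares, for illustration only
labels = ["A", "B", "C"]
sizes = [50, 30, 20]

fig, ax = plt.subplots()
ax.pie(sizes, labels=labels, autopct="%1.0f%%")  # autopct prints the percentage on each wedge
ax.set_title("Example Pie Chart")
fig.savefig("pie_example.png")  # hypothetical output file name
```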

17. How to install and use SciPy?

SciPy is a scientific computing library that supports optimization, integration, signal processing, etc. Installation:

pip install scipy

Import:

import scipy

Submodules of SciPy (such as optimize and integrate) need to be imported separately for use.
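
As a small sketch of the submodule import pattern, the following imports optimize explicitly and uses its brentq root finder on a made-up function to locate the square root of 2:

```python
import scipy
from scipy import optimize  # import the submodule explicitly

print(scipy.__version__)  # installed version; depends on your environment

# Find the root of x^2 - 2 on the interval [0, 2], where the function changes sign
root = optimize.brentq(lambda x: x**2 - 2, 0, 2)
print(root)  # close to 1.41421356
```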

18. How to use SciPy for optimization and integration?

Optimization:

from scipy.optimize import minimize
# Unconstrained optimization
def objective_function(x):
    return x**2 + 4*x + 4
initial_guess = 0
result = minimize(objective_function, initial_guess)
print(result.x, result.fun)  # Approximate output: [-2.] 0.0 (exact digits vary)

# Constrained optimization
def objective(x):
    return x[0]**2 + x[1]**2
constraint = {'type': 'ineq', 'fun': lambda x: x[0] + x[1] - 1}
x0 = [1, 1]
result = minimize(objective, x0, constraints=constraint, method='SLSQP')
print("Optimal solution:", result.x)  # Example output: [0.5 0.5]
print("Optimal value:", result.fun)  # Example output: 0.5

minimize searches for a local minimum of the objective function, starting from initial_guess.
Constrained optimization uses the constraints parameter, where 'ineq' indicates inequality constraints (e.g., x₀ + x₁ ≥ 1).
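
Besides constraints, minimize also accepts simple box limits through the bounds parameter, and the returned result carries a success flag that is worth checking. The function below is a hypothetical example whose unconstrained minimum (x = 3) lies outside the allowed interval:

```python
from scipy.optimize import minimize

# Hypothetical objective with its unconstrained minimum at x = 3
def shifted_parabola(x):
    return (x[0] - 3) ** 2

# Restrict x to the interval [0, 2]; the optimizer should stop at the upper bound
result = minimize(shifted_parabola, x0=[0.0], bounds=[(0, 2)])

print(result.success)  # True if the optimizer reports convergence
print(result.x)        # close to [2.]
print(result.fun)      # close to 1.0
```

Checking result.success (and result.message on failure) helps catch cases where the optimizer stopped without converging.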

Integration:

from scipy.integrate import quad
def integrand(x):
    return x * 2
result, error = quad(integrand, 0, 2)
print("Integral Value:", result)  # Output: 4.0
print("Estimate Error:", error)  # Output: small error value

quad computes definite integrals, with 0 and 2 as the limits, returning the integral value and error estimate.
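
quad also handles infinite limits and integrands with extra parameters. The sketch below checks the Gaussian integral against its known value, the square root of pi, and passes an exponent through args; the function power is a made-up helper for illustration:

```python
import numpy as np
from scipy.integrate import quad

# Infinite limits: the integral of exp(-x^2) over the real line equals sqrt(pi)
gauss_value, gauss_error = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(np.isclose(gauss_value, np.sqrt(np.pi)))  # Output: True

# Extra parameters are passed to the integrand through args
def power(x, n):
    return x ** n

power_value, _ = quad(power, 0, 2, args=(2,))
print(power_value)  # close to 8/3
```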

19. How to perform descriptive statistics?

NumPy provides a rich set of statistical functions suitable for multi-dimensional data:

data = np.array([
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    [[13, 14, 15], [16, 17, 18], [19, 20, 21], [22, 23, 24]]
])
print("Data shape:", data.shape)  # Output: (2, 4, 3)
# Global statistics
print("Total:", np.sum(data))
print("Mean:", np.mean(data))
print("Standard deviation:", np.std(data))
print("Minimum:", np.min(data))
print("Maximum:", np.max(data))
# Statistics along axes
sum_by_feature = np.sum(data, axis=(0, 1))
print("Sum by feature:", sum_by_feature)  # Output: [92 100 108]
mean_by_batch = np.mean(data, axis=1)
print("Mean by batch:\n", mean_by_batch)
max_by_sample = np.max(data, axis=0)
print("Max by sample:\n", max_by_sample)
# Percentiles
print("25th percentile:", np.percentile(data, 25))
print("Median:", np.median(data))
print("75th percentile:", np.percentile(data, 75))
median_by_feature = np.median(data, axis=(0, 1))
print("Median by feature:", median_by_feature)
var_by_feature = np.var(data, axis=(0, 1))
print("Variance by feature:", var_by_feature)
# Covariance and correlation coefficients
flattened = data.reshape(-1, data.shape[-1])
cov_matrix = np.cov(flattened.T)
print("Covariance matrix:\n", cov_matrix)
corr_matrix = np.corrcoef(flattened.T)
print("Correlation coefficient matrix:\n", corr_matrix)

sum, mean, etc., calculate global or axis statistics, where axis=(0, 1) indicates summing along batch and sample axes.
percentile returns an arbitrary percentile and median returns the 50th percentile, while cov and corrcoef measure how the variables (here, the three features) vary together.
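
One detail that is easy to miss: np.var and np.std divide by N by default (population statistics), whereas np.cov divides by N - 1 (sample statistics). The sketch below rebuilds the same values with arange and shows that passing ddof=1 makes the two agree:

```python
import numpy as np

# Same values as the 2x4x3 array above, flattened to 8 rows x 3 features
data = np.arange(1, 25).reshape(2, 4, 3)
flattened = data.reshape(-1, 3)

pop_var = np.var(flattened, axis=0)             # divides by N
sample_var = np.var(flattened, axis=0, ddof=1)  # divides by N - 1
cov_diag = np.diag(np.cov(flattened.T))         # np.cov uses N - 1 by default

print(pop_var)     # Output: [47.25 47.25 47.25]
print(sample_var)  # Output: [54. 54. 54.]
print(np.allclose(sample_var, cov_diag))  # Output: True
```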

20. How to use Pandas’ time series functionality?

Pandas provides powerful time series processing capabilities:

# Create a time series
date_rng = pd.date_range(start='2025-01-01', end='2025-01-10', freq='D')
ts = pd.Series(data=range(len(date_rng)), index=date_rng)
print(ts)
# Indexing and slicing
print(ts['2025-01-03'])  # Output: 2
print(ts['2025-01-05':'2025-01-08'])
# Resampling
weekly_mean = ts.resample('W').mean()
print(weekly_mean)
# Rolling window
rolling_mean = ts.rolling(window=3).mean()
print(rolling_mean)

date_range generates date indices, with freq='D' indicating daily frequency.
A Series with a date index supports date-based indexing, slicing, resampling (resample), and rolling window (rolling) operations.
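
To make the resampling and rolling results concrete, the sketch below rebuilds the same series and inspects them. With 'W', pandas groups by weeks ending on Sunday, so the ten days fall into two bins; min_periods is an optional rolling parameter, added here to show how to avoid the leading missing values.

```python
import pandas as pd

date_rng = pd.date_range(start="2025-01-01", end="2025-01-10", freq="D")
ts = pd.Series(data=range(len(date_rng)), index=date_rng)

# Weekly bins end on Sunday: Jan 1-5 (values 0-4) and Jan 6-10 (values 5-9)
weekly_mean = ts.resample("W").mean()
print(weekly_mean.tolist())  # Output: [2.0, 7.0]

# A 3-day window needs 3 observations, so the first two results are NaN
rolling_full = ts.rolling(window=3).mean()
print(rolling_full.isna().sum())  # Output: 2

# min_periods=1 computes the mean from whatever data is available so far
rolling_partial = ts.rolling(window=3, min_periods=1).mean()
print(rolling_partial.isna().sum())  # Output: 0
```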

Conclusion

This article answered 10 core questions about NumPy, SciPy, Seaborn, and Pandas, covering array creation, operations, broadcasting, linear algebra, visualization, optimization, integration, statistics, and time series processing. The code examples demonstrate the core capabilities of Python data analysis. Part 3 will be released soon and will continue to explore advanced techniques in data analysis. Stay tuned.