Hello everyone! Today we’re going to talk about a topic that many data analysis beginners find daunting: Pandas! Don’t be scared by the name: it’s not a real panda 🐼 but one of the most powerful data processing tools in Python.
Think Pandas is hard to learn? No worries! Follow along, and in 30 minutes you’ll go from knowing nothing to knowing enough to get real work done. Let’s start this data analysis journey! ✨
1. Getting to Know Pandas: The Swiss Army Knife of Data Analysis 🔪
First, we need to import this powerful tool:
import pandas as pd
import numpy as np
# Create a simple DataFrame
df = pd.DataFrame({
    'Name': ['Xiao Ming', 'Xiao Hong', 'Xiao Hua', 'Xiao Li'],
    'Age': [18, 22, 20, 19],
    'Score': [85, 92, 78, 95]
})
print(df)
Look! This is a basic data table (DataFrame). Doesn’t it look a lot like an Excel spreadsheet? That’s right: a DataFrame organizes data into labeled rows and columns, so working with it feels a lot like working in Excel!
2. Basic Data Operations: CRUD
Viewing Data
# View the first few rows
print(df.head()) # Default shows the first 5 rows
# View basic information
df.info() # Prints data types and non-null counts itself; it returns None, so wrapping it in print() would add a stray "None"
# View statistical summary
print(df.describe()) # Shows statistical information for numeric columns
Tip: These are the most commonly used “view data” methods in daily data analysis, so I suggest memorizing them!
Selecting Data
# Select a single column
print(df['Age'])
# Select multiple columns
print(df[['Name', 'Score']])
# Conditional filtering
print(df[df['Score'] > 80]) # Filter students with scores greater than 80
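Filters can also be combined. Here is a minimal sketch that rebuilds the same sample table so it runs on its own, then joins two conditions with & and uses .loc to pick rows and a column in one step. The variable names high_young and names_over_80 are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Xiao Ming', 'Xiao Hong', 'Xiao Hua', 'Xiao Li'],
    'Age': [18, 22, 20, 19],
    'Score': [85, 92, 78, 95]
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
high_young = df[(df['Score'] > 80) & (df['Age'] < 20)]
print(high_young)

# .loc selects rows by condition and a column by name in a single step
names_over_80 = df.loc[df['Score'] > 80, 'Name']
print(names_over_80.tolist())
```

Python’s plain and / or keywords do not work on whole columns, which is why the element-wise & and | operators are used here.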
3. Data Processing: Making Data Obey
Adding New Columns
# Add a pass status column
df['Passed'] = df['Score'] >= 60
print(df)
# Add a rating column
df['Rating'] = df['Score'].apply(lambda x: 'A' if x >= 90 else 'B' if x >= 80 else 'C')
print(df)
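When the rating rules grow, a chained lambda gets hard to read. One possible alternative, sketched here on a fresh copy of the sample scores, is np.select from NumPy, which takes a list of conditions and a matching list of results. The names conditions and choices are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Xiao Ming', 'Xiao Hong', 'Xiao Hua', 'Xiao Li'],
    'Score': [85, 92, 78, 95]
})

# Conditions are checked in order; for each row the first match wins
conditions = [df['Score'] >= 90, df['Score'] >= 80]
choices = ['A', 'B']

# Rows that match no condition get the default value
df['Rating'] = np.select(conditions, choices, default='C')
print(df)
```

This gives the same ratings as the lambda version for these four students, and it works on the whole column at once instead of calling a Python function per row.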
Data Statistics
# Calculate average score
print(f"Average Score: {df['Score'].mean():.2f}")
# Group by rating and calculate statistics
print(df.groupby('Rating')['Score'].agg(['count', 'mean']))
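If you want friendlier column names in the result, groupby also supports named aggregation. The sketch below assumes the Rating column has already been filled in as above; the output names students, average, and best are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Xiao Ming', 'Xiao Hong', 'Xiao Hua', 'Xiao Li'],
    'Score': [85, 92, 78, 95],
    'Rating': ['B', 'A', 'C', 'A']
})

# Each keyword is an output column: (source column, aggregation function)
summary = df.groupby('Rating').agg(
    students=('Score', 'count'),
    average=('Score', 'mean'),
    best=('Score', 'max'),
)
print(summary)
```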
4. Data Cleaning: Handling Dirty Data 🧹
# Handle missing values
df['Score'] = df['Score'].fillna(df['Score'].mean()) # Fill missing values with the average
# Remove duplicate rows
df = df.drop_duplicates()
# Reset index
df = df.reset_index(drop=True)
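The sample table has no missing values or duplicates, so the steps above leave it unchanged. To see them do something, here is a small sketch on a hypothetical messy table (the dirty data is invented for this example) with one missing score and one repeated row.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: one missing score and one fully duplicated row
dirty = pd.DataFrame({
    'Name': ['Xiao Ming', 'Xiao Hong', 'Xiao Hong', 'Xiao Hua'],
    'Score': [85.0, 92.0, 92.0, np.nan]
})

# Count missing values per column before fixing anything
print(dirty.isna().sum())

# Drop duplicates first so the repeated row does not skew the mean
clean = dirty.drop_duplicates().reset_index(drop=True)

# Fill the remaining gap with the average of the known scores
clean['Score'] = clean['Score'].fillna(clean['Score'].mean())
print(clean)
```

Removing duplicates before filling is a choice made in this sketch: filling first would compute the average with the repeated 92 counted twice.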
5. Practical Tips: Efficiency Hacks 💡
Data Sorting
# Sort by score in descending order
df_sorted = df.sort_values('Score', ascending=False)
print(df_sorted)
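Sorting can use more than one column, which helps when scores tie. In this sketch the sample scores are tweaked (Xiao Hua’s score is changed to 85, purely as an assumption for illustration) so that a tie exists, and ties are broken by age.

```python
import pandas as pd

# Sample table with one score changed to create a tie (illustrative only)
df = pd.DataFrame({
    'Name': ['Xiao Ming', 'Xiao Hong', 'Xiao Hua', 'Xiao Li'],
    'Age': [18, 22, 20, 19],
    'Score': [85, 92, 85, 95]
})

# Sort by Score from high to low, then by Age from low to high for ties
ranked = df.sort_values(['Score', 'Age'], ascending=[False, True]).reset_index(drop=True)
print(ranked)

# nlargest is a shortcut when only the top few rows are needed
top2 = df.nlargest(2, 'Score')
print(top2)
```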
Data Merging
# Create another DataFrame
df2 = pd.DataFrame({
    'Name': ['Xiao Ming', 'Xiao Hong'],
    'Sports Score': [92, 88]
})
# Merge data
df_merged = pd.merge(df, df2, on='Name', how='left')
print(df_merged)
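The how argument decides which rows survive a merge. The sketch below, run on standalone copies of the two tables, compares a left merge with an inner merge and turns on indicator=True to show where each row came from. The variable names left and inner are just for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Xiao Ming', 'Xiao Hong', 'Xiao Hua', 'Xiao Li'],
    'Score': [85, 92, 78, 95]
})
df2 = pd.DataFrame({
    'Name': ['Xiao Ming', 'Xiao Hong'],
    'Sports Score': [92, 88]
})

# Left merge keeps every row of df; students missing from df2 get NaN
# indicator=True adds a _merge column: 'both' or 'left_only'
left = pd.merge(df, df2, on='Name', how='left', indicator=True)
print(left)

# Inner merge keeps only the students present in both tables
inner = pd.merge(df, df2, on='Name', how='inner')
print(inner)
```

After a left merge it is worth checking for NaN in the new columns, since unmatched rows are kept rather than dropped.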
Mini Project: Score Analysis System
Let’s do a mini project using what we’ve learned:
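def analyze_scores(df):
    # Basic statistics
    print(f"Class Average: {df['Score'].mean():.2f}")
    print(f"Highest Score: {df['Score'].max()}")
    print(f"Lowest Score: {df['Score'].min()}")
    # Score distribution: right=False makes each bin include its lower edge,
    # so a score of 60 counts as 'Pass' (matching the Passed column earlier);
    # the last edge is 101 so that a perfect 100 still lands in 'Outstanding'
    bins = [0, 60, 70, 80, 90, 101]
    labels = ['Fail', 'Pass', 'Good', 'Excellent', 'Outstanding']
    df['Score Level'] = pd.cut(df['Score'], bins=bins, labels=labels, right=False)
    # Count of each level, listed from 'Fail' to 'Outstanding'
    grade_counts = df['Score Level'].value_counts().sort_index()
    print("\nScore Distribution:")
    print(grade_counts)
    return df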
# Run analysis
df = analyze_scores(df)
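One detail worth checking in pd.cut is which side of each bin is closed. This minimal sketch uses a few hypothetical boundary scores (not from the class data) to compare the default right=True behavior with right=False.

```python
import pandas as pd

# Hypothetical boundary scores chosen to land exactly on bin edges
scores = pd.Series([0, 59, 60, 90, 100])
labels = ['Fail', 'Pass', 'Good', 'Excellent', 'Outstanding']

# Default (right=True): bins are (0, 60], (60, 70], ...
# so 60 is labeled 'Fail' and 0 falls outside every bin (NaN)
default_levels = pd.cut(scores, bins=[0, 60, 70, 80, 90, 100], labels=labels)

# right=False: bins are [0, 60), [60, 70), ...
# the last edge is 101 so that 100 is still inside the final bin
left_closed = pd.cut(scores, bins=[0, 60, 70, 80, 90, 101], labels=labels, right=False)

print(pd.DataFrame({
    'Score': scores,
    'Default': default_levels,
    'right=False': left_closed
}))
```

Which convention is right depends on the grading rule you want; the point is to test the edge values instead of assuming.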
Summary & Suggestions
- The operations covered here are the ones you will use most often in Pandas; mastering them is enough for everyday data analysis.
- Practice often, especially data filtering and grouped statistics, as these come up the most.
- You can start with small projects, like analyzing your own spending records or study scores.
Data analysis isn’t hard; the hard part is getting started! Open Python now and type out the code from this article! 💪
Bonus Tips
- Make good use of Tab completion: in an interactive environment such as Jupyter or IPython, type df. and press the Tab key to see all available methods
- Remember that df.head(), df.info(), and df.describe() are the three most commonly used methods for viewing data
- Use df.groupby() frequently; it is one of the core operations in data analysis
Alright, that’s today’s Pandas introductory tutorial! If you found it helpful, remember to like and save it! If you have any questions, feel free to discuss in the comments, and let’s improve together!