20 Common String Operations in Python: How Many Do You Master?

Strings are among the most commonly used data types in Python, yet many programmers rely on only a small subset of the available string operations. This article introduces 20 frequently used string operation techniques. Knowing them well makes everyday string-handling code shorter, clearer, and often faster.

1. Basic String Search and Replace (5 Operations)

1. find() vs index() — Finding the Position of a Substring

These two methods appear to have the same functionality, but there are key differences.

text = "Python is awesome, Python is powerful"

# find(): returns index if found, -1 if not found (does not raise an error)
pos1 = text.find("Python")
print(pos1)  # Output: 0

pos2 = text.find("Java")
print(pos2)  # Output: -1 (not found, returns -1)

# index(): returns index if found, raises an error if not found
pos3 = text.index("Python")
print(pos3)  # Output: 0

try:
    pos4 = text.index("Java")
except ValueError as e:
    print(e)  # Output: substring not found

# Find the position of the second occurrence
second_pos = text.find("Python", 1)  # Start searching from position 1
print(second_pos)  # Output: 19

Key Difference:

  • find() returns -1 if not found, so no exception handling is needed, but the result must be checked because -1 is also a valid index
  • index() raises ValueError if not found, requiring exception handling

Best Practice: Use find() when a missing substring is an expected case, and index() when a missing substring indicates a bug that should fail loudly.
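
The start argument of find() shown above can be used in a loop to collect every occurrence. The sketch below is one way to do it; find_all is a hypothetical helper name, not a built-in.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError("sub must not be empty")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match that was found
        start = text.find(sub, start + len(sub))
    return positions

text = "Python is awesome, Python is powerful"
print(find_all(text, "Python"))  # [0, 19]
print(find_all(text, "Java"))    # []
```

Because find() returns -1 instead of raising, the loop needs no exception handling.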

2. replace() — Replacing Substrings

text = "hello world, hello python"

# Basic replacement: replace all matches
result1 = text.replace("hello", "hi")
print(result1)
# Output: hi world, hi python

# Replace a specified number of times: only replace the first n occurrences
result2 = text.replace("hello", "hi", 1)  # Only replace the 1st occurrence
print(result2)
# Output: hi world, hello python

# Case-sensitive
text2 = "Hello world, hello python"
result3 = text2.replace("hello", "hi")
print(result3)
# Output: Hello world, hi python (the first H is not replaced)

Performance Pitfall:

# ❌ Bad practice: one full pass over the string for every character replaced
text = "abc" * 1000000  # 3 million characters
result = text
for old in "abc":
    result = result.replace(old, "x")  # Traverses the entire string each time

# ✅ Good practice: a single pass with a translation table
result = text.translate(str.maketrans("abc", "xxx"))  # One traversal does it

3. count() — Counting the Occurrences of a Substring

text = "the quick brown fox jumps over the lazy dog"

# Basic counting
count1 = text.count("the")
print(count1)  # Output: 2

# Count in a specified range (from index 5 up to, but not including, index 40)
count2 = text.count("the", 5, 40)
print(count2)  # Output: 1

# Count the frequency of different characters
stats = {}
for char in text:
    if char != ' ':
        stats[char] = stats.get(char, 0) + 1

print(stats)
# Output: {'t': 2, 'h': 2, 'e': 3, ...}

# More efficient way: use Counter
from collections import Counter
char_count = Counter(text.replace(" ", ""))
print(char_count.most_common(3))  # Output the 3 most common characters

Production Use Case:

# Count the occurrences of keywords in logs
log_text = """
ERROR: Database connection failed
WARNING: Memory usage high
ERROR: Timeout error
INFO: Server restarted
ERROR: Authentication failed
"""

error_count = log_text.count("ERROR")
warning_count = log_text.count("WARNING")
print(f"Error count: {error_count}, Warning count: {warning_count}")
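
One detail worth knowing is that count() only counts non-overlapping occurrences. If overlapping matches matter, a regex lookahead is one possible workaround; count_overlapping below is a hypothetical helper written for this illustration.

```python
import re

def count_overlapping(text, sub):
    """Count occurrences of sub in text, including overlapping ones."""
    # A lookahead matches at a position without consuming characters,
    # so the next search can start one character later
    return len(re.findall(f"(?={re.escape(sub)})", text))

print("aaaa".count("aa"))               # 2 (non-overlapping)
print(count_overlapping("aaaa", "aa"))  # 3 (positions 0, 1, 2)
```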

4. startswith() and endswith() — Prefix and Suffix Checking

filename = "document.pdf"
url = "https://www.example.com"

# Check suffix
if filename.endswith((".pdf", ".doc", ".docx")):
    print("This is a document file")

# Check prefix
if url.startswith(("http://", "https://")):
    print("This is a URL")

# Practical application: file filtering
import os

def get_python_files(directory):
    """Get all Python files in the directory"""
    python_files = []
    for file in os.listdir(directory):
        if file.endswith('.py'):
            python_files.append(file)
    return python_files

# More Pythonic way
def get_python_files_v2(directory):
    """More efficient version"""
    return [f for f in os.listdir(directory) if f.endswith('.py')]

Performance Comparison:

# ❌ Bad practice
if filename.endswith('.pdf') or filename.endswith('.doc'):
    pass

# ✅ Good practice (one call with a tuple: shorter and typically a little faster)
if filename.endswith(('.pdf', '.doc')):
    pass

5. strip() / lstrip() / rstrip() — Removing Whitespace

text = "   hello world   \n"

# strip(): removes whitespace from both ends
result1 = text.strip()
print(f"'{result1}'")  # Output: 'hello world'

# lstrip(): removes whitespace from the left end only
result2 = text.lstrip()
print(f"'{result2}'")  # Output: 'hello world   \n'

# rstrip(): removes whitespace from the right end only
result3 = text.rstrip()
print(f"'{result3}'")  # Output: '   hello world'

# ⚠️ Key Pitfall: does not only remove one space!
text2 = "---hello---"
print(text2.strip("-"))  # Output: hello (all consecutive - are removed)

# Custom characters to remove
text3 = "xxxhelloyyy"
print(text3.strip("xy"))  # Output: hello
print(text3.strip("xyhel"))  # Output: o (removes any character contained)

# Practical application: cleaning CSV data
csv_line = " 张三 , 25 , 北京 \n"
fields = [f.strip() for f in csv_line.split(',')]
print(fields)
# Output: ['张三', '25', '北京']

# Handling user input
user_input = input("Please enter your name:").strip()
# Automatically remove excess whitespace to avoid data inconsistency

Common Errors:

# ❌ Error: strip() removes a set of characters, not a substring
text = "hello"
print(text.strip("lo"))  # Output: he (both l's and the o are stripped, not just "lo")

# ✅ Correct practice: if you want to remove an exact suffix
if text.endswith("lo"):
    text = text[:-len("lo")]
print(text)  # Output: hel
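
The character-set behavior of strip() is easy to trip over with file extensions. The sketch below uses a hypothetical strip_suffix helper that behaves like str.removesuffix(), which is available from Python 3.9 onward.

```python
def strip_suffix(text, suffix):
    """Remove suffix once if present; otherwise return text unchanged."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

# strip() treats ".txt" as the set {'.', 't', 'x'} and also eats the final "t" of "report"
print("report.txt".strip(".txt"))          # repor
print(strip_suffix("report.txt", ".txt"))  # report
```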

2. Advanced Splitting and Joining (3 Operations)

6. split() and rsplit() — The Art of Splitting Strings

# Basic splitting
text = "apple,banana,cherry,date"

parts1 = text.split(",")
print(parts1)
# Output: ['apple', 'banana', 'cherry', 'date']

# Limit the number of splits
parts2 = text.split(",", 2)  # Only split 2 times
print(parts2)
# Output: ['apple', 'banana', 'cherry,date']

# rsplit(): splits from the right
parts3 = text.rsplit(",", 2)  # Split 2 times from the right
print(parts3)
# Output: ['apple,banana', 'cherry', 'date']

# Split with multiple delimiters (using regex)
import re
text2 = "apple, banana; cherry: date"
parts4 = re.split(r'[,;:]', text2)
print(parts4)
# Output: ['apple', ' banana', ' cherry', ' date']

# Practical application 1: parsing URLs
url = "https://www.example.com/path/to/resource?key=value&foo=bar"
protocol, rest = url.split("://", 1)
domain, rest = rest.split("/", 1)
path, query = rest.split("?", 1)
print(f"Protocol: {protocol}, Domain: {domain}, Path: {path}, Query: {query}")

# Practical application 2: parsing CSV lines
csv_line = 'John,"Smith, Jr.",30,New York'
# Simple split will fail, need to use csv module
import csv
reader = csv.reader([csv_line])
fields = next(reader)
print(fields)
# Output: ['John', 'Smith, Jr.', '30', 'New York']

Performance Comparison:

# ❌ Inefficient: splits the entire string just to read one field
text = "a:b:c:d:e"
parts = text.split(":")
result = parts[2]  # Get the 3rd element

# ✅ More efficient on long strings: stop splitting once the needed field is isolated
result = text.split(":", 3)[2]
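
A related subtlety is that split() with no argument behaves differently from split(" "). The short comparison below illustrates it on made-up input.

```python
line = "  alpha   beta\tgamma \n"

# No argument: split on any run of whitespace and drop empty strings
print(line.split())     # ['alpha', 'beta', 'gamma']

# Explicit " ": split on every single space, keeping empty strings
print(line.split(" "))  # ['', '', 'alpha', '', '', 'beta\tgamma', '\n']
```

For whitespace-separated input, the no-argument form is usually the one you want.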

7. join() — Joining Strings

# Basic joining
words = ["hello", "world", "python"]
result1 = " ".join(words)
print(result1)  # Output: hello world python

# Joining numbers (need to convert)
numbers = [1, 2, 3, 4, 5]
result2 = "-".join(str(n) for n in numbers)
print(result2)  # Output: 1-2-3-4-5

# Practical application 1: generating SQL IN statement
# (acceptable only because ids are trusted integers; use query parameters for user input)
ids = [1, 2, 3, 4, 5]
sql = f"SELECT * FROM users WHERE id IN ({','.join(map(str, ids))})"
print(sql)

# Practical application 2: generating URL path
path_parts = ["api", "v1", "users", "123"]
path = "/" + "/".join(path_parts)
print(path)  # Output: /api/v1/users/123

# Practical application 3: generating CSV line
data = ["张三", 25, "北京", "zhangsan@example.com"]
csv_line = ",".join(map(str, data))
print(csv_line)

# ⚠️ Performance Pitfall: do not use + to concatenate multiple strings
# ❌ Bad practice (creates a new string each time, O(n²) complexity)
result = ""
for word in words:
    result = result + " " + word

# ✅ Good practice (one-time join, O(n) complexity)
result = " ".join(words)

Large Scale Data Comparison:

import time

# Generate 10,000 strings
data = ["word"] * 10000

# Using + to join
start = time.perf_counter()
result = ""
for word in data:
    result += word + ","
time1 = time.perf_counter() - start

# Using join
start = time.perf_counter()
result = ",".join(data)
time2 = time.perf_counter() - start

print(f"+ method: {time1:.4f}s, join method: {time2:.4f}s")
# join() is typically several times faster here; the exact ratio depends on the
# interpreter (CPython can often extend the string in place for +=) and on the data

8. partition() and rpartition() — Three-Way Split

# partition(): splits into three parts at the first delimiter
text = "name=John;age=30;city=NYC"

head, sep, tail = text.partition(";")
print(f"Before: {head}, Separator: {sep}, After: {tail}")
# Output: Before: name=John, Separator: ;, After: age=30;city=NYC

# Practical application: parsing key=value format
def parse_key_value(text):
    key, sep, value = text.partition("=")
    return key.strip(), value.strip() if sep else None

result = parse_key_value("timeout = 3000")
print(result)  # Output: ('timeout', '3000')

# rpartition(): splits from the right
head, sep, tail = text.rpartition(";")
print(f"Before: {head}, Separator: {sep}, After: {tail}")
# Output: Before: name=John;age=30, Separator: ;, After: city=NYC

# Practical application: getting file extension
def get_file_info(filename):
    name, sep, ext = filename.rpartition(".")
    if not sep:  # no dot: rpartition() puts the whole string in the last slot
        return filename, ""
    return name, ext

print(get_file_info("document.pdf"))  # Output: ('document', 'pdf')
print(get_file_info("archive.tar.gz"))  # Output: ('archive.tar', 'gz')
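
Combining split() and partition() gives a compact parser for the name=John;age=30 format used above. This is a minimal sketch; parse_pairs and its parameter names are hypothetical.

```python
def parse_pairs(text, pair_sep=";", kv_sep="="):
    """Parse 'k1=v1;k2=v2' into a dict, skipping items without a separator."""
    result = {}
    for item in text.split(pair_sep):
        key, sep, value = item.partition(kv_sep)
        if sep:  # sep is empty when kv_sep was not found in the item
            result[key.strip()] = value.strip()
    return result

print(parse_pairs("name=John;age=30;city=NYC"))
# {'name': 'John', 'age': '30', 'city': 'NYC'}

# partition() splits only at the first "=", so values may contain "=" themselves
print(parse_pairs("url=https://example.com/?a=1"))
# {'url': 'https://example.com/?a=1'}
```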

3. Formatting and Conversion (4 Operations)

9. format() and f-string — The Evolution of String Formatting

name = "张三"
age = 25
salary = 15000.5

# Method 1: % formatting (legacy style, still supported)
result1 = "Name: %s, Age: %d, Salary: %.2f" % (name, age, salary)

# Method 2: format() method (good compatibility)
result2 = "Name: {}, Age: {}, Salary: {:.2f}".format(name, age, salary)

# Method 3: f-string (Python 3.6+, recommended)
result3 = f"Name: {name}, Age: {age}, Salary: {salary:.2f}"

print(result3)
# Output: Name: 张三, Age: 25, Salary: 15000.50

# Powerful feature of f-string: can directly execute expressions
print(f"Next year's salary: {salary * 1.1:.2f}")  # Output: Next year's salary: 16500.55

# Alignment and padding
numbers = [1, 12, 123, 1234]
for num in numbers:
    print(f"Number: {num:>5}")
# Output:
# Number:    1
# Number:   12
# Number:  123
# Number: 1234

# Base conversion
num = 255
print(f"Decimal: {num}, Hexadecimal: {num:x}, Binary: {num:b}")
# Output: Decimal: 255, Hexadecimal: ff, Binary: 11111111

# Percentage format
rate = 0.8567
print(f"Completion: {rate:.2%}")  # Output: Completion: 85.67%

# Number separator (Python 3.6+)
large_num = 1234567890
print(f"Large number: {large_num:,}")  # Output: Large number: 1,234,567,890

Performance Comparison:

import time

name = "Python"
age = 10

# Compare the performance of three methods
iterations = 1000000

# % formatting
start = time.time()
for _ in range(iterations):
    result = "%s is %d years old" % (name, age)
time1 = time.time() - start

# format() method
start = time.time()
for _ in range(iterations):
    result = "{} is {} years old".format(name, age)
time2 = time.time() - start

# f-string
start = time.time()
for _ in range(iterations):
    result = f"{name} is {age} years old"
time3 = time.time() - start

print(f"% formatting: {time1:.3f}s")
print(f"format(): {time2:.3f}s")
print(f"f-string: {time3:.3f}s")
# Output example: f-strings are typically the fastest and str.format() the slowest; exact timings vary by Python version
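
Two further f-string features fit naturally with the formatting options above: nested replacement fields for a dynamic width or precision, and the = specifier (Python 3.8+) for quick debugging output. The variable names are illustrative only.

```python
value = 3.14159265
width, precision = 10, 3

# Width and precision can themselves come from variables
print(f"[{value:>{width}.{precision}f}]")  # [     3.142]

# "=" prints the expression text together with its value (Python 3.8+)
print(f"{value=:.2f}")  # value=3.14

# "!r" applies repr(), which makes stray whitespace visible
name = " Ada "
print(f"{name!r}")  # ' Ada '
```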

10. upper() / lower() / title() / swapcase() — Case Conversion

text = "Hello World Python"

# All uppercase
print(text.upper())  # Output: HELLO WORLD PYTHON

# All lowercase
print(text.lower())  # Output: hello world python

# Title case (first letter of each word capitalized)
print(text.title())  # Output: Hello World Python

# Swap case
print(text.swapcase())  # Output: hELLO wORLD pYTHON

# capitalize(): first letter capitalized, others lowercase
print(text.capitalize())  # Output: Hello world python

# Practical application 1: normalizing user input
user_email = input("Please enter your email:").strip().lower()
# Prevent issues caused by case differences

# Practical application 2: generating URL slug
def slugify(text):
    """Convert text to a URL-safe format"""
    return text.lower().replace(" ", "-")

print(slugify("Hello World Python"))  # Output: hello-world-python

# Practical application 3: checking password complexity
def check_password_strength(password):
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    has_digit = any(c.isdigit() for c in password)
    return len(password) >= 8 and has_upper and has_lower and has_digit

print(check_password_strength("Secure123"))  # Output: True
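
One caveat with title() is that it treats every non-letter as a word boundary, which gives odd results with apostrophes. string.capwords() from the standard library splits on whitespace only and is one possible alternative.

```python
import string

text = "they're bill's friends"

# title() capitalizes the letter after each apostrophe as well
print(text.title())           # They'Re Bill'S Friends

# capwords() only splits on whitespace
print(string.capwords(text))  # They're Bill's Friends
```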

11. isdigit() / isalpha() / isalnum() — Character Validation

# Check if all are digits
print("12345".isdigit())  # Output: True
print("123a5".isdigit())  # Output: False

# Check if all are letters
print("hello".isalpha())  # Output: True
print("hello123".isalpha())  # Output: False

# Check if all are letters or digits
print("hello123".isalnum())  # Output: True
print("hello-123".isalnum())  # Output: False

# Check if all are spaces
print("   ".isspace())  # Output: True

# Check if valid identifier (variable name)
print("var_name".isidentifier())  # Output: True
print("123var".isidentifier())  # Output: False

# Check if all uppercase/lowercase
print("HELLO".isupper())  # Output: True
print("hello".islower())  # Output: True

# Practical application 1: validating user input
def validate_username(username):
    if len(username) < 3 or len(username) > 20:
        return False, "Username length must be between 3-20 characters"
    if not username[0].isalpha():
        return False, "Username must start with a letter"
    if not username.replace("_", "").isalnum():
        return False, "Username can only contain letters, numbers, and underscores"
    return True, "Username is valid"

print(validate_username("user_123"))  # Output: (True, 'Username is valid')
print(validate_username("123user"))   # Output: (False, 'Username must start with a letter')

# Practical application 2: data type recognition
def detect_type(value_str):
    """Recognize the data type represented by the string"""
    if value_str.isdigit():
        return "Integer"
    elif value_str.isalpha():
        return "String"
    elif value_str.isalnum():
        return "Mixed type"
    else:
        return "Other"

print(detect_type("123"))  # Output: Integer
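
A caution on isdigit(): it accepts some characters that int() rejects, such as superscript digits, and it rejects signed numbers. The comparison below shows how isdecimal(), isdigit(), and isnumeric() differ. The to_int helper is a hypothetical name for a try-based conversion that avoids the guesswork.

```python
samples = ["42", "٣", "²", "½", "-5", "3.14"]
for s in samples:
    print(f"{s!r:>7} decimal={s.isdecimal()} digit={s.isdigit()} numeric={s.isnumeric()}")

def to_int(value):
    """Return int(value), or None when value is not a valid integer literal."""
    try:
        return int(value)
    except ValueError:
        return None

print(to_int("-5"))  # -5
print(to_int("²"))   # None, even though "²".isdigit() is True
```

When the goal is "can this be converted to an integer", attempting the conversion is usually more reliable than checking character classes first.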

12. zfill() and center() — Padding and Centering

# zfill(): pads with 0 on the left
num_str = "123"
print(num_str.zfill(5))  # Output: 00123

# Practical application 1: generating order number
def generate_order_id(order_num):
    return f"ORD{order_num:0>6d}"

print(generate_order_id(123))  # Output: ORD000123

# center(): centers (pads on both sides)
text = "Python"
print(text.center(15))  # Output: "     Python    "
print(text.center(15, "*"))  # Output: "*****Python****"

# ljust() and rjust(): left and right align
print(text.ljust(15, "-"))  # Output: Python---------
print(text.rjust(15, "-"))  # Output: ---------Python

# Practical application 2: printing tables
def print_table(rows):
    """Print aligned table"""
    for row in rows:
        print("|".join(cell.center(15) for cell in row))

rows = [
    ["Name", "Age", "City"],
    ["张三", "25", "北京"],
    ["李四", "30", "上海"],
]
print_table(rows)
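
What distinguishes zfill() from rjust(width, "0") is that zfill() is sign-aware. The short comparison below shows the difference, along with the equivalent format specifier.

```python
# zfill() keeps a leading sign in front of the padding
print("-42".zfill(5))       # -0042
print("+7".zfill(4))        # +007

# rjust() pads blindly, so the sign ends up in the middle
print("-42".rjust(5, "0"))  # 00-42

# The 0 flag in a format spec is sign-aware as well
print(f"{-42:05d}")         # -0042
```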

4. Regular Expressions and Advanced Operations (8 Operations)

13. Basics of Regular Expressions — match() / search() / findall()

import re

# match(): matches from the beginning
text = "Python 3.9"
if re.match(r"Python", text):
    print("Match successful")

# search(): searches throughout the text
if re.search(r"\d+\.\d+", text):
    print("Version number found")

# findall(): finds all matches
emails = "contact us at support@example.com or sales@example.org"
found = re.findall(r"\b[\w.-]+@[\w.-]+\.\w+\b", emails)
print(found)
# Output: ['support@example.com', 'sales@example.org']

# Extracting grouped content
text = "Price: $99.99, Tax: $7.50"
matches = re.findall(r"\$(\d+\.\d+)", text)
print(matches)
# Output: ['99.99', '7.50']

# Practical application 1: extracting phone numbers
def extract_phone_numbers(text):
    """Extract phone numbers from text"""
    pattern = r"\b(?:\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})\b"
    return re.findall(pattern, text)

text = "Call me at 123-456-7890 or (098) 765 4321"
print(extract_phone_numbers(text))

# Practical application 2: extracting URLs
def extract_urls(text):
    """Extract all URLs from text"""
    pattern = r"https?://(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&//=]*)"
    return re.findall(pattern, text)

text = "Visit https://www.example.com or http://test.org for more info"
print(extract_urls(text))
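
When the position of each match is needed as well as its text, re.finditer() with named groups is a common alternative to findall(). The pattern and sample text below are made up for illustration.

```python
import re

pattern = re.compile(r"(?P<key>\w+)=(?P<value>\d+)")
text = "width=80 height=24 depth=x"

# finditer() yields match objects, so groups and positions are both available
matches = [(m.group("key"), int(m.group("value")), m.span())
           for m in pattern.finditer(text)]
print(matches)
# [('width', 80, (0, 8)), ('height', 24, (9, 18))]
```

"depth=x" is skipped because the value group requires digits.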

14. sub() and subn() — Regular Replacement

import re

# sub(): replaces all matches
text = "The price is $99.99 and tax is $7.50"
result = re.sub(r"\$(\d+\.\d+)", r"¥\1*7", text)
print(result)
# Output: The price is ¥99.99*7 and tax is ¥7.50*7

# subn(): replaces and returns the number of replacements
text = "apple, apple, apple"
result, count = re.subn(r"apple", "orange", text)
print(f"Replaced {count} times")
print(result)

# Using a function for dynamic replacement
def replace_func(match):
    """Increase price by 10%"""
    price = float(match.group(1))
    return f"${price * 1.1:.2f}"

text = "Item 1: $100, Item 2: $50"
result = re.sub(r"\$(\d+(?:\.\d+)?)", replace_func, text)
print(result)
# Output: Item 1: $110.00, Item 2: $55.00

# Practical application 1: date format conversion
def convert_date_format(text):
    """Convert 2024-01-15 to 15/01/2024"""
    pattern = r"(\d{4})-(\d{2})-(\d{2})"
    return re.sub(pattern, r"\3/\2/\1", text)

print(convert_date_format("Today is 2024-01-15"))
# Output: Today is 15/01/2024

# Practical application 2: removing HTML tags
def remove_html_tags(text):
    """Extract plain text from HTML"""
    return re.sub(r"<[^>]+>", "", text)

html = "<p>Hello <b>World</b></p>"
print(remove_html_tags(html))
# Output: Hello World

15. compile() — Precompiling Regular Expressions (Performance Optimization)

import re

# ❌ Slower in hot loops: re.match() looks the pattern up in re's internal cache on every call
def validate_email_slow(email):
    return re.match(r"^[\w\.-]+@[\w\.-]+\.\w+$", email) is not None

# ✅ Good practice (compile once, reuse the pattern object)
email_pattern = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")

def validate_email_fast(email):
    return email_pattern.match(email) is not None

# Performance comparison
import time

email = "user@example.com"
iterations = 100000

start = time.perf_counter()
for _ in range(iterations):
    validate_email_slow(email)
time1 = time.perf_counter() - start

start = time.perf_counter()
for _ in range(iterations):
    validate_email_fast(email)
time2 = time.perf_counter() - start

print(f"Uncompiled: {time1:.4f}s, Compiled: {time2:.4f}s")
# The precompiled version is usually faster for short inputs because it skips the
# per-call cache lookup; the gap shrinks as the matching work itself grows

# Practical application: creating a validator class
class Validator:
    """Validator using precompiled regular expressions"""
    EMAIL_PATTERN = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")
    PHONE_PATTERN = re.compile(r"^\d{10,11}$")
    URL_PATTERN = re.compile(r"^https?://")
    
    @classmethod
    def is_valid_email(cls, email):
        return cls.EMAIL_PATTERN.match(email) is not None
    
    @classmethod
    def is_valid_phone(cls, phone):
        return cls.PHONE_PATTERN.match(phone) is not None
    
    @classmethod
    def is_valid_url(cls, url):
        return cls.URL_PATTERN.match(url) is not None

print(Validator.is_valid_email("user@example.com"))  # True
print(Validator.is_valid_phone("13800138000"))  # True
print(Validator.is_valid_url("https://example.com"))  # True
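
For validators like the ones above, a compiled pattern's fullmatch() is a stricter alternative to match() with ^ and $. In particular, $ also matches just before a trailing newline, which can let unclean input through. The pattern names below are illustrative.

```python
import re

anchored = re.compile(r"^\d{10,11}$")
phone = re.compile(r"\d{10,11}")

# "$" also matches just before a trailing newline
print(bool(anchored.match("13800138000\n")))   # True
# fullmatch() requires the whole string to match
print(bool(phone.fullmatch("13800138000\n")))  # False
print(bool(phone.fullmatch("13800138000")))    # True
# match() anchors only the start of the string
print(bool(phone.match("13800138000abc")))     # True
```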

16. translate() — Efficient Character Replacement

# Create a translation table
translation_table = str.maketrans("aeiou", "12345")
text = "hello world"
result = text.translate(translation_table)
print(result)
# Output: h2ll4 w4rld

# Remove specified characters
delete_table = str.maketrans("", "", "aeiou")
text = "hello world"
result = text.translate(delete_table)
print(result)
# Output: hll wrld

# Practical application 1: removing punctuation
import string
text = "Hello, World! How are you?"
remove_punctuation = str.maketrans("", "", string.punctuation)
result = text.translate(remove_punctuation)
print(result)
# Output: Hello World How are you

# Practical application 2: numbers to Chinese
chinese_map = str.maketrans("0123456789", "零一二三四五六七八九")
text = "My phone is 13800138000"
result = text.translate(chinese_map)
print(result)
# Output: My phone is 一三八零零一三八零零零

# Performance comparison: translate vs replace
import time

text = "hello world" * 10000
iterations = 10000

# Method 1: using replace
start = time.time()
for _ in range(iterations):
    result = text.replace("o", "0").replace("e", "3")
time1 = time.time() - start

# Method 2: using translate
trans_table = str.maketrans("oe", "03")
start = time.time()
for _ in range(iterations):
    result = text.translate(trans_table)
time2 = time.time() - start

print(f"replace method: {time1:.4f}s, translate method: {time2:.4f}s")
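# translate() is often faster when several characters are replaced, but results vary with Python version and input, so measure before relying on it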
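
str.maketrans() also accepts a single dictionary, which allows one character to map to a longer string or to None (delete) in the same pass. The mapping below is a made-up transliteration example.

```python
# Keys must be single characters; values may be strings of any length or None
table = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "!": None})

print("Grüße aus Köln!".translate(table))  # Gruesse aus Koeln
```

The two-string form of maketrans() cannot express this, because it requires both arguments to have the same length.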

17. expandtabs() — Handling Tabs

# Convert tabs to spaces
text = "name\tage\tcity\nJohn\t25\tNYC"
print(text.expandtabs(15))
# Output (columns start at positions 0, 15, and 30):
# name           age            city
# John           25             NYC

# Practical application: handling indentation in log files
log_text = "Error:\t\tConnection failed\nWarning:\t\tMemory high"
formatted = log_text.expandtabs(20)
print(formatted)

# Tab stops are placed every 10 columns
text = "Line1\tColumn1\nLine2\tColumn2"
print(text.expandtabs(10))
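
expandtabs() does not insert a fixed number of spaces. Each tab advances to the next multiple of the tab size, counted from the start of the line. The sample strings below show this.

```python
row = "id\tname\tcity"
# "id" ends at column 2, so the tab adds 6 spaces to reach column 8;
# "name" ends at column 12, so the next tab adds 4 spaces to reach column 16
print(repr(row.expandtabs(8)))       # 'id      name    city'

# Text longer than the tab size simply moves on to the following stop
long_row = "identifier\tx"
print(repr(long_row.expandtabs(8)))  # 'identifier      x'
```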

18. encode() and decode() — Character Encoding Conversion

# Encoding: string → bytes
text = "Hello 世界 🌍"

# Encode to UTF-8
encoded_utf8 = text.encode("utf-8")
print(encoded_utf8)
# Output: b'Hello \xe4\xb8\x96\xe7\x95\x8c \xf0\x9f\x8c\x8d'

# Encode to GB2312 (Simplified Chinese); the emoji has no GB2312 form,
# so errors="ignore" silently drops it
encoded_gb = text.encode("gb2312", errors="ignore")
print(encoded_gb)

# Decoding: bytes → string
decoded = encoded_utf8.decode("utf-8")
print(decoded)
# Output: Hello 世界 🌍

# Handling encoding errors
text = "测试"
try:
    # Attempt to encode with ASCII (will fail)
    encoded = text.encode("ascii")
except UnicodeEncodeError as e:
    print(f"Encoding error: {e}")

# Using error handling strategies
# 'strict': raises an error for unencodable characters (default)
# 'ignore': ignores unencodable characters
# 'replace': replaces unencodable characters with ?
# 'xmlcharrefreplace': replaces with XML character references

text = "Hello 世界"
print(text.encode("ascii", errors="ignore"))
# Output: b'Hello '

print(text.encode("ascii", errors="replace"))
# Output: b'Hello ??'

print(text.encode("ascii", errors="xmlcharrefreplace"))
# Output: b'Hello &#19990;&#30028;'

# Practical application 1: handling file encoding issues
def safe_read_file(filepath):
    """Read a text file, trying a short list of likely encodings in order"""
    # gbk is a superset of gb2312 and utf-8 is a superset of ascii,
    # so the two narrower codecs never need to be tried separately
    encodings = ["utf-8", "gbk"]
    for encoding in encodings:
        try:
            with open(filepath, "r", encoding=encoding) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    raise ValueError("Unable to read file, encoding unknown")

# Practical application 2: handling network data
import json
json_str = '{"name":"张三","age":25}'
json_bytes = json_str.encode("utf-8")
decoded_str = json_bytes.decode("utf-8")
data = json.loads(decoded_str)
print(data)
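
The length of a string in characters and its length in bytes differ as soon as non-ASCII text is involved. This matters when slicing network or file data by byte offset. A small illustration using the sample string from this section:

```python
text = "Hello 世界 🌍"
print(len(text))                      # 10 code points
print(len(text.encode("utf-8")))      # 17 bytes
print(len(text.encode("utf-16-le")))  # 22 bytes

# Cutting a byte string at an arbitrary offset can split a multi-byte character
chunk = text.encode("utf-8")[:8]      # ends in the middle of "世"
try:
    chunk.decode("utf-8")
except UnicodeDecodeError as exc:
    print(f"strict decoding failed: {exc.reason}")

# errors="ignore" drops the incomplete trailing bytes
print(repr(chunk.decode("utf-8", errors="ignore")))  # 'Hello '
```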

19. Advanced Usage of ljust() / rjust() / center()

# Basic usage
text = "Python"
print(text.ljust(15, "-"))  # Output: Python---------
print(text.rjust(15, "-"))  # Output: ---------Python
print(text.center(15, "-")) # Output: -----Python----

# Practical application 1: creating a progress bar
def progress_bar(percent, width=20):
    """Create a text progress bar"""
    filled = int(width * percent / 100)
    bar = "█" * filled + "░" * (width - filled)
    return f"[{bar}] {percent}%"

for i in range(0, 101, 10):
    print(progress_bar(i))

# Practical application 2: aligned output (like a table)
def print_aligned_table(data):
    """Print aligned table"""
    # Calculate the maximum width of each column
    max_widths = [max(len(str(row[i])) for row in data) 
                  for i in range(len(data[0]))]
    
    for row in data:
        aligned_row = [str(cell).ljust(width) 
                       for cell, width in zip(row, max_widths)]
        print(" | ".join(aligned_row))

data = [
    ["Name", "Age", "City"],
    ["张三", "25", "北京"],
    ["李四的昵称", "30", "上海"],
]
print_aligned_table(data)
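
ljust() pads by character count, while most terminals draw CJK characters two columns wide, so a table mixing "Name" and "张三" may not line up visually. One possible workaround is to pad by approximate display width using unicodedata.east_asian_width(). display_width and ljust_display are hypothetical helper names, and the sketch ignores combining characters.

```python
import unicodedata

def display_width(text):
    """Approximate terminal width: wide and fullwidth characters count as 2 columns."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def ljust_display(text, width, fill=" "):
    """Left-justify text to a display width rather than a character count."""
    return text + fill * max(0, width - display_width(text))

for cell in ["Name", "张三", "李四的昵称"]:
    print(f"{ljust_display(cell, 12)}| width={display_width(cell)}")
```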

20. casefold() — Aggressive Case Folding

# casefold(): more aggressive lowercase conversion
# Suitable for international characters and different languages

text = "ß"  # German letter
print(text.lower())    # Output: ß (unchanged)
print(text.casefold()) # Output: ss (converted to two s)

# Practical application 1: case-insensitive string comparison
def case_insensitive_compare(str1, str2):
    """Case-insensitive comparison (including international characters)"""
    return str1.casefold() == str2.casefold()

print(case_insensitive_compare("Straße", "STRASSE"))  # Output: True
print(case_insensitive_compare("hello", "HELLO"))  # Output: True

# Practical application 2: search functionality
def search_case_insensitive(text, query):
    """Case-insensitive search"""
    return query.casefold() in text.casefold()

print(search_case_insensitive("Hello World", "hello"))  # Output: True
print(search_case_insensitive("Naïve", "naive"))  # Output: False (casefold() does not remove accents)

# Performance comparison: casefold vs lower
import time

text = ("Hello World Python " * 1000).casefold()
query = "world"

iterations = 100000

# Using lower()
start = time.time()
for _ in range(iterations):
    query.lower() in text
time1 = time.time() - start

# Using casefold()
start = time.time()
for _ in range(iterations):
    query.casefold() in text
time2 = time.time() - start

print(f"lower(): {time1:.4f}s, casefold(): {time2:.4f}s")
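
casefold() handles case only; it leaves accents in place. If accent-insensitive matching is wanted, one common approach is to decompose the text with unicodedata.normalize() and drop the combining marks before folding. strip_accents and loose_equal are hypothetical helper names for this sketch.

```python
import unicodedata

def strip_accents(text):
    """Remove combining marks after decomposing characters (NFKD)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def loose_equal(a, b):
    """Compare two strings ignoring both case and accents."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()

print("naive" in "Naïve".casefold())    # False: the ï is still there
print(loose_equal("Naïve", "naive"))    # True
print(loose_equal("Straße", "STRASSE")) # True
```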

5. Comprehensive Practice: Complete Data Processing Workflow

Comprehensive Case 1: Parsing and Validating User Data

import re

def parse_and_validate_user_data(csv_data):
    """Parse and validate CSV formatted user data
    Input format:
    name,email,phone,age
    张三,zhangsan@example.com,13800138000,25
    李四,lisi@example.com,15900139000,30
    """
    
    lines = csv_data.strip().split("\n")
    headers = [h.strip() for h in lines[0].split(",")]
    
    users = []
    errors = []
    
    # Compile the pattern once, outside the loop
    email_pattern = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")
    
    for i, line in enumerate(lines[1:], start=2):
        fields = [f.strip() for f in line.split(",")]
        
        if len(fields) != len(headers):
            errors.append(f"Line {i}: Field count mismatch")
            continue
        
        user = dict(zip(headers, fields))
        
        # Validate email
        if not email_pattern.match(user["email"]):
            errors.append(f"Line {i}: Invalid email format - {user['email']}")
            continue
        
        # Validate phone
        if not user["phone"].isdigit() or len(user["phone"]) != 11:
            errors.append(f"Line {i}: Invalid phone format - {user['phone']}")
            continue
        
        # Validate age
        try:
            age = int(user["age"])
            if not 18 <= age <= 100:
                errors.append(f"Line {i}: Age must be between 18-100")
                continue
        except ValueError:
            errors.append(f"Line {i}: Age must be a number - {user['age']}")
            continue
        
        user["age"] = age
        users.append(user)
    
    return {
        "valid_users": users,
        "errors": errors,
        "summary": f"Success: {len(users)} records, Failed: {len(errors)} records"
    }

# Usage example
csv_data = """
name,email,phone,age
张三,zhangsan@example.com,13800138000,25
李四,invalid-email,15900139000,30
王五,wangwu@example.com,159001390,35
赵六,zhaoliu@example.com,18600136000,120
"""

result = parse_and_validate_user_data(csv_data)
print(result["summary"])
for error in result["errors"]:
    print(f"  ❌ {error}")
for user in result["valid_users"]:
    print(f"  ✅ {user['name']} - {user['email']}")
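
The case above splits each line on commas, which is fine for simple data but breaks on quoted fields that contain commas. A possible variation, assuming the input may contain such fields, is to let the csv module do the splitting. The sample data here is made up.

```python
import csv
import io

csv_text = 'name,city,age\n"Smith, Jr.",New York,30\nLee,Boston,41\n'

# DictReader uses the first row as field names and handles quoted commas
rows = list(csv.DictReader(io.StringIO(csv_text)))

print(rows[0]["name"])  # Smith, Jr.
print([row["age"] for row in rows])  # ['30', '41']
```

The validation steps (email, phone, age) can then run on each row dictionary unchanged.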

Comprehensive Case 2: Log Analysis and Statistics

import re
from collections import Counter

def analyze_log_file(log_text):
    """Analyze log files and extract key information
    Log format:
    [2024-01-15 10:30:45] INFO: Server started
    [2024-01-15 10:30:50] ERROR: Connection failed
    """
    
    # Define log pattern
    log_pattern = re.compile(
        r"\[(?P<timestamp>.*?)\]\s+(?P<level>\w+):\s+(?P<message>.*)"
    )
    
    logs = []
    level_count = Counter()
    
    for line in log_text.strip().split("\n"):
        match = log_pattern.match(line)
        if not match:
            continue
        
        log_entry = match.groupdict()
        logs.append(log_entry)
        level_count[log_entry["level"]] += 1
    
    # Find error messages
    errors = [log for log in logs if log["level"] == "ERROR"]
    
    # Statistics
    return {
        "total_logs": len(logs),
        "level_distribution": dict(level_count),
        "errors": errors,
        "error_count": len(errors),
        "error_types": Counter(e["message"].split(":")[0] for e in errors)
    }

# Usage example
log_text = """
[2024-01-15 10:30:45] INFO: Server started
[2024-01-15 10:30:50] ERROR: Connection failed
[2024-01-15 10:31:00] WARNING: Memory usage high
[2024-01-15 10:31:05] ERROR: Connection failed
[2024-01-15 10:31:10] INFO: Request processed
"""

result = analyze_log_file(log_text)
print(f"Total logs: {result['total_logs']}")
print(f"Log level distribution: {result['level_distribution']}")
print(f"Error count: {result['error_count']}")
print(f"Error types: {result['error_types']}")

Comprehensive Case 3: URL Parsing and Cleaning

import re
from collections import Counter
from urllib.parse import urlparse, parse_qs

def analyze_urls(url_list):
    """Analyze and clean a list of URLs"""
    
    url_pattern = re.compile(
        r"https?://(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&//=]*)"
    )
    
    valid_urls = []
    domains = Counter()
    
    for url in url_list:
        # Extract URL
        url = url.strip()
        if not url_pattern.match(url):
            continue
        
        # Parse URL
        parsed = urlparse(url)
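        # Strip only a leading "www."; replace() would also remove "www." inside the host
        domain = parsed.netloc
        if domain.startswith("www."):
            domain = domain[len("www."):]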
        domains[domain] += 1
        
        # Parse query parameters
        params = parse_qs(parsed.query)
        
        valid_urls.append({
            "url": url,
            "domain": domain,
            "path": parsed.path,
            "params": params
        })
    
    return {
        "total_urls": len(valid_urls),
        "unique_domains": len(domains),
        "top_domains": domains.most_common(5),
        "urls": valid_urls
    }

# Usage example
urls = [
    "https://www.example.com/path?key=value",
    "http://test.org/api/users?id=123&type=admin",
    "invalid-url",
    "https://github.com/repository"
]

result = analyze_urls(urls)
print(f"Valid URLs: {result['total_urls']}")
print(f"Unique domains: {result['unique_domains']}")
print(f"Top domains: {result['top_domains']}")

6. Performance Optimization Summary

Scenario 1: Large Scale String Concatenation

# ❌ Bad (time complexity O(n²))
result = ""
for i in range(10000):
    result += f"Item {i}, "

# ✅ Good (time complexity O(n))
result = ", ".join(f"Item {i}" for i in range(10000))

# join() is typically faster; the ratio depends on the interpreter and the data size

Scenario 2: Multiple Replacement Operations

# ❌ Bad (one full pass over the string per character)
text = "abcdefg" * 100000
result = text
for char in "abcdefg":
    result = result.replace(char, "x")

# ✅ Good (using translate, one traversal)
trans = str.maketrans("abcdefg", "xxxxxxx")
result = text.translate(trans)

# translate() usually wins as the number of distinct characters grows; measure on your own data

Scenario 3: Frequent Regular Matching

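# ❌ Bad (pattern is looked up in re's cache on every call)
import re
emails = ["user@example.com", "invalid-email"]  # sample input
for email in emails:
    if re.match(r"^[\w\.-]+@[\w\.-]+\.\w+$", email):
        pass

# ✅ Good (precompile)
pattern = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")
for email in emails:
    if pattern.match(email):
        pass

# Precompiling mainly saves the per-call lookup overhead, which matters most in tight loops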
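
Speedup figures such as the ones in this section vary with Python version, hardware, and input size, so it is worth measuring on your own data. The sketch below uses the standard timeit module. concat_plus and concat_join are hypothetical names for the two variants from Scenario 1.

```python
import timeit

def concat_plus(n=2000):
    result = ""
    for i in range(n):
        result += f"Item {i}, "
    return result

def concat_join(n=2000):
    return "".join(f"Item {i}, " for i in range(n))

# Both functions must build the same string for the comparison to be fair
assert concat_plus() == concat_join()

# repeat() returns one total per run; the minimum is the least noisy figure
plus_time = min(timeit.repeat(concat_plus, number=50, repeat=3))
join_time = min(timeit.repeat(concat_join, number=50, repeat=3))
print(f"+=: {plus_time:.4f}s  join: {join_time:.4f}s")
```

The same harness works for the replace/translate and compiled-regex comparisons by swapping in the two functions to compare.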

7. Quick Reference for 20 Operations

| Index | Operation | Usage | Complexity | Commonality |
|---|---|---|---|---|
| 1 | find() / index() | Finding Substrings | O(n*m) | ⭐⭐⭐⭐⭐ |
| 2 | replace() | Replacing Substrings | O(n*m) | ⭐⭐⭐⭐⭐ |
| 3 | count() | Counting Occurrences | O(n) | ⭐⭐⭐⭐ |
| 4 | startswith / endswith | Prefix and Suffix Checking | O(m) | ⭐⭐⭐⭐⭐ |
| 5 | strip() | Removing Whitespace | O(n) | ⭐⭐⭐⭐⭐ |
| 6 | split() | Splitting Strings | O(n) | ⭐⭐⭐⭐⭐ |
| 7 | join() | Joining Strings | O(n) | ⭐⭐⭐⭐⭐ |
| 8 | partition() | Three-Way Split | O(n) | ⭐⭐⭐ |
| 9 | format / f-string | String Formatting | O(n) | ⭐⭐⭐⭐⭐ |
| 10 | upper / lower / title | Case Conversion | O(n) | ⭐⭐⭐⭐ |
| 11 | isdigit / isalpha | Character Validation | O(n) | ⭐⭐⭐⭐ |
| 12 | zfill / center | Padding and Centering | O(n) | ⭐⭐⭐ |
| 13 | match / search / findall | Regular Matching | O(n*m) | ⭐⭐⭐⭐⭐ |
| 14 | sub / subn | Regular Replacement | O(n*m) | ⭐⭐⭐⭐⭐ |
| 15 | compile | Precompiled Regex | O(m) | ⭐⭐⭐⭐ |
| 16 | translate | Character Mapping | O(n) | ⭐⭐⭐ |
| 17 | expandtabs | Tab Handling | O(n) | |
| 18 | encode / decode | Encoding Conversion | O(n) | ⭐⭐⭐⭐ |
| 19 | ljust / rjust / center | Alignment and Padding | O(n) | |
| 20 | casefold | Aggressive Lowercase | O(n) | ⭐⭐ |

8. Best Practice Recommendations

✅ Do These Things

  1. Use f-string — The latest, fastest, and most readable
  2. Use join() for concatenation — Never use + to concatenate multiple strings
  3. Precompile regex — Must precompile for frequent matches
  4. Use strip() — Clean user input data
  5. Choose appropriate validation methods — isdigit, isalpha, etc.
  6. Use translate — Most efficient for large-scale character replacements
  7. Standardize encoding — Preferably use UTF-8
  8. Validate input — Always validate external input

Summary

These 20 string operations cover the large majority of everyday string-handling tasks in Python. The key is to understand:

  1. Basic Operations (1-5): are the foundation of all string processing
  2. Efficient Operations (6-7): join and split are key to performance
  3. Validation Operations (11): ensure data quality
  4. Regular Expressions (13-15): powerful tools for handling complex matches
  5. Performance Optimization (translate, compile): essential for handling large-scale data
