Python Development Environment and Engineering Notes for the First Half of 2024

This article collects practical notes on Python development environment configuration, performance optimization, web development, and engineering practices.

Conda Environment Management

Environment Configuration

Initialization Settings

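# Conda initialization script (the block that `conda init bash` writes to ~/.bashrc)
__conda_setup="$('/home/user/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    # The hook command succeeded: evaluate the shell code it printed
    eval "$__conda_setup"
else
    # Fallback: source conda.sh if present, otherwise just put conda on PATH
    if [ -f "/home/user/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/home/user/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="$PATH:/home/user/miniconda3/bin"
    fi
fi
unset __conda_setup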

Environment Migration and Reconstruction

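Common issues when migrating from an old Miniconda installation to a new Anaconda installation:

InvalidArchiveError (typically a corrupted or incomplete package archive in the cache):

# Clean the conda cache so that broken archives are downloaded again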
conda clean -a

Package Conflict Resolution Strategies:

# Example: acpype depends on AmberTools, but Amber does not include acpype
# Installing acpype via conda pulls in a second copy of AmberTools
# Workaround: install it with pip in the base environment
pip install acpype

Configuration File Settings

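# --add takes a key and a value; replace <directory> with the actual package cache path
conda config --file .condarc --add pkgs_dirs <directory>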

Environment Variable Configuration

# Example of Python environment path
previous_path = "/home/user/anaconda3/envs/pmx/lib/python3.10/site-packages/pmx/data/mutff"

# Example of Boost library path (for compilation)
boost_path = "/home/user/anaconda3/envs/AMBER22/lib/cmake/Boost-1.78.0/BoostConfig.cmake"

Best Practices for Package Management

PyPI Mirror Configuration

# Temporarily use mirror
pip install -i https://mirrors.zju.edu.cn/pypi/web/simple some-package

# Permanently configure mirror
pip config set global.index-url https://mirrors.zju.edu.cn/pypi/web/simple

Force Reinstalling Packages

pip install --upgrade --force-reinstall <package>

Web Development and Crawling Techniques

Selenium Automation

Basic Selenium Setup

from selenium import webdriver

# Create WebDriver instance
driver = webdriver.Chrome()

Connection Error Handling

urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=17823):
Max retries exceeded with url: /session/xxx/url

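This error means the connection to the local WebDriver service (here localhost:17823) was refused. A common cause is that the driver process has already exited, for example because driver.quit() was called earlier or the browser crashed.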
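
One quick way to check this is to test whether anything is still listening on the port named in the error message. The following is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper name, and the port number is simply the one from the example traceback.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 17823 is the port shown in the example traceback; yours will differ.
    print(is_port_open("localhost", 17823))
```

If this reports False while the script still holds a driver object, the driver process is probably gone and a new WebDriver instance is needed.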

Page Scrolling and Interaction

Implementing Page Scrolling

# Method 1: JavaScript execution for scrolling
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

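# Method 2: Sending keys to simulate user scrolling
# (find_element_by_tag_name was removed in Selenium 4.3; use find_element with By)
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
driver.find_element(By.TAG_NAME, 'body').send_keys(Keys.PAGE_DOWN)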

Element Interaction Exceptions

ElementNotInteractableException

This exception indicates that the element to interact with is not in an interactive state. Possible reasons include:

  • The element is hidden
  • The element is covered by other elements
  • The element has not finished loading
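
Hidden and not-yet-loaded elements are often a timing problem, so a common remedy is to wait until the element becomes usable instead of acting on it immediately. Selenium ships its own explicit-wait utilities; the sketch below is only a library-agnostic illustration of the same polling idea, and `wait_until` is a hypothetical helper, not a Selenium API.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if condition():
                return True
        except Exception:
            # Treat errors (e.g. element not found yet) as "not ready".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Assumed usage with an already located Selenium element:
#   ready = wait_until(lambda: element.is_displayed() and element.is_enabled())
#   if ready:
#       element.click()
```

This check does not detect an element that is covered by another one; an overlay usually has to be dismissed or scrolled away first.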

Static vs Dynamic Content Scraping

Static Web Page Data Scraping

You can use the requests library combined with BeautifulSoup to retrieve static web page data. However, if the target web page uses JavaScript to dynamically load content, requests may not be able to retrieve the complete page content, in which case Selenium is more suitable.

Dynamic Content Loading Recognition

If a div element is dynamically loaded via JavaScript, using the requests library may not be able to access this content, as requests can only retrieve the initial static HTML and will not execute JavaScript.
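
A simple way to recognize this situation is to inspect the raw HTML and see whether the target div already contains text. The sketch below uses only the standard library; `DivTextCollector` and `static_div_text` are hypothetical names, and the HTML string stands in for what `requests.get(url).text` would return.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text found inside <div id="..."> in raw HTML."""

    def __init__(self, div_id: str):
        super().__init__()
        self.div_id = div_id
        self.depth = 0          # > 0 while inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1     # nested div inside the target
        elif dict(attrs).get("id") == self.div_id:
            self.depth = 1      # entered the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def static_div_text(html: str, div_id: str) -> str:
    """Return the text inside the div as it appears before any JavaScript runs."""
    parser = DivTextCollector(div_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# The script is never executed, so the div stays empty in the raw HTML.
raw = '<div id="results"></div><script>/* fills #results at runtime */</script>'
print(repr(static_div_text(raw, "results")))
# → ''
```

An empty result for a div that shows content in the browser is a strong hint that the content is loaded by JavaScript.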

Tool Selection Recommendations

  • Beautiful Soup: Suitable for parsing static HTML/XML content; faster because no browser is launched
  • Selenium: Mainly used for dynamic web interaction and browser automation

Cython Performance Optimization

Cython Compilation and Usage

Cython Compilation Command

python setup.py build_ext

Cython Usage Recommendations

Cython is worth considering for speeding up simple Python projects. In very complex code, however, some syntax features may not be supported, which can force workarounds that are hard to avoid.

Cross-Platform Compilation

Compiled extension modules are platform-specific, so build separately on Windows and on Linux, then copy each build result to the matching target environment.
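
To see which file suffix the current interpreter expects for compiled extension modules, the standard library can be queried directly. This short sketch only prints the values; the exact strings depend on the platform and Python version.

```python
import sysconfig
from importlib.machinery import EXTENSION_SUFFIXES

# Suffix this interpreter gives to freshly built extension modules,
# typically ending in ".so" on Linux and ".pyd" on Windows.
print(sysconfig.get_config_var("EXT_SUFFIX"))

# All extension-module suffixes this interpreter is willing to import.
print(EXTENSION_SUFFIXES)
```

A build result whose suffix does not appear in this list cannot be imported by that interpreter.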

Data Processing and File Operations

String Processing Techniques

Replacing Substrings in Bytes Strings

# Replace substring in bytes string
byte_string = byte_string.replace(b"&lt;br/&gt;", b"\n\n")
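
Note that both arguments must be bytes as well; mixing in str raises a TypeError. A small sketch of the two usual options follows, with made-up sample data and UTF-8 assumed as the encoding.

```python
raw = b"line one&lt;br/&gt;line two"

# bytes.replace needs bytes arguments; passing str raises TypeError.
try:
    raw.replace("&lt;br/&gt;", "\n\n")
except TypeError as exc:
    print(f"TypeError: {exc}")

# Option 1: stay in bytes.
as_bytes = raw.replace(b"&lt;br/&gt;", b"\n\n")

# Option 2: decode once (UTF-8 is an assumption here), then work with str.
as_text = raw.decode("utf-8").replace("&lt;br/&gt;", "\n\n")

print(as_bytes.decode("utf-8") == as_text)
# → True
```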

Number String Check

s1 = "12345"
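# Neither method accepts signs or decimal points: "-1".isdigit() and "1.5".isdigit() are both False
s1.isdigit()    # True if every character is a digit (including e.g. superscript "²")
s1.isnumeric()  # True if every character is numeric (also fractions such as "½" and CJK numerals)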
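
When the goal is to accept anything Python can parse as a number, including signs, decimals, and exponents, trying the conversion is usually more reliable than the string methods. A minimal sketch, with `is_number` as a hypothetical helper name:

```python
def is_number(text: str) -> bool:
    """Return True if float() can parse `text`."""
    try:
        float(text)
    except ValueError:
        return False
    return True


# Compare the three checks side by side.
for sample in ["12345", "-3.14", "1e-5", "½", "abc", ""]:
    print(repr(sample), sample.isdigit(), sample.isnumeric(), is_number(sample))
```

Be aware that float() also accepts strings such as "nan" and "inf", so add an explicit check if those should be rejected.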

CSV File Processing

Writing to CSV Files

import csv

# Use Python's standard library csv module to write to CSV files
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Column1', 'Column2', 'Column3'])
    writer.writerow(['Data1', 'Data2', 'Data3'])
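
When rows are naturally dictionaries, csv.DictWriter and csv.DictReader keep column names attached to the values. The sketch below writes to an in-memory buffer instead of a real file so that it has no side effects; the column names and data are made up.

```python
import csv
import io

rows = [
    {"name": "Data1", "value": "1"},
    {"name": "Data2", "value": "2"},
]

# io.StringIO stands in for a file opened with newline="".
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "value"])
writer.writeheader()
writer.writerows(rows)

# Read the data back; every value comes back as a string.
buffer.seek(0)
read_back = list(csv.DictReader(buffer))
print(read_back == rows)
# → True
```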

File Moving Operations

Python file moving tutorial: https://www.learndatasci.com/solutions/python-move-file/
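
For quick reference, here is a minimal sketch of moving a file with the standard library's shutil.move. It works inside a temporary directory, and the file and directory names are invented for the example.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so the example has no side effects.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "report.txt"
    dest_dir = Path(tmp) / "archive"
    src.write_text("hello")
    dest_dir.mkdir()

    # If the destination is an existing directory, the file is moved into it.
    new_path = Path(shutil.move(str(src), str(dest_dir)))

    moved_ok = new_path.exists() and not src.exists()
    print(new_path.name, moved_ok)
# → report.txt True
```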

Python Language Features

Conditional Expressions

Python has no ?: operator (the condition ? expression1 : expression2 form in C), but it offers an equivalent conditional expression.

result = value1 if condition else value2
# This is equivalent to the ternary conditional operator in other languages

Calling External Programs

import subprocess

# Call external programs (like antechamber) in Python
def call_antechamber(input_file, output_file):
    # Pass arguments as a list (no shell) so file names with spaces or
    # shell metacharacters are handled safely
    cmd = ["antechamber", "-i", input_file, "-o", output_file]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result

Using Exit Functions

Exit Function Error

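# exit() is injected by the site module for interactive use; it can be missing
# when Python runs with -S and in some frozen or embedded builds:
# NameError: name 'exit' is not defined
exit()

# Reliable alternative: import sys and call sys.exit() explicitly
import sys
sys.exit()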

Scope Issues

Importing the sys module binds only the name sys; it does not add exit to the global scope. Call sys.exit() explicitly, or use from sys import exit.
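
It also helps to know that sys.exit() works by raising SystemExit rather than terminating the process on the spot. The following sketch makes that visible by catching the exception; `main` and its return value are placeholders.

```python
import sys


def main() -> int:
    # Return a status code instead of calling exit() deep inside the program.
    return 2


try:
    sys.exit(main())
except SystemExit as exc:
    # sys.exit() works by raising SystemExit. Catching it here is only for
    # demonstration; a real script would let it propagate.
    captured_code = exc.code
    print(f"exit code: {captured_code}")
# → exit code: 2
```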

JSON Data Processing

import json

# Standard method to load JSON data
with open('data.json', 'r') as f:
    data = json.load(f)

Environment Configuration Optimization

Cleaning PATH Environment Variables

# Remove duplicate PATH entries, keeping the first occurrence of each
export PATH="$(printf '%s' "$PATH" | awk -v RS=: -v ORS=: '!($0 in a) {a[$0]; print}' | sed 's/:$//')"
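
The same de-duplication can be done from Python, for instance before launching child processes. This is a sketch with a hypothetical helper name, `dedupe_path`; unlike the shell one-liner it also drops empty entries.

```python
import os


def dedupe_path(path_value: str, sep: str = os.pathsep) -> str:
    """Remove repeated entries, keeping the first occurrence of each."""
    # dict preserves insertion order, so the original priority is kept.
    unique = dict.fromkeys(entry for entry in path_value.split(sep) if entry)
    return sep.join(unique)


print(dedupe_path("/usr/bin:/bin:/usr/bin:/opt/bin:/bin", sep=":"))
# → /usr/bin:/bin:/opt/bin
```

Assigning the result to os.environ["PATH"] affects only the current Python process and its children, not the shell that started it.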

Subprocess Configuration

# With shell=True, subprocess.Popen runs the command through /bin/sh on POSIX
# To use bash instead, set the executable parameter
subprocess.Popen(..., shell=True, executable='/bin/bash')

Using bash with Python subprocess: https://www.saltycrane.com/blog/2011/04/how-use-bash-shell-python-subprocess-instead-binsh/

Proxy Configuration

# Set HTTP proxy
export http_proxy="http://127.0.0.1:7890"

Development Tool Integration

Calling External Programs in Python

import subprocess

# Standard method to call external programs
def run_external_command(command):
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout, result.stderr
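
When the command needs no shell features such as pipes or globbing, passing the arguments as a list avoids quoting problems, and check=True turns a failing exit status into an exception. The sketch below is one possible variant; `run_command` and the 60-second default timeout are assumptions, not part of the original notes.

```python
import subprocess
import sys


def run_command(args: list[str], timeout: float = 60.0) -> str:
    """Run a command without a shell and return its stdout.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        args,
        capture_output=True,
        text=True,
        check=True,       # fail loudly instead of silently returning stderr
        timeout=timeout,
    )
    return result.stdout


# sys.executable is used so the example runs wherever Python is installed.
print(run_command([sys.executable, "-c", "print('ok')"]))
```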

Package Management Integration

The same subprocess pattern also works for system package managers and other command-line tools; call_antechamber in the Calling External Programs section above is one example.

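PyCharm Environment Issues: PyCharm is an IDE, not a web browser, so it cannot open and render a page served at localhost:8501 the way Chrome or Edge does. If the server runs on a remote machine, forward the port to the local machine (for example with ssh -L 8501:localhost:8501 user@remote-host, where user@remote-host is a placeholder) and open http://localhost:8501 in a local browser.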

Related Learning Resources

Python Packaging

Scientific Python packaging guide: https://learn.scientific-python.org/development/guides/packaging-simple/

Troubleshooting and Best Practices

Common Error Patterns

  1. Environment Conflicts: Incompatible package versions in different conda environments
  2. Connection Errors: Network connection issues in web crawlers
  3. Compilation Issues: Cython cross-platform compilation differences
  4. Character Encoding: Improper handling of bytes and str

Debugging Suggestions

  • Reproduce environment conflicts in an isolated test environment
  • Use virtual environments to avoid dependency pollution
  • Document complete compilation configurations
  • Be aware of cross-platform compatibility issues

Development Environment Checklist

  1. Python Version: Ensure version compatibility
  2. Dependency Management: Use requirements.txt or environment.yml
  3. Virtual Environments: Create separate environments for each project
  4. Code Quality: Use linter and formatter tools
  5. Performance Monitoring: Regularly conduct performance analysis

This article is based on development practices from September 2023 to the first half of 2024, covering practical technical points for Python engineering and development environment configuration.
