This article summarizes key technical points of Python development environment configuration, performance optimization, web development, and engineering practice, as a practical guide to efficient development.
Conda Environment Management
Environment Configuration
Initialization Settings
# Conda initialization script
__conda_setup="$('/home/user/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    # The hook command succeeded: evaluate its output
    eval "$__conda_setup"
else
    # Fallback: source conda.sh if present, otherwise extend PATH
    if [ -f "/home/user/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/home/user/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="$PATH:/home/user/miniconda3/bin"
    fi
fi
unset __conda_setup
Environment Migration and Reconstruction
Common issues when migrating from an old Miniconda installation to a new Anaconda installation:
InvalidArchiveError:
# Clean the conda cache to remove corrupted package archives, then retry the install
conda clean -a
Package Conflict Resolution Strategies:
# Example: acpype depends on AmberTools, but Amber does not include acpype
# Installing acpype via conda pulls in a second copy of AmberTools
# Solution: install acpype with pip in the base environment
pip install acpype
Configuration File Settings
# --add takes a key and a value; replace <pkgs_dir> with the package cache directory
conda config --file .condarc --add pkgs_dirs <pkgs_dir>
Environment Variable Configuration
# Example of Python environment path
previous_path = "/home/user/anaconda3/envs/pmx/lib/python3.10/site-packages/pmx/data/mutff"
# Example of Boost library path (for compilation)
boost_path = "/home/user/anaconda3/envs/AMBER22/lib/cmake/Boost-1.78.0/BoostConfig.cmake"
Best Practices for Package Management
PyPI Mirror Configuration
# Temporarily use mirror
pip install -i https://mirrors.zju.edu.cn/pypi/web/simple some-package
# Permanently configure mirror
pip config set global.index-url https://mirrors.zju.edu.cn/pypi/web/simple
Force Reinstalling Packages
pip install --upgrade --force-reinstall <package>
Web Development and Crawling Techniques
Selenium Automation
Basic Selenium Setup
from selenium import webdriver
# Create WebDriver instance
driver = webdriver.Chrome()
Connection Error Handling
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=17823):
Max retries exceeded with url: /session/xxx/url
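This error usually means the target actively refused the connection. Here the target is the local WebDriver server on localhost:17823, which refuses connections once the driver process has exited, for example after driver.quit() has been called.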
Page Scrolling and Interaction
Implementing Page Scrolling
# Method 1: JavaScript execution for scrolling
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
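# Method 2: Send keys to simulate user scrolling
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
# find_element_by_tag_name was removed in Selenium 4; use find_element with By
driver.find_element(By.TAG_NAME, 'body').send_keys(Keys.PAGE_DOWN)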
Element Interaction Exceptions
ElementNotInteractableException
This exception indicates that the target element is present in the DOM but cannot currently be interacted with. Possible reasons include:
- The element is hidden
- The element is covered by other elements
- The element has not finished loading
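When the cause is that the element has not finished loading, the usual remedy is to wait for it explicitly instead of interacting immediately. Selenium provides WebDriverWait for this; the sketch below is a library-independent stand-in that shows only the polling idea, so it runs without a browser. The names wait_until and FakeElement and the timing values are hypothetical.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.1):
    """Poll condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakeElement:
    """Hypothetical stand-in for a page element that becomes ready later."""

    def __init__(self, ready_after_polls):
        self.polls = 0
        self.ready_after_polls = ready_after_polls

    def is_interactable(self):
        self.polls += 1
        return self.polls >= self.ready_after_polls


element = FakeElement(ready_after_polls=3)
wait_until(element.is_interactable, timeout=2.0, poll_interval=0.01)
print(f"element ready after {element.polls} polls")
```

With a real driver, the condition would check the element's visibility or clickability instead of a counter.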
Static vs Dynamic Content Scraping
Static Web Page Data Scraping
You can use the requests library combined with BeautifulSoup to retrieve static web page data. However, if the target web page uses JavaScript to dynamically load content, requests may not be able to retrieve the complete page content, in which case Selenium is more suitable.
Dynamic Content Loading Recognition
If a div element is dynamically loaded via JavaScript, using the requests library may not be able to access this content, as requests can only retrieve the initial static HTML and will not execute JavaScript.
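A quick way to recognize dynamically loaded content is to check whether the target div is empty in the raw HTML. The sketch below assumes a hard-coded HTML string in place of a live response, and uses the standard-library html.parser instead of BeautifulSoup so that it runs without third-party packages. The page content, the div ids, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text found inside the <div> with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # greater than 0 while inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside the target
        elif tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def div_text(html, target_id):
    parser = DivTextCollector(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical initial HTML, as an HTTP client would receive it: the
# "results" div is an empty placeholder that a script fills in a browser.
RAW_HTML = """
<html><body>
  <div id="static">Server-rendered text</div>
  <div id="results"></div>
  <script>document.getElementById("results").innerText = "Loaded by JS";</script>
</body></html>
"""

print(repr(div_text(RAW_HTML, "static")))
print(repr(div_text(RAW_HTML, "results")))
```

An empty result for a div that shows content in the browser is a strong hint that the page needs a JavaScript-capable tool such as Selenium.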
Tool Selection Recommendations
- Beautiful Soup: Suitable for parsing static HTML/XML content, and faster because no browser is started
- Selenium: Mainly used for dynamic web interaction and browser automation
Cython Performance Optimization
Cython Compilation and Usage
Cython Compilation Command
python setup.py build_ext
Cython Usage Recommendations
Cython is worth considering for speeding up simple Python projects. In very complex code, however, some syntax features may not be supported, which can cause problems that are hard to work around.
Cross-Platform Compilation
Compiled extension modules are platform-specific. Build separately on Windows and on Linux, then copy each build result to the matching target environment.
Data Processing and File Operations
String Processing Techniques
Replacing Substrings in Bytes Strings
# Replace substring in bytes string
byte_string = byte_string.replace(b"<br/>", b"\n\n")
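The replacement above only works when both arguments are bytes. A minimal sketch of the bytes versus str distinction follows; the sample data and the UTF-8 encoding are assumptions.

```python
raw = b"line one<br/>line two"

# Both arguments to bytes.replace must be bytes-like, hence the b"" prefixes.
cleaned = raw.replace(b"<br/>", b"\n\n")

# Mixing bytes and str raises TypeError instead of converting silently.
try:
    raw.replace("<br/>", "\n\n")
except TypeError as exc:
    mixed_error = type(exc).__name__
else:
    mixed_error = None

# Decode once the byte-level clean-up is done (UTF-8 is assumed here).
text = cleaned.decode("utf-8")
print(text)
```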
Number String Check
s1 = "12345"
# Built-in string methods for checking whether a string is a number
s1.isdigit()    # True if every character is a digit (no sign or decimal point allowed)
s1.isnumeric()  # Like isdigit(), but also accepts other numeric characters such as "½"
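Neither method accepts a sign or a decimal point, so strings such as "-1" or "1.5" need a different check. The sketch below compares the two methods with a float()-based helper; is_float_string is a hypothetical name, not a built-in.

```python
def is_float_string(s):
    """Return True if float() can parse s (accepts signs, decimals, exponents)."""
    try:
        float(s)
    except ValueError:
        return False
    return True


# "\u00bd" is the single character for the fraction one half.
samples = ["12345", "-1", "1.5", "\u00bd", "abc"]
for s in samples:
    print(
        f"{s!r:>8} isdigit={s.isdigit()} "
        f"isnumeric={s.isnumeric()} float={is_float_string(s)}"
    )
```

Note that float() also accepts strings such as "nan" and "inf", which may need to be rejected separately.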
CSV File Processing
Writing to CSV Files
import csv
# Use Python's standard library csv module to write to CSV files
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Column1', 'Column2', 'Column3'])
    writer.writerow(['Data1', 'Data2', 'Data3'])
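When rows are naturally dictionaries, csv.DictWriter and csv.DictReader keep the column names attached to the data. The round trip below is a sketch; the field names and values are made up, and a temporary directory is used so that no file is left behind.

```python
import csv
import os
import tempfile

rows = [
    {"name": "Data1", "value": "1"},
    {"name": "Data2", "value": "2"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.csv")

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=["name", "value"])
        writer.writeheader()
        writer.writerows(rows)

    # Every value comes back as a string; convert types explicitly if needed.
    with open(path, newline="", encoding="utf-8") as csvfile:
        read_back = list(csv.DictReader(csvfile))

print(read_back)
```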
File Moving Operations
Python file moving tutorial: https://www.learndatasci.com/solutions/python-move-file/
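One standard-library option for moving files is shutil.move. The sketch below runs inside a temporary directory; the file and directory names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)
    source = base / "report.txt"
    target_dir = base / "archive"

    source.write_text("example content", encoding="utf-8")
    target_dir.mkdir()

    # shutil.move returns the final destination path.
    moved_to = Path(shutil.move(str(source), str(target_dir / "report.txt")))

    source_exists_after = source.exists()
    moved_content = moved_to.read_text(encoding="utf-8")

print(source_exists_after, moved_content)
```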
Python Language Features
Conditional Expressions
Python does not have the C-style ternary operator (condition ? expression1 : expression2), but it has an equivalent conditional expression.
result = value1 if condition else value2
# This is equivalent to the ternary conditional operator in other languages
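A short illustration of the conditional expression, including a chained form, is given below. The function names are illustrative only.

```python
def classify(n):
    # Conditional expression: value_if_true if condition else value_if_false
    return "even" if n % 2 == 0 else "odd"


def sign(n):
    # Expressions can be chained; the conditions are tested in order.
    return "negative" if n < 0 else "zero" if n == 0 else "positive"


print(classify(4), classify(7))
print(sign(-3), sign(0), sign(5))
```

Chains longer than two conditions tend to read better as an ordinary if/elif/else statement.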
Calling External Programs
import subprocess
# Call external programs (like antechamber) in Python
def call_antechamber(input_file, output_file):
    # Pass arguments as a list so file names with spaces or shell metacharacters are safe
    cmd = ["antechamber", "-i", input_file, "-o", output_file]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result
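Callers usually also need to know whether the external program succeeded. The sketch below checks the return code and surfaces stderr on failure. It uses the current Python interpreter as a stand-in for an external tool so that it runs anywhere; run_command is a hypothetical helper name.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as an argument list and return its stdout.

    Raises RuntimeError with the captured stderr if the command fails.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{args[0]} exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# The current Python interpreter stands in for an external tool here.
output = run_command([sys.executable, "-c", "print('hello from child')"])
print(output.strip())
```

Passing check=True to subprocess.run is an alternative that raises subprocess.CalledProcessError on a non-zero exit code.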
Using Exit Functions
Exit Function Error
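# exit() is added by the site module for interactive use. When site is not loaded
# (e.g. python -S or some frozen executables) it fails with:
# NameError: name 'exit' is not defined
exit()
# Correct: import the sys module and call sys.exit()
import sys
sys.exit()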
Scope Issues
Simply importing the sys module is not enough to make exit enter the global scope; you need to explicitly use sys.exit().
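sys.exit() works by raising SystemExit, which is why it can be caught and inspected. The sketch below assumes a hypothetical main function and usage message.

```python
import sys


def main(argv):
    if not argv:
        # If SystemExit reaches the interpreter uncaught, a string argument
        # is printed to stderr and the process exit status is 1.
        sys.exit("usage: prog INPUT_FILE")
    return 0


# Catch SystemExit here only to show what sys.exit() raised.
try:
    main([])
except SystemExit as exc:
    exit_payload = exc.code

print(repr(exit_payload))
```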
JSON Data Processing
import json
# Standard method to load JSON data
with open('data.json', 'r') as f:
    data = json.load(f)
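A write and read round trip, together with handling of malformed input, might look like the following sketch. The record contents are made up, and a temporary directory stands in for the real location of data.json.

```python
import json
import os
import tempfile

record = {"name": "example", "values": [1, 2, 3], "nested": {"ok": True}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")

    # An explicit encoding avoids depending on the platform default.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)

    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

# Malformed input raises json.JSONDecodeError (a subclass of ValueError).
try:
    json.loads("{not valid json}")
except json.JSONDecodeError as exc:
    decode_error = exc.msg

print(loaded == record)
```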
Environment Configuration Optimization
Cleaning PATH Environment Variables
# Clean duplicate PATH entries
export PATH=$(echo -n $PATH | awk -v RS=: -v ORS=: '!($0 in a) {a[$0]; print}' | sed 's/:$//')
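The same order-preserving deduplication can be sketched in Python, for example when PATH is adjusted from a script before launching subprocesses. dedupe_path is a hypothetical helper. Unlike the awk one-liner, it also drops empty entries.

```python
import os


def dedupe_path(path_value, sep=os.pathsep):
    """Remove duplicate and empty entries from a PATH-style string.

    The first occurrence of each entry is kept, so lookup order is preserved.
    """
    seen = set()
    unique = []
    for entry in path_value.split(sep):
        if entry and entry not in seen:
            seen.add(entry)
            unique.append(entry)
    return sep.join(unique)


print(dedupe_path("/usr/bin:/bin:/usr/bin:/usr/local/bin:/bin", sep=":"))
```

To apply it to the current process, assign the result to os.environ["PATH"].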
Subprocess Configuration
# With shell=True, subprocess.Popen runs the command through /bin/sh by default on POSIX
# To use bash instead, set the executable parameter
subprocess.Popen(..., shell=True, executable='/bin/bash')
Using bash with Python subprocess: https://www.saltycrane.com/blog/2011/04/how-use-bash-shell-python-subprocess-instead-binsh/
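A runnable sketch of the executable parameter follows. It assumes a POSIX system and looks up bash on PATH instead of hard-coding /bin/bash; run_in_bash is a hypothetical helper name.

```python
import shutil
import subprocess

# Locate bash on PATH instead of hard-coding /bin/bash (None if absent).
BASH = shutil.which("bash")


def run_in_bash(command):
    """Run a shell command string through bash rather than the default /bin/sh."""
    if BASH is None:
        raise RuntimeError("bash was not found on PATH")
    return subprocess.run(
        command, shell=True, executable=BASH, capture_output=True, text=True
    )


if BASH is not None:
    # Brace expansion is a bash feature that a plain POSIX sh may not support.
    print(run_in_bash("echo {1..3}").stdout.strip())
```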
Proxy Configuration
# Set HTTP proxy
export http_proxy="http://127.0.0.1:7890"
Development Tool Integration
Calling External Programs in Python
import subprocess
# Standard method to call external programs
def run_external_command(command):
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout, result.stderr
Package Management Integration
subprocess can also call system package managers and other external tools such as antechamber. The pattern is the same as in the call_antechamber function shown under Calling External Programs.
PyCharm Environment Issues
PyCharm is an IDE, not a web browser, so it cannot open and render a page served at localhost:8501 the way Chrome or Edge does. Port forwarding is recommended: if the service runs on a remote machine, forward the port to the local machine and open the page in a local browser.
Related Learning Resources
Python Packaging
Scientific Python packaging guide: https://learn.scientific-python.org/development/guides/packaging-simple/
Troubleshooting and Best Practices
Common Error Patterns
- Environment Conflicts: Incompatible package versions in different conda environments
- Connection Errors: Network connection issues in web crawlers
- Compilation Issues: Cython cross-platform compilation differences
- Character Encoding: Improper handling of bytes and str
Debugging Suggestions
- Reproduce environment conflicts in an isolated test environment
- Use virtual environments to avoid dependency pollution
- Document complete compilation configurations
- Be aware of cross-platform compatibility issues
Development Environment Checklist
- Python Version: Ensure version compatibility
- Dependency Management: Use requirements.txt or environment.yml
- Virtual Environments: Create separate environments for each project
- Code Quality: Use linter and formatter tools
- Performance Monitoring: Regularly conduct performance analysis
This article is based on development practices from September 2023 to the first half of 2024, covering practical technical points for Python engineering and development environment configuration.