Understanding NumPy: The Core Library for Numerical Computing in Python

NumPy (Numerical Python) is built around the multidimensional array, but its capabilities extend far beyond that. As the foundational library for scientific computing in Python, it centers on the N-dimensional array (ndarray) and builds on it a full range of functionality covering mathematical operations, linear algebra, data preprocessing, and more.

1. ndarray: NumPy’s Signature Data Structure

Ultimate Optimization of Multidimensional Arrays

The ndarray is a homogeneous multidimensional array stored in a contiguous block of memory. Compared to Python’s native lists, it offers:

• High memory efficiency: Elements are stored directly as values rather than as pointers to objects. On 64-bit CPython a list of floats costs about 32 bytes per element (an 8-byte pointer plus a 24-byte float object), against 8 bytes per float64 element, so an array typically needs only a quarter to a third of the memory of the equivalent list.
• Fast computation speed: Vectorized operations run in optimized C code and eliminate Python-level loops, commonly giving speedups of 50 times or more. For example, averaging 100,000 data points takes milliseconds with a Python loop over a list but only tens of microseconds with NumPy.
• Flexible dimension expansion: Supports everything from one-dimensional vectors to N-dimensional tensors (such as three-dimensional medical images and four-dimensional meteorological data).

Scientific Design of Memory and Computation

• Contiguous memory layout: Element addresses are contiguous, which works well with CPU caches and improves access efficiency.
• Data type control: The precision of each array can be specified (e.g., float32 to save memory), and complex numbers, timestamps (datetime64), and other types are supported.
• View and copy mechanisms: view() shares memory with the original array and so avoids copying overhead, while copy() creates an independent array that can be modified safely.
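
The dtype and view/copy points above can be checked with a few lines. This is a minimal sketch; the array contents and variable names are made up for illustration.

```python
import numpy as np

# Six float64 values take 8 bytes each; float32 halves that.
a64 = np.arange(6, dtype=np.float64)
a32 = a64.astype(np.float32)
print(a64.nbytes, a32.nbytes)    # 48 24

# view() shares the underlying buffer: a write through the view
# is visible in the original array.
v = a64.view()
v[0] = 100.0
print(a64[0])                    # 100.0
print(np.shares_memory(a64, v))  # True

# copy() allocates a new buffer: the original is unaffected.
c = a64.copy()
c[1] = -1.0
print(a64[1])                    # 1.0
print(np.shares_memory(a64, c))  # False
```

Basic slices such as a64[1:4] also return views, so an explicit copy() is the safe choice whenever the original data must stay untouched.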

2. Beyond Arrays: Extended Capabilities

Mathematical Operation Ecosystem

• Vectorized operations: Directly perform addition, subtraction, multiplication, division, trigonometric functions, exponentials, and logarithms on arrays, avoiding loops.
• Broadcasting mechanism: Automatically aligns the dimensions of arrays with compatible but different shapes (e.g., matrix + scalar, or matrix + row vector).
• Linear algebra library: Built-in operations such as matrix multiplication (np.dot), inversion (np.linalg.inv), and eigenvalue decomposition (np.linalg.eig).
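
As a rough illustration of vectorization, broadcasting, and np.dot, the sketch below uses small made-up arrays; the expected values in the comments follow directly from the arithmetic.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# Vectorized arithmetic: one expression, no explicit Python loop.
y = x * 2 + 1                      # values 3, 5, 7
z = np.exp(np.log(x))              # element-wise functions; recovers x

m = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Broadcasting: the scalar and the row vector are stretched to m's shape.
m_plus_scalar = m + 10                       # [[11, 12], [13, 14]]
m_plus_row = m + np.array([100.0, 200.0])    # [[101, 202], [103, 204]]

# Matrix multiplication with np.dot.
prod = np.dot(m, m)                          # [[7, 10], [15, 22]]
```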

Data Science Infrastructure

• Statistical functions: Mean, variance, percentiles, histogram statistics, etc.
• Random number generation: Supports over 20 probability distributions, including the normal and Poisson distributions, for Monte Carlo simulations.
• File interaction: Efficiently read and write binary and text files (e.g., the binary .npy format can be around 60 times faster than CSV, depending on data size and hardware).

Seamless Integration with Other Libraries

• Pandas underlying engine: DataFrame’s internal data storage relies on NumPy arrays.
• Machine learning support: Input data formats for libraries like Scikit-learn and TensorFlow are compatible with ndarray.

For example, processing 100,000 floating-point numbers (approximate figures; timings vary by machine):

Metric	Python List	NumPy Array (float64)
Memory Usage	~3.2MB (pointers plus float objects)	~0.8MB
Sum Speed	~15ms (loop implementation)	~0.02ms (vectorized)
Code Simplicity	Requires loop	Direct call to np.sum()
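
A comparison like the one in the table can be reproduced with the sketch below. It assumes 64-bit CPython; the helper python_sum is a made-up name for the loop implementation. Timings are printed rather than asserted because they depend on the machine.

```python
import sys
import timeit

import numpy as np

n = 100_000
values = [float(i) for i in range(n)]
arr = np.array(values)  # dtype float64

# List cost: the pointer array plus one float object per element.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = arr.nbytes  # 8 bytes per element -> 800000
print(f"list: {list_bytes / 1e6:.1f} MB, array: {array_bytes / 1e6:.1f} MB")


def python_sum(seq):
    """Loop implementation used as the baseline (hypothetical helper)."""
    total = 0.0
    for v in seq:
        total += v
    return total


loop_total = python_sum(values)
vec_total = np.sum(arr)

# Average seconds per call over 10 runs each; results vary by machine.
loop_time = timeit.timeit(lambda: python_sum(values), number=10) / 10
vec_time = timeit.timeit(lambda: np.sum(arr), number=10) / 10
print(f"loop: {loop_time * 1e3:.3f} ms, np.sum: {vec_time * 1e3:.3f} ms")
```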

3. Core Features of NumPy Data Types

Type System Aligned with C Language

NumPy’s data types (dtype) map closely onto C types, providing precise numerical ranges and memory control.

• Integer Types: Divided into signed (int8, int16, etc.) and unsigned (uint8, uint16, etc.); for example, int8 represents an 8-bit integer (-128 to 127), while uint16 represents an unsigned integer from 0 to 65535.

• Floating Point Types: Include float16 (half precision), float32 (single precision), and float64 (double precision), in order of increasing precision.

• Complex Types: complex64 (real and imaginary parts each stored as a 32-bit float) and complex128 (each stored as a 64-bit float).

• Boolean Type: bool_ stores only True or False and occupies 1 byte.

Type Naming Convention

Type names consist of the type name and bit length; for example, int32 represents a 32-bit integer and float64 a 64-bit double precision float.

• Abbreviated Forms: In NumPy 1.x, float_ is equivalent to float64 and complex_ to complex128. These aliases were removed in NumPy 2.0, so new code should use the explicit names.

Viewing Data Types

Use the array’s dtype attribute to obtain the type object, and further use .name to get the type name:
```
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int8)
print(arr.dtype.name)  # Output: int8
```

Type Conversion

Use the astype() method to explicitly convert data types:
```
import numpy as np

int_arr = np.array([1, 2, 3], dtype=np.int32)
float_arr = int_arr.astype(np.float64)  # Convert to double precision float
```
• Note: When converting from float to integer, the fractional part is discarded, i.e., values are truncated toward zero (e.g., 3.9 → 3 and -3.9 → -3).
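
The truncation rule is easy to get wrong for negative values, so here is a small sketch with made-up inputs. It also shows np.rint as one way to round before converting; that is an alternative added for contrast, not something astype() does on its own.

```python
import numpy as np

f = np.array([3.9, -3.9, 0.5, -0.5])

# astype() discards the fractional part (truncation toward zero).
truncated = f.astype(np.int32)            # values 3, -3, 0, 0

# To round instead, round first and convert afterwards.
# np.rint rounds halves to the nearest even integer.
rounded = np.rint(f).astype(np.int32)     # values 4, -4, 0, 0

print(truncated.tolist(), rounded.tolist())
```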

String and Numeric Conversion

If the strings contain valid numeric text, the array can be converted to a numeric type:
```
import numpy as np

str_arr = np.array(['123', '456'])
int_arr = str_arr.astype(np.int32)  # int_arr now holds the integers 123 and 456
```

Defining Composite Types

Use dtype to define structured types containing multiple fields, suitable for handling tabular data:
```
import numpy as np

# 'S20' = byte string of up to 20 bytes, 'i1' = 8-bit integer, 'f4' = 32-bit float
dt = np.dtype([('name', 'S20'), ('age', 'i1'), ('score', 'f4')])
students = np.array([('Alice', 20, 90.5)], dtype=dt)
```
• Field Access: students['age'] retrieves the age field of every record as an array.
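
Field access and row filtering on a structured array might look like the sketch below. It swaps 'S20' for 'U20' (a Unicode string field) so that names come back as str rather than bytes, and the second record ('Bob') is a made-up example.

```python
import numpy as np

# Same layout as above, but with a Unicode name field ('U20').
dt = np.dtype([('name', 'U20'), ('age', 'i1'), ('score', 'f4')])
students = np.array([('Alice', 20, 90.5), ('Bob', 22, 78.0)], dtype=dt)

ages = students['age']                     # one value per record: 20, 22
mean_score = students['score'].mean()      # 84.25

# A boolean mask built from one field selects whole records.
high = students[students['score'] > 80]
print(high['name'].tolist())               # ['Alice']
```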

Byte Order and Alignment

• Byte Order Mark: '<' indicates little-endian (low byte first), and '>' indicates big-endian (high byte first).
• Memory Alignment: Passing align=True to np.dtype pads the fields of a structured type the way a C compiler would, which can improve access speed at the cost of some extra memory.
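
Both settings can be observed directly. In this sketch the field names flag and value are made up, and the aligned item size of 16 bytes is what common 64-bit platforms produce; other platforms may pad differently.

```python
import numpy as np

# The same integer stored with explicit little- and big-endian dtypes.
little = np.array([1], dtype='<i4')
big = little.astype('>i4')
print(little.tobytes())   # b'\x01\x00\x00\x00' (low byte first)
print(big.tobytes())      # b'\x00\x00\x00\x01' (high byte first)

# A 1-byte field followed by an 8-byte float.
fields = [('flag', 'u1'), ('value', 'f8')]
packed = np.dtype(fields)                # no padding: 1 + 8 bytes
aligned = np.dtype(fields, align=True)   # padding inserted before 'value'
print(packed.itemsize)                   # 9
print(aligned.itemsize)                  # typically 16 on 64-bit platforms
```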

4. Memory Optimization Techniques

Selecting Appropriate Data Types

• Saving Memory: For example, using int8 instead of int32 can reduce memory usage by 75%, and using float32 instead of float64 can reduce memory usage by 50%.

• Matching Numeric Ranges: Choose the smallest bit-width type that covers the data range (e.g., use uint8 for data ranging from 0 to 255).

Avoiding Redundant Copies

Use np.asarray() to convert input data (lists, tuples, other arrays, etc.) into NumPy arrays, and astype(copy=False) to change the dtype. When the input is already an array of the requested type, both return it without copying, which avoids unnecessary copying overhead.
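
The no-copy behavior can be verified by checking object identity and memory sharing. A minimal sketch with made-up data:

```python
import numpy as np

a = np.arange(5, dtype=np.float64)

# np.asarray returns an existing ndarray as-is; np.array copies by default.
same = np.asarray(a)
copied = np.array(a)
print(same is a)      # True
print(copied is a)    # False

# astype(copy=False) skips the copy only when no conversion is needed.
no_conv = a.astype(np.float64, copy=False)   # dtype already matches
conv = a.astype(np.float32, copy=False)      # conversion needs a new buffer
print(no_conv is a)                          # True
print(np.shares_memory(conv, a))             # False

# Non-array input (here a list) always has to be converted into a new array.
from_list = np.asarray([1, 2, 3])
```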

Array Preallocation and Batch Operations

• Preallocate fixed-size arrays (e.g., np.empty(shape), whose contents are uninitialized until written) to avoid repeated resizing.

• Utilize vectorized operations to replace loops, reducing the creation of temporary Python objects.
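
The two points above can be sketched as follows. The size n and the computation i * 0.5 are arbitrary choices for illustration; the out= argument shown at the end is one additional way to avoid allocating a temporary array.

```python
import numpy as np

n = 1000

# Preallocate once, then fill. np.empty leaves the contents
# uninitialized, so every slot must be written before it is read.
out = np.empty(n, dtype=np.float64)
for i in range(n):
    out[i] = i * 0.5

# Vectorized equivalent: no Python loop at all.
vec = np.arange(n) * 0.5

# In-place variant: write the result into an existing buffer
# instead of allocating a new array for it.
buf = np.arange(n, dtype=np.float64)
np.multiply(buf, 0.5, out=buf)
```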

5. Application Scenario Examples

• Image Processing: Use uint8 to store pixel values (0-255) to save memory.

• Scientific Computing: float32 is sufficient for many workloads and balances precision and performance; float64 remains the safer choice when rounding error accumulates.

• Machine Learning: Structured types handle feature datasets (e.g., mixed numeric and string data).

6. Common Functions

Array Creation

• np.array(): Create arrays from lists/tuples, supporting multidimensional data.
• np.zeros() / np.ones(): Generate arrays of zeros or ones, such as np.zeros((3,3)).
• np.arange(): Generate arithmetic sequences, such as np.arange(0,10,2) generating [0,2,4,6,8].
• np.linspace(): Generate evenly spaced arrays, such as np.linspace(0,1,5) generating [0.0, 0.25, 0.5, 0.75, 1.0].
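
The creation functions above in one short sketch; shapes and values are arbitrary.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])   # 2x3 array from nested lists
zeros = np.zeros((3, 3))                  # 3x3 array of 0.0
ones = np.ones((2, 4), dtype=np.int32)    # dtype can be set at creation
evens = np.arange(0, 10, 2)               # stop value is excluded
ticks = np.linspace(0, 1, 5)              # both endpoints are included

print(grid.shape)        # (2, 3)
print(evens.tolist())    # [0, 2, 4, 6, 8]
print(ticks.tolist())    # [0.0, 0.25, 0.5, 0.75, 1.0]
```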

Array Operations

• reshape(): Change the shape of an array, such as arr.reshape(2,3).
• concatenate() / vstack() / hstack(): Array concatenation, supporting horizontal or vertical merging.
• split() / vsplit() / hsplit(): Split arrays at specified positions or evenly.
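
A short sketch of reshaping, joining, and splitting, using a made-up six-element array:

```python
import numpy as np

arr = np.arange(6)            # 0, 1, 2, 3, 4, 5
mat = arr.reshape(2, 3)       # [[0, 1, 2], [3, 4, 5]]

stacked_v = np.vstack([mat, mat])             # rows appended: shape (4, 3)
stacked_h = np.hstack([mat, mat])             # columns appended: shape (2, 6)
joined = np.concatenate([mat, mat], axis=0)   # same result as vstack here

# Split evenly into two halves along the column axis.
left, right = np.hsplit(stacked_h, 2)

# Split at explicit positions: before index 2 and before index 5.
parts = np.split(arr, [2, 5])
print([p.tolist() for p in parts])            # [[0, 1], [2, 3, 4], [5]]
```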

Basic Operations

• Element-wise operations: np.add(), np.subtract(), np.multiply(), np.divide().
• Dot product and matrix multiplication: np.dot() computes the dot product, and np.matmul() (also available as the @ operator) handles matrix multiplication.
• Trigonometric functions: np.sin(), np.cos(), np.tan(). These expect radians, so convert degrees first (e.g., np.deg2rad(30)).
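
These operations can be tried on small made-up inputs; the values in the comments follow from the arithmetic.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

total = np.add(v, w)          # element-wise sum: 5, 7, 9
inner = np.dot(v, w)          # 1*4 + 2*5 + 3*6 = 32

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
AB = np.matmul(A, B)          # same as A @ B: [[2, 1], [4, 3]]

# Trigonometric functions take radians, so convert degrees first.
half = np.sin(np.deg2rad(30))  # approximately 0.5
```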

Statistical Functions

• np.sum() / np.mean(): Sum and mean, supporting axis specification (e.g., axis=0 for column-wise calculations).
• np.std() / np.var(): Calculate standard deviation and variance, supporting adjustment of degrees of freedom (ddof parameter).
• np.max() / np.min(): Extreme value queries; np.argmax() / np.argmin() return the corresponding indices.
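
The axis and ddof parameters are the usual stumbling blocks, so the sketch below works through both on made-up data.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

col_sums = np.sum(data, axis=0)     # collapse rows: 5, 7, 9
row_means = np.mean(data, axis=1)   # collapse columns: 2, 5

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
pop_std = np.std(sample)               # divides by N: 2.0
sample_var = np.var(sample, ddof=1)    # divides by N - 1: 32 / 7

largest = np.max(sample)               # 9.0
where_largest = np.argmax(sample)      # index 7
where_smallest = np.argmin(sample)     # index 0
```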

Exponential and Logarithm

• np.exp(): Calculate the natural exponential (e^x), such as np.exp(2) ≈ 7.389.
• np.log() / np.log10(): Natural logarithm and base-10 logarithm.

Random Arrays

• np.random.rand(): Generate arrays with uniform distribution in [0,1), such as np.random.rand(3,3).
• np.random.randn(): Generate random numbers from a standard normal distribution.
• np.random.randint(): Generate integers within a specified range (upper bound excluded), such as np.random.randint(1,7,10) generating 10 dice rolls.

Random Operations

• np.random.shuffle(): Shuffle the order of an array in place.
• np.random.choice(): Randomly sample from an array.
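
A sketch of the random functions listed above. The seed value 42 and the ten-element deck are arbitrary; the seed only makes repeated runs produce the same numbers. No outputs are shown because they depend on the generator state.

```python
import numpy as np

np.random.seed(42)  # fix the seed so repeated runs give the same numbers

uniform = np.random.rand(3, 3)        # floats in [0, 1), shape (3, 3)
normal = np.random.randn(1000)        # standard normal samples
dice = np.random.randint(1, 7, 10)    # ten integers from 1 to 6

deck = np.arange(10)
np.random.shuffle(deck)               # reorders deck in place, returns None

# Draw three distinct elements from the shuffled deck.
hand = np.random.choice(deck, size=3, replace=False)
```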

Linear Algebra

• np.linalg.inv(): Matrix inversion.
• np.linalg.det(): Calculate the determinant of a matrix.
• np.linalg.eig(): Solve for eigenvalues and eigenvectors.
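
The three routines can be checked against their definitions. The 2x2 matrix below is made up; its determinant is 10 and its eigenvalues are 2 and 5. Results are compared with np.allclose because floating-point output is rarely exact.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))    # True: A times its inverse is I

det = np.linalg.det(A)                      # 4*3 - 1*2 = 10, up to rounding

# Column i of eigvecs is the eigenvector for eigvals[i].
eigvals, eigvecs = np.linalg.eig(A)
print(np.allclose(A @ eigvecs, eigvecs * eigvals))   # True: A v = lambda v
```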

Conditional Filtering

• np.where(): Choose elements based on a condition, such as np.where(arr>5, arr, 0), which keeps values greater than 5 and sets the others to zero.
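
np.where has two calling forms, and boolean indexing is a close relative. A sketch with a made-up array:

```python
import numpy as np

arr = np.array([3, 8, 5, 10, 1])

# Three-argument form: pick from arr where the condition holds, else 0.
kept = np.where(arr > 5, arr, 0)     # 0, 8, 0, 10, 0

# One-argument form: a tuple of index arrays for the matching positions.
idx = np.where(arr > 5)[0]           # 1, 3

# Boolean indexing returns only the matching values.
selected = arr[arr > 5]              # 8, 10
```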

File Operations

• np.save() / np.load(): Save and load binary array files (.npy).
• np.savetxt() / np.loadtxt(): Handle text format data.
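
A round trip through both formats might look like the sketch below. The file names data.npy and data.csv are made up, and a temporary directory keeps the example from leaving files behind.

```python
import os
import tempfile

import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "data.npy")   # hypothetical file names
    txt_path = os.path.join(tmp, "data.csv")

    # Binary format: dtype and shape are stored in the file header.
    np.save(npy_path, arr)
    loaded_npy = np.load(npy_path)

    # Text format: human-readable, but values are parsed on every load.
    np.savetxt(txt_path, arr, delimiter=",")
    loaded_txt = np.loadtxt(txt_path, delimiter=",")

print(loaded_npy.dtype, loaded_npy.shape)   # float64 (3, 4)
```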

NumPy achieves a balance between memory efficiency and computational performance through fine control of data types. In development, selecting types based on data characteristics can significantly optimize large data processing workflows.