The core functionality of NumPy (Numerical Python) is specifically designed to handle multidimensional arrays, but its capabilities extend far beyond that. As the foundational library for scientific computing in Python, it centers around N-dimensional arrays (ndarray), deriving a full range of functionalities covering mathematical operations, linear algebra, data preprocessing, and more.
1. ndarray: NumPy’s Exclusive Weapon
-
Ultimate Optimization of Multidimensional Arrays The ndarray in NumPy is a homogeneous, contiguous memory storage structure for multidimensional arrays, which, compared to Python’s native lists:• High memory efficiency: Elements are stored directly as values rather than object pointers, resulting in memory usage that is only 1/3 of that of lists for the same amount of data.• Fast computation speed: Based on optimizations at the C language level, vectorized operations eliminate the need for loops, achieving speed improvements of over 50 times. For example, calculating the mean of 100,000 data points takes minutes with lists but only milliseconds with NumPy.• Flexible dimension expansion: Supports flexible operations from one-dimensional vectors to N-dimensional tensors (such as three-dimensional medical images and four-dimensional meteorological data).
-
Scientific Design of Memory and Computation• Contiguous memory layout: Element addresses are contiguous, enhancing access efficiency in conjunction with CPU caching mechanisms.• Data type control: Allows specification of precision (e.g.,
<span>float32</span>
to save memory), complex numbers, timestamps, and other types.• View and copy mechanisms: Shared memory through<span>view()</span>
reduces copying overhead, while<span>copy()</span>
ensures safe independent operations.
2. Beyond Arrays: Extended Capabilities
-
Mathematical Operation Ecosystem• Vectorized operations: Directly perform addition, subtraction, multiplication, division, trigonometric functions, exponentials, and logarithms on arrays, avoiding loops.• Broadcasting mechanism: Automatically aligns dimensions of arrays with different shapes for operations (e.g., matrix + scalar).• Linear algebra library: Built-in advanced operations such as matrix multiplication (
<span>np.dot</span>
), inversion (<span>np.linalg.inv</span>
), and eigenvalue decomposition (<span>np.linalg.eig</span>
). -
Data Science Infrastructure• Statistical functions: Mean, variance, percentiles, histogram statistics, etc.• Random number generation: Supports over 20 probability models, including normal distribution and Poisson distribution, for Monte Carlo simulations.• File interaction: Efficiently read and write binary/text files (e.g.,
<span>.npy</span>
format is 60 times faster than CSV). -
Seamless Integration with Other Libraries• Pandas underlying engine: DataFrame’s internal data storage relies on NumPy arrays.• Machine learning support: Input data formats for libraries like Scikit-learn and TensorFlow are compatible with ndarray.
For example, processing 100,000 floating-point numbers:
Metric | Python List | NumPy Array |
---|---|---|
Memory Usage | ~1.6MB | ~0.8MB |
Sum Speed | 15ms (loop implementation) | 0.02ms (vectorized) |
Code Simplicity | Requires loop | Direct call to<span>np.sum()</span> |
3. Core Features of NumPy Data Types
-
Type System Aligned with C Language NumPy’s data types (
<span>dtype</span>
) are highly compatible with C language, providing precise numerical ranges and memory control.• Integer Types: Divided into signed (<span>int8</span>
,<span>int16</span>
, etc.) and unsigned (<span>uint8</span>
,<span>uint16</span>
, etc.), for example,<span>int8</span>
represents an 8-bit integer (-128 to 127), while<span>uint16</span>
represents an unsigned integer from 0 to 65535.• Floating Point Types: Includes
<span>float16</span>
(half precision),<span>float32</span>
(single precision),<span>float64</span>
(double precision), with increasing precision.• Complex Types: Such as
<span>complex64</span>
(real and imaginary parts represented by 32-bit floats), and<span>complex128</span>
(64-bit double precision).• Boolean Type:
<span>bool_</span>
, stores only<span>True</span>
or<span>False</span>
, occupying 1 byte. -
Type Naming Convention Type names consist of the type name and bit length, for example,
<span>int32</span>
represents a 32-bit integer, and<span>float64</span>
represents a 64-bit double precision float.• Abbreviated Forms:<span>float_</span>
is equivalent to<span>float64</span>
, and<span>complex_</span>
is equivalent to<span>complex128</span>
. -
Viewing Data Types Use the array’s
<span>dtype</span>
attribute to obtain the type object, and further use<span>.name</span>
to get the type name:arr = np.array([1, 2, 3], dtype=np.int8) print(arr.dtype.name) # Output: 'int8'
-
Type Conversion Use the
<span>astype()</span>
method to explicitly convert data types:int_arr = np.array([1, 2, 3], dtype=np.int32) float_arr = int_arr.astype(np.float64) # Convert to double precision float
• Note: When converting from float to integer, the decimal is truncated directly (e.g., 3.9 → 3).
-
String and Numeric Conversion If the string content is numeric, it can be converted to a numeric type:
str_arr = np.array(['123', '456']) int_arr = str_arr.astype(np.int32) # Output: [123, 456]
-
Defining Composite Types Use
<span>dtype</span>
to define structured types containing multiple fields, suitable for handling tabular data:dt = np.dtype([('name', 'S20'), ('age', 'i1'), ('score', 'f4')]) students = np.array([('Alice', 20, 90.5)], dtype=dt)
• Field Access:
<span>students['age']</span>
retrieves all age columns. -
Byte Order and Alignment• Byte Order Mark:
<span>'<'</span>
indicates little-endian (low byte first), and<span>'>'</span>
indicates big-endian (high byte first). • Memory Alignment: Optimize the memory layout of structures through<span>align=True</span>
to enhance access speed.
4. Memory Optimization Techniques
-
Selecting Appropriate Data Types• Saving Memory: For example, using
<span>int8</span>
instead of<span>int32</span>
can reduce memory usage by 75%, and using<span>float32</span>
instead of<span>float64</span>
can reduce memory usage by 50%.• Matching Numeric Ranges: Choose the smallest bit-width type based on the data range (e.g., use
<span>uint8</span>
for data ranging from 0-255). -
Avoiding Redundant Copies Use
<span>np.asarray()</span>
or<span>astype(copy=False)</span>
to convert input data (lists, tuples, other arrays, etc.) into NumPy arrays, reducing data copying overhead when the input data is already an array. -
Array Preallocation and Batch Operations• Preallocate fixed-size arrays (e.g.,
<span>np.empty(shape)</span>
) to avoid dynamic resizing.• Utilize vectorized operations to replace loops, reducing the generation of temporary variables.
5. Application Scenario Examples
• Image Processing: Use<span>uint8</span>
to store pixel values (0-255) to save memory.
• Scientific Computing: <span>float32</span>
meets most computational needs, balancing precision and performance.
• Machine Learning: Structured types handle feature datasets (e.g., mixed numeric and string data).
6. Common Functions
-
Array Creation and Operations
•
<span>np.array()</span>
: Create arrays from lists/tuples, supporting multidimensional data.•<span>np.zeros()</span>
/<span>np.ones()</span>
: Generate arrays of zeros or ones, such as<span>np.zeros((3,3))</span>
.•<span>np.arange()</span>
: Generate arithmetic sequences, such as<span>np.arange(0,10,2)</span>
generating<span>[0,2,4,6,8]</span>
.•<span>np.linspace()</span>
: Generate evenly spaced arrays, such as<span>np.linspace(0,1,5)</span>
generating<span>[0.0, 0.25, 0.5, 0.75, 1.0]</span>
. -
Array Operations•
<span>reshape()</span>
: Change the shape of an array, such as<span>arr.reshape(2,3)</span>
.•<span>concatenate()</span>
/<span>vstack()</span>
/<span>hstack()</span>
: Array concatenation, supporting horizontal or vertical merging.•<span>split()</span>
/<span>vsplit()</span>
/<span>hsplit()</span>
: Split arrays at specified positions or evenly. -
Basic Operations• Element-wise operations:
<span>np.add()</span>
,<span>np.subtract()</span>
,<span>np.multiply()</span>
,<span>np.divide()</span>
.• Dot product and matrix multiplication:<span>np.dot()</span>
computes the dot product, and<span>np.matmul()</span>
handles matrix multiplication.• Trigonometric functions:<span>np.sin()</span>
,<span>np.cos()</span>
,<span>np.tan()</span>
, noting the conversion from degrees to radians (e.g.,<span>np.deg2rad(30)</span>
). -
Statistical Functions•
<span>np.sum()</span>
/<span>np.mean()</span>
: Sum and mean, supporting axis specification (e.g.,<span>axis=0</span>
for column-wise calculations).•<span>np.std()</span>
/<span>np.var()</span>
: Calculate standard deviation and variance, supporting adjustment of degrees of freedom (<span>ddof</span>
parameter).•<span>np.max()</span>
/<span>np.min()</span>
: Extreme value queries,<span>np.argmax()</span>
/<span>np.argmin()</span>
return indices. -
Exponential and Logarithm•
<span>np.exp()</span>
: Calculate natural exponent (e^x), such as<span>np.exp(2)=7.389</span>
.•<span>np.log()</span>
/<span>np.log10()</span>
: Natural logarithm and logarithm base 10. -
Random Arrays•
<span>np.random.rand()</span>
: Generate arrays with uniform distribution in [0,1), such as<span>np.random.rand(3,3)</span>
.•<span>np.random.randn()</span>
: Generate random numbers from a standard normal distribution.•<span>np.random.randint()</span>
: Generate integers within a specified range, such as<span>np.random.randint(1,7,10)</span>
generating 10 dice rolls. -
Random Operations•
<span>np.random.shuffle()</span>
: Shuffle the order of an array.•<span>np.random.choice()</span>
: Randomly sample from an array. -
Linear Algebra•
<span>np.linalg.inv()</span>
: Matrix inversion.•<span>np.linalg.det()</span>
: Calculate the determinant of a matrix.•<span>np.linalg.eig()</span>
: Solve for eigenvalues and eigenvectors. -
Conditional Filtering•
<span>np.where()</span>
: Return elements based on conditions, such as<span>np.where(arr>5, arr, 0)</span>
keeping values greater than 5 and setting others to zero. -
File Operations•
<span>np.save()</span>
/<span>np.load()</span>
: Save and load binary array files (.npy).•<span>np.savetxt()</span>
/<span>np.loadtxt()</span>
: Handle text format data.
NumPy achieves a balance between memory efficiency and computational performance through fine control of data types. In development, selecting types based on data characteristics can significantly optimize large data processing workflows.