Memory Mapping in NumPy
Memory mapping lets you store large datasets on disk and manipulate them as if they were ordinary in-memory arrays.
What is Memory Mapping?
Memory mapping is a technique where a file or a portion of it is treated as an array and can be accessed as part of the system's memory, instead of loading the entire file into memory. This can be particularly useful when working with larger datasets that don't fit into memory.
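To make the "portion of a file" idea concrete, here is a minimal sketch that maps only a small window of a larger file. It relies on the `offset` parameter of `np.memmap`, which is given in bytes. The file name `portion_demo.dat` and the sizes are assumptions chosen for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "portion_demo.dat")

# Write 1,000 float32 values (0.0, 1.0, ..., 999.0) to disk as raw bytes.
np.arange(1000, dtype=np.float32).tofile(path)

# Map only 10 values, starting at element 500.
# `offset` is in bytes, so multiply the element index by the item size.
itemsize = np.dtype(np.float32).itemsize
window = np.memmap(path, dtype=np.float32, mode="r",
                   offset=500 * itemsize, shape=(10,))

print(window[0], window[-1])           # first and last mapped values
print(isinstance(window, np.ndarray))  # memmap is a subclass of ndarray
```

Only the mapped window is addressable through `window`; the rest of the file is never touched by this array.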
Creating a Memory-Mapped Array
To create a memory-mapped array in NumPy, use numpy.memmap(). You provide the filename, data type, and shape of the array, plus an access mode such as 'w+' when creating a new file.
The following example creates a memory-mapped array named mm_array with a shape of (1000, 1000) and fills it with random floating-point values between 0 and 1. The memory-mapped array is linked to the file 'data.dat', and you can work with it as if it were a regular NumPy array, even though the data is stored on disk.
import numpy as np
# Create a memory-mapped array
filename = 'data.dat' # The filename of the memory-mapped file
shape = (1000, 1000) # Shape of the array (1000 rows, 1000 columns)
dtype = np.float32 # Data type of the array (32-bit floating point)
mode = 'w+' # Create or overwrite the file, then open it for reading and writing
mm_array = np.memmap(filename, dtype=dtype, mode=mode, shape=shape)
# Fill the memory-mapped array with data
mm_array[:] = np.random.rand(*shape)
# Write any pending changes to the file on disk
mm_array.flush()
mm_array[:] = np.random.rand(*shape): This line fills the entire memory-mapped array with random floating-point values between 0 and 1. np.random.rand(*shape) generates an array of random values with the same shape as the memory-mapped array, and the [:] notation assigns these values to the memory-mapped array in place. Because np.random.rand returns 64-bit floats, the values are cast to float32 during the assignment.
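Note that np.random.rand(*shape) builds the whole array in memory before it is copied into the file. That is fine at this size, but it would not work for an array larger than RAM. A minimal sketch of one alternative is to fill the array in blocks of rows. The block size, the file name `chunked_fill.dat`, and the use of `np.random.default_rng` are assumptions made for this illustration.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "chunked_fill.dat")  # hypothetical file
shape = (1000, 1000)
rows_per_chunk = 100  # assumed block size; tune to the memory you have

mm = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
rng = np.random.default_rng(seed=0)

for start in range(0, shape[0], rows_per_chunk):
    stop = min(start + rows_per_chunk, shape[0])
    # Only one block of random values exists in memory at a time.
    mm[start:stop] = rng.random((stop - start, shape[1]), dtype=np.float32)

mm.flush()  # write any pending changes to disk
print(os.path.getsize(path))  # 1000 * 1000 values * 4 bytes each
```

The peak extra memory is one block (here 100 x 1000 float32 values) rather than the full array.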
Loading and Closing a Memory-Mapped Array
Once the file exists, you can reopen it as a memory-mapped array and access small segments of it without reading the entire file into memory:
# Open the existing memory-mapped array
filename = 'data.dat' # Use the same filename as when creating the memory-mapped array
shape = (1000, 1000) # Shape of the array (1000 rows, 1000 columns)
dtype = np.float32 # Data type of the array (32-bit floating point)
mode = 'r+' # Open the file for both reading and writing
new_mmap = np.memmap(filename, dtype=dtype, mode=mode, shape=shape)
print(new_mmap)
# Release the memory-mapped array (np.memmap has no explicit close() method)
del new_mmap # Drop this reference; the mapping is released once no other references to it remain
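The mode argument controls what you are allowed to do with the mapped file. Besides 'w+' and 'r+' used above, np.memmap also accepts 'r' (read-only) and 'c' (copy-on-write). The sketch below compares them on a small hypothetical file, `modes_demo.dat`; the variable names are illustrative only.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "modes_demo.dat")  # hypothetical file
np.zeros(4, dtype=np.float32).tofile(path)

# mode="r": read-only; attempts to write are rejected.
ro = np.memmap(path, dtype=np.float32, mode="r", shape=(4,))
try:
    ro[0] = 1.0
    write_rejected = False
except ValueError:
    write_rejected = True

# mode="c": copy-on-write; changes stay in memory and never reach the file.
cow = np.memmap(path, dtype=np.float32, mode="c", shape=(4,))
cow[0] = 5.0
on_disk_after_cow = np.fromfile(path, dtype=np.float32)

# mode="r+": read-write; changes are written back to the file.
rw = np.memmap(path, dtype=np.float32, mode="r+", shape=(4,))
rw[0] = 7.0
rw.flush()
on_disk_after_rw = np.fromfile(path, dtype=np.float32)

print(write_rejected, on_disk_after_cow[0], on_disk_after_rw[0])
```

Opening with 'r' is a cheap safeguard when a script should only inspect a dataset, since accidental writes fail immediately instead of modifying the file.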
Accessing and Manipulating Memory-Mapped Arrays
Memory-mapped arrays are accessed and manipulated like regular NumPy arrays.
# Accessing elements
print("First Element:", mm_array[0, 0])
# Modifying elements
mm_array[0, 0] = 10.0
print("Modified Element:", mm_array[0, 0])
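Slicing, in-place arithmetic, and reductions work the same way as element access. The following sketch, which uses a separate hypothetical file `persist_demo.dat` so it can run on its own, modifies a block of rows, flushes the changes, and then reopens the file read-only to confirm they reached the disk.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "persist_demo.dat")  # hypothetical file
shape = (100, 100)

# mode="w+" creates the file; its contents start out as zeros.
writer = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
writer[0, 0] = 10.0
writer[10:20, :] += 1.0  # in-place arithmetic on a slice of the mapped file
writer.flush()           # push pending changes to the file

# Re-open read-only and check that the changes are on disk.
reader = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
first = float(reader[0, 0])
block_sum = float(reader[10:20, :].sum())
print(first, block_sum)
```

Calling flush() before another reader opens the file is the explicit way to make sure the modifications have been written out.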
Advantages of Memory Mapping
Memory mapping provides several advantages:
- Efficiency: Memory mapping allows you to work with large datasets that don't fit into memory, as only parts of the file are loaded into memory as needed.
- Speed: Only the parts of the file you touch are read from disk, so accessing a small portion of a large file avoids the cost of loading the whole file first.
- Shared Memory: Memory-mapped files can also be used as a shared data source between multiple processes.
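As a rough illustration of the shared data source point, the sketch below has a parent process create a file, a separate Python process modify it through its own np.memmap, and the parent read the result. The file name `shared_demo.dat` and the child script are assumptions for this example. It performs no locking, so processes that write concurrently would need their own synchronization.

```python
import os
import subprocess
import sys
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "shared_demo.dat")  # hypothetical file

# Parent: create the file and write initial values.
parent = np.memmap(path, dtype=np.float32, mode="w+", shape=(4,))
parent[:] = [1.0, 2.0, 3.0, 4.0]
parent.flush()

# Child: a separate Python process maps the same file and doubles the values.
child_code = (
    "import sys, numpy as np\n"
    "m = np.memmap(sys.argv[1], dtype=np.float32, mode='r+', shape=(4,))\n"
    "m *= 2\n"
    "m.flush()\n"
)
subprocess.run([sys.executable, "-c", child_code, path], check=True)

# Parent: re-open the file to read what the child wrote.
result = np.memmap(path, dtype=np.float32, mode="r", shape=(4,))
print(result.tolist())
```

Both processes must agree on the dtype and shape, because a raw memory-mapped file stores no metadata about its contents.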
Practical Use Cases
- Image Processing: Memory mapping is useful for processing large image datasets without loading the entire dataset into memory.
- Data Analysis: You can efficiently analyze large datasets, like time series or financial data, that don't fit in memory.
- Machine Learning: Memory mapping enables you to process large training datasets for machine learning models.
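A common pattern behind all three use cases is to walk through the mapped file in blocks and combine partial results. The sketch below computes per-column means that way. The helper `chunked_column_mean`, the file name `analysis_demo.dat`, and the synthetic data are hypothetical and stand in for a real dataset.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "analysis_demo.dat")  # hypothetical file
shape = (10_000, 8)

# Synthetic stand-in for a large dataset: every entry in row i holds the value i.
data = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
data[:] = np.arange(shape[0], dtype=np.float32)[:, None]
data.flush()


def chunked_column_mean(arr, rows_per_chunk=1000):
    """Compute column means one block of rows at a time."""
    total = np.zeros(arr.shape[1], dtype=np.float64)
    for start in range(0, arr.shape[0], rows_per_chunk):
        # Copy just this block into memory before reducing it.
        block = np.asarray(arr[start:start + rows_per_chunk])
        total += block.sum(axis=0, dtype=np.float64)
    return total / arr.shape[0]


means = chunked_column_mean(data)
print(means)
```

Accumulating in float64 keeps the running totals accurate even though the stored values are float32. The same loop structure works for other reductions, or for feeding mini-batches to a training loop.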