Structured Arrays in NumPy

Structured arrays are arrays whose datatype is a composition of simpler datatypes organized as a sequence of named fields. In other words, structured arrays provide efficient storage for compound, heterogeneous data. Unlike regular arrays, which store homogeneous data, structured arrays enable you to store and manipulate data of various types in a single array.

Creating a Structured Array

You describe the layout of a structured array with a list of tuples, one per field. Each tuple holds the field name (a string) and the field's data type (here given as a type-code string). Passing that list to np.dtype produces a datatype, which you then supply to np.array through the dtype argument.

import numpy as np
 
# Define a data type with fields
data_type = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
 
# Create a structured array
data = np.array([('John', 15, 55.5), ('Anna', 25, 62.1), ('Peter', 12, 48.3)], dtype=data_type)
 
print(data)
# [('John', 15, 55.5) ('Anna', 25, 62.1) ('Peter', 12, 48.3)]

In this example, U10 represents a Unicode string of maximum length 10, i4 represents a 4-byte (32-bit) integer, and f8 represents an 8-byte (64-bit) float.
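As a small sketch of what these type codes imply in practice, the snippet below inspects the datatype and shows what happens to a string longer than the declared width. The variable names (person_dtype, people) and the sample record are illustrative assumptions, not part of the example above.

```python
import numpy as np

# Same field layout as above; the variable names here are illustrative
person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

print(person_dtype.names)     # ('name', 'age', 'weight')

# U10 takes 4 bytes per character (40), plus 4 for i4 and 8 for f8
print(person_dtype.itemsize)  # 52

# A value that does not fit a fixed-width string field is silently truncated
people = np.array([('Christopher', 30, 80.0)], dtype=person_dtype)
print(people['name'][0])      # Christophe
```

Because the truncation raises no error, it is worth choosing a string width that comfortably covers the longest expected value.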

Accessing Structured Arrays

You can access a field of a structured array by its name, much as you would look up a dictionary key, and you can access individual records (rows) by integer index. The two can be combined to read a single value.

# Get the names
names = data['name']
print(names)  # Output: ['John' 'Anna' 'Peter']
 
# Get the first row of the array
first_row = data[0]
print(first_row)  # Output: ('John', 15, 55.5)
 
# Get the name from the last row
last_name = data[-1]['name']
print(last_name)  # Output: Peter
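Beyond single fields and single rows, slices and lists of field names can be used as well. The following is a minimal sketch; it rebuilds the sample array under the assumed name people so that it runs on its own.

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
people = np.array([('John', 15, 55.5), ('Anna', 25, 62.1), ('Peter', 12, 48.3)], dtype=dt)

# A slice of rows keeps every field
first_two = people[:2]
print(first_two['name'].tolist())   # ['John', 'Anna']

# A list of field names selects several fields at once
name_and_age = people[['name', 'age']]
print(name_and_age.dtype.names)     # ('name', 'age')

# .item() converts one record into a plain Python tuple
name, age, weight = people[0].item()
print(name, age, weight)            # John 15 55.5
```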

Modifying Structured Arrays

You can modify a single value by combining a row index with the name of the field you want to change. You can also modify an entire field at once by operating on it through its name. See the examples below:

# Change the age of the last person
data[-1]['age'] = 13
 
print(data['age'])  # Output: [15 25 13]
 
# Add 1 to every value of age
data['age'] += 1
print(data['age'])  # Output: [16 26 14]
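One detail worth illustrating is the order of indexing when a Boolean mask is involved. In the sketch below (the names people and minors are assumptions for illustration), indexing with the mask first produces a copy, so the assignment is lost; selecting the field first produces a view, so the assignment reaches the original array.

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
people = np.array([('John', 15, 55.5), ('Anna', 25, 62.1), ('Peter', 12, 48.3)], dtype=dt)

# Replace a whole record by assigning a tuple
people[0] = ('Johnny', 16, 56.0)

minors = people['age'] < 18

# No effect: people[minors] is a copy, so this assignment is lost
people[minors]['weight'] = 0.0
print(people['weight'].tolist())  # [56.0, 62.1, 48.3]

# Works: select the field first (a view), then apply the mask
people['weight'][minors] = 0.0
print(people['weight'].tolist())  # [0.0, 62.1, 0.0]
```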

Filtering and Querying Structured Arrays

You can use Boolean indexing to filter and query structured arrays based on specific criteria.

# Filtering based on age
filtered_data = data[data['age'] > 25]
 
print("Filtered Data:")
print(filtered_data)
# Output:
# Filtered Data:
# [('Anna', 26, 62.1)]
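Conditions on several fields can be combined as well. The sketch below assumes the original, unmodified sample values and an array named people; it uses the element-wise & operator and np.isin.

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
people = np.array([('John', 15, 55.5), ('Anna', 25, 62.1), ('Peter', 12, 48.3)], dtype=dt)

# Combine conditions with & (and) or | (or); each comparison needs parentheses
mask = (people['age'] >= 15) & (people['weight'] < 60)
print(people[mask]['name'].tolist())  # ['John']

# np.isin keeps rows whose field matches any of several values
chosen = people[np.isin(people['name'], ['Anna', 'Peter'])]
print(chosen['age'].tolist())         # [25, 12]
```

The parentheses matter because & binds more tightly than the comparison operators in Python.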

Sorting Structured Arrays

Structured arrays can be sorted based on the values of a specific field.

# Sorting by age
sorted_data = np.sort(data, order='age')
 
print("Sorted Data:")
print(sorted_data)
# Output:
# Sorted Data:
# [('Peter', 14, 48.3) ('John', 16, 55.5) ('Anna', 26, 62.1)]
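The order argument also accepts a list of fields, which is useful for breaking ties. The sketch below uses an assumed variant of the sample data in which two people share the same age, and shows one way to obtain a descending order, since np.sort only sorts ascending.

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
# Illustrative data: John and Peter share the same age
people = np.array([('John', 15, 55.5), ('Anna', 25, 62.1), ('Peter', 15, 48.3)], dtype=dt)

# Sort by age, breaking ties by name
by_age_then_name = np.sort(people, order=['age', 'name'])
print(by_age_then_name['name'].tolist())  # ['John', 'Peter', 'Anna']

# For a descending order, reverse the indices returned by argsort
oldest_first = people[np.argsort(people['age'])[::-1]]
print(oldest_first['age'].tolist())       # [25, 15, 15]
```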

Record Arrays

NumPy also provides the numpy.recarray class, which behaves almost identically to a structured array but adds one feature: fields can be accessed as attributes in addition to the dictionary-style access by name.

# Create a record array
data_rec = data.view(np.recarray)
 
# Access fields as attributes
print(data_rec.age)  # Output: [16 26 14]
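As a self-contained sketch (the names people and people_rec are assumptions), the following shows that attribute access and key access work side by side on a record array, and that a record array created with view shares memory with the original.

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
people = np.array([('John', 15, 55.5), ('Anna', 25, 62.1), ('Peter', 12, 48.3)], dtype=dt)

people_rec = people.view(np.recarray)

# Attribute access and key access both work on a record array
print(people_rec.name.tolist())    # ['John', 'Anna', 'Peter']
print(people_rec['age'].tolist())  # [15, 25, 12]

# Individual records expose their fields as attributes too
print(people_rec[0].weight)        # 55.5

# The view shares memory with the original structured array
people_rec.age[0] = 16
print(people['age'][0])            # 16
```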

Benefits and Applications

  1. Heterogeneous Data: Structured arrays allow you to store and work with heterogeneous data, such as datasets containing various data types like strings, integers, and floats.

  2. Organized Data: You can organize data with multiple attributes into a single array, making it easier to manage and process.

  3. Database-Like Operations: Structured arrays offer functionalities similar to database tables, enabling filtering, querying, and sorting of data.

  4. Memory Efficiency: Unlike a Python list of dictionaries, a structured array stores each record as a fixed-size block of raw bytes inside one NumPy buffer, which avoids the overhead of a separate Python object per value.

Structured arrays in NumPy are useful when you need to perform operations on data that can be thought of as a table of elements, where each element has different fields that can be of different data types.

The benefits of using structured arrays include:

  1. Efficiency: Because fields are stored as fixed-size binary values rather than as individual Python objects, structured arrays typically need less memory than equivalent lists of tuples or dictionaries.

  2. Convenience: You can access and modify the data in a structured array using the field names, which can make your code easier to write and understand.

  3. Integration with NumPy: Since structured arrays are part of NumPy, you can use them with other NumPy functions and methods, and take advantage of NumPy's capabilities for numerical computing.
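To make the efficiency and integration points concrete, here is a minimal sketch using the sample data under the assumed name people. Each field behaves like an ordinary NumPy array, and the storage cost per record is fixed by the datatype.

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
people = np.array([('John', 15, 55.5), ('Anna', 25, 62.1), ('Peter', 12, 48.3)], dtype=dt)

# Each field supports the usual NumPy reductions
mean_weight = people['weight'].mean()
print(round(float(mean_weight), 1))

# Use one field to locate a record, then read another field from it
oldest = people[np.argmax(people['age'])]['name']
print(oldest)           # Anna

# Every record occupies a fixed number of bytes: 40 (U10) + 4 (i4) + 8 (f8)
print(people.itemsize)  # 52
print(people.nbytes)    # 156
```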

Comparison with Python Dictionaries

  • Similarities: Both structured arrays and dictionaries can store heterogeneous data, and both allow access to their elements using keys (field names for structured arrays).

  • Differences: Structured arrays are part of NumPy and are designed for numerical computing. Their fixed record layout generally makes them more memory-efficient, and operations on whole fields are vectorized, which tends to be faster on large datasets. Dictionaries, on the other hand, are a built-in Python data type and are more flexible: they can hold arbitrary Python objects, and keys can be added or removed at any time. Dictionaries are also a better fit when the data is sparse or when its structure is not known in advance.
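The two representations can be converted into each other when needed. The sketch below assumes a hypothetical list of dictionaries named records that share the same keys; it is one straightforward way to do the conversion, not the only one.

```python
import numpy as np

# Hypothetical records with the same fields used throughout this article
records = [
    {'name': 'John', 'age': 15, 'weight': 55.5},
    {'name': 'Anna', 'age': 25, 'weight': 62.1},
    {'name': 'Peter', 'age': 12, 'weight': 48.3},
]

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Dictionaries -> structured array: build one tuple per record, in field order
people = np.array([tuple(r[f] for f in dt.names) for r in records], dtype=dt)
print(people['age'].tolist())  # [15, 25, 12]

# Structured array -> dictionaries: tolist() yields plain Python tuples
as_dicts = [dict(zip(people.dtype.names, row)) for row in people.tolist()]
print(as_dicts[0])  # {'name': 'John', 'age': 15, 'weight': 55.5}
```

Note that the conversion to a structured array requires every dictionary to contain all of the declared fields, which reflects the fixed-structure trade-off described above.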

In conclusion, if you're working with large, structured numerical data and need to perform mathematical operations on it, structured arrays are likely a better choice. If you're working with smaller, more flexible or sparse data, or data that includes non-numerical Python objects, dictionaries may be more suitable.