Indexing and Slicing in NumPy
In this post, we will cover two of the core concepts you need to grasp in NumPy: indexing and slicing, which allow you to access and manipulate specific elements or subarrays within your arrays.
Indexing in NumPy
Indexing in NumPy is similar to indexing in Python lists, but it extends naturally to multi-dimensional arrays. Positive indexing starts at 0 for the first element, and you can use square brackets to access individual elements. Negative indexing allows you to access elements of an array counting from the end, making it a handy feature for quickly accessing elements from the back of the array without the need to calculate the exact index.
import numpy as np
# Create a 1D NumPy array
arr_1d = np.array([10, 20, 30, 40, 50])
# Accessing individual elements
print(arr_1d[0]) # Output: 10
print(arr_1d[3]) # Output: 40
print(arr_1d[-1]) # Output: 50
print(arr_1d[-2]) # Output: 40
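As a quick check of how negative and positive indices relate, here is a minimal sketch (the names n, same_element, and out_of_bounds_raised are illustrative, not from the original example). It shows that index -k addresses the same element as index n - k, and that an index outside the valid range raises an IndexError.

```python
import numpy as np

arr_1d = np.array([10, 20, 30, 40, 50])
n = len(arr_1d)

# A negative index -k addresses the same element as n - k
same_element = bool(arr_1d[-2] == arr_1d[n - 2])
print(same_element)  # True

# Valid indices run from -n to n - 1; anything else raises IndexError
try:
    arr_1d[5]
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True
print(out_of_bounds_raised)  # True
```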
In Multidimensional Arrays
In a multidimensional array, you supply one index per dimension, separated by commas inside a single pair of square brackets.
Let's create a 2D array and access a specific element:
# Creating a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Accessing an element at row index 1 and column index 2
print(arr[1, 2])
# Output: 6
In the above example, arr[1, 2] refers to the element in the second row (index 1) and the third column (index 2), which is the number 6.
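The sketch below compares the comma-separated form with chained brackets, and shows what a single index returns on a 2D array. The variable names are illustrative. Both forms return the same element here, but the chained form first builds an intermediate row and then indexes into it, so the comma form is usually the one to prefer.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# One index per dimension, separated by commas
elem_tuple = arr[1, 2]
print(elem_tuple)  # 6

# Chained brackets: arr[1] selects the row first, then [2] selects from it
elem_chained = arr[1][2]
print(elem_chained)  # 6

# A single index on a 2D array selects a whole row
second_row = arr[1]
print(second_row)  # [4 5 6]
```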
Slicing in NumPy
Slicing allows you to extract a portion of an array by specifying a range of indices.
The syntax for slicing is start:stop:step, where start is the index at which the slice begins, stop is the index the slice stops before (it is exclusive), and step is the step size between elements.
# Creating a 1D array
arr_1d = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
# Basic slicing
print(arr_1d[2:6]) # Output: [30 40 50 60]
print(arr_1d[:5]) # Output: [10 20 30 40 50]
print(arr_1d[5:]) # Output: [ 60 70 80 90 100]
# Slicing with step
print(arr_1d[1::2]) # Output: [ 20 40 60 80 100]
- The slice [2:6] starts at the third element (index 2) and stops before index 6, so the last element included is the sixth (index 5). The step value is not specified, so it defaults to 1. The subarray contains the elements 30, 40, 50, and 60.
- The slice [:5] starts at the beginning of the array (index 0) and stops before index 5, so the last element included is the fifth (index 4). The step value is not specified, so it defaults to 1. The subarray contains the elements 10, 20, 30, 40, and 50.
- The slice [5:] starts at the sixth element (index 5) and runs to the end of the array. The step value is not specified, so it defaults to 1. The subarray contains the elements 60, 70, 80, 90, and 100.
- The slice [1::2] starts at the second element (index 1) and runs to the end of the array (no explicit stop value is given), taking every other element (step size of 2). The subarray contains the elements 20, 40, 60, 80, and 100.
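The same start:stop:step rules also cover negative values and out-of-range bounds. The following is a small sketch using the array from above; the variable names are illustrative.

```python
import numpy as np

arr_1d = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# A negative step walks the array backwards
reversed_arr = arr_1d[::-1]
print(reversed_arr)  # [100  90  80  70  60  50  40  30  20  10]

# A negative start counts from the end: the last three elements
last_three = arr_1d[-3:]
print(last_three)  # [ 80  90 100]

# Every third element, starting at index 0
every_third = arr_1d[::3]
print(every_third)  # [ 10  40  70 100]

# Unlike a single index, a slice bound past the end is clipped, not an error
clipped = arr_1d[7:100]
print(clipped)  # [ 80  90 100]
```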
In Multidimensional Arrays
Slicing multidimensional arrays works like regular slicing in Python, except that you can slice along each dimension independently by separating the slices with commas.
# Creating a 2D array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
# Slicing: getting the first two rows and columns 1 to 3
print(arr[0:2, 1:3])
Output:
[[2 3]
[6 7]]
In the above example, arr[0:2, 1:3] returns a 2D array that consists of the first two rows (index 0 and 1) and the second and third columns (index 1 and 2) of the original arr.
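A few more per-dimension slices on the same array, as a sketch with illustrative variable names: a step along the column axis, a negative start along the row axis, and the fact that omitted trailing dimensions are taken in full.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# All rows, every second column (columns 0 and 2)
every_other_col = arr[:, ::2]
print(every_other_col)
# [[ 1  3]
#  [ 5  7]
#  [ 9 11]]

# Last two rows, all columns
last_two_rows = arr[-2:, :]
print(last_two_rows)
# [[ 5  6  7  8]
#  [ 9 10 11 12]]

# Leaving out the column slice selects all columns
same_result = np.array_equal(arr[0:2], arr[0:2, :])
print(same_result)  # True
```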
You can also mix integer indexing with slice indexing. This will yield an array of lower rank than the original array.
# Creating a 2D array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
# Mixing integer indexing with slices
row_r1 = arr[1, :] # Rank 1 view of the second row of arr
row_r2 = arr[1:2, :] # Rank 2 view of the second row of arr
print(row_r1, row_r1.shape)
print(row_r2, row_r2.shape)
Output:
[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
row_r1 uses integer indexing to select the second row of the array arr. The syntax arr[1, :] selects the row at index 1 (the second row), and the colon : indicates that we want all columns in that row. Because the integer index removes the row dimension, the result is a rank 1 view: a one-dimensional array of shape (4,).
row_r2 uses slicing to select the same row of the array arr. The syntax arr[1:2, :] uses a slice with a start index of 1 and a stop index of 2 (exclusive). A slice keeps the dimension it is applied to, so the result is a rank 2 view: a two-dimensional array of shape (1, 4), with one row and four columns. This form is the one to use when later code expects a 2D array.
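The word "view" matters here: both results share memory with arr rather than copying it. The sketch below illustrates this with np.shares_memory and an in-place write; row_copy is an illustrative name for an explicit copy.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
row_r1 = arr[1, :]    # shape (4,)
row_r2 = arr[1:2, :]  # shape (1, 4)

print(row_r1.ndim, row_r2.ndim)       # 1 2
print(np.shares_memory(arr, row_r1))  # True
print(np.shares_memory(arr, row_r2))  # True

# Writing through one view changes the original and the other view
row_r1[0] = 50
print(arr[1, 0])     # 50
print(row_r2[0, 0])  # 50

# Use .copy() when an independent array is needed
row_copy = arr[1, :].copy()
row_copy[0] = -1
print(arr[1, 0])     # 50 (unchanged by the write to the copy)
```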
Boolean Indexing in NumPy
Boolean indexing in NumPy is a powerful technique that allows you to select elements from an array based on a certain condition. It involves using a boolean (True/False) array of the same shape as the original array to filter the elements. Elements corresponding to True values in the boolean array are selected, while elements corresponding to False values are excluded.
1D Arrays
Let's start with a one-dimensional array:
# Creating a 1D array
arr = np.array([1, 2, 3, 4, 5])
# Creating a boolean array that checks if elements are greater than 2
mask = arr > 2
print(mask)
Output:
[False False True True True]
Now, we can use this boolean array (mask) to index the original array:
print(arr[mask])
# Output: [3 4 5]
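Conditions can be combined before indexing. The sketch below uses the element-wise operators & (and), | (or), and ~ (not); the variable names are illustrative. Each comparison needs its own parentheses because these operators bind more tightly than the comparisons, and Python's and/or keywords do not work element-wise on arrays.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Elements greater than 1 AND less than 5
between = arr[(arr > 1) & (arr < 5)]
print(between)  # [2 3 4]

# Elements less than 2 OR greater than 4
outside = arr[(arr < 2) | (arr > 4)]
print(outside)  # [1 5]

# Elements that are NOT greater than 2
not_greater = arr[~(arr > 2)]
print(not_greater)  # [1 2]
```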
2D Arrays
Example 1
# Creating a 2D array
arr = np.array([[1, 2], [3, 4], [5, 6]])
# Creating a boolean array that checks if elements are greater than 2
mask = arr > 2
print(mask)
Output:
[[False False]
[ True True]
[ True True]]
Again, we can use this boolean array to index the original array:
print(arr[mask])
# Output: [3 4 5 6]
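Note that selecting with a full-shape mask flattens the result to 1D, because the selected elements need not form a rectangle. A mask can also appear on the left-hand side of an assignment, which keeps the original shape. A minimal sketch, where zeroed is an illustrative name for a working copy:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])
mask = arr > 2

# Selection returns a 1D array of the matching elements, in row-major order
selected = arr[mask]
print(selected.shape)  # (4,)

# Assignment through a mask modifies only the matching positions
zeroed = arr.copy()
zeroed[zeroed > 2] = 0
print(zeroed)
# [[1 2]
#  [0 0]
#  [0 0]]
```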
Example 2: Now, let's look at another 2D array, where we want to select only those rows whose values sum to more than 10.
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Boolean indexing to select rows with sum greater than 10
row_sum_mask = np.sum(arr_2d, axis=1) > 10
print(row_sum_mask)
Output:
[False True True]
Now, we can use this boolean mask to index the array:
selected_rows = arr_2d[row_sum_mask]
print(selected_rows)
# Output:
# [[4 5 6]
# [7 8 9]]
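The same idea applies to columns. In the sketch below, the threshold of 13 and the name col_sum_mask are chosen purely for illustration; the 1D mask has one entry per column and is placed in the column position of the index.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Column sums are 12, 15, 18; keep columns whose sum exceeds 13
col_sum_mask = arr_2d.sum(axis=0) > 13
print(col_sum_mask)  # [False  True  True]

# A 1D mask in the second position filters columns and keeps all rows
selected_cols = arr_2d[:, col_sum_mask]
print(selected_cols)
# [[2 3]
#  [5 6]
#  [8 9]]
```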
Multi-Dimensional Arrays
Boolean indexing also works in the same way for multi-dimensional arrays. Here is an example with a 3D array:
# Creating a 3D array
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
# Creating a boolean array that checks if elements are greater than 4
mask = arr > 4
print(mask)
Output:
[[[False False]
[False False]]
[[ True True]
[ True True]]]
And again, we can use this boolean array to index the original array:
print(arr[mask])
# Output: [5 6 7 8]
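Because the selection above is flattened, the positions of the matching elements are lost. If they are needed, one option is sketched below using np.count_nonzero and np.argwhere on the same mask; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
mask = arr > 4

# How many elements satisfy the condition
num_matches = np.count_nonzero(mask)
print(num_matches)  # 4

# One row of (i, j, k) coordinates per matching element
positions = np.argwhere(mask)
print(positions)
# [[1 0 0]
#  [1 0 1]
#  [1 1 0]
#  [1 1 1]]
```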
Fancy Indexing
Fancy indexing, also known as integer array indexing, is a method of indexing arrays by using arrays of integers. This method allows you to access and modify complicated patterns of array data in a simple way.
1D Array
Here's an example with a one-dimensional array:
# Creating a 1D array
arr = np.array([10, 20, 30, 40, 50])
# Creating an index array
ind = np.array([3, 1, 4])
# Fancy indexing
print(arr[ind])
# Output: [40 20 50]
In this example, arr[ind] returns a new array composed of the 4th, 2nd, and 5th elements of arr.
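Unlike a slice, the result of fancy indexing is a copy, so changing it leaves the original untouched. Fancy indexing on the left-hand side of an assignment does write into the original. A minimal sketch with illustrative variable names:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
ind = np.array([3, 1, 4])

# The result is a copy: modifying it does not affect arr
picked = arr[ind]
picked[0] = -1
print(arr[3])  # 40

# Indices may repeat and may be negative
repeated = arr[[0, 0, -1]]
print(repeated)  # [10 10 50]

# Assigning through an index array modifies arr in place
arr[ind] = 0
print(arr)  # [10  0 30  0  0]
```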
2D Array
Fancy indexing also works in multiple dimensions. Let's consider a two-dimensional array:
# Creating a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Creating two index arrays
row_ind = np.array([0, 1, 2])
col_ind = np.array([2, 1, 0])
# Fancy indexing
print(arr[row_ind, col_ind])
# Output: [3 5 7]
In this example, arr[row_ind, col_ind] returns a 1D array composed of the values of arr at the positions (0,2), (1,1), and (2,0).
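The index arrays are paired element by element, so two arrays of length n select n elements, not an n-by-n block. If the goal is every combination of the given rows and columns, one option is np.ix_, sketched below with illustrative names rows and cols.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = [0, 2]
cols = [0, 2]

# Pairwise: positions (0,0) and (2,2)
pairwise = arr[rows, cols]
print(pairwise)  # [1 9]

# np.ix_ builds an open mesh: all combinations of rows 0, 2 and columns 0, 2
block = arr[np.ix_(rows, cols)]
print(block)
# [[1 3]
#  [7 9]]
```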
With Slicing
Fancy indexing can be combined with other indexing schemes. For example, you can combine fancy indexing with slicing:
print(arr[1:, [1, 2]])
# Output:
# [[5 6]
# [8 9]]
In this case, arr[1:, [1, 2]] returns a 2D array that consists of the rows from index 1 to the end (1:) and the columns at indices 1 and 2 of the original arr.
With Boolean Masks
You can also index with a boolean mask in place of an integer index array:
arr = np.array([10, 20, 30, 40, 50])
mask = np.array([True, False, True, False, True]) # Boolean mask
selected_elements = arr[mask]
print(selected_elements)
# Output: [10 30 50]
Here, the boolean mask selects the elements whose corresponding mask value is True.
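A boolean mask and an integer index array are two ways of describing the same selection. As a sketch, np.flatnonzero converts the mask into the equivalent integer indices; the names true_positions and via_indices are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
mask = np.array([True, False, True, False, True])

# Integer positions where the mask is True
true_positions = np.flatnonzero(mask)
print(true_positions)  # [0 2 4]

# Fancy indexing with those positions matches the boolean selection
via_indices = arr[true_positions]
print(via_indices)  # [10 30 50]
print(np.array_equal(via_indices, arr[mask]))  # True
```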