Indexing and Slicing in NumPy

In this post, we will cover one of the core concepts you need to grasp in NumPy: indexing and slicing, which allow you to access and manipulate specific elements or subarrays within your arrays.

Indexing in NumPy

Indexing in NumPy is similar to indexing in Python lists, but it extends naturally to multi-dimensional arrays. Positive indexing starts at 0 for the first element, and you can use square brackets to access individual elements. Negative indexing allows you to access elements of an array counting from the end, making it a handy feature for quickly accessing elements from the back of the array without the need to calculate the exact index.

import numpy as np
 
# Create a 1D NumPy array
arr_1d = np.array([10, 20, 30, 40, 50])
 
# Accessing individual elements
print(arr_1d[0])   # Output: 10
print(arr_1d[3])   # Output: 40
print(arr_1d[-1])  # Output: 50
print(arr_1d[-2])  # Output: 40
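
As a small illustrative sketch (the variable names here are only for demonstration), the snippet below checks that a negative index -k points at the same element as index n - k, shows that an index can also be used to assign a value, and shows what happens when an index falls outside the array:

```python
import numpy as np

arr_1d = np.array([10, 20, 30, 40, 50])
n = len(arr_1d)

# A negative index -k refers to the same element as index n - k
last = arr_1d[-1]
same_as_last = arr_1d[n - 1]
print(last == same_as_last)  # Output: True

# Indexing also works on the left-hand side of an assignment
arr_1d[0] = 99
print(arr_1d.tolist())  # Output: [99, 20, 30, 40, 50]

# An index outside the valid range raises IndexError
try:
    arr_1d[5]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)  # Output: True
```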

In Multidimensional Arrays

In a multidimensional array, you give one index per dimension, separated by commas inside a single pair of square brackets.

Let's create a 2D array and access a specific element:

# Creating a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
 
# Accessing an element at row index 1 and column index 2
print(arr[1, 2])
# Output: 6

In the above example, arr[1, 2] refers to the element in the second row (index 1) and the third column (index 2), which is the number 6.
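
The following sketch, using the same 3x3 array, compares the comma form with chained brackets and shows that negative indices and single-index row access work in 2D as well. The variable names are illustrative only:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Comma-separated indices inside one pair of brackets
a = arr[1, 2]

# Chained brackets give the same element, but first build
# an intermediate row (arr[1]) and then index into it
b = arr[1][2]

# Negative indices work per dimension: last row, last column
c = arr[-1, -1]

# A single index on a 2D array returns the whole row
row = arr[1]

print(a, b, c)       # Output: 6 6 9
print(row.tolist())  # Output: [4, 5, 6]
```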

Slicing in NumPy

Slicing allows you to extract a portion of an array by specifying a range of indices.

The syntax for slicing is start:stop:step, where start is the index to start the slice, stop is the index to stop before, and step is the step size between elements.

# Creating a 1D array
arr_1d = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
 
# Basic slicing
print(arr_1d[2:6])    # Output: [30 40 50 60]
print(arr_1d[:5])     # Output: [10 20 30 40 50]
print(arr_1d[5:])     # Output: [ 60  70  80  90 100]
 
# Slicing with step
print(arr_1d[1::2])   # Output: [ 20  40  60  80 100]

The slice [2:6] starts at the third element (index 2) and ends at the sixth element (index 5), because the stop index 6 is excluded. The step value is not specified, so it defaults to 1. The subarray therefore contains the elements 30, 40, 50, and 60.
The slice [:5] starts at the beginning of the array (index 0) and ends at the fifth element (index 4). The step again defaults to 1, so the subarray contains the elements 10, 20, 30, 40, and 50.
The slice [5:] starts at the sixth element (index 5) and runs to the end of the array. The step defaults to 1, so the subarray contains the elements 60, 70, 80, 90, and 100.
The slice [1::2] starts at the second element (index 1) and runs to the end of the array (no explicit stop value is given), taking every other element (step size of 2). The subarray therefore contains the elements 20, 40, 60, 80, and 100.
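
A few more variations on start:stop:step, sketched on the same array. They cover negative steps, negative start values, and a stop value past the end of the array. Results are converted with tolist() only to make the printed output easy to read:

```python
import numpy as np

arr_1d = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# A negative step walks the array backwards
reversed_arr = arr_1d[::-1]

# A negative start counts from the end: the last three elements
last_three = arr_1d[-3:]

# Start at index 8, stop before index 2, moving two steps back each time
backwards = arr_1d[8:2:-2]

# A stop value past the end of the array is clipped, not an error
clipped = arr_1d[2:100]

print(reversed_arr.tolist())  # Output: [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
print(last_three.tolist())    # Output: [80, 90, 100]
print(backwards.tolist())     # Output: [90, 70, 50]
print(len(clipped))           # Output: 8
```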

In Multidimensional Arrays

Slicing in multidimensional arrays is similar to regular slicing in Python, but with multiple dimensions, you can slice along each dimension independently.

# Creating a 2D array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
 
# Slicing: getting the first two rows and columns 1 to 3
print(arr[0:2, 1:3])

Output:

[[2 3]
 [6 7]]

In the above example, arr[0:2, 1:3] returns a 2D array that consists of the first two rows (index 0 and 1) and the second and third columns (index 1 and 2) of the original arr.
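
To illustrate slicing each dimension independently, here is a short sketch on the same 3x4 array. Each line uses a different slice for rows and for columns; the variable names are illustrative only:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# All rows, only the column at index 1
second_col = arr[:, 1]

# Every other row, all columns
alt_rows = arr[::2, :]

# All rows, columns in reverse order
flipped = arr[:, ::-1]

# Rows from index 1 onward, last two columns
corner = arr[1:, -2:]

print(second_col.tolist())  # Output: [2, 6, 10]
print(alt_rows.tolist())    # Output: [[1, 2, 3, 4], [9, 10, 11, 12]]
print(flipped[0].tolist())  # Output: [4, 3, 2, 1]
print(corner.tolist())      # Output: [[7, 8], [11, 12]]
```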

You can also mix integer indexing with slice indexing. This will yield an array of lower rank than the original array.

# Creating a 2D array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
 
# Mixing integer indexing with slices
row_r1 = arr[1, :]    # Rank 1 view of the second row of arr
row_r2 = arr[1:2, :]  # Rank 2 view of the second row of arr
 
print(row_r1, row_r1.shape)
print(row_r2, row_r2.shape)

Output:

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)

row_r1 uses integer indexing to select the second row of arr. In arr[1, :], the integer 1 selects the row at index 1 (the second row), and the colon : keeps all columns in that row. Because an integer index removes the dimension it is applied to, the result is a rank 1 view: a one-dimensional array of shape (4,).
row_r2 uses slicing to select the same row. In arr[1:2, :], the slice has a start index of 1 and an end index of 2 (exclusive), so it covers only row 1. Because a slice keeps the dimension it is applied to, the result is a rank 2 view: a two-dimensional array of shape (1, 4), with one row and four columns. This form is useful when later code expects a 2D array.
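
The word "view" matters here: both results share memory with arr rather than copying it. The sketch below demonstrates this by writing through row_r1 and checking the original array, then contrasts it with an explicit copy (row_copy is a name introduced only for this illustration):

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

row_r1 = arr[1, :]    # 1D view of the second row
row_r2 = arr[1:2, :]  # 2D view of the second row

# Both views share memory with the original array
print(np.shares_memory(arr, row_r1))  # Output: True
print(np.shares_memory(arr, row_r2))  # Output: True

# Writing through the view changes the original array
row_r1[0] = 500
print(arr[1, 0])  # Output: 500

# An explicit copy is independent of the original
row_copy = arr[1, :].copy()
row_copy[1] = -1
print(arr[1, 1])  # Output: 6
```

If you need to modify a slice without touching the source array, calling .copy() on the slice first is the usual approach.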

Boolean Indexing in NumPy

Boolean indexing in NumPy allows you to select elements from an array based on a condition. It uses a boolean (True/False) array to filter the elements. The mask usually has the same shape as the original array, although a 1D mask that matches the first dimension can also be used to select whole rows. Elements corresponding to True values in the boolean array are selected, while elements corresponding to False values are excluded.

1D Arrays

Let's start with a one-dimensional array:

# Creating a 1D array
arr = np.array([1, 2, 3, 4, 5])
 
# Creating a boolean array that checks if elements are greater than 2
mask = arr > 2
 
print(mask)

Output:

[False False  True  True  True]

Now, we can use this boolean array (mask) to index the original array:

print(arr[mask])
 
# Output: [3 4 5]
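
Conditions can also be combined. The sketch below uses the element-wise operators & (and), | (or), and ~ (not). Each comparison needs its own parentheses, because these operators bind more tightly than comparisons such as > and <:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Elements greater than 1 AND less than 5
between = arr[(arr > 1) & (arr < 5)]

# Elements less than 2 OR greater than 4
outside = arr[(arr < 2) | (arr > 4)]

# NOT greater than 2
not_gt_2 = arr[~(arr > 2)]

# Any expression that produces a boolean array works as a mask
evens = arr[arr % 2 == 0]

print(between.tolist())   # Output: [2, 3, 4]
print(outside.tolist())   # Output: [1, 5]
print(not_gt_2.tolist())  # Output: [1, 2]
print(evens.tolist())     # Output: [2, 4]
```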

2D Arrays

Example 1

# Creating a 2D array
arr = np.array([[1, 2], [3, 4], [5, 6]])
 
# Creating a boolean array that checks if elements are greater than 2
mask = arr > 2
 
print(mask)

Output:

[[False False]
 [ True  True]
 [ True  True]]

Again, we can use this boolean array to index the original array:

print(arr[mask])
 
# Output: [3 4 5 6]

Example 2

Now, let's look at another 2D example, where we want only those rows whose values sum to more than 10:

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
 
# Boolean indexing to select rows with sum greater than 10
row_sum_mask = np.sum(arr_2d, axis=1) > 10
 
print(row_sum_mask)

Output:

[False  True  True]

Now, we can use this boolean array to index the rows of the original array:

selected_rows = arr_2d[row_sum_mask]
 
print(selected_rows)
# Output:
# [[4 5 6]
#  [7 8 9]]
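
The same idea can be applied to columns. In this sketch, col_sum_mask is a hypothetical name for a mask built from column sums (axis=0); placing it after a colon applies it to the second dimension instead of the first:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Column sums are 12, 15 and 18, so only the last two exceed 12
col_sum_mask = arr_2d.sum(axis=0) > 12

# The colon keeps all rows; the mask picks the columns
selected_cols = arr_2d[:, col_sum_mask]

print(col_sum_mask.tolist())   # Output: [False, True, True]
print(selected_cols.tolist())  # Output: [[2, 3], [5, 6], [8, 9]]
```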

Multi-Dimensional Arrays

Boolean indexing also works in the same way for multi-dimensional arrays. Here is an example with a 3D array:

# Creating a 3D array
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
 
# Creating a boolean array that checks if elements are greater than 4
mask = arr > 4
 
print(mask)

Output:

[[[False False]
  [False False]]

 [[ True  True]
  [ True  True]]]

And again, we can use this boolean array to index the original array:

print(arr[mask])
 
# Output: [5 6 7 8]
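
Two points from this example are worth a quick check: a full-shape mask always yields a 1D result, whatever the shape of the input, and the same mask can be used on the left-hand side of an assignment to modify the selected elements in place. A minimal sketch:

```python
import numpy as np

arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
mask = arr > 4

# Selecting with a full-shape mask flattens the result to 1D
selected = arr[mask]
print(arr.ndim, selected.ndim)  # Output: 3 1

# Number of elements that satisfy the condition
n_selected = np.count_nonzero(mask)
print(n_selected)  # Output: 4

# Assigning through the mask changes only the selected elements
arr[mask] = 0
print(arr.tolist())  # Output: [[[1, 2], [3, 4]], [[0, 0], [0, 0]]]
```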

Fancy Indexing

Fancy indexing, also known as integer array indexing, is a method of indexing arrays by using arrays of integers. This method allows you to access and modify complicated patterns of array data in a simple way.

1D Array

Here's an example with a one-dimensional array:

# Creating a 1D array
arr = np.array([10, 20, 30, 40, 50])
 
# Creating an index array
ind = np.array([3, 1, 4])
 
# Fancy indexing
print(arr[ind])
 
# Output: [40 20 50]

In this example, arr[ind] returns a new array composed of the 4th, 2nd, and 5th elements of arr.
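
Some further properties of fancy indexing, sketched on the same array: indices may repeat or be negative, the result takes the shape of the index array, and, unlike a slice, the result is a copy rather than a view. The variable names are illustrative only:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Indices can repeat and can be negative
repeated = arr[[0, 0, -1]]
print(repeated.tolist())  # Output: [10, 10, 50]

# The result has the shape of the index array
grid = arr[np.array([[0, 1], [2, 3]])]
print(grid.tolist())  # Output: [[10, 20], [30, 40]]

# Fancy indexing returns a copy, so changing it leaves arr untouched
picked = arr[[3, 1, 4]]
picked[0] = -1
print(arr[3])  # Output: 40
```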

2D Array

Fancy indexing also works in multiple dimensions. Let's consider a two-dimensional array:

# Creating a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
 
# Creating two index arrays
row_ind = np.array([0, 1, 2])
col_ind = np.array([2, 1, 0])
 
# Fancy indexing
print(arr[row_ind, col_ind])
 
# Output: [3 5 7]

In this example, arr[row_ind, col_ind] returns a 1D array composed of the values of arr at the positions (0,2), (1,1), and (2,0).
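
Because the two index arrays are paired element by element, the result above is a list of individual positions rather than a rectangular block. The sketch below contrasts this with np.ix_, which builds index arrays that select every combination of the given rows and columns, and then uses the paired form to modify the array in place:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A single index array selects whole rows
rows = arr[[0, 2]]
print(rows.tolist())  # Output: [[1, 2, 3], [7, 8, 9]]

# np.ix_ selects the block formed by rows 0 and 2 and columns 0 and 2
block = arr[np.ix_([0, 2], [0, 2])]
print(block.tolist())  # Output: [[1, 3], [7, 9]]

# Paired index arrays can also be used to assign values:
# this zeroes the positions (0,2), (1,1) and (2,0)
row_ind = np.array([0, 1, 2])
col_ind = np.array([2, 1, 0])
arr[row_ind, col_ind] = 0
print(arr.tolist())  # Output: [[1, 2, 0], [4, 0, 6], [0, 8, 9]]
```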

With Slicing

Fancy indexing can be combined with other indexing schemes. For example, you can combine fancy indexing with slicing:

print(arr[1:, [1, 2]])
 
# Output:
# [[5 6]
#  [8 9]]

In this case, arr[1:, [1, 2]] returns a 2D array that consists of the rows from index 1 to the end (1:) and the columns at indices 1 and 2 of the original arr.
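
One way to read the combined expression is as two steps: slice the rows first, then pick the columns. The sketch below checks that both forms give the same result, and that the result is an independent copy, since an index array is involved (sub and two_step are names introduced only for this illustration):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Slice on rows combined with an index list on columns
sub = arr[1:, [1, 2]]

# The same selection written as two separate steps
two_step = arr[1:][:, [1, 2]]

print(np.array_equal(sub, two_step))  # Output: True

# Changing the result does not change the original array
sub[0, 0] = -1
print(arr[1, 1])  # Output: 5
```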

With Boolean Masks

Boolean masks are used in the same way as integer index arrays: you pass the mask inside the square brackets.

arr = np.array([10, 20, 30, 40, 50])
 
mask = np.array([True, False, True, False, True])  # Boolean mask
 
selected_elements = arr[mask]
print(selected_elements)  
# Output: [10 30 50]

Here, the boolean mask selects elements where the corresponding mask value is True.
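
A boolean mask and an integer index array can describe the same selection. As a small sketch, np.flatnonzero converts the mask into the positions of its True values, and indexing with those positions gives the same result as indexing with the mask directly:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
mask = np.array([True, False, True, False, True])

# Positions where the mask is True
idx = np.flatnonzero(mask)
print(idx.tolist())  # Output: [0, 2, 4]

# Indexing with the positions matches indexing with the mask
by_index = arr[idx]
by_mask = arr[mask]
print(np.array_equal(by_index, by_mask))  # Output: True

# The two styles can be chained: first and last of the masked elements
ends = arr[mask][[0, -1]]
print(ends.tolist())  # Output: [10, 50]
```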
