
Mastering NumPy Array Manipulation

In this post, we dive deep into NumPy's array manipulation functions, exploring various techniques and providing code examples to help you grasp these concepts effectively.

Reshaping Arrays

One common operation that we often encounter when working with arrays is reshaping - the process of changing the dimensions of an array while keeping the same number of elements. We will discuss three important functions for reshaping arrays: reshape, ravel, and flatten.

The reshape Function

The reshape function changes the dimensions of an array while keeping the number of elements the same. It takes the desired shape as an argument and returns an array with that shape. Where the memory layout allows, the result is a view of the original data rather than a copy.

Syntax: np.reshape(arr, new_shape) or, as a method, arr.reshape(new_shape)

import numpy as np

# Creating a 1D array
arr_1d = np.array([1, 2, 3, 4, 5, 6])
 
# Reshaping to a 2x3 matrix
arr_reshaped = arr_1d.reshape(2, 3)
 
print("Original array:")
print(arr_1d)
print("\nReshaped array:")
print(arr_reshaped)

Output:

Original array:
[1 2 3 4 5 6]

Reshaped array:
[[1 2 3]
 [4 5 6]]
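
Two details of reshape are easy to check for yourself: whether the result shares memory with the original, and how -1 lets NumPy infer one dimension. The following is a minimal sketch; the names base and grid are made up for this illustration.

```python
import numpy as np

base = np.arange(1, 7)                 # [1 2 3 4 5 6]

# -1 asks NumPy to infer that dimension from the total size (6 / 2 = 3)
grid = base.reshape(2, -1)
print(grid.shape)                      # (2, 3)

# For a contiguous array like this one, reshape returns a view
print(np.shares_memory(base, grid))    # True

# Writing through the view therefore changes the original as well
grid[0, 0] = 99
print(base[0])                         # 99

# A shape with a different element count is rejected
try:
    base.reshape(4, 2)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```

If you need an independent array, calling .copy() on the reshaped result is one way to get it.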

The ravel Function

The ravel function flattens a multi-dimensional array into a 1D array. By default it reads the elements row by row, and it returns a view of the original data whenever possible.

# Flattening the reshaped array using ravel
arr_flattened_ravel = arr_reshaped.ravel()
 
print("Flattened array using ravel:")
print(arr_flattened_ravel)

Output:

Flattened array using ravel:
[1 2 3 4 5 6]

The flatten Function

Like ravel, the flatten function flattens a multi-dimensional array into a 1D array. However, flatten always returns a new copy, whereas ravel returns a view whenever it can.

# Flattening the reshaped array using flatten
arr_flattened_flatten = arr_reshaped.flatten()
 
print("Flattened array using flatten:")
print(arr_flattened_flatten)

Output:

Flattened array using flatten:
[1 2 3 4 5 6]

Comparing ravel and flatten

While ravel and flatten both produce a 1D array, they differ in one crucial way: whether the result shares data with the original array.

Let's explore the difference between them with an example:

# Creating a 2D array
original_array = np.array([[1, 2, 3],
                           [4, 5, 6]])
 
# Using flatten()
flattened_array = original_array.flatten()
 
# Using ravel()
raveled_array = original_array.ravel()
 
# Modifying the original array
original_array[0, 0] = 100
 
# Displaying results
print("Original Array:\n", original_array)
print("Flattened Array:", flattened_array)
print("Raveled Array:", raveled_array)

Output:

Original Array:
 [[100   2   3]
 [  4   5   6]]
Flattened Array: [1 2 3 4 5 6]
Raveled Array: [100   2   3   4   5   6]

Now, here's the key difference:

  1. flatten(): The flatten() function returns a new array that is a completely independent copy of the original array. Any changes made to the original_array after using flatten() will not affect the flattened_array, and vice versa.

  2. ravel(): The ravel() function returns a flattened array that may share data with the original array. This means that changes made to the original_array can affect the values in the raveled_array, and vice versa.

In practice, if you need a flattened array and don't care about sharing data with the original array, both flatten() and ravel() can be used. However, if you want to ensure that the flattened array is independent of the original array, it's safer to use flatten(). If memory efficiency is a concern and sharing data is acceptable, ravel() could be more suitable.
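
The phrase "may share data" can be made concrete with np.shares_memory. The sketch below (variable names are illustrative) also shows a case where ravel() cannot return a view and has to copy: a transposed array, whose elements are not laid out row by row in memory.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# flatten() always copies; ravel() on a contiguous array returns a view
print(np.shares_memory(matrix, matrix.flatten()))   # False
print(np.shares_memory(matrix, matrix.ravel()))     # True

# The transpose is not C-contiguous, so ravel() has to copy here
raveled_t = matrix.T.ravel()
print(raveled_t)                                     # [1 4 2 5 3 6]
print(np.shares_memory(matrix, raveled_t))           # False
```

Because of this, it is safer not to rely on ravel() returning a view unless you have checked the layout of the input.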

Understanding axis

In NumPy, the "axis" parameter specifies the direction along which an operation is performed on an array. Let's break down this concept step by step.

Understanding the Axis Concept

Imagine you have a 2D array (also known as a matrix) like this:

[[1, 2, 3],
 [4, 5, 6],
 [7, 8, 9]]

  • The rows are the horizontal sequences: [1, 2, 3], [4, 5, 6], and [7, 8, 9].
  • The columns are the vertical sequences: [1, 4, 7], [2, 5, 8], and [3, 6, 9].

The "axis" parameter helps you choose whether you want to perform operations along these rows, columns, or even higher dimensions in more complex, multi-dimensional arrays.

Working with Axis in NumPy

When you perform operations on arrays in NumPy, like calculating the sum, mean, or applying functions, the "axis" parameter comes into play.

  • If you specify axis=0, NumPy moves down the rows and collapses them, producing one result per column. For example, np.sum(arr, axis=0) calculates the sum of each column.

  • If you specify axis=1, NumPy moves across the columns and collapses them, producing one result per row. For example, np.mean(arr, axis=1) calculates the mean of each row.

Here's a simple example to illustrate this:

# Creating a 2D array
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
 
# Calculating the sum along columns (axis=0)
column_sum = np.sum(arr, axis=0)
print("Column Sum:")
print(column_sum)
 
# Calculating the mean along rows (axis=1)
row_mean = np.mean(arr, axis=1)
print("\nRow Mean:")
print(row_mean)

Output:

Column Sum:
[12 15 18]

Row Mean:
[2. 5. 8.]

Handling Higher Dimensions

In multi-dimensional arrays, higher values of the "axis" parameter target the additional dimensions. For instance, with a 3D array, axis=0 operates along the first dimension, axis=1 along the second, and axis=2 along the third.
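
A short sketch with a 3D array may help here. The array name cube and its shape are assumptions chosen for illustration; the point is that reducing along an axis removes that axis from the result's shape.

```python
import numpy as np

# Hypothetical 3D array: 2 blocks, each with 3 rows and 4 columns
cube = np.arange(24).reshape(2, 3, 4)

# Reducing along an axis removes that axis from the shape
shape_axis0 = cube.sum(axis=0).shape
shape_axis1 = cube.sum(axis=1).shape
shape_axis2 = cube.sum(axis=2).shape
print(shape_axis0)   # (3, 4)
print(shape_axis1)   # (2, 4)
print(shape_axis2)   # (2, 3)

# axis=-1 refers to the last axis, which is axis=2 here
same_result = np.array_equal(cube.sum(axis=-1), cube.sum(axis=2))
print(same_result)   # True
```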

Joining Arrays

Combining arrays is crucial when dealing with larger datasets or merging data from different sources. NumPy provides several functions for joining arrays: concatenate, stack, hstack, and vstack.

The concatenate Function

The concatenate function is a versatile tool that allows us to join multiple arrays along a specified axis. It takes a sequence of arrays and the axis along which to concatenate them as arguments.

# Creating arrays for concatenation
arr1 = np.array([[1, 2],
                 [3, 4]])
arr2 = np.array([[5, 6]])
arr3 = np.array([[5, 6],
                 [7, 8]])
 
# Concatenating arrays along axis 0 (vertical concatenation)
vertical_concatenated = np.concatenate((arr1, arr2), axis=0)
 
# Concatenating arrays along axis 1 (horizontal concatenation)
horizontal_concatenated = np.concatenate((arr1, arr3), axis=1)
 
print("Vertical Concatenation:")
print(vertical_concatenated)
 
print("\nHorizontal Concatenation:")
print(horizontal_concatenated)

Output:

Vertical Concatenation:
[[1 2]
 [3 4]
 [5 6]]

Horizontal Concatenation: 
[[1 2 5 6] 
 [3 4 7 8]]
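
One thing the example above relies on is that the arrays agree in every dimension except the one being joined. The sketch below, using made-up names, shows what happens when they do not.

```python
import numpy as np

two_rows = np.array([[1, 2],
                     [3, 4]])      # shape (2, 2)
one_row = np.array([[5, 6]])       # shape (1, 2)

# Along axis 0 only the column counts must match, so this works
joined = np.concatenate((two_rows, one_row), axis=0)
print(joined.shape)                # (3, 2)

# Along axis 1 the row counts must match; 2 rows vs 1 row does not
try:
    np.concatenate((two_rows, one_row), axis=1)
    mismatch_failed = False
except ValueError as err:
    mismatch_failed = True
    print("ValueError:", err)
```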

The hstack and vstack Functions

The hstack and vstack functions are specialized versions of the concatenate function. They are commonly used for horizontal and vertical concatenation, respectively.

# Vertical stacking
vertical_stacked = np.vstack((arr1, arr2))
 
# Horizontal stacking
horizontal_stacked = np.hstack((arr1, arr3))
 
print("Vertical Stacking:")
print(vertical_stacked)
 
print("\nHorizontal Stacking:")
print(horizontal_stacked)

Output:

Vertical Stacking:
[[1 2]
 [3 4]
 [5 6]]

Horizontal Stacking: 
[[1 2 5 6] 
 [3 4 7 8]]
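
To see how hstack and vstack relate to concatenate, here is a small sketch with illustrative names. For 2D inputs the results match concatenate on axis 1 and axis 0; for 1D inputs the two functions behave differently from each other.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# For 2D inputs, vstack and hstack match concatenate on axis 0 and 1
v_same = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
h_same = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
print(v_same, h_same)          # True True

# For 1D inputs the two behave differently
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
h_shape = np.hstack((x, y)).shape
v_shape = np.vstack((x, y)).shape
print(h_shape)                 # (6,)   joined end to end
print(v_shape)                 # (2, 3) each input becomes a row
```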

The stack Function

The stack function is slightly different from the previous ones. It joins arrays along a new axis, creating a new dimension in the resulting array. All input arrays must have the same shape; otherwise, NumPy raises a ValueError.

Let's start by creating two 1D arrays and then stack them using the NumPy stack() function along axis=0. This is the default value, so you can omit it.

# Creating two 1D arrays
arr4 = np.array([10, 20, 30])
arr5 = np.array([40, 50, 60])
 
# Stacking arr4 and arr5 along axis=0
stacked_arr0 = np.stack((arr4, arr5), axis=0) # axis=0 is default
 
print("Stacked along axis=0:\n", stacked_arr0)
print("Shape of the stacked array:", stacked_arr0.shape)
print(f"Shape of the input array: {arr4.shape} and {arr5.shape}")

Output:

Stacked along axis=0:
 [[10 20 30]
 [40 50 60]]
Shape of the stacked array: (2, 3)
Shape of the input array: (3,) and (3,)

In the above example, stacking along axis=0 creates a new array where each original array becomes a row in the new array. The shape of the new array is (2, 3), indicating that it is a 2D array with 2 rows and 3 columns.

Now, let's stack the same arrays along axis=1.

# Stacking arr4 and arr5 along axis=1
stacked_arr1 = np.stack((arr4, arr5), axis=1)
 
print("Stacked along axis=1:\n", stacked_arr1)
print("Shape of the stacked array:", stacked_arr1.shape)

Output:

Stacked along axis=1:
 [[10 40]
 [20 50]
 [30 60]]
Shape of the stacked array: (3, 2)

When stacking along axis=1, each original array becomes a column in the new array. The shape of the new array is (3, 2), indicating that it is a 2D array with 3 rows and 2 columns.

So, by using the stack() function, we have transformed two 1D arrays into a 2D array. The axis parameter determines whether the original arrays become rows (axis=0) or columns (axis=1) in the new array.
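
The same idea carries over to inputs with more dimensions. The sketch below stacks three hypothetical 2D arrays (the name layers and the shapes are chosen only for illustration) and shows where the new axis lands, followed by the error raised for mismatched shapes.

```python
import numpy as np

# Three hypothetical arrays, each of shape (2, 4)
layers = [np.full((2, 4), fill) for fill in (0, 1, 2)]

# The new axis of length 3 appears at the position given by axis
shape0 = np.stack(layers, axis=0).shape
shape1 = np.stack(layers, axis=1).shape
shape_last = np.stack(layers, axis=-1).shape
print(shape0)       # (3, 2, 4)
print(shape1)       # (2, 3, 4)
print(shape_last)   # (2, 4, 3)

# Inputs with different shapes are rejected
try:
    np.stack((np.zeros(3), np.zeros(4)))
    stack_failed = False
except ValueError as err:
    stack_failed = True
    print("ValueError:", err)
```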

Similarities and Differences

All these array joining functions have their specific use cases. They differ mainly in two respects:

  • Concatenation Axis: concatenate, hstack, and vstack join arrays along an axis that already exists. The concatenate function lets us choose that axis explicitly, while hstack and vstack are shortcuts for horizontal and vertical joining.
  • New Axis: The stack function introduces a new axis, so the result has one more dimension than the input arrays.

Splitting Arrays

There are times when we need to break down an array into smaller segments to perform specific tasks or analyze subsets of data. In this section, we'll explore array splitting techniques using functions like split, hsplit, and vsplit.

The split Function

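The split function divides an array into multiple subarrays along a specified axis. It takes the array to split and either the number of equally sized subarrays or a list of indices at which to cut. If a section count does not divide the array evenly, NumPy raises a ValueError.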

# Creating an array for splitting
arr_to_split = np.array([1, 2, 3, 4, 5, 6])
 
# Splitting the array into 3 subarrays
split_result = np.split(arr_to_split, 3)
 
print(split_result)
print("\nSplit Result:")
for subarray in split_result:
    print(subarray)

Output:

[array([1, 2]), array([3, 4]), array([5, 6])]

Split Result:
[1 2]
[3 4]
[5 6]
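
Besides an integer count, np.split also accepts a list of indices marking where to cut. The following sketch (names are illustrative) shows that form, along with the error raised when an integer count does not divide the array evenly.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])

# A list of indices marks the cut points: [:2], [2:5], [5:]
parts = np.split(values, [2, 5])
print([p.tolist() for p in parts])   # [[1, 2], [3, 4, 5], [6]]

# An integer section count must divide the length evenly; 6 / 4 does not
try:
    np.split(values, 4)
    uneven_failed = False
except ValueError as err:
    uneven_failed = True
    print("ValueError:", err)
```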

The hsplit and vsplit Functions

The hsplit and vsplit functions are specialized versions of the split function. They are used for horizontal and vertical splitting, respectively.

  • The hsplit function splits an array into multiple sub-arrays horizontally (column-wise).
  • The vsplit function splits an array into multiple sub-arrays vertically (row-wise).

# Creating a 2D array for splitting
arr_to_split_2d = np.array([[1, 2, 3],
                            [4, 5, 6]])
 
# Horizontal splitting
horizontal_split = np.hsplit(arr_to_split_2d, 3)
 
# Vertical splitting
vertical_split = np.vsplit(arr_to_split_2d, 2)
 
print("Horizontal Splitting:")
for subarray in horizontal_split:
    print(subarray)
 
print("\nVertical Splitting:")
for subarray in vertical_split:
    print(subarray)

Output:

Horizontal Splitting:
[[1]
 [4]]
[[2]
 [5]]
[[3]
 [6]]

Vertical Splitting:
[[1 2 3]]
[[4 5 6]]
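
As a quick check of how these relate to split, the sketch below compares hsplit and vsplit on a 2D array with split along axis 1 and axis 0. The array table and the other names are made up for this example.

```python
import numpy as np

table = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8]])

# hsplit on a 2D array behaves like split along axis 1
left, right = np.hsplit(table, 2)
left_alt, right_alt = np.split(table, 2, axis=1)
h_same = np.array_equal(left, left_alt) and np.array_equal(right, right_alt)
print(h_same)            # True

# vsplit behaves like split along axis 0
top, bottom = np.vsplit(table, 2)
top_alt, bottom_alt = np.split(table, 2, axis=0)
v_same = np.array_equal(top, top_alt) and np.array_equal(bottom, bottom_alt)
print(v_same)            # True

print(left.tolist())     # [[1, 2], [5, 6]]
print(top.tolist())      # [[1, 2, 3, 4]]
```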

Adding and Removing Elements

In this section, we'll explore the techniques of adding and removing elements using functions like append, insert, and delete.

The append Function

The append function adds values to the end of an array. It takes a value or an array and returns a new array with those values added; the original array is left unchanged. If no axis is given, both inputs are flattened first.

# Creating an initial array
arr = np.array([1, 2, 3])
 
# Appending a single element
arr_appended_single = np.append(arr, 4)
 
# Appending another array
arr_to_append = np.array([5, 6])
arr_appended_array = np.append(arr, arr_to_append)
 
print("Array after appending single element:")
print(arr_appended_single)
 
print("\nArray after appending another array:")
print(arr_appended_array)

Output:

Array after appending single element:
[1 2 3 4]

Array after appending another array:
[1 2 3 5 6]
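
Note that the second result starts from the original [1, 2, 3], not from the array that already had 4 appended, because np.append never modifies its input. The sketch below illustrates this and shows how the axis argument changes the result for a 2D array; the variable names are illustrative.

```python
import numpy as np

original = np.array([1, 2, 3])
extended = np.append(original, 4)

# append returns a new array; the input is left untouched
print(original)          # [1 2 3]
print(extended)          # [1 2 3 4]

grid = np.array([[1, 2],
                 [3, 4]])

# Without axis, both inputs are flattened before joining
flat_result = np.append(grid, [[5, 6]])
print(flat_result)       # [1 2 3 4 5 6]

# With axis=0, the new values are added as an extra row
row_result = np.append(grid, [[5, 6]], axis=0)
print(row_result.tolist())   # [[1, 2], [3, 4], [5, 6]]
```

Since every call allocates a new array, appending inside a long loop can be slow; collecting values in a Python list and converting once at the end is a common alternative.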

The insert Function

The insert function inserts values into an array before a given index, along the specified axis. It returns a new array and leaves the input unchanged.

Let's look at an example:

# Creating a 1D array
arr = np.array([1, 2, 3])
 
# Inserting values at index 1
inserted_arr = np.insert(arr, 1, [4, 5, 6])
 
print(inserted_arr)

Output:

[1 4 5 6 2 3]

In this example, the values [4, 5, 6] are inserted into arr at index 1.

You can also use the insert function with a 2D array and specify an axis:

# Creating a 2D array
arr = np.array([[1, 2, 3], [7, 8, 9]])
 
# Inserting values along axis 0
inserted_arr = np.insert(arr, 1, [[4, 5, 6]], axis=0)
 
print(inserted_arr)

Output:

[[1 2 3]
 [4 5 6]
 [7 8 9]]

In this case, the values [4, 5, 6] were inserted as a new row at index 1.
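
Inserting along axis=1 works the same way but adds a column instead of a row. Here is a minimal sketch with illustrative names, showing both a scalar that is broadcast down the column and one value per row.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [7, 8, 9]])

# A scalar value is broadcast to fill the whole new column
with_zeros = np.insert(grid, 1, 0, axis=1)
print(with_zeros.tolist())    # [[1, 0, 2, 3], [7, 0, 8, 9]]

# One value per row fills the new column from top to bottom
with_column = np.insert(grid, 1, [10, 20], axis=1)
print(with_column.tolist())   # [[1, 10, 2, 3], [7, 20, 8, 9]]

# insert returns a new array; grid itself is unchanged
print(grid.shape)             # (2, 3)
```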

The delete Function

The delete function helps you remove elements at a specified index or along a specified axis. It returns a new array with the chosen elements removed.

Let's look at an example:

# Creating a 1D array
arr = np.array([1, 2, 3])
 
# Deleting the element at index 1
deleted_arr = np.delete(arr, 1)
 
print(deleted_arr)

Output:

[1 3]

In this example, the element at index 1 of arr was deleted.

You can also use the delete function with a 2D array and specify an axis:

# Creating a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
 
# Deleting the sub-array (row) at index 1 along axis 0
deleted_arr = np.delete(arr, 1, axis=0)
 
print(deleted_arr)

Output:

[[1 2 3]
 [7 8 9]]

In this case, the sub-array (row) at index 1 was deleted.
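
delete also accepts a list of indices, and it can remove columns when axis=1 is given. The sketch below uses made-up names to show both, plus what happens when axis is omitted for a 2D array.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# A list of indices removes several columns in one call
middle_only = np.delete(grid, [0, 2], axis=1)
print(middle_only.tolist())   # [[2], [5], [8]]

# Without axis, delete works on the flattened array
flat_deleted = np.delete(grid, 0)
print(flat_deleted)           # [2 3 4 5 6 7 8 9]

# delete returns a new array; grid keeps all nine elements
print(grid.size)              # 9
```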