NumPy reference - Machine Learning Courses

NumPy is the go-to library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. This page is a reference for the most important functionality; be sure to consult the official documentation as well.

To use NumPy, we first import the numpy package:

import numpy as np

Arrays

A NumPy array is a grid of values, all of the same type, indexed by integers. The shape of an array is a tuple of integers giving the size of the array along each dimension.

# Create a vector
a = np.array([1, 2, 3])
print(a)
print(f"type(a): {type(a)}\na.shape: {a.shape}\na[0]: {a[0]}\na[1]: {a[1]}\na[2]: {a[2]}")

[1 2 3]
type(a): <class 'numpy.ndarray'>
a.shape: (3,)
a[0]: 1
a[1]: 2
a[2]: 3

# Change an element of the array
a[0] = 5
print(a)

[5 2 3]

# Create a 2-dimensional array
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b)
print(
    f"type(b): {type(b)}\nb.shape: {b.shape}\nb[0, 0]: {b[0, 0]}\nb[0, 2]: {b[0, 2]}\nb[1, 0]: {b[1, 0]}"
)

[[1 2 3]
 [4 5 6]]
type(b): <class 'numpy.ndarray'>
b.shape: (2, 3)
b[0, 0]: 1
b[0, 2]: 3
b[1, 0]: 4

Indexing also works with tuples of integers, which lets us store an index in a variable.

index = (0, 2)
print(b[index])

Special arrays

There are also many functions that create special arrays from a given shape.

# Create a range of 5 values
np.arange(5)

array([0, 1, 2, 3, 4])

# Create an equally spaced range of 5 values between 0 and 1
np.linspace(0, 1, 5)  # 5 values from 0 to 1

array([0. , 0.25, 0.5 , 0.75, 1. ])

# Create an array of all zeros
a = np.zeros((3, 2))
print(a)

[[0. 0.]
 [0. 0.]
 [0. 0.]]

# Create an array of all ones
b = np.ones((4, 5))
print(b)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]

# Create a constant array
c = np.full((2, 3), 7)
print(c)

[[7 7 7]
 [7 7 7]]

# Create a 5x5 identity matrix
d = np.eye(5)
print(d)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]

# Create an array filled with random values
rng = np.random.default_rng()  # Create a random number generator
e = rng.random((2, 3))
print(e)

[[0.48062712 0.33460006 0.06931556]
 [0.32320907 0.67773802 0.24421142]]

Slicing

As with Python lists, we can use index ranges to take slices of NumPy arrays. This can be done separately for each dimension.

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Last column of a
print(a[:, 3])
# or
print(a[:, -1])

[ 4  8 12]
[ 4  8 12]

# First two rows and columns 1 and 2 of a
print(a[0:2, 1:3])

[[2 3]
 [6 7]]

# Assign the first 2 rows and columns 1 and 2 of a to b
b = a[:2, 1:3]
print(f"a: {a}")
print(f"b: {b}")

a: [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
b: [[2 3]
 [6 7]]

# Now change an element of b
b[0, 0] = 777
print(f"a: {a}")
print(f"b: {b}")

a: [[  1 777   3   4]
 [  5   6   7   8]
 [  9  10  11  12]]
b: [[777   3]
 [  6   7]]

A slice is a view on the data of the original array, so modifying the slice also modifies the original. If this behaviour is not wanted, we must take an explicit copy.

c = a[:2, 1:3].copy()  # Explicit copy
print(f"a: {a}")
print(f"c: {c}")

a: [[  1 777   3   4]
 [  5   6   7   8]
 [  9  10  11  12]]
c: [[777   3]
 [  6   7]]

c[0, 0] = 999
print(f"a: {a}")  # Verify that a is unchanged
print(f"c: {c}")

a: [[  1 777   3   4]
 [  5   6   7   8]
 [  9  10  11  12]]
c: [[999   3]
 [  6   7]]
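
One way to check whether two arrays share their data is `np.shares_memory`. The sketch below rebuilds the example array from scratch (the names `view` and `copy` are chosen here for illustration) and shows that a slice shares memory with its source while an explicit copy does not.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]         # Basic slicing returns a view on the same data
copy = a[:2, 1:3].copy()  # .copy() allocates new, independent data

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

view[0, 0] = 777  # Writes through to a
copy[0, 0] = 999  # Leaves a untouched
print(a[0, 1])    # 777
```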

Boolean indexing

You can use boolean values (True or False in Python) to pick specific elements out of an array. This is frequently used to filter arrays on a given condition.

a = np.array([[1, 2], [3, 4], [5, 6]])

bool_idx = a > 2  # Find the elements of a that are bigger than 2

print(bool_idx)
print(a[bool_idx])
# or
print(a[a > 2])

[[False False]
 [ True  True]
 [ True  True]]
[3 4 5 6]
[3 4 5 6]

Special functions can also be called to evaluate conditions.

a = np.array([[1, np.nan], [3, 4], [np.nan, 6]])
print(a)

[[ 1. nan]
 [ 3.  4.]
 [nan  6.]]

np.isnan(a)  # Find the nan elements of a

array([[False,  True],
       [False, False],
       [ True, False]])

print(a[np.isnan(a) == False])  # Keep only the elements that are not nan
# Or, more idiomatically
print(a[~np.isnan(a)])

[1. 3. 4. 6.]
[1. 3. 4. 6.]
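
Boolean masks can also be combined and used on the left-hand side of an assignment. The following is a minimal sketch with freshly defined arrays; the name `mask` and the choice of replacing nan by 0 are illustrative assumptions, not part of the original examples.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) and | (or); the parentheses are required
mask = (a > 2) & (a % 2 == 0)
print(a[mask])  # [4 6]

# A mask can also select the elements to overwrite
b = np.array([[1, np.nan], [3, 4], [np.nan, 6]])
b[np.isnan(b)] = 0.0  # Replace every nan with 0
print(b)
```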

Data types (dtype)

NumPy tries to infer the data type from the input, but you can also pass an explicit type or cast an existing array.

a = np.array(["fruit", "meat", "vegetable", "dairy"])
print(a)
print(a.dtype)

['fruit' 'meat' 'vegetable' 'dairy']
<U9

a.astype("float64")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[24], line 1
----> 1 a.astype("float64")

ValueError: could not convert string to float: np.str_('fruit')

a = np.arange(10)
print(a)
print(a.dtype)

[0 1 2 3 4 5 6 7 8 9]
int64

print(a.astype("float32"))
print(a.astype("float32").dtype)

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
float32

b = np.array([1.5, 2.5, 3.5], dtype=np.int32)
print(b)
print(b.dtype)

[1 2 3]
int32
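
Casting floats to an integer type drops the fractional part rather than rounding. The sketch below, on a small array `x` made up for this illustration, contrasts a plain cast with rounding first via `np.rint`.

```python
import numpy as np

x = np.array([1.7, -1.7, 2.5])

truncated = x.astype(np.int32)         # Drops the fractional part (towards zero)
rounded = np.rint(x).astype(np.int32)  # Round to the nearest integer, then cast

print(truncated)  # [ 1 -1  2]
print(rounded)    # [ 2 -2  2]
```

Note that `np.rint` rounds halves to the nearest even integer, which is why 2.5 becomes 2.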

Element-wise operations

a = np.array([[1, 2], [3, 4]], dtype=np.float64)
print(f"a: {a}\n")
print(f"a * 3: {a * 3}")

a: [[1. 2.]
 [3. 4.]]

a * 3: [[ 3.  6.]
 [ 9. 12.]]

print(f"a - 3: {a - 3}")

a - 3: [[-2. -1.]
 [ 0.  1.]]

b = np.array([[5, 6], [7, 8]], dtype=np.float64)
print(f"b: {b}\n")
print(f"a + b: {a + b}")

b: [[5. 6.]
 [7. 8.]]

a + b: [[ 6.  8.]
 [10. 12.]]

print(f"a: {a}\n")
print(f"b: {b}\n")
print(f"a*b: {a * b}")  # Elementwise or "Hadamard" product

a: [[1. 2.]
 [3. 4.]]

b: [[5. 6.]
 [7. 8.]]

a*b: [[ 5. 12.]
 [21. 32.]]
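
The scalar operations above are a special case of what NumPy calls broadcasting: a smaller array is stretched to match the shape of the larger one. As a small sketch (the array `row` is an assumption added for this illustration), a 1D array of length 2 can be combined with every row of a 2x2 array:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], dtype=np.float64)
row = np.array([10, 100], dtype=np.float64)

# row has shape (2,), a has shape (2, 2): row is applied to every row of a
print(a * row)
print(a + row)
```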

Dot product

a = np.array([9, 10])
b = np.array([11, 12])

print(f"a: {a}\n")
print(f"b: {b}\n")
print(f"aTb: {a.dot(b)}")

a: [ 9 10]

b: [11 12]

aTb: 219

print(f"aTb: {a[0] * b[0] + a[1] * b[1]}")

aTb: 219

# Alternative syntax for matrix multiplication
print(f"aTb: {a @ b}")

aTb: 219

a = np.array([[1, 2], [3, 4], [5, 7]])
b = np.array([[5, 6], [7, 8], [10, 11]])

print(f"a: {a}\n")
print(f"b: {b}\n")
print(f"aTb: {a @ b}")

a: [[1 2]
 [3 4]
 [5 7]]

b: [[ 5  6]
 [ 7  8]
 [10 11]]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[35], line 6
      4 print(f"a: {a}\n")
      5 print(f"b: {b}\n")
----> 6 print(f"aTb: {a @ b}")

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 2)

print(f"aTb: {a.T @ b}")

aTb: [[ 76  85]
 [108 121]]
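
The error above comes from mismatched inner dimensions. The sketch below repeats the same two arrays and prints the shapes of the products that are defined, which may help when deciding which operand to transpose.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 7]])    # shape (3, 2)
b = np.array([[5, 6], [7, 8], [10, 11]])  # shape (3, 2)

# (2, 3) @ (3, 2) -> (2, 2): the inner dimensions (3 and 3) match
print((a.T @ b).shape)  # (2, 2)

# (3, 2) @ (2, 3) -> (3, 3): also valid, but a different product
print((a @ b.T).shape)  # (3, 3)

# np.dot and @ agree for 2D arrays
print(np.array_equal(a.T @ b, np.dot(a.T, b)))  # True
```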

Linear algebra with NumPy

NumPy offers powerful functions for the linear algebra operations that are essential to machine learning. Here are some of the most important ones.

Vector and matrix norms

The norm of a vector or matrix is a measure of its “size”. The most commonly used norms are the L2, L1 and L∞ vector norms and the Frobenius matrix norm:

# Vector norms
v = np.array([3, 4])
print(f"Vector: {v}")
print(f"L2 norm (Euclidean): {np.linalg.norm(v)}")
print(f"L1 norm (Manhattan): {np.linalg.norm(v, ord=1)}")
print(f"L∞ norm (Maximum): {np.linalg.norm(v, ord=np.inf)}")

# Matrix norms
A = np.array([[1, 2], [3, 4]])
print(f"\nMatrix:\n{A}")
print(f"Frobenius norm: {np.linalg.norm(A, 'fro')}")

Vector: [3 4]
L2 norm (Euclidean): 5.0
L1 norm (Manhattan): 7.0
L∞ norm (Maximum): 4.0

Matrix:
[[1 2]
 [3 4]]
Frobenius norm: 5.477225575051661
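
To make the definitions concrete, the same norms can be computed by hand with basic array operations. This is only a sketch to show what each norm measures (the variable names are chosen here); in practice `np.linalg.norm` is the more convenient choice.

```python
import numpy as np

v = np.array([3, 4])

l2 = np.sqrt(np.sum(v**2))  # Square root of the sum of squares
l1 = np.sum(np.abs(v))      # Sum of absolute values
linf = np.max(np.abs(v))    # Largest absolute value

A = np.array([[1, 2], [3, 4]])
fro = np.sqrt(np.sum(A**2))  # L2 norm applied to all matrix entries

print(l2, l1, linf)
print(np.isclose(fro, np.linalg.norm(A, "fro")))  # True
```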

Determinant

The determinant of a matrix gives information about the linear transformation that the matrix represents. In particular, a determinant of 0 means the matrix is singular and cannot be inverted.

A = np.array([[1, 2], [3, 4]])
det_A = np.linalg.det(A)
print(f"Matrix A:\n{A}")
print(f"Determinant: {det_A}")

# A matrix with determinant 0 is singular (not invertible)
B = np.array([[1, 2], [2, 4]])
det_B = np.linalg.det(B)
print(f"\nMatrix B:\n{B}")
print(f"Determinant: {det_B} (singular!)")

Matrix A:
[[1 2]
 [3 4]]
Determinant: -2.0000000000000004

Matrix B:
[[1 2]
 [2 4]]
Determinant: 0.0 (singular!)
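
Because `np.linalg.det` works in floating point, its result can differ slightly from the exact value (as with -2.0000000000000004 above). A minimal sketch of a safer check, using the 2x2 formula a*d - b*c and a tolerance-based comparison:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# For a 2x2 matrix [[a, b], [c, d]] the determinant is a*d - b*c
manual = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
print(manual)  # -2

# The result of np.linalg.det is a float, so compare with a tolerance
print(np.isclose(np.linalg.det(A), manual))  # True

B = np.array([[1, 2], [2, 4]])
print(np.isclose(np.linalg.det(B), 0.0))  # True: B is singular
```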

Matrix inverse

The inverse of a matrix A is a matrix A⁻¹ such that A⁻¹A = AA⁻¹ = I (the identity matrix):

A = np.array([[1, 2], [3, 4]])
A_inv = np.linalg.inv(A)

print(f"Matrix A:\n{A}")
print(f"\nInverse A⁻¹:\n{A_inv}")

# Verification: A × A⁻¹ should be the identity matrix
identity_check = A @ A_inv
print(f"\nA × A⁻¹ (should be identity):\n{identity_check}")
print(f"\nIdentity matrix:\n{np.eye(2)}")

Matrix A:
[[1 2]
 [3 4]]

Inverse A⁻¹:
[[-2.   1. ]
 [ 1.5 -0.5]]

A × A⁻¹ (should be identity):
[[1.0000000e+00 0.0000000e+00]
 [8.8817842e-16 1.0000000e+00]]

Identity matrix:
[[1. 0.]
 [0. 1.]]
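
The product above is only approximately the identity (note the 8.8817842e-16 entry), so a tolerance-based comparison such as `np.allclose` is the usual way to verify it. The sketch below also shows what happens for the singular matrix from the determinant section; the flag `singular_detected` is a hypothetical helper for this example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
A_inv = np.linalg.inv(A)

# Rounding errors make an exact comparison unreliable; use a tolerance
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True

# A singular matrix has no inverse; NumPy raises LinAlgError
B = np.array([[1, 2], [2, 4]])
singular_detected = False
try:
    np.linalg.inv(B)
except np.linalg.LinAlgError as err:
    singular_detected = True
    print(f"Cannot invert B: {err}")
```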

Eigenvalues and eigenvectors

Eigenvalues and eigenvectors are fundamental in linear algebra and machine learning. An eigenvector v of A with eigenvalue λ satisfies A × v = λ × v.

A = np.array([[4, 2], [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)

print(f"Matrix A:\n{A}")
print(f"\nEigenvalues: {eigenvalues}")
print(f"\nEigenvectors:\n{eigenvectors}")

# Verification: A × v = λ × v for each eigenvalue λ and eigenvector v
for i in range(len(eigenvalues)):
    lambda_i = eigenvalues[i]
    v_i = eigenvectors[:, i]

    left_side = A @ v_i
    right_side = lambda_i * v_i

    print(f"\nEigenvalue {i + 1}: {lambda_i:.3f}")
    print(f"A × v = {left_side}")
    print(f"λ × v = {right_side}")
    print(f"Equal: {np.allclose(left_side, right_side)}")

Matrix A:
[[4 2]
 [1 3]]

Eigenvalues: [5. 2.]

Eigenvectors:
[[ 0.89442719 -0.70710678]
 [ 0.4472136   0.70710678]]

Eigenvalue 1: 5.000
A × v = [4.47213595 2.23606798]
λ × v = [4.47213595 2.23606798]
Equal: True

Eigenvalue 2: 2.000
A × v = [-1.41421356  1.41421356]
λ × v = [-1.41421356  1.41421356]
Equal: True
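
The per-eigenvalue loop can also be written as a single matrix identity. As a sketch, assuming the eigenvector matrix is invertible (it is here, because the two eigenvalues differ), the matrix can be rebuilt from its eigendecomposition; `A_rebuilt` is a name introduced for this example.

```python
import numpy as np

A = np.array([[4, 2], [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Check all eigenpairs at once: A V = V diag(lambda)
# (eigenvectors * eigenvalues scales column j by eigenvalues[j])
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))  # True

# Reconstruct A = V diag(lambda) V^-1
A_rebuilt = eigenvectors @ np.diag(eigenvalues) @ np.linalg.inv(eigenvectors)
print(np.allclose(A, A_rebuilt))  # True
```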

Solving systems of linear equations

To solve linear systems of the form Ax = b:

# Suppose we have the system:
# 2x + 3y = 7
# x + 4y = 6

A = np.array([[2, 3], [1, 4]])
b = np.array([7, 6])

# Solve with np.linalg.solve
x = np.linalg.solve(A, b)
print(f"Coefficient matrix A:\n{A}")
print(f"Constant vector b: {b}")
print(f"Solution x: {x}")

# Verification
verification = A @ x
print(f"\nVerification A × x = {verification}")
print(f"Should equal b = {b}")
print(f"Correct: {np.allclose(verification, b)}")

Coefficient matrix A:
[[2 3]
 [1 4]]
Constant vector b: [7 6]
Solution x: [2. 1.]

Verification A × x = [7. 6.]
Should equal b = [7 6]
Correct: True
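
The same system could also be solved by computing the inverse explicitly, but `np.linalg.solve` is generally the preferred route because it avoids forming A⁻¹. The sketch below compares both and also solves for a second right-hand side; the matrix `B` with its extra column [5, 5] is made up for this illustration.

```python
import numpy as np

A = np.array([[2, 3], [1, 4]])
b = np.array([7, 6])

x_solve = np.linalg.solve(A, b)  # Solves the system directly
x_inv = np.linalg.inv(A) @ b     # Forms the inverse first, then multiplies

print(np.allclose(x_solve, x_inv))  # True

# Several right-hand sides can be solved at once by stacking them as columns
B = np.array([[7, 5], [6, 5]])
X = np.linalg.solve(A, B)
print(np.allclose(A @ X, B))  # True
```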

Singular Value Decomposition (SVD)

SVD is a powerful matrix factorization technique that is widely used in machine learning.

# Create an example matrix
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(f"Original matrix A ({A.shape}):\n{A}")

# SVD: A = U × Σ × V^T
U, s, Vt = np.linalg.svd(A)

print(f"\nU shape: {U.shape}")
print(f"s (singular values): {s}")
print(f"Vt shape: {Vt.shape}")

# Reconstruct the original matrix
# We have to turn s into a diagonal matrix of the right size
S = np.zeros((U.shape[1], Vt.shape[0]))
S[: len(s), : len(s)] = np.diag(s)

A_reconstructed = U @ S @ Vt
print(f"\nReconstructed matrix:\n{A_reconstructed}")
print(f"\nReconstruction accurate: {np.allclose(A, A_reconstructed)}")

Original matrix A ((4, 3)):
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

U shape: (4, 4)
s (singular values): [2.54624074e+01 1.29066168e+00 2.40694596e-15]
Vt shape: (3, 3)

Reconstructed matrix:
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]
 [10. 11. 12.]]

Reconstruction accurate: True
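
The zero padding of the diagonal matrix can be avoided with the reduced form of the decomposition, requested with `full_matrices=False`. The following sketch also builds a rank-k approximation; the value `k = 2` and the name `A_k` are choices made for this example.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Reduced SVD: U is (4, 3) instead of (4, 4), so no zero padding is needed
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (4, 3) (3,) (3, 3)

# U * s scales column j of U by s[j], which equals U @ np.diag(s)
print(np.allclose(A, (U * s) @ Vt))  # True

# Rank-k approximation: keep only the k largest singular values
k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
print(np.allclose(A, A_k))  # True here, because the third singular value is ~0
```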

Matrix rank

The rank of a matrix is the number of linearly independent rows or columns.

# Full rank matrix
A = np.array([[1, 2], [3, 4]])
rank_A = np.linalg.matrix_rank(A)
print(f"Matrix A:\n{A}")
print(f"Rank: {rank_A} (full rank for a 2x2 matrix)")

# Rank deficient matrix
B = np.array([[1, 2], [2, 4]])
rank_B = np.linalg.matrix_rank(B)
print(f"\nMatrix B:\n{B}")
print(f"Rank: {rank_B} (rank deficient!)")

# Larger matrix
C = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rank_C = np.linalg.matrix_rank(C)
print(f"\nMatrix C:\n{C}")
print(f"Rank: {rank_C} (rank deficient for a 3x3 matrix)")

Matrix A:
[[1 2]
 [3 4]]
Rank: 2 (full rank for a 2x2 matrix)

Matrix B:
[[1 2]
 [2 4]]
Rank: 1 (rank deficient!)

Matrix C:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Rank: 2 (rank deficient for a 3x3 matrix)
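
The rank is closely tied to the singular values: it equals the number of singular values that are not (numerically) zero. As a rough sketch, with a hand-picked tolerance `tol` that is an assumption of this example (`np.linalg.matrix_rank` chooses its own default), the count can be reproduced as follows:

```python
import numpy as np

C = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

s = np.linalg.svd(C, compute_uv=False)  # Singular values only

# Hypothetical tolerance for this sketch; matrix_rank picks its own default
tol = 1e-10
rank_from_svd = int(np.sum(s > tol))

print(s)
print(rank_from_svd)             # 2
print(np.linalg.matrix_rank(C))  # 2
```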

Array reshaping for linear algebra

We often need to reshape arrays for matrix operations.

# 1D array to column vector
v = np.array([1, 2, 3, 4])
print(f"Original 1D array: {v} (shape: {v.shape})")

# Different ways to create a column vector
col_vector1 = v.reshape(-1, 1)
col_vector2 = v[:, np.newaxis]
col_vector3 = np.expand_dims(v, axis=1)

print(f"\nColumn vector (reshape): \n{col_vector1} (shape: {col_vector1.shape})")
print(f"\nColumn vector (newaxis): \n{col_vector2} (shape: {col_vector2.shape})")
print(f"\nColumn vector (expand_dims): \n{col_vector3} (shape: {col_vector3.shape})")

# Row vector
row_vector = v.reshape(1, -1)
print(f"\nRow vector: {row_vector} (shape: {row_vector.shape})")

# Flatten a matrix back to 1D
matrix = np.array([[1, 2], [3, 4]])
flattened = matrix.flatten()
print(f"\nMatrix:\n{matrix}")
print(f"Flattened: {flattened} (shape: {flattened.shape})")

Original 1D array: [1 2 3 4] (shape: (4,))

Column vector (reshape): 
[[1]
 [2]
 [3]
 [4]] (shape: (4, 1))

Column vector (newaxis): 
[[1]
 [2]
 [3]
 [4]] (shape: (4, 1))

Column vector (expand_dims): 
[[1]
 [2]
 [3]
 [4]] (shape: (4, 1))

Row vector: [[1 2 3 4]] (shape: (1, 4))

Matrix:
[[1 2]
 [3 4]]
Flattened: [1 2 3 4] (shape: (4,))
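
The shape chosen for a vector determines what `@` computes. The sketch below, using names introduced here (`col`, `row`, `flat_copy`, `flat_view`), contrasts inner and outer products and shows that `flatten` returns a copy while `reshape(-1)` can return a view.

```python
import numpy as np

v = np.array([1, 2, 3, 4])
col = v.reshape(-1, 1)  # shape (4, 1)
row = v.reshape(1, -1)  # shape (1, 4)

print((row @ col).shape)  # (1, 1): inner product, as a 1x1 matrix
print((col @ row).shape)  # (4, 4): outer product
print(v @ v)              # 30: for 1D arrays, @ returns a scalar

matrix = np.array([[1, 2], [3, 4]])
flat_copy = matrix.flatten()    # Always a copy
flat_view = matrix.reshape(-1)  # A view when the memory layout allows it
print(np.shares_memory(matrix, flat_copy))  # False
print(np.shares_memory(matrix, flat_view))  # True
```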