In this lab you will learn how to apply maximum likelihood estimation, quantify uncertainty with confidence intervals and prediction intervals, and compare models with the likelihood ratio test.
import numpy as np
from scipy import stats

✍️ Maximum Likelihood Estimation¶
A real estate agent collects data on houses. You have the following data:
Area (in m²): [50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
Price (in €1000): [150, 180, 195, 220, 245, 260, 285, 300, 320, 340]
Compute the maximum likelihood estimates for the linear model price = b0 + b1 · area + ε, with ε ~ N(0, σ²).
# Data
area = np.array([50, 60, 70, 80, 90, 100, 110, 120, 130, 140])
price = np.array([150, 180, 195, 220, 245, 260, 285, 300, 320, 340])
# Design matrix X (add column of ones for intercept)
X = np.column_stack([np.ones(len(area)), area])
y = price
# OLS solution: b_hat = (X^T X)^{-1} X^T y
XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y
# Calculate residuals
residuals = y - X @ b_hat
# Estimate of sigma (unbiased version; the pure ML estimate divides by N rather than N - M)
N = len(y)
M = 2 # number of parameters (intercept + slope)
sigma_hat = np.sqrt(np.sum(residuals**2) / (N - M))
print("Maximum Likelihood Estimates:")
print(f" b0: {b_hat[0]:.3f}")
print(f" b1: {b_hat[1]:.3f}")
print(f" noise: {sigma_hat:.3f}")

Maximum Likelihood Estimates:
b0: 52.303
b1: 2.076
noise: 3.955
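As a sanity check (an addition, not part of the original exercise), the closed-form fit above can be compared against `scipy.stats.linregress`, which fits the same simple linear regression by least squares and therefore coincides with the ML estimates under Gaussian noise:

```python
import numpy as np
from scipy import stats

area = np.array([50, 60, 70, 80, 90, 100, 110, 120, 130, 140])
price = np.array([150, 180, 195, 220, 245, 260, 285, 300, 320, 340])

# linregress fits price = intercept + slope * area by ordinary least squares
res = stats.linregress(area, price)
print(f"intercept: {res.intercept:.3f}")  # ~52.303, matching b0 above
print(f"slope:     {res.slope:.3f}")      # ~2.076, matching b1 above
```

The matrix formulation in the cell above generalizes to more predictors; `linregress` is convenient for the single-predictor case only.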
✍️ Confidence Intervals¶
Compute the 95% confidence intervals for b0 and b1.
# Calculate standard errors
# SE(b_j) = σ * sqrt([(X^T X)^{-1}]_{jj})
se_b = sigma_hat * np.sqrt(np.diag(XtX_inv))
# 95% confidence intervals using t-distribution
alpha = 0.05
t_critical = stats.t.ppf(1 - alpha / 2, df=N - M)
ci_lower = b_hat - t_critical * se_b
ci_upper = b_hat + t_critical * se_b
print(f"\n95% Confidence Intervals:")
print(f" b0: [{ci_lower[0]:.3f}, {ci_upper[0]:.3f}]")
print(f" b1: [{ci_lower[1]:.3f}, {ci_upper[1]:.3f}]")
95% Confidence Intervals:
b0: [42.337, 62.269]
b1: [1.975, 2.176]
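A quick cross-check of the slope interval (an addition, not part of the original exercise): `scipy.stats.linregress` reports `stderr`, the standard error of the slope based on N − 2 degrees of freedom, which is the same quantity computed manually above.

```python
import numpy as np
from scipy import stats

area = np.array([50, 60, 70, 80, 90, 100, 110, 120, 130, 140])
price = np.array([150, 180, 195, 220, 245, 260, 285, 300, 320, 340])

res = stats.linregress(area, price)
t_crit = stats.t.ppf(0.975, df=len(area) - 2)  # same t-quantile as above
lo = res.slope - t_crit * res.stderr
hi = res.slope + t_crit * res.stderr
print(f"b1 95% CI: [{lo:.3f}, {hi:.3f}]")      # matches [1.975, 2.176]
```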
✍️ Prediction Intervals¶
Compute the 95% prediction interval for a house of 95 m².
Compare it with the confidence interval for the mean price of all houses of 95 m².
# Predict for a house of 95 m²
area_new = 95
x_new = np.array([1, area_new])
# Point prediction
y_pred = x_new @ b_hat
# Calculate leverage (hat value)
# h_new = x_new^T (X^T X)^{-1} x_new
h_new = x_new @ XtX_inv @ x_new
# Standard error for prediction (includes inherent noise)
# SE_pred = σ * sqrt(1 + h_new)
se_pred = sigma_hat * np.sqrt(1 + h_new)
# Standard error for mean response (excludes inherent noise)
# SE_mean = σ * sqrt(h_new)
se_mean = sigma_hat * np.sqrt(h_new)
# 95% prediction interval
pred_lower = y_pred - t_critical * se_pred
pred_upper = y_pred + t_critical * se_pred
# 95% confidence interval for mean
mean_lower = y_pred - t_critical * se_mean
mean_upper = y_pred + t_critical * se_mean
print(f"Prediction for a house of {area_new} square meters:")
print(f" Point prediction: {y_pred:.2f}")
print("\n95% Prediction Interval (individual house):")
print(f" [{pred_lower:.2f}, {pred_upper:.2f}]")
print(f"\n95% Confidence Interval (mean of all {area_new} square meters houses):")
print(f" [{mean_lower:.2f}, {mean_upper:.2f}]")

Prediction for a house of 95 square meters:
Point prediction: 249.50
95% Prediction Interval (individual house):
[239.93, 259.07]
95% Confidence Interval (mean of all 95 square meters houses):
[246.62, 252.38]
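To see how the two interval types behave, the sketch below (an addition, not part of the original exercise) evaluates both half-widths at a few areas. The prediction interval is always wider because of the sqrt(1 + h) term, and both intervals are narrowest at the mean area (95 m²), where the leverage h is smallest.

```python
import numpy as np
from scipy import stats

area = np.array([50, 60, 70, 80, 90, 100, 110, 120, 130, 140])
price = np.array([150, 180, 195, 220, 245, 260, 285, 300, 320, 340])

X = np.column_stack([np.ones(len(area)), area])
XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ price
resid = price - X @ b_hat
N, M = len(price), 2
sigma_hat = np.sqrt(resid @ resid / (N - M))
t_crit = stats.t.ppf(0.975, df=N - M)

for a in (60, 95, 130):
    x = np.array([1.0, a])
    h = x @ XtX_inv @ x                            # leverage of the new point
    half_pi = t_crit * sigma_hat * np.sqrt(1 + h)  # prediction half-width
    half_ci = t_crit * sigma_hat * np.sqrt(h)      # mean-response half-width
    print(f"{a:>3} m²: prediction ±{half_pi:.2f}, mean ±{half_ci:.2f}")
```

At 95 m² this reproduces the ±9.57 and ±2.88 half-widths of the intervals printed above.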
✍️ Likelihood Ratio Test¶
We have two models:
Model 0 (intercept only): price = b0 + ε
Model 1: price = b0 + b1 · area + ε
Test whether area contributes significantly to the model (α = 0.05).
# Model 1: Full model (already fitted)
# Log-likelihood for the full model, with sigma_hat plugged in
# (strictly, the ML estimate of sigma**2 divides by N, not N - M;
# the difference is small and does not change the conclusion here)
log_likelihood_full = -N / 2 * np.log(2 * np.pi * sigma_hat**2) - np.sum(residuals**2) / (
    2 * sigma_hat**2
)
# Model 0: Reduced model (intercept only - just the mean)
b0_reduced = np.mean(y)
residuals_reduced = y - b0_reduced
sigma_reduced = np.sqrt(np.sum(residuals_reduced**2) / (N - 1))  # unbiased; pure ML divides by N
# Calculate log-likelihood for reduced model
log_likelihood_reduced = -N / 2 * np.log(2 * np.pi * sigma_reduced**2) - np.sum(
residuals_reduced**2
) / (2 * sigma_reduced**2)
# Likelihood ratio test statistic
Lambda = -2 * (log_likelihood_reduced - log_likelihood_full)
# Degrees of freedom: difference in number of parameters
df_test = 1 # M_1 - M_0 = 2 - 1
# Critical value from chi-squared distribution
alpha_test = 0.05
chi2_critical = stats.chi2.ppf(1 - alpha_test, df_test)
# P-value
p_value_lrt = stats.chi2.sf(Lambda, df_test)  # survival function avoids round-off of 1 - cdf
print("Likelihood Ratio Test:")
print(f"Test statistic Λ: {Lambda:.3f}")
print(f" Degrees of freedom: {df_test}")
print(f" Critical value (χ² at α={alpha_test}): {chi2_critical:.3f}")
print(f" P-value: {p_value_lrt:.6f}")
print("\nConclusion:")
if Lambda > chi2_critical:
    print(" Reject H0: area contributes significantly to the model.")
else:
    print(" Fail to reject H0: area does not significantly improve the model.")

Likelihood Ratio Test:
Test statistic Λ: 56.348
 Degrees of freedom: 1
 Critical value (χ² at α=0.05): 3.841
 P-value: 0.000000

Conclusion:
 Reject H0: area contributes significantly to the model.
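For nested Gaussian linear models, the same comparison can also be run as an F-test on the residual sums of squares. This sketch (an addition, not part of the original exercise) reaches the same conclusion as the likelihood ratio test above:

```python
import numpy as np
from scipy import stats

area = np.array([50, 60, 70, 80, 90, 100, 110, 120, 130, 140])
price = np.array([150, 180, 195, 220, 245, 260, 285, 300, 320, 340])
N = len(price)

# Residual sum of squares under each model
rss0 = np.sum((price - price.mean()) ** 2)  # Model 0: intercept only
X = np.column_stack([np.ones(N), area])
b, *_ = np.linalg.lstsq(X, price, rcond=None)
rss1 = np.sum((price - X @ b) ** 2)         # Model 1: intercept + slope

# F-statistic for one extra parameter, with N - 2 residual degrees of freedom
F = (rss0 - rss1) / (rss1 / (N - 2))
p = stats.f.sf(F, 1, N - 2)
print(f"F = {F:.1f}, p = {p:.2e}")
```

With one extra parameter and Gaussian errors, the F-test is exact in small samples, while the χ² approximation of the likelihood ratio test is asymptotic; here both reject H0 decisively.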