How to evaluate and visualize regression results


When evaluating regression models, there are several metrics you can use to assess their performance beyond just Mean Squared Error (MSE) or Root Mean Squared Error (RMSE). Here are some commonly used evaluation metrics for regression; a short code sketch computing several of them follows the list:

  • Mean Absolute Error (MAE): This metric measures the average absolute difference between the predicted and actual values. It provides a measure of the model’s average prediction error.

  • R-squared (R²) or Coefficient of Determination: R-squared indicates the proportion of the variance in the dependent variable (target) that is predictable from the independent variables (features). It typically ranges from 0 to 1, where 1 indicates a perfect fit; it can be negative when the model fits worse than simply predicting the mean.

  • Mean Squared Logarithmic Error (MSLE): MSLE measures the average squared difference between the log-transformed predicted and actual values. It can be useful when the target variable grows exponentially or spans several orders of magnitude.

  • Explained Variance Score: This score measures the proportion of variance in the target that is explained by the model. The best possible score is 1; lower values indicate a poorer fit.

  • Median Absolute Error (MedAE): Similar to MAE, this metric calculates the median absolute difference between the predicted and actual values. It is less sensitive to outliers compared to MAE.

  • R-squared Adjusted (Adjusted R²): Adjusted R-squared takes into account the number of predictors in the model. It penalizes the addition of unnecessary variables and helps avoid overfitting.

  • Mean Percentage Error (MPE): This metric calculates the average percentage difference between the predicted and actual values. Because it preserves the sign of each error, positive and negative errors can cancel out, which makes it useful for detecting systematic over- or under-prediction.

  • Mean Absolute Percentage Error (MAPE): MAPE calculates the average absolute percentage difference between the predicted and actual values, so unlike MPE the errors do not cancel out. It is commonly used in time series forecasting.

  • Quantile Loss: Quantile loss (also called pinball loss) measures the accuracy of predictions for a specific quantile of the target variable. It is useful when you need prediction intervals or an estimate of uncertainty rather than a single point prediction.

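As a concrete illustration, here is a minimal sketch that computes several of the metrics above with scikit-learn on a pair of small placeholder arrays (y_true and y_pred are made up for illustration). Note that mean_absolute_percentage_error requires scikit-learn 0.24+ and mean_pinball_loss (which implements the quantile loss) requires scikit-learn 1.0+; scikit-learn has no built-in MPE, so it is computed directly with NumPy.

import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    median_absolute_error,
    mean_squared_log_error,
    explained_variance_score,
    mean_absolute_percentage_error,  # scikit-learn >= 0.24
    mean_pinball_loss,               # quantile (pinball) loss, scikit-learn >= 1.0
)

# Placeholder values standing in for actual observations and model predictions
y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
y_pred = np.array([2.8, 5.4, 2.9, 6.4, 4.6])

print("MAE:  ", mean_absolute_error(y_true, y_pred))
print("MedAE:", median_absolute_error(y_true, y_pred))
print("MSLE: ", mean_squared_log_error(y_true, y_pred))          # requires non-negative values
print("EVS:  ", explained_variance_score(y_true, y_pred))
print("MAPE: ", mean_absolute_percentage_error(y_true, y_pred))  # actual values must be non-zero

# MPE (not in scikit-learn): sign-preserving percentage error, so over- and under-predictions cancel
mpe = np.mean((y_true - y_pred) / y_true) * 100
print("MPE (%):", mpe)

# Quantile loss for the 90th percentile (alpha is the target quantile)
print("Quantile loss (alpha=0.9):", mean_pinball_loss(y_true, y_pred, alpha=0.9))
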
Visualization techniques for regression evaluation can include:

  • Scatter plots: Plotting the predicted values against the actual values can help visualize the overall performance of the model. Ideally, the points should lie close to a diagonal line.

  • Residual plots: Residuals are the differences between the predicted and actual values. Plotting the residuals against the predicted values or the independent variables can help identify patterns or heteroscedasticity (unequal variance).

  • Distribution plots: Comparing the distribution of predicted values with the actual values can provide insights into the model’s accuracy and whether it is capturing the underlying data distribution.

  • Regression line plot: Visualizing the regression line along with the data points can help understand the relationship between the independent and dependent variables.

The following code shows a worked example using scikit-learn and matplotlib:

%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Generate some random data for demonstration
np.random.seed(42)
X = np.random.rand(100, 1) # Independent variable
y = 2 + 3 * X + np.random.randn(100, 1) # Dependent variable with some noise

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model using different metrics
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Calculate adjusted R-squared
n = X_test.shape[0] # Number of samples
p = X_test.shape[1] # Number of predictors
adjusted_r2 = 1 - ((1 - r2) * (n - 1) / (n - p - 1))

# Scatter plot of actual vs predicted values
plt.scatter(y_test, y_pred)
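plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')  # reference diagonal: perfect predictions would fall on this line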
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Actual vs Predicted Values')
plt.show()

# Residual plot
residuals = y_test - y_pred
plt.scatter(y_pred, residuals)
plt.xlabel('Predicted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.axhline(y=0, color='r', linestyle='-')
plt.show()

print("Mean Squared Error (MSE):", mse)
print("Mean Absolute Error (MAE):", mae)
print("R-squared (R²):", r2)
print("Adjusted R-squared:", adjusted_r2)
