A Beginner's Guide to Tuning Hyperparameters of Popular Machine Learning Algorithms


Machine Learning is a rapidly evolving field, and its algorithms often require hyperparameter fine-tuning to improve model performance. Understanding the typical ranges and rules of thumb for tuning these hyperparameters is essential. Let’s explore some popular machine learning algorithms and the hyperparameters that govern them.

  1. Random Forest

    • n_estimators: The number of trees in the forest. A higher number generally makes predictions stronger and more stable, but a very high number increases computation time. A good starting point is 100.

    • max_features: The number of features to consider when looking for the best split. Good starting points are ‘sqrt’ or ‘log2’ (note that ‘auto’ is deprecated in recent versions of scikit-learn).

    • max_depth: The maximum depth of the tree. You can leave this as None, which lets each tree expand fully. A minimal example with these starting values follows below.
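
The snippet below is a minimal sketch of these starting values with scikit-learn’s RandomForestClassifier, on a synthetic dataset used purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic toy dataset, just for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Common starting points: 100 trees, sqrt(n_features) per split, unlimited depth.
rf = RandomForestClassifier(
    n_estimators=100,     # more trees -> more stable predictions, but slower training
    max_features="sqrt",  # number of features considered at each split
    max_depth=None,       # None lets each tree grow until its leaves are pure
    random_state=42,
)
rf.fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
```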

  2. Boosting Algorithms (XGBoost, LightGBM, CatBoost)

    All three boosting variants share some common hyperparameters:

    • n_estimators: The number of boosting stages to perform. Gradient boosting is fairly robust to overfitting, so a larger number usually results in better performance at the cost of training time. A reasonable starting point is 100.

    • learning_rate (or eta): Makes the model more robust by shrinking each tree’s contribution at every boosting step. A smaller value typically requires more boosting rounds and hence more computation. You can start with values like 0.1 or 0.01.

    • max_depth: The maximum tree depth for boosting models. Starting points of 6 to 10 are common.

    • subsample: The fraction of observations randomly sampled for each tree. Lower values make the algorithm more conservative and help prevent overfitting. It is usually set between 0.8 and 1. A short sketch using these starting values follows below.
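
As a rough sketch, here is how these shared hyperparameters might be set with one of the three libraries; it assumes the xgboost package is installed, and LightGBM and CatBoost expose similar parameters, sometimes under slightly different names:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # assumes the xgboost package is installed

# Synthetic toy dataset, just for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=100,   # number of boosting rounds
    learning_rate=0.1,  # shrinks each tree's contribution; smaller values need more rounds
    max_depth=6,        # typical starting depth for boosted trees
    subsample=0.8,      # fraction of rows sampled per tree; values below 1 add regularization
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```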

  3. Transformer Models

Transformers rely more on architecture selection than on hyperparameter tuning. You may, however, tune the following:

  • Learning Rate: If it is too large, optimization may overshoot and diverge; if it is too small, it may never reach the minimum. 0.0001 (1e-4) can be a reasonable start.
  • Batch Size: Influences the noise in the gradient estimate, and the computational requirements of the algorithm. Try powers of 2 that fit into memory (e.g., 32, 64, 128).
  • Number of Layers (for architectures like BERT, etc.): Depending on the complexity of the task, increasing the number of layers can increase model capacity, but beware of overfitting. The sketch below ties these three knobs together.
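
The following is a minimal PyTorch sketch, with all sizes and data made up purely for illustration, showing where each knob lives: batch size in the data loader, number of encoder layers in the model, and learning rate in the optimizer:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 256 sequences of length 16 with 64-dimensional features.
X = torch.randn(256, 16, 64)
y = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=32)  # batch size: try powers of 2

class TinyTransformerClassifier(nn.Module):
    def __init__(self, d_model=64, n_heads=4, num_layers=4, num_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)  # number of layers
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):
        return self.head(self.encoder(x).mean(dim=1))  # mean-pool over the sequence

model = TinyTransformerClassifier(num_layers=4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # learning rate: ~1e-4 is a common start
criterion = nn.CrossEntropyLoss()

for epoch in range(2):  # a couple of epochs, just to show the loop
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```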

Hyperparameter tuning is something of an art and usually requires trial and error. Remember, as a rule of thumb, changing one hyperparameter can shift the optimal values of others, so consider how parameters interact to achieve the best results.

Machine learning libraries like scikit-learn provide handy tools such as GridSearchCV and RandomizedSearchCV for hyperparameter optimization. AutoML, Bayesian optimization, and genetic algorithms are also gaining popularity for hyperparameter tuning.
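
For example, a grid search over the random forest settings discussed earlier might look like this (the grid values here are arbitrary illustrations, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic toy dataset, just for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=1)

# Hypothetical grid; real grids are usually built from the rules of thumb above.
param_grid = {
    "n_estimators": [100, 300],
    "max_features": ["sqrt", "log2"],
    "max_depth": [None, 10],
}
search = GridSearchCV(RandomForestClassifier(random_state=1), param_grid, cv=5)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV score:", search.best_score_)
```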

We hope this beginner-friendly guide serves as a useful starting point for your machine learning journey.


Author: robot learner