Boosting in Machine Learning

In machine learning, boosting is an ensemble meta-algorithm used primarily to reduce bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners into strong ones. Unlike many machine learning models that focus on high-quality prediction by a single model, boosting algorithms seek to improve prediction power by training a sequence of weak models, each compensating for the weaknesses of its predecessors.

To understand boosting, it is crucial to recognise that boosting is a generic algorithm rather than a specific model. Boosting requires you to specify a weak model (e.g. regression, shallow decision trees) and then improves it.

Most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are weighted in a way that is related to each weak learner's accuracy; this is called re-weighting. Think of it as a weighted average of the predictions from these models. By doing this, we are able to capture more information from the data, right?

That’s primarily the idea behind ensemble learning. And where does boosting come in? Boosting is one of the techniques that use the concept of ensemble learning. A boosting algorithm combines multiple simple models (also known as weak learners or base estimators) to generate the final output.

Below are the steps that show the mechanism of the boosting algorithm (a simplified code sketch of this loop follows the list):

  • Reading the data
  • Assigning weights to observations
  • Identifying false predictions
  • Passing the falsely predicted observations, along with higher weights, to the next learner
  • Finally, iterating from Step 2 until we get correctly classified output
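
As an illustration, here is a simplified sketch of that loop in the spirit of AdaBoost. It assumes scikit-learn decision stumps as the weak learners and binary labels encoded as -1/+1; the function names and round count are illustrative assumptions, not the original post's code.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def boost(X, y, n_rounds=10):
        # y is expected to contain binary labels in {-1, +1}
        y = np.asarray(y)
        n = len(y)
        w = np.full(n, 1.0 / n)                     # Step 2: start with equal weights
        learners, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)  # Step 3: weighted error
            alpha = 0.5 * np.log((1 - err) / err)   # learner weight grows with its accuracy
            w = w * np.exp(-alpha * y * pred)       # Step 4: raise weights of misclassified points
            w = w / w.sum()
            learners.append(stump)
            alphas.append(alpha)
        return learners, alphas

    def predict(learners, alphas, X):
        # Final strong classifier: sign of the weighted vote of the weak learners
        scores = sum(a * l.predict(X) for l, a in zip(learners, alphas))
        return np.sign(scores)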

Boosting can be used for:

  • Binary classification
  • Multiclass classification

There are several boosting algorithms that fall into the above categories. Some of the most popular boosting algorithms in machine learning are mentioned below:

  • Adaptive Boosting (also known as AdaBoost)
  • Gradient Boosting (GBM)
  • Extreme Gradient Boosting (XGBoost)
  • LightGBM
  • CatBoost

AdaBoost

AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire, who won the 2003 Gödel Prize for their work. It first fits a learner on the original data, assigning equal weight to every point. It then attaches higher importance to the observations the previous learner failed to predict correctly, and repeats the process until the accuracy of the model reaches a limit.

When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative ‘hardness’ of each training sample is fed into the tree growing algorithm, such that later trees tend to focus on harder-to-classify examples. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing, the final model can be proven to converge to a strong learner. In practice, AdaBoost is most often used with shallow decision trees as the weak learners.

Below is the code for implementing AdaBoost using sklearn.
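
The following is a minimal sketch with scikit-learn's AdaBoostClassifier; the Iris dataset and the hyper-parameter values are illustrative assumptions rather than the post's original example.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Load a toy dataset and split it into train and test sets
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # By default the weak learner is a depth-1 decision tree (a "stump")
    model = AdaBoostClassifier(n_estimators=100, learning_rate=1.0, random_state=42)
    model.fit(X_train, y_train)

    print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))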

Gradient Boosting (GBM)

Gradient Boosting can be used for both regression and classification problems. This method builds the model in a stage-wise fashion, and the resulting model is an ensemble of weak prediction models. Gradient boosting trains many models sequentially: each new base learner is fit to the negative gradient of the loss function of the whole ensemble, so each new model gradually minimises the loss of the system in a gradient-descent-like step. The learning procedure consecutively fits new models to provide a more accurate estimate of the response variable. Below is the Python code for implementing Gradient Boosting (GBM) using sklearn, as run in a Jupyter notebook.
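
A minimal sketch with scikit-learn's GradientBoostingClassifier follows; the breast-cancer dataset and the hyper-parameter values are assumptions for illustration only.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Each new shallow tree is fit to the negative gradient of the loss so far
    model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3, random_state=42)
    model.fit(X_train, y_train)

    print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))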

Extreme Gradient Boosting (XGBoost)

XGBoost is one of the implementations of the gradient boosting concept, but what makes XGBoost unique is that it uses a more regularised model formulation to control over-fitting, which gives it better performance. It is one of the most loved machine learning algorithms on Kaggle and is frequently found in winning solutions to data science competitions. It can be used for supervised learning tasks such as regression, classification, and ranking, and it provides a scalable, portable, and accurate library that makes the most of the available hardware. Below are the steps to build and evaluate a model (a minimal sketch following these steps appears after the list):

  • Prepare Data
  • Build the model
  • Predict and Visualise
  • Evaluate
  • Compare
More insight into the implementation is presented here.
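
As an illustration, here is a minimal sketch following these steps with the xgboost Python package; the dataset and hyper-parameter values are assumptions, not the original post's example.

    import xgboost as xgb
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Prepare data
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Build the model (the L2 regularisation term helps control over-fitting)
    model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1, reg_lambda=1.0)
    model.fit(X_train, y_train)

    # Predict and evaluate
    preds = model.predict(X_test)
    print("Accuracy:", accuracy_score(y_test, preds))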

LightGBM

LightGBM is a relatively new algorithm that uses tree-based learning. LightGBM grows trees leaf-wise (vertically), while most other algorithms grow level-wise (horizontally): it chooses the leaf with the maximum delta loss to grow. When growing the same leaf, a leaf-wise algorithm can reduce more loss than a level-wise algorithm. It is very fast at handling large amounts of data, which is why it is called “Light”. It can run on the GPU, consumes relatively little memory, and handles large datasets quickly. However, because leaf-wise growth tends to overfit small datasets, it is not advised to use LightGBM for smaller datasets.

LightGBM has around 100 parameters, but don’t worry, you need not know all of them; it is, however, important for an implementer to know at least some basic parameters of LightGBM. Despite this huge number of parameters, the algorithm is easy to implement.

Some of the most used parameters are given below (a minimal training sketch using a few of them follows the list):

  • lambda
  • max_cat_group
  • max_depth
  • min_data_in_leaf
  • feature_fraction
  • bagging_fraction
  • min_gain_to_split
  • task
  • application
  • boosting
  • num_boost_round
  • num_leaves
  • learning_rate
  • device
  • metric
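
As a concrete illustration, here is a minimal training sketch that sets a few of these parameters; the dataset, the parameter values, and the 0.5 decision threshold are illustrative assumptions, not the original post's example.

    import lightgbm as lgb
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    train_set = lgb.Dataset(X_train, label=y_train)
    params = {
        "objective": "binary",       # the application/objective of the model
        "boosting": "gbdt",          # boosting type
        "num_leaves": 31,            # leaf-wise growth: maximum leaves per tree
        "learning_rate": 0.05,
        "feature_fraction": 0.8,     # fraction of features sampled per tree
        "bagging_fraction": 0.8,     # fraction of data sampled per iteration...
        "bagging_freq": 5,           # ...re-sampled every 5 iterations
        "metric": "binary_logloss",
    }
    model = lgb.train(params, train_set, num_boost_round=100)

    preds = (model.predict(X_test) > 0.5).astype(int)
    print("Accuracy:", (preds == y_test).mean())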

To install LightGBM in Anaconda, you can use the command below in the Anaconda command prompt.
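
For example (the conda-forge channel here is an assumption; installation via pip install lightgbm also works):

    conda install -c conda-forge lightgbm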

CatBoost

CatBoost is a recently open-sourced machine learning algorithm from Yandex. It can easily integrate with deep learning frameworks like Google’s TensorFlow and Apple’s Core ML. It can work with diverse data types to help solve a wide range of problems that businesses face today. It is especially powerful in two ways:

  • It yields state-of-the-art results without the extensive data training typically required by other machine learning methods, and
  • It provides powerful out-of-the-box support for the more descriptive data formats that accompany many business problems.

There are several advantages of using this library, some of which are mentioned below:

  • Performance: it is as good as other leading boosting algorithms
  • Handles categorical features automatically
  • Robust: it reduces the need for extensive hyper-parameter tuning and lowers the chances of overfitting, which leads to more generalised models.
  • Easy to use: you can use CatBoost from the command line or through a user-friendly API for both Python and R.

To use CatBoost, you can use the code below.
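
Assuming CatBoost is installed (for example via pip install catboost), a minimal classification sketch looks like this; the toy data and parameter values are illustrative assumptions. Note how categorical features can be passed as raw strings through cat_features:

    from catboost import CatBoostClassifier

    # Toy data: column 0 is categorical, column 1 is numeric
    X_train = [["red", 1.0], ["blue", 2.5], ["green", 0.5], ["blue", 1.5]]
    y_train = [1, 0, 1, 0]

    model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=4, verbose=False)
    model.fit(X_train, y_train, cat_features=[0])   # column 0 is handled as categorical automatically

    print(model.predict([["red", 1.2]]))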

You can also use the CatBoostRegressor class for regression problems.
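
A similarly minimal regression sketch (the toy data and parameter values are assumptions):

    from catboost import CatBoostRegressor

    X = [[1, 4], [2, 5], [3, 6], [4, 7]]
    y = [10.0, 20.0, 30.0, 40.0]

    reg = CatBoostRegressor(iterations=100, learning_rate=0.1, depth=2, verbose=False)
    reg.fit(X, y)

    print(reg.predict([[5, 8]]))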

Conclusion

Today there are plenty of algorithms available to help with one data science problem or another, and with the boosting algorithms above you can often achieve better accuracy than before. With this ending thought, the implementation of most of the above algorithms is presented in a beautiful manner here, and you are always free to observe their implementations and learn from them.

To read more about Machine Learning, click here.

By Gaurav Verma