
Avengers Ensemble: How Ensemble Modeling Helps You Avoid Overfitting

October 05, 2021

TLDR

Ensemble models combine multiple individual models or algorithms. The individual models can be combined through methods such as bagging, boosting, and stacking. Ensemble modeling can reduce variance, minimize the bias of any single modeling method, and thus decrease the chances of overfitting. Predictions from ensembles tend to be more stable, with lower variance. An exciting application of ensemble modeling comes from the Centers for Disease Control and Prevention (CDC) in the U.S., which started crowdsourcing forecasting models in its "Predict the Influenza Season Challenge" and then combined them into an ensemble for better accuracy.


Nikola O.

PhD researcher using data science to solve problems. Enjoys thinking, science fiction and design.


Ensemble modeling can reduce variance, minimize modeling method bias and decrease chances of overfitting.

Read on to learn the secrets of ensemble modeling and why it works better than single models.

Overfitting

I remember the excitement when I first achieved 97% accuracy.

The task was to predict whether a patient would develop Alzheimer's or not. However, the moment I ran the model on new (unseen) data, the accuracy went down to 57% and the excitement was gone 😢.

This was my first experience with overfitting. 

Overfitting happens when your algorithm learns the training data too well, noise included, and so can't generalize its patterns to new data.

When you present new data to an overfitted model, the predictive performance goes down, as in my case.
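To see what that gap looks like in code, here's a minimal sketch using scikit-learn and a synthetic dataset I made up for illustration (not the Alzheimer's data from my story): an unconstrained decision tree scores almost perfectly on the data it trained on and noticeably worse on data it has never seen.

```python
# Minimal overfitting sketch: an unconstrained decision tree memorizes the
# training split and loses accuracy on unseen data. The dataset is synthetic
# and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=42)  # flip_y adds label noise
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

tree = DecisionTreeClassifier(random_state=42)  # no depth limit -> memorizes noise
tree.fit(X_train, y_train)

print("train accuracy:", accuracy_score(y_train, tree.predict(X_train)))  # close to 1.0
print("test accuracy: ", accuracy_score(y_test, tree.predict(X_test)))    # noticeably lower
```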

Occam's Razor

A good model fit, one that can generalize data well, should be simple. This is the idea behind the law of parsimony, often called Occam's razor.

"Don't elaborate the nature of something beyond necessity."

The frequent interpretation of this statement is that simple solutions are better than complex solutions.

In other words, complex == bad and simple == good. We can apply this thinking to the process of choosing the best model fit.

Remember my initial 97% accuracy that went to 57% on the new data? 

A model that learns every relationship in the training data, including the noise, becomes too complex. When it then has to predict outcomes for new observations, it performs poorly because it is inflexible and tailored only to the training data.

On the other hand, a simple model can have a looser fit on training data, but it can also achieve a good fit on unseen data.

Ensemble modeling, among other methods, can prevent overfitting.
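Continuing the earlier sketch (same synthetic data, same caveat that this is only an illustration), simply capping the tree's depth gives a "simpler" model: it fits the training data a bit more loosely but holds up better on unseen data.

```python
# Simple vs. complex fit: capping the tree depth gives up a little training
# accuracy but generalizes better. Same synthetic setup as before.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

models = {
    "complex (no depth limit)": DecisionTreeClassifier(random_state=42),
    "simple (max_depth=3)": DecisionTreeClassifier(max_depth=3, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name:>26}  train={model.score(X_train, y_train):.2f}  "
          f"test={model.score(X_test, y_test):.2f}")
```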

Ensemble Modeling

Ensemble models combine multiple individual models or algorithms. These individual models can be combined through various methods such as bagging, boosting, and stacking.

In general, we can differentiate between two types of ensembles. They are based either on 1) many weaker models that are stronger together (e.g. bagging or boosting), or 2) a few well-thought-out models combined via stacking into a metamodel.

You can imagine the weaker models as ants. One ant can't build an anthill, but a whole colony of ants, well, that's a different story. The second group of ensemble methods is more like the Avengers. (Yes, the title wasn't just a hook.)

Every Avenger has a unique skill, and they are already superheroes, but they are stronger together, because their individual weaknesses are covered by their teammates.
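If you'd like to see what the two flavours look like in code, here is a hypothetical scikit-learn sketch (the specific models and hyperparameters are my own choices for illustration): a forest or boosted ensemble of many shallow trees plays the role of the ant colony, while stacking a few dissimilar, stronger models into a metamodel plays the role of the Avengers.

```python
# The two ensemble flavours from above, sketched with scikit-learn.
# Model choices and hyperparameters are illustrative, not a recommendation.
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# 1) Many weak learners that are stronger together ("ants"):
#    bagging shallow trees (random forest) or boosting them sequentially.
ants_bagged = RandomForestClassifier(n_estimators=300, max_depth=3)
ants_boosted = GradientBoostingClassifier(n_estimators=300, max_depth=2)

# 2) A few strong, dissimilar models combined into a metamodel ("Avengers"):
#    stacking feeds their predictions into a final estimator.
avengers = StackingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),
)

# All three expose the usual interface, e.g. avengers.fit(X_train, y_train)
# followed by avengers.predict(X_test).
```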

Why Does Ensemble Modeling Work?

We established at the beginning that according to Occam's razor, simple models are better. But ensemble modeling is about combining multiple models, so how is that simpler? 

There is something I didn't mention earlier when talking about Occam's razor. As summarized by Domingos in 1999, it has two interpretations:

  1. "Simplicity is a goal in itself."
  2. "Simplicity leads to greater accuracy." 

It turns out that the second interpretation is not always valid.

I talked about overfitting and how it occurs when a model or an algorithm overlearns the training data. However, as empirical studies have shown, the assumption that simpler models are always more accurate doesn't hold all the time. Additionally, judging complexity by the number of models involved is not an ideal measure.

A solution combining multiple models certainly has more moving parts than a single model. However, looking at how a model or algorithm actually functions, i.e. how it predicts, can help us resolve this apparent contradiction.

Simplicity doesn't always lead to greater accuracy.

Let's go back to the analogy. When ants are building their anthill, they need every ant they can get. It's similar for weak learners, although you can still overfit this type of ensemble, so you need to be careful. When the Avengers meet for the first time, some of them are overconfident and don't appreciate teamwork. But after they go through challenges together, they change for the better.

Ensemble modeling averages out the errors of the individual models or methods: biases that differ between models partly cancel, and the random variation of any single model shrinks when predictions are averaged. The result is predictions that are more stable, with lower variance. In other words, the predictions based on ensemble modeling are simpler in terms of their statistical characteristics.
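Here is a toy numerical sketch of the variance part of that claim (pure NumPy, made-up numbers): if you average the outputs of many roughly independent, equally noisy predictors, the average fluctuates far less than any single predictor does.

```python
# Toy illustration: averaging many noisy, roughly independent predictions
# yields a far more stable estimate than any single prediction.
# All numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
true_value = 0.7                          # quantity every "model" tries to predict
n_trials, n_models = 10_000, 25
predictions = true_value + rng.normal(0.0, 0.1, size=(n_trials, n_models))

print("variance of a single model:   ", predictions[:, 0].var())         # ~0.01
print("variance of the 25-model mean:", predictions.mean(axis=1).var())  # ~0.01 / 25
```

In practice the individual models are never fully independent, so the reduction is smaller than this idealized factor of 25, but the direction of the effect is the same.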

Ensemble Modeling in Practice

To show how ensemble modeling delivers in practice, here is an example of its real-world impact. In 2013, the Centers for Disease Control and Prevention (CDC) in the U.S. started crowdsourcing forecasting models from research teams across the country in its "Predict the Influenza Season Challenge".

These seasonal challenges became popular, and the idea developed into a website where you can click through the individual and combined predictions. The main takeaway: across all of the individual models, the ensemble forecasts were the most accurate. I recommend you have a look.

