Prediction of Heteroscedastic Sports Data Using Regression and Various Machine Learning Models
Abstract
This study investigates the impact of extras (wides, no-balls, byes, and leg byes) on the outcomes of T20I cricket matches. It examines whether these extras have a significant effect on the result and which type contributes the most runs. Using heteroscedastic data from the last three T20 World Cups (2021–2024), the analysis underlines the importance of reducing extras to improve a team's chances of winning. Ordinary least squares (OLS) regression assumes that the error term has constant variance. When this assumption is violated, the errors are heteroscedastic: OLS estimates lose efficiency and the usual standard errors become unreliable, which can lead to misleading predictions and conclusions. To improve prediction accuracy on heteroscedastic data, this study applies several machine learning models (Decision Tree, Random Forest, Gradient Boosting, and K-Nearest Neighbors) alongside Linear Regression to predict values in the test datasets. The machine learning models generally outperformed linear regression, yielding lower sums of squared errors, and some produced predictions very close to the observed values. These results indicate that machine learning models work well with heteroscedastic data.
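The constant-variance assumption and the sum-of-squared-errors (SSE) comparison described above can be illustrated with a small sketch. It assumes a single predictor and uses Koenker's studentized variant of the Breusch–Pagan test (n·R² from regressing squared OLS residuals on the predictor, compared against the 5% chi-square critical value with 1 degree of freedom). This is one common diagnostic, not necessarily the procedure the study used. The helper names (`ols_fit`, `sse`, `breusch_pagan_lm`) and the toy data are hypothetical and are not the study's cricket data.

```python
"""Sketch: detect heteroscedasticity in OLS residuals (Koenker's Breusch-Pagan LM test)
and compute the sum of squared errors (SSE) used to compare models.
All names and data here are hypothetical illustrations."""

CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom


def ols_fit(x, y):
    """Return (intercept, slope) of the simple OLS line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sse(y_true, y_pred):
    """Sum of squared errors between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def breusch_pagan_lm(x, y):
    """Koenker's LM statistic: n * R^2 from regressing squared OLS residuals on x."""
    a, b = ols_fit(x, y)
    e2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression: does the squared residual size trend with x?
    c, d = ols_fit(x, e2)
    fitted = [c + d * xi for xi in x]
    mean_e2 = sum(e2) / len(e2)
    sst = sum((v - mean_e2) ** 2 for v in e2)
    r2 = 1 - sse(e2, fitted) / sst
    return len(x) * r2


if __name__ == "__main__":
    # Hypothetical toy data: the noise alternates in sign, and its size
    # either stays fixed (constant variance) or grows with x (heteroscedastic).
    xs = list(range(1, 51))
    y_const = [2 + 3 * x + (1 if x % 2 == 0 else -1) for x in xs]
    y_grow = [2 + 3 * x + (x if x % 2 == 0 else -x) for x in xs]
    for name, ys in [("constant variance", y_const), ("growing variance", y_grow)]:
        lm = breusch_pagan_lm(xs, ys)
        a, b = ols_fit(xs, ys)
        fit_sse = sse(ys, [a + b * x for x in xs])
        verdict = "heteroscedastic" if lm > CHI2_1DF_5PCT else "no evidence"
        print(f"{name}: LM = {lm:.2f} ({verdict}), OLS SSE = {fit_sse:.1f}")
```

The same `sse` helper could score any model's test-set predictions, so a linear fit and, for example, a tree-based model can be compared on equal terms by the lower SSE. A library implementation such as statsmodels' Breusch–Pagan test would normally replace this hand-rolled version in practice.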