Optimizing Water Quality Prediction: A Hybrid Approach Integrating Latent Semantic Analysis and Extreme Gradient Boosting for Yamuna River in Delhi, India


Neetu Gupta, Surendra Yadav, Neha Chaudhary



The Yamuna River, a vital water source for multiple cities, faces severe pollution from industrial discharges, which threatens the health of ecosystems and human populations. Existing methods for assessing water quality, particularly the Water Quality Index (WQI), rely on time-consuming and costly data collection, and traditional predictive models struggle to adapt to evolving environmental conditions. Accurate and timely WQI prediction therefore calls for more advanced approaches and is crucial for effective management of water resources.


This research applies machine learning to WQI prediction, with attention to the limitations of current models. It evaluates several candidate models and proposes a novel hybrid methodology that integrates Latent Semantic Analysis (LSA) for dimensionality reduction with Extreme Gradient Boosting for prediction.


Water samples are collected from nine diverse locations along the Yamuna River, with a focus on industrial areas, and analyzed for various parameters. The WQI calculated from these measurements is then used to train and evaluate several machine learning models and the proposed hybrid approach. The evaluation criteria include accuracy, responsiveness, and the ability to predict WQI from a limited set of significant parameters.
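
The abstract does not state which WQI formula is used. As one common possibility, the weighted arithmetic index computes a quality rating for each parameter relative to its permissible standard and combines the ratings with weights inversely proportional to those standards. The sketch below assumes that method; the function names and every numeric value in the example are hypothetical placeholders, not values from the study.

```python
def quality_rating(measured, standard, ideal=0.0):
    """Sub-index q_i = 100 * (V_i - V_ideal) / (S_i - V_ideal)."""
    return 100.0 * (measured - ideal) / (standard - ideal)


def weighted_arithmetic_wqi(sample, standards, ideals=None):
    """Weighted arithmetic WQI = sum(w_i * q_i) / sum(w_i).

    sample    -- measured value per parameter
    standards -- permissible limit per parameter (same keys as sample)
    ideals    -- ideal value per parameter; defaults to 0 when omitted
    """
    ideals = ideals or {}
    # Unit weights are inversely proportional to the permissible limits
    # and normalized so that they sum to 1.
    inverse = {p: 1.0 / standards[p] for p in sample}
    k = 1.0 / sum(inverse.values())
    weights = {p: k * inverse[p] for p in sample}

    weighted_sum = sum(
        weights[p] * quality_rating(sample[p], standards[p], ideals.get(p, 0.0))
        for p in sample
    )
    return weighted_sum / sum(weights.values())


# Hypothetical illustration: placeholder limits and measurements.
example_standards = {"bod": 5.0, "tds": 500.0, "ph": 8.5}
example_ideals = {"ph": 7.0}
example_sample = {"bod": 7.5, "tds": 400.0, "ph": 7.9}

wqi = weighted_arithmetic_wqi(example_sample, example_standards, example_ideals)
print(f"WQI for the placeholder sample: {wqi:.1f}")
```

Under this formulation, a sample that sits exactly at every permissible limit scores 100, and a sample at every ideal value scores 0.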


The research demonstrates the effectiveness of the proposed hybrid methodology in predicting WQI. Combining LSA and Extreme Gradient Boosting, the hybrid methodology achieves a maximum accuracy of 95.2%, surpassing the other models and state-of-the-art techniques. The study contributes to water quality assessment by offering a data-driven, efficient, and accurate approach to WQI prediction, which is essential for sustainable water resource management.
