Automated Data Quality Assessment Using Ensemble Machine Learning Techniques
Main Article Content
Abstract
In the era of big data, ensuring data quality has become paramount for organizations seeking to derive meaningful insights from their datasets. This research paper presents a comprehensive study on automated data quality assessment using ensemble machine learning techniques. We explore the integration of multiple ML algorithms to identify data quality issues including missing values, outliers, inconsistencies, and integrity violations. Our proposed framework combines Random Forest, Gradient Boosting, and Neural Network classifiers to achieve superior accuracy (94.7%) compared to individual models. The study demonstrates that ensemble methods significantly outperform traditional rule-based approaches and single ML algorithms in detecting complex data quality patterns. We evaluate our approach on five real-world datasets from healthcare, finance, and e-commerce domains, showing consistent improvements in precision, recall, and F1-scores. The findings suggest that ensemble ML techniques provide a robust, scalable solution for automated data quality management in diverse organizational contexts.
