Concept drift detection using machine learning in data stream

Wang, Pingfan (2024) Concept drift detection using machine learning in data stream. Doctoral thesis, Northumbria University.

Text (Doctoral thesis)
wang.pingfan_phd(19036926).pdf - Submitted Version

Abstract

Machine learning applications on streaming data often grapple with dynamic changes in the data distribution, particularly concept drift, where shifts in classification boundaries undermine the model’s performance. This challenge is further complicated by the inherent complexity of data streams, the underutilization of deep neural networks in addressing the issue, and the lack of a comprehensive understanding of concept drift.

Data streams are non-stationary and complex by nature, which poses significant challenges to concept drift detection. Deep neural networks, despite their immense predictive power, are rarely employed in this context due to their high computational costs. Current detection methods typically concentrate on pinpointing when a concept drift occurs and neglect the more detailed information about it. That information, such as the drift’s onset, duration, severity, and endpoint, would be invaluable for a more nuanced understanding of the phenomenon.

To address these issues, this thesis introduces five methods:

Drift Detection Method with False Positive rate for Multi-label classification (DDM-FP-M): This novel approach extends the existing Drift Detection Method (DDM) to multi-label classification data streams. It incorporates a unique mechanism to adjust for false positives, enhancing the adaptability and accuracy of drift detection in complex data stream scenarios.
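
For orientation, the baseline DDM that DDM-FP-M extends can be sketched as follows. This is a minimal illustration of the classic single-label DDM only, assuming the commonly used settings (a 30-sample warm-up, warning at two and drift at three standard deviations above the recorded minimum); it does not reproduce the false-positive mechanism or the multi-label extension proposed in the thesis. The class name and parameter names are illustrative.

```python
import math


class DDM:
    """Baseline Drift Detection Method over a stream of 0/1 prediction errors."""

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        self.min_samples = min_samples      # warm-up length (assumed default)
        self.warning_level = warning_level  # multiples of s_min for a warning
        self.drift_level = drift_level      # multiples of s_min for a drift
        self.reset()

    def reset(self):
        self.n = 0              # samples seen since the last reset
        self.p = 0.0            # running error rate
        self.p_min = math.inf   # error rate at the best point so far
        self.s_min = math.inf   # standard deviation at the best point so far

    def update(self, error):
        """Consume one error indicator (1 = misclassified) and return the state."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the point where p + s was lowest.
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        # Strict comparison, so an error-free stream never triggers a drift.
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"
```

The detector is fed one error indicator per prediction; a "drift" result resets its statistics so that monitoring restarts on the new concept.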

Noise Tolerant Drift Detection Method (NTDDM): NTDDM introduces a two-step process to discern true drifts from noise-induced false positives. It refines drift detection by filtering out misleading signals through subsampling and statistical detection methods, improving the reliability of drift detection in noisy data environments. The efficacy of this method is further validated through three newly proposed performance metrics specifically designed for concept drift detection.
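
The general idea of confirming a candidate drift through subsampling and a statistical test can be illustrated with a hypothetical second-stage check. The sketch below is not the NTDDM procedure itself: the choice of a two-proportion z-test, the voting rule, and all names and thresholds (`confirm_drift`, `n_rounds`, `min_votes`, and so on) are assumptions made for illustration.

```python
import math
import random


def two_proportion_z(errors_a, errors_b):
    """z statistic for the increase in error rate from sample a to sample b."""
    n_a, n_b = len(errors_a), len(errors_b)
    p_a, p_b = sum(errors_a) / n_a, sum(errors_b) / n_b
    pooled = (sum(errors_a) + sum(errors_b)) / (n_a + n_b)
    var = pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b)
    if var == 0.0:
        return 0.0  # identical all-0 or all-1 samples: no evidence of change
    return (p_b - p_a) / math.sqrt(var)


def confirm_drift(reference, recent, n_rounds=50, sample_size=30,
                  z_crit=1.645, min_votes=0.8, seed=0):
    """Second-stage check: vote over random subsamples of both windows.

    `reference` and `recent` are lists of 0/1 error indicators collected
    before and after the candidate drift point flagged by a first-stage
    detector. The candidate is confirmed only if a large share of the
    subsample comparisons shows a significant rise in error rate.
    """
    rng = random.Random(seed)
    k_ref = min(sample_size, len(reference))
    k_rec = min(sample_size, len(recent))
    votes = 0
    for _ in range(n_rounds):
        sub_ref = rng.sample(reference, k_ref)
        sub_rec = rng.sample(recent, k_rec)
        if two_proportion_z(sub_ref, sub_rec) > z_crit:
            votes += 1
    return votes / n_rounds >= min_votes
```

Requiring agreement across many subsamples means that a handful of noisy points in the recent window is unlikely to confirm a drift on its own, whereas a sustained rise in error rate is.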

Incremental Weighted Performance Drift Detection Method (IWPDDM): This method employs prediction confidence derived from the incremental learning of ensemble models to detect concept drift. It represents a shift in focus towards the model’s own response to concept drift, rather than solely relying on the model’s output. It creates an indicator using weighted prediction confidence from these models, ensuring stable and accurate drift detection, a significant improvement over traditional methods.

Model-centric Transfer Learning (MCDD) Framework: Recognizing the limited use of deep neural networks in concept drift detection due to computational constraints, this thesis proposes the MCDD framework. This approach relies solely on the model’s intrinsic changes to detect concept drift, making it a model-centric method. The experiments demonstrate that changes in the model alone can accurately reflect concept drift. The framework uses transfer learning to freeze parts of the network, significantly reducing computational needs while enhancing drift detection performance.

Quadruple-based Approach for Understanding Concept Drift in Data Streams (QuadCDD): The QuadCDD framework aims to provide a holistic understanding of concept drift. Most existing methods focus only on detecting the start point of concept drift, yet there is much information about concept drift that remains unexplored, such as its endpoint, severity, and type. This lack of information greatly reduces the specificity of adaptation strategies. The QuadCDD framework goes beyond merely identifying the onset of drift, equipping models with comprehensive information for more effective adjustments and a deeper understanding of the drift dynamics.
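
To make concrete how a richer drift description could sharpen adaptation, the four pieces of information named above (start point, endpoint, severity, type) can be held in a small record. This is a hypothetical sketch: the field names, the 0 to 1 severity scale, the type labels, and the `choose_adaptation` policy with its thresholds are all assumptions, not the representation or strategy used by QuadCDD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftQuadruple:
    start: int        # stream index where the drift begins
    end: int          # stream index where the new concept has taken over
    severity: float   # assumed scale: 0.0 (no change) to 1.0 (complete change)
    drift_type: str   # assumed labels, e.g. "sudden" or "gradual"

    @property
    def duration(self):
        """Number of stream positions between drift start and drift end."""
        return self.end - self.start


def choose_adaptation(q, severe=0.5, short=100):
    """Hypothetical policy mapping a drift description to an adaptation action."""
    if q.severity >= severe and q.duration <= short:
        return "retrain_from_scratch"       # abrupt, large change
    if q.severity >= severe:
        return "retrain_on_sliding_window"  # large change spread over time
    return "incremental_update"             # mild change


sudden = DriftQuadruple(start=1000, end=1010, severity=0.9, drift_type="sudden")
print(choose_adaptation(sudden))
```

A detector that reports only the start point cannot distinguish these cases, which is the loss of specificity the paragraph above describes.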

In conclusion, this thesis addresses a critical issue in machine learning on data streams. It provides practical and innovative concept drift detection algorithms, contributing significantly to both scientific research and practical applications in the field.

Item Type:	Thesis (Doctoral)
Uncontrolled Keywords:	concept drift detection, data stream, artificial intelligence
Subjects:	G400 Computer Science
Department:	Faculties > Engineering and Environment > Computer and Information Sciences; University Services > Graduate School > Doctor of Philosophy
Depositing User:	John Coen
Date Deposited:	14 Feb 2024 15:45
Last Modified:	25 Jul 2024 03:30
URI:	https://nrl.northumbria.ac.uk/id/eprint/51691
