Minority class augmentation using GANs to improve the detection of anomalies in critical operations
Conferences
abstract
With the growing adoption of interconnected technologies and the rapid digitization of modern life, many online networks and applications face constant threats to the security and integrity of their operations and services. Fraudsters and other malicious actors continuously evolve their techniques to bypass the measures in place against financial fraud, vandalism in online knowledge bases and social networks such as Wikipedia, and cyber-attacks. As a result, many of the supervised models proposed to detect these actions degrade in detection performance and become obsolete over time. Furthermore, fraudulent or anomalous data representing these attacks are often scarce or difficult to access, which further restricts the performance of supervised models. Generative adversarial networks (GANs) are a relatively new class of generative models that rely on unsupervised learning and have proven effective at replicating the distributions of the real data provided to them. These models can generate synthetic data that is almost indistinguishable from real data, as demonstrated in image and video applications such as DeepFakes. Motivated by the success of GANs on image-based data, this study examines the performance of several GAN architectures as an oversampling technique for addressing data imbalance in credit card fraud data. We present a comparative analysis of the different types of GANs used to generate training data for a classification model and of their impact on that classifier's performance. We further demonstrate that using GANs as an oversampling approach can achieve greater detection performance in imbalanced data problems.