Harnessing the power of GPUs to speed up feature selection for outlier detection

Fatemeh Azmandian, Ayse Yilmazer, Jennifer G. Dy, Javed A. Aslam, David R. Kaeli

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

Acquiring a set of features that emphasizes the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In our work, we present a novel non-parametric evaluation criterion for filter-based feature selection with an eye toward the final goal of outlier detection. The proposed method seeks the subset of features that represents the inherent characteristics of the normal data while forcing outliers to stand out, making them more easily distinguishable by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm over popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample size problem and perform well on highly imbalanced datasets. Furthermore, because the feature selection is highly parallelizable, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold: its performance scales very well with both the number of features and the number of data points.
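To make the search concrete, here is a compilable C sketch of one plausible realization: greedy backward elimination driven by a toy criterion that rewards feature subsets under which known outliers lie far from their nearest normal neighbor. The criterion, the function names, and the search strategy are all illustrative assumptions for exposition, not the paper's published algorithm.

    #include <float.h>
    #include <math.h>

    /* Toy criterion: mean distance from each known outlier to its nearest
     * normal point, computed over the features enabled in mask[].
     * Higher values mean the subset makes outliers stand out more. */
    static double score_subset(const double *normal, int n_norm,
                               const double *outlier, int n_out,
                               int d, const int *mask)
    {
        double total = 0.0;
        for (int i = 0; i < n_out; ++i) {
            double nearest = DBL_MAX;
            for (int p = 0; p < n_norm; ++p) {
                double dist = 0.0;
                for (int j = 0; j < d; ++j) {
                    if (!mask[j]) continue;
                    double diff = outlier[i * d + j] - normal[p * d + j];
                    dist += diff * diff;
                }
                if (dist < nearest) nearest = dist;
            }
            total += sqrt(nearest);
        }
        return total / n_out;
    }

    /* Greedy backward elimination: repeatedly drop the feature whose
     * removal hurts the criterion least, until target_d features remain.
     * The inner trials are mutually independent, which is exactly the
     * kind of parallelism a GPU implementation can exploit. */
    void select_features(const double *normal, int n_norm,
                         const double *outlier, int n_out,
                         int d, int target_d, int *mask)
    {
        for (int j = 0; j < d; ++j) mask[j] = 1; /* start with all features */

        for (int kept = d; kept > target_d; --kept) {
            int drop = -1;
            double best = -DBL_MAX;
            for (int j = 0; j < d; ++j) {
                if (!mask[j]) continue;
                mask[j] = 0;                     /* trial: remove feature j */
                double s = score_subset(normal, n_norm,
                                        outlier, n_out, d, mask);
                mask[j] = 1;                     /* restore it */
                if (s > best) { best = s; drop = j; }
            }
            mask[drop] = 0;                      /* commit the best removal */
        }
    }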

Original language: English
Pages (from-to): 408-422
Number of pages: 15
Journal: Journal of Computer Science and Technology
Volume: 29
Issue number: 3
DOIs
Publication status: Published - May 2014
Externally published: Yes

Funding

By utilizing the parallelism that exists in the LoKDR algorithm, the GPU-based implementation of our proposed feature selection method offers significant performance improvements over the serial (CPU-only) implementation. To quantify this gain, we perform a series of performance experiments that pit the serial and GPU implementations head-to-head in terms of running time on the real-world datasets. The GPU version is written using NVIDIA's Compute Unified Device Architecture (CUDA) and the serial version is written in C. Our serial experiments are run on an Intel® Xeon® E5405 CPU running at 2.00 GHz. The GPU used in our experiments is the NVIDIA Tesla M2070, which is based on the NVIDIA Fermi architecture; Fermi GPUs are designed for general-purpose, high-performance GPU computing. The Tesla M2070 modules are performance-optimized, high-end products: they offer 6 GB of ECC-protected GDDR5 memory on board, with a 1.566 GHz memory clock and a 384-bit memory interface. The Fermi architecture provides fused multiply-add (FMA) instructions for both 32-bit single-precision and 64-bit double-precision floating-point numbers. The features supported by CUDA hardware are described by its compute capability; the Fermi architecture supports CUDA Compute Capability 2.0.
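The parallel core that such a GPU port exploits can be illustrated with a short device-side sketch. This is a minimal example assuming, purely as an illustration, that the dominant cost is pairwise distance computation restricted to the selected features (the common core of kNN- and density-based criteria); the kernel and all names in it are assumptions, not the paper's actual CUDA code. Note the fmaf() call, which compiles to the Fermi FMA instruction described above.

    // One thread per (query, reference) pair computes a squared distance
    // over the currently selected features only.
    __global__ void masked_sqdist(const float *X,   // n x d data, row-major
                                  const int *mask,  // d per-feature 0/1 flags
                                  float *D,         // n x n output distances
                                  int n, int d)
    {
        int i = blockIdx.y * blockDim.y + threadIdx.y; // query point index
        int j = blockIdx.x * blockDim.x + threadIdx.x; // reference point index
        if (i >= n || j >= n) return;

        float acc = 0.0f;
        for (int k = 0; k < d; ++k) {
            if (mask[k]) {
                float diff = X[i * d + k] - X[j * d + k];
                acc = fmaf(diff, diff, acc); // single-precision fused multiply-add
            }
        }
        D[i * n + j] = acc;
    }

    // Example launch: 16x16 thread blocks tile the n x n grid of pairs.
    //   dim3 block(16, 16);
    //   dim3 grid((n + 15) / 16, (n + 15) / 16);
    //   masked_sqdist<<<grid, block>>>(d_X, d_mask, d_D, n, d);

Because every (i, j) pair is independent, the thread count grows with the number of data points and the per-thread loop grows with the number of features, which is consistent with the two-fold scaling behavior noted in the abstract.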

Funders: Fermilab

Keywords

• GPU acceleration
• feature selection
• imbalanced data
• outlier detection

