An enhancement for image-based malware classification using machine learning with low dimension normalized input images

Son, Tran The, Lee, Chando, Le Minh, Hoa, Aslam, Nauman and Dat, Vuong Cong (2022) An enhancement for image-based malware classification using machine learning with low dimension normalized input images. Journal of Information Security and Applications, 69. p. 103308. ISSN 2214-2126

Manuscript_JISAS_2022.07.04_accepted_version.pdf - Accepted Version
Restricted to Repository staff only until 27 August 2023.
Available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.

Official URL: https://doi.org/10.1016/j.jisa.2022.103308

Abstract

This paper proposes a simple and effective model for image-based malware classification using machine learning, in which malware images (converted from malware binary files) are fed directly into the classifiers: k-nearest neighbour (k-NN), support vector machine (SVM) and convolutional neural network (CNN). Unlike existing models in the literature, the proposed model does not train the classifiers on normalized fixed-size square images (e.g. 64 × 64 pixels) or on features extracted by an image descriptor (e.g. GIST). Instead, the input images are normalized and downsized horizontally (i.e. along the image width) to 32 × 64, 16 × 64 or even 8 × 64 pixels, a lower dimension than the square images, to reduce the complexity and training time of the model. This is based on the observation, analysed in this paper, that the texture of a malware image is mainly distributed vertically. The finding is significant for training on devices with limited computational resources, such as IoT devices. Experiments were conducted with k-NN, SVM and CNN classifiers on the Malimg and Malheur datasets, which contain 9339 samples (25 malware families) and 3133 variant samples (24 malware families), respectively. The results show that the dimension of the input images can be reduced (to 32 × 64, 16 × 64 or even 8 × 64) while retaining the same classification accuracy as that obtained by a classifier fed with fixed-size square images (64 × 64 pixels). As a result, the training time of the proposed model is reduced to a half, a quarter and one-eighth, respectively, of the training time taken by the same classifier (k-NN, SVM or CNN) fed with fixed-size 64 × 64 square images.
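The preprocessing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how bytes are mapped to pixels or which interpolation is used, so the fixed row width of 64 bytes, the nearest-neighbour resampling, the scaling of pixel values to [0, 1] as the "normalization" step, and the reading of "32 × 64" as width × height are all assumptions made here for illustration. The function names are hypothetical.

```python
import numpy as np


def bytes_to_image(data: bytes, row_width: int = 64) -> np.ndarray:
    """Read raw bytes as rows of an 8-bit grayscale image (assumed layout)."""
    arr = np.frombuffer(data, dtype=np.uint8)
    rows = len(arr) // row_width
    if rows == 0:
        raise ValueError("not enough bytes for a single image row")
    # Drop the trailing bytes that do not fill a complete row.
    return arr[: rows * row_width].reshape(rows, row_width)


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resampling (an assumed choice of interpolation)."""
    row_idx = np.arange(out_h) * img.shape[0] // out_h
    col_idx = np.arange(out_w) * img.shape[1] // out_w
    return img[row_idx][:, col_idx]


def to_feature_vector(data: bytes, out_w: int, out_h: int = 64) -> np.ndarray:
    """Convert a binary to a normalized, flattened out_w x out_h input."""
    img = resize_nearest(bytes_to_image(data), out_h, out_w)
    # Assumed normalization: scale 8-bit pixel values to [0, 1].
    return (img.astype(np.float32) / 255.0).ravel()


if __name__ == "__main__":
    # Synthetic stand-in for a malware binary; no real sample is used here.
    fake_binary = bytes(range(256)) * 64
    for width in (64, 32, 16, 8):
        vec = to_feature_vector(fake_binary, out_w=width)
        print(f"{width} x 64 -> {vec.size} input values")
```

Under these assumptions, a 64 × 64 input has 4096 values, while 32 × 64, 16 × 64 and 8 × 64 inputs have 2048, 1024 and 512 values. The input dimension therefore shrinks by the same factors of a half, a quarter and one-eighth that the abstract reports for training time.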

Item Type: Article
Additional Information: Funding information: Chando Lee would like to thank the National IT Industry Promotion Agency (NIPA), Korea, for its funding. The research was done during his term as a World Friends Korea Advisor at Vietnam – Korea University of Information and Communication Technology, Da Nang City, Vietnam.
Uncontrolled Keywords: Image-based Malware Classification, k-NN, SVM, CNN, GIST descriptor
Subjects: G400 Computer Science
Department: Faculties > Engineering and Environment > Computer and Information Sciences
Faculties > Engineering and Environment > Mathematics, Physics and Electrical Engineering
Depositing User: John Coen
Date Deposited: 01 Sep 2022 12:04
Last Modified: 01 Sep 2022 12:15
URI: https://nrl.northumbria.ac.uk/id/eprint/49995
