The basis for this study is the observation that if malware samples are turned into grayscale images, the textural and structural patterns can be used to effectively classify them as either benign or malicious, as well as cluster malicious samples into respective threat families, Microsoft said.
The researchers used an approach that they called static malware-as-image network analysis (STAMINA), Jugal Parikh and Marc Marino from Microsoft Threat Protection Intelligence Team wrote in a blog post.
For the first part of the collaboration, the researchers built on Intel”s prior work on deep transfer learning for static malware classification and used a real-world dataset from Microsoft to ascertain the practical value of approaching the malware classification problem as a computer vision task.
Using the dataset from Microsoft, the study showed that the STAMINA approach achieves high accuracy in detecting malware with low false positives.
The results were detailed in a paper titled “STAMINA: Scalable deep learning approach for malware classification”.
To establish the practicality of the STAMINA approach, which posits that malware can be classified at scale by performing static analysis on malware codes represented as images, the study covered three main steps: image conversion, transfer learning, and evaluation.
The study was performed on a dataset of 2.2 million PE file hashes provided by Microsoft. This dataset was temporally split into 60:20:20 segments for training, validation, and test sets, respectively.
The joint research encourages the use of deep transfer learning for the purpose of malware classification.