Deep Android Malware Detection
Malware attacks, and the challenge of preventing them, are a growing problem, especially on mobile platforms, IoT-connected devices and vehicles. Given the sheer number of apps and devices, traditional malware detection techniques based on manual feature analysis do not scale, and more effective detection techniques based on machine learning are required.
In this project, we propose a new malware detection framework that takes inspiration from recent advances in deep learning and natural language processing. Our system performs static malware analysis on an APK before it is installed on the device. The APK is first disassembled and its opcode sequence extracted, as shown in Figure 1. The neural network, whose architecture is depicted in Figure 2, then processes the sequence of instructions to classify the app as malicious or benign.
Figure 1: Workflow showing how an Android application is disassembled to produce an opcode sequence, which is then processed by our deep neural network to classify the app.
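The opcode extraction step in Figure 1 can be illustrated with a minimal sketch. It assumes the APK has already been disassembled into smali-style text by an external tool, and it uses a deliberately simplified reading of that text: the first token of each instruction line inside a method body is treated as the opcode. The function names extract_opcodes and encode, and the sample method, are hypothetical and not part of our system. Multi-line data blocks (for example array data or switch tables) are not handled here.

```python
from typing import Dict, List


def extract_opcodes(smali_text: str) -> List[str]:
    """Return opcode mnemonics found inside method bodies, in program order."""
    opcodes: List[str] = []
    in_method = False
    for raw in smali_text.splitlines():
        line = raw.strip()
        if line.startswith(".method"):
            in_method = True
            continue
        if line.startswith(".end method"):
            in_method = False
            continue
        # Skip blanks, directives (.locals), labels (:cond_0) and comments (#).
        if not in_method or not line or line[0] in ".:#":
            continue
        opcodes.append(line.split()[0])
    return opcodes


def encode(opcodes: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map mnemonics to integer ids, growing the vocabulary as needed."""
    return [vocab.setdefault(op, len(vocab)) for op in opcodes]


# Hypothetical disassembled method, used only for illustration.
SAMPLE = """\
.method public onCreate()V
    .locals 1
    const-string v0, "hello"
    invoke-static {v0}, Lcom/example/Log;->d(Ljava/lang/String;)V
    :cond_0
    return-void
.end method
"""

vocab: Dict[str, int] = {}
sequence = extract_opcodes(SAMPLE)
ids = encode(sequence, vocab)
print(sequence)  # ['const-string', 'invoke-static', 'return-void']
print(ids)       # [0, 1, 2]
```

Operands are discarded on purpose in this sketch: only the sequence of opcodes is passed on to the classifier.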
Our approach learns directly from data without manual intervention or expert knowledge, reducing bias in the decision and showing better scalability and performance than other machine learning approaches such as n-grams, as shown in Figure 3. Relevant features are automatically extracted from the opcode sequence and can be combined with other features, such as permissions, for further improvement.
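For context, the n-gram approach mentioned above represents an app by counts of short opcode subsequences. The following is a minimal sketch of that feature extraction, not the exact baseline used in our comparison; the function name opcode_ngrams and the example sequence are hypothetical.

```python
from collections import Counter
from typing import Counter as CounterType, List, Tuple


def opcode_ngrams(opcodes: List[str], n: int) -> CounterType[Tuple[str, ...]]:
    """Count every contiguous run of n opcodes in the sequence."""
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


seq = ["move", "invoke-virtual", "move", "invoke-virtual", "return-void"]
bigrams = opcode_ngrams(seq, 2)
print(bigrams[("move", "invoke-virtual")])  # 2
```

With a vocabulary of V distinct opcodes there are up to V**n possible n-grams, so the feature space grows quickly with n. This is one reason n-gram features are hard to scale to long patterns.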
Figure 2: Proposed deep learning architecture for malware analysis.
Figure 3: Malware classification results for our system on both small and large datasets, compared with results from the literature.
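This page does not spell out the individual layers of the architecture in Figure 2. The following is a minimal sketch only, assuming a convolutional network over learned opcode embeddings (the Figure 10 caption refers to a CNN): embedding lookup, one 1D convolution with ReLU, global max pooling, and a sigmoid output. The layer sizes, the random untrained weights, and the names make_model and predict are all illustrative assumptions.

```python
import math
import random
from typing import Dict, List


def make_model(vocab_size: int, embed_dim: int = 4, num_filters: int = 3,
               kernel: int = 3, seed: int = 0) -> Dict:
    """Build a toy model with random (untrained) weights."""
    rng = random.Random(seed)

    def rand(*shape):
        if len(shape) == 1:
            return [rng.uniform(-0.5, 0.5) for _ in range(shape[0])]
        return [rand(*shape[1:]) for _ in range(shape[0])]

    return {
        "embed": rand(vocab_size, embed_dim),         # one vector per opcode id
        "conv": rand(num_filters, kernel, embed_dim),  # 1D convolution filters
        "out": rand(num_filters),                     # final linear layer
        "kernel": kernel,
    }


def predict(model: Dict, opcode_ids: List[int]) -> float:
    """Return a score in (0, 1); higher means 'more likely malicious'."""
    k = model["kernel"]
    x = [model["embed"][i] for i in opcode_ids]
    if len(x) < k:
        raise ValueError("sequence shorter than the convolution kernel")
    pooled = []
    for filt in model["conv"]:
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(x) - k + 1):
            window = x[start:start + k]
            act = sum(w * v
                      for row_w, row_x in zip(filt, window)
                      for w, v in zip(row_w, row_x))
            best = max(best, act)  # global max pooling over positions
        pooled.append(best)
    logit = sum(w * p for w, p in zip(model["out"], pooled))
    return 1.0 / (1.0 + math.exp(-logit))


model = make_model(vocab_size=10)
score = predict(model, [1, 2, 3, 4, 5])
```

Global max pooling reduces a sequence of any length to one value per filter, so the same weights can score both short and long opcode sequences.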
Our approach has also shown better performance on new types of malware, as demonstrated in Figure 4 for a zero-day scenario, where the system is tested on a new family of malware that was not used in training.
Figure 4: Malware classification results for our system in the zero-day scenario, where one family is left out of training at each iteration, compared against the n-gram system.
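The leave-one-family-out protocol behind the zero-day scenario can be sketched as follows. This is an illustration under stated assumptions: each malware sample carries a family label, the family names are hypothetical, and the handling of benign apps (which would need their own train/test split) is left out.

```python
from typing import Iterator, List, Tuple


def leave_one_family_out(
    families: List[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_family, train_indices, test_indices) per family."""
    for held_out in sorted(set(families)):
        train = [i for i, f in enumerate(families) if f != held_out]
        test = [i for i, f in enumerate(families) if f == held_out]
        yield held_out, train, test


# Hypothetical family label for each malware sample, by sample index.
families = ["FamA", "FamB", "FamA", "FamC"]
splits = list(leave_one_family_out(families))
for name, train, test in splits:
    print(name, train, test)
```

In each iteration the model never sees the held-out family during training, so the test score reflects how well it generalises to a family it has not encountered.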
Our framework has been extended to deal with obfuscation, showing very encouraging performance as displayed in Figure 5.
Figure 5: Detection performance of our model trained with both obfuscated and unobfuscated samples, and comparison against the state of the art in obfuscated malware detection for Android.
Finally, once our system has been trained, large numbers of files can be scanned efficiently using a GPU implementation, or even an implementation on an embedded device, as shown in Figure 6. A mobile app based on our malware detector was developed and was shortlisted as a finalist in the Mobile World Scholar Challenge at the Mobile World Congress 2019.
Figure 6: Comparison of the time taken to reach a classification decision, and the number of programs that can be classified per second, for our proposed neural network system and a conventional n-gram based system.
This work has recently been extended to include proprietary Android APIs as a new feature set in a multi-view deep learning setting for zero-day detection, and explainability work has also been undertaken.
Figure 7: Multi-view Discriminative Adversarial Network (DANdroid) architecture for Android malware detection
Figure 8: DANdroid zero-day detection performance vs. the state-of-the-art
Figure 9: Multi-view Deep Learning Network Architecture for Android malware detection
Figure 10: Similarity of activation measurements for CNN vs. LIME on the Ssmsp Android malware family
Citations
Deep Android Malware Detection
McLaughlin, N., Martinez del Rincon, J., Kang, B., Yerima, S., Miller, P., Sezer, S., Safaeisemnani, Y., Trickel, E., Zhao, Z., Doupé, A. & Joon Ahn, G., 22 Mar 2017, Proceedings of the ACM Conference on Data and Applications Security and Privacy (CODASPY) 2017.
DANdroid: A Multi-View Discriminative Adversarial Network for Obfuscated Android Malware Detection
Millar, S., McLaughlin, N., Martinez del Rincon, J., Miller, P., Zhao, Z., 22 Mar 2020, Proceedings of the ACM Conference on Data and Applications Security and Privacy (CODASPY) 2020.
Multi-view deep learning for zero-day Android malware detection
Millar, S., McLaughlin, N., Martinez del Rincon, J., Miller, P., May 2021, Elsevier Journal of Information Security and Applications, Vol. 58, 2021.
Towards Explainable CNNs for Android Malware Detection
Kinkead, M., Millar, S., McLaughlin, N., O’Kane, P., Mar 2021, The Third International Symposium on Machine Learning and Big Data Analytics For Cybersecurity and Privacy (MLBDACP21), 2021.