Packing algorithms are widely used to evade anti-malware systems, and the proportion of packed malware has been growing rapidly.
However, over the last two decades, only a few studies have systematically addressed the detection of the various types of packing algorithms.
In this thesis, we propose a method to classify the single-layer packing, re-packing, or multi-layer packing algorithms applied to given packed executables.
First, we scale the entropy values of a single-layer packed, re-packed, or multi-layer packed executable and convert the entropy values of a particular location of memory into symbolic representations.
Our proposed method uses symbolic aggregate approximation (SAX), which is known to be effective for large data conversions.
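As an illustration of this first step, the conversion from entropy values to symbols can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation evaluated in this thesis: the window size, the number of segments, the four-letter alphabet, and the function names `shannon_entropy`, `window_entropies`, and `sax` are all chosen here for illustration, and the entropy is computed over a plain byte buffer rather than over the memory of a running process.

```python
import math
from bisect import bisect_right
from collections import Counter
from statistics import NormalDist


def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    n = len(data)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def window_entropies(data, window=256):
    """Entropy of consecutive, non-overlapping windows (window size is an assumption)."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


def sax(series, segments=8, alphabet="abcd"):
    """Convert a numeric series into a SAX word over `alphabet`."""
    n = len(series)
    if n == 0:
        return ""
    segments = min(segments, n)
    # 1. Z-normalise the series (a constant series maps to all zeros).
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    z = [(x - mean) / std if std > 0 else 0.0 for x in series]
    # 2. Piecewise aggregate approximation: mean of each segment.
    paa = []
    for k in range(segments):
        chunk = z[k * n // segments:(k + 1) * n // segments]
        paa.append(sum(chunk) / len(chunk))
    # 3. Breakpoints that split the standard normal into equiprobable bins.
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    # 4. Map each segment mean to the symbol of the bin it falls into.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)
```

Under these assumptions, a buffer would be converted with `sax(window_entropies(buffer))`, yielding a short string whose symbols summarise how the entropy rises and falls across the buffer.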
Second, we classify the distribution of symbols using supervised learning methods, namely naive Bayes and support vector machines, to detect packing algorithms.
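To make the second step concrete, the following sketch classifies symbol strings by their symbol counts with a small multinomial naive Bayes model. It is an illustrative sketch only: the class name `SymbolNaiveBayes`, the Laplace smoothing parameter `alpha`, the four-letter alphabet, and the labels `packer_A` and `packer_B` are hypothetical, and the feature set used in the thesis may differ.

```python
import math
from collections import Counter, defaultdict


class SymbolNaiveBayes:
    """Multinomial naive Bayes over the symbol counts of a string."""

    def __init__(self, alphabet="abcd", alpha=1.0):
        self.alphabet = alphabet
        self.alpha = alpha          # Laplace smoothing (an assumption)
        self.log_prior = {}
        self.log_likelihood = {}

    def fit(self, words, labels):
        per_class = defaultdict(Counter)
        class_counts = Counter(labels)
        # Accumulate symbol counts for each class label.
        for word, label in zip(words, labels):
            per_class[label].update(word)
        total = len(labels)
        k = len(self.alphabet)
        for label, counts in per_class.items():
            denom = sum(counts[s] for s in self.alphabet) + self.alpha * k
            self.log_prior[label] = math.log(class_counts[label] / total)
            self.log_likelihood[label] = {
                s: math.log((counts[s] + self.alpha) / denom)
                for s in self.alphabet
            }
        return self

    def predict(self, word):
        counts = Counter(word)

        def score(label):
            # Log posterior up to a constant: prior plus weighted likelihoods.
            return self.log_prior[label] + sum(
                counts[s] * self.log_likelihood[label][s] for s in self.alphabet
            )

        return max(self.log_prior, key=score)


# Toy training data with hypothetical labels.
model = SymbolNaiveBayes().fit(
    ["ddddcdcd", "dcddddcc", "aabbabaa", "abababba"],
    ["packer_A", "packer_A", "packer_B", "packer_B"],
)
```

A support vector machine could be trained on the same count vectors; naive Bayes is shown here only because it fits in a few lines without external libraries.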
Experiments on a collection of 324 single-layer packed benign programs and 326 single-layer packed malware programs covering 19 packing algorithms demonstrate that our method identifies the single-layer packing algorithms of given executables with an accuracy of 95.35%, a recall of 95.83%, and a precision of 94.13%.
We propose four similarity measurements for detecting packing algorithms based on SAX representations of the entropy values and an incremental aggregate analysis.
Among these four metrics, the fidelity similarity measurement yields the best matching result, with an accuracy ranging from 95.0% to 99.9%, which is 2 to 13 percentage points higher than that of the other three metrics.
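For reference, a fidelity similarity between two symbol distributions can be sketched as below. The sketch assumes that the fidelity measurement takes its standard form, the sum of the square roots of the products of corresponding probabilities (also known as the Bhattacharyya coefficient); the exact definition used in the thesis is given in the main text, and the function names and the four-letter alphabet here are hypothetical.

```python
import math
from collections import Counter


def symbol_distribution(word, alphabet="abcd"):
    """Relative frequency of each alphabet symbol in a symbol string."""
    counts = Counter(word)
    total = sum(counts[s] for s in alphabet)
    return [counts[s] / total if total else 0.0 for s in alphabet]


def fidelity(p, q):
    """Fidelity similarity: sum of sqrt(p_i * q_i) over all symbols.

    Equals 1 for identical distributions and 0 for distributions
    that share no symbol.
    """
    return sum(math.sqrt(a * b) for a, b in zip(p, q))
```

Under this assumption, an unknown sample would be assigned to the packing algorithm whose reference distribution gives the highest fidelity value.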
In experiments with 2196 programs and 19 packing algorithms, our method achieves a precision of 97.7%, an accuracy of 97.5%, and a recall of 96.8%, which confirms that entropy analysis is applicable to identifying re-packing and multi-layer packing algorithms.
Our study confirms that packing algorithms can be identified through an entropy analysis based on a measure of the uncertainty of the running processes and without prior knowledge of the executables.