Software is frequently distributed in a form that exposes its source code. Java and .NET programs, supplied as bytecode and MSIL (Microsoft Intermediate Language), preserve almost all of the information from the original source code. This makes them significantly easier to reverse engineer than traditional applications shipped as native code. Malicious users can employ reverse engineering to tamper with software and circumvent license constraints, while competitors can use it to extract proprietary algorithms and data structures. Protecting an application against reverse engineering is therefore critical. This article looks at code obfuscation.
What is obfuscation of code?
Obfuscation literally means “to make something less apparent and difficult to grasp.”
In computing, obfuscation is a technique that converts code into a version that is functionally equivalent to the original but much more difficult to understand and to reverse engineer with tools. We do not expect obfuscation to make the code impossible to decipher. The goal is to make reverse engineering so expensive that it becomes infeasible in practice: the time required to deobfuscate should be far greater than the time required to obfuscate.
Methods of code obfuscation
Methods of obfuscation are classed according to the information they aim to obfuscate. Some basic transformations focus on the program’s lexical structure, while others focus on the data structures or control flow. Obfuscation techniques are further divided into categories based on the type of action they execute on the targeted data. Some approaches influence the ordering, while others affect the aggregation of control or data.
The following are the various obfuscation techniques:
- Layout obfuscation: Obscures the application's layout, such as source code formatting, variable names, and comments.
- Data obfuscation: Obscures the data structures used by the software.
  - Storage obfuscation: Changes the way data is stored in memory. Converting local variables to global variables is an example.
  - Encoding obfuscation: Changes the way stored data is interpreted. Replacing a variable i with a derived value c1*i + c2 is an example.
  - Aggregation obfuscation: Changes the way data is grouped together. Splitting one array into several sub-arrays is an example.
  - Ordering obfuscation: Changes the order of data. Rearranging an array so that the ith element is stored at a new location given by a function f(i) is an example.
- Control obfuscation: Obscures the program's control flow.
  - Aggregation obfuscation: Changes the way statements are grouped together. Inlining, which replaces a function call with the function's body, is an example.
  - Ordering obfuscation: Changes the order in which statements are executed. Reversing a loop so that it iterates backwards is an example.
  - Computation obfuscation: Changes the control flow itself by injecting object-level code that has no source code equivalent, or by inserting redundant code or code that will never be executed (dead code).
- Preventive transformation: The primary purpose of this strategy is to make it more difficult for deobfuscators to break the code.
  - Inherent: Aims to make known automatic deobfuscation techniques harder to apply.
  - Targeted: Exploits known weaknesses in specific deobfuscators or decompilers.
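A few of the data and control transformations listed above can be sketched in Java. This is only an illustration: the class name, the constants C1 and C2, and the permutation function f are assumptions chosen for the example, not the output of any particular obfuscator. Each obfuscated method returns the same result as the plain version.

```java
import java.util.Arrays;

/** Hypothetical demo class; every name and constant here is illustrative. */
final class DataControlObfuscationDemo {

    // Arbitrary encoding constants: the index i is held as C1 * i + C2.
    private static final int C1 = 8;
    private static final int C2 = 3;

    /** Original: sum the array with a plain forward loop. */
    static int sumPlain(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    /** Encoding obfuscation: the counter k holds C1 * i + C2 instead of i. */
    static int sumEncoded(int[] a) {
        int sum = 0;
        for (int k = C2; k < C1 * a.length + C2; k += C1) {
            sum += a[(k - C2) / C1]; // decode k back to the real index
        }
        return sum;
    }

    /** Ordering obfuscation of control: the same loop, iterating backwards. */
    static int sumReversed(int[] a) {
        int sum = 0;
        for (int i = a.length - 1; i >= 0; i--) {
            sum += a[i];
        }
        return sum;
    }

    /** Assumed permutation f(i); valid only when n is not a multiple of 3. */
    static int f(int i, int n) {
        return (i * 3 + 1) % n;
    }

    /** Ordering obfuscation of data: element i is stored at position f(i). */
    static int[] scramble(int[] a) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[f(i, a.length)] = a[i];
        }
        return out;
    }

    /** Reads logical element i from a scrambled array. */
    static int readScrambled(int[] scrambled, int i) {
        return scrambled[f(i, scrambled.length)];
    }

    public static void main(String[] args) {
        int[] data = {4, 8, 15, 16, 23, 42, 7}; // length 7 is not a multiple of 3
        System.out.println(sumPlain(data));
        System.out.println(sumEncoded(data));
        System.out.println(sumReversed(data));
        System.out.println(Arrays.toString(scramble(data)));
    }
}
```

The reversed loop is only safe here because the loop iterations are independent of one another; a real obfuscator has to verify that before reordering statements.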
Parameters for evaluating an obfuscation method’s quality
To examine obfuscation approaches in depth, we need a way to evaluate the quality of a transformation. The quality of an obfuscation method is determined by its potency, resilience, stealth, and cost.
- Potency refers to the degree to which the transformed code is more obscure than the original. Software complexity metrics quantify this through measures such as the number of predicates a program contains, the depth of its inheritance tree, and its nesting levels. While good software design aims to minimize these measures, obfuscation aims to increase them.
- Resilience refers to how well the transformed code withstands automated deobfuscation. It combines the programmer effort needed to build a deobfuscator with the time and space that deobfuscator requires to run. A one-way transformation, which no deobfuscator can undo, provides the highest level of resilience; removing source code formatting is an example, because the discarded information cannot be recovered.
The distinction between potency and resilience is that a transformation is potent if it can confuse a human reader, whereas a transformation is resilient if it can’t be undone by a deobfuscator tool.
Take, for example, the following transformation:
    →   if (1==2) S1;
        if (1>2) S2;
This transformation is potent because it increases the program's apparent complexity, but it is not resilient: both conditions are always false, so a deobfuscator can undo it quickly.
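The same idea can be written out as a minimal Java sketch. The class, the method names, and the bodies standing in for S1 and S2 are assumptions made for illustration. Both guards are constant expressions that are always false, so the guarded statements never run and the two methods return the same value for every input.

```java
/** Hypothetical demo; the statements standing in for S1 and S2 are invented. */
final class DeadCodeDemo {

    /** Original computation. */
    static int original(int x) {
        int y = x * 2;
        return y + 1;
    }

    /** Same computation with dead code guarded by always-false predicates. */
    static int obfuscated(int x) {
        int y = x * 2;
        if (1 == 2) {        // always false
            y = y * 31 + 7;  // stands in for S1, never executed
        }
        if (1 > 2) {         // always false
            y = -y;          // stands in for S2, never executed
        }
        return y + 1;
    }

    public static void main(String[] args) {
        for (int x = -2; x <= 2; x++) {
            System.out.println(original(x) == obfuscated(x));
        }
    }
}
```

A human reader has more branches to wade through, which is the gain in potency. Any tool that evaluates constant expressions can see that neither branch is reachable and strip both, which is why the resilience is low.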
- Stealth: How well the obfuscated code blends in with the rest of the program. If a transformation introduces code that stands out from the rest of the program, an automated deobfuscator may not detect it, but a human reverse engineer will spot it easily. Stealth depends on context; what is stealthy in one program may not be in another.
- Cost: The execution time and space overhead of the obfuscated code compared to the original. A transformation with no associated cost is considered free. Cost also depends on context. A statement i=10 added at the outermost nesting level, for example, costs substantially less than the same statement inserted inside an inner loop.
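The effect of context on cost can be made concrete with a small Java sketch. The class and method names are hypothetical, and the counter exists only to show how often the inserted statement i = 10 executes in each position.

```java
/** Hypothetical demo: counts how often an inserted statement executes. */
final class CostDemo {

    /** Inserted statement placed at the outermost nesting level. */
    static long insertedAtOuterLevel(int n) {
        long executions = 0;
        int i;
        i = 10;               // inserted statement
        executions++;         // it runs once, whatever n is
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                // original loop body would go here
            }
        }
        return executions;
    }

    /** Inserted statement placed inside the inner loop. */
    static long insertedInInnerLoop(int n) {
        long executions = 0;
        int i;
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                i = 10;       // inserted statement
                executions++; // it runs n * n times
            }
        }
        return executions;
    }

    public static void main(String[] args) {
        System.out.println(insertedAtOuterLevel(100));
        System.out.println(insertedInInnerLoop(100));
    }
}
```

With two nested loops of n iterations each, the statement runs once in the first method and n * n times in the second, so the same transformation is nearly free in one place and measurable in the other.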
Now that we’ve looked at the parameters for evaluating a transformation, let’s define and investigate one technique in more depth.
The term “layout obfuscation” refers to changing the source file’s layout. This entails removing source code comments and debug information, and renaming identifiers such as classes, member variables, and local variables.
Because they add no space or time overhead to the original application, comment removal and formatting removal are free transformations. Their potency is low, because formatting carries little semantic content. Formatting removal is a one-way transformation, since the formatting cannot be recovered once it is gone. Scrambling variable names is likewise one-way and free, but it is far more potent than formatting removal. Crema, one of the oldest Java obfuscators, uses layout obfuscation.
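As a rough illustration of layout obfuscation, the Java sketch below shows a small class before and after comments and formatting are removed and identifiers are renamed. The InterestCalculator class is invented for this example, and the renamed version was written by hand rather than produced by Crema or any other tool.

```java
/** Original: descriptive names, comments, and formatting. */
final class InterestCalculator {
    private final double annualRate;

    InterestCalculator(double annualRate) {
        this.annualRate = annualRate;
    }

    /** Simple interest earned on a principal over a number of years. */
    double simpleInterest(double principal, int years) {
        double interest = principal * annualRate * years;
        return interest;
    }
}

// After layout obfuscation: comments and formatting removed, identifiers
// renamed. The class, its field, and its method are all called "a", which
// Java permits because types, fields, and methods are resolved separately.
final class a{private final double a;a(double b){a=b;}double a(double b,int c){double d=b*a*c;return d;}}
```

For example, `new InterestCalculator(0.05).simpleInterest(1000, 3)` and `new a(0.05).a(1000, 3)` compute the same value, but nothing in the second class tells the reader what that value means, and the original names cannot be recovered from it.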
Now that we’ve covered the basics, we’ll explore obfuscation more deeply over the next few issues of Palisade, going over the different types of code obfuscation in detail.