Python is an easy yet powerful programming language that lets us write sophisticated programs: Dropbox and BitTorrent are excellent examples. Python programs are commonly delivered as source code, but in some cases techniques such as obfuscation and compilation are applied to protect the code from curious eyes. But do these techniques really work?
In this article we will see some tools that supposedly help us to protect our code and how easily they are subverted.
We have two example programs written in Python: the first one is a simple function that asks for a password and shows a message; the second one is the same but this time we have used a class.
def main():
    a = "toomanysecrets"
    res = raw_input("Please enter your password: ")
    if res == a:
        print "ACCESS GRANTED"
    else:
        print "ACCESS DENIED"

if __name__ == "__main__":
    main()
class DoMain:
    def __init__(self):
        self.a = "toomanysecrets"

    def Ask(self):
        res = raw_input("Please enter your password: ")
        if res == self.a:
            print "ACCESS GRANTED"
        else:
            print "ACCESS DENIED"

if __name__ == "__main__":
    dm = DoMain()
    dm.Ask()
Suppose I don’t want to deliver the source code of these programs. I have several options; the first is to obfuscate the code, making it difficult to read.
This program obfuscates the code while keeping it completely valid for the Python interpreter. Here is an example with SecretApp1 and SecretApp2.
At a glance our code makes no sense, but if we look closely at the result we can still see the text strings and recognize Python syntax. It is not too difficult to reconstruct the original code from the obfuscated code.
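To illustrate why renaming identifiers alone hides so little, here is a minimal sketch that pulls every string literal out of a piece of obfuscated source using the standard `tokenize` module. The `OBFUSCATED` snippet is a hand-made stand-in written for this example (in Python 3 syntax), not actual output of the tool, and `string_literals` is a hypothetical helper name.

```python
import ast
import io
import tokenize

# Hand-made stand-in for identifier-renaming obfuscation (an assumption,
# not real tool output). Names are scrambled, strings are untouched.
OBFUSCATED = '''
def lIl1():
    O0O = "toomanysecrets"
    l1I = input("Please enter your password: ")
    if l1I == O0O:
        print("ACCESS GRANTED")
    else:
        print("ACCESS DENIED")
'''


def string_literals(source):
    """Return the value of every string literal found in the source text."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        ast.literal_eval(tok.string)  # turn the quoted token into its value
        for tok in tokens
        if tok.type == tokenize.STRING
    ]


literals = string_literals(OBFUSCATED)
for text in literals:
    print(text)
```

The password is the first literal printed, which is the weakness the paragraph above describes: the interpreter still needs the strings, so they remain in plain sight.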
Despite its limitations, I invite you to visit the tool's website to check its possibilities.
This tool was originally written to solve a challenge in a hacking competition at the Hack in the Box conference. I recommend reading this great article to learn more about it.
Unlike the previous tool, Htibctobf obfuscates Python code by modifying the AST (Abstract Syntax Tree). Running the tool produces the obfuscated Python code shown in Fig. 3 and Fig. 4.
The code is now obfuscated, including the text strings. Even so, it is not too difficult to reconstruct the original code.
Without a doubt this is an interesting concept with many possibilities, but it requires improvements to be useful.
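The general idea of AST-based obfuscation can be sketched with the standard `ast` module. This is not how Htibctobf itself is implemented; it is a simplified illustration, assuming Python 3.8 or later, in which a hypothetical `HideStrings` transformer replaces each string literal with an expression that decodes it at run time. The `check` function is a non-interactive stand-in for the article's example.

```python
import ast

# Non-interactive stand-in for the article's password check (an assumption).
SOURCE = '''
def check(res):
    a = "toomanysecrets"
    return "ACCESS GRANTED" if res == a else "ACCESS DENIED"
'''


class HideStrings(ast.NodeTransformer):
    """Replace every string literal with bytes.fromhex('<hex>').decode()."""

    def visit_Constant(self, node):
        if not isinstance(node.value, str):
            return node  # leave numbers, None, etc. untouched
        hex_text = node.value.encode("utf-8").hex()
        # Build the replacement expression by parsing a small template.
        template = "bytes.fromhex(%r).decode()" % hex_text
        replacement = ast.parse(template, mode="eval").body
        return ast.copy_location(replacement, node)


def obfuscate(source):
    """Parse the source, rewrite its AST, and return the modified tree."""
    tree = HideStrings().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return tree


tree = obfuscate(SOURCE)
namespace = {}
exec(compile(tree, "<obfuscated>", "exec"), namespace)
check = namespace["check"]

print(check("toomanysecrets"))  # prints ACCESS GRANTED
print(check("guess"))           # prints ACCESS DENIED
```

The program behaves the same and the plain-text password no longer appears in the tree, but the decoding step sits right next to the data, so anyone reading the result can reverse it by running the same expression.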
In some cases obfuscating the code may be enough, but let’s look at another option that protects our code more effectively: compiling our Python code into an executable.
Py2exe
Possibly one of the most popular choices for turning Python code into a Windows executable.
First we have to create a setup file that includes a reference to the program we want to build. See the setup script below.
from distutils.core import setup
import py2exe

setup(console=['secretapp1.py'])
We are now ready to compile our Python code into a Windows executable, so let’s run py2exe. See Fig. 5.
Once the building process is completed, py2exe creates a directory called “dist” which includes our executable and some necessary libraries. In Fig. 6 we can see that py2exe completes successfully and we execute our program in exe format.
We can now distribute this binary without fear of giving away our code. Or can we?
Py2exe_extract
This tool extracts the Python object file from an executable created with py2exe, basically inverting the process.
In Fig. 7 we can see how we use py2exe_extract to get the object file secretapp1.pyc (the content of this file is platform-independent and is known as Bytecode) from secretapp1.exe.
Now let’s explore ways to get the code from this object file.
Unwind is a disassembler for Python Bytecode that can be used to analyze object files “.pyc”. For this example, I’ve written a simple script in Python, mytest.py, that imports the disassembler and analyzes the pyc file. See code below.
import unwind
print(unwind.disassemble('secretapp1.pyc'))
With this script you can run the following command and get a disassembly of the object file. See Fig. 8.
For lovers of low-level detail, this will be the favorite choice.
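As a rough standard-library analog of the disassembly step, the sketch below uses `marshal` and `dis` rather than unwind, and assumes Python 3. A ".pyc" file is essentially a small header followed by a marshalled code object, so the example marshals a code object in memory instead of reading a real file. `SOURCE` and `string_constants` are names made up for this illustration.

```python
import dis
import marshal
import types

# Cut-down stand-in for secretapp1.py (an assumption for this sketch).
SOURCE = '''
def main():
    a = "toomanysecrets"
    return a
'''


def string_constants(code):
    """Recursively collect string constants from a code object."""
    found = []
    for const in code.co_consts:
        if isinstance(const, str):
            found.append(const)
        elif isinstance(const, types.CodeType):
            # Nested functions and classes carry their own constants.
            found.extend(string_constants(const))
    return found


module_code = compile(SOURCE, "secretapp1.py", "exec")
payload = marshal.dumps(module_code)  # roughly what a .pyc holds after its header
restored = marshal.loads(payload)

dis.dis(restored)                     # human-readable bytecode listing
secrets = string_constants(restored)
print(secrets)
```

The password shows up among the constants without decompiling anything, because bytecode stores literals as they were written.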
Another option is to use a decompiler like uncompyle2 to get the code directly from the object file “.pyc” without having to go through the disassembly as we previously saw.
This tool is powerful and easy to use: as you can see in Fig. 9, a single command recovers the source code from secretapp1.pyc.
Wow, we got source code!
Throughout the article, we have seen some obfuscation and compilation techniques to protect Python code, but we have also been able to subvert the entire protection quite easily.
The following are other Python compilers that can be used in Windows, Linux, or MacOS, but they suffer from the same problems described in this article.
We could also analyze and subvert binaries using tools such as IDA Pro or Immunity Debugger, but I will leave that for a future post. Another interesting tool that I have not mentioned is pyREtic, an extensible framework for in-memory Python Bytecode reverse engineering.
For an attacker, getting the Python code is only a matter of time. To make things really difficult from a defensive point of view, we have to combine different protection techniques.
Do you protect your Python programs? Which methods do you use?
– Simon Roses Femerling