AppSec: Myths about Obfuscation and Reversing Python

Python is an easy and powerful programming language that allows us to write sophisticated programs: Dropbox and BitTorrent are excellent examples. It is common that Python programs are delivered in source code, but in some cases different techniques like obfuscation and compilation are applied to protect the code from curious eyes. But do these techniques really work?

In this article we will see some tools that supposedly help us to protect our code and how easily they are subverted.

We have two example programs written in Python: the first one is a simple function that asks for a password and shows a message; the second one is the same but this time we have used a class.


def main():

        a = "toomanysecrets"

        res = raw_input("Please enter your password: ")
        
        if res == a:
                print "ACCESS GRANTED"
        else:
                print "ACCESS DENIED"

if __name__ == "__main__":
	main()

secretapp1.py


class DoMain:

        def __init__(self):
                self.a = "toomanysecrets"

        def Ask(self):
                res = raw_input("Please enter your password: ")

                if res == self.a:
                        print "ACCESS GRANTED"
                else:
                        print "ACCESS DENIED"

if __name__ == "__main__":
	dm = DoMain()
	dm.Ask()

secretapp2.py

Suppose I don’t want to deliver these programs code, then I have several options. Our first option is to obfuscate the code, thus making it difficult to read.

Pyobfuscate

This program allows you to obfuscate the code but it is still completely valid for the Python interpreter. Here is an example with SecretApp1 and SecretApp2.

simonroses_python_cap1
Fig. 1 – Obfuscated secretapp1

simonroses_python_cap2
Fig. 2 – Obfuscated secretapp2

At a glance our code makes no sense, but if you look closely at the result we see the text strings in the code and we can recognize Python syntax. It is not too difficult to reconstruct the original code from the obfuscated code.

Despite its limitations, I invite you to visit the tool website to check its possibilities.

Htibctobf

This tool was originally written to solve a challenge in a hacking competition at the Hack in the Box conference. I recommend reading this great article to learn more about it.

Unlike the previous tool, Htibctobf obfuscates Python code by modifying the AST (Abstract Syntax Trees). When you run this tool, we can see our obfuscated Python code in Fig. 3 and Fig. 4.

simonroses_python_cap3
Fig. 3 – Obfucated secretapp1

simonroses_python_cap4
Fig. 4 – Obfuscated secretapp2

We can see the obfuscated code, including text strings, despite that it is not too difficult to reconstruct the original code as well.

Without a doubt an interesting concept with many possibilities, nevertheless it requires improvements to be useful.

In some cases perhaps it is enough to obfuscate the code, but let’s look for other options to protect our code more effectively, therefore we will have to resort to compile our Python code to create an executable.

Py2exe

Possibly one of the most popular choices to turn Python code into a Windows executable. Py2exe

First we have to create a file called Setup that includes a reference to the program we want to build/compile. See setup script.


from distutils.core import setup
import py2exe

setup(console=['secretapp1.py'])

setup.py

We are now ready to compile our Python code into a Windows executable, so let’s run py2exe. See Fig. 5.

simonroses_python_cap5
Fig. 5 – Build secretapp1.exe

Once the building process is completed, py2exe creates a directory called “dist” which includes our executable and some necessary libraries. In Fig. 6 we can see that py2exe completes successfully and we execute our program in exe format.

simonroses_python_cap6
Fig. 6 – Build completed!

We could now distribute this binary without fear to give out our code or maybe not?

Py2exe_extract

This tool allows us to extract Python object file within the executable created using py2exe, basically inverting the process. Py2exe_extract

In Fig. 7 we can see how we use py2exe_extract to get the object file secretapp1.pyc (the content of this file is platform-independent and is known as Bytecode) from secretapp1.exe.

simonroses_python_cap7
Fig. 7 – Exctracting object file

Now let’s explore ways to get the code from this object file.

Unwind

Unwind is a disassembler for Python Bytecode that can be used to analyze object files “.pyc”. For this example, I’ve written a simple script in Python, mytest.py, that imports the disassembler and analyzes the pyc file. See code below.


import unwind

print(unwind.disassemble('secretapp1.pyc'))

mytest.py

With this script you can run the following command and get a disassembly of the object file. See Fig. 8.

simonroses_python_cap8
Fig. 8 – Python Bytecode

For low level lovers this will be your favorite choice 😉

Uncompyle2

Another option is to use a decompiler like uncompyle2 to get the code directly from the object file “.pyc” without having to go through the disassembly as we previously saw.

This tool is powerful and easy to use as you can see in Fig. 9 using a simple command we get the source code for secretapp1.pyc.

simonroses_python_cap9
Fig. 9 – Secretapp1 code from object file

Wow, we got source code!

Throughout the article, we have seen some obfuscation and compilation techniques to protect Python code, but we have also been able to subvert the entire protection quite easily 🙂

The following are other Python compilers that can be used in Windows, Linux, or MacOS, but they suffer from the same problems described in this article.

We could also analyze and subvert binaries using tools such as IDA PRO or Immunity Debugger but I will leave it for a future post. Another interesting tool that I have not mentioned is pyREtic, which is an extensible framework for in-memory Python Bytecode reverse engineering.

For an attacker to get the Python code is a matter of time, however to make things really difficult from a defensive point of view we have to combine different protection techniques.

Do you protect your Python programs? Which methods do you use?

— Simon Roses Femerling

This entry was posted in Pentest, Privacy, Security, Technology and tagged , , , , , . Bookmark the permalink.

15 Responses to AppSec: Myths about Obfuscation and Reversing Python

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.