Python Malware On The Rise

The vast majority of serious malware over the past 30 years has been written in Assembly or compiled languages such as C, C++, and Delphi. However, ever-increasing over the past decade, a large amount of malware has been written in interpreted languages, such as Python. The low barrier to entry, ease of use, rapid development process, and massive library collection has made Python attractive for millions of developers- including malware authors. Python has quickly become a standard language in which threat actors create Remote Access Trojans (RATs), information stealers, and vulnerability exploit tools. As Python continues to grow radically in popularity and the C malware monoculture continues to be challenged, it would seem only certain that Python will be increasingly utilized as malware in cyber attacks.

Image for post
Image Source: Stack Overflow

New call-to-action

THE TIMES THEY ARE A-CHANGIN’

Malware written in Python will also have adverse effects on file size, memory footprint, and processing power. Serious malware is often designed to be small, stealthy, have low memory footprint, and use limited processing power. A compiled malware sample written in C might be 200 KB, while a comparable malware sample written in Python might be 20 MB after converted into an executable. Both the CPU & RAM usage will also be significantly higher when using an interpreted language.

However, it’s 2020 and the digital landscape isn’t what it once was. The internet is faster than it’s ever been, our computers have more memory & storage capacity than ever, and CPUs get faster every year. Python is also more ubiquitous than ever, coming pre-installed on macOS and most all Linux distributions by default.

NO INTERPRETER? NO PROBLEM!

PyInstaller

Image for post

PyInstaller is capable of building Python applications into stand-alone executables for Windows, Linux, macOS and more by “freezing” Python code. It is one of the most popular methods to convert Python code into executable format and has been used widely for both legitimate and malicious purposes.

Let’s create a simple “Hello, world!” program in Python and freeze it into a stand-alone executable using PyInstaller:

$ cat hello.py
print('Hello, world!')

$ pyinstaller --onefile hello.py
...

$ ./dist/hello 
Hello, world!

$ file dist/hello 
dist/hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter
/lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=294d1f19a085a730da19a6c55788ec0
8c2187039, stripped

$ du -sh dist/hello 
7.0M	dist/hello

This process created a portable, stand-alone Linux ELF (Executable and Linkable Format) which is the equivalent to an EXE on Windows. Now let’s create and compile a “Hello, world!” program in C on Linux for comparison:

$ cat hello.c
#include 
int main() {
    printf("Hello, world!");
}

$ gcc hello.c -o hello

$ ./hello 
Hello, world!

$ file hello
hello: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter
/lib64/ld-linux-x86-64.so.2, BuildID[sha1]=480c7c75e09c169ab25d1b81bd28f66fde08da7c, for GNU/Li
nux 3.2.0, not stripped

$ du -sh hello
20K	hello

Notice how much larger the file size is: 7 MB (Python) vs 20 KB (C)! This demonstrates the major drawback we discussed previously about file size and memory usage. The Python executable is so much larger due to the fact it must bundle the Python interpreter (as a shared object file on Linux) inside the executable itself in order to run.

py2exe

Py2exe utilizes distutils and requires a small setup.py script to be created to produce an executable. Let’s create an example “Hello, world!” executable using py2exe:

> type hello.py
print('Hello, world!')

> type setup.py
import py2exe
from distutils.core import setup
setup(
    console=['hello.py'],
    options={'py2exe': {'bundle_files': 1, 'compressed': True}},
    zipfile=None
)

> python setup.py py2exe
...

> dist\hello.exe
Hello, world!

The hello.exe created by py2exe is similar in size to PyInstaller coming in at 6.83 MB.

Image for post

Nuitka

Image for post

Nuitka is perhaps the most underutilized, and yet more advanced method of compiling Python code to an executable. It translates Python code into a C program that then is linked against libpython to execute code the same as CPython. Nuitka can use a variety of C compilers including gcc, clang, MinGW64, Visual Studio 2019+, and clang-cl to convert your Python code to C.

Let’s create a “Hello, world!” Python program on Linux and compile it using Nuitka:

$ cat hello.py
print('Hello, world!')

$ nuitka3 hello.py
...

$ ./hello.bin
Hello, world!

$ file hello.bin 
hello.bin: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter
/lib64/ld-linux-x86-64.so.2, BuildID[sha1]=eb6a504e8922f8983b23ce6e82c45a907c6ebadf, for GNU/Linux 
3.2.0, stripped

$ du -sh hello.bin
432K	hello.bin

Nuitka produced a portable binary very simply, and at 432 KB is a fraction of the size of what PyInstaller or py2exe can produce! How is Nuitka able to do this? Let’s take a look at the build folder:

$ cloc hello.build/
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               11           2263            709           8109
C/C++ Header                     1              1              0              7
-------------------------------------------------------------------------------
SUM:                            12           2264            709           8116
-------------------------------------------------------------------------------

Nuitka produced over 8,000 lines of C code from our 1 line Python program. The way Nuitka works is it actually translates the Python modules into C code and then uses libpython and static C files of its own to execute in the same way as CPython does.

This is very impressive, and it seems highly likely the Nuitka “Python compiler” will see further adoption as time goes on. As we’ll see later, Nuitka might have a further, built-in advantage in protection against Reverse Engineering (RE). There already exist several tools to easily analyze binaries produced by PyInstaller and py2exe to recover Python source code. However, by Nuitka translating the Python code to C it is much more difficult to reverse engineer.

YO DAWG, I HEARD YOU LIKE TOOLS…

Image for post

Python malware can take advantage of a massive ecosystem of open-source Python packages and repositories. Almost anything you could think of, someone has already built it using Python. This is a huge advantage to malware authors as simplistic capabilities can be cherry-picked from the open web and more complex capabilities likely don’t need to be written from scratch.

Let’s take a look at three simple, yet powerful tool examples:

  1. Code Obfuscation
  2. Taking Screenshots
  3. Performing Web Requests

Tool Example 1 — Obfuscation

Here’s a small example of how pyarmor can obfuscate Python code:

$ cat hello.py 
print('Hello, world!')

$ pyarmor obfuscate hello.py
...

$ cat dist/hello.py
from pytransform import pyarmor_runtime
pyarmor_runtime()
__pyarmor__(__name__, __file__, b'\x50\x59\x41\x52\x4d\x4f\x52\x00\x00\x03\x08\x00\x55\x0d\x0d\
x0a\x04\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x40\x00\x00\x00\xd5\x00\x00\x00\x00\x00\x00
\x18\xf4\x63\x79\xf6\xaa\xd7\xbd\xc8\x85\x25\x4e\x4f\xa6\x80\x72\x9f\x00\x00\x00\x00\x00\x00\x0
0\x00\xec\x50\x8c\x64\x26\x42\xd6\x01\x10\x54\xca\x9c\xb6\x30\x82\x05\xb8\x63\x3f\xb0\x96\xb1\x
97\x0b\xc1\x49\xc9\x47\x86\x55\x61\x93\x75\xa2\xc2\x8c\xb7\x13\x87\xff\x31\x46\xa5\x29\x41\x9d\
xdf\x32\xed\x7a\xb9\xa0\xe1\x9a\x50\x4a\x65\x25\xdb\xbe\x1b\xb6\xcd\xd4\xe7\xc2\x97\x35\xd3\x3e
\xd3\xd0\x74\xb8\xd5\xab\x48\xd3\x05\x29\x5e\x31\xcf\x3f\xd3\x51\x78\x13\xbc\xb3\x3e\x63\x62\xc
a\x05\xfb\xac\xed\xfa\xc1\xe3\xb8\xa2\xaa\xfb\xaa\xbb\xb5\x92\x19\x73\xf0\x78\xe4\x9f\xb0\x1c\x
7a\x1c\x0c\x6a\xa7\x8b\x19\x38\x37\x7f\x16\xe8\x61\x41\x68\xef\x6a\x96\x3f\x68\x2b\xb7\xec\x60\
x39\x51\xa3\xfc\xbd\x65\xdb\xb8\xff\x39\xfe\xc0\x3d\x16\x51\x7f\xc9\x7f\x8b\xbd\x88\x80\x92\xfe
\xe1\x23\x61\xd0\xf1\xd3\xf8\xfa\xce\x86\x92\x6d\x4d\xd7\x69\x50\x8b\xf1\x09\x31\xcc\x19\x15\xe
f\x37\x12\xd4\xbd\x3d\x0d\x6e\xbb\x28\x3e\xac\xbb\xc4\xdb\x98\xb5\x85\xa6\x19\x11\x74\xe9\xab\x
df', 1)

$ python dist/hello.py
Hello, world!

Tool Example 2 — Screenshots

A screenshot can easily be taken with python-mss like this:

from mss import mss

with mss() as sct:
    sct.shot()

Tool Example 3 — Web Requests

The external IP address of a compromised endpoint can easily be obtained using requests like so:

import requests

external_ip = requests.get('http://whatismyip.akamai.com/').text

THE STRENGTH OF EVAL()

The eval() function is very powerful and can be used to execute strings of Python code from within the Python program itself. This single function is often seen as an advanced capability in compiled malware. It is the ability to run high-level scripts or “plugins” on-the-fly when utilized correctly. This is similar to when C malware includes a Lua scripting engine to give the malware the ability to execute Lua scripts. This has been seen in high-profile malware such as Flame.

Let’s imagine a hypothetical APT group is interacting remotely with some Python-based malware. If this group came into an unexpected situation where they needed to react quickly, being able to directly execute Python code on the end target would be highly beneficial. In addition, the Python malware could be placed on a target effectively “featureless” and capabilities could be executed on the target on an as-needed basis to remain stealthy.

INTO THE WILD

Image for post

Image Source: Lord of the Rings — Fellowship of the Ring

SeaDuke

Some fantastic analysis of SeaDuke was conducted by Palo Alto’s Unit 42. The decompiled Python source code Unit 42 uncovered can be found here. In addition, F-Secure published a great whitepaper on Duke malware that covers SeaDuke and associated malware.

The SeaDuke malware is a Python trojan that was made into a Windows executable using PyInstaller and packed with UPX. The Python source code was obfuscated to make the code more difficult for analysts to read. The malware had many capabilities including several methods to establish persistence on Windows, ability to run cross-platform, and perform web requests for command & control.

Image for post

PWOBot

The malware was very full featured and included the ability to log key strokes, establish persistence on Windows, download & execute files, execute Python code, create web requests, and mine cryptocurrency. Some great analysis of PWOBot was conducted by Palo Alto’s Unit 42.

PyLocky

Some great analysis of PyLocky was conducted by Trend Micro. Talos Intelligence analysts reversed engineered PyLocky and were able to create a file decryptor for victims to restore their encrypted files.

PoetRAT

The malware was dropped using malicious Microsoft Word documents. The RAT presented many capabilities for stealing information including file extraction over FTP, taking images with webcams, uploading additional tools, keylogging, browser enumeration, and credential theft. Talos Intelligence reported on this threat actor and produced a fantastic writeup on the unknown actor that used this malware.

This short script was used by the threat actor to capture web cam images:

Image for post

Image Source: Talos Intelligence

Open Source

PYTHON MALWARE ANALYSIS TOOLS

uncompyle6

For example, taking our “Hello, world!” script from earlier and executing it as a module I’m presented with a pyc file (byte code). We can recover the source code of our script by using uncompyle.

$ xxd hello.cpython-38.pyc 
00000000: 550d 0d0a 0000 0000 16f3 075f 1700 0000  U.........._....
00000010: e300 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0002 0000 0040 0000 0073 0c00 0000 6500  [email protected].
00000030: 6400 8301 0100 6401 5300 2902 7a0d 4865  d.....d.S.).z.He
00000040: 6c6c 6f2c 2077 6f72 6c64 214e 2901 da05  llo, world!N)...
00000050: 7072 696e 74a9 0072 0200 0000 7202 0000  print..r....r...
00000060: 00fa 2d2f 686f 6d65 2f75 7365 722f 746d  ..-/home/user/tm
00000070: 702f 7079 7468 6f6e 5f61 7274 6963 6c65  p/python_article
00000080: 2f6e 2f74 6573 742f 6865 6c6c 6f2e 7079  /n/test/hello.py
00000090: da08 3c6d 6f64 756c 653e 0100 0000 f300  ........
000000a0: 0000 00

$ uncompyle6 hello.cpython-38.pyc | grep -v '#'
print('Hello, world!')

pyinstxtractor.py (PyInstaller Extractor)

> python pyinstxtractor.py hello.exe
...

This will produce pyc files you can then use with the uncompyle6 decompiler to recover source code.

python-exe-unpacker

> python python_exe_unpack.py -i hello.exe
...

DETECTING PYTHON COMPILED EXECUTABLES

PyInstaller writes the string “pyi-windows-manifest-filename” near the end of the executable, you can see it here in a hex editor (HxD):

Image for post

Here’s a YARA rule for detecting PyInstaller compiled executables (Source):

import "pe"

rule PE_File_pyinstaller
{
    meta:
        author = "Didier Stevens (https://DidierStevens.com)"
        description = "Detect PE file produced by pyinstaller"
    strings:
        $a = "pyi-windows-manifest-filename"
    condition:
        pe.number_of_resources > 0 and $a
}

Here’s a second YARA rule for detecting py2exe compiled executables (Source):

import "pe"

rule py2exe
{
  meta:
        author = "Didier Stevens (https://www.nviso.be)"
        description = "Detect PE file produced by py2exe"
  condition:
        for any i in (0 .. pe.number_of_resources - 1):
          (pe.resources[i].type_string == "P\x00Y\x00T\x00H\x00O\x00N\x00S\x00C\x00R\x00I\x00P\x00T\x00")
}

CONCLUSION

New call-to-action

Join our newsletter

Follow Us

Discover More!