The Python Oracle

Are executables produced with Cython really free of the source code?

--

Full question
https://stackoverflow.com/questions/6238...

Question links:
[Making an executable in Cython]: https://stackoverflow.com/questions/2250...
[How to obfuscate Python code effectively?]: https://stackoverflow.com/questions/3344...
[Protecting Python Sources With Cython]: https://medium.com/@xpl/protecting-pytho...
[uncompyle6]: https://pypi.org/project/uncompyle6/
[Is it possible to decompile a .dll/.pyd file to extract Python Source Code?]: https://stackoverflow.com/questions/3560...

Accepted answer links:
[__Pyx_AddTraceback]: https://github.com/cython/cython/blob/25...
[a huge loop in ]: https://github.com/python/cpython/blob/3...
[being faster]: https://stackoverflow.com/a/46723823/576...
[MSVC with ]: https://learn.microsoft.com/en-us/cpp/bu...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #cython




ACCEPTED ANSWER

Score 28


The code shown in the error message is read from the original pyx-file located next to your exe. Delete this pyx-file, or simply don't distribute it with your exe.


When you look at the generated C-code, you will see why the error message is shown by your executable:

For a raised error, Cython emits code similar to the following:

__PYX_ERR(0, 11, __pyx_L3_error)

where __PYX_ERR is a macro defined as:

#define __PYX_ERR(f_index, lineno, Ln_error) \
{ \
  __pyx_filename = __pyx_f[f_index]; __pyx_lineno = lineno; __pyx_clineno = __LINE__; goto Ln_error; \
}

and the variable __pyx_f is defined as

static const char *__pyx_f[] = {
  "test.pyx",
  "stringsource",
};

In essence, __pyx_f[0] records where the original code can be found. When an exception is raised, the (embedded) Python interpreter looks for your original pyx-file and reads the corresponding line from it (this can be seen in __Pyx_AddTraceback, which is called when an error is raised).

Once this pyx-file is no longer around, the original source code is not known to the Python interpreter or to anybody else. The error trace will still show the function names and line numbers, but no code snippets.
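The same lookup-at-display-time behaviour can be observed in plain Python, which may make the mechanism easier to see. The sketch below is only an analogy and does not involve Cython: it compiles a small snippet under a file name (demo_module.py, a hypothetical name chosen for illustration), triggers an error once with the file on disk and once after deleting it, and compares the formatted tracebacks.

```python
import linecache
import os
import tempfile
import traceback


def format_error(code_obj):
    """Run the code object and return the formatted traceback as a string."""
    try:
        exec(code_obj, {})
    except ZeroDivisionError:
        return traceback.format_exc()
    return ""


# Hypothetical source used only for this demonstration.
source = "def f():\n    return 1 / 0\n\nf()\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_module.py")
    with open(path, "w") as fh:
        fh.write(source)

    # The code object only remembers the file name and line numbers.
    code_obj = compile(source, path, "exec")

    # File is present: the traceback machinery reads the line from disk.
    with_source = format_error(code_obj)

    # File is gone: only file name, line number and function name remain.
    os.remove(path)
    linecache.clearcache()
    without_source = format_error(code_obj)

print(with_source)
print(without_source)
```

The second traceback should still name demo_module.py, the line number and the function f, but it can no longer show the offending source line, which mirrors what happens to the Cython executable once the pyx-file is removed.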

The resulting executable (or extension, if one builds one) doesn't contain any bytecode (as pyc-files do) and cannot be decompiled with tools like uncompyle. Bytecode is produced when a py-file is translated into Python opcodes, which are then evaluated in a huge loop in ceval.c. Builtin/Cython modules need no bytecode because the resulting code uses Python's C-API directly, cutting out the need to have or evaluate opcodes. These modules skip interpretation, which is one reason for them being faster. Thus no bytecode will be in the executable.
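The difference between interpreted and compiled callables can be inspected from Python itself. In the minimal sketch below, the builtin len serves as a stand-in for C-implemented code; that choice is an assumption made for illustration, and Cython's own function objects may expose somewhat different attributes.

```python
import dis


def add(a, b):
    """A pure-Python function: the interpreter stores bytecode for it."""
    return a + b


# A pure-Python function carries a code object with raw bytecode.
python_has_code = hasattr(add, "__code__")
bytecode_length = len(add.__code__.co_code)
opnames = [ins.opname for ins in dis.get_instructions(add)]

# A builtin implemented in C has no code object, so there is nothing
# for a bytecode decompiler to work on.
builtin_has_code = hasattr(len, "__code__")

print("add has bytecode:", python_has_code, "-", bytecode_length, "bytes")
print("opcodes:", opnames)
print("len has bytecode:", builtin_has_code)
```

Bytecode decompilers need this kind of code object as input, which is why they have nothing to work with when the logic lives in compiled machine code.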

One important note though: check that the build doesn't include debug information (and thus the C-code, in which the pyx-file content can be found as comments). MSVC with the /Z7 option is one such example.
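One rough way to check a build is to search the shipped file for distinctive lines of the pyx source. The helpers below (find_source_fragments and distinctive_lines) are hypothetical names written for this sketch, not part of Cython or any build tool, and a plain byte search will miss anything that is compressed or encoded differently.

```python
from pathlib import Path


def distinctive_lines(pyx_text, min_length=12):
    """Pick stripped source lines long enough to be worth searching for."""
    lines = (line.strip() for line in pyx_text.splitlines())
    return [line for line in lines if len(line) >= min_length]


def find_source_fragments(binary_path, fragments):
    """Return the fragments that occur verbatim (UTF-8 encoded) in the file."""
    data = Path(binary_path).read_bytes()
    return [frag for frag in fragments if frag.encode("utf-8") in data]
```

A call such as find_source_fragments("dist/app.exe", distinctive_lines(Path("test.pyx").read_text())) would list source lines that ended up in the binary; the path dist/app.exe is a placeholder. An empty result is only weak evidence, since the file name itself and the function names are expected to remain in the executable anyway.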


However, the resulting executable can be disassembled, and the generated C-code can then be reverse engineered. So while cythonizing is OK for making the code hard to understand, it is not the right tool for concealing keys or security algorithms.