Fall2009/Intellectual Monopolies Reverse Engineering Exercise
The Dark Side
Tools
Download
- http://www.ollydbg.de/
- With the great irony:
- http://www.ollydbg.de/download.htm
- Read the license terms that restrict "reverse engineering" of this tool itself :-)
Tutorial
The Bright Side
Launch Instances
Creating New Users
mkuser
Log in to your Instance
ssh user@hostname
Get and Build Python
Download Python Source Code
http://www.python.org/download/
http://www.python.org/download/releases/2.6.2/
http://www.python.org/ftp/python/2.6.2/Python-2.6.2.tar.bz2
Decompress the Source Code
bunzip2 Python-2.6.2.tar.bz2
tar -xf Python-2.6.2.tar
Configure
cd Python-2.6.2
./configure
Compile with Debug
make OPT=-g
Test the Build
./python
type
for i in xrange(10):
    print i
quit the interpreter
CTRL-D
Run from the Debugger
gdb ./python
list
break 23
Should see:
Breakpoint 1 at 0x8059f85: file ./Modules/python.c, line 23.
type
run
list
step
list
next 10
type
break parser.c:224
cont
list
Tinkering
Overview
An interpreter processes your lines in several stages:
- Tokenization
- Building an expression tree
- Evaluating the expression tree
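The three stages listed above can be observed from Python itself. The following is only a rough sketch: it assumes a modern Python 3 interpreter (not the 2.6.2 build used in this exercise), and it uses the standard library modules tokenize and ast as stand-ins for the C code in the interpreter. The expression in `source` is an arbitrary example.

```python
import ast
import io
import tokenize

source = "(2 + 3) * 4"  # arbitrary example expression (an assumption)

# Stage 1: tokenization. Keep only the tokens that carry text we typed.
wanted = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in wanted
]

# Stage 2: building an expression tree.
tree = ast.parse(source, mode="eval")

# Stage 3: evaluating the expression tree (compiled to bytecode first).
code = compile(tree, "<example>", "eval")
result = eval(code)

print(tokens)
print(ast.dump(tree))
print(result)
```

The C interpreter performs the same steps internally; the rest of this exercise looks at where they happen in the source tree.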
Grep is your Friend
grep regularexpression filestosearch
example
grep token *.c
useful options
- -r Search recursively down directories
- -i Ignore case
- -v Return the lines where the expression was not found
- -n Print the line numbers where the expression was found
- -l Only list filenames where the expression was found
Common use
grep -i -r token *
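To make clear what the combination of -i, -r and -n does, here is a minimal Python 3 sketch of the same search. The function name `grep_recursive` is hypothetical and this is not how grep is implemented; it only mimics the behaviour of the options described above.

```python
import os
import re

def grep_recursive(pattern, root, ignore_case=True):
    """Yield (path, line_number, line) for every matching line under root."""
    flags = re.IGNORECASE if ignore_case else 0   # like -i
    regex = re.compile(pattern, flags)
    for dirpath, _dirnames, filenames in os.walk(root):  # like -r
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="replace") as handle:
                    # enumerate from 1 so the numbers match grep -n
                    for lineno, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it

if __name__ == "__main__":
    for path, lineno, line in grep_recursive("token", "."):
        print("%s:%d:%s" % (path, lineno, line))
```

In practice grep (or ack, below) is far faster; the sketch is only meant to show what is being searched and reported.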
ack is a better friend
To do all the nice things above in a simple way:
ack <search term>
or in Ubuntu:
ack-grep <search term>
(the package can be installed in Ubuntu with)
sudo apt-get install ack-grep
Look at the Tokenizer Code
cd Parser
- edit parser.c
- go to line 21
Replace
#define D(x)
with
#define D(x) x
Save
cd ..
make OPT=-g
Run python again
./python
type
for i in xrange(10):
    print i
Look at the output. Find which lines in parser.c generate this output.
Restore line 21 of parser.c to its original state.
Now manually search for the occurrences of the macro "D()". For example, there is one on line 226:
D(printf("Token %s/'%s' ... ", _PyParser_TokenNames[type], str));
Remove the "D(" at the beginning and the ")" at the end to get:
printf("Token %s/'%s' ... ", _PyParser_TokenNames[type], str);
Recompile and launch the modified Python interpreter.
Then type at the Python prompt:
a = 2
you will see:
>>> a = 2
Token NAME/'a' ... It's a token we know
Token EQUAL/'=' ... It's a token we know
Token NUMBER/'2' ... It's a token we know
Token NEWLINE/ ... It's a token we know
type now:
b = 3
you will see:
>>> b = 3
Token NAME/'b' ... It's a token we know
Token EQUAL/'=' ... It's a token we know
Token NUMBER/'3' ... It's a token we know
Token NEWLINE/ ... It's a token we know
and type now
c = ( a + b ) * 4
you will see:
>>> c = ( a + b ) * 4
Token NAME/'c' ... It's a token we know
Token EQUAL/'=' ... It's a token we know
Token LPAR/'(' ... It's a token we know
Token NAME/'a' ... It's a token we know
Token PLUS/'+' ... It's a token we know
Token NAME/'b' ... It's a token we know
Token RPAR/')' ... It's a token we know
Token STAR/'*' ... It's a token we know
Token NUMBER/'4' ... It's a token we know
Token NEWLINE/ ... It's a token we know
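A similar token listing can be produced without modifying the C source. This is a sketch that assumes a modern Python 3 interpreter and its standard tokenize module, which is a separate pure-Python tokenizer and not the code in parser.c. The helper name `token_names` is hypothetical.

```python
import io
import token
import tokenize

def token_names(source):
    """Return (token name, token text) pairs for one line of source."""
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == token.ENDMARKER:
            continue  # end-of-input marker, not shown in the output above
        # exact_type distinguishes operators such as LPAR, PLUS and STAR
        pairs.append((token.tok_name[tok.exact_type], tok.string))
    return pairs

for name, text in token_names("c = ( a + b ) * 4\n"):
    print("Token %s/%r" % (name, text))
```

The token names come out in the same order as in the debug output above, which suggests the two tokenizers agree on this input.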
Go Wild
Find the location in the code where the expression tree is built, and print it.
Parser
http://docs.python.org/library/parser.html
Using the ast (Abstract Syntax Trees) module:
http://docs.python.org/library/ast.html
Type in the interpreter:
import ast
nd = ast.parse("a=2")
ast.dump(nd)
this will print
"Module(body=[Assign(targets=[Name(id='a', ctx=Store())], value=Num(n=2))])"
Type
ast.dump( ast.parse("(aa+bb)*cc") )
this will print
"Module(body=[Expr(value=BinOp(left=BinOp(left=Name(id='aa', ctx=Load()), op=Add(), right=Name(id='bb', ctx=Load())), op=Mult(), right=Name(id='cc', ctx=Load())))])"
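The one-line dump is hard to read for nested expressions. As a Python-level analogue of the "print the expression tree" task, the sketch below prints one node per line, indented by depth. It assumes a modern Python 3 interpreter, where node names may differ from the 2.6 output shown above; the helper name `tree_lines` is hypothetical.

```python
import ast

def tree_lines(node, depth=0):
    """Return one indented line per AST node, in depth-first order."""
    label = type(node).__name__
    if isinstance(node, ast.Name):
        label += " id=%r" % node.id  # show which variable the node names
    lines = ["  " * depth + label]
    for child in ast.iter_child_nodes(node):
        lines.extend(tree_lines(child, depth + 1))
    return lines

lines = tree_lines(ast.parse("(aa+bb)*cc"))
print("\n".join(lines))
```

The inner BinOp (the addition) appears as the left child of the outer BinOp (the multiplication), which reflects the parentheses in the source expression.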
Python's Grammar
Look at the mysteriously named file:
Grammar/Grammar
