Fall2009/Intellectual Monopolies Reverse Engineering Exercise

From OpenSourceSoftwarePractice

The Dark Side

Tools

Download

Tutorial

The Bright Side

Launch Instances

Creating New Users

 mkuser

Log in to Your Instance

 ssh user@hostname

Get and Build Python

Download Python Source Code

 http://www.python.org/download/
 http://www.python.org/download/releases/2.6.2/
 http://www.python.org/ftp/python/2.6.2/Python-2.6.2.tar.bz2

Decompress the Source Code

 bunzip2 Python-2.6.2.tar.bz2
 tar -xf Python-2.6.2.tar

Configure

 cd Python-2.6.2 
 ./configure

Compile with Debug

  make OPT=-g

Test the Build

  ./python

type

   for i in xrange(10):
       print i


quit the interpreter

   CTRL-D

Run from the Debugger

 gdb ./python
 list
 break 23


You should see:

 Breakpoint 1 at 0x8059f85: file ./Modules/python.c, line 23.


type

 run
 list
 step
 list
 next 10


type

 break parser.c:224
 cont
 list

Tinkering

Overview

An interpreter processes each line you type in several stages:

  • Tokenizing the input
  • Building an expression tree
  • Evaluating the expression tree
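
The three stages can be observed from Python itself with the standard library. The sketch below is only an illustration: it assumes a modern Python 3 interpreter rather than the 2.6.2 build used in this exercise, and the helper name `stages` is hypothetical.

```python
import ast
import io
import tokenize


def stages(source):
    """Run one expression through the three stages and return each result."""
    # Stage 1: tokenization, splitting the text into (token type, text) pairs.
    tokens = [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    ]
    # Stage 2: building the expression tree (mode="eval" parses one expression).
    tree = ast.parse(source, mode="eval")
    # Stage 3: evaluating the tree by compiling it and running the result.
    value = eval(compile(tree, "<example>", "eval"))
    return tokens, tree, value


tokens, tree, value = stages("(2 + 3) * 4")
print(tokens[:3])                 # the first few tokens
print(type(tree.body).__name__)   # the kind of node at the root of the expression
print(value)                      # the evaluated result
```

Each stage consumes the output of the previous one, which is why a breakpoint in the parser is reached before any evaluation happens.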

Grep is your Friend

 grep  regularexpression filestosearch

example

   grep  token  *.c

useful options

  • -r Search recursively down directories
  • -i Ignore case
  • -v Return the lines where the expression was not found
  • -n Print the line numbers where the expression was found
  • -l Only list filenames where the expression was found


Common use

   grep  -i -r  token  *
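
To make the options concrete, here is a rough Python sketch of what `grep -i -r -n` does: walk the directories, match while ignoring case, and report line numbers. It assumes Python 3, the function name `grep_rin` is hypothetical, and it leaves out much of what the real grep does (binary-file handling, for example).

```python
import os
import re
import tempfile


def grep_rin(pattern, root):
    """Yield (path, line number, line) for each case-insensitive match under root."""
    regex = re.compile(pattern, re.IGNORECASE)            # -i: ignore case
    for dirpath, _dirnames, filenames in os.walk(root):   # -r: recurse into directories
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="replace") as handle:
                    # -n: number the lines, starting from 1
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # skip files that cannot be read


# Small demonstration on a throwaway directory (the file name is made up).
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "demo.c"), "w") as handle:
        handle.write("int x;\nTOKEN t;\n")
    matches = list(grep_rin("token", root))

for path, number, line in matches:
    print("%s:%d:%s" % (os.path.basename(path), number, line))
```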


ack is a better friend

To do all the nice things above in a simple way:

  ack <search term>

or in Ubuntu:

  ack-grep  <search term>

(the package can be installed in Ubuntu with)

  sudo apt-get install ack-grep

Look at the Tokenizer Code

   cd Parser
  • edit parser.c
  • go to line 21


Replace


  #define D(x) 

with


  #define D(x)  x

Save

  cd ..
  make OPT=-g

Run python again

  ./python

type

   for i in xrange(10):
       print i

Look at the output, and find which lines in parser.c generate it.

Restore line 21 of parser.c to its original state.

Now search manually for the uses of the macro "D()". For example, there is one on line 226:

  D(printf("Token %s/'%s' ... ", _PyParser_TokenNames[type], str));

Remove the "D(" at the beginning and the ")" at the end to get:

  printf("Token %s/'%s' ... ", _PyParser_TokenNames[type], str);


Recompile, and launch the modified Python interpreter.

Then, at the Python prompt, type:

   a = 2

you will see:

  >>> a = 2
  Token NAME/'a' ... It's a token we know
  Token EQUAL/'=' ... It's a token we know
  Token NUMBER/'2' ... It's a token we know
  Token NEWLINE/ ... It's a token we know

type now:

  b = 3

you will see:

  >>> b = 3
  Token NAME/'b' ... It's a token we know
  Token EQUAL/'=' ... It's a token we know
  Token NUMBER/'3' ... It's a token we know
  Token NEWLINE/ ... It's a token we know   

and type now

  c = ( a + b ) * 4

you will see:

  >>> c = ( a + b ) * 4
  Token NAME/'c' ... It's a token we know
  Token EQUAL/'=' ... It's a token we know
  Token LPAR/'(' ... It's a token we know
  Token NAME/'a' ... It's a token we know
  Token PLUS/'+' ... It's a token we know
  Token NAME/'b' ... It's a token we know
  Token RPAR/')' ... It's a token we know
  Token STAR/'*' ... It's a token we know
  Token NUMBER/'4' ... It's a token we know
  Token NEWLINE/ ... It's a token we know
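
For comparison, the standard `tokenize` module reports similar token names without recompiling the interpreter. This is a sketch that assumes Python 3 (the `exact_type` attribute is not available in the 2.6.2 build above). It goes through the library's tokenizer interface, not the instrumented parser.c, and the helper name `token_names` is hypothetical.

```python
import io
import tokenize


def token_names(source):
    """Return (token name, token text) pairs for one line of source."""
    readline = io.StringIO(source).readline
    return [
        # exact_type distinguishes operators such as LPAR, PLUS and STAR.
        (tokenize.tok_name[tok.exact_type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type != tokenize.ENDMARKER   # drop the end-of-input marker
    ]


pairs = token_names("c = ( a + b ) * 4\n")
for name, text in pairs:
    print("Token %s/%r" % (name, text))
```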

Go Wild

Find the location in the code where the expression tree is built, and print it.

Parser

http://docs.python.org/library/parser.html

Using the ast (Abstract Syntax Trees) module:

http://docs.python.org/library/ast.html

Type in the interpreter:

  import ast
  nd = ast.parse("a=2")
  ast.dump(nd)

this will print

  "Module(body=[Assign(targets=[Name(id='a', ctx=Store())], value=Num(n=2))])"

Type

  ast.dump( ast.parse("(aa+bb)*cc") )

This will print

  "Module(body=[Expr(value=BinOp(left=BinOp(left=Name(id='aa', ctx=Load()), op=Add(), right=Name(id='bb', ctx=Load())), op=Mult(), right=Name(id='cc', ctx=Load())))])"

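The one-line dump is hard to read once expressions nest. As a rough sketch, the tree can be printed with one node per line, indented by depth. The helper name `tree_lines` is hypothetical, and it relies only on `ast.iter_child_nodes` from the standard library.

```python
import ast


def tree_lines(node, depth=0):
    """Return one indented line per node, parents before children."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(tree_lines(child, depth + 1))
    return lines


lines = tree_lines(ast.parse("(aa+bb)*cc"))
print("\n".join(lines))
```

The inner BinOp (the addition) appears as the left child of the outer BinOp (the multiplication), which reflects the grouping forced by the parentheses.
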
Python's Grammar

Look at the mysteriously named file:

     Grammar/Grammar