HTTP POST analyzer in Python

I was curious what kind of information my computer was sending to the outside world so I whipped up a simple HTTP POST analyzer/logger. https://github.com/DanMcInerney/postanalyzer

#!/usr/bin/python
import logging
logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
from scapy.all import *
log = open('postanalyzer.log', 'a')
prev_ack = 0
prev_body = ''
interface = 'wlan0'
def cb(pkt):
    global prev_ack, prev_body
    post_found = 0
    if pkt.haslayer(Raw):
        load = repr(pkt[Raw].load)[1:-1]
        try:
            headers, body = load.split(r"\r\n\r\n", 1)
        except:
            headers = load
            body = ''
        ack = pkt[TCP].ack
        if prev_ack == ack:
            newBody = prev_body+headers
            print 'Fragment found; combined body:\n\n', newBody
            print '-----------------------------------------'
            prev_body = newBody
            log.write('Fragment found; combined body:\n\n'+newBody+'\n-----------------------------------------\n')
            return
        header_lines = headers.split(r"\r\n")
        for h in header_lines:
            if 'post /' in h.lower():
                post_found = h.split(' ')[1]
        if post_found:
            for h in header_lines:
                if 'host: ' in h.lower():
                    host = h.split(' ')[1]
                    print 'URL:',host+post_found
                elif 'referer: ' in h.lower():
                    print h
            prev_body = body
            prev_ack = ack
            if body != '':
                print '\n'+body
                print '-----------------------------------------'
            log.write(pkt.summary()+'\n')
            for h in header_lines:
                log.write(h+"\n")
            if body != '':
                log.write(body)
            log.write('\n-----------------------------------------\n')
sniff(iface=interface, filter='tcp port 80', prn=cb, store=0)

Breakdown

import logging
logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
from scapy.all import *
log = open('postanalyzer.log', 'a')
prev_ack = 0
prev_body = ''
interface = 'wlan0'

Import logging before scapy so the log level is already set when scapy loads; this suppresses the annoying warnings it prints every time it runs. Open the log file next, then set up a few global variables. We open the log with the 'a' argument so that we’re appending to the file rather than overwriting it each time.
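A quick way to see the difference between the two file modes, as a sketch using a throwaway temp file. The file name and the write_line / read_all helpers are made up for illustration and are not part of postanalyzer:

```python
import os
import tempfile

def write_line(path, mode, text):
    # Open, write one line, and close again, like one run of the script.
    with open(path, mode) as f:
        f.write(text + '\n')

def read_all(path):
    with open(path) as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), 'demo.log')

# Two "runs" in append mode: both lines survive.
write_line(path, 'a', 'first run')
write_line(path, 'a', 'second run')
appended = read_all(path)

# One more "run" in write mode: everything written before it is gone.
write_line(path, 'w', 'third run')
overwritten = read_all(path)
```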
—————————————————–

def cb(pkt):
    global prev_ack, prev_body
    post_found = 0
    if pkt.haslayer(Raw):
        load = repr(pkt[Raw].load)[1:-1]

Define the callback function, then set a condition for the packet. We only continue processing the packet if it has a Raw layer, which is where scapy puts any payload it can’t dissect further. On its own this check would not be enough, because non-TCP packets can carry a Raw layer too. But since sniff() is called with filter='tcp port 80', every packet we see is already TCP on port 80, and its Raw layer is almost always the HTTP data. Last, we set up the load variable. pkt[Raw].load is the HTTP data; wrapping it in repr() and slicing off the surrounding quotes turns the line endings into the literal text \r\n, which is what the raw-string splits later in the script look for.

pkt[Raw] is the layer object, while pkt[Raw].load is the payload string it carries. Printing either one gives me much the same output, but all the other examples I see use pkt[Raw].load, so I’m going to continue using that.
—————————————————–

try:
    headers, body = load.split(r"\r\n\r\n", 1)
except:
    headers = load
    body = ''

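This separates the HTTP headers from the body. Individual headers are separated from each other by \r\n, while the header block is separated from the body by \r\n\r\n. If the load contains no \r\n\r\n, the split fails and we treat the whole load as headers with an empty body.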
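To see why the raw-string delimiter works, here is a small sketch that doesn’t need scapy. The script runs repr() on the load, which turns the real carriage-return and newline characters into the literal text \r\n, and r"\r\n" matches exactly that literal text. split_http and the sample request are made up for illustration. The sample is a plain text string; on Python 3, repr() of a bytes object adds a b prefix that [1:-1] would not strip.

```python
def split_http(load):
    # repr() escapes control characters; [1:-1] drops the surrounding quotes.
    escaped = repr(load)[1:-1]
    parts = escaped.split(r"\r\n\r\n", 1)
    if len(parts) == 2:
        headers, body = parts
    else:
        # No blank line found: treat everything as headers, as the script does.
        headers, body = escaped, ''
    return headers.split(r"\r\n"), body

sample = "POST /login HTTP/1.1\r\nHost: example.com\r\n\r\nuser=bob&pw=hunter2"
header_lines, body = split_http(sample)
```

Here header_lines ends up as a list with one entry per header, and body holds the form data after the blank line.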
—————————————————–

ack = pkt[TCP].ack
if prev_ack == ack:
    newBody = prev_body+headers
    print 'Fragment found; combined body:\n\n', newBody
    print '-----------------------------------------'
    prev_body = newBody
    log.write('Fragment found; combined body:\n\n'+newBody+'\n-----------------------------------------\n')
    return

Here we’re checking to see if the packet is a fragment. This is something I was stuck on for a while when I was writing LANs.py because I didn’t understand how TCP worked fundamentally. If an HTTP request carries too much data to fit in one packet, it gets split across several TCP segments, which I’ll call fragments here. Every fragment after the first has no HTTP headers, just data. This means you can’t parse the HTTP headers to figure out if it’s a fragment, so you must use the data in the TCP layer alone. TCP allows for consistent and reliable data transfer by ensuring packets are received in order and on time. It does this using a sequence number and an acknowledgement number.

Basically, and without taking a whole post to explain this, when a TCP connection is open and the server sends you a packet of data, that packet has a sequence number and an acknowledgement number, which you can access with scapy via pkt[TCP].seq and pkt[TCP].ack respectively. When your computer receives that packet it sends a TCP packet back to the server. The reply’s acknowledgement number is the server’s sequence number plus the length of the data just received, and the reply’s sequence number is the acknowledgement number from the server’s packet. It’s hard to grasp from words, so look at this graphic from packetlife.net:
[Image: tcp_flow (TCP sequence and acknowledgement number diagram from packetlife.net)]
To sum this up quickly, the ack only changes when the client or server receives data and wants the other side to know it. In our case, where one machine sends another a bunch of packet fragments, the ack won’t change at all between those fragments. It only changes once the other side sends data of its own back, and from there on that ack value won’t come up again in the conversation. This means the only thing we need to tell whether a packet is a fragment is the ack. If the ack is the same as that of the previous packet received or sent, then the packet is a fragment of the first packet with that ack.
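The arithmetic can be sketched with plain numbers. The sequence numbers and payload sizes below are invented, and ack_for is a hypothetical helper. It only models the rule that an acknowledgement equals the sender’s sequence number plus the number of data bytes received.

```python
def ack_for(seq, payload_len):
    # The receiver acknowledges the next byte it expects from the sender.
    return seq + payload_len

# The client sends a 1000-byte POST in two segments of 600 and 400 bytes.
client_seq = 5000
server_seq = 9000  # next byte the client expects from the server

# Both segments carry the same ack, because the server sent no data in between.
segment_acks = [server_seq, server_seq]

# The sequence number of each segment advances by the bytes already sent.
segment_seqs = [client_seq, ack_for(client_seq, 600)]

# After receiving everything, the server acknowledges all 1000 bytes.
server_reply_ack = ack_for(segment_seqs[1], 400)
```

The two client segments share an ack of 9000 while their sequence numbers differ, which is the pattern the script keys on.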

Once we determine it’s a fragment, we append its load to the previous packet’s body and store the combination back in prev_body. Eventually that variable contains all the fragmented raw loads in one pretty string.
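The same bookkeeping can be tried without a network. This sketch replaces scapy packets with hypothetical (ack, payload) tuples and is a simplified version of the prev_ack / prev_body logic; the reassemble function and the sample data are assumptions for illustration. Like the script, it assumes segments arrive in order and that only one connection is active at a time.

```python
def reassemble(packets):
    """Group consecutive payloads that share an ack number."""
    messages = []
    prev_ack = None
    for ack, payload in packets:
        if ack == prev_ack and messages:
            # Same ack as the previous packet: treat it as a continuation.
            messages[-1] += payload
        else:
            messages.append(payload)
            prev_ack = ack
    return messages

packets = [
    (9000, 'user=bob&comment=aaaa'),
    (9000, 'bbbb'),
    (9000, 'cccc'),
    (9350, 'next=request'),
]
messages = reassemble(packets)
```

The first three payloads are joined into one message and the fourth starts a new one, because its ack differs.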
—————————————————–

header_lines = headers.split(r"\r\n")
for h in header_lines:
    if 'post /' in h.lower():
        post_found = h.split(' ')[1]

We already split the HTTP data into headers and body, so now let’s split the headers into individual lines and check if any of them contain 'post /', which indicates it’s an HTTP POST packet. Once we’ve determined it’s a POST packet, we copy the location it’s POSTing to via the last line here. The entire header line looks like this:

POST /wp-admin/post.php HTTP/1.1\r\n

So we split it into pieces with a space as the delimiter, which gives us a list of 3 items: POST, the location, and the HTTP version. Take the second item with list[1]. This also changes post_found from 0 to a non-empty string, which we can use as a test for whether the packet is a POST packet or not.
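As a standalone sketch of that step, find_post_path below is a hypothetical helper that does the same scan over header lines that have already been split. It returns None rather than 0 when no POST line is present, which is a small departure from the script.

```python
def find_post_path(header_lines):
    for line in header_lines:
        if 'post /' in line.lower():
            # 'POST /path HTTP/1.1' -> ['POST', '/path', 'HTTP/1.1']
            return line.split(' ')[1]
    return None

post_headers = ['POST /wp-admin/post.php HTTP/1.1', 'Host: example.com']
get_headers = ['GET /index.html HTTP/1.1', 'Host: example.com']

post_path = find_post_path(post_headers)
get_path = find_post_path(get_headers)
```

A stricter check would be line.startswith('POST '), since the substring test also matches any header whose value happens to contain 'post /'.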
—————————————————–

if post_found:
    for h in header_lines:
        if 'host: ' in h.lower():
            host = h.split(' ')[1]
            print 'URL:',host+post_found
        elif 'referer: ' in h.lower():
            print h
    prev_body = body
    prev_ack = ack
    if body != '':
        print '\n'+body
        print '-----------------------------------------'

Now if we determine it’s a POST packet, we run through the headers again. The Host header is just the domain name, while the POST line is the one that specifies where on that domain the document we’re sending to lives. If the packet has a Referer header then we include that in the output, since it might be somewhat interesting and could flag the packet as coming from somewhere we didn’t want it to come from.

Once we parse the headers we reset the global variables prev_body and prev_ack so we can continually check for packet fragments and finally if the body of the raw load isn’t empty we print that to the screen.
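Here is a sketch of the same header walk as a function that returns values instead of printing them. describe_post is a made-up name and the header lines are sample data, not captured traffic.

```python
def describe_post(header_lines, path):
    url = None
    referer = None
    for line in header_lines:
        lowered = line.lower()
        if 'host: ' in lowered:
            # 'Host: example.com' -> 'example.com', then append the POST path.
            url = line.split(' ')[1] + path
        elif 'referer: ' in lowered:
            referer = line
    return url, referer

headers = [
    'POST /wp-admin/post.php HTTP/1.1',
    'Host: example.com',
    'Referer: http://example.com/wp-admin/',
]
url, referer = describe_post(headers, '/wp-admin/post.php')
```

Returning the values makes the logic easy to check on its own; the script prints them directly because it only needs them on screen.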
—————————————————–

log.write(pkt.summary()+'\n')
for h in header_lines:
    log.write(h+"\n")
if body != '':
    log.write(body)
log.write('\n-----------------------------------------\n')

Simply writing the full headers and body to the log file here and separating the packets with a line. We include more data in the log file than we do in the terminal output.
—————————————————–

sniff(iface=interface, filter='tcp port 80', prn=cb, store=0)

sniff() is a function which captures packets on a specific interface. In this case I set the interface variable to be wlan0, my wireless interface. Just edit that variable to be your own interface. We’re looking for HTTP POSTs, which travel over TCP. Scapy doesn’t have HTTP support built in (although you can look here), so the narrowest we can make the capture filter is TCP port 80.
—————————————————–
Run script:

python postanalyzer.py

—————————————————–
Modify the POST data on the fly

If you want to modify the data being sent, check out my previous post on feeding scapy packets using iptables rather than the sniff() function. Reliable DNS spoofing with Python: Scapy + Nfqueue
