XSScrapy: fast, thorough XSS vulnerability spider


Unsatisfied with the current crop of XSS-finding tools, I wrote one myself and am very pleased with the results. I have tested this script against other spidering tools like ZAP, Burp, XSSer, XSSsniper, and others, and it has found more vulnerabilities in every case. This tool has scored me dozens of responsible disclosures in major websites including an Alexa Top 10 homepage, major financial institutions, and large security firms’ pages. Even the site of the Certified Ethical Hacker certificate progenitors fell victim, although that shouldn’t impress you much if you actually know anything about EC-Council :). For the record, they did not offer me a discounted CEH. Shame, but thankfully this script has rained rewards upon my head like Bush/Cheney on Halliburton: hundreds of dollars, loot, and Halls of Fame in just a few weeks of bug bounty hunting.

I think I’ve had my fill of fun with it so I’d like to publicly release it now. Technically I publicly released it the first day I started on it, since it’s been on my GitHub the whole time, but judging by the GitHub traffic graph it’s not exactly the Bieber of security tools. Hopefully more people will find some use for it after this article, which will outline its logic, usage, and shortcomings.

Basic usage

Install the prerequisite python libraries, give it a URL, and watch it spider the entire site looking in every nook and cranny for XSS vulnerabilities.

apt-get install python-pip
git clone https://github.com/DanMcInerney/xsscrapy
cd xsscrapy
pip install -r requirements.txt
scrapy crawl xsscrapy -a url="http://example.com"

To log in and then scrape:

scrapy crawl xsscrapy -a url="http://example.com/login" -a user=my_username -a pw=my_password

All vulnerabilities it finds will be placed in formatted-vulns.txt. Example output when it finds a vulnerable user agent header:


XSS attack vectors xsscrapy will test

  • Referer header (way more common than I thought it would be!)
  • User-Agent header
  • Cookie header (added 8/24/14)
  • Forms, both hidden and explicit
  • URL variables
  • End of the URL, e.g. www.example.com/<script>alert(1)</script>
  • Open redirect XSS, e.g. looking for links where it can inject a value of javascript:prompt(1)

XSS attack vectors xsscrapy will not test

  • Other headers

Let me know if you know of other headers you’ve seen XSS-exploitable in the wild and I may add checks for them in the script.

  • Persistent XSS’s reflected in pages other than the immediate response page

If you can create something like a calendar event with an XSS in it but you can only trigger it by visiting a specific URL that’s different from the immediate response page then this script will miss it.


  • DOM XSS

DOM XSS will go untested.

  • CAPTCHA protected forms

This should probably go without saying, but captchas will prevent the script from testing forms that are protected by them.

  • AJAX

Because Scrapy is not a browser, it will not render javascript so if you’re scanning a site that’s heavily built on AJAX this scraper will not be able to travel to all the available links. I will look into adding this functionality in the future although it is not a simple task.

Test strings

There are few XSS spiders out there, but the ones that do exist tend to just slam the target with hundreds to thousands of different XSS payloads then look for the exact reflection. This is silly. If < and > are filtered then <img src=x onerror=alert(1)> is going to fail just as hard as <script>alert(1)</script> so I opted for some smart targeting more along the logical lines ZAP uses.


9zqjx

When doing the initial testing for reflection points in the source code this is the string that is used. It is short, uses a very rare letter combination, and doesn’t use any HTML characters so that lxml can accurately parse the response without missing it.


'"()=<x>

This string is useful as it has every character necessary to execute an XSS payload. The “x” between the angle brackets helps prevent false positives that may occur, like in some ASP filters that allow < and > but not if there are any characters between them.


Embedded javascript injection. The most important character for executing an XSS payload inside embedded javascript is ' or ", which are necessary to bust out of the variable that you’re injecting into. The other characters may be necessary to create functional javascript. This attack path is useful because it doesn’t require < or >, which are the most commonly filtered characters.


If we find an injection point like <a href="INJECTION"> then we don’t need ', ", <, or > because we can just use a javascript: URL payload that calls prompt(99) to trigger the XSS. We add a few capital letters to bypass poorly written regex filters, use prompt rather than alert because alert is also commonly filtered, and use 99 since it doesn’t require quotes, is short, and is not 1, which as you can guess is also filtered regularly.


Xsscrapy will start by pulling down the list of disallowed URLs from the site’s robots.txt file to add to the queue, then start spidering the site, randomly choosing 1 of 6 common user agents. The script is built on top of the web spidering library Scrapy, which in turn is built on the asynchronous Twisted framework, since in Python asynchronous code is simpler, speedier and more stable than threading. You can choose the number of concurrent requests in settings.py; by default it’s 12. I also chose to use the lxml library for parsing responses since it’s 3-4x as fast as the popular alternative BeautifulSoup.
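As a rough illustration of the robots.txt step, here is a minimal sketch using only the standard library. The function name disallowed_urls and the decision to skip wildcard rules are my own assumptions for the example, not necessarily how xsscrapy does it.

```python
from urllib.parse import urljoin


def disallowed_urls(base_url, robots_txt):
    """Return absolute URLs for every Disallow path in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line.lower().startswith("disallow:"):
            continue
        path = line.split(":", 1)[1].strip()
        # Skip empty rules and wildcard patterns that are not real paths.
        if not path or "*" in path:
            continue
        urls.append(urljoin(base_url, path))
    return urls


sample = """User-agent: *
Disallow: /admin/
Disallow: /private/report.html  # internal
Disallow:
"""
print(disallowed_urls("http://example.com", sample))
```

Disallowed paths are often the most interesting ones to scan, which is why they are queued rather than avoided.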

With every URL the script spiders, it will send a second request to the URL with the 9zqjx test string as the Referer header and, if there are no variables in the URL, a third request with /9zqjx tacked onto the end of the URL to see if that’s an injection point as well. It will also analyze the original response for a reflection of the user agent in the code. In order to save memory I have also replaced Scrapy’s stock hash-lookup duplicate URL filter with a bloom filter.
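The two ideas in this paragraph can be sketched roughly as follows. This is a standalone illustration, not xsscrapy’s code: probe_requests and BloomFilter are hypothetical names, and the bit-array size and hash count are arbitrary values picked for the example.

```python
import hashlib
from urllib.parse import urlsplit

TEST_STRING = "9zqjx"


def probe_requests(url):
    """Describe the extra requests sent for one spidered URL."""
    probes = [{"url": url, "headers": {"Referer": TEST_STRING}}]
    # Only append the test string to the path when the URL has no variables.
    if not urlsplit(url).query:
        probes.append({"url": url.rstrip("/") + "/" + TEST_STRING, "headers": {}})
    return probes


class BloomFilter:
    """Fixed-size bit array: may give false positives, never false negatives."""

    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("http://example.com/a")
print("http://example.com/a" in seen)
print(probe_requests("http://example.com/page"))
```

The memory saving comes from storing a few bits per URL instead of the URL (or its full hash) in a set. The price is that a small fraction of unseen URLs may be wrongly reported as already visited and skipped.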

When xsscrapy finds an input vector like a URL variable it will load the value with the test string 9zqjx. Why 9zqjx? Because it has very few hits on Google and is only 5 characters long so it won’t have problems with being too long. The script will analyze the response using lxml+XPaths to figure out if the injection point is in between HTML tags like <title>INJECTION</title>, inside an HTML attribute like <input value="INJECTION">, or inside embedded javascript like var='INJECTION';. Once it’s determined where the injection took place it can figure out which characters are going to be the most important and apply the appropriate XSS test string. It will also figure out if the majority of the HTML attributes are enclosed with ' or " and will apply that fact to its final verdict on what may or may not be vulnerable.
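To make the three contexts concrete, here is a small sketch of the classification step. Note that it swaps in the standard library’s html.parser instead of the lxml+XPath approach described above, purely so the example runs without dependencies. ReflectionFinder and find_contexts are hypothetical names.

```python
from html.parser import HTMLParser

TEST_STRING = "9zqjx"


class ReflectionFinder(HTMLParser):
    """Record the context of every place the test string is reflected."""

    def __init__(self):
        super().__init__()
        self.contexts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        for name, value in attrs:
            # Attribute values can be None for bare attributes like "disabled".
            if value and TEST_STRING in value:
                self.contexts.append(("attribute", tag, name))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if TEST_STRING in data:
            kind = "javascript" if self._in_script else "between tags"
            self.contexts.append((kind, None, None))


def find_contexts(html_text):
    finder = ReflectionFinder()
    finder.feed(html_text)
    finder.close()
    return finder.contexts


page = '<title>9zqjx</title><input value="9zqjx"><script>var a=\'9zqjx\';</script>'
for context in find_contexts(page):
    print(context)
```

Each context then maps to the characters that matter most: angle brackets between tags, the delimiting quote inside an attribute, and the quote plus javascript punctuation inside a script block.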

Once it’s found a reflection point in the code and chosen 1 of the 3 XSS character strings from the section above it will resend the request with the XSS character string surrounded by 9zqjx, like 9zqjx'"()=<x>9zqjx, then analyze the response from the server. There is a pitfall to this whole method, however. Since we’re injecting HTML characters we can’t use lxml to analyze the response so we must use regex instead. Ideally we’d use regex for both preprocessing and postprocessing but that would require more time and regex skill than I possess. That being said, I have not yet found an example in the wild where I believe this would have made a difference. Definitely doesn’t mean they don’t exist.
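The regex post-processing can be pictured like this. It is a simplified sketch, assuming the response reflects both 9zqjx delimiters intact. surviving_characters is a hypothetical helper, not a function from the script.

```python
import re

DELIM = "9zqjx"
PROBE = "'\"()=<x>"


def surviving_characters(body, probe=PROBE):
    """Return, per reflection, which probe characters came back unmodified."""
    pattern = re.compile(
        re.escape(DELIM) + r"(.*?)" + re.escape(DELIM), re.DOTALL
    )
    results = []
    for reflected in pattern.findall(body):
        results.append([ch for ch in probe if ch in reflected])
    return results


unfiltered = "<p>9zqjx'\"()=<x>9zqjx</p>"
escaped = "<p>9zqjx&#39;&quot;()=&lt;x&gt;9zqjx</p>"
print(surviving_characters(unfiltered))
print(surviving_characters(escaped))
```

If the characters that matter for the detected context (for example < and > between tags) are missing from the surviving list, the reflection is probably not exploitable there.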

This script doesn’t encode its payloads except for form requests. It will perform one request with the unencoded payload and one request with an HTML entity-encoded payload. This may change later, but for now it seems to me at least 95% of XSS vulnerabilities in top sites lack any filtering at all making encoding the payload mostly unnecessary.
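For form requests, the pair of payloads might be built something like the sketch below. Using html.escape for the entity encoding is an assumption on my part, since the exact entities the script emits may differ, and form_payload_variants is a hypothetical name.

```python
import html


def form_payload_variants(payload):
    """Return the raw payload followed by an HTML entity-encoded copy."""
    # html.escape converts &, <, > and (with quote=True) both quote characters.
    encoded = html.escape(payload, quote=True)
    return [payload, encoded]


for variant in form_payload_variants("9zqjx'\"()=<x>9zqjx"):
    print(variant)
```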

After the XSS character string is sent and the response processed using regex, xsscrapy will report its finding to the DEBUG output. By default the output of the script is set to DEBUG and will be very verbose. You can change this within xsscrapy/settings.py if you wish. If it doesn’t find anything you’ll see “WARNING: Dropped: No XSS vulns in http://example.com”.

It should be noted that redirects are on, which means it will scrape some domains that aren’t part of the original domain you set as the start URL if a domain within the start URL redirects to them. You can disable redirects by uncommenting REDIRECT_ENABLED in xsscrapy/settings.py.
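For reference, the settings mentioned so far are all standard Scrapy settings. A fragment of xsscrapy/settings.py with them adjusted might look roughly like this. The values are illustrative and the actual file in the repo may be laid out differently.

```python
# xsscrapy/settings.py (fragment, illustrative values)
CONCURRENT_REQUESTS = 12   # number of requests Scrapy keeps in flight at once
LOG_LEVEL = "DEBUG"        # change to "INFO" or "WARNING" for quieter output
REDIRECT_ENABLED = False   # stop following redirects, e.g. to other domains
```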

Manually testing reported vulnerabilities

If you see a hit for a vulnerability in a site, you’ll need to go manually test it. Tools of the trade: Firefox with the extensions HackBar and Tamper Data. Go to formatted-vulns.txt and check the “Type:” field.

header: Use Firefox + Tamper Data to edit/create headers and payload the header value.
url: Just hit the URL with Firefox and change the parameter seen in “Injection point” to a payload
form: Use Firefox + HackBar. Enter the value within the “POST url” field of the xsscrapy report into the top half of HackBar then check the box “Enable Post data” and enter your variable and payload in the bottom box, e.g., var1="><sVG/OnLoaD=prompt(9)>&var2=<sVG/OnLoaD=prompt(9)>
end of url: Use Firefox, go to the URL listed within the xsscrapy report, and add your payload to the end of the URL like example.com/page1/<sVG/OnLoaD=prompt(9)>

The “Line:” fields are just there to quickly identify if the vulnerability is a false positive and to see if it’s an HTML attribute injection or HTML tag injection. Recommended payloads:

Attribute injection: "><sVG/OnLoaD=prompt(9)>
Between tag injection: <sVG/OnLoaD=prompt(9)>
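If you would rather build the test URL for a “url” type hit programmatically than edit it by hand, a small helper along these lines works. payloaded_url is a hypothetical function, and note that it URL-encodes the payload, which the browser would do anyway for most characters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def payloaded_url(url, param, payload):
    """Return url with the value of one query parameter replaced by payload."""
    parts = urlsplit(url)
    query = [
        (key, payload if key == param else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(payloaded_url(
    "http://example.com/search?q=shoes&page=2", "q", "<sVG/OnLoaD=prompt(9)>"
))
```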


  • I would like the payloaded request URLs to not automatically be URL encoded as they are now. This hasn’t been a huge deal so far, but it would be a nice addition. It seems as easy as monkey-patching the Request class and eliminating safe_url_string but I haven’t had success yet.
  • Payloads appended to the end of cookies should be added so we can exploit that vector. Done 8/24/14.
  • Add ability to scrape AJAX-heavy sites. ScrapingHub has some middleware but still not the simplest task.
  • iframe support. Done 8/25/14.

15 comments on “XSScrapy: fast, thorough XSS vulnerability spider”
  1. This is pretty incredible work, thanks for posting!

  2. ricardo says:

    awesome work thanks for sharing :)

  3. Generated says:

    Undeclared dependencies:

  4. Hola! I’ve been following your weblog for a long time now and finally got the bravery to go ahead and give you a shout out
    from Lubbock Texas! Just wanted to say keep up
    the fantastic job!

  5. Baudwolf says:

    Little correction: when installing, the “requirements.txt” file is located inside the xsscrapy folder you clone from git. You have to cd into that folder, then type “pip install requirements.txt”

  6. wvss says:

    thanks for your sharing!
    but xsscrapy’s accuracy is not good!
    of the 64 xss vulnerabilities in the wavsep environment, xsscrapy found just 43, roughly 70%, fewer than my wvss!

    • By God you’re right. I’ve tested it against wavsep and analyzed the results. The false negatives are due to difficulty in determining the delimiting quote be it ‘ or ” within pages. I have been struck with some great ideas that will improve the delimiter quote logic and should cut the amount of requests xsscrapy makes by half. Fixing this ASAP but it’s a big rewrite.

    • 66/66 wavsep detection now. :)
      And holy hell people screw up their javascript tag filtering A LOT. Rescanned a bunch of sites I already scanned and now with the new logic engine to figure out which quote is necessary to break out of JS, I’m getting tons more hits than before. Thank you for bringing wavsep to my attention.

  7. Krossfade says:

    I know this isn’t technically your problem, but I was wondering if you could help me with installing Scrapy. Whenever I get to the Cryptography package, the install fails. More info here: http://stackoverflow.com/questions/25594642/python-cryptography-install-fails

  8. Krossfade says:

    Scratch that last comment as I successfully installed Scrapy. When I try to run XSScrapy, however, I get this error: http://pastebin.com/w4GRvDeM

    Any idea of what I can do to resolve it?

  9. mike says:

    Seems like a good tool, but it takes some tweaking to get it to work on a vanilla ubuntu system. In your post above, to install requirements you need:

    pip install -r requirements.txt

    You forgot the -r flag.

  10. Eric Bode says:

    Dear Dan,

    This tool works like a charm, only in the install steps it should be:
    pip install -r requirements.txt

    instead of without -r, for me at least.

    Thanks for the awesome tool. :)

  11. Doug says:

    You mention that this doesn’t handle AJAX sites very well. Any thoughts on methods to feed requests to it from another tool that may handle AJAX type content better?
