red_spider

synopsis: Site validation spider based on redbot

Mark Nottingham released redbot, a modern replacement for the classic cacheability tester. I’ve been using it at work to audit website performance before releases, since proper HTTP caching makes an enormous difference in perceived site performance.

redbot is a focused tool: it provides a great deal of detail about a single page and, optionally, its resources. I wanted to test an entire site and validate its content too, so I created red_spider.py. It spiders a site, runs the redbot checks on every page it finds, produces a nice HTML report and, optionally, validates page contents.

--help

Display all available options and full help

--format=REPORT_FORMAT

Generate the report as HTML or text

--report=REPORT_FILE

Save report to a file instead of stdout

--validate-html

Validate HTML using tidylib

--skip-media

Skip media files: <img>, <object>, etc.

--skip-resources

Skip resources: <script>, <link>

Skip links whose URL matches the specified regular expression

--save-page-list=PAGE_LIST

Save a list of URLs for HTML pages in the specified file

--save-resource-list=RESOURCE_LIST

Save a list of URLs for page resources in the specified file

--log=LOG_FILE

Write log output to the specified file instead of stderr

-v
--verbosity

Increase the amount of information displayed or logged
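
To make the effect of the skip options more concrete, here is a minimal sketch of how a spider might decide which URLs from a page to queue. It is not red_spider's actual code: the tag groupings, the `LinkCollector` class, the `select_urls` function and its `skip_link_pattern` parameter are all hypothetical names chosen for illustration, and it assumes the regular expression is applied with search semantics (a match anywhere in the URL).

```python
import re
from html.parser import HTMLParser

# Hypothetical tag groupings that mirror the option descriptions above;
# red_spider's real lists may differ.
MEDIA_TAGS = {"img": "src", "object": "data"}
RESOURCE_TAGS = {"script": "src", "link": "href"}


class LinkCollector(HTMLParser):
    """Collect URLs from a page, grouped into pages, media and resources."""

    def __init__(self):
        super().__init__()
        self.pages, self.media, self.resources = [], [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.pages.append(attrs["href"])
        elif tag in MEDIA_TAGS and attrs.get(MEDIA_TAGS[tag]):
            self.media.append(attrs[MEDIA_TAGS[tag]])
        elif tag in RESOURCE_TAGS and attrs.get(RESOURCE_TAGS[tag]):
            self.resources.append(attrs[RESOURCE_TAGS[tag]])


def select_urls(html, skip_media=False, skip_resources=False,
                skip_link_pattern=None):
    """Return the URLs a spider would queue, honouring the skip settings."""
    collector = LinkCollector()
    collector.feed(html)

    urls = list(collector.pages)
    if not skip_media:
        urls += collector.media
    if not skip_resources:
        urls += collector.resources

    if skip_link_pattern:
        # Assumption: a URL is skipped if the pattern matches anywhere in it.
        skip = re.compile(skip_link_pattern)
        urls = [url for url in urls if not skip.search(url)]
    return urls
```

With this sketch, calling `select_urls(page_html, skip_media=True, skip_link_pattern=r"/logout$")` would drop image and object URLs along with any link ending in `/logout`, which is the kind of filtering the options above describe.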