CSS Extractor - review usage of external, imported or embedded CSS stylesheets


CSS Extractor explained

The purpose of this utility is to generate a full overview of the CSS resources associated with an HTML document, and then map the actual usage of those CSS resources in the HTML. It works in three steps:

  1. Extraction of CSS resources


  2. Parsing of each CSS resource


  3. Usage mapping of CSS resources

    In the HTML / webpage, each tag is parsed and all class and id references are counted. A table showing the actual usage of each style definition for each CSS resource is then built. The output contains a sortable table for each CSS resource, showing all CSS definitions, the usage of those definitions and the source of the CSS resource.
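The usage mapping in step 3 can be sketched roughly as follows. This is a minimal illustration in Python, not the tool's own code: the function names `count_references` and `usage_table` and the regular expressions are assumptions made for the example, and the tag matching is deliberately simplified (it does not handle a `>` inside an attribute value).

```python
import re
from collections import Counter

# Simplified patterns (assumptions for this sketch, not the tool's own expressions).
TAG_RE = re.compile(r"<[a-zA-Z][^>]*>")
CLASS_RE = re.compile(r"""(?<![\w-])class\s*=\s*(["'])(.*?)\1""", re.I | re.S)
ID_RE = re.compile(r"""(?<![\w-])id\s*=\s*(["'])(.*?)\1""", re.I | re.S)

def count_references(html):
    """Count every class and id reference found in the opening tags of html."""
    counts = Counter()
    for tag in TAG_RE.findall(html):
        m = CLASS_RE.search(tag)
        if m:
            # A class attribute may list several class names.
            for name in m.group(2).split():
                counts["." + name] += 1
        m = ID_RE.search(tag)
        if m and m.group(2).strip():
            counts["#" + m.group(2).strip()] += 1
    return counts

def usage_table(definitions, html):
    """Pair each CSS definition (e.g. '.box', '#main') with its usage count."""
    counts = count_references(html)
    return [(d, counts.get(d, 0)) for d in definitions]
```

Called with the definitions taken from one CSS resource, `usage_table` returns one row per definition; a count of 0 marks a definition that the page never references.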

CSS Extractor

The CSS Extractor scans the HTML document for valid CSS resources, then parses each resource for CSS classes and style definitions. The CSS Extractor detects and parses the following elements :


  • External stylesheets

    <link rel="stylesheet" type="text/css" ..> references, pointing to a .css file or any other accessible resource.

  • Imported stylesheets

    @import .. (also nested imports). Typically found in <style> .. </style> and in external stylesheets.

  • Embedded stylesheets

    <style> .. </style> embedded stylesheets found anywhere in the HTML document.

  • Compressed stylesheets

    For external and imported stylesheets, the CSS parser will automatically detect and decompress zipped resources.
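As an illustration of the detection described above, the following Python sketch locates the three kinds of resources in an HTML string. It rests on several assumptions: the names `find_css_resources` and `maybe_gunzip` are hypothetical, the patterns are simplified, "zipped" is taken to mean gzip, and nothing is fetched, so nested imports inside external stylesheets would need the same `@import` scan applied to each downloaded file.

```python
import gzip
import re

# Simplified patterns (assumptions for this sketch, not the tool's own expressions).
LINK_RE = re.compile(r"<link\b[^>]*>", re.I)
STYLE_RE = re.compile(r"<style\b[^>]*>(.*?)</style\s*>", re.I | re.S)
IMPORT_RE = re.compile(r"""@import\s+(?:url\(\s*)?["']?([^"')\s;]+)""", re.I)
ATTR_RE = re.compile(r"""([\w-]+)\s*=\s*(["'])(.*?)\2""", re.S)

def find_css_resources(html):
    """Return external hrefs, imported URLs and embedded CSS found in html."""
    external, imported, embedded = [], [], []
    for tag in LINK_RE.findall(html):
        attrs = {name.lower(): value for name, _, value in ATTR_RE.findall(tag)}
        # Only <link> tags whose rel attribute contains "stylesheet" count.
        if "stylesheet" in attrs.get("rel", "").lower().split() and "href" in attrs:
            external.append(attrs["href"])
    for css in STYLE_RE.findall(html):
        embedded.append(css.strip())
        imported.extend(IMPORT_RE.findall(css))
    return {"external": external, "imported": imported, "embedded": embedded}

def maybe_gunzip(data):
    """Decompress bytes that start with the gzip magic number, else return as-is."""
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    return data
```

Checking the first two bytes rather than trusting a response header is one way to make the decompression automatic, since it also works for files that are served without a content encoding.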

CSS Parser

The parser extracts each style or CSS class definition, but excludes definitions for the following elements:


  • HTML tags

    Currently they are : * <html> <body> <a> <div> <span> <blockquote> <tr> <td> <table> <caption> <th> <tbody> <thead> <tfoot> <fieldset> <legend> <em> <dt> <dd> <dl> <ol> <h1> <h2> <h3> <h4> <h5> <h6> <font> <del> <dfn> <kbd> <tt> <ins> <var> <samp> <ul> <li> <p> <hr> <img> <br> <pre> <abbr> <cite> <code> <sup> <sub> <acronym> <strong> <b> <u> <strike> <small> <big> <button> <input> <textarea> <select> <form> <label> <iframe> <applet> <object>

  • Pseudo classes

    :before, :after, :first-child, :visited, :hover, :link, :active, :focus

  • Input substyling

    [type=button], [type=checkbox], [type=file], [type=hidden], [type=image], [type=password], [type=radio], [type=reset], [type=submit], [type=text]
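The exclusion rules above can be approximated with a few regular expressions. The Python sketch below is an assumption about how such filtering might look, not the tool's actual parser: `class_and_id_definitions` is a hypothetical name, and it strips all attribute selectors and pseudo classes rather than only the ones listed. How the real tool treats mixed selectors such as `a.link` is not stated; this sketch keeps the class part.

```python
import re

# Simplified patterns (assumptions for this sketch, not the tool's own expressions).
COMMENT_RE = re.compile(r"/\*.*?\*/", re.S)
RULE_RE = re.compile(r"([^{}]+)\{[^{}]*\}")
ATTR_SEL_RE = re.compile(r"\[[^\]]*\]")
PSEUDO_RE = re.compile(r"::?[\w-]+(?:\([^)]*\))?")
NAME_RE = re.compile(r"[.#]-?[_a-zA-Z][\w-]*")

def class_and_id_definitions(css):
    """List class and id names defined in css, in order of first appearance."""
    css = COMMENT_RE.sub("", css)
    found = []
    for selector_group in RULE_RE.findall(css):
        for selector in selector_group.split(","):
            # Drop [type=...] style substyling first, then pseudo classes.
            selector = PSEUDO_RE.sub("", ATTR_SEL_RE.sub("", selector))
            # Plain tag names never match NAME_RE, so they are excluded implicitly.
            for name in NAME_RE.findall(selector):
                if name not in found:
                    found.append(name)
    return found
```

For example, a stylesheet containing `body, p {..}`, `a.link:hover {..}` and `input[type=text] {..}` yields only `.link`, because the tag selectors and the input substyling contribute no class or id name.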

Tip: Rather than inspecting the output on screen, print it on paper and use the sheets as a reference. Since every style definition is considered, the output easily gets very large.




Help to improve: This tool is based on a large set of Perl-Compatible Regular Expressions (PCRE) and is still under development. If you discover any misbehavior or strange / unexpected output, or have suggestions for future improvements, please report them to css@4horizons.com with a link to the page that is not being parsed properly.

Also, suggestions for the visual output are very much appreciated. It is very hard to find a form of presentation that is both detailed and easy to survey.