The Keyword Extractor & Analysis Service described


Note: The Keyword Extractor & Analysis service is purely experimental and far from complete.


 The Report tables

  • Levenshtein / title

    The Levenshtein distance is the minimum number of single-character edits (insertions, deletions or replacements) needed to transform one word into another. It is desirable for a webpage to contain variants of its keywords, as well as words that lead to "did you mean ..." suggestions in Google.
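
    As an illustration, the distance can be computed with the classic dynamic-programming approach. This is a minimal sketch, not the service's own implementation; the function name levenshtein is an assumption made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and replacements needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # replace ch_a with ch_b (or keep it)
            ))
        previous = current
    return previous[-1]
```

    A small distance between a word on the page and a word in the title, for example levenshtein("keyword", "keywords"), suggests that the page word is a variant of the title word.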

 FAQ - Frequently Asked Questions

  • Some words appear in the report that I cannot see on the webpage - why?

    The Keyword Extractor extracts all content, including hidden content, such as content with display: none; or content hidden in containers with no size. An example is YouTube, which has a lot of such hidden content.

 Definitions - Word, Keyword, Clause and Sentence

  • Words

    A word is any sequence of characters between punctuation marks or stop characters. Stop characters are spaces, periods, commas etc.

  • Keywords

    A keyword is a word considered particularly important in context.

  • Clauses

    A clause is a sequence of words between two punctuation marks or two stop characters.

  • Sentences

    A sentence is a sequence of clauses between periods, exclamation points or question marks.
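
    The definitions above can be sketched as three nested splits. This is only an illustration under stated assumptions: the exact punctuation sets and the names split_sentences, split_clauses, split_words and fragment are made up for the example and are not taken from the service.

```python
import re

SENTENCE_END = r"[.!?]+"    # periods, exclamation points, question marks
CLAUSE_END = r'[,;:()"]+'   # assumed set of clause-level punctuation marks
WORD_SPLIT = r"\s+"         # whitespace acting as stop characters

def split_sentences(text):
    """Split text into sentences at sentence-ending punctuation."""
    return [s.strip() for s in re.split(SENTENCE_END, text) if s.strip()]

def split_clauses(sentence):
    """Split one sentence into clauses at the remaining punctuation marks."""
    return [c.strip() for c in re.split(CLAUSE_END, sentence) if c.strip()]

def split_words(clause):
    """Split one clause into words at whitespace."""
    return [w for w in re.split(WORD_SPLIT, clause) if w]

def fragment(text):
    """Return the text as sentences -> clauses -> words (nested lists)."""
    return [
        [split_words(clause) for clause in split_clauses(sentence)]
        for sentence in split_sentences(text)
    ]
```

    A naive split like this treats every period as a sentence end, so abbreviations and decimal numbers would need extra handling.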

 The Keyword extraction and analysis cycle explained

  1. Distillation

    The content of the webpage being processed is cleaned of tags, comments, out-of-scope garbage, scripts, noscript blocks and so on.

  2. "Homogenization" (for lack of a better word)

    The content is stripped of unindexable, non-searchable characters and character sequences, and is then split into regular text fragments. HTML entities and other language-specific content are also resolved as unambiguously as possible.

  3. Text processing

    The now pure regular text is run through some processing filters:
        A. Some algorithms for identifying the content.
        B. Customized filters, e.g. Number processing, Experimental processing.
        C. Justification, for example the experimental dictionaries.

  4. Extraction

    The actual splitting of the text into Words, Clauses and Sentences.

  5. Analysis

    It is now possible to count words, recognise keywords, and calculate density, relevance to the page title and more.
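
    A rough sketch of the cycle, using only the Python standard library, could look like the following. It is not the service's actual code: the names distill, homogenize and keyword_density are hypothetical, the use of Unicode NFKC normalization for "homogenization" is an assumption, and the text processing and extraction steps are reduced to a simple word match.

```python
import re
import unicodedata
from collections import Counter
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects text data, skipping script, noscript and style content."""
    SKIPPED = {"script", "noscript", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)

def distill(page):
    """Step 1: drop tags, comments and script-like blocks, keep the text."""
    collector = _TextCollector()
    collector.feed(page)
    collector.close()
    return " ".join(collector.parts)

def homogenize(text):
    """Step 2 (assumed): fold ligatures and similar forms, drop control
    characters, collapse whitespace and lowercase the result."""
    text = unicodedata.normalize("NFKC", text)
    text = "".join(" " if unicodedata.category(ch)[0] == "C" else ch for ch in text)
    return re.sub(r"\s+", " ", text).strip().lower()

def keyword_density(text, title):
    """Step 5: count words, compute density and flag words found in the title."""
    words = re.findall(r"[^\W_]+", text)
    title_words = set(re.findall(r"[^\W_]+", title.lower()))
    total = len(words)
    return [
        (word, count, count / total, word in title_words)
        for word, count in Counter(words).most_common()
    ]
```

    Each row of the result holds the word, its count, its share of all words on the page and whether it also occurs in the title. Note that this sketch, like the service, does not try to detect content hidden with CSS.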

 Number processing

  • Exclude Numerals

    Strips out numbers, that is, integers only, such as 0, 117, 3456. Text containing numbers, such as 87a, 100k, 80ties, is not stripped.

  • Exclude Roman Numbers

    Strips out any Roman numeral between I and MMMCMXCIX (1-3999).
    Only correct Roman numeral notation is considered. VIII is a Roman numeral - IIX is not (theoretically both are "8", but VIII is the correct notation).

  • Exclude spelled numbers

    Strips out a broad range of spelled numbers, such as 1st, 17th, billions ...
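
    The three number filters could be sketched as follows. This is an illustration only: the function names, the tiny NUMBER_WORDS sample and the choice to match uppercase Roman numerals only are assumptions, not the service's actual rules.

```python
import re

INTEGER = re.compile(r"\d+")
# Strict Roman numeral notation for 1-3999 (uppercase only, an assumption)
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
ORDINAL = re.compile(r"\d+(st|nd|rd|th)", re.IGNORECASE)
# Hypothetical sample vocabulary; a real list would be much broader
NUMBER_WORDS = {"one", "two", "three", "ten", "hundred", "thousand", "million", "billion"}

def is_integer(word):
    """True for pure integers such as 0, 117, 3456 (not 87a or 100k)."""
    return INTEGER.fullmatch(word) is not None

def is_roman(word):
    """True for correctly written Roman numerals; the pattern also matches
    the empty string, so empty words are rejected explicitly."""
    return bool(word) and ROMAN.fullmatch(word) is not None

def is_spelled_number(word):
    """True for ordinals such as 1st or 17th and for sample number words,
    including simple plurals such as billions."""
    lower = word.lower()
    if ORDINAL.fullmatch(lower):
        return True
    return lower in NUMBER_WORDS or (lower.endswith("s") and lower[:-1] in NUMBER_WORDS)

def strip_numbers(words, numerals=True, roman=True, spelled=True):
    """Return the words that survive the enabled number filters."""
    kept = []
    for word in words:
        if numerals and is_integer(word):
            continue
        if roman and is_roman(word):
            continue
        if spelled and is_spelled_number(word):
            continue
        kept.append(word)
    return kept
```

    A filter like this would also remove the English pronoun "I", since it is a valid Roman numeral, so some context check may be needed in practice.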

 Experimental processing

  • Exclude days and abbreviations

    Strips out weekday names and their abbreviations, e.g. monday, mon ...

  • Exclude months and abbreviations

    Strips out month names and their abbreviations, e.g. october, oct ...

  • Exclude Colors

    Strips out color names, such as red, purple, magenta ...