Link Extractor - extract internal, outgoing and incoming links


► What does mean?

It means that the Link Parser did not find any content to show for the link, such as an anchor text or an image. This happens, for instance, when the visual "content" of the link consists of a span or div with a background-image attached, while the anchor text itself is empty.



► Why does the Anchor Text appear completely blank for some links?

This is not an error. Surprisingly many sites use completely white or 1x1 pixel transparent images as anchor text in links, for various reasons (very often along with a background-image as described above). Tip : Using Google Chrome, Safari, or Firefox with Firebug installed, you can always inspect the content of the Anchor Text field.



► How to read the Title / Alt row

When a link has the title attribute defined, <a href=".." title="title">..</a>, as it should to help users, you will see that title in the row. If the anchor text contains an image, the alternate text of the image, <img alt="alternate text">, is parsed as well, and you will see it in the row as [ALT: alternate text]. However, if the alt attribute is defined but empty (alt=""), you will be notified by [ALT: undefined].
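The rules above can be sketched in a few lines of Python. This is only an illustration built on the standard library's html.parser, not the Link Parser's actual code; the names LinkRowParser, title_alt_row and has_visible_content are made up for the example, and the handling of images that have no alt attribute at all (they add nothing to the row) is an assumption, since the text above does not cover that case.

```python
from html.parser import HTMLParser


class LinkRowParser(HTMLParser):
    """Collects one dict per <a> element: href, title, anchor text, image alts."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._current = {
                "href": attrs.get("href"),
                "title": attrs.get("title"),
                "text": "",
                "alts": [],
                "has_image": False,
            }
        elif tag == "img" and self._current is not None:
            self._current["has_image"] = True
            if "alt" in attrs:
                # alt="" (or a bare alt) is reported as "undefined".
                self._current["alts"].append(attrs["alt"] or "undefined")
            # Assumption: an <img> without any alt attribute adds nothing.

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def title_alt_row(link):
    """Builds the Title / Alt row: the title first, then one [ALT: ..] per image."""
    parts = []
    if link["title"]:
        parts.append(link["title"])
    parts.extend(f"[ALT: {alt}]" for alt in link["alts"])
    return " ".join(parts)


def has_visible_content(link):
    """False for links with neither anchor text nor an image (e.g. an empty span)."""
    return bool(link["text"]) or link["has_image"]


parser = LinkRowParser()
parser.feed(
    '<a href="/a" title="Home"><img src="x.png" alt="logo"></a>'
    '<a href="/b"><img src="y.png" alt=""></a>'
    '<a href="/c"><span class="icon"></span></a>'
)
for link in parser.links:
    print(link["href"], "|", title_alt_row(link), "|", has_visible_content(link))
```

The third link in the sample is the "nothing to show" case: its only content is an empty span, so both the anchor text and the Title / Alt row come out empty.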



► Some links shown in the browser are not included in the results - is that an error?

Absolutely not. Any HTML link, <a ..>..</a>, even a wrongly coded one, is detected by the Link Parser. "Missing" links are typically part of some HTML loaded through AJAX. The Link Finder (or any search engine, for that matter) is unable to index links that are injected by the client after the page is loaded.
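To illustrate the difference, here is a small Python sketch that reads links from the HTML source only, which is all a server-side parser gets to see. It uses the standard library's html.parser as a stand-in for the Link Parser (whose implementation is not shown here); HrefCollector, static_links and the sample page are invented for the example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Records the href of every <a> start tag found in the HTML source."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def static_links(html):
    """Returns the links present in the markup itself; no script is executed."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# The first link is wrongly coded (its closing tag is missing) but is still found.
# The third link only exists after a browser has run the script.
PAGE = """
<p><a href="/one">One <a href="/two">Two</a></p>
<script>
  var a = document.createElement('a');
  a.href = '/late';
  document.body.appendChild(a);
</script>
"""

print(static_links(PAGE))  # ['/one', '/two']
```

The link to /late never shows up, because it is created by the browser at runtime and is not part of the HTML that was downloaded.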



► There are no Incoming Links - why?

The Incoming Links Finder performs a number of searches on the Internet, refines the results and delivers at most 200 referring sites in total (there needs to be a limit somewhere). The refining process strips all references to the target webpage that come from the page's own domain (including root, subdomains and subpages). Beyond that, links from google, alexa, youtube and cached material, as well as any link with / or # as anchor text, are stripped out. Also, a referring site is only listed once: even if a site has 1000 links to the target page, it appears once in the list. Furthermore, it is possible that the target webpage has no incoming links at all, that the links could not be found - or that the Incoming Link Finder just did a bad job :-)
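The refining step could look roughly like the Python sketch below. It is a guess at the logic described above, not the actual Incoming Links Finder: refine_incoming, site_of and the candidate list are hypothetical, matching google / alexa / youtube by host name label is an assumption, the target is assumed to be given by its root domain (with or without www), and the filtering of cached material is left out because the text does not say how it is recognised.

```python
from urllib.parse import urlsplit

MAX_REFERRERS = 200  # the limit mentioned in the text
EXCLUDED_SITES = ("google", "alexa", "youtube")  # matched against host labels (assumption)


def site_of(url):
    """Lowercased host name without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def refine_incoming(candidates, target_url, limit=MAX_REFERRERS):
    """Filters (url, anchor_text) pairs down to a list of referring URLs."""
    target = site_of(target_url)
    seen_sites = set()
    refined = []
    for url, anchor_text in candidates:
        site = site_of(url)
        if not site:
            continue
        # Strip the target's own domain, including its subdomains and subpages.
        if site == target or site.endswith("." + target):
            continue
        # Strip google, alexa and youtube.
        if any(name in site.split(".") for name in EXCLUDED_SITES):
            continue
        # Strip links whose anchor text is just / or #.
        if anchor_text.strip() in ("/", "#"):
            continue
        # A referring site is only listed once.
        if site in seen_sites:
            continue
        seen_sites.add(site)
        refined.append(url)
        if len(refined) >= limit:
            break
    return refined


candidates = [
    ("http://example.com/about", "About"),
    ("http://blog.example.com/post", "post"),
    ("http://www.google.com/search?q=x", "x"),
    ("http://other.org/a", "link"),
    ("http://other.org/b", "again"),
    ("http://third.net/", "#"),
    ("http://fourth.io/page", "read"),
]
print(refine_incoming(candidates, "http://www.example.com/"))
```

With this sample input only http://other.org/a and http://fourth.io/page survive, which shows how a page with plenty of raw search hits can still end up with a short list, or an empty one.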



► Some websites are slower to load - why?

The average time for the link analysis to complete (a normal webpage with 50-500 links on a low-cost connection) should be below 4-5 seconds. However, a few sites can take up to 10-15 seconds to complete. This happens on Microsoft IIS / SharePoint hosts. When the Link Parser algorithm meets HTML containing poorly coded <img> tags without WIDTH or HEIGHT attributes defined (for some reason more likely on IIS / SharePoint platforms), the link parser re-inspects the images remotely to get the scaling correct. This process is remarkably slower on IIS than on state-of-the-art webservers, such as Apache running on Debian. I honestly don't know why. In fact, surprisingly many IIS / SharePoint-driven websites contain simple validation errors, such as missing closing tags </a> for links.



► Not getting any results?

You have probably entered a URL with or without www, where DNS / the host either lacks the canonical name www or redirects the request. www.example.com may have been accepted () but access to the content is redirected to example.com - and therefore the link parser does not find any match. Hint : If you get no internal links but do get incoming links, the target site is almost certainly redirecting to another host. An example is cnn.com, which generates a lot of incoming links but no links in the other categories - switch to www.cnn.com. It is impossible for the link finder to predict how each website is organized internally.


Solution : Remove or add www, the opposite of your previous attempt.

Note : Even though www is a subdomain, the link parser always considers www a part of the root.
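As a rough illustration of the note and the solution above, the Python sketch below treats www as part of the root when deciding whether a link is internal, and builds the "opposite" URL to try next. The helpers root_host, classify and toggle_www are hypothetical names, not part of the link parser, and treating every other subdomain as outgoing is an assumption the text does not confirm.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def root_host(url):
    """Lowercased host with a leading 'www.' removed, so www counts as the root."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify(page_url, href):
    """Labels a link found on page_url as 'internal' or 'outgoing'.

    Assumption: subdomains other than www count as outgoing.
    """
    absolute = urljoin(page_url, href)  # resolves relative links such as /about
    return "internal" if root_host(absolute) == root_host(page_url) else "outgoing"


def toggle_www(url):
    """Returns the same URL with www added or removed (user info, if any, is dropped)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    new_host = host[4:] if host.startswith("www.") else "www." + host
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(classify("http://example.com/", "http://www.example.com/news"))  # internal
print(classify("http://example.com/", "http://other.org/"))            # outgoing
print(toggle_www("http://cnn.com/"))                                   # http://www.cnn.com/
```

If a first attempt returns no internal links, running the analysis again on toggle_www(url) is the scripted version of "remove or add www".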