How to parse SERPs in jQuery



By using a scraper it is quite easy to perform a server-side web search and deliver the SERP (search engine result page) back to the client. But the web search is not very useful without some way to parse and index that result page.


No long explanation is necessary. The search engines differ in the details, but their SERPs are all built on an easily detectable, almost identical HTML structure. The major search engines always deliver at least the following items for each result: a page title, a URL and a short description.

For the code examples below, it is assumed that the SERP is loaded into an element (a div, a span, whatever) with the id "result". The JavaScript code is designed to operate on the first result item only, with a method for removing that item. This keeps the loop simple, since the SERP can be iterated by removing each parsed SERP item in the same loop.


For Google, Bing and Yahoo you can iterate through the results by using the generic pseudo code below, where <PARSER> stands for one of the parser objects defined in the following sections:

var element = "#result";
while (<PARSER>.hasMore(element)) {
	var raw_SERP_item = <PARSER>.SERP(element);
	var title = <PARSER>.SERP_title(element);
	var URL = <PARSER>.SERP_url(element);
	var description = <PARSER>.SERP_text(element);
	<PARSER>.SERP_remove(element);
}
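
The pseudo code above can be wrapped in a small helper that collects the parsed items into an array instead of handling them inside the loop. This is only a minimal sketch: the function name collectResults and the rank field are my own assumptions, not part of the parsers below. It works with any object that offers the hasMore, SERP_title, SERP_url, SERP_text and SERP_remove methods.

```javascript
// Hypothetical helper (not part of the original parsers): runs the generic
// loop for any parser object and returns the parsed items as an array.
function collectResults(parser, element) {
	var items = [];
	while (parser.hasMore(element)) {
		items.push({
			rank : items.length + 1,              // position on the page, 1-based
			title : parser.SERP_title(element),
			url : parser.SERP_url(element),
			description : parser.SERP_text(element)
		});
		// Remove the item just parsed so the next one becomes the first
		parser.SERP_remove(element);
	}
	return items;
}
```

With one of the parser objects from the sections below it could be called as collectResults(GoogleParser, "#result"). Note that the loop empties the container while it runs, so the raw SERP has to be loaded again if it is needed afterwards.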

Parsing Google SERPs

Google stores each result in a list item <li> with the class name "g", the page title in an <h3> with the class "r" and the page description in an element with the class "s". This gives the following simple JavaScript object:

var GoogleParser = {
	hasMore : function(element) {
		return ($(element).find('li.g').length>0);
	},
	SERP : function(element) {
		return $(element).find('li.g:first').html();
	},
	SERP_title : function(element) {
		return $(element).find('li.g:first').find('h3.r').text();
	},
	SERP_url : function(element) {
		return $(element).find('li.g:first').find('h3.r').find('a').attr('href');
	},
	SERP_text : function(element) {
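		// Work on a clone so the loaded SERP itself is left untouched
		var g=$(element).find('li.g:first').find('.s').clone();
		// Strip the nested .osl and .f blocks so only the description text remains
		g.find('.osl').remove();
		g.find('.f').remove();
		return g.text();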
	},
	SERP_remove : function(element) {
		$(element).find('li.g:first').remove();
	}
};

Parsing Bing SERPs

Microsoft's Bing also stores each result in a list item <li>, here with the class name "sa-wr". The page title is again in an <h3>, but the page description sits in a more logical <p> tag. This gives the following simple JavaScript object:

var BingParser = {
	hasMore : function(element) {
		return ($(element).find('li.sa-wr').length>0);
	},
	SERP : function(element) {
		return $(element).find('li.sa-wr:first').html();
	},
	SERP_title : function(element) {
		return $(element).find('li.sa-wr:first').find('h3').text();
	},
	SERP_url : function(element) {
		return $(element).find('li.sa-wr:first').find('h3').find('a').attr('href');
	},
	SERP_text : function(element) {
		return $(element).find('li.sa-wr:first').find('p').text();
	},
	SERP_remove : function(element) {
		$(element).find('li.sa-wr:first').remove();
	}
};

Parsing Yahoo SERPs

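Yahoo is a little bit different. Yahoo delivers the results inside an overall container with the id "web", with each SERP item as an <li> in an ordered list. The title is again an <h3>, and the description sits in an element with the class "abstr":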

var YahooParser = {
	hasMore : function(element) {
		return ($(element).find('#web').find('ol li').length>0);
	},
	SERP : function(element) {
		return $(element).find('#web').find('ol li:first').find('.res').html();
	},
	SERP_title : function(element) {
		return $(element).find('#web').find('ol li:first').find('h3').text();
	},
	SERP_url : function(element) {
		return $(element).find('#web').
		                  find('ol li:first').find('h3').find('a').attr('href');
	},
	SERP_text : function(element) {
		return $(element).find('#web').find('ol li:first').find('.abstr').text();
	},
	SERP_remove : function(element) {
		$(element).find('#web').find('ol li:first').remove();
	}
};

Now you have the basic methods for parsing SERPs. Of course, like me, you will probably want methods for counting the results, looking only for certain domains and so on. But that should be easy to implement for anyone who understands the basic technique.
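
As a rough sketch of the counting and domain filtering mentioned above, the helpers below work on an array of already parsed items. The names hostOf, filterByDomain and countByHost are hypothetical, and the items are assumed to be plain objects with title, url and description fields filled from SERP_title, SERP_url and SERP_text. The standard URL object is used to pick out the host name, so a reasonably modern browser is assumed.

```javascript
// Hypothetical helpers; `items` is assumed to be an array of
// { title, url, description } objects built from the parser methods.

// Return the host name of a result URL, or null if it cannot be parsed
// (for example a relative href or a missing attribute).
function hostOf(url) {
	try {
		return new URL(url).hostname.toLowerCase();
	} catch (e) {
		return null;
	}
}

// Keep only the results whose host is `domain` or a subdomain of it.
function filterByDomain(items, domain) {
	domain = domain.toLowerCase();
	return items.filter(function (item) {
		var host = hostOf(item.url);
		return host !== null &&
			(host === domain || host.endsWith("." + domain));
	});
}

// Count how many results each host has on the page.
function countByHost(items) {
	var counts = Object.create(null); // no inherited keys to collide with
	items.forEach(function (item) {
		var host = hostOf(item.url);
		if (host !== null) {
			counts[host] = (counts[host] || 0) + 1;
		}
	});
	return counts;
}
```

The total number of results is simply the length of the array. Results whose URL cannot be parsed as an absolute URL are skipped by both helpers rather than guessed at.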

