Desert Scraping: How it can help your online business

11 January 2009 - By Ahson Rafiq - Filed in SEO

Many web developers and website owners now use black hat SEO techniques, mainly because of the instant results they give. As a result, search engines have banned a lot of sites, since these methods disrupt search engine algorithms and draw complaints from real audiences. But not all black hat methods are harmful; it all depends on how you use them.

If you have grown tired of trying all kinds of trusted white hat, gray hat and black hat techniques and the results didn’t go the way you expected, then it’s time to try resurrecting the dead by means of desert scraping.

Desert scraping is a black hat SEO technique that brings the content of dead domains back to life in order to boost the power of your site. The steps are as follows:

1. Purchase a domain name. Then set up catch-all subdomains on it through your web server configuration, for example Apache with mod_rewrite.

2. Write a script that gets content from a database and displays it on its own subdomain. No specific template is needed for this. Before the script can run, the server has to route subdomain requests to it; the sample below does this in lighttpd configuration syntax, sending the main domain and all other subdomains to separate document roots:

 

# Main site: domain.com and www.domain.com
$HTTP["host"] =~ "(^|^www\.)domain\.com$" {
   server.document-root = "/www/pages/domain.com/htdocs/"
   accesslog.filename = "/www/logs/access_www_domain_com.log"
}
# Catch-all: every other subdomain of domain.com
else $HTTP["host"] =~ "\.domain\.com$" {
   server.document-root = "/www/pages/domain.com/subs/"
   accesslog.filename = "/www/logs/access_sub_domain_com.log"
}
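
The script from step 2 has to work out which subdomain was requested before it can fetch the matching content. Here is a minimal sketch, assuming the base domain is domain.com as in the configuration above. The function name subdomainFromHost is hypothetical and not part of the original code:

```php
<?php
// Hypothetical helper: return the subdomain label for a request host,
// or null when the host is the main site or belongs to another domain.
function subdomainFromHost( $host, $base = 'domain.com' ) {
   // Lower-case the host and drop an optional port such as ":8080".
   $host = strtolower( preg_replace( '/:\d+$/', '', trim( $host ) ) );
   $suffix = '.' . $base;
   if( $host === $base || $host === 'www' . $suffix ) return null;
   // The host must end in ".domain.com" to count as a subdomain.
   if( substr( $host, -strlen( $suffix ) ) !== $suffix ) return null;
   $label = substr( $host, 0, -strlen( $suffix ) );
   // Accept a single label made of letters, digits and hyphens only.
   return preg_match( '/^[a-z0-9-]+$/', $label ) ? $label : null;
}
```

In the page script the host would typically come from $_SERVER['HTTP_HOST'], and a null result would mean the main page should be served.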

3. Once that is done, set up the main page so that it links to the freshest subdomains (together with their titles) and search engines can index them.

4. Subscribe to a service that monitors domains that are about to expire, such as DeletedDomains.com or DeactivatedOn.com.

5. Using a daily cron job, scan the list of domains that were deactivated that day and store it in your database. Below is a script fragment you can use for the task:

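// Fetch the day's list (the file name is date-stamped) and stop if the download fails.
$d = file_get_contents( 'http://deactivatedon.com/new/org_070623_0001.html' );
if( $d === false ) exit( "Could not fetch the domain list\n" );
preg_match_all( '/[a-z-]+?\.org/', strtolower( $d ), $m ); // every matching name ends up in $m[0]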
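
To see what the pattern in the fragment above picks up, it can be run on a small sample string. The sample HTML below is made up for illustration and is not the real format of the DeactivatedOn.com list:

```php
<?php
// Made-up sample input; the real list page may look different.
$sample = '<li>Foo.org</li><li>bar-baz.org</li><li>www.example.org</li>';
preg_match_all( '/[a-z-]+?\.org/', strtolower( $sample ), $m );
// Drop duplicates and re-index so the list is ready for filtering.
$names = array_values( array_unique( $m[0] ) );
print_r( $names );
```

Because the pattern only allows letters and hyphens, a leading "www." is dropped, and a name that contains digits is cut short: "web2site.org" would be captured as "site.org".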

6. Once this is done, check the expired domains against Archive.org. You will have to do a deep crawl and replace the links with their local equivalents. Do the same for the images found in your templates.

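// Keep only short, readable names: each candidate must contain at least one dictionary word.
$pspell = pspell_new( 'en' );
$domains = array();
foreach( $m[0] as $d ) {
   if( strlen( $d ) <= 22 && $d != "w3.org" && strpos($d,"xn--") === false && substr_count( $d, '-' ) <= 1 ) {
      $t = str_replace( '-', '', $d );
      $check = array();
      $words = array();
      // Test substrings of four or more characters against the English dictionary.
      for( $j = 0; $j < ( strlen( $t ) - 5 ); $j++ ) {
         for( $i = 4; $i < strlen( $t ); $i++ ) {
            $part = substr( $t, $j, $i );
            if( pspell_check( $pspell, $part ) ) {
               $check[$j] = isset( $check[$j] ) ? $check[$j] + 1 : 1;
               $words[] = $part;
            }
         }
      }
      if( count( $check ) > 0 ) {
         // The key is the hyphen-free name; the value is the list of words found in it.
         $domains[str_replace( array("\n","\r","\t"), "", $t )] = $words;
      }
   }
}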
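
The first condition in the loop above can be tried on its own, without the pspell extension. This is a small sketch; the function name passesBasicFilter and the candidate names are hypothetical, and the dictionary check is deliberately left out:

```php
<?php
// Hypothetical helper mirroring the if() condition of the filter loop:
// at most 22 characters, not w3.org, no punycode, at most one hyphen.
function passesBasicFilter( $d ) {
   return strlen( $d ) <= 22
      && $d != 'w3.org'
      && strpos( $d, 'xn--' ) === false
      && substr_count( $d, '-' ) <= 1;
}

// Made-up candidate names for illustration.
$candidates = array( 'green-garden.org', 'xn--bcher-kva.org', 'a-b-c.org', 'w3.org' );
$kept = array_values( array_filter( $candidates, 'passesBasicFilter' ) );
print_r( $kept );
```

Only names that pass this cheap test need to go through the slower dictionary lookup.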

7. Write a simple routine that replaces every ad you find with your own. Replacing those links with links to your own sites doesn’t really hurt, since you need the popularity more than the dead sites do.

// For each candidate, find its Archive.org snapshots and store the cleaned page.
// Assumes $db is an open mysqli connection; clean() is defined elsewhere and is not shown in this post.
$words = array_values( $domains );
$domains = array_keys( $domains );
$x = array(); // retry counter per domain
$sel = $db->prepare( 'SELECT 1 FROM subs WHERE domain = ? AND tld = ? LIMIT 1' );
$ins = $db->prepare( 'INSERT INTO subs SET domain = ?, tld = ?, hash = ?, `keys` = ?, content = ?, ts = NOW()' );
for( $i = 0; $i < count( $domains ); $i++ ) {
   $url = explode( '.', $domains[$i] );
   $name = $url[count($url)-2];
   $tld = $url[count($url)-1];
   // Skip domains that are already in the database.
   $sel->bind_param( 'ss', $name, $tld );
   $sel->execute();
   $sel->store_result();
   $found = $sel->num_rows;
   $sel->free_result();
   if( $found == 1 ) continue;
   $s = @file_get_contents( 'http://web.archive.org/web/*/http://www.'.$domains[$i] );
   if( empty( $s ) ) {
      $x[$i] = isset( $x[$i] ) ? $x[$i] + 1 : 1;
      if( $x[$i] <= 3 ) $i--; // retry the same domain up to three times
      continue;
   }
   if( strpos( $s, ' Sorry, no matches.' ) !== false ) continue;
   $s = str_replace( array("\n","\r","\t"), '', $s );
   $m = array();
   preg_match_all( '/<a href="http:\/\/web\.archive\.org\/web\/([0-9]+?)\/http:\/\/([^"]+?)">[a-zA-Z0-9,\s]+?<\/a>[\s\*]+?<br>/', $s, $m, PREG_SET_ORDER );
   if( count( $m ) == 0 ) continue;
   // Group the snapshots by year and pick the year with the most snapshots.
   $pages = array();
   foreach( $m as $d ) {
      $pages[substr( $d[1], 0, 4 )][] = $d;
   }
   $c = 0;
   foreach( $pages as $key => $p ) {
      $z = count( $p );
      if( $z > $c ) { $c = $z; $k = $key; }
   }
   // Use the last snapshot of that year; try the download up to five times.
   $link = $pages[$k][count($pages[$k])-1];
   $site = '';
   for( $j = 0; $j < 5; $j++ ) {
      $site = @file_get_contents( 'http://web.archive.org/web/'.$link[1].'/http://'.$link[2] );
      if( empty( $site ) ) continue;
      break;
   }
   $lsite = strtolower( $site );
   // Skip empty pages, parked or for-sale domains, missing snapshots and frame-based sites.
   if( empty( $site )
   || strpos( $lsite, 'was registered' ) !== false
   || strpos( $lsite, 'not in archive.' ) !== false
   || strpos( $lsite, 'for sale' ) !== false
   || strpos( $lsite, '<frameset' ) !== false ) {
      continue;
   }
   $dom = $name.'.'.$tld;
   $data = clean( $site, $dom );
   if( !$data ) continue;
   // Unsigned crc32 of the domain, stored alongside the content.
   $hash = sprintf( "%u", crc32( $dom ) );
   $keys = implode( '|', $words[$i] );
   $content = implode( "\n", $data );
   $ins->bind_param( 'sssss', $name, $tld, $hash, $keys, $content );
   $ins->execute();
}

8. Publish the scraped site on a subdomain named after the old domain (omit the TLD). Thus, if the site was previously named “proana.com”, your subdomain would be “proana.mydomain.com”.
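
The naming rule in step 8 can be sketched as a small helper. The function name subdomainFor and the host mydomain.com are placeholders, and the hash line simply repeats the crc32 call used in the script above:

```php
<?php
// Hypothetical helper: "proana.com" on host "mydomain.com" becomes "proana.mydomain.com".
function subdomainFor( $oldDomain, $host = 'mydomain.com' ) {
   $parts = explode( '.', strtolower( $oldDomain ) );
   // Drop a leading "www" so only the registered name and TLD remain.
   if( $parts[0] === 'www' ) array_shift( $parts );
   if( count( $parts ) < 2 ) return null;
   // Keep the label in front of the TLD and attach it to the new host.
   $name = $parts[count( $parts ) - 2];
   return $name . '.' . $host;
}

echo subdomainFor( 'proana.com' ), "\n";
// Unsigned crc32 of the old domain, in the same form as the stored hash.
echo sprintf( '%u', crc32( 'proana.com' ) ), "\n";
```

Note that two old domains with the same name but different TLDs, such as example.com and example.org, would map to the same subdomain under this rule, so one of them would have to be skipped or renamed.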

9. Have the cron job add the new subdomain to the list of completed ones so that it appears on the site’s main page and gets indexed.
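
The main page from steps 3 and 9 is essentially a list of links to the newest subdomains. Here is a minimal sketch, assuming each row carries the subdomain name and a page title. The function name renderIndex, the array layout and the host mydomain.com are all hypothetical:

```php
<?php
// Hypothetical helper: build an HTML list of links to the newest subdomains.
// $rows: array of array( 'domain' => ..., 'title' => ... ), newest first.
function renderIndex( array $rows, $host = 'mydomain.com', $limit = 50 ) {
   $items = array();
   foreach( array_slice( $rows, 0, $limit ) as $row ) {
      $url = 'http://' . $row['domain'] . '.' . $host . '/';
      // Escape both values so stored text cannot break the markup.
      $items[] = '<li><a href="' . htmlspecialchars( $url ) . '">'
               . htmlspecialchars( $row['title'] ) . '</a></li>';
   }
   return "<ul>\n" . implode( "\n", $items ) . "\n</ul>";
}
```

The rows would come from the table the cron job fills, ordered by its timestamp column so that the freshest subdomains are listed first.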

What just happened?

With this trick, you have created a site rich in unique content and niche coverage. Each scrape adds new content and a new set of niches to your domain, and by the time the subdomains are fully indexed, the old pages on the dead domains will have begun to fall out of the index. Hence, you will no longer suffer from duplicate content, and your site will grow on its own. What you have to do now is make more of these sites so that yours grows bigger and bigger. Be prepared for more coding, though, since you have to write the code yourself.

 

*Code taken from: http://www.seo-blackhat.com/article/ess-the-real-desert-scraping.html


11 Responses to “Desert Scraping: How it can help your online business”

  1. T.. Says:

    if you remove everything above the link to my blog – the post would have the same effect.

  2. IronBlogger Says:

    I'm not very big on black hat techniques, which is why I'm not even going to try this out or get into it. If you're into them, then I guess this post is good for you.

  3. Bradd Says:

    Interesting site; it really teaches me a lot.

  4. game-girl Says:

    I am sure I got to know something new from your post. I cannot say that all this is Greek to me, but the technical aspects are really very complicated and need a technical mind to master. Still, it will be a great resource for many other people.

  5. bhjayalaxmi Says:

    lol i don't have html knowledge to try this black hat method .But thank you for sharing with us :) .

  6. bhjayalaxmi Says:

    how many more black hat methods you are going to discuss on our blog?

  7. T Shirts Says:

    Really, this is a fantastic site for us. I see that every post has a unique quality, and this post shows black hat SEO techniques, which are not good for a site. But I guess this post is good, and thanks for sharing it with us.

  8. Hilton Head Rental Says:

    Thank you very Much for posting such interesting black hat SEO technique. Please post some more techniques on your next blog.

  9. Backlink Builder Says:

    I’m not big on commenting, but nice post.

  10. StartWorkingFromHome Says:

    Pretty Good

  11. Payday No Teletrack Says:

    Amazing post and great idea……………..
