
06.23.09

Learning The Basic Terminology Of Robots.txt Files

By Patrick Hare

On many occasions customers come to us with the complaint that they can't be found. They either had rankings on all search engines and suddenly disappeared, or never were seen in the first place. Believing that they are the victims of a ban in the search engines, they come to us for search engine optimization advice. In many cases, the culprit is found in the robots.txt file, in the form of the classic:

User-agent: *
Disallow: /

(Special note: using this directive will make your site disappear from the search engines!) The forward slash after Disallow tells the engines to ignore every file on the site. The solution to this problem is to delete the forward slash, leaving "Disallow:" with an empty value, which tells search engines that everything is fair game. If you use Google Webmaster Tools, you will be told that the robots file is preventing your site from being indexed. Many times a webmaster will upload this file accidentally, or forget to take it down when a dev site goes live. The directive effectively tells every honest search engine spider to stop reading your site and go away. Note that unethical spiders that scrape for phone numbers, email addresses, and content will not even bother to look at your robots.txt file, unless they are programmed to look for the files you don't want found. If you are looking to block spiders run by dishonest people on the internet, the robots.txt file is probably not going to help you, so you should look to server-level exclusions.
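As a rough illustration of the difference the forward slash makes, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name `allowed`, the agent name ExampleBot, and the www.example.com URL are made up for the example; a real search engine's parser may differ in its details.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# The "classic" mistake: the slash blocks the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /"
# The fix: an empty Disallow value blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:"

url = "http://www.example.com/index.html"  # hypothetical page
print("with slash:   ", allowed(BLOCK_ALL, url))
print("without slash:", allowed(ALLOW_ALL, url))
```

Running a check like this against a copy of your live robots.txt before a dev site goes public is one way to catch the leftover slash.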

Depending on the complexity of your site, the robots.txt file can be modified to support your SEO initiatives. If you have a series of pages in a shopping cart, forum, or section that you want to exclude, you can disallow a specific directory:

Disallow: /Example

If you have multiple directories, you would just add them to the list:

User-agent: *
Disallow: /Example
Disallow: /secret_plans
Disallow: /things_we_do_not_want_the_world_to_know
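Directory rules like these are matched as path prefixes, and the match is case-sensitive. The sketch below, again using Python's standard urllib.robotparser, shows which paths the list above would block. The helper name `is_allowed`, the agent name, and the test paths are hypothetical, and individual crawlers may behave slightly differently.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /Example
Disallow: /secret_plans
Disallow: /things_we_do_not_want_the_world_to_know
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

def is_allowed(path: str) -> bool:
    """True if a generic crawler may fetch `path` under RULES."""
    return parser.can_fetch("ExampleBot", path)  # agent name is made up

for path in [
    "/Example/cart.html",        # inside a disallowed directory
    "/Examples-of-our-work.html",  # also blocked: shares the /Example prefix
    "/example/cart.html",        # lowercase "e": not matched by /Example
    "/secret_plans/q3.html",
    "/index.html",
]:
    print(f"{path:32} allowed={is_allowed(path)}")
```

Note the second path: because "/Example" has no trailing slash, it also catches any other URL that merely begins with the same characters.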

Alternatively, you can use a newer wildcard format that disallows pages with certain phrases or string segments in their URLs. If you wanted to disallow all the pages with a session ID in them, you could use a directive that says:

Disallow: /*sessionid

Keep in mind that this will effectively shut search engines out of these pages, so you should ensure that your string is long enough that it does not accidentally blind the engines to pages that you want found. The wildcard disallow is ideal for people who have bought a site and then found out that it was a parked domain with thousands of "junk" pages installed by a previous owner. Even if you no longer have any of those pages on your site, it can take months for Google to notice that they are gone. By excluding them in your robots file, the removal of those cached pages can take less time.
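To see why the length of the string matters, here is a minimal sketch of wildcard matching in Python. Python's standard robots.txt parser does not handle wildcards, so this uses a small hand-written matcher in which "*" stands for any run of characters. The function name `is_blocked` and the sample URLs are invented for the example, and real search engines may apply additional rules.

```python
import re

def is_blocked(pattern: str, path: str) -> bool:
    """Return True if `path` matches a Disallow pattern containing '*' wildcards.

    The pattern is matched from the start of the path, and each '*'
    matches any sequence of characters (including none).
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex, path) is not None

# A reasonably specific string only catches the session-ID URLs.
print(is_blocked("/*sessionid", "/cart?sessionid=abc123"))  # session URL
print(is_blocked("/*sessionid", "/products/widget.html"))   # normal page

# A string that is too short also catches pages you want indexed:
# "video" happens to contain "id".
print(is_blocked("/*id", "/guides/video-tips.html"))
```

The last check is the accident to avoid: a two-letter fragment appears inside ordinary words, so the rule hides a page that has nothing to do with session IDs.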



About the Author:
Patrick Hare has been managing online and offline marketing projects since 1999. From 2005 to the present, he has been with Web.com Search Agency (formerly Submitawebsite) in Scottsdale, Arizona. Patrick provides Search Engine Optimization and Marketing advice to in-house customers and Web.com Jacksonville's web design group.

About DevWebProUK
DevWebProUK is for professional developers ... those who build and manage applications and sophisticated websites. DevWebProUK delivers via news and expert advice New Strategies In Development.











-- DevWebProUK is an iEntry, Inc. publication --
iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509
2009 iEntry, Inc. All Rights Reserved
