But did the spider actually visit? Maybe not. Dynamic page content is often invisible to most search engine spiders, so it never gets indexed. Increase the traffic to your dynamic site by making your valuable content visible to search engine spiders.
Dynamic Pages Are Easy To Maintain
The content of static pages doesn't change unless you actually code the changes into your HTML file: open the file, edit the content, save the file, and upload it to the server. All search engine spiders can index static Web pages.
A dynamic Web page is a template that displays specific information in response to queries. Most of the page content comes from the database connected to the Web site. Visitors love dynamic pages because they get quick access to the information they want. Webmasters find these sites easy to update: as product offerings or prices change, just edit your database instead of hundreds of individual Web pages.
Search engine spiders have a much tougher time with dynamic sites. Some get stuck because they can't supply the information the site needs to generate the page. Other spiders deliberately stay away from dynamic pages to avoid getting trapped in the site.
What Was The Question Again?
Visitors find information in a dynamic site by using a search query. That query can either be typed into a search form by the visitor or already be coded into a link on the home page - making the link a pre-defined search of the site's catalog. In that latter case, the portion of the link containing the search parameters is called a 'query string.'
But a search engine spider doesn't know to use your search function - or what questions to ask. Dynamic scripts often need certain information before they can return the page content: cookie data, session id, or a query string are common requirements. Spiders usually stop indexing a dynamic site because they can't answer the question.
If the spider does accidentally wander deeper into your site, it could inadvertently get caught in a "spider trap": a badly written CGI script that requests information the spider can't supply. Then the spider and your server navigate a never-ending loop where a request for a page is met with a request for information.
Getting a spider trapped inside your server is not just bad for the spider. The repeated requests for pages can crash the server. If you share server space with other Web sites and have a problem with site downtime, ask your Web host to check for CGI script problems on other sites.
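To see why spiders are so cautious, here is a minimal Python sketch of a crawl loop with a page budget. The fake_site function is a hypothetical stand-in for a badly written script that answers every request with a link to yet another "new" URL differing only in its session id; it doesn't model any real spider or server, and the example.com address is made up.

```python
from collections import deque

def fake_site(url):
    """Hypothetical trap: each page links to a page that differs only in its session id."""
    prefix, _, sid = url.rpartition("=")
    return [f"{prefix}={int(sid) + 1}"]

def crawl(start_url, get_links, max_pages=50):
    """Breadth-first crawl that gives up after max_pages requests."""
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1  # one request sent to the server
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

if __name__ == "__main__":
    # Without the budget this loop would never end: the trap always has one more URL.
    print(crawl("http://www.example.com/catalog.cgi?sid=1", fake_site))
```

Every URL the trap hands back looks new, so remembering visited pages isn't enough to escape it. Only the budget stops the loop, which is roughly why a cautious spider would rather skip such pages altogether.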
It's All In The Name
A page's actual URL address often poses a problem too because most dynamic page URLs contain query strings. Here's an example of the URL for a book search result page on Barnes and Noble's Web site:
http://shop.barnesandnoble.com/booksearch/isbnInquiry.asp?userid=2IMXLT5XN1&mscssid=QEUFGRFF5X2G9H2UCMJQLAKJ8JV83FMD&isbn=0452269350
Look closely at the URL. See the question mark after /isbnInquiry.asp? Most search engine spiders get to the "?" in the query string and stop indexing because of the probability of getting caught in a spider trap.
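If you want to check your own URLs the way a cautious spider might, a rough Python sketch like the one below will do. The has_query_string name is made up for this example, and real spiders certainly apply their own rules; this only tests whether anything follows a "?" in the URL.

```python
from urllib.parse import urlsplit

def has_query_string(url):
    """True if the URL carries a non-empty query string after a '?'."""
    return bool(urlsplit(url).query)

if __name__ == "__main__":
    urls = [
        "http://shop.barnesandnoble.com/booksearch/isbnInquiry.asp"
        "?userid=2IMXLT5XN1&mscssid=QEUFGRFF5X2G9H2UCMJQLAKJ8JV83FMD&isbn=0452269350",
        "http://www.amazon.com/exec/obidos/ISBN%3D0395683297/103-0475212-8205437",
    ]
    for url in urls:
        verdict = "has a query string" if has_query_string(url) else "no query string"
        print(verdict, "-", url)
```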
Attract Spiders To Your Web
So, you've got all this invisible content - what do you do? Search engines know about the problem, but most have shown very little interest in addressing it. Infoseek and HotBot are the exceptions. Their search engine spiders can index dynamic page content, but don't do it automatically. You have to invite them in.
HotBot recommends that you submit your dynamic page with all the arguments added onto the URL (www.website.com/products/search/product_query.asp?prod_id=22929). You can also submit a static page that contains links to the dynamic URLs you need indexed.
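Typing out every full URL by hand gets old fast. Here is a small Python sketch that builds the complete URLs, arguments included, from a list of product ids. The base address and the prod_id parameter come from the HotBot example above; the list of ids and the submission_urls name are assumptions for illustration, and the sketch only prints the URLs rather than submitting them anywhere.

```python
from urllib.parse import urlencode

BASE = "http://www.website.com/products/search/product_query.asp"

def submission_urls(product_ids, base=BASE, param="prod_id"):
    """Build one complete dynamic URL per product id."""
    return [f"{base}?{urlencode({param: pid})}" for pid in product_ids]

if __name__ == "__main__":
    # Hypothetical product ids; in practice they would come from your database.
    for url in submission_urls([22929, 22930, 22931]):
        print(url)
```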
Infoseek's spider, called Slurp, will index dynamic pages that you submit, but won't crawl through your dynamic Web site by default.
You do have options to get indexed by the other search engines, but no matter which you select, you'll have to spend some time and effort to make sure your dynamic content gets indexed.
Add Dynamic Links To Static Pages
Include links to important dynamic content on your static pages. The simplest way is a straightforward table of contents page that links to your most important dynamic pages. It gives spiders a way to index content without having to answer any questions. If you have a small site with few products, this is a quick way to get more of your content indexed.
However, the table of contents won't help you with search engine spiders that stop at query strings. Increase your chances by including good, descriptive links to your major product categories on a static products page. Search engines that stop at query strings will still index the content of the products page - including your link titles. Other search engines that can follow dynamic links can visit the actual dynamic page content without a query.
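A static products page like this is easy to generate from a list of categories. The Python sketch below is one way to do it, not the only one: the links_page function, the mystore.com URLs, and the cat parameter are all hypothetical. It writes descriptive link text and titles so that even spiders that stop at the "?" still index something useful.

```python
from html import escape

def links_page(title, links):
    """Return a static HTML page; links is a list of (url, link_text, title_text) tuples."""
    items = "\n".join(
        f'<li><a href="{escape(url)}" title="{escape(tip)}">{escape(text)}</a></li>'
        for url, text, tip in links
    )
    return (
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"<ul>\n{items}\n</ul>\n"
        "</body>\n</html>\n"
    )

if __name__ == "__main__":
    # Hypothetical categories and URLs for illustration only.
    page = links_page("Our Products", [
        ("http://www.mystore.com/products.cfm?cat=inkjet",
         "Ink Jet Printers", "See a list of ink jet printers in stock!"),
        ("http://www.mystore.com/products.cfm?cat=laser",
         "Laser Printers", "See a list of laser printers in stock!"),
    ])
    print(page)
```

Save the output as an ordinary .htm file and link to it from your home page so spiders can find it.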
Remove Query Strings From Dynamic URLs
Amazon.com uses this method to get its product selections indexed by search engines. For instance, a search on Google for Rachel Carson's book, Silent Spring, returns a result that takes you directly to the appropriate dynamic page at Amazon: http://www.amazon.com/exec/obidos/ISBN%3D0395683297/103-0475212-8205437. Because the URL doesn't contain any query strings, all search engines can index Amazon's product line.
This method works, but it's also the most technically demanding solution. If you decide to use this method, you can select from several different options, depending on the type of Web server you use and the software you're using to integrate your database with your Web site:
Cold Fusion: Reconfigure your Cold Fusion setup to replace the "?" in a query string with a "/" and pass the value as part of the URL path. Browsers and spiders then treat it as the URL of a static page.
Instead of http://www.mystore.com/products.cfm?prod_id=22343, you get a URL like this: http://www.mystore.com/products.cfm/22343.
CGI Scripts: The PATH_INFO environment variable holds whatever part of the URL path follows the script's own name (SCRIPT_NAME holds the path to the script itself). Write your script so that it reads its parameters from PATH_INFO instead of from the query string. You can then link to the script with the values appended to the path rather than placed after a "?".
Apache: Apache has a rewrite module that lets you translate URLs containing query strings into URL addresses that search engine spiders can follow. The module, mod_rewrite, isn't automatically installed with the Apache software. Check with your Web host or administrator to see if it's available on your server.
Visit the Apache Web site for more information on the mod_rewrite module.
Active Server Pages: Most search engines will index .asp pages if the "?" is removed from the URL. XQASP offers a product that will automatically remove the query strings from your .asp pages and replace them with "/" marks.
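Whichever option you pick, the underlying URL transformation is the same. The Python sketch below shows just that transformation, assuming the server hands the extra path to your script the way the CGI PATH_INFO variable does. It isn't how Cold Fusion, mod_rewrite, or XQASP work internally, and both function names are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, quote

def to_path_style(url):
    """Move query string values into the path: page.cfm?prod_id=22343 -> page.cfm/22343."""
    parts = urlsplit(url)
    values = [quote(value, safe="") for _, value in parse_qsl(parts.query)]
    if not values:
        return url  # nothing to rewrite
    path = parts.path.rstrip("/") + "/" + "/".join(values)
    return urlunsplit((parts.scheme, parts.netloc, path, "", parts.fragment))

def values_from_path_info(path_info):
    """What the server-side script would do with the extra path: '/22343' -> ['22343']."""
    return [segment for segment in path_info.split("/") if segment]

if __name__ == "__main__":
    print(to_path_style("http://www.mystore.com/products.cfm?prod_id=22343"))
    print(values_from_path_info("/22343"))
```

Note that this simple version keeps only the values, in order, so the script has to know which position holds which parameter.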
A note of caution: these four methods make your dynamic page appear to have its own sub-directory, so the browser will look for images and links there. You can completely avoid broken links and pages by using all absolute URL addresses on your page, but that will make maintenance more difficult later. Alternatively, use URL addresses that are relative to the root directory of your site, not the document. Use /homepage.htm instead of ../homepage.htm and you'll be fine.
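You can watch this happen with Python's standard urljoin function, which resolves a link against a page address the same way a browser does. The mystore.com addresses and the images/logo.gif path are only examples, and where_browser_looks is a made-up helper name.

```python
from urllib.parse import urljoin

def where_browser_looks(page_url, reference):
    """Resolve a link or image reference against the address of the page it appears on."""
    return urljoin(page_url, reference)

if __name__ == "__main__":
    old = "http://www.mystore.com/products.cfm?prod_id=22343"
    new = "http://www.mystore.com/products.cfm/22343"

    # A document-relative image path moves when the page URL gains a "sub-directory".
    print(where_browser_looks(old, "images/logo.gif"))
    # -> http://www.mystore.com/images/logo.gif
    print(where_browser_looks(new, "images/logo.gif"))
    # -> http://www.mystore.com/products.cfm/images/logo.gif

    # A root-relative path resolves to the same place either way.
    print(where_browser_looks(old, "/homepage.htm"))
    # -> http://www.mystore.com/homepage.htm
    print(where_browser_looks(new, "/homepage.htm"))
    # -> http://www.mystore.com/homepage.htm
```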
Remember The Rules
Don't get so caught up in modifying your page design or URL addresses that you forget the basic rules for search engine optimization. Your pages need to have good content, META tags, a high link popularity score, appropriate keywords, and more before you can climb to the top of the search engine ranking.
NetMechanic's Search Engine Power Pack helps you get all your pages ready to submit to search engines. It contains the tools you need to get and keep a top ranking.
Deep linking at Dynamic Pages
Just think how easily you could promote your dynamic Web site if those pesky search engine spiders could just index the content of your dynamic pages. Well, the exciting news is that some spiders can do just that! Other search engines have taken a smaller step. They allow webmasters to submit important dynamic pages directly - one page at a time.
Problems With Dynamic Sites
A dynamic site has content delivered from a database in response to user queries. It's usually easy to spot a dynamically generated page: just look for a file extension like .asp, .php, or .cfm.
The problem isn't with the file extension, but with the special characters that indicate a query string - a question mark, equal sign, or ampersand (?, =, &). The presence of these special characters indicates a non-static HTML page, or one that only exists in response to a particular query from a visitor.
In the past, search engine spiders avoided those pages entirely because they often became "spider traps." Spiders would get caught on the page, unable to continue through the site or backtrack out of it. Sometimes, the problem would be so severe that the Web site's server would crash - locking out all visitors!
Google's Webmaster Guidelines page notes that:
"If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small."
Our February 2001 Webmaster Tip discusses several ways to rewrite the URLs of dynamic pages to remove the problem characters. However, that doesn't always help search engines crawl through your dynamic pages.
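Google's advice to keep parameters short and few is easy to check against your own links. The Python sketch below counts the parameters in a URL and finds the longest value. The query_summary name and the example.com address are made up for this example, and Google doesn't publish specific limits, so what counts as "too many" is your call.

```python
from urllib.parse import urlsplit, parse_qsl

def query_summary(url):
    """Return (number of query parameters, length of the longest parameter value)."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    longest = max((len(value) for _, value in params), default=0)
    return len(params), longest

if __name__ == "__main__":
    # Hypothetical URL for illustration only.
    url = "http://www.example.com/catalog.asp?cat=printers&type=inkjet&sessionid=QEUFGRFF5X2G9H2U"
    count, longest = query_summary(url)
    print(f"{count} parameters, longest value is {longest} characters")
```

Long session ids are usually the worst offenders, so they are a good place to start trimming.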
Opening Doors To Dynamic Sites
Google has been quietly allowing its spider (called Googlebot) to crawl dynamically generated pages and index the content. This experiment has actually been going on for over a year and Google admits to approaching the project "slowly and cautiously."
Google's Webmaster FAQs page warns that:
"We are able to index dynamically generated pages. However, because our web crawler can easily overwhelm and crash sites serving dynamic content, we limit the amount of dynamic pages we index."
Inktomi is also slowly adding dynamic capability to its spider (but remember that Inktomi doesn't allow any free site submissions anymore). Inktomi's Web site notes that its spider (called Slurp) can index dynamic pages, but will not "crawl them by default."
"If your Web site is based on dynamic links and you want your site to appear in our search engine, one approach is to have some static pages which have links to your dynamic pages."
Indeed, that's the easiest way to get both Googlebot and Slurp to index your dynamic pages. Because the value of the initial query is included in the link itself, the spiders aren't expected to create the query, only to index the results.
For instance, if your ecommerce site has all its product information inside a database, it's a good idea to create a products page that contains the links that retrieve important information about your major product categories. Googlebot and Slurp reliably follow these dynamic links.
So the dynamic links you want these spiders to index might look like this:
<a href="…" title="See a list of ink jet printers in stock!">Ink Jet Printers</a>
<a href="…" title="See a list of laser printers in stock!">Laser Printers</a>
Both Googlebot and Slurp follow the links on the static HTML page and index the content of the dynamically generated pages. But they stop there. Neither spider will reliably go deeper into your dynamic pages unless you create more static pages that link to that additional information.
AltaVista indexes dynamic pages submitted using its Trusted Feed premium inclusion program. This service gives AltaVista partners more control over how their pages are indexed and ranked. The service description specifically notes one of the benefits: "accepts pages that are traditionally difficult for crawlers to index, such as framed pages or pages with dynamic content."
The paid inclusion is a reliable way for large Web sites to get their entire database content spidered, but the cost is often way too much for smaller sites to even consider.
Deep Submit Helps With Some Engines
If you can't justify the expense of a premium inclusion program and you don't want the bother of creating static HTML pages with dynamic links, you do have another option: deep submit your dynamic pages directly to search engines.
When you deep submit pages, you're ensuring that the search engine indexes all the important pages in your site by submitting individual pages directly to the search engine.
This way, you can tell the search engines exactly what query results you want indexed. Unfortunately, even this can cost you money if individual engines require you to pay to submit. For instance:
Lycos InSite Select service lets you submit dynamic URLs directly, but charges per URL.
AskJeeves/Teoma engine accepts dynamic pages directly, but only through the Site Submit function, which also charges per URL indexed.
AltaVista still offers a free submit option, but you'll do a lot more typing using the free submit than with the Trusted Feed program. That's because the Basic Submit limits submissions to one URL at a time. You end up with a lot of repetitive typing and the chances of making a mistake increase.
This is actually a big problem with any type of deep submit strategy because every search engine wants the URL, your email address, and more information about your site. You could spend hours - or days - submitting the same data over and over to various search engines.
Or you could use NetMechanic's easy deep submission tool Search Engine Starter to submit multiple pages of your Web site to top search engines like Google, AllTheWeb, and HotBot. Search Engine Starter will help you get all your important pages indexed - dynamic and static alike.