How Website Indexing Works (And How To Make It Work Better)

www.forbes.com
As a site administrator, not only do you want to lay down some rules, you also want to set some priorities.
By David Hunter, CEO of Epic Web Studios and ASAPmaps in Erie, PA. He also co-founded dbaPlatform, a local SEO software.

Suppose you've just composed the most objectively useful, engaging and brilliant web content ever. Now suppose that content remained unseen and unheard of, never once appearing in search results. While that may seem unconscionable, it's exactly why you cannot overlook website indexing.

Search engines like Google love delivering the good stuff just as much as you love discovering it, but they cannot serve users results that haven't been indexed first. Search engines constantly add to their colossal libraries of indexed URLs by deploying scouts called "spiders," or "web crawlers," to find new content.

How Web Crawlers Index Content

Even for spiders, the web is a lot to navigate, so they rely on links to guide their way, pointing them from page to page. In particular, they've got their eyes on new URLs, sites that have undergone changes and dead links. As the web crawlers come across new or recently altered pages, they render them much like a web browser would, seeing what you see.
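To make the link-following idea concrete, here is a minimal sketch in Python. It is not how any real search engine's crawler is built. The `PAGES` dictionary is a made-up, in-memory stand-in for the web (a real crawler would fetch each URL over HTTP), and the names `LinkExtractor` and `crawl` are hypothetical. It shows a crawler starting at one URL, following links from page to page, and noting any link that leads nowhere.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
PAGES = {
    "https://example.com/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog": '<a href="/blog/post-1">Post</a> <a href="/gone">Old</a>',
    "https://example.com/blog/post-1": "<p>No outgoing links here.</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, pages):
    """Breadth-first crawl; returns (pages visited, dead links found)."""
    seen = {start_url}
    queue = deque([start_url])
    visited, dead = [], []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            dead.append(url)  # a link pointed here, but nothing exists
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited, dead


visited, dead = crawl("https://example.com/", PAGES)
print(visited)
print(dead)
```

A page that no other page links to would never show up in `visited`, which is the practical reason internal linking matters for getting content discovered.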

However, whereas you might skim over the content quickly for the information you need, the crawlers are much more thorough. They scale the page up and down, creating an index entry for every unique word. Thus it's possible that a single web page could be referenced in hundreds (if not thousands) of index entries!
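The "index entry for every unique word" idea is often called an inverted index. Here is a small sketch of one in Python, assuming a simplistic tokenizer and two made-up pages. The function name `build_index` and the sample data are illustrative only, and real search indexes store far more than this (word positions, rankings and so on).

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each unique word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Lowercase the text, then split it into simple word tokens.
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            index[word].add(url)
    return index


# Made-up sample pages for illustration.
pages = {
    "/coffee": "Fresh coffee roasted daily in Erie",
    "/tea": "Loose leaf tea and fresh pastries",
}

index = build_index(pages)
print(sorted(index["fresh"]))  # ['/coffee', '/tea']

# Count how many index entries reference the /coffee page.
coffee_entries = [word for word, urls in index.items() if "/coffee" in urls]
print(len(coffee_entries))  # 6
```

Even this six-word page ends up referenced in six index entries, so it is easy to see how a full-length article could appear in hundreds or thousands.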

Getting To Know Your Crawlers

At any given time, there may be hundreds of different spiders crawling the internet, some good and some bad (e.g., those looking to scrape email directories or collect private information for spamming purposes). But there are a handful you want to be particularly aware of.

• Googlebot (Google)

• Bingbot (Bing)

• Slurp (Yahoo)

• Facebot (Facebook external links)

• Alexa crawler (aka ia_archiver, for Amazon's Alexa Internet)
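If you want to see which of these crawlers are visiting your site, one rough approach is to look for their names in the User-Agent header of incoming requests. The sketch below assumes simple, case-insensitive substring matching. The `KNOWN_CRAWLERS` table and the `identify_crawler` function are hypothetical names, and the sample strings are only examples. Keep in mind that a User-Agent can be faked, so a match is a hint rather than proof.

```python
# Substring tokens (lowercase) mapped to the crawler names listed above.
# This table is an illustrative assumption, not an exhaustive registry.
KNOWN_CRAWLERS = {
    "googlebot": "Googlebot (Google)",
    "bingbot": "Bingbot (Bing)",
    "slurp": "Slurp (Yahoo)",
    "facebot": "Facebot (Facebook)",
    "ia_archiver": "Alexa crawler",
}


def identify_crawler(user_agent):
    """Return the crawler's name if a known token appears, else None."""
    ua = user_agent.lower()
    for token, name in KNOWN_CRAWLERS.items():
        if token in ua:
            return name
    return None


# Example User-Agent strings: one crawler, one ordinary browser.
samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]
for ua in samples:
    print(identify_crawler(ua))
```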

Give Crawlers Guidelines With Robots.txt And Meta Directives

There may be situations where you do…