Googlebot

A friend of mine recently emailed to ask how TinyURL impacts SEO. It's a good question, and one many folks can't answer, so I thought I'd blog my answer!

For anyone not familiar with TinyURL, in layman's terms it's a tool where users enter long URLs to get a shortened version. TinyURLs are often used where long URLs might wrap and therefore break, such as in email or social media web applications like Twitter. In more technical terms, TinyURLs are short, dynamically created URLs that redirect users to the intended URL via a 301 redirect. Because TinyURLs "301" or permanently redirect, search engines should not index the TinyURL but instead should index and pass PageRank to the actual URL.
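For anyone curious what that looks like on the wire, here's a minimal Python sketch that sends a HEAD request to a TinyURL and prints the response; the alias below is hypothetical, and the expected output simply reflects the 301 behavior described above.

```python
import http.client

# Minimal sketch: inspect how a TinyURL responds (the alias is hypothetical).
conn = http.client.HTTPConnection("tinyurl.com")
conn.request("HEAD", "/example123")
response = conn.getresponse()

print(response.status, response.reason)  # expect: 301 Moved Permanently
print(response.getheader("Location"))    # the destination URL search engines should index
conn.close()
```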

It is important to note that using TinyURLs to pass PageRank to paid links is a violation of Google's Webmaster Guidelines, and that sites like Twitter use nofollow techniques to prevent spam.

On their own, TinyURLs can be search engine friendly from a technical perspective. At the same time, I wouldn't suggest replacing your site's navigation with TinyURLs and would point out that tracking TinyURLs via analytics might be difficult.

Some sites claim to offer a new, innovative solution for "Flash SEO" called SWFAddress, aka "Deep Links", "Deep Linking" and/or other names. Unfortunately, these sites are promoting techniques based on SWFAddress, a method for Flash SEO that I've blogged about, taken the creator to task on, and that even he admits is sub-optimal in terms of SEO!

"The case is valid. Deep links with anchors published on other sites will tell Google to index the start page."
- Google Groups

Not to worry though, because identifying sites using SWFAddress is easy! If a Flash site uses #anchors (a pound sign) in its URLs, chances are it's using SWFAddress. The problem with SWFAddress is that it functions in only one direction, so to speak.

Google ignores the #anchor in SWFAddress URLs, as well as the entire path following the #anchor in the URL. When users with Flash cut and paste a link from their address bar into their blog, Digg and/or LinkedIn, Google ignores everything starting with the #anchor and, as a result, misallocates keyword relevancy and PageRank to the "start page".
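If you want to see why, here's a quick illustration using Python's standard library and one of Asual's own sample URLs: everything after the # never reaches the server, so it can never reach Googlebot either.

```python
from urllib.parse import urldefrag

url = "http://www.asual.com/swfaddress/samples/seo/#/portfolio/2/?desc=true"
base, fragment = urldefrag(url)

print(base)      # http://www.asual.com/swfaddress/samples/seo/  <- all a crawler can request
print(fragment)  # /portfolio/2/?desc=true  <- stays in the browser, invisible to Googlebot
```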

Some credit where it's due would have been nice, but either way I commend the good folks at Asual for their efforts, as well as the new "COPY LINK TO CLIPBOARD" link in the footer of their SEO sample pages.

Like their "Quick Start Guide", it seems that the Google Webmaster Team's new booklet titled "Making the Most of Your Content: A Publisher's Guide to the Web" has slipped under most folk's radar but, not mine! Not to be confused with "Marketing and Advertising Using Google", this new "Google booklet" provides a wealth of information for anyone interested in search and especially Google.com. Best of all "Making the Most of Your Content: A Publisher's Guide to the Web" by the Google Webmaster Team is FREE of charge and can be downloaded in .pdf format free, by anyone with a connection to the internet.

"Making the Most of Your Content: A Publisher's Guide to the Web" by Google starts out with an overview of how Googlebot crawls the web. From there, the booklet explains how search has evolved since 2001 and introduces what many refer to as the "Google Freshness Factor". "What's new in Google web search?" is followed by a section called "Can Google find your site?". In "Can Google find your site?" the webmaster team explains how it's possible for Google to miss websites and or page on the web.

In the section titled "Can Google index your site?" the Google Webmaster Team explains how important structure and content are to search engines. This section investigates "indexability" and issues that hamper Google's ability to download a page for inclusion in search engine results pages. A few common webmaster mistakes impacting indexability include fully dynamic pages, Flash, JavaScript and frames. Google suggests using "alternative text" (note that "alternative text" is produced by ALT attributes) as well as descriptive file names (for example, ourlogo.jpg and not image2.jpg) in web pages. In addition, Google provides information on how to make URLs more search engine friendly, server and network issues impacting search, and the Robots Exclusion Protocol in terms of robots.txt and/or robots meta data.

"Controlling what Google indexes" explains how webmasters can prevent Google from indexing page contents and how webmasters wishing to have content included on Google may do so. The booklet then explains the differences between robots.txt and robots meta tags. Webmasters wishing to have their content indexed by Google in search results should see the section called "Controlling caching and snippets". "Controlling caching and snippets" explains how Google chooses snippets displayed in Google search results and provides meta data examples to help webmasters and online marketers better control what users see in search results.

My favorite section of the new Google Webmaster Team booklet, "Making the Most of Your Content: A Publisher's Guide to the Web", is called "Does your site have unique and useful content?". In this section Google reveals that search engine results are based on 200 criteria in addition to PageRank, and seems to indicate that webmasters shouldn't "fixate" on PageRank alone but should also consider the other factors Google weighs. Here Google provides a few tips for webmasters looking to increase rankings, including:

1. Make great content that grabs users' attention.

2. Involve users by helping to create a community with your site.

3. Monitor site usage via Google Webmaster Tools (Google Sitemaps in XML), Google Analytics, Urchin and/or other tools.

4. Quality inbound links: Google says they are important.

5. Clear text links: Google says text links and their "anchor text", the words that make up those links, are important.

In addition to what webmasters should do to help make their sites more Googlebot friendly, Google says webmasters should not fill pages with keywords, cloak (return different results to users and search engines) or use "crawler pages" to manipulate search engines.
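Cloaking, as Google defines it here, means serving one page to users and another to crawlers, most commonly keyed off the User-Agent header. A rough, by no means definitive, way to spot the user-agent variety is to fetch the same URL as a browser and as Googlebot and compare; the URL below is a placeholder, and sophisticated cloaking (by IP, for instance) would slip past this check.

```python
import urllib.request

URL = "http://www.example.com/"  # placeholder URL

def fetch(user_agent):
    request = urllib.request.Request(URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read()

as_browser = fetch("Mozilla/5.0")
as_googlebot = fetch("Googlebot/2.1 (+http://www.google.com/bot.html)")

# Identical responses don't prove a site is clean, but different
# responses are a red flag worth investigating.
print("Responses differ:", as_browser != as_googlebot)
```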

The next section of Google's new booklet for webmasters is a Q&A containing frequently asked questions answered by the Google Webmaster Team. My favorite is "Why can't you do one-on-one support for my website?", to which Google's answer is that there are 100 million sites.

Following the Google Webmaster Team Q&A there is a glossary where Google "boils down" 20 or so technical definitions into terms that anyone can understand. Two definitions really stood out to me because they are not always understood by webmasters, developers and/or designers.

Dynamic content - Content such as images, animations, or video which rely on Flash, JavaScript, frames, or dynamically generated URLs.

To index - The process of having your site's content added to a search engine.

All in all, this is one of the best resources I've seen for helping non-technical folks better understand the basics of "natural" or "organic" search. No matter your level of expertise, I suggest "Making the Most of Your Content: A Publisher's Guide to the Web" by the Google Webmaster Team. This easy-to-understand booklet published by Google is one resource that I'll be using to help better explain search to clients in simple terms.

Both Google's "Making the Most of Your Content" and "Marketing and Advertising Using Google" are available free.


As you know, I'm always looking for ways to help Flash developers make content accessible to search engines. Today I received a link to a site claiming to have the answer to that age-old question: how do you SEO Flash?

The site claims to have a sample of "SEO SWFAddress 2.0" code that "provides a better separation between the content and the presentation." Better than what, I'm not sure! Either way, the URLs in the "SEO sample" still contain #anchors. Googlebot ignores #anchors in URLs, and I'm really hoping SWFAddress 2.0 isn't based around such a myth!

A good example of this is Google's cache of the "SEO Sample" portfolio 2 page.
Here is what the user sees:
http://www.asual.com/swfaddress/samples/seo/#/portfolio/2/?desc=true

As you can see, the two pages are different, and that is called cloaking!

- Sample from SWFAddress 2.0 Website

As an SEO, I'm always looking for ways to help Flash developers make Flash sites more "search engine friendly". I recently came across an article on the Adobe Developer Connection that sounded interesting at first. As I kept reading, I was surprised by what they call a "solution" for the "one URL per page" issue as it relates to sites in Flash. To solve this deep-linking issue, Adobe proposes a method for directly linking to content that's "buried". The technique they suggest uses a variation of RESTful URLs in Flash. REST, or "representational state transfer", basically uses one or more distinct URLs linking directly to different content, or different states of content, within web-based applications.

The technique uses a "frame anchor" located in the URL to specify one specific frame in the main timeline. As a result, the playhead jumps to that specific frame in Flash, and users with Flash enabled see the content the Flash developer associated with that frame.

"The syntax for writing a URL to point to a particular anchor location in HTML is to use the pound sign (#) followed by the designated name for the anchor, as in the following examples:

* #section1: a URL that points to the anchor named "section1" in the current HTML page

* some_html_page.html#appendix: a URL that opens the URL some_html_page.html and then scrolls to the anchor named "appendix""

from: "Deep-linking to frames in Flash Websites"

This solution may work for "deep linking" in Flash, but it's yet another nightmare when it comes to making Flash sites search engine friendly. Bottom line: Googlebot ignores #anchors but browsers do not. So when users link to anyurl.com/home.html#/about.html from their blog, the PageRank, "link juice" and/or relevancy intended for about.html is instead given to home.html.
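To make that concrete, here is what the HTTP request actually looks like for such a link, sketched with Python's urlsplit (the domain is a placeholder): the fragment never appears in the request line, so home.html is all the server, and Googlebot, ever sees.

```python
from urllib.parse import urlsplit

url = "http://anyurl.com/home.html#/about.html"  # placeholder domain
parts = urlsplit(url)

# The request line a browser (or crawler) actually sends to the server:
print(f"GET {parts.path} HTTP/1.1")  # GET /home.html HTTP/1.1
print("Host:", parts.netloc)

# The fragment is resolved client-side and never transmitted:
print("Fragment (client-side only):", parts.fragment)  # /about.html
```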