JohnMu, aka Googler John Mueller, confirmed Google's use of sitemaps on Sunday and suggested using only quality meta data in XML sitemaps.

In his Google Groups post, John Mueller goes on to mention specifics as to how Google uses meta data in XML sitemaps submitted via Google Webmaster Tools:

URL - According to Mueller, it's best to list only working URLs in XML sitemaps, and only the canonical version of each URL. In his example, he suggests providing the "/" version and not "index.html". He goes on to point out the importance of using the same URL found in the site's navigation and, if necessary, using 301 redirects to point other versions at that same URL. The navigation issue is especially important if something other than a crawler creates your sitemap. Either way, it's worth testing to be sure your Sitemap URLs are identical to those in the user path (I've actually had near knock-down, drag-outs over this issue). JohnMu suggests only including URLs to indexable content like (X)HTML pages and other documents. In addition, he points out that it's best to only include URLs webmasters want indexed.

Last modification date - In his post, Mueller points out how difficult it can be for Google to determine a "Last modification date" for dynamic sites. He suggests either using the correct time or none at all. John suggests using a "Last modification date" but not "Change frequency", unless webmasters can establish a consistent frequency.

Change frequency - As with "Last modification date", Mueller suggests leaving this out if an accurate value isn't available.

Priority - Mueller suggests not including "Priority" meta data in xml sitemaps unless webmasters feel they can provide accurate data.

In summary, JohnMu suggests XML sitemap files that contain only URLs intended for inclusion in Google's index, and only those found in the site's navigation. He treats "Date or change frequency" and "Priority" as optional meta data.
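
To make the advice above a bit more concrete, here is a minimal Python sketch of a sitemap generator that writes only the URL and, when it is actually known, the last modification date. The build_sitemap function, the example.com URLs and the date are my own illustration and not anything from Mueller's post; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, lastmod) pairs.

    lastmod is a date string, or None when the real date is unknown.
    Change frequency and priority are left out on purpose.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        # List the "/" version rather than "/index.html".
        if url.endswith("/index.html"):
            url = url[: -len("index.html")]
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # Only write a last modification date when we have a real one.
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
sitemap_xml = build_sitemap([
    ("http://www.example.com/index.html", None),
    ("http://www.example.com/about.html", "2008-02-10"),
])
print(sitemap_xml)
```

A sketch like this doesn't check that the URLs work or that they match the site's navigation, so that testing still has to happen separately.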

UPDATE: JohnMu has posted additional information over at Search Engine Roundtable in response to Barry's post.

- beu

Google recently announced that they will soon start including landing page load time as a factor in determining Google AdWords Quality Score. Here are a few simple tips designed to help anyone decrease load times and speed up landing pages. I've listed just a few below, but feel free to comment with more if you'd like.

  • Avoid 301, 302 and JavaScript redirects to your landing pages and don't use interstitial pages.
  • Reduce or eliminate session IDs and other arguments in landing page URLs.
  • Use absolute URLs for dependencies.
  • Use external CSS, and place the calls for it at the top of the HEAD in your landing pages, just below the TITLE.
  • "Prefetch" landing page image dependencies near the top of the HEAD in your landing page HTML.
  • Keep page dependencies within the same domain. In other words, try to avoid framed content and/or any content dependency residing at another domain from loading into your page.
  • Remove unnecessary "white space" in HTML code including text that is "commented out".
  • Avoid embedded Flash content in your landing pages, especially when content in Flash is being pulled from another source.
  • Avoid animated gifs and unoptimized images of any type.
  • Reduce the total number of images in your landing pages and specify their width and height in the IMG tag.
  • Reduce the size of images in your landing page by 10%.
  • Use CSS instead of relying on "spacer.gif" or "clear.gif" images to style the look and feel of your landing pages.
  • Allow caching when possible.
  • Use external JavaScript, and move external scripts such as analytics code to the bottom of your landing pages.
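
As a rough illustration of the "white space" tip in the list above, here is a small Python sketch that strips commented-out text and collapses runs of white space in an HTML string. The strip_html_bloat function and the sample markup are hypothetical, and the approach assumes the page has no PRE or TEXTAREA blocks and no inline scripts, since collapsing line breaks would damage those.

```python
import re


def strip_html_bloat(html):
    """Remove HTML comments and collapse runs of white space.

    Assumption: no <pre>, <textarea> or inline <script> content,
    because their line breaks and spacing are significant.
    """
    # Drop ordinary comments but keep IE conditional comments ("<!--[if ...").
    html = re.sub(r"<!--(?!\[if).*?-->", "", html, flags=re.DOTALL)
    # Collapse every run of spaces, tabs and line breaks into one space.
    html = re.sub(r"\s+", " ", html)
    return html.strip()


# Hypothetical landing page fragment, for illustration only.
sample = (
    "<html>\n"
    "  <head>\n"
    "    <!-- old banner -->\n"
    "    <title>Landing</title>\n"
    "  </head>\n"
    "</html>"
)
smaller = strip_html_bloat(sample)
print(len(sample), "->", len(smaller))
```

How much this saves depends entirely on the page, so it's worth measuring before and after rather than assuming a big win.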

- beu

Like their "Quick Start Guide", it seems that the Google Webmaster Team's new booklet titled "Making the Most of Your Content: A Publisher's Guide to the Web" has slipped under most folks' radar, but not mine! Not to be confused with "Marketing and Advertising Using Google", this new "Google booklet" provides a wealth of information for anyone interested in search. Best of all, "Making the Most of Your Content: A Publisher's Guide to the Web" by the Google Webmaster Team is free of charge and can be downloaded in .pdf format by anyone with a connection to the internet.

"Making the Most of Your Content: A Publisher's Guide to the Web" by Google starts out with an overview of how Googlebot crawls the web. From there, the booklet explains how search has evolved since 2001 and introduces what many refer to as the "Google Freshness Factor". "What's new in Google web search?" is followed by a section called "Can Google find your site?". In "Can Google find your site?" the webmaster team explains how it's possible for Google to miss websites and or page on the web.

In the section titled "Can Google index your site?" the Google webmaster team explains how important structure and content are to search engines. This section investigates "indexability" and issues that hamper Google's ability to download a page for inclusion in search engine results pages. A few common mistakes by webmasters impacting indexability include fully dynamic pages, Flash, JavaScript and frames. Google suggests using "alternative text" (important to note that "alternative text" is produced by ALT attributes) as well as descriptive file names (for example ourlogo.jpg and not image2.jpg) in web pages. In addition to these, Google provides information on how to make URLs more search engine friendly, server and network issues impacting search, and the Robots Exclusion Protocol in terms of robots.txt and/or robots meta data.
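
For anyone who wants to see the robots.txt side of the Robots Exclusion Protocol in action, here is a minimal Python sketch using the standard library's urllib.robotparser module. The rules, the example.com URLs and the is_crawlable helper are hypothetical and are not taken from the booklet. Note that this only covers robots.txt; robots meta tags live in each page's HTML and aren't checked here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# Parse the rules from memory instead of fetching them over the network.
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="Googlebot"):
    """Return True if the parsed rules allow this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


for url in (
    "http://www.example.com/index.html",
    "http://www.example.com/private/report.html",
):
    print(url, is_crawlable(url))
```

A quick check like this can help catch a Disallow rule that accidentally blocks pages you actually want crawled.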

"Controlling what Google indexes" explains how webmasters can prevent Google from indexing page contents and how webmasters wishing to have content included on Google may do so. The booklet then explains the differences between robots.txt and robots meta tags. Webmasters wishing to have their content indexed by Google in search results should see the section called "Controlling caching and snippets". "Controlling caching and snippets" explains how Google chooses snippets displayed in Google search results and provides meta data examples to help webmasters and online marketers better control what users see in search results.

My favorite section of the new Google Webmaster Team booklet "Making the Most of Your Content: A Publisher's Guide to the Web" is called "Does your site have unique and useful content?". In this section, Google reveals that search engine results are based on 200 criteria in addition to PageRank and seems to indicate that webmasters shouldn't "fixate" on PageRank alone but should also pay attention to the other factors Google considers. Google provides a few tips in this section for webmasters looking to increase rankings, including:

1. Make great content that grabs users' attention.

2. Involve users by helping to create a community with your site.

3. Monitor site usage via Google Webmaster Tools (Google Sitemaps in xml), Google Analytics, Urchin and/or other tools.

4. Quality inbound links: Google says they are important.

5. Clear text links: Google says text links and their "anchor text" (the words that make up the link) are important.

In addition to what webmasters should do to help make their sites more Googlebot friendly, Google says webmasters should not fill pages with keywords, cloak (return different results to users and search engines) or use "crawler pages" to manipulate search engines.

The next section of Google's new booklet for webmasters is Q&A, containing frequently asked questions answered by the Google Webmaster Team. My favorite is "Why can't you do one-on-one support for my website?" and Google's answer is that there are 100 million sites.

Following the Google Webmaster Team Q&A, there is a glossary where Google "boils down" 20 or so technical definitions into terms that anyone can understand. Two definitions really stood out to me because they are often not understood by all webmasters, developers and/or designers.

Dynamic content - Content such as images, animations, or video which rely on Flash, JavaScript, frames, or dynamically generated URLs.

To index - The process of having your site's content added to a search engine.

All in all, this is one of the best resources I've seen for helping non-technical folks better understand the basics of "natural" or "organic search". No matter your level of expertise, I suggest "Making the Most of Your Content: A Publisher's Guide to the Web" by the Google Webmaster Team. This easy-to-understand booklet published by Google is one resource that I'll be using to help better explain search to clients in simple terms.

Both Google's "Making the Most of Your Content" and "Marketing and Advertising Using Google" are available free.

Links to this post:
Search Engine Land - SearchCap: The Day In Search, February 11, 2008