Search

According to the White House, search engine optimization is a priority for US Government websites. Given that mandate and the staggering number of Americans who search for health-related information, you would expect the new HealthCare.Gov website to be search engine friendly. Unfortunately, the NEW site is not search friendly, due in part to the OLD site not being cleaned up properly. Until the new version of the site is fixed and the old version is cleaned up, users will continue to run into problems. Expert developers are usually focused on development rather than technical search issues, so those issues tend to go unnoticed and continue to frustrate users.

Technical SEO site assessment is difficult to teach in a public setting because of the risk of offending site owners. Since we all own HealthCare.Gov, offending someone is not a problem. So I took a few minutes to check out the site and have documented a few critical issues below. Please note that the list of issues outlined here is by no means comprehensive and only took a few minutes to compile. Feel free to post additional search-related issues in the comment section below. The objective of this post is to educate others and lend an extra set of eyes to the "A-Team".

Security:

It is widely known that HealthCare.Gov has a number of potential security issues, and several of them are search related.

HealthCare.Gov security issues

Findings: Without going into detail for security reasons, it is currently possible to search for and get results from “public and secure content” at HealthCare.Gov. Please note that this is an internal HealthCare.Gov IT issue, not a web search issue, and it has already been reported to HealthCare.Gov.

Recommendation: Ensure that content not intended for public consumption is password protected.

Accessibility:

"If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site."

According to Google and Bing, websites should be tested with a text browser. Text browsers let webmasters "see" a site more like a search engine crawler does. This kind of testing also reveals issues that individuals with disabilities may experience when accessing the site with an assistive device.

HealthCare.Gov accessibility issues

Findings: Individuals with disabilities and search engines may have difficulty accessing and interacting with portions of the new website.

Recommendation: Ensure that fancy site elements don't interfere with the delivery of important textual content across platforms, even when images and JavaScript are disabled.
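
To see roughly what a text browser, a crawler, or a screen reader gets from a page, you can fetch the HTML without executing any JavaScript and keep only the visible text. Below is a minimal sketch of that idea in Python; it assumes the third-party requests library, and the URL is just an example.

```python
# Minimal sketch: approximate a "text browser" view of a page by fetching it
# without executing JavaScript and keeping only the visible text.
# Assumes the third-party `requests` library; the URL is just an example.
from html.parser import HTMLParser
import requests

class TextOnly(HTMLParser):
    """Collects text content, skipping <script> and <style> blocks."""
    def __init__(self):
        super().__init__()
        self.skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

resp = requests.get("https://www.healthcare.gov/", timeout=10)
parser = TextOnly()
parser.feed(resp.text)

# If little or no text survives, crawlers (and assistive devices) may see an
# effectively empty page even though it looks fine in a JavaScript browser.
print(f"HTTP {resp.status_code}, ~{len(' '.join(parser.chunks))} characters of visible text")
```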

Searcher Intent:

When users search for [healthcare.gov], chances are they want to navigate to HealthCare.Gov, the US Government health insurance marketplace.

HealthCare.Gov search results

Findings: Currently, when users search for [healthcare.gov], they are returned the Google search results shown above. Clicking the top result in the sitelinks section takes searchers to finder.healthcare.gov, which "is not the Health Insurance Marketplace."

Recommendation: Demote the sitelink in question via Google Webmaster Tools.

Version Issues:

When the same text content appears on different webpages, search engines consider it duplicate content. There is no penalty for duplicate content, but it can dilute certain ranking signals. As a result, search engines recommend that webmasters specify the preferred version of each page.

HealthCare.Gov search results

Findings: www.HealthCare.Gov shares the same content, as well as different combinations of content, with various versions of both the old and new website at subdomains such as Spa.HealthCare.Gov, Finder.HealthCare.Gov and LocalHelp.HealthCare.Gov, to name a few. As a result, searchers may arrive at an unintended subdomain where the site appears not to work.

Recommendation: Use the rel=canonical attribute to specify which version of each page is preferred, and return 410 HTTP responses for pages at the additional subdomains.
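
One quick way to audit the version problem is to request the same path from each subdomain and note the HTTP status plus any rel=canonical target. The sketch below illustrates the idea, assuming the third-party requests library; the subdomain list mirrors the examples above and the regex is a rough approximation of a canonical-tag check, not production parsing.

```python
# Minimal sketch, assuming the third-party `requests` library: fetch the same
# path from several subdomains and report the status code plus any
# rel=canonical target, to see which versions point at the preferred URL.
import re
import requests

SUBDOMAINS = ["www", "spa", "finder", "localhelp"]
CANONICAL = re.compile(
    r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']', re.I)

for sub in SUBDOMAINS:
    url = f"https://{sub}.healthcare.gov/"
    try:
        resp = requests.get(url, timeout=10, allow_redirects=False)
    except requests.RequestException as exc:
        print(f"{url}: request failed ({exc})")
        continue
    match = CANONICAL.search(resp.text or "")
    canonical = match.group(1) if match else "none"
    # Retired duplicates should ideally return 410 (Gone); live duplicates
    # should declare the preferred version via rel=canonical.
    print(f"{url}: HTTP {resp.status_code}, canonical -> {canonical}")
```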

Soft 404 Pages:

Usually, when someone requests a page that doesn’t exist, a server will return a 404 (Not Found) error. This HTTP response code clearly tells both browsers and search engines that the page doesn’t exist. As a result, the content of the page (if any) won’t be crawled or indexed by search engines. (Source: https://support.google.com/webmasters/answer/181708?hl=en)

HealthCare.Gov soft 404

Findings: HealthCare.Gov error pages do not redirect to a dedicated 404 landing page or return a 404 HTTP response. As a result, URLs for pages without content will be indexed by search engines once they are linked to online. In addition, older pages such as http://finder.healthcare.gov/404.html return a 302 HTTP response, which is a temporary redirect. As a result, site error pages will continue to be indexed and frustrate users.

Recommendation: Create a dedicated 404 error page and ensure that requests for missing content return a 404 HTTP response at the requested URL.
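
A simple way to test for soft 404s is to request a path that cannot possibly exist and see what comes back: a real 404 is the healthy answer, while a 200 or a redirect is the soft-404 signature described above. Here is a minimal sketch assuming the third-party requests library; the host list is illustrative.

```python
# Minimal sketch, assuming `requests`: probe a clearly nonexistent path on each
# host and check what the server returns. A 200 or a redirect for a gibberish
# URL is the classic signature of a "soft 404".
import uuid
import requests

HOSTS = ["https://www.healthcare.gov", "http://finder.healthcare.gov"]

for host in HOSTS:
    bogus = f"{host}/this-page-should-not-exist-{uuid.uuid4().hex}"
    resp = requests.get(bogus, timeout=10, allow_redirects=False)
    if resp.status_code == 404:
        verdict = "OK: real 404"
    elif resp.status_code in (301, 302):
        verdict = f"soft 404: redirects to {resp.headers.get('Location')}"
    else:
        verdict = f"soft 404: returns {resp.status_code} for a missing page"
    print(f"{bogus} -> {verdict}")
```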

Development Platform Indexing:

Findings: The new HealthCare.Gov website appears to have been developed at the subdomain Test.HealthCare.Gov. That subdomain does not appear to have been password protected and, as a result, was crawled and indexed by search engines; hundreds of its pages currently appear in search results. To help keep searchers from landing on the development version of the site, Test.HealthCare.Gov now returns a 503 HTTP response. However, disallowing the subdomain via robots.txt or returning a 503 will not prevent pages from appearing in search results. The only ways to keep this content out of search results are a noindex meta tag or password protection.

HealthCare.Gov development platform

Recommendation: To have this content removed from search results, return a 401 HTTP response.
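
For illustration, here is a minimal sketch of what "return a 401" could look like at the application layer, using Flask (an assumption on my part, not what HealthCare.Gov actually runs). The hostname check and credentials handling are placeholders only.

```python
# Minimal sketch, assuming Flask: answer every request to the development host
# with a 401 (and a basic-auth challenge) so crawlers drop the URLs from their
# indexes and casual visitors can't browse the test site.
from flask import Flask, request, Response

app = Flask(__name__)
DEV_HOST = "test.healthcare.gov"  # assumed development hostname

@app.before_request
def lock_down_dev_host():
    if request.host.lower().startswith(DEV_HOST):
        # A 401 tells search engines the content is off limits; unlike a 503
        # or a robots.txt disallow, already-indexed URLs then get dropped.
        return Response(
            "Authentication required.",
            status=401,
            headers={"WWW-Authenticate": 'Basic realm="Development site"'},
        )

@app.route("/")
def home():
    return "Public production content."
```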

Breadcrumbs

"A breadcrumb trail is a set of links (breadcrumbs) that can help a user understand and navigate your site's hierarchy." In order to understand information in a page, searchers need to know where they have landed in the site architecture.

HealthCare.Gov development platform

Findings: When users arrive at the page above from search results, there is currently nothing to indicate where they are within the site architecture. For example, there is nothing to indicate whether the information applies to business or individual health care plans.

Recommendation: Implement breadcrumb navigation elements on each page.
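
As a rough illustration of the idea, the sketch below builds a breadcrumb trail from a URL path so a visitor landing from search can see where the page sits in the hierarchy. The path segments and labels are hypothetical, not the site's actual structure.

```python
# Minimal sketch: build a simple breadcrumb trail from a URL path so visitors
# arriving from search can see where they are in the site hierarchy.
from urllib.parse import urlsplit

def breadcrumbs(url: str) -> str:
    parts = [p for p in urlsplit(url).path.split("/") if p]
    crumbs = ['<a href="/">Home</a>']
    href = ""
    for part in parts:
        href += f"/{part}"
        label = part.replace("-", " ").title()
        crumbs.append(f'<a href="{href}">{label}</a>')
    return " &rsaquo; ".join(crumbs)

# e.g. a hypothetical small-business coverage page
print(breadcrumbs("https://www.healthcare.gov/small-businesses/coverage-options/"))
# <a href="/">Home</a> › <a href="/small-businesses">Small Businesses</a> › ...
```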

One of the biggest Super Bowl XLV TV commercial fumbles, in terms of search engine marketing, was GoDaddy's ad for GoDaddy.co featuring Joan Rivers and Jillian Michaels. It took several minutes for the URL to even appear in Google search results, a delay probably due in part to the JavaScript redirect employed. Once indexed, the URL provided no meta description and as a result appeared without a snippet. Snippets in search results help users and increase click-through rates. To add insult to injury, when GoDaddy.co finally appeared in search results, it did so directly under a competitor's ad with no GoDaddy ad in sight.

Over the holidays, Google rolled out a pretty major update to Webmaster Tools. This latest update provides much more detail in terms of data and reporting. So much, in fact, that some folks now seem confused about the difference between Google Webmaster Tools and Google Analytics. The big difference for SEO is that Google Webmaster Tools shows Google's own data for URLs in Google SERPs, rather than tracking page views the way Google Analytics does. In addition to that key difference in reporting, Google Webmaster Tools requires no installation. While it's difficult to say for sure, this update should finally force folks to abandon the ignorance-is-bliss mentality when it comes to analytics reporting.

BUT before diving in, here is a little background...

In 2005, with a little help from a Googler named Vanessa Fox, Google launched Google Sitemaps, the program that has since evolved into what we know as Google Webmaster Central. Around the same time, Google bought Urchin and shortly afterward made Google Analytics free to everyone. Back then, small to medium-sized sites that couldn't afford enterprise analytics relied primarily on ranking reports to measure search visibility.

Ranking reports are created with software that emulates users and sends automated queries to search engines. The software then records positioning within organic search results for specific keywords and URLs. Ranking reports don't provide bounce rates, but they do provide an important metric for measuring SEO ROI directly from Google SERPs. That being said, automated queries from ranking software are expensive for search engines to process and, as a result, are a direct violation of search engine guidelines.

In 2010, Google introduced personalization features in organic search results. These personalized results are based on the user's history, previous query, IP address and other factors determined by Google. Over the past two years, Google's personalized results have rendered ranking-report software nearly useless.

Enter analytics… Without accurate ranking reports, analytics may seem like a decent alternative for measuring SEO ROI by URL, but is that really the case? If analytics were enough, why did Google just update Google Webmaster Tools? Those are a couple of the questions I hope to answer.

To start off, let's establish a few laws of the land...

Google Webmaster Tools Update Case Study: Redirects

Experiment: To compare 301 and 302 reporting accuracy between Google Analytics and Google Webmaster Tools

Hypothesis: Google Analytics incorrectly attributes traffic when 301 and/or 302 redirects are present.

Background: Google ranks pages by URL, so accurate reporting by specific URL is critical. For Google Analytics to record activity, a page must load and the Google Analytics JavaScript must execute; in other words, Google Analytics reports on pages, not URLs. While most "pages" have URLs, not all URLs result in page views. That is the case when a 301 or 302 server-side redirected URL appears in search results.

Procedure: For this comparison, I created apples.html and allowed it to be indexed by Google. I then created oranges.html with a noindex meta tag to prevent indexing until the appropriate time. After apples.html was ranking in Google's SERPs, I 301 redirected it to oranges.html and recorded the results.

Result: According to Google Analytics, oranges.html is driving traffic from Google SERPs via "apples"-related keywords. Google Webmaster Tools, on the other hand, reports each URL individually by keyword and notes the 301 redirect.

Conclusion: Google Analytics reports that oranges.html is indexed by Google and ranks in Google SERPs for apples.html-related keywords, but reporting that data to clients would be a lie. Oranges.html hasn't been crawled by Google and isn't actually indexed in Google SERPs, and until Google crawls and indexes oranges.html it is impossible to know how, or whether, it will rank in Google search results. In addition, this data becomes part of the historical record for both URLs and is factored into bounce rates for URLs that never appear in SERPs.

(Google's Caffeine has improved the situation for 301 redirects, since the time between discovery and indexing is reduced.)
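
For anyone who wants to reproduce the crawler's-eye view of this experiment, the sketch below fetches the redirected URL without following the redirect, which is closer to what Googlebot records than to the final page view that fires the Analytics tag. It assumes the third-party requests library and the hypothetical apples/oranges URLs from the procedure above.

```python
# Minimal sketch, assuming `requests` and the hypothetical apples/oranges URLs
# from the experiment: look at the redirect the way a crawler does, one hop at
# a time, instead of the final page view that Google Analytics records.
import requests

url = "https://example.com/apples.html"  # hypothetical test URL
resp = requests.get(url, allow_redirects=False, timeout=10)

print(url, "->", resp.status_code)                # expect 301 once the redirect is in place
print("Location:", resp.headers.get("Location"))  # expect .../oranges.html

# Google Analytics only sees the page that finally loads (oranges.html), so the
# "apples" keywords get attributed to a URL that isn't the one ranking in SERPs.
```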

Google Webmaster Tools Update Case Study: Complex redirects

Experiment: To compare how tracking differs for multiple redirects from SERPs that end on off-site pages.

Hypothesis: Multiple redirects ending off-site are invisible to Google Analytics because there is no page load.

Background: Google ranks pages by URL, so accurate reporting by URL is critical. For Google Analytics to record activity, a page must load and the Google Analytics JavaScript must execute, yet while most "pages" have URLs, not all URLs render pages. Search engines usually resolve 301 issues over time, but 302 issues remain, as do multiple redirects that end off-site.

(For those who aren't aware, this is one way spammers try to trick Google into "crediting" their site with hundreds, thousands, or sometimes even hundreds of thousands of content pages that actually belong to someone else.)

Procedure: To test how Google Analytics handles multiple redirects, I created page1.html, which 302 redirects to page2.html, which in turn 301 redirects to another-domain.com. Google indexes the content from another-domain.com, but SERPs show it as residing at the URL page2.html.

Result: Despite these URLs ranking in SERPs, Google Analytics has no data for them. Google Webmaster Tools reports the first two URLs and notes the redirects.

Conclusion: Google Webmaster Tools recognizes the existence of the URLs in question, while Google Analytics doesn't at all, and that is a major problem. These URLs are critical for SEO reporting: the content is real, and it's affecting users as well as Google.
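
The sketch below walks a redirect chain hop by hop, which is roughly how you would verify this by hand. Because the chain ends on another domain, no page on the original site ever loads its Analytics tag. It assumes the third-party requests library and the hypothetical page1/page2 URLs from the procedure above.

```python
# Minimal sketch, assuming `requests` and the hypothetical page1/page2 URLs
# from the experiment: walk the redirect chain one hop at a time and print
# each status code, the way a crawler would see it.
from urllib.parse import urljoin
import requests

def walk_redirects(url: str, max_hops: int = 10):
    for _ in range(max_hops):
        resp = requests.get(url, allow_redirects=False, timeout=10)
        print(f"{resp.status_code}  {url}")
        location = resp.headers.get("Location")
        if not location or resp.status_code not in (301, 302, 303, 307, 308):
            return
        url = urljoin(url, location)

walk_redirects("https://example.com/page1.html")
# Expected with the setup described above:
# 302  https://example.com/page1.html
# 301  https://example.com/page2.html
# 200  https://another-domain.com/
```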

Google Webmaster Tools Update Case Study: Installation

Experiment: To compare tracking when the Google Analytics tracking code is not installed.

Hypothesis: Google Analytics won't track pages unless the tracking code is installed properly on each page and the site architecture supports it.

Background: For Google Analytics to record data, it must be implemented correctly on each page and be able to communicate with Google. Legacy pages without the Google Analytics tracking code often rank in SERPs but go unnoticed because they're invisible to analytics. There are various other situations where untracked content appears in Google's index as well, and even when implemented properly, analytics tools are often prevented from reporting by architectural problems.

Procedure: To test how Google Analytics behaves without a proper installation, I set up an account but DID NOT add the Google Analytics tracking code snippet to any pages.

Result: Google Analytics reports that there has been no traffic and that the site has no pages, but Google Webmaster Tools reports impressions by keyword and URL, CTR and other metrics as usual.

Conclusion: To function properly, Google Analytics must be implemented on each and every page and be supported by the site architecture. In many cases that means extensive implementation work, which is an extra obstacle for SEO. Google Webmaster Tools data comes directly from Google, requires no implementation and is easy to verify.
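
A quick way to catch untracked legacy pages is to scan indexed URLs for the Analytics snippet. The sketch below assumes the third-party requests library; the marker strings cover the classic ga.js and analytics.js tags and the URL list is illustrative.

```python
# Minimal sketch, assuming `requests`: scan a handful of indexed URLs for a
# Google Analytics snippet. Pages without it can still rank in SERPs but will
# never show up in Analytics reports.
import requests

MARKERS = (
    "google-analytics.com/ga.js",
    "google-analytics.com/analytics.js",
    "_gaq.push",
    "ga('create'",
)

URLS = [
    "https://example.com/",
    "https://example.com/legacy-page.html",
]

for url in URLS:
    html = requests.get(url, timeout=10).text
    tracked = any(marker in html for marker in MARKERS)
    print(f"{url}: {'tracking code found' if tracked else 'NO tracking code'}")
```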

Google Webmaster Tools Update Case Study: Site Reliability

Experiment: To see how Google Analytics tracks pages when a website goes offline.

Hypothesis: Google Analytics will not track site outages.

Background: For Google Analytics to record data, it must be properly implemented, supported by the site's architecture and able to communicate back and forth with Google.

Procedure: To test how Google Analytics reports when a site goes offline, I turned off a website with Google Analytics installed.

Result: Google Analytics reports no visitors or other activity but suggests nothing about the real cause. Google Webmaster Tools reports errors suggesting the site was down.

Conclusion: Google Analytics does not report site outages or outage error URLs whereas Google Webmaster Tools does. For SEO, site uptime is critical.
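
Since Analytics can't see an outage, a trivial external check fills the gap. Below is a minimal uptime-logging sketch assuming the third-party requests library; the URL and polling interval are placeholders.

```python
# Minimal sketch, assuming `requests`: poll the site on a schedule and log the
# HTTP status. A simple external check like this catches the outages that
# Google Analytics can't see, because a page that never loads never fires its tag.
import time
import requests

URL = "https://example.com/"   # illustrative
INTERVAL_SECONDS = 300

while True:
    try:
        status = requests.get(URL, timeout=10).status_code
    except requests.RequestException as exc:
        status = f"DOWN ({exc.__class__.__name__})"
    print(time.strftime("%Y-%m-%d %H:%M:%S"), URL, status)
    time.sleep(INTERVAL_SECONDS)
```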

Final thoughts...

As illustrated above, analytics will report keywords for URLs that aren't indexed and won't report keywords for URLs that are indexed in SERPs. Analytics is unaware of redirected URLs, even ones indexed by Google and seen by users worldwide. Analytics can't tell the difference between a brief lack of visitors and a period of site downtime. It's possible for tracking code to fire without a page loading, and for a page to load without firing the tracking code. Analytics doesn't know that framed content is indexed, or about legacy pages without tracking, alternative text versions of Flash pages, how long pages take to load, and on, and on, and on...

In fairness, the tool is doing what it was designed to do; the folks using it just don't understand its limitations. Oftentimes they aren't aware that data is fragmented or missing, or that site architecture impacts reporting ability. And for some reason, checking Google to see whether SERPs actually jibe with the reports never occurs to them.

I've been kvetching about these issues for years, to anyone and everyone who would listen. If you can't tell, few things F R U S T R A T E me more.

The case studies above represent just a few of the ways analytics data is skewed by bad and/or missing data. Believe it or not, a substantial amount of analytics data is bogus. According to one Google Analytics enterprise partner, 44% of pages with analytics have analytics errors, and on average analytics only tracks about 75% of traffic. Analytics is a weird beast: when something goes wrong, nothing happens in analytics, and sometimes it happens on invisible pages. :)

Bad data attacks like a virus, creeping in from various sources and polluting reporting exponentially, silently and undetected over time. Sadly, very few folks, including most "analytics experts," have the experience or expertise to track down issues like these by hand. Until now there has been no tool to report on analytics not reporting. The recent Google Webmaster Tools update empowers webmasters by providing them with the best data available, exposes analytics issues and places the burden of proving measurement accuracy back on the folks responsible for it.

Oh yeah, HAPPY NEW YEAR!

Last week at PubCon, Matt Cutts provided some interesting details about where the industry is headed.

During the "Interactive Site Review" session, Matt suggested investigating the history of each domain name you own or plan to purchase. He suggested avoiding domains with a shady history and dumping domains that appear to have been burned in the past. To investigate the history of a domain, Matt suggests Archive.org. Matt said, blocking Archive.org via robots.txt is a great indication of spam when already suspected.

Matt mentioned speed several times. During the "Interactive Site Review" he said that webmasters need to pay more attention to speed, pointed out that landing page load time factors into AdWords Quality Score and predicted that speed will be a big trend in 2010. During his "State of the Index" presentation, he pointed out Google's tools for measuring page speed and even mentioned webpagetest.org, a third-party tool. According to Matt, Google is considering factoring page load speed into rankings. Matt said that Larry Page wants pages on the internet to flip for users, and he illustrated the point with Google Reader's reduction of its pages from 2 MB to 185 KB. Nothing official yet, but something to keep an eye on for sure!

During Q&A for "The Search Engine Smackdown" session Matt explained Caffeine as being like a car with a new engine but not an algorithm change. Matt said, Caffeine will help Google index in seconds and that it should be active within a few weeks on one data center. That said, Caffeine won't roll out fully until after the holidays. Matt pointed out that Google is built for load balancing and for that reason isolating individual IPs for Caffeine testing access is difficult. Matt also mentioned that AJAX SERPs and Caffeine aren't related but that Google will continue testing AJAX SERPs.

In case you missed it, I was in Las Vegas last week for PubCon 2009. It was my first PubCon and, as you can imagine, lots of fun! As far as presentations go, every one was great, but I do have a few favorites. Here they are in order of appearance…

- One of my favorite presentations at PubCon was Rob Snell's "Ecommerce and Shopping Cart Optimization." Rob always impresses me with his creativity and common-sense approach to increasing conversions through things like original content creation. Rob stressed liberating manufacturer content in addition to creating original product descriptions and content. Maybe it's a "southern thing," but one thing is for certain: Rob is no "Dummy" when it comes to ideas for developing great content to increase ROI.

- Another great presentation was Ted Ulle's "SEO Design & Organic Site Structure." Ted's FRANKENSITE analogy was really great! He focused on the importance of keeping things simple and setting goals early, and offered some other really good advice about documenting decisions, placing graphic design lower on the priority list and why "code geeks" shouldn't write copy. Splitting a cab with Ted was also a big thrill; it's not every day I get to ride with celebrities.

- Vanessa Fox's "Multivariate Testing and Conversion Tweaking" presentation was really interesting. In addition to providing recent data about the average number of keywords per query, Vanessa dove into the topic of personas and the role they play in conversions. According to Vanessa, focusing only on ranking reports can cause you to miss important information. I've already pre-ordered Vanessa's new book and strongly suggest you do too.

- As always, Matt Cutts was truly entertaining during the "Interactive Site Review: Organic Focus" session at PubCon. (Tip: if your site is obviously spamming, don't sign it up for review! ;) ) I know Barry has been giving Matt a hard time about not attending conferences lately, but Matt really went above and beyond, even shaving a spammy head or two at PubCon 2009 :).

- Greg Hartnett, Michael McDonald, Barry Schwartz, Lee Odden and Loren Baker teamed up for "Search Bloggers: What's Hot and Trending?" The session was a jam-packed, PowerPoint-free dialogue between some of the best in the industry.

- Saving the best for last, my favorite session was "Super Session: Search Engines and Webmasters." Shawn from Microsoft was up first and talked about Bing's recent changes. He demonstrated Bing's hover-preview feature and talked about the new and improved MSNBot 2.0b. According to Shawn, Steve Ballmer expects to win search and capture 51% market share with Bing. After Shawn, Matt Cutts presented Google's "State of the Index." Matt talked a lot about the importance of site speed and Google's new social search experiment, and he suggests digging deeper into Google Webmaster Tools as well as subscribing to the blog and YouTube channel.

PubCon was a great conference and I strongly recommend it to anyone interested in interactive marketing. Thanks again to Neil Marshall and the PubCon staff, Barry Schwartz and Search Discovery Inc.