Google Glass: Google Integration

Google Glass Ads

What paid advertisements might look like in Google Glass.

Google Glass view of ocean trench in Google Earth

What an ocean trench in Google Earth might look like when viewed through Google Glass.

Google Glass Street view

What Google's Streetview page for the Metropolitan Museum in New York City might look like in Google Glass.

Last week, Google revealed Project Glass. Google Glass is a new software and hardware platform designed to help users explore and share their world by putting the user “back in the moment.” Interestingly, Google’s “Glass” video did not include paid advertisements, virtual reality simulations or video display, and it made no clear references to integration with other Google properties. There are already competitors for Google’s Glass hardware, and several different prototypes are currently being tested.

Google Shaded Glasses

In addition to the clear prototype revealed last week, expect to see a shaded version when Google Glass is made available to the public. The shaded version would incorporate video monitoring functionality for integration with other Google products like GoogleTV, YouTube, Google Earth, Google Streetview and Google Maps.


Why URLs Disallowed By robots.txt Appear In Google Search Results

It seems like there has been a lot of confusion about robots.txt recently and why URLs disallowed by robots.txt appear in Google search results.

The directives in robots.txt for disallowing URLs were originally intended to help site owners preserve bandwidth and prevent outages caused by robot traffic. Robots can consume lots of bandwidth, and in the early years it was not uncommon for Google to crash websites or make them inaccessible to users during a crawl cycle. By disallowing robots, site owners could limit the bandwidth robots used and ensure that their website stayed available for humans.

At that time, even though many site owners did not want search engines using up bandwidth, they did not mind the traffic from search engines. Search engines, on the other hand, wanted to return relevant results whether URLs were disallowed or not. Instead of returning no results for queries like [amazon.com] when that site was disallowed by robots.txt, Google returned search results based on data collected from other websites. Google can build uncrawled URL references for disallowed URLs by combining anchor text and description data extracted from other sites. This is why disallowed URLs may appear indexed in Google search results pages.

Disallowing a URL via robots.txt will not prevent it from appearing indexed in search results pages. In order to prevent URLs from appearing in search results pages, webmasters should implement a noindex robots meta tag (on pages that crawlers are still allowed to fetch) and/or use password protection. In order to remove URLs that are disallowed by robots.txt but indexed in Google SERPs, webmasters should use the URL removal tool in Google Webmaster Tools.
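
A noindex directive lives in the page's HTML, so its presence can be checked before relying on it. The following is only a minimal sketch using Python's standard library; the class name, the function name and the sample page are hypothetical, and it makes no claim about how any search engine actually parses pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the HTML carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page used only for illustration.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```

A check like this only tells you the tag is present. A crawler still has to be allowed to fetch the page in order to read it.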

Here are some other tips for success with robots.txt:

- Search engines do not check robots.txt for every page request. Many search engines update robots.txt data once every 24 hours. For that reason, disallowed URLs added between updates may be accidentally crawled and indexed. To ensure pages aren’t crawled, be sure to add future URLs to your robots.txt file 24 to 36 hours in advance of adding actual content.

- URLs in robots.txt are case-sensitive. For that reason, blocking aboutus.html will not prevent ABOUTUS.html, Aboutus.html, AbOuTUs.html and/or AboutUs.html from being crawled.

- “When you add the +1 button to a page, Google assumes that you want that page to be publicly available and visible in Google Search results. As a result, we may fetch and show that page even if it is disallowed in robots.txt or includes a meta noindex tag.” (http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1634172)

- Disallowing URLs that 301 redirect will prevent search engines from “seeing” the redirect. As a result, engines may continue to index the incorrect URL.

- Disallowing URLs with pages containing noindex meta will prevent search engines from “seeing” the noindex meta tag and as a result noindex pages may appear indexed in Google search results.

- When 301 redirecting www or non-www versions of URLs to the preferred version, don’t forget to redirect your robots.txt file as well.
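
The case-sensitivity tip above can be illustrated with the robots.txt parser that ships with Python. This is a sketch under stated assumptions: the example.com URLs and the helper name `is_allowed` are hypothetical, and the result reflects Python's `urllib.robotparser`, not necessarily the behavior of any particular search engine's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a single lowercase path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /aboutus.html
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for path in ("/aboutus.html", "/AboutUs.html", "/ABOUTUS.html"):
    url = "http://www.example.com" + path
    print(path, "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

With this parser, only the lowercase path is reported as blocked. The mixed-case and uppercase variants are treated as different URLs and remain crawlable.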


Google Drive Launch Imminent

google drive

According to a number of sources, Google is about to launch a new product called “Google Drive.” The code above, which is embedded in Google Docs, does seem to support that idea. In addition to the embedded code (above), Google Docs also includes an “Add to My Drive” button (below) that is currently in place, just not visible to users. After discovering the first description of GDrive in 2009, I’m always a little skeptical of Google actually launching this service, but this evidence seems pretty clear.

google drive button


Hijacking WeSolveForX.com Under Google X-Lab’s Nose

Earlier today Google launched www.wesolveforx.com. This site is reported to be the new website for Google’s top secret “X-Lab project,” but one secret is already out.

The content seen by users at www.wesolveforx.com actually resides on an internet marketing agency’s website, http://www.thinkbelieveact.com/solveforx/, meaning that Google’s new site www.wesolveforx.com is currently little more than a domain name. To make things worse, this content is indexed on the agency’s website under both the www and non-www versions of the URL.

Background information aside, what interests me is that my +1 for www.wesolveforx.com was actually credited to Google’s agency website www.thinkbelieveact.com instead of where I intended. While this makes sense given the technical issue at hand, Google’s agency does appear to be benefiting in some ways from this situation. To be fair though, this situation may not have been easily avoidable, because adding the +1 button to pages causes Google to ignore disallow directives in robots.txt and meta noindex tags. For that reason, maybe some trade-offs had to be made; I’m not sure.

Either way, it seems like preventing +1 buttons from appearing in framed content might be a good idea?

UPDATE: Larry Page mentioned earlier today that the new site is now live. It is, but as of the time of this update, the pages on Google’s agency site are also still live.


What is Quality Content: The Road to Panda Recovery

Awesome Quality Content Google Panda Wash

“Shallow content” provides little value to users and offers low utility based on the intent of the query. However, technically this type of content is not considered spam by Google because it does not violate Google Webmaster Guidelines. Google provides information about creating websites and about thin content but thus far has provided little if any official documentation about shallow content. Until recently, shallow content fell between teams internally at Google.

In order to address the influx of shallow content that resulted from Google Caffeine, Google launched “Panda.” Google Panda is a holistic approach for addressing site quality at scale algorithmically. The quality signals used by Google Panda were derived from answers to questions about site quality. According to Google, these questions were created by a Google engineer whose last name is Panda and answered by Google’s little-known Evaluation team. While there is no way of knowing how Google Evaluators answered Panda’s questions, those with an in-depth understanding of related research have a pretty good idea.

In order for answers to Panda’s questions to be incorporated by Google as algorithmic quality signals and used for rankings, Google would first need to tie these answers to specific site elements and/or data. Since any algorithm can be gamed, Google is not going to talk about specific element or data signals used for rankings. With that being the case, Google designs algorithms to scale with present data as opposed to possible data. According to Google, by doing so, they are able to draw reliable inferences from actual evidence in data. In fact, Google could draw signals from a wide variety of present data sources both on-site and off. In addition, Google could combine present data to draw inferences about data not present. This data could be weighed differently depending on the type of site, query, location, presence of other signals as well as any number of countless other data variables. As a result of the level of complexity involved, “algorithm chasing” could prove quite difficult in the future.

Improving site quality from the user’s perspective is the best way for sites negatively impacted by Google Panda to recover. According to Google, “high quality content is content that you can send to your child to learn something.” By raising standards for producers and driving progress in terms of content quality, Google Panda will increase the quality expectations of users over time. In order to successfully acquire and convert traffic from organic search channels moving forward, marketers, webmasters and site owners will need to understand, employ and embrace best practices for developing quality content. With that in mind, a number of web professionals clearly still do not understand the meaning of “quality content.” The purpose of this post is to help those individuals who are not quality or search experts to increase their level of knowledge in these areas.

The examples of Panda’s questions published by Google focus on quality, authority, credibility and trust. Fortunately, when it comes to content evaluation, authority, site credibility and trust, there is no shortage of research available. I’ve been researching these topics for years and can say that Panda’s questions closely align, as they should, with published research in related areas. For example, the standard used by librarians to evaluate content quality was developed by William Katz, and the dimensions for assessing authority were developed by Patrick Wilson. In addition to quality evaluation and authority, other Panda questions focus on trust and credibility. These questions align with research by BJ Fogg. In addition to evaluation, authority, trust and credibility, quality content also possesses certain technical characteristics according to Google and the W3C. I’ll touch on just a few of these topics below and try to tie them together into specific actionable items for improving quality.

Content Authority:

Author identification is the first step in terms of determining authorship and in turn authority. In order for an author to be perceived by users as an authority on a specific topic, users must first be able to identify the author’s identity and then recognize the author’s expert qualifications.

Types of Authority:

  • Personal authority – author
  • Institutional authority – publisher
  • Textual authority – document
  • Intrinsic plausibility authority – text

When authors provide certain qualifications related to a specific topic, users may perceive the author as an authoritative source of information on a specific topic. That “Authority” increases the perceived value of content in the minds of users. Authority is also used by search engines in a wide variety of ways and on a number of levels. In fact, Google recently began displaying author information in search results. According to Google, displaying author information in search results “helps users discover great content.” By displaying author information in search results, Google helps users to identify authors and that is the first step in terms of the authoritative content discovery process.

Just as in print, authors should provide qualifications related to their specific topical area(s) of expertise. In addition to being recognizable to users, authors should provide this data in a way that is clear to search engines. To accomplish both goals, bloggers, for example, could create a “bio” page outlining their credentials and qualifications. By including a link to their bio page from each article, blog authors could help to establish perceived authority. In order for Google to key in on these elements and fully associate author information with a page, authors should link their bio page to their Google+ profile page using the appropriate markup, which includes the rel=author parameter. In order to help fully verify authorship, authors should consider linking from their Google+ citations to their bio page and/or verifying their address via their Google+ profile page.

Institutional authority comes from the reputation of the publisher. In addition to rel=author, Google also supports the rel=publisher parameter which confirms a website is the publisher of a Google+ page. Including both parameters in the same page is perfectly fine to do as well, even though doing so may currently result in an error according to Google’s Rich Snippet Tool.

It is also important to note that the importance of evaluation depends on the perceived significance of the negative consequences that could result from having the wrong information. For that reason, providing obvious expert qualifications for medical content is critical.

Site Credibility:

Authority also goes a long way in terms of establishing credibility. Credible websites have a high percentage of up-time, are easy to use, easy to navigate and have high PageSpeed. They look professional based on the purpose of the site, make verifying content accuracy easy and provide physical contact information. Finally they demonstrate concerns about users and user information. For example they include a privacy policy and provide secure pages when sensitive data is requested. Credibility is the sum of expertise plus trustworthiness according to Fogg. There are four types of credibility and each could be translated into quality signals used by search engines.

Types of Credibility:

  • Presumed credibility which is based on beliefs. Brand names or .edu sites could be signals of presumed credibility to search engines.
  • Earned credibility is another type of credibility and it is based on previous experiences. Search engines could derive earned credibility quality signals from on-site ratings and reviews as well as third party ratings and reviews about a website, company or product.
  • Surface credibility is the third type of credibility and is based on face value. Surface credibility signals could include overall site design, the presence of trust icons and PageSpeed.
  • Reputed credibility is another form of credibility. Reputed credibility signals could be derived by search engines from off-site references, citations or inbound links from other websites.

Content Purpose:

It is critical for content to fulfill a unique purpose, otherwise site content may be considered to have no purpose and therefore be considered low quality.

“Certain actions such as cloaking, writing text in such a way that it can be seen by search engines but not by users, or setting up pages/links with the sole purpose of fooling search engines may result in removal from our index.”
- Google

Poor quality content purpose signals could include:

  • pages with lots of words saying little
  • pages with no purpose
  • pages whose sole purpose is advertising or affiliate income
  • blocking Archive.org

Content Scope:

In terms of scope, the purpose of each page should align with the scope of the website as a whole.

“It’s important for webmasters to know that low quality content on part of a site can impact a site’s ranking as a whole. For this reason, if you believe you’ve been impacted by this change you should evaluate all the content on your site and do your best to improve the overall quality of the pages on your domain. Removing low quality pages or moving them to a different domain could help your rankings for the higher quality content.”
- Google

Poor quality content scope signals could include:

  • pages that don’t appear to align with the purpose of other pages
  • irrelevant content
  • less than extensive information
  • duplicate content
  • general information

Content Reliability:

Uptime, accessibility and functionality can all be quality indicators.

Poor quality content reliability signals could include:

  • site outages
  • lack of functionality
  • 404 pages that don’t return a 404 error
  • slow PageSpeed

Content Relevancy:

Content should be relevant to the user’s query based on the intent of the query.

“One of the most important steps in improving your site’s ranking in Google search results is to ensure that it contains plenty of rich information that includes relevant keywords, used appropriately, that indicate the subject matter of your content.”
- Google

Poor quality content relevancy signals could include:

  • “CLICK HERE” anchor text
  • high impressions with low clickthrough rates (CTR) in Google Webmaster Tools
  • high bounce rates
  • irrelevant links
  • relevant content that is irrelevant based on searcher intent
  • targeting only high-value keyword terms

Content Recency:

Content should be kept up to date and/or updated frequently depending on the purpose of the website.

“A frequently updated site encourages people to return – as long as your content remains relevant and engaging. A useful post once a week is better than low-quality content published daily. A great idea is to search Google for subjects of interest in your field. If you can’t find a good answer, create a blog post about the subject – chances are that other people are searching for the same thing.”
- Google

Poor quality content recency signals could include:

  • old copyright dates on pages
  • lack of byline dates in posts
  • lack of modified dates in pages
  • lack of support for If-Modified-Since
  • software needing updates
  • expired content
  • out of date XML Sitemaps
  • expired product feeds
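
Support for If-Modified-Since, listed above, means a server compares the date a client sends with the page's last modification time and answers 304 Not Modified when nothing has changed. The sketch below shows only that decision. The function name `conditional_status` and the dates are hypothetical, and a real server would also need to send a Last-Modified header and handle related headers such as ETag.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 when the client's copy is still current, otherwise 200.

    last_modified must be a timezone-aware datetime.
    """
    if not if_modified_since:
        return 200
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: serve the full page
    if client_time.tzinfo is None:
        # Treat a date without a usable zone as UTC.
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200


# Hypothetical modification time for a page.
page_modified = datetime(2012, 4, 1, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(page_modified, usegmt=True)

print(conditional_status(page_modified))          # first visit, no header
print(conditional_status(page_modified, header))  # revisit, page unchanged
```

The first call stands for a crawler with no cached copy and gets a full response; the second sends back the date it saw last time and gets a 304, which saves bandwidth on both sides.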

Content Format:

Visual elements should align with the purpose of a site and not interfere with content. Site design should convey an appropriate level of professionalism and images should be original and high quality based on the purpose of the site. When ads appear, they should not interfere with content or user interaction with content.

Excessive Ads

“Provide high-quality content on your pages, especially your homepage. This is the single most important thing to do. If your pages contain useful information, their content will attract many visitors and entice webmasters to link to your site. In creating a helpful, information-rich site, write pages that clearly and accurately describe your topic. Think about the words users would type to find your pages and include those words on your site.”

- Google

Poor quality content format signals could include:

Content Arrangement:

Content should always be presented in an ordered and orderly fashion. Navigational elements should make it clear to users where they are within the site architecture. Sites should include navigational breadcrumbs with delimiters and links that accurately reflect site hierarchy.

“Users know good content when they see it and will likely want to direct other users to it. This could be through blog posts, social media services, email, forums, or other means. Organic or word-of-mouth buzz is what helps build your site’s reputation with both users and Google, and it rarely comes without quality content.”
- Google

Poor quality content arrangement signals could include:

  • breadcrumbs without delimiters
  • sites without a clear hierarchy
  • image based navigational elements
  • sites without an HTML site map for users
  • too many links on a single page
  • island pages
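
As a rough illustration of breadcrumbs that reflect site hierarchy, the sketch below derives a delimited trail from a URL path. It assumes the URL structure mirrors the site's hierarchy, which is not true of every site; the function names, the " > " delimiter and the example.com URL are all hypothetical.

```python
from urllib.parse import urlparse


def breadcrumb_trail(url, home_label="Home"):
    """Return (label, path) pairs from the site root down to the page."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    trail = [(home_label, "/")]
    path = ""
    for segment in segments:
        path += "/" + segment
        # Drop a file extension and turn hyphens into spaces for the label.
        label = segment.rsplit(".", 1)[0].replace("-", " ").title()
        trail.append((label, path))
    return trail


def render_breadcrumbs(trail, delimiter=" > "):
    """Join the labels with a visible delimiter."""
    return delimiter.join(label for label, _ in trail)


trail = breadcrumb_trail("http://www.example.com/guides/seo/robots-txt.html")
print(render_breadcrumbs(trail))
```

Each pair carries both a label and a path, so a template could render every level as a link rather than plain text.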

Content Treatment:

Content should be unique, factual, accurate and objective, and should use proper spelling and grammar. Both sides of a story should be presented when possible and with as little bias as possible. Unbiased content, reviews and rankings help increase perceptions of trustworthiness. According to Google “70% of Americans look at product reviews before making a purchase” and “the worldwide average for product reviews is a 4.3 out of 5.0.”

“We looked at a variety of signals to detect low quality sites. Bear in mind that people searching on Google typically don’t want to see shallow or poorly written content, content that’s copied from other websites, or information that are just not that useful.”

- Google

Poor quality content treatment signals could include:

  • bias
  • poor spelling
  • bad grammar
  • inaccurate information
  • factual errors

Additional Content Quality Resources:
