Google

If you missed it earlier, be sure to check out Barry Schwartz's live blog coverage of SMX Live: You&A with Matt Cutts, head of Google's Webspam team. For those interested, I have posted my notes from the session below.

Penguin:

  • The lead engineer for what would come to be known as "Penguin" picked that name.
  • Penguin addresses spam.
  • Impacted sites are believed to be in current violation of Google Webmaster Guidelines.
  • The only way to recover from Penguin is to properly address guidelines violations.
  • Impacted sites experience an algorithmic demotion in search results, not a penalty.
  • Penguin is not a "penalty" or "manual action" because it is algorithmic and therefore not manual.
  • There is no whitelist for Penguin.
  • Google uses 200+ signals for rankings and Penguin is the latest.
  • Sites hit by Penguin can fully recover once guidelines violations are resolved.

Panda:

  • Panda was named after the lead engineer, whose last name is Panda.
  • Panda addresses thin and/or low-quality content.
  • Prior to Panda, responsibility for low-quality content fell between the Search Quality team and the Web Spam team.
  • Since Panda, the Search Quality and Web Spam teams at Google work more closely together.
  • Sites hit by Panda can fully recover once content issues are resolved.

"Manual Actions," the new "Penalty":

  • According to Matt, "We don't use 'penalty' anymore, we use 'manual action' vs an algorithmic thing."
  • Manual reviews result in a manual action, whereas algorithmic detection results in a demotion.
  • 99 percent of manual actions are accompanied by webmaster notifications via Google Webmaster Tools.
  • Algorithmic issues do not result in a notification via Google Webmaster Tools.

Unnatural Link Notifications:

  • Unnatural link messages imply a manual action (penalty).
  • For unnatural link notifications webmasters should submit a reconsideration request.
  • According to Matt, "typically if you get a notification you will see a downgrade in rankings."
  • Google wants "to see a real effort" on the part of webmasters when it comes to removing unnatural links. Some webmasters have gone so far as to scan in images of letters sent to domain owners requesting links be removed.
  • When reinclusion requests are submitted for unnatural link notifications, "Google reviews a random sample to see if those links are removed."
  • Webmasters should attempt to remove at least 90% of unnatural links pointing to their site.
  • Google understands it is difficult to remove links and is working on alternative solutions.
  • Google is working on a new feature which will allow webmasters to "disavow" links pointing to their website.
  • If some links cannot be removed individually, it may be possible to have the entire linking page removed, as long as it is not the homepage or a similarly important page.

Paid Links that pass PageRank:

  • Although Google can detect paid links that pass PageRank and already discounts them, it recently started taking manual action against the sites involved.
  • According to Matt, Google is taking manual action and penalizing sites with paid links that pass PageRank because companies continue to profit from these practices.
  • Google wants people to understand that paid links that pass PageRank are a link scheme and a waste of time and money.

Affiliate links:

  • Google handles affiliate links well but including rel=nofollow never hurts.
  • "Nofollowed links account for less than 1% of all links on the internet."
Negative SEO:

  • The recent reaction to "Negative SEO" has been interesting.
  • Negative SEO has been around a long time.
  • According to Matt, "It is possible for people to do things to sites, like steal domains."
  • Matt pointed out that Google changed the wording of the Google Webmaster Guidelines some time ago to address negative SEO. It says, "Practices that violate our guidelines may result in a negative adjustment of your site's presence in Google, or even the removal of your site from our index."

Bounce Rate:

  • According to Matt, Google Analytics data is not used for rankings.
  • Bounce rates from search results are noisy because of redirects, spam and/or other issues.
  • Bounce rates do not accurately measure quick answers.
  • Because users often get the answer they want and then leave, bounce rate is not a good metric for Google to use.

It seems like there has been a lot of confusion about robots.txt recently and why URLs disallowed by robots.txt appear in Google search results.

The Disallow directive in robots.txt was originally intended to help site owners preserve bandwidth and prevent outages caused by robot traffic. Robots can consume a lot of bandwidth, and in the early years it was not uncommon for Google to crash websites or make them inaccessible to users during a crawl cycle. By disallowing robots, site owners could limit the bandwidth robots consumed and help ensure that their website remained available for human visitors.

At that time, even though many site owners did not want search engines using up bandwidth, they did not mind the traffic search engines sent them. Search engines, on the other hand, wanted to return relevant results whether a URL was disallowed or not. Instead of returning nothing for a query like [amazon.com] when that site was disallowed by robots.txt, Google returned results based on data collected from other websites. This is why disallowed URLs can appear indexed in search results: Google builds uncrawled URL references for disallowed URLs by combining anchor text and description data extracted from other sites, and those references may appear in Google search results pages.

Disallowing a URL via robots.txt will not prevent it from appearing in search results pages. To keep URLs out of search results, webmasters should implement the noindex robots meta tag and/or use password protection. To remove URLs that are disallowed by robots.txt but already indexed in Google SERPs, webmasters should use the URL removal tool in Google Webmaster Tools.
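If you want to double-check that a page you are trying to keep out of the index is actually serving a noindex robots meta tag, a quick script can help. Below is a minimal Python sketch of that check; the URL is only a placeholder, and the simple pattern only covers the common <meta name="robots" content="noindex"> form, so treat it as a starting point rather than a complete validator.

import re
import urllib.request

def has_noindex_meta(url):
    # Fetch the page and look for a robots meta tag whose content
    # includes "noindex", e.g. <meta name="robots" content="noindex">.
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    pattern = re.compile(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
        re.IGNORECASE,
    )
    return bool(pattern.search(html))

# Placeholder URL -- swap in the page you actually want to check.
print(has_noindex_meta("http://www.example.com/private-page.html"))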

Here are some other tips for success with robots.txt:

- Search engines do not check robots.txt for every page request. Many search engines update robots.txt data only once every 24 hours. For that reason, disallowed URLs added between updates may be accidentally crawled and indexed. To ensure new pages aren't crawled, add their URLs to your robots.txt file 24 to 36 hours before publishing the actual content.

- URLs in robots.txt are case-sensitive. For that reason, blocking aboutus.html will not prevent ABOUTUS.html, Aboutus.html, AbOuTUs.html and/or AboutUs.html from being crawled (see the short test script after this list).

- "When you add the +1 button to a page, Google assumes that you want that page to be publicly available and visible in Google Search results. As a result, we may fetch and show that page even if it is disallowed in robots.txt or includes a meta noindex tag." (http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1634172)

- Disallowing URLs that 301 redirect will prevent search engines from "seeing" the redirect. As a result, engines may continue to index the old, incorrect URL.

- Disallowing URLs for pages that contain a noindex meta tag will prevent search engines from "seeing" the noindex tag; as a result, those pages may still appear indexed in Google search results.

- When 301 redirecting www or non-www versions of URLs to the preferred version, don't forget to redirect your robots.txt file as well.
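To see the case-sensitivity tip above in action, here is a minimal Python sketch using the standard library's urllib.robotparser; the robots.txt rules and example.com URLs are made up purely for illustration.

from urllib.robotparser import RobotFileParser

# A made-up robots.txt that disallows only the lowercase URL.
rules = [
    "User-agent: *",
    "Disallow: /aboutus.html",
]

parser = RobotFileParser()
parser.parse(rules)

for url in (
    "http://www.example.com/aboutus.html",
    "http://www.example.com/ABOUTUS.html",
    "http://www.example.com/AboutUs.html",
):
    allowed = parser.can_fetch("*", url)
    print(url, "->", "allowed" if allowed else "disallowed")

# Only the lowercase /aboutus.html comes back as disallowed; the
# mixed-case variants are still crawlable under this rule.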

[Screenshot: Google Drive code embedded in Google Docs]

According to a number of sources, Google is about to launch a new product called "Google Drive." The code above, which is embedded in Google Docs, does seem to support that idea. In addition to the embedded code (above), Google Docs also includes an "Add to My Drive" button (below) that is currently in place but not yet visible to users. Having seen descriptions of GDrive as far back as 2009, I'm always a little skeptical of Google actually launching this service, but this evidence seems pretty clear.

[Screenshot: the hidden "Add to My Drive" button in Google Docs]