According to Google, "everything is going Google+", but few search marketers truly understand what that means. Here are a few points to help bring everyone up to speed.

 

Google+ Sign In:

Even though keyword-level data for signed-in users is "Not Provided" in Google Analytics, Google's goal is to increase the number of signed-in user searches.

According to a recent Google Jobs post:

"The mission of the search growth marketing team is to make that information universally accessible by enabling and educating users around the world to search on Google, search more often, and search while signed-in. Research and analysis has shown that putting Google search access points at the fingertips of users is an effective way of achieving these goals. And the more users that are signed in to Google, the better we can tailor their search results and create a unified experience across all of the Google products that they use."

When users are signed in, Google can better tailor search results and better target ads. Better ads and better search results increase Google's market share, not to mention its ad revenue. Google+ is one of many programs intended to help increase the number of signed-in users.

Google+ Links:

In order to return relevant search results for human users based on what is important to human users, Google needs access to analyze content and links created by humans.

When Google and its "secret sauce" PageRank algorithm were originally developed, the web was a very different place from what it is today. At that time, blogs, tweets and Facebook did not exist. In the late 1990s, content and links tended to be created by humans and both were freely accessible to Google's crawlers. Back then important websites were "likely to receive more links from other websites." As a result, Google was able to leverage the "citation graph" of the internet to measure "importance" based on "people's subjective idea of importance."
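To make the "citation graph" idea concrete, here is a minimal sketch of the simplified PageRank calculation from the original published paper, run over a tiny made-up link graph. The site names, the 0.85 damping factor and the iteration count are illustrative assumptions, not Google's production settings.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three sites link to hub.example.
example_links = {
    "a.example": ["hub.example"],
    "b.example": ["hub.example"],
    "c.example": ["hub.example", "a.example"],
    "hub.example": ["a.example"],
}
ranks = pagerank(example_links)
top_page = max(ranks, key=ranks.get)
```

In this toy graph the page that receives the most links ends up with the highest score, which is the "importance" measurement described above.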

Today, content and links tend to be created by software and not by humans. The best place to find high-quality human-made content and links today is deep within the password-protected confines of social media websites. Both of these issues are problematic for Google because most social media sites prevent Google from accessing high-quality content and links.

For all the skeptics, Google does appear to have billions of Facebook pages indexed. That being said, many of the Facebook pages that Google has indexed are duplicate content from Wikipedia, Facebook and other sources. In cases where Facebook pages are accessible to Google crawlers, outbound links are almost always password protected, nofollowed, disallowed via robots.txt or links to internal Facebook pages which cannot be crawled. As a result, Google is limited to extracting only external Facebook content and a few social media signals which can easily be spammed.

Google+ is like the internet used to be before social media websites existed and PageRank ruled the land. Google+ Ripples even provides a visual representation of impact-factor-like data similar to PageRank. PageRank or not, Google+ is a place where human-made content and links are accessible to Google. According to Google, Google+ represents the "unification of all of Google's services with a common social air." This "social air" makes Google+ a place where more important websites are still likely to receive more links than less important websites. Google+ is a new "citation graph" where Google can once again crawl human-crafted content and links to measure page importance based on people's subjective ideas about importance.

Google+ Spam Prevention:

Even if Google's crawlers could access the highest quality human crafted content and links on social media sites, fake content, reviews and unnatural link spam are of little value to Google. Without access to social media user account data, detecting these types of spam can be difficult.

According to anti-spam software experts, 40% of social media profiles are spam, and by 2014 as many as 15% of reviews on social media sites are expected to be fake. In order to help address these issues, on March 1, 2012 Google moved to a single unified privacy policy across all Google properties. With this new level of shared data, Google's Spam & Abuse Team (the same team that handles Gmail spam) has the most advanced systems in existence at its disposal to fight spam on Google+. Google+ has been designed to provide Google's Spam & Abuse Team with an almost endless selection of potential spam detection signals.

For example, and without going into too much detail, Google accounts that frequently send and receive Gmail, participate in Google+ Hangouts, watch YouTube videos and are associated with an Android phone that moves around town might be considered legitimate. On the other hand, if several accounts are associated with the same IP address and one is used to spam Blogger with duplicate blog posts authored by an associated account, each account could be considered untrustworthy.
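Purely as an illustration of that second scenario, here is a hypothetical sketch of how shared-IP and duplicate-post signals could be combined. Google has not published its actual signals or rules, so the field names ("ip", "duplicate_posts") and the rule itself are my assumptions.

```python
from collections import defaultdict


def untrusted_accounts(accounts):
    """Flag every account in a shared-IP group if any member posted duplicates.

    accounts is a list of dicts with hypothetical keys:
    "name", "ip" and "duplicate_posts" (count of duplicate blog posts).
    """
    by_ip = defaultdict(list)
    for account in accounts:
        by_ip[account["ip"]].append(account)

    flagged = set()
    for group in by_ip.values():
        if len(group) < 2:
            continue  # a single account on an IP is not a "several accounts" case
        if any(a["duplicate_posts"] > 0 for a in group):
            # One bad actor taints every account tied to the same IP.
            flagged.update(a["name"] for a in group)
    return flagged


# Made-up accounts using documentation-only IP addresses.
sample_accounts = [
    {"name": "alice", "ip": "203.0.113.5", "duplicate_posts": 0},
    {"name": "spam_author", "ip": "198.51.100.7", "duplicate_posts": 0},
    {"name": "spam_poster", "ip": "198.51.100.7", "duplicate_posts": 14},
]
flagged = untrusted_accounts(sample_accounts)
```

Here both accounts on the shared IP are flagged, while the unrelated account is left alone.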

It is difficult to say for sure which signals Google is currently using, but with Google+ the potential for future spam signals is nearly unlimited. Spam, ranking manipulation, impersonation, deceptive behavior, fake profiles and adding people to circles too aggressively are all violations of Google+ guidelines.

Google+ Identification:

In order for content to be authoritative and trustworthy, its source must be identifiable. At the same time, spammers usually set up multiple accounts using fictitious identities.

Google CEO and Co-Founder Larry Page has stated "It's really important to know the identity of people so you can share things and comment on things and improve the search ecosystem, you know, as you and as a real person. I think all those things are absolutely crucial. That is why we have worked so hard on Google+, on making it an important part of search."

Google+ was initially developed as an "identity service." The success of Google+ depends on users using their real name. Real names are entities and Google can use entity related data to infer additional information. This type of data can be especially helpful when it comes to returning better search results for queries where expertise is required, and for queries about a specific individual where multiple individuals have the same name.

According to Google, "The internet would be better if we knew you were a real person rather than a dog or a fake person. Some people are just evil and we should be able to ID them and rank them downward." In order to set up a Google+ Profile or Google+ Page for business, Google requires your "common name". In some cases, Google may require an image of the user's driver's license, proof of identification and/or references to verify a user's name as well as his/her identity. For an author's picture to appear in Google search results, Google requires authors to provide a "recognizable headshot" photo. Images like these not only help searchers recognize authors, they can also be used by Google facial recognition software in various ways to help fight spam.

For example, in the near future expect to see Google roll out Google+ custom URLs for a nominal fee, paid by credit card. Because credit card transactions are one method for verifying a user's identity, this approach allows Google to verify the identities of many users quickly and at scale.

Google believes that "letting authors verify their name helps increase their credibility and trustworthiness in the eyes of their readers." In addition to name verification, Google+ provides tools for identity verification that Google can use to combat various forms of entity authentication fraud.

Google+ User Data:

Google can only collect personal information from users who are willing to provide personal information. According to a former Google employee, "Google could still put ads in front of more people than Facebook, but Facebook knows so much more about those people. Advertisers and publishers cherish this kind of personal information, so much so that they are willing to put the Facebook brand before their own."

Google+ allows Google to ask users for personal information that otherwise could not be collected. Without Google+, Google would have no reason to collect personal data like relationship status, employment, occupation, education or places lived. In addition to collecting direct user data, Google+ collects indirect user data from Google +1 buttons. Google +1 buttons have been widely adopted and are currently embedded within billions of webpages. According to Google, +1s provide contextual value when users are in the market for a particular product. It only stands to reason that +1s also allow Google to collect sentiment related data. Once collected, Google can translate this new gold mine of user data into increased ad revenue through targeted ads for signed in users.

As you can see, Google+ is far more than just another social network!

Search engines have focused on simply "matching keywords to queries" for years. This approach is somewhat problematic, however, because it dissociates the meanings of keywords in multi-keyword queries. For example, search engines might interpret the query [Paris Hilton] (a proper noun and named entity) as simply a request for instances where the words "hilton" and "paris" appear within a page. Fortunately, with a large enough set of data it is possible to make statistical inferences about the intent of a user's query. As a result, Google has relied on statistical inference for uncertain data queries like [Paris Hilton] and [b&b ab] (bed & breakfast in Alberta) for years.
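The difference is easy to see in a small sketch. The first function below is plain keyword matching. The second treats the query as a named entity when it appears in a lookup table. That hard-coded table is my stand-in for the statistical inference described above, and all names and page texts are made up for illustration.

```python
def keyword_match(query, page_text):
    """True if every query word appears somewhere in the page, in any order."""
    page_words = set(page_text.lower().split())
    return all(word in page_words for word in query.lower().split())


# Hypothetical entity table standing in for inferred query intent.
KNOWN_ENTITIES = {"paris hilton": "person"}


def entity_match(query, page_text):
    """Require the full name as a phrase when the query is a known entity."""
    normalized = query.lower()
    if normalized in KNOWN_ENTITIES:
        return normalized in page_text.lower()
    return keyword_match(query, page_text)


hotel_page = "Book a room at the Hilton in Paris near the Louvre"
celebrity_page = "Paris Hilton launched a new fragrance this week"

hotel_keyword = keyword_match("Paris Hilton", hotel_page)
hotel_entity = entity_match("Paris Hilton", hotel_page)
celebrity_entity = entity_match("Paris Hilton", celebrity_page)
```

Keyword matching accepts the hotel page because both words appear on it, while the entity-aware version only accepts the page that is actually about the person.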

In 2010 Google purchased Metaweb Technologies, Inc., the company behind Freebase. Freebase was/is an "open, shared database of the world's knowledge". Before being acquired by Google, Metaweb was in the process of identifying millions of "entities and mapping out how they're related" via Freebase. In addition to entity mapping, Freebase also looks at what words other sites use to refer to entities. In May 2012 Google launched "Knowledge Graph," a “graph” which is built in part on Freebase. According to Google, Knowledge Graph can "understand real-world entities and their relationships to one another." Google hopes Knowledge Graph will improve search results and provide more immediate answers to users' questions in search results pages.

The concept behind Freebase and Google's use of graphed entities is pretty interesting, but I would like to know more about what is really going on under the hood of Google Knowledge Graph. Since Knowledge Graph launched, I have spent hours trying to break it, find bugs, discover issues and/or identify abnormalities. Remarkably, until last week I had found very little. Then, as they say, "it happened!" Last Thursday, while looking for a good example of Google Knowledge Graph results to use in a presentation, I got the search result below.

SERP for Matt Cutts

 

Suddenly it dawned on me: Matt did not go to UNC Law School!

 

Matt Cutts SERP

I clicked on "University of North Carolina School of Law" in Matt's Google Knowledge Graph result, under his bio from Wikipedia, but it returned search results for another entity, [university of north carolina at chapel hill]. From that result, I searched for [unc] and was returned this result.

Just to be sure what I was seeing was correct, I deleted all cookies, signed out of Google and restarted my browser. After refreshing all of my settings, I searched for [unc founded] and was returned this search result.

At that point, I realized even UNC's founding date seemed off. I checked and, according to the University of North Carolina Planning Department, UNC was founded in 1793, not 1789. To be sure this was not the date UNC's Law School was founded, I checked the UNC School of Law website. According to the site, the first law professor did not arrive at UNC until 1845. Then I went back and checked Wikipedia's page for UNC, and it did not contain any of the text being displayed in Google's Knowledge Graph search results either.

With the suspected smoking gun already in hand, I went to Freebase.com and searched for [UNC]. You guessed it, Freebase.com's first result for [UNC] was exactly what had appeared in Knowledge Graph results: "University of North Carolina School of Law". It turns out Matt is not alone: all UNC graduates listed in Freebase.com are listed as UNC School of Law graduates even if they did not attend the UNC School of Law. At that point it was clear that Google Knowledge Graph "thinks" UNC and UNC's School of Law are a single entity because that is what Freebase.com is "telling" Google Knowledge Graph.

Because Freebase data appears in both Google Knowledge Graph search results and Google's main search results, this issue also means results for 100+ notable figures are potentially incorrect. For instance, according to Google Knowledge Graph results, US President James K. Polk graduated from UNC's School of Law, but UNC's School of Law was founded when he was already in office.

Knowledge Graph Results for James K Polk

In addition to Matt Cutts and President Polk, search results for [Michael Jordan college] in Google's main search results are also incorrect due to this issue.

Knowledge Graph Results for Michael Jordan

Other UNC School of Law alumni according to Freebase and potentially Google Knowledge Graph, include Alge Crumpler, Lawrence Taylor, Andy Griffith, Rick Dees, Roger Mudd, Vince Carter, Jerry Stackhouse and even Thomas Layton, the former CEO of Metaweb.

This issue is potentially due at least in part to the fact that only a shell page for UNC (UNC being the parent University of UNC Law School) existed in Freebase.com until yesterday. To hopefully help improve the quality of Google Knowledge Graph results, I added an image, description, UNC's correct founding date and other information from UNC.edu to UNC's Freebase page yesterday.
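Here is one way to picture how a shell page could cause this kind of mix-up. How Freebase or Knowledge Graph actually ranks candidate entities is not public, so the "fact_count" field and the "pick the richest record" rule below are my assumptions, and the numbers are made up.

```python
def resolve(alias, entities):
    """Return the name of the entity that best matches an alias.

    Hypothetical rule: among entities sharing the alias, prefer the one
    with the most recorded facts.
    """
    key = alias.lower()
    candidates = [e for e in entities if key in e["aliases"]]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e["fact_count"])["name"]


# Before: the parent university is only a shell record.
before = [
    {"name": "University of North Carolina School of Law",
     "aliases": {"unc", "unc law"}, "fact_count": 12},
    {"name": "University of North Carolina at Chapel Hill",
     "aliases": {"unc"}, "fact_count": 1},
]

# After: the parent record has been filled in with a description, image, etc.
after = [dict(entity) for entity in before]
after[1]["fact_count"] = 20

resolved_before = resolve("UNC", before)
resolved_after = resolve("UNC", after)
```

Under these assumptions, [UNC] resolves to the School of Law while the parent record is a shell, and to the university once the parent record is filled in.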

With fingers crossed that Matt's wild and crazy UNC Law School days are not his best kept secret, that my site won't vanish from Google tomorrow and that the US Secret Service won't show up at my door, I removed "Law School" from both Matt's and President Polk's profiles in Freebase. As a result, Matt Cutts and President Polk are now the only non-Law School students/graduates on UNC's Freebase page. It will be interesting to see how long these changes take to appear in Google's Knowledge Graph search results.

Google Knowledge Graph is really interesting and seems to be working pretty well despite a few bugs. This is yet another edge case, but it is a situation you should know about. Instances where different entities have the same or similar names are problematic. Instances where multiple keywords are similar to multiple keyword entities are also problematic. Google may already be using Knowledge Graph data based on Freebase.com to determine whether or not content falls in or out of scope. For all of these reasons and others, it is important to keep an eye on Knowledge Graph results that relate to you. If you notice issues, click on "feedback" just below Knowledge Graph results on the right-hand side of Google search results pages.

If you missed it earlier, be sure to check out Barry Schwartz's live blog coverage of SMX Live: You&A with Matt Cutts, head of Google's Webspam team. For those interested, I have posted my notes from the session below.

Penguin:

  • The lead engineer for what would come to be known as "Penguin" picked that name.
  • Penguin addresses spam.
  • Impacted sites are believed to be in current violation of Google Webmaster Guidelines.
  • The only way to recover from Penguin is to properly address guidelines violations.
  • Impacted sites experience an algorithmic demotion in search results but not a penalty.
  • Penguin is not a "penalty" or "manual action" because it is algorithmic and therefore not manual.
  • There is no whitelist for Penguin.
  • Google uses 200+ signals for rankings and Penguin is the latest.
  • Sites hit by Penguin can fully recover once guidelines violations are resolved.

Panda:

  • Panda was named after the lead engineer whose last name is Panda.
  • Addresses thin and/or low quality content.
  • Prior to Panda low quality content fell between the Search Quality team and Web Spam.
  • Since Panda, Search Quality and WebSpam teams at Google work closer together.
  • Sites hit by Panda can fully recover once content issues are resolved.

"Manual Actions" the new "Penalty"

  • According to Matt, "We don't use 'penalty' anymore, we use 'manual action' vs an algorithmic thing."
  • Manual reviews result in manual actions whereas algorithmic detection results in a demotion.
  • 99 percent of manual actions are accompanied by webmaster notifications via Google Webmaster Tools.
  • Algorithmic issues do not result in a notification via Google Webmaster Tools.

Unnatural Link Notifications:

  • Unnatural link messages imply a manual action (penalty).
  • For unnatural link notifications webmasters should submit a reconsideration request.
  • According to Matt, "typically if you get a notification you will see a downgrade in rankings."
  • Google wants "to see a real effort" on the part of webmasters when it comes to removing unnatural links. Some webmasters have gone so far as to scan in images of letters sent to domain owners requesting links be removed.
  • When reinclusion requests are submitted for unnatural link notifications, "Google reviews a random sample to see if those links are removed."
  • Webmasters should attempt to remove at least 90% of unnatural links pointing to their site.
  • Google understands it is difficult to remove links and is working on alternative solutions.
  • Google is working on a new feature which will allow webmasters to "disavow" links pointing to their website.
  • If you cannot remove some links, it may be possible to remove the entire page if it is not the homepage or similar.
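
Before submitting a reconsideration request, you can run the same kind of spot check on your own cleanup. The sketch below is not Google's review process. It simply draws a random sample of the links you flagged and measures how many are gone, with the URLs and sample sizes made up for illustration.

```python
import random


def sampled_removal_rate(flagged_links, live_links, sample_size, seed=None):
    """Estimate the share of flagged links that have been removed.

    flagged_links: URLs you asked to have removed.
    live_links: the subset that still links to your site.
    """
    if not flagged_links:
        raise ValueError("flagged_links must not be empty")
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    rng = random.Random(seed)
    population = sorted(flagged_links)
    sample = rng.sample(population, min(sample_size, len(population)))
    removed = sum(1 for url in sample if url not in live_links)
    return removed / len(sample)


# Made-up data: 100 flagged links, 10 of which are still live.
flagged = [f"http://spam{i}.example/page" for i in range(100)]
still_live = set(flagged[:10])

full_rate = sampled_removal_rate(flagged, still_live, sample_size=100)
spot_check = sampled_removal_rate(flagged, still_live, sample_size=20, seed=1)
```

Checking every link gives the exact removal rate (0.9 here, the 90% mentioned above), while a 20-link spot check gives an estimate that can land above or below it.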

Paid Links that pass PageRank:

  • Despite the fact that Google is able to detect paid links passing PageRank and does not count these links, Google recently started taking manual action by penalizing these sites.
  • According to Matt, Google is taking manual action and penalizing sites with links passing PageRank because companies continue to profit off of these practices.
  • Google wants people to understand that PageRank passing paid links are a link scheme, a waste of time and money.

Affiliate links:

  • Google handles affiliate links well but including rel=nofollow never hurts.
  • "Nofollowed links account for less than 1% of all links on the internet."

Negative SEO:

  • The recent reaction to "Negative SEO" has been interesting.
  • Negative SEO has been around a long time.
  • According to Matt, "It is possible for people to do things to sites, like steal domains."
  • Matt pointed out that Google changed the wording of Google Webmaster Guidelines some time ago to address negative SEO. It says, "Practices that violate our guidelines may result in a negative adjustment of your site's presence in Google, or even the removal of your site from our index."

Bounce Rate:

  • According to Matt, Google Analytics data is not used for rankings.
  • Bounce rate data from search results is noisy because of redirects, spam and/or other issues.
  • Bounce rates do not accurately measure quick answers.
  • Because users often get the answer they want and then leave, bounce rate is not a good metric for Google to use.
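
A quick worked example shows why. Bounce rate here means the share of visits that viewed only one page. The "found_answer" field is a made-up stand-in for user satisfaction, which real analytics data does not record directly.

```python
def bounce_rate(sessions):
    """Share of sessions that viewed exactly one page."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if s["pages_viewed"] == 1)
    return bounces / len(sessions)


def satisfaction_rate(sessions):
    """Share of sessions where the visitor found the answer (hypothetical field)."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s["found_answer"]) / len(sessions)


# Made-up visits to a page that answers a question in its first paragraph.
quick_answer_page = [
    {"pages_viewed": 1, "found_answer": True},
    {"pages_viewed": 1, "found_answer": True},
    {"pages_viewed": 1, "found_answer": True},
    {"pages_viewed": 1, "found_answer": False},
    {"pages_viewed": 3, "found_answer": True},
]

page_bounce_rate = bounce_rate(quick_answer_page)
page_satisfaction = satisfaction_rate(quick_answer_page)
```

In this made-up data the page has an 80% bounce rate even though 80% of visitors got what they came for, so a high bounce rate alone says little about quality.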