Google+ Red Box Of Secrets

 

According to Google, “everything is going Google+,” but few search marketers truly understand what that means. Here are a few points to help bring everyone up to speed.

 

Google+ Sign In:

Even though keyword level data for signed in users is “Not Provided” in Google Analytics, Google’s goal is to increase the number of signed in user searches.

According to a recent Google Jobs post:

The mission of the search growth marketing team is to make that information universally accessible by enabling and educating users around the world to search on Google, search more often, and search while signed-in. Research and analysis has shown that putting Google search access points at the fingertips of users is an effective way of achieving these goals. And the more users that are signed in to Google, the better we can tailor their search results and create a unified experience across all of the Google products that they use.

When users are signed in, Google can better tailor search results and better target ads. Better ads and better search results increase Google’s market share, not to mention its ad revenue. Google+ is one of many programs intended to help increase the number of signed-in users.

Google+ Links:

In order to return relevant search results for human users based on what is important to human users, Google needs access to analyze content and links created by humans.

When Google and its “secret sauce” PageRank algorithm were originally developed, the web was a very different place than it is today. At that time, blogs, Tweets and Facebook did not exist. In the late 1990s, content and links tended to be created by humans and both were freely accessible to Google’s crawlers. Back then, important websites were “likely to receive more links from other websites.” As a result, Google was able to leverage the “citation graph” of the internet to measure “importance” based on “people’s subjective idea of importance.”

Today, content and links tend to be created by software and not by humans. The best place to find high quality human made content and links today is deep within the password protected confines of social media websites. These issues are both problematic for Google because most social media sites prevent Google from accessing high quality content and links.

For all the skeptics, Google does appear to have billions of Facebook pages indexed. That being said, many of the Facebook pages that Google has indexed are duplicate content from Wikipedia, Facebook and other sources. In cases where Facebook pages are accessible to Google crawlers, outbound links are almost always password protected, nofollowed, disallowed via robots.txt or links to internal Facebook pages which cannot be crawled. As a result, Google is limited to extracting only external Facebook content and a few social media signals which can easily be spammed.

Google+ is like the internet used to be, before social media websites existed and when PageRank ruled the land. Google+ Ripples even provides a visual representation of impact-factor-like data similar to PageRank. PageRank or not, Google+ is a place where human made content and links are accessible to Google. According to Google, Google+ represents the “unification of all of Google’s services with a common social air.” This “social air” makes Google+ a place where more important websites are still likely to receive more links than less important websites. Google+ is a new “citation graph” where Google can once again crawl human crafted content and links to measure page importance based on people’s subjective ideas about importance.

Google+ Spam Prevention:

Even if Google’s crawlers could access the highest quality human crafted content and links on social media sites, fake content, reviews and unnatural link spam are of little value to Google. Without access to social media user account data, detecting these types of spam can be difficult.

According to anti-spam software experts, 40% of social media profiles are spam and by 2014 as many as 15% of reviews on social media sites are expected to be fake. In order to help address these issues, on March 1, 2012 Google moved to a single unified privacy policy across all Google properties. With this new level of shared data, Google’s Spam & Abuse Team (the same team that handles Gmail spam) has the most advanced systems in existence at its disposal to fight spam on Google+. Google+ has been designed to provide Google’s Spam & Abuse Team with an almost endless selection of potential spam detection signals.

For example, and without going into too much detail, Google accounts that frequently send and receive Gmail, participate in Google+ Hangouts, watch YouTube videos and are associated with an Android phone that moves around town might be considered legitimate. On the other hand, if several accounts are associated with the same IP address and one is used to spam Blogger with duplicate blog posts authored by an associated account, each account could be considered untrustworthy.

It is difficult to say for sure which signals Google is currently using, but with Google+ the potential for future spam signals is nearly unlimited. Spam, ranking manipulation, impersonation, deceptive behavior, fake profiles and adding people to circles too aggressively are all violations of Google+ guidelines.

Google+ Identification:

In order for content to be authoritative and trustworthy, its source must be identifiable. At the same time, spammers usually set up multiple accounts using fictitious identities.

Google CEO and Co-Founder Larry Page has stated, “It’s really important to know the identity of people so you can share things and comment on things and improve the search ecosystem, you know, as you and as a real person. I think all those things are absolutely crucial. That is why we have worked so hard on Google+, on making it an important part of search.”

Google+ was initially developed as an “identity service.” The success of Google+ depends on users using their real name. Real names are entities and Google can use entity related data to infer additional information. This type of data can be especially helpful when it comes to returning better search results for queries where expertise is required, and for queries about a specific individual where multiple individuals have the same name.

According to Google, “The internet would be better if we knew you were a real person rather than a dog or a fake person. Some people are just evil and we should be able to ID them and rank them downward.” In order to set up a Google+ Profile or Google+ Page for business, Google requires your “common name.” In some cases, Google may require an image of the user’s driver’s license, proof of identification and/or references to verify a user’s name as well as his/her identity. For an author’s picture to appear in Google search results, Google requires authors to provide a “recognizable headshot” photo. Images like these not only help searchers recognize authors, they can also be used by Google facial recognition software in various ways to help fight spam.

For example, in the near future expect to see Google roll out Google+ custom URLs for a nominal fee, paid by credit card. Because credit card transactions are one method for verifying a user’s identity, this approach allows Google to verify the identities of multiple users in a short time at scale.

Google believes that, “letting authors verify their name helps increase their credibility and trustworthiness in the eyes of their readers.” In addition to name verification, Google+ provides tools for identity verification that Google can use to combat various forms of entity authentication fraud.

Google+ User Data:

Google can only collect personal information from users who are willing to provide personal information. According to a former Google employee, “Google could still put ads in front of more people than Facebook, but Facebook knows so much more about those people. Advertisers and publishers cherish this kind of personal information, so much so that they are willing to put the Facebook brand before their own.”

Google+ allows Google to ask users for personal information that otherwise could not be collected. Without Google+, Google would have no reason to collect personal data like relationship status, employment, occupation, education or places lived. In addition to collecting direct user data, Google+ collects indirect user data from Google +1 buttons. Google +1 buttons have been widely adopted and are currently embedded within billions of webpages. According to Google, +1s provide contextual value when users are in the market for a particular product. It only stands to reason that +1s also allow Google to collect sentiment related data. Once collected, Google can translate this new gold mine of user data into increased ad revenue through targeted ads for signed in users.

As you can see, Google+ is far more than just another social network!


 

Google Plus Custom URL Facts

Google started rolling out Google+ custom vanity URLs to a limited number of verified and pre-approved Google+ accounts last week. According to Google, Google+ custom URLs are a short and easy way to remember web addresses for Google+ profiles and pages. In fact, Google+ vanity URLs are similar to existing profiles.google.com URLs, only shorter. Did you know that when Google+ Custom URLs are activated, in many cases profiles.google.com URLs stop working? Either way, here are a few things you should probably know about Google+ Custom URLs.

Google+ Custom URL: Customization

Believe it or not, Google+ Custom URLs allow for further customization because they can be used along with various Google TLDs and even domain names.

Examples:

TIP: Unfortunately, Google+ custom URLs are not currently accessible via G.co or Goo.gl. However, you can probably expect to see a similar but shorter paid service like this in the future.

Google+ Custom URL: Deep Linking

Google+ Custom URLs allow for deep linking to “posts,” “about,” “photos,” “videos” and “plusone” pages by simply adding the page name after the trailing slash in custom URLs.

Examples:

TIP: Even though you can add “posts” as a deep link, that is not necessary when “posts” is the same as the default page or when “posts” pages specify the default URL as the canonical.
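
A minimal Python sketch of how these deep links are assembled. The base address is the custom URL mentioned later in this post; the helper name `deep_link` and the check against the five page names listed above are illustrative assumptions, not something Google provides.

```python
# Page names this post lists as valid deep-link targets.
DEEP_LINK_PAGES = ("posts", "about", "photos", "videos", "plusone")


def deep_link(custom_url: str, page: str) -> str:
    """Append a page name after the trailing slash of a custom URL."""
    if page not in DEEP_LINK_PAGES:
        raise ValueError(f"unsupported page name: {page!r}")
    # Normalise so exactly one slash separates the URL and the page name.
    return custom_url.rstrip("/") + "/" + page


if __name__ == "__main__":
    for name in DEEP_LINK_PAGES:
        print(deep_link("google.com/+BrianUssery", name))
```

Stripping the trailing slash first means the helper accepts the custom URL with or without it.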

Google+ Custom URL: Speed

Google+ Custom URLs take considerably longer to load than the actual page. When tested with WebPageTest.org using the default test, my custom URL took more than a second longer to load than the actual page. This issue is caused in part by the multiple redirects put in place by Google. (In fairness however, the default test used by WebPageTest.org originates from Dulles, VA using IE 8 via DSL.)

TIP: Technically, capitalization does not matter for Google+ Custom URLs. At the same time, not capitalizing the first letters of names in profile page URLs often results in an additional redirect, which slows page load speed even more.

Google+ Custom URL: Canonicals & Duplicate Content

Google+ custom URLs are short, easy to post, more recognizable to users and search friendly in some ways but not others. With Google+ custom URLs there is little need to worry about canonicalization, duplicate content or custom URL content appearing in organic search results. Behind the scenes, Google is using a series of 301 and 302 redirects in conjunction with the rel=canonical attribute to help ensure that preferred content and vanity URLs both appear in organic search results. The downside for PageRank fans is that 302 redirects usually do not pass PageRank or link equity signals. That said, Google can treat on-site 302 redirects as 301s if it chooses.
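
To make the redirect discussion a little more concrete, here is a small Python sketch that summarises a redirect chain once you have recorded the status code of each hop (for example with a header checker). The function name, the input shape and the sample chain are assumptions for illustration only; this post does not publish the exact chain Google uses, so the URLs below are placeholders.

```python
from typing import List, Tuple

PERMANENT = {301, 308}       # permanent redirects
TEMPORARY = {302, 303, 307}  # temporary redirects


def summarize_chain(hops: List[Tuple[int, str]]) -> dict:
    """Summarise recorded (status_code, url) hops of a redirect chain."""
    redirects = [(code, url) for code, url in hops if code in PERMANENT | TEMPORARY]
    temporary = [url for code, url in redirects if code in TEMPORARY]
    return {
        "redirect_count": len(redirects),
        "temporary_hops": temporary,  # hops that may not pass link equity
        "all_permanent": bool(redirects) and not temporary,
        "final_url": hops[-1][1] if hops else None,
    }


if __name__ == "__main__":
    # Hypothetical chain with placeholder URLs, not a recorded Google response.
    chain = [
        (302, "example.com/+vanity"),
        (301, "plus.example.com/+Vanity"),
        (200, "plus.example.com/+Vanity/posts"),
    ]
    print(summarize_chain(chain))
```

Each redirect hop also adds a round trip, which is consistent with the extra load time noted in the Speed section.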

Google+ Custom URL: Process

Google pre-assigns one unique custom URL per user or organization. Custom URLs should reflect real names and must be at least 6 characters long. Even though it is possible to appeal Google’s pre-assigned custom URL, not everyone is currently eligible for this service. Currently you can only claim Google+ custom URLs via a desktop computer. Mobile devices and tablets are not supported.

Google+ Custom URL: Invites

In order to claim a free custom URL for Google+, users and business accounts must be verified and enabled for this service by Google. Currently there is no way to enable custom URLs manually. As a result, there is no way to request an invite for Google+ Custom URLs and no way to invite others. Google will be rolling out Google+ custom vanity URLs over the next few weeks and months. For now, checking your Google+ profile regularly and monitoring your email for an invite are about all you can do. Either way, GOOD LUCK and please be sure to follow Google.com/+BrianUssery


 

Google Knowledge Graph Review

Search engines have focused on simply “matching keywords to queries” for years. This approach is somewhat problematic, however, because it disassociates the meanings of keywords in multi-keyword queries. For example, search engines might interpret the query [Paris Hilton] (a proper noun and named entity) as simply a request for instances where the words “hilton” and “paris” appear within a page. Fortunately, with a large enough set of data it is possible to make statistical inferences about the intent of a user’s query. As a result, Google has relied on statistical inference for uncertain data queries like [Paris Hilton] and [b&b ab] (bed & breakfast in Alberta) for years.
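
The difference between plain keyword matching and entity-aware matching can be sketched in a few lines of Python. The two sample pages and the one-item entity list are made up for illustration, and real search engines are far more sophisticated than this.

```python
import re

# Made-up sample pages for illustration.
PAGES = {
    "hotel": "Book a room at the Hilton hotel in central Paris.",
    "celebrity": "Paris Hilton attended the premiere last night.",
}

# Hypothetical list of known named entities (lowercased).
ENTITIES = {"paris hilton"}


def tokens(text: str) -> list:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9&]+", text.lower())


def keyword_match(query: str, text: str) -> bool:
    """True when every query word appears somewhere in the text."""
    words = set(tokens(text))
    return all(word in words for word in tokens(query))


def entity_match(query: str, text: str) -> bool:
    """Treat a known entity as one unit: its words must be adjacent and in order."""
    q = tokens(query)
    if " ".join(q) not in ENTITIES:
        return keyword_match(query, text)
    t = tokens(text)
    n = len(q)
    return any(t[i:i + n] == q for i in range(len(t) - n + 1))


if __name__ == "__main__":
    for name, text in PAGES.items():
        print(name, keyword_match("Paris Hilton", text), entity_match("Paris Hilton", text))
```

Keyword matching accepts both pages, while the entity-aware version accepts only the page that actually mentions the person.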

In 2010 Google purchased Metaweb Technologies, Inc., the company behind Freebase. Freebase was, and still is, an “open, shared database of the world’s knowledge.” Before being acquired by Google, Metaweb was in the process of identifying millions of “entities and mapping out how they’re related” via Freebase. In addition to entity mapping, Freebase also looks at what words other sites use to refer to entities. In May 2012 Google launched “Knowledge Graph,” a “graph” which is built in part on Freebase. According to Google, Knowledge Graph can “understand real-world entities and their relationships to one another.” Google hopes Knowledge Graph will improve search results and provide more immediate answers to users’ questions in search results pages.

The concept behind Freebase and Google’s use of graphed entities is pretty interesting, but I would like to know more about what is really going on under the hood of Google Knowledge Graph. Since Knowledge Graph launched, I have spent hours trying to break it, find bugs, discover issues and/or identify abnormalities. Remarkably, until last week I had found very little. Then, as they say, “it happened!” Last Thursday, while looking for a good example of Google Knowledge Graph results to use in a presentation, I got the search result below.

SERP for Matt Cutts

 

Suddenly it dawned on me, Matt did not go to UNC Law School!

 

Matt Cutts SERP

I clicked on “University of North Carolina School of Law” in Matt’s Google Knowledge Graph result, under his bio from Wikipedia, but it returned search results for another entity, [university of north carolina at chapel hill]. From that result, I searched for [unc] and was returned this result.

Just to be sure what I was seeing was correct, I deleted all cookies, signed out of Google and restarted my browser. After refreshing all of my settings, I searched for [unc founded] and was returned this search result.

At that point, I realized that even UNC’s founding date seemed off. I checked and according to the University of North Carolina Planning Department, UNC was founded in 1793, not 1789. To be sure this was not the date UNC’s Law School was founded, I checked the UNC School of Law website. According to the site, the first law professor did not arrive at UNC until 1845. Then I went back and checked Wikipedia’s page for UNC, and it did not contain any of the text being displayed in Google’s Knowledge Graph search results either.

With the suspected smoking gun already in hand, I went to Freebase.com and searched for [UNC]. You guessed it, Freebase.com’s first result for [UNC] was exactly what had appeared in Knowledge Graph results: “University of North Carolina School of Law.” It turns out Matt is not alone. All UNC graduates listed in Freebase.com are listed as UNC School of Law graduates, even if they did not attend the UNC School of Law. At that point it was clear: Google Knowledge Graph “thinks” UNC and UNC’s School of Law are a single entity because that is what Freebase.com is “telling” Google Knowledge Graph.

Because Freebase data appears in Google Knowledge Graph search results and Google’s main search results, this issue also means results for 100+ notable figures are potentially incorrect. For instance, according to Google Knowledge Graph results, US President James K Polk graduated from UNC’s School of Law, but UNC’s School of Law was founded when he was already in office.

Knowledge Graph Results for James K Polk

In addition to Matt Cutts and President Polk, search results for [Michael Jordan college] in Google’s main search results are also incorrect due to this issue.

Knowledge Graph Results for Michael Jordan

Other UNC School of Law alumni according to Freebase and potentially Google Knowledge Graph, include Alge Crumpler, Lawrence Taylor, Andy Griffith, Rick Dees, Roger Mudd, Vince Carter, Jerry Stackhouse and even Thomas Layton, the former CEO of Metaweb.

This issue is potentially due at least in part to the fact that only a shell page for UNC (UNC being the parent University of UNC Law School) existed in Freebase.com until yesterday. To hopefully help improve the quality of Google Knowledge Graph results, I added an image, description, UNC’s correct founding date and other information from UNC.edu to UNC’s Freebase page yesterday.

With fingers crossed that Matt’s wild and crazy UNC Law School days are not his best kept secret, that my site won’t vanish from Google tomorrow and that the US Secret Service won’t show up at my door, I removed “Law School” from both Matt’s and President Polk’s profiles in Freebase. As a result, Matt Cutts and President Polk are now the only non-Law School students / graduates in UNC’s Freebase page. It will be interesting to see how long these changes take to appear in Google’s Knowledge Graph search results.

Google Knowledge Graph is really interesting and seems to be working pretty well despite a few bugs. This is an edge case, but a situation you should know about. Instances where different entities have the same or similar names are problematic. Instances where multiple keywords are similar to multiple keyword entities are also problematic. Google may already be using Knowledge Graph data based on Freebase.com to determine whether or not content falls in or out of scope. For all of these reasons and others, it is important to keep an eye on Knowledge Graph results that relate to you. If you notice issues, click on “feedback” just below Knowledge Graph results on the right-hand side of Google search results pages.


 

Google Penguin, Panda, Unnatural Links, Matt Cutts & More – SMX Notes

If you missed it earlier, be sure to check out Barry Schwartz’s live blog coverage of SMX Live: You&A with Matt Cutts, head of Google’s Webspam team. For those interested, I have posted my notes from the session below.

Penguin:

  • The lead engineer for what would come to be known as “Penguin” picked that name.
  • Penguin addresses spam.
  • Impacted sites are believed to be in current violation of Google Webmaster Guidelines.
  • The only way to recover from Penguin is to properly address guidelines violations.
  • Impacted sites experience an algorithmic demotion in search results but not a penalty.
  • Penguin is not a “penalty” or “manual action” because it is algorithmic and therefore not manual.
  • There is no whitelist for Penguin.
  • Google uses 200+ signals for rankings and Penguin is the latest.
  • Sites hit by Penguin can fully recover once guidelines violations are resolved.

Panda:

  • Panda was named after the lead engineer, whose last name is Panda.
  • Addresses thin and/or low quality content.
  • Prior to Panda low quality content fell between the Search Quality team and Web Spam.
  • Since Panda, Search Quality and WebSpam teams at Google work closer together.
  • Sites hit by Panda can fully recover once content issues are resolved.

“Manual Actions” the new “Penalty”

  • According to Matt, “We don’t use “penalty” anymore, we use “manual action” vs an algorithmic thing.”
  • Manual reviews result in manual actions, whereas algorithmic detection results in a demotion.
  • 99 percent of manual actions are accompanied by webmaster notifications via Google Webmaster Tools.
  • Algorithmic issues do not result in a notification via Google Webmaster Tools.

Unnatural Link Notifications:

  • Unnatural link messages imply a manual action (penalty).
  • For unnatural link notifications webmasters should submit a reconsideration request.
  • According to Matt, “typically if you get a notification you will see a downgrade in rankings.”
  • Google wants “to see a real effort” on the part of webmasters when it comes to removing unnatural links. Some webmasters have gone so far as to scan in images of letters sent to domain owners requesting links be removed.
  • When reinclusion requests are submitted for unnatural link notifications, “Google reviews a random sample to see if those links are removed.”
  • Webmasters should attempt to remove at least 90% of unnatural links pointing to their site.
  • Google understands it is difficult to remove links and is working on alternative solutions.
  • Google is working on a new feature which will allow webmasters to “disavow” links pointing to their website.
  • If you cannot remove some links, it may be possible to remove the entire page if it is not the homepage or similar.

Paid Links that pass PageRank:

  • Despite the fact that Google is able to detect paid links passing PageRank and does not count these links, Google recently started taking manual action by penalizing these sites.
  • According to Matt, Google is taking manual action and penalizing sites with links passing PageRank because companies continue to profit off of these practices.
  • Google wants people to understand that PageRank passing paid links are a link scheme, a waste of time and money.

Affiliate links:

  • Google handles affiliate links well but including rel=nofollow never hurts.
  • “Nofollowed links account for less than 1% of all links on the internet.”
Negative SEO:

  • The recent reaction to “Negative SEO” has been interesting.
  • Negative SEO has been around a long time.
  • According to Matt, “It is possible for people to do things to sites, like steal domains.”
  • Matt pointed out that Google changed the wording of Google Webmaster Guidelines some time ago to address negative SEO. It says, “Practices that violate our guidelines may result in a negative adjustment of your site’s presence in Google, or even the removal of your site from our index.”

Bounce Rate:

  • According to Matt, Google Analytics data is not used for rankings.
  • Bounce rates from search results are noisy because of redirects, spam and/or other issues.
  • Bounce rates do not accurately measure quick answers.
  • Because users often get the answer they want and then leave, bounce rate is not a good metric for Google to use.

     

     


     

    Google Penguin Recovery Information & Penguin Checklist

    Google Penguin

    On April 24, 2012, Google launched “Penguin.” Google’s Penguin update specifically targets sites that are violating Google Webmaster quality guidelines. If your site experienced a decline in organic search traffic from Google on April 24, 2012, you have probably been impacted by Google Penguin. Penguin is an algorithmic webspam update and not a holistic site wide quality update like Panda. Penguin is similar to Panda in that it is a filter which Google plans to update periodically.

    Make no mistake about it, Google believes sites hit by Penguin are currently attempting to spam Google search results. The only way to recover from Penguin is to identify and resolve quality guideline violations. Reconsideration requests will not help sites hit by Penguin. As mentioned previously, Penguin is intended to demote sites that violate Google Webmaster quality guidelines. Once guideline violations are identified and resolved and Google processes these changes, successful recovery from Penguin should come with Google’s next update.

    Matt Cutts recently said, “If you’ve cleaned and still don’t recover, ultimately, you might need to start all over with a fresh site.” That said, there are several points which I have not seen mentioned elsewhere. Google does not list every potential quality guideline violation in Webmaster Guidelines. Most of the webmasters, site owners and SEOs that I have talked with in recent weeks have missed critical guideline violations due to a lack of technological savvy or understanding of what Google refers to as the “spirit” of Google Webmaster Guidelines. Before starting over, a third party site audit is highly recommended to ensure a site is in fact clean.

    There are a number of interesting Penguin related Patents. In the past year, there have also been some interesting trends in Google’s spam related Patent Applications. In November 2006 Google submitted four US Patent applications by essentially the same inventors within a 10 day period. This original group of four US Patent Applications remained primarily dormant for almost 5 years. In 2011 however, two of the original four US Patent applications from 2006 were updated nine times. These four US Patent Applications and the recent iterations account for six of nine US Patent Applications filed by Matt Cutts, head of Google’s Webspam team.

    11/21/2006 20070100817 “DOCUMENT SCORING BASED ON DOCUMENT CONTENT UPDATE”
    by Acharya; Dean; Haahr; Henzinger; Lawrence; Pfleger; Tong;
    - updated June 30, 2011 original claims canceled new application 20110258185 filed
    - updated June 30, 2011 20110264671 filed (Cutts added as inventor)
    - updated September 14, 2011 20120005199 (Cutts removed as inventor) (mentions inception date by Cutts)

    11/22/2006 20070088692 “DOCUMENT SCORING BASED ON QUERY ANALYSIS”
    by Dean; Haahr; Henzinger; Lawrence; Pfleger; Sercinoglu; Tong;
    - updated September 26, 2011 20120016870
    - updated September 26, 2011 20120016871 (Cutts added as inventor)
    - updated September 26, 2011 20120016874 (Cutts removed as inventor)
    - updated September 26, 2011 20120016888
    - updated September 26, 2011 20120016889
    - updated September 26, 2011 20120023098 (mentions inception date by Cutts)

    According to Google’s latest US Patent Applications, “legitimate” pages attract links slowly. Any large spike in links may be a signal of spam. Sudden growth in links to “individual documents may indicate a potentially synthetic web graph, which is an indicator of an attempt to spam. This indication may be strengthened if the growth corresponds to anchor text that is unusually coherent or discordant. This information can be used to demote the impact of such links, when used with a link-based scoring technique.” Furthermore, the date when links are added “can also be used to detect “spam,” where owners of documents or their colleagues create links to their own document for the purpose of boosting the score assigned by a search engine.”
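
As a rough illustration of the “sudden growth in links” idea quoted above, the Python sketch below flags months in which the number of new links far exceeds the trailing average. The window size, the threshold factor and the sample counts are invented for this example; the patent applications do not publish specific values, and this is not Google’s implementation.

```python
from statistics import mean
from typing import List


def link_spikes(new_links_per_month: List[int], window: int = 3, factor: float = 5.0) -> List[int]:
    """Return indexes of months whose new-link count exceeds
    `factor` times the average of the preceding `window` months."""
    spikes = []
    for i in range(window, len(new_links_per_month)):
        baseline = mean(new_links_per_month[i - window:i])
        if baseline > 0 and new_links_per_month[i] > factor * baseline:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Made-up monthly counts of newly discovered inbound links.
    print(link_spikes([10, 12, 11, 13, 400, 15]))
```

Running the same check on your own inbound link history can help show which periods deserve a closer look during an audit.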

    In order to differentiate legitimate pages from spam Google may also “consider mentions of the document in news articles, discussion groups, etc. on the theory that spam documents will not be mentioned, for example, in the news. Any or a combination of these techniques may be used to curtail spamming attempts.”

    It is safe to say Penguin parts are probably scattered all over the US Patent Office! With that in mind, these recently updated US Patent Applications reveal a lot of interesting insight into how Google uses links, anchor text and historical data to detect spam. Since Google Penguin targets spam, all of these facets can be used to help sites recover.

    Google Penguin Recovery: Impact Verification

    In order to recover from Google’s Penguin update, webmasters must verify that the site has in fact been hit by Penguin.

    - Verify via analytics that the site did in fact experience a drop in organic search traffic from Google on April 24, 2012.

    - Verify via Google Webmaster Tools that the site experienced a drop in “Avg. position” on April 24, 2012.

    - Ensure that no other issues could be responsible for this drop in organic search traffic and/or “Avg. position” from Google on April 24, 2012. For instance, product removal, site changes, robots.txt disallow, noindex meta, decline in search volume for specific keywords, hacking, seasonality, rel=canonical and/or other.
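
The first check can be approximated with a short Python script once daily organic visits have been exported from your analytics package. The data layout (a date-to-visits mapping), the seven-day window and the function name are assumptions for illustration.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict

PENGUIN_LAUNCH = date(2012, 4, 24)


def percent_change(visits: Dict[date, int], pivot: date = PENGUIN_LAUNCH, days: int = 7) -> float:
    """Percent change in average daily visits: the `days` days before the
    pivot versus the `days` days starting on the pivot. Missing days are skipped."""
    before_days = [pivot - timedelta(days=d) for d in range(1, days + 1)]
    after_days = [pivot + timedelta(days=d) for d in range(days)]
    before = [visits[d] for d in before_days if d in visits]
    after = [visits[d] for d in after_days if d in visits]
    if not before or not after or mean(before) == 0:
        raise ValueError("not enough data around the pivot date")
    return (mean(after) - mean(before)) * 100.0 / mean(before)
```

A clearly negative result that begins on April 24, 2012 supports the Penguin diagnosis, but the other causes listed above still need to be ruled out by hand.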

    Google Penguin Recovery: On Site Quality Guideline Violations Audit Checklist

    In order to begin the recovery process for Google Penguin, webmasters should work to identify and resolve existing on site quality guidelines violations.

    - Remove pages made specifically for search engines. For instance “cookie-cutter” pages, “doorway pages”, “doorway domain” pages, “throw away domain” pages, “crawler pages”, “hub spoke pages” and/or other.

    - Discontinue participation in link schemes and remove links that could be mistaken for link exchanges or reciprocal linking programs. Remove sponsored WordPress themes, WordPress plugins that hide links, link networks and “this website created by” links to website design firm websites.

    - Remove links to spammy, low quality and “bad neighbor” sites and ensure the site only links to relevant sites with quality content.

    - Inspect Google’s cached text version of pages for the presence of ANY hidden text or links.

    - Do not assume that “Official” plugins and scripts from reputable third party sources do not violate quality guidelines.

    - Remove redirects that could be considered “sneaky” or an attempt to manipulate search engines.

    - Remove all paid links that pass PageRank and ensure paid links include rel=nofollow attributes.

    - Remove any attempts to sculpt PageRank from the site.

    - Remove user-agent, referrer and/or IP dependent scripting.

    - Remove any machine generated content.

    - Remove all unnecessary, dead, irrelevant, outdated and/or potentially spammy links originating from the site. To help identify these links use a free tool like Xenu Link Sleuth or your favorite paid tool to compile a list of links originating on site.
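
If you prefer a script to a desktop crawler, the Python standard library is enough to list the links on a single page and show which external ones lack rel=nofollow. This is a minimal single-page sketch rather than a replacement for a full crawl; the class name, the function name and the example.com hostnames in the usage note are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAuditor(HTMLParser):
    """Collect (absolute_url, is_external, has_nofollow) for every <a href>."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.site_host = urlparse(page_url).netloc.lower()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        url = urljoin(self.page_url, href)
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            return  # ignore mailto:, javascript: and similar links
        rel_values = (attr.get("rel") or "").lower().split()
        is_external = parsed.netloc.lower() != self.site_host
        self.links.append((url, is_external, "nofollow" in rel_values))


def followed_external_links(page_url: str, html: str) -> list:
    """Return external links on the page that do not carry rel=nofollow."""
    auditor = LinkAuditor(page_url)
    auditor.feed(html)
    return [url for url, external, nofollow in auditor.links if external and not nofollow]
```

For example, calling `followed_external_links("http://www.example.com/page", html)` on a saved copy of a page returns the outbound links that still pass PageRank, which is the list to review against the checklist above.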

    Google Penguin Recovery: Inbound Link Quality Audit Checklist

    In order to recover from Google Penguin, webmasters should work to resolve general and obvious off site link quality guidelines violations. The first step is to identify low quality 3rd party sites with links pointing to the site.

    1. See Google Webmaster Tools > Traffic > Links to Your Site > Who links the most > More>> for a list of domains linking to the site and download all domains. Remove domains with legitimate, natural, organic, inbound links to the site from the list.

    2. Use Whois to find domain owner contact information for low quality 3rd party sites with links to the site.

    3. Email low quality site owners to request that all links to the site be removed immediately.

    The next step is to narrow the focus of the inbound link quality audit to the specific keywords and URLs hit hardest by Penguin. First, webmasters should identify which keywords were hit hardest by Penguin and compile a list of the URLs for keyword terms hit hardest by Penguin.

    1. Compile a list of most impacted keyword term/s and the URLs for those keyword terms from Google Webmaster Tools.

    See Google Webmaster Tools > Traffic > Search Queries > Search Queries
    - click “filter” located top middle left, change “All” to “Web” and click “Apply” to save
    - click on “With change”
    - click on 25 rows and change it to 500 rows
    - click “Change” for “Avg. position” on the right hand side to sort “Change” from most to least
    - These steps should result in a list of the 500 most popular keyword terms responsible for driving traffic to the site from organic web search, sorted by the amount of change in “Avg. Position”.
    - Clicking on specific terms at this point should provide query details by URL, which can be downloaded.

    2. Identify and create a list of low quality 3rd party URLs with links to impacted site URLs using impacted site keywords in 3rd party link anchor text. If applicable, demand a full and complete list of all links purchased and/or links built by third parties within the past 5 years.

    3. Collect whois information for 3rd party domains. Use Whois to find domain owner contact information.

    4. Contact 3rd party website owners, provide specific linking URLs and demand links be removed immediately.
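
The sorting described in step 1 can also be reproduced offline from a downloaded query report. The column name used here ("Change") and the function name are assumptions; check them against the header row of your own export. Because the sign convention of the export is not covered in this post, the sketch sorts by the absolute size of the change.

```python
import csv
import io


def biggest_position_changes(csv_text: str, change_column: str = "Change", limit: int = 500) -> list:
    """Return rows sorted by the absolute size of the change column, largest first."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            row["_change"] = float(row[change_column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a numeric change value
        rows.append(row)
    rows.sort(key=lambda r: abs(r["_change"]), reverse=True)
    return rows[:limit]
```

Reading the file with `open(path, newline="", encoding="utf-8").read()` and passing the text to this function yields the most affected queries first, ready to be matched against the list of low quality linking URLs.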

    Once violations are identified and resolved and links are removed, Google must re-crawl and reprocess these changes. Once this process is under way, recovery should come with the next Google Penguin update, but there are no guarantees. As of the date of publication, there have been no Penguin updates and Google has not disclosed a timeline for future Penguin updates.

    There is a remote possibility that some sites have been accidentally hit by Penguin. If you believe a site has been hit by Penguin without cause, report it via Google’s Penguin form.

    Google Penguin Recovery: Future

    Google’s Penguin update raises important new questions about links from other sites and the impact they can have on rankings. In the past, Google’s official line has always been that inbound links from low quality sites could not harm quality sites as long as quality sites did not return the link. Google Penguin, however, makes it seem possible for rogue sites to actively harm the rankings of otherwise reputable sites. Without going into details, there are always fringe operators, and it is possible to harm the rankings of other sites through black hat practices like “Negative SEO.” While the majority of site owners have nothing to worry about, it never hurts to be vigilant and monitor your site for any unusual inbound link activity. Under Google’s new unified privacy policy, it is entirely possible that Google user account information and activity data from across all properties is now directly fed into the spam signal pipeline. Such integration would make undetected negative SEO far more difficult and would help explain Google’s recent rapid iterations, the timing of Penguin and the almost complete policy reversal on inbound links.

    Hopefully this information has been useful to you, thank you for checking out my blog!
