On April 24, 2012, Google launched “Penguin.” Google’s Penguin update specifically targets sites that violate Google’s Webmaster quality guidelines. If your site experienced a decline in organic search traffic from Google on April 24, 2012, it has probably been impacted by Penguin. Penguin is an algorithmic webspam update, not a holistic site-wide quality update like Panda. Penguin is similar to Panda, however, in that it is a filter Google plans to update periodically.
Make no mistake about it: Google believes sites hit by Penguin are actively attempting to spam its search results. The only way to recover from Penguin is to identify and resolve quality guideline violations. Reconsideration requests will not help sites hit by Penguin. As mentioned previously, Penguin is intended to demote sites that violate Google’s Webmaster quality guidelines. Once guideline violations are identified and resolved, and Google processes these changes, successful recovery from Penguin should come with Google’s next update.
Matt Cutts recently said, “If you’ve cleaned and still don’t recover, ultimately, you might need to start all over with a fresh site.” That said, there are several points I have not seen mentioned elsewhere. Google does not list every potential quality guideline violation in its Webmaster Guidelines. Most of the webmasters, site owners and SEOs I have talked with in recent weeks have missed critical guideline violations due to a lack of technical savvy or an incomplete understanding of what Google refers to as the “spirit” of the Webmaster Guidelines. Before starting over, a third-party site audit is highly recommended to ensure a site is in fact clean.
There are a number of interesting Penguin-related patents. In the past year, there have also been some interesting trends in Google’s spam-related patent applications. In November 2006, Google submitted four US patent applications by essentially the same inventors within a ten-day period. This original group of four applications remained largely dormant for almost five years. In 2011, however, two of the original four were updated nine times. These four applications and their recent iterations account for six of the nine US patent applications filed by Matt Cutts, head of Google’s webspam team.
11/21/2006 20070100817 “DOCUMENT SCORING BASED ON DOCUMENT CONTENT UPDATE”
by Acharya; Dean; Haahr; Henzinger; Lawrence; Pfleger; Tong;
- updated June 30, 2011: original claims canceled; new application 20110258185 filed
- updated June 30, 2011 20110264671 filed (Cutts added as inventor)
- updated September 14, 2011 20120005199 (Cutts removed as inventor) (mentions inception date by Cutts)
11/22/2006 20070088692 “DOCUMENT SCORING BASED ON QUERY ANALYSIS”
by Dean; Haahr; Henzinger; Lawrence; Pfleger; Sercinoglu; Tong;
- updated September 26, 2011 20120016870
- updated September 26, 2011 20120016871 (Cutts added as inventor)
- updated September 26, 2011 20120016874 (Cutts removed as inventor)
- updated September 26, 2011 20120016888
- updated September 26, 2011 20120016889
- updated September 26, 2011 20120023098 (mentions inception date by Cutts)
According to Google’s latest US patent applications, “legitimate” pages attract links slowly, and any large spike in links may be a signal of spam. Sudden growth in links to individual documents “may indicate a potentially synthetic web graph, which is an indicator of an attempt to spam. This indication may be strengthened if the growth corresponds to anchor text that is unusually coherent or discordant. This information can be used to demote the impact of such links, when used with a link-based scoring technique.” Furthermore, the date when links are added “can also be used to detect ‘spam,’ where owners of documents or their colleagues create links to their own document for the purpose of boosting the score assigned by a search engine.”
To differentiate legitimate pages from spam, Google may also “consider mentions of the document in news articles, discussion groups, etc. on the theory that spam documents will not be mentioned, for example, in the news. Any or a combination of these techniques may be used to curtail spamming attempts.”
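The “sudden growth in links” signal described in these patent applications can be illustrated with a toy example. The sketch below is purely hypothetical — the window size and spike factor are arbitrary assumptions, and this is in no way Google’s actual algorithm — but it shows the general idea of flagging months where new inbound links far exceed the trailing average:

```python
from statistics import mean

def link_spike_score(monthly_new_links, window=6, spike_factor=3.0):
    """Flag months where new inbound links exceed spike_factor times
    the trailing average - a crude stand-in for the 'sudden growth'
    signal the patent applications describe.

    monthly_new_links: list of new-link counts, oldest first.
    Returns the list of month indexes flagged as potential spikes.
    """
    flagged = []
    for i in range(window, len(monthly_new_links)):
        baseline = mean(monthly_new_links[i - window:i])
        if baseline > 0 and monthly_new_links[i] > spike_factor * baseline:
            flagged.append(i)
    return flagged

# A site that gained ~10 links/month, then suddenly 80 in one month:
history = [9, 11, 10, 12, 10, 9, 80, 12]
print(link_spike_score(history))  # [6] - the 80-link month is flagged
```

Per the patent language, such a spike would presumably be weighted more heavily if the new links also shared unusually coherent anchor text.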
Safe to say, Penguin parts are probably scattered all over the US Patent Office! With that in mind, these recently updated patent applications reveal a lot of interesting insight into how Google uses links, anchor text and historical data to detect spam. Since Penguin targets spam, each of these signals can inform a site’s recovery effort.
Google Penguin Recovery: Impact Verification
In order to recover from Google’s Penguin update, webmasters must first verify that the site has in fact been hit by Penguin.
- Verify via analytics that the site did in fact experience a drop in organic search traffic from Google on April 24, 2012.
- Verify via Google Webmaster Tools that the site experienced a drop in “Avg. position” on April 24, 2012.
- Ensure that no other issues could be responsible for this drop in organic search traffic and/or “Avg. position” from Google on April 24, 2012. For instance, product removal, site changes, robots.txt disallow, noindex meta, decline in search volume for specific keywords, hacking, seasonality, rel=canonical and/or other.
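The first verification step above can be roughed out programmatically if you can export daily Google organic session counts from your analytics package. This is a minimal sketch under that assumption — the 14-day comparison window and the toy numbers are mine, not a standard:

```python
from datetime import date, timedelta

def organic_drop_pct(daily_sessions, pivot=date(2012, 4, 24), days=14):
    """Compare average daily Google organic sessions in the `days`
    before the pivot date against the `days` starting on the pivot.
    daily_sessions: dict mapping datetime.date -> session count
    (e.g. exported from your analytics package).
    Returns the percentage change (negative = drop).
    """
    before = [daily_sessions.get(pivot - timedelta(days=d), 0)
              for d in range(1, days + 1)]
    after = [daily_sessions.get(pivot + timedelta(days=d), 0)
             for d in range(days)]
    avg_before = sum(before) / days
    avg_after = sum(after) / days
    return 100.0 * (avg_after - avg_before) / avg_before

# Toy data: 1,000 sessions/day before April 24, 400/day from April 24 on.
sessions = {}
for d in range(1, 15):
    sessions[date(2012, 4, 24) - timedelta(days=d)] = 1000
for d in range(14):
    sessions[date(2012, 4, 24) + timedelta(days=d)] = 400
print(round(organic_drop_pct(sessions)))  # -60
```

A sharp negative number pinned to April 24, 2012 is consistent with a Penguin hit — but, as the checklist notes, rule out the other explanations (robots.txt, noindex, seasonality, etc.) before concluding anything.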
Google Penguin Recovery: On-Site Quality Guideline Violations Audit Checklist
In order to begin the Penguin recovery process, webmasters should work to identify and resolve existing on-site quality guideline violations.
- Remove pages made specifically for search engines: for instance, “cookie-cutter” pages, “doorway pages,” “doorway domain” pages, “throw away domain” pages, “crawler pages,” “hub spoke pages” and/or others.
- Discontinue participation in link schemes and remove links that could be mistaken for link exchanges or reciprocal linking programs. Remove sponsored WordPress themes, WordPress plugins that hide links, link networks and “this website created by” links to website design firm websites.
- Remove links to spammy, low quality and “bad neighbor” sites and ensure the site only links to relevant sites with quality content.
- Inspect Google’s cached text version of pages for the presence of ANY hidden text or links.
- Do not assume that “Official” plugins and scripts from reputable third party sources do not violate quality guidelines.
- Remove redirects that could be considered “sneaky” or an attempt to manipulate search engines.
- Remove all paid links that pass PageRank and ensure paid links include rel=nofollow attributes.
- Remove any attempts to sculpt PageRank from the site.
- Remove user-agent, referrer and/or IP dependent scripting.
- Remove any machine generated content.
- Remove all unnecessary, dead, irrelevant, outdated and/or potentially spammy links originating from the site. To help identify these links, use a free tool like Xenu Link Sleuth or your favorite paid tool to compile a list of links originating on the site.
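For the last item on the checklist, a crawler like Xenu Link Sleuth is the practical choice, but the core idea — collecting a page’s outbound links so they can be reviewed — can be sketched in a few lines of standard-library Python. The hostnames below are placeholders, not real sites:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class OutboundLinkCollector(HTMLParser):
    """Collect external link targets from one page's HTML - a minimal
    stand-in for a link crawler, for auditing which third-party
    sites a page links out to."""
    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.outbound = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href", "")
        host = urlparse(href).netloc
        # Internal links have no host, or our own host; skip them.
        if host and host != self.own_host:
            self.outbound.append(href)

html = ('<p><a href="/about">About</a>'
        '<a href="http://spammy-neighbor.example/win">win!</a></p>')
collector = OutboundLinkCollector("www.example.com")
collector.feed(html)
print(collector.outbound)  # ['http://spammy-neighbor.example/win']
```

Run over every page of the site, the resulting list is what you would then review for dead, irrelevant or “bad neighborhood” targets.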
Google Penguin Recovery: Inbound Link Quality Audit Checklist
In order to recover from Google Penguin, webmasters should work to resolve general and obvious off-site link quality guideline violations. The first step is to identify low quality third-party sites with links pointing to the site.
1. Go to Google Webmaster Tools > Traffic > Links to Your Site > Who links the most > More>> for a list of domains linking to the site, and download all domains. From that list, remove domains with legitimate, natural, organic inbound links to the site.
2. Use Whois to find domain owner contact information for low quality third-party sites with links to the site.
3. Email low quality site owners to request that all links to the site be removed immediately.
The next step is to narrow the focus of the inbound link quality audit to the specific keywords and URLs hit hardest by Penguin. First, webmasters should identify those keywords and compile a list of the URLs that ranked for them.
1. Compile a list of most impacted keyword term/s and the URLs for those keyword terms from Google Webmaster Tools.
See Google Webmaster Tools > Traffic > Search Queries > Search Queries
- click “filter” located top middle left, change “All” to “Web” and click “Apply” to save
- click on “With change”
- click on 25 rows and change it to 500 rows
- click “Change” for “Avg. position” on the right hand side to sort “Change” from most to least
- These steps should result in a list of the 500 most popular keyword terms responsible for driving traffic to the site from organic web search, sorted by the amount of change in “Avg. position”.
- Clicking on specific terms at this point should provide query details by URL, which can be downloaded.
2. Identify and create a list of low quality third-party URLs that link to impacted site URLs using impacted keywords in their anchor text. If applicable, demand a full and complete list of all links purchased and/or built by third parties within the past five years.
3. Collect Whois information for third-party domains. Use Whois to find domain owner contact information.
4. Contact third-party website owners, provide the specific linking URLs and demand the links be removed immediately.
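Once the search-query data is downloaded from Webmaster Tools, the sort described in step 1 above can be reproduced offline. This is a sketch under stated assumptions: the column headers (“Query”, “Avg. position change”) are guesses at the export format — match them to your actual file — and I assume a larger positive position change means a bigger rank loss:

```python
import csv
import io

def worst_position_changes(csv_text, limit=500):
    """Sort a Webmaster Tools search-query export by average position
    change, worst (largest assumed rank loss) first, and return up to
    `limit` query terms."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: float(r["Avg. position change"]), reverse=True)
    return [r["Query"] for r in rows[:limit]]

# Placeholder export standing in for a real download:
export = """Query,Avg. position change
blue widgets,14.0
contact us,0.5
cheap widgets,22.0
"""
print(worst_position_changes(export))
# ['cheap widgets', 'blue widgets', 'contact us']
```

The terms at the top of that list are the ones whose inbound anchor text deserves the closest scrutiny in step 2.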
Once violations are identified and resolved and links removed, Google must re-crawl the affected pages and reprocess these changes. Once this process is under way, recovery should come with the next Penguin update, but there are no guarantees. As of the date of publication, there have been no further Penguin updates, and Google has not disclosed a timeline for future ones.
There is a remote possibility that some sites have been accidentally hit by Penguin. If you believe a site has been hit by Penguin without cause, report it via Google’s Penguin form.
Google Penguin Recovery: Future
Hopefully this information has been useful to you, thank you for checking out my blog!