Hacking is on the rise! A lot of newly hacked pages seem to be cloaking without their owners knowing it. These pages contain keyword-rich links and redirect users to other websites. The sites being impacted aren't the typical mom-and-pop sites either. This week I've personally reached out to one major university, an international religious organization and an NFL franchise to let them know their sites were hacked, cloaking links and redirecting users. First, though, I reported each misbehaving site via Google's new spam report tool.
Dr. Evil courtesy of http://www.stefanosalvadori.it
Google does a great job of trying to notify webmasters when it thinks a site is compromised, but documentation about cloaking is sparse. Sure, I get it: Google doesn't want to give bad guys detailed instructions for doing bad things. I've intentionally left some facets out of this post for exactly that reason. My point, however, is that better information could help empower good guys who are unknowingly the victims of evil. Some of the webmasters I talk with can't identify the issue even after it is laid out in front of them. (For those who aren't aware, cloaking is against Google's webmaster guidelines and can cause your site to be banned. So, please don't try this at home!)
According to Google, cloaking is "presenting different content or URLs to users and search engines." "Presenting" different content by itself isn't the key takeaway, though. For example, it is not considered cloaking if you "present" an image containing human-readable text (jpg, gif, png) that differs from its ALT attribute. Flash, on the other hand, is different because "alternative content" is in most cases served up from the server. Cloaking is specifically when the server itself returns different content depending on who is requesting the page.
Cloaking comes in several different flavors, but in order to identify any of them you'll need to understand each type as well as how they all work together. You won't find identifiable traces of cloaking in the free version of Google Analytics or in paid analytics programs. Google's documentation talks about "Serving up different results based on user agent." It doesn't really go into detail about the other types of cloaking, which are by referrer and by IP. Believe it or not, it's possible to cloak by user-agent, referrer, IP or any combination of the three.
Here are a few simple ways to determine whether or not a site is cloaking:
Fetch as Googlebot - If you think your site may be hacked and is currently cloaking, test it with "Fetch as Googlebot" in Google's Webmaster Tools. Fetch as Googlebot uses user-agent Googlebot and comes from Google's IP range. The information Fetch as Googlebot provides is as close as you'll get to what Google actually "sees". Compare the page source code from "Fetch as Googlebot" with the page source code you see in your own browser and be sure they match.
Google's snippet - If you think a page is cloaking and don't have access to Fetch as Googlebot in Google's Webmaster Tools, go to Google's preferences, turn off Google Instant search results and increase your number of results to 100. Then perform an advanced operator query for the site (for example, site:example.com). Browse through several search result pages looking for anything odd, especially in Google's snippet of the page TITLE element. If you don't see anything odd in search results, click on a few links in the search results to be sure you end up on the correct page.
User-Agent Switcher - This Firefox add-on will allow you to change your user-agent to Googlebot. With Googlebot set as your browser's user-agent, compare the source code of your pages to what you see under your normal user-agent and look for any differences. Important note: User-Agent Switcher allows you to change your own user agent but doesn't give you access via a Google IP the way Fetch as Googlebot, mentioned above, does. If a site is cloaking by user-agent and IP, you may not be able to see it using this add-on.
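If you'd rather script the user-agent comparison than click through it by hand, something like the following Python sketch could work. It is only an illustration, not a tool from Google or from the add-on: the function names (build_request, fetch_source, diff_sources, probe_variants, check_for_cloaking), the browser user-agent string and the referrer value are all my own assumptions. It requests the same URL with each combination of user-agent and referrer and lists the source lines that differ from a plain browser visit.

```python
import difflib
import itertools
import urllib.request

# Illustrative values (assumptions, not from the original post).
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) "
              "Gecko/20100101 Firefox/115.0")
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")
SEARCH_REFERRER = "https://www.google.com/"


def build_request(url, user_agent, referrer=None):
    """Build a GET request with a User-Agent and an optional Referer header."""
    headers = {"User-Agent": user_agent}
    if referrer:
        headers["Referer"] = referrer
    return urllib.request.Request(url, headers=headers)


def fetch_source(request, timeout=10):
    """Return (final_url, page_source); final_url exposes HTTP redirects."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.geturl(), body


def diff_sources(baseline, variant):
    """Return only the lines removed from ('- ') or added to ('+ ') baseline."""
    delta = difflib.ndiff(baseline.splitlines(), variant.splitlines())
    return [line for line in delta if line.startswith(("- ", "+ "))]


def probe_variants(url):
    """Yield (label, request) for every user-agent / referrer combination."""
    agents = {"browser": BROWSER_UA, "googlebot": GOOGLEBOT_UA}
    referrers = {"direct": None, "from-search": SEARCH_REFERRER}
    combos = itertools.product(agents.items(), referrers.items())
    for (agent_label, agent), (ref_label, referrer) in combos:
        yield f"{agent_label}/{ref_label}", build_request(url, agent, referrer)


def check_for_cloaking(url):
    """Compare each variant against a plain browser visit with no referrer."""
    variants = list(probe_variants(url))
    _, baseline_request = variants[0]  # browser user-agent, no referrer
    baseline_url, baseline_body = fetch_source(baseline_request)
    for label, request in variants[1:]:
        final_url, body = fetch_source(request)
        changed = diff_sources(baseline_body, body)
        print(f"{label}: {len(changed)} differing lines, final URL {final_url}")
        if final_url != baseline_url:
            print("  landed on a different URL than the baseline visit")
        for line in changed[:20]:  # cap the output for readability
            print("  " + line)


# Example call (makes live requests, so it is left commented out):
# check_for_cloaking("https://www.example.com/")
```

Treat any differences as a prompt to look closer, not as proof: dynamic pages can legitimately change between requests (timestamps, session tokens, rotating ads). And like User-Agent Switcher, a script such as this one still connects from your own IP, so it won't reveal cloaking that is keyed to Google's IP range.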