"Shallow content" provides little value to users and offers low utility based on the intent of the query. Technically, however, this type of content is not considered spam by Google because it does not violate Google's Webmaster Guidelines. Google provides information about creating websites and about thin content but thus far has provided little, if any, official documentation about shallow content. Until recently, shallow content fell between teams internally at Google. In order to address the influx of shallow content that resulted from Google Caffeine, Google launched "Panda." Google Panda is a holistic approach for addressing site quality at scale algorithmically. The quality signals used by Google Panda were derived from answers to questions about site quality. According to Google, these questions were created by a Google engineer whose last name is Panda and answered by Google's little-known Evaluation team. While there is no way of knowing how Google Evaluators answered Panda's questions, those with an in-depth understanding of related research have a pretty good idea.
In order for answers to Panda's questions to be incorporated by Google as algorithmic quality signals and used for rankings, Google would first need to tie these answers to specific site elements and/or data. Since any algorithm can be gamed, Google is not going to talk about specific element or data signals used for rankings. With that being the case, Google designs algorithms to scale with present data as opposed to possible data. According to Google, by doing so, they are able to draw reliable inferences from actual evidence in data. In fact, Google could draw signals from a wide variety of present data sources both on-site and off. In addition, Google could combine present data to draw inferences about data not present. This data could be weighted differently depending on the type of site, query, location, presence of other signals and countless other variables. As a result of the level of complexity involved, "algorithm chasing" could prove quite difficult in the future.
Improving site quality from the user's perspective is the best way for sites negatively impacted by Google Panda to recover. According to Google, "high quality content is content that you can send to your child to learn something." By raising standards for producers and driving progress in terms of content quality, Google Panda will increase the quality expectations of users over time. In order to successfully acquire and convert traffic from organic search channels moving forward, marketers, webmasters and site owners will need to understand, employ and embrace best practices for developing quality content. That said, a number of web professionals clearly still do not understand the meaning of "quality content." The purpose of this post is to help those who are not quality or search experts increase their level of knowledge in these areas.
The examples of Panda's questions published by Google focus on quality, authority, credibility and trust. Fortunately, when it comes to content evaluation, authority, site credibility and trust, there is no shortage of research available. I've been researching these topics for years and can say that Panda's questions closely align, as they should, with published research in related areas. For example, the standard used by librarians to evaluate content quality was developed by William Katz and the dimensions for assessing authority were developed by Peter Wilson. In addition to quality evaluation and authority, other Panda questions focus on trust and credibility. These questions align with research by BJ Fogg. In addition to evaluation, authority, trust and credibility, quality content also possesses certain technical characteristics according to Google and W3C. I'll touch on just a few of these topics below and try to tie them together into specific actionable items for improving quality.
Author identification is the first step in determining authorship and, in turn, authority. In order for an author to be perceived by users as an authority on a specific topic, users must first be able to identify the author and then recognize the author's expert qualifications.
Types of Authority:
- Personal authority - author
- Institutional authority - publisher
- Textual authority - document
- Intrinsic plausibility authority - text
When authors provide certain qualifications related to a specific topic, users may perceive the author as an authoritative source of information on a specific topic. That "Authority" increases the perceived value of content in the minds of users. Authority is also used by search engines in a wide variety of ways and on a number of levels. In fact, Google recently began displaying author information in search results. According to Google, displaying author information in search results "helps users discover great content." By displaying author information in search results, Google helps users to identify authors and that is the first step in terms of the authoritative content discovery process.
Just as in print, authors should provide qualifications related to their specific topical area(s) of expertise. In addition to being recognizable to users, authors should provide this data in a way that is clear to search engines. To accomplish both goals, bloggers for example could create a "bio" page outlining their credentials and qualifications. By including a link to their bio page from each article, blog authors could help to establish perceived authority. In order for Google to key in on these elements and fully associate author information with a page, authors should link their bio page to their Google+ profile page using the appropriate markup which includes the rel=author parameter. In order to help fully verify authorship, authors should consider linking from their Google+ citations to their bio page and/or verifying their address via their Google+ profile page.
Institutional authority comes from the reputation of the publisher. In addition to rel=author, Google also supports the rel=publisher parameter which confirms a website is the publisher of a Google+ page. Including both parameters in the same page is perfectly fine to do as well, even though doing so may currently result in an error according to Google's Rich Snippet Tool.
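As a rough illustration of how these link relations can be checked, the sketch below uses Python's standard html.parser module to list the rel=author and rel=publisher links found in a page. The sample markup, the example.com bio URL and the Google+ URL are placeholders made up for this example, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class RelLinkFinder(HTMLParser):
    """Collect href values of <a> and <link> tags carrying rel=author or rel=publisher."""

    def __init__(self):
        super().__init__()
        self.found = {"author": [], "publisher": []}

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, e.g. rel="author nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        for rel in ("author", "publisher"):
            if rel in rel_tokens and attr_map.get("href"):
                self.found[rel].append(attr_map["href"])


def find_rel_links(html):
    """Return a dict mapping 'author' and 'publisher' to the hrefs found."""
    finder = RelLinkFinder()
    finder.feed(html)
    return finder.found


# Hypothetical page markup; both URLs are placeholders.
SAMPLE_PAGE = """
<html><head>
<link rel="publisher" href="https://plus.google.com/example-publisher-id">
</head><body>
<p>By <a rel="author" href="https://example.com/bio">Jane Doe</a></p>
</body></html>
"""

if __name__ == "__main__":
    print(find_rel_links(SAMPLE_PAGE))
```

A check like this only confirms that the markup is present on the page. It says nothing about whether Google has verified the authorship link.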
It is also worth noting that the importance of evaluation depends on the perceived significance of the negative consequences that could result from having the wrong information. For that reason, providing obvious expert qualifications for medical related content is critical.
Types of Credibility:
- Presumed credibility which is based on beliefs. Brand names or .edu sites could be signals of presumed credibility to search engines.
- Earned credibility is another type of credibility and it is based on previous experiences. Search engines could derive earned credibility quality signals from on-site ratings and reviews as well as third party ratings and reviews about a website, company or product.
- Surface credibility is the third type of credibility and is based on face value. Surface credibility signals could include overall site design, the presence of trust icons and PageSpeed.
- Reputed credibility is another form of credibility. Reputed credibility signals could be derived by search engines from off-site references, citations or inbound links from other websites.
It is critical for content to fulfill a unique purpose, otherwise site content may be considered to have no purpose and therefore be considered low quality.
"Certain actions such as cloaking, writing text in such a way that it can be seen by search engines but not by users, or setting up pages/links with the sole purpose of fooling search engines may result in removal from our index."
Poor quality content purpose signals could include:
- pages with lots of words saying little
- pages with no purpose
- pages whose sole purpose is advertising or affiliate income
- blocking Archive.org
In terms of scope, the purpose of each page should align with the scope of the website as a whole.
"It's important for webmasters to know that low quality content on part of a site can impact a site's ranking as a whole. For this reason, if you believe you've been impacted by this change you should evaluate all the content on your site and do your best to improve the overall quality of the pages on your domain. Removing low quality pages or moving them to a different domain could help your rankings for the higher quality content."
Poor quality content scope signals could include:
- pages that don't appear to align with the purpose of other pages
- irrelevant content
- less than extensive information
- duplicate content
- general information
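One common textbook way to spot near-duplicate pages is to compare overlapping word sequences ("shingles") using Jaccard similarity. The sketch below illustrates that general technique only; nothing here is known to be what Google uses, and the shingle size of three words is an arbitrary assumption.

```python
def shingles(text, size=3):
    """Return the set of overlapping word tuples of the given size."""
    words = text.lower().split()
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Return shared shingles divided by total distinct shingles (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    page_one = "the quick brown fox jumps"
    page_two = "the quick brown fox sleeps"
    print(jaccard_similarity(page_one, page_two))
```

Scores near 1.0 suggest two pages say almost the same thing. Where to draw the line between "similar" and "duplicate" is a judgment call for each site.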
Uptime, accessibility and functionality can all be quality indicators.
Poor quality content reliability signals could include:
- site outages
- lack of functionality
- 404 pages that don't return a 404 error
- slow PageSpeed
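The "404 pages that don't return a 404 error" item is often called a soft 404: the page tells the user the content is missing, but the server answers with a success status. A minimal heuristic sketch follows, assuming the status code and body have already been fetched. The function name and the phrase list are assumptions for illustration and would need tuning for a real site.

```python
# Assumed phrases that typically appear on "not found" pages; adjust per site.
NOT_FOUND_PHRASES = ("page not found", "error 404", "no longer available")


def looks_like_soft_404(status_code, body):
    """Return True when an error-looking page was served with a success status."""
    if status_code in (404, 410):
        return False  # a real "not found" or "gone" response is the correct behavior
    if not 200 <= status_code < 300:
        return False  # redirects and server errors are a separate problem
    text = body.lower()
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)


if __name__ == "__main__":
    print(looks_like_soft_404(200, "<h1>Page Not Found</h1>"))
    print(looks_like_soft_404(404, "<h1>Page Not Found</h1>"))
```

A simple way to use this is to request a URL that should not exist on the site and pass the resulting status code and body to the function.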
Content should be relevant to the user's query based on the intent of the query.
"One of the most important steps in improving your site's ranking in Google search results is to ensure that it contains plenty of rich information that includes relevant keywords, used appropriately, that indicate the subject matter of your content."
Poor quality content relevancy signals could include:
- "CLICK HERE" anchor text
- high impressions with low clickthrough rates (CTR) in Google Webmaster Tools
- high bounce rates
- irrelevant links
- relevant content that is irrelevant based on searcher intent
- targeting only high value keyword terms
Content should be kept up to date and/or updated frequently depending on the purpose of the website.
"A frequently updated site encourages people to return - as long as your content remains relevant and engaging. A useful post once a week is better than low-quality content published daily. A great idea is to search Google for subjects of interest in your field. If you can't find a good answer, create a blog post about the subject - chances are that other people are searching for the same thing."
Poor quality content recency signals could include:
- old copyright dates on pages
- lack of byline dates in posts
- lack of modified dates in pages
- lack of support for If-Modified-Since
- software needing updates
- expired content
- out of date XML Sitemaps
- expired product feeds
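Supporting If-Modified-Since means that when a client sends that request header and the page has not changed since the given date, the server answers 304 Not Modified instead of resending the page. The sketch below shows only the date comparison, assuming the server already knows the page's last-modified time as a timezone-aware datetime. The function name is hypothetical and the code is not tied to any particular web framework.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the page is unchanged since the client's copy, else 200.

    if_modified_since: raw header value (or None if the header was not sent).
    last_modified: timezone-aware datetime of the page's last change.
    """
    if not if_modified_since:
        return 200
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the full page
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200


if __name__ == "__main__":
    changed = datetime(2012, 1, 2, 12, 0, 0, tzinfo=timezone.utc)
    print(conditional_status("Mon, 02 Jan 2012 12:00:00 GMT", changed))
    print(conditional_status("Sun, 01 Jan 2012 12:00:00 GMT", changed))
```

For this to work, the server also has to send a Last-Modified header with each page so clients have a date to send back.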
Visual elements should align with the purpose of a site and not interfere with content. Site design should convey an appropriate level of professionalism and images should be original and high quality based on the purpose of the site. When ads appear, they should not interfere with content or user interaction with content.
"Provide high-quality content on your pages, especially your homepage. This is the single most important thing to do. If your pages contain useful information, their content will attract many visitors and entice webmasters to link to your site. In creating a helpful, information-rich site, write pages that clearly and accurately describe your topic. Think about the words users would type to find your pages and include those words on your site."
Poor quality content format signals could include:
- page layouts with excessive advertisements
- ad placement in the way of content
- keyword stuffed footers
- duplicate stock photos
- lack of contrast between text and background color
- images without dimensions
- missing ALT attributes
- poor quality code
- bad TITLE elements
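Two of the items above, images without dimensions and missing ALT attributes, are easy to check mechanically. The sketch below uses Python's standard html.parser module to flag them. The class and function names are hypothetical, and the check is deliberately simple: an empty alt="" is treated as present, since that is a valid way to mark a decorative image.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Record <img> tags that lack an alt attribute or explicit dimensions."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src") or "(no src)"
        if "alt" not in attr_map:
            self.issues.append((src, "missing alt attribute"))
        # Both width and height must be present and non-empty
        if not (attr_map.get("width") and attr_map.get("height")):
            self.issues.append((src, "missing width/height"))


def audit_images(html):
    """Return a list of (image src, problem) tuples in document order."""
    audit = ImageAudit()
    audit.feed(html)
    return audit.issues


if __name__ == "__main__":
    sample = '<img src="a.png" alt="Chart" width="10" height="20"><img src="b.png">'
    for src, problem in audit_images(sample):
        print(src, "-", problem)
```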
Content should always be presented in an ordered and orderly fashion. Navigational elements should make it clear to users where they are within the site architecture. Sites should include navigational breadcrumbs with delimiters and links that accurately reflect site hierarchy.
"Users know good content when they see it and will likely want to direct other users to it. This could be through blog posts, social media services, email, forums, or other means. Organic or word-of-mouth buzz is what helps build your site's reputation with both users and Google, and it rarely comes without quality content."
Poor quality content arrangement signals could include:
- breadcrumbs without delimiters
- sites without a clear hierarchy
- image based navigational elements
- sites without an HTML site map for users
- too many links on a single page
- island pages
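Island pages (pages that no other page on the site links to) and pages carrying too many links can both be found from a simple map of internal links. The sketch below assumes such a map has already been built, for example from a site crawl, as a dictionary of page path to the list of paths it links to. The function names, the sample graph and the default limit of 100 links are assumptions for illustration.

```python
def find_island_pages(link_graph, home="/"):
    """Return pages that no other page links to (the home page is excluded)."""
    linked_to = set()
    for source, targets in link_graph.items():
        for target in targets:
            if target != source:  # a self-link does not make a page reachable
                linked_to.add(target)
    return sorted(
        page for page in link_graph if page != home and page not in linked_to
    )


def pages_with_too_many_links(link_graph, limit=100):
    """Return pages whose outgoing link count exceeds the (assumed) limit."""
    return sorted(
        page for page, targets in link_graph.items() if len(targets) > limit
    )


if __name__ == "__main__":
    # Hypothetical site: "/orphan" links out but nothing links to it.
    graph = {
        "/": ["/a", "/b"],
        "/a": ["/"],
        "/b": ["/a"],
        "/orphan": ["/"],
    }
    print(find_island_pages(graph))
    print(pages_with_too_many_links(graph, limit=1))
```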
Content should be unique, factual, accurate and objective, and should use proper spelling and grammar. Both sides of a story should be presented when possible and with as little bias as possible. Unbiased content, reviews and rankings help increase perceptions of trustworthiness. According to Google, "70% of Americans look at product reviews before making a purchase" and "the worldwide average for product reviews is a 4.3 out of 5.0."
"We looked at a variety of signals to detect low quality sites. Bear in mind that people searching on Google typically don't want to see shallow or poorly written content, content that’s copied from other websites, or information that are just not that useful."
Poor quality content treatment signals could include:
- poor spelling
- bad grammar
- inaccurate information
- factual errors
Additional Content Quality Resources:
- Google tips for Bloggers
- Library of Congress Business Services
- UNC Content Evaluation
- Introduction to IR
- UCLA Evaluation
- EPA Quality Evaluation
- Baylor Content Evaluation
- W3C Quality tips
- W3C Quality Guidelines
- W3C Quality Tools
- Google Blog Post on Quality Content
- Google Browser Tool
- BJ Fogg Presentation on Credibility
- Google Webmaster Guidelines for Content Quality