For years, search engines have focused on simply "matching keywords to queries". This approach is problematic, however, because in multiple-keyword queries it separates each keyword from the meaning of the query as a whole. For example, a search engine might interpret the query [Paris Hilton] (a proper noun and named entity) as simply a request for pages where the words "hilton" and "paris" both appear. Fortunately, with a large enough set of data, it is possible to make statistical inferences about the intent of a user's query. As a result, Google has for years relied on statistical inference for ambiguous queries like [Paris Hilton] and [b&b ab] (bed & breakfast in Alberta).
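To make the problem concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the two sample pages and the function names are made up, and it is not how Google or any real search engine matches queries. It compares plain keyword matching with treating the query as a single phrase.

```python
def keyword_match(query, page_text):
    """Return True when every query word appears somewhere in the page."""
    page_words = set(page_text.lower().split())
    return all(word in page_words for word in query.lower().split())


def phrase_match(query, page_text):
    """Return True only when the query appears as one contiguous phrase."""
    return query.lower() in page_text.lower()


# Hypothetical pages, invented for this example.
pages = {
    "celebrity": "Paris Hilton attended the premiere last night",
    "hotel": "The Hilton hotel in Paris is near the Eiffel Tower",
}

for name, text in pages.items():
    print(name, keyword_match("Paris Hilton", text), phrase_match("Paris Hilton", text))
```

Keyword matching accepts both pages, because the hotel page also contains "hilton" and "paris". Only the phrase check separates the person from the hotel, and even that is a crude stand-in for actually knowing that [Paris Hilton] names an entity.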
In 2010, Google purchased Metaweb Technologies, Inc., the company behind Freebase. Freebase is an "open, shared database of the world's knowledge". Before being acquired by Google, Metaweb was in the process of identifying millions of "entities and mapping out how they're related" via Freebase. In addition to entity mapping, Freebase also looks at what words other sites use to refer to entities. In May 2012, Google launched "Knowledge Graph," a "graph" which is built in part on Freebase. According to Google, Knowledge Graph can "understand real-world entities and their relationships to one another." Google hopes Knowledge Graph will improve search results and provide more immediate answers to users' questions in search results pages.
The concept behind Freebase and Google's use of graphed entities is pretty interesting, but I would like to know more about what is really going on under the hood of Google Knowledge Graph. Since Knowledge Graph launched, I have spent hours trying to break it, find bugs and identify abnormalities. Remarkably, until last week I had found very little. Then, as they say, "it happened!" Last Thursday, while looking for a good example of Google Knowledge Graph results to use in a presentation, I got the search result below.
Suddenly it dawned on me: Matt Cutts did not go to UNC Law School!
I clicked on "University of North Carolina School of Law" in Matt's Knowledge Graph result, under his bio from Wikipedia, but it returned search results for another entity, [university of north carolina at chapel hill]. From that result, I searched for [unc] and was returned this result.
Just to be sure what I was seeing was correct, I deleted all cookies, signed out of Google and restarted my browser. After refreshing all of my settings, I searched for [unc founded] and was returned this search result.
At that point, I realized that even UNC's founding date seemed off. I checked, and according to the University of North Carolina Planning Department, UNC was founded in 1793, not 1789. To be sure 1789 was not the date UNC's Law School was founded, I checked the UNC School of Law website. According to the site, the first law professor did not arrive at UNC until 1845. Then I went back and checked Wikipedia's page for UNC, and it did not contain any of the text being displayed in Google's Knowledge Graph search results either.
With the suspected smoking gun already in hand, I went to Freebase.com and searched for [UNC]. You guessed it: Freebase.com's first result for [UNC] was exactly what had appeared in Knowledge Graph results, "University of North Carolina School of Law". It turns out Matt is not alone. All UNC graduates listed in Freebase.com are listed as UNC School of Law graduates, even if they did not attend the UNC School of Law. At that point it was clear: Google Knowledge Graph "thinks" UNC and UNC's School of Law are a single entity because that is what Freebase.com is "telling" Google Knowledge Graph.
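I can't see inside Freebase or Knowledge Graph, so the following is only a toy sketch of how this kind of mix-up could happen, not a description of either system. The entity ids, the record layout and the `resolve` function are all hypothetical. The assumption is that a short label like "UNC" is resolved through a name and alias lookup, and that the parent university's record is an empty shell with no aliases.

```python
def resolve(label, entities):
    """Return the ids of all entities whose name or aliases match the label."""
    key = label.strip().lower()
    return sorted(
        eid
        for eid, record in entities.items()
        if key == record["name"].lower() or key in record["aliases"]
    )


# Hypothetical records: the law school carries the "unc" alias,
# while the parent university is only a shell with no aliases.
entities = {
    "unc_law": {
        "name": "University of North Carolina School of Law",
        "aliases": {"unc"},
    },
    "unc": {
        "name": "University of North Carolina at Chapel Hill",
        "aliases": set(),
    },
}

# Education labels as they might appear in a source profile.
education = {"Matt Cutts": "UNC", "James K. Polk": "UNC"}

before = {person: resolve(school, entities) for person, school in education.items()}
print(before)  # every "UNC" label resolves to the law school only

# Filling in the shell record makes the same label ambiguous.
entities["unc"]["aliases"].add("unc")
after = {person: resolve(school, entities) for person, school in education.items()}
print(after)
```

In this toy model, everyone whose school is recorded as "UNC" lands on the law school, simply because it is the only record that answers to that label. Once the parent record is filled in, the lookup returns two candidates, and something else has to decide between them.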
Because Freebase data appears in both Google Knowledge Graph search results and Google's main search results, this issue also means results for 100+ notable figures are potentially incorrect. For instance, according to Google Knowledge Graph results, US President James K. Polk graduated from UNC's School of Law, but UNC's School of Law was founded when he was already in office.
In addition to Matt Cutts and President Polk, search results for [Michael Jordan college] in Google's main search results are also incorrect due to this issue.
Other UNC School of Law alumni, according to Freebase and potentially Google Knowledge Graph, include Alge Crumpler, Lawrence Taylor, Andy Griffith, Rick Dees, Roger Mudd, Vince Carter, Jerry Stackhouse and even Thomas Layton, the former CEO of Metaweb.
This issue is potentially due, at least in part, to the fact that until yesterday only a shell page existed in Freebase.com for UNC (the parent university of UNC Law School). To hopefully help improve the quality of Google Knowledge Graph results, yesterday I added an image, a description, UNC's correct founding date and other information from UNC.edu to UNC's Freebase page.
With fingers crossed that Matt's wild and crazy UNC Law School days are not his best kept secret, that my site won't vanish from Google tomorrow and that the US Secret Service won't show up at my door, I removed "Law School" from both Matt's and President Polk's profiles in Freebase. As a result, Matt Cutts and President Polk are now the only non-Law School students / graduates in UNC's Freebase page. It will be interesting to see how long these changes take to appear in Google's Knowledge Graph search results.
Google Knowledge Graph is really interesting and seems to be working pretty well despite a few bugs. The issue described here is an edge case, but it is a situation you should know about. Instances where different entities have the same or similar names are problematic. Instances where multiple keywords are similar to multiple-keyword entities are also problematic. Google may already be using Knowledge Graph data based on Freebase.com to determine whether or not content falls in or out of scope. For all of these reasons and others, it is important to keep an eye on Knowledge Graph results that relate to you. If you notice issues, click on "feedback" just below Knowledge Graph results on the right-hand side of Google search results pages.