This stuff is tough
Thursday, February 25, 2010
Yesterday's news that the European Commission has opened a preliminary inquiry into competition complaints from three companies has generated a lot of questions about how Google's ranking works. Here, Amit Singhal, a Google Fellow responsible for ranking, who has worked in search for almost 20 years, explains the principles behind our algorithm.
Pop quiz. Get ready. You're only going to have a few milliseconds to answer this question, so look sharp. Here goes: "know the way to San Jose?" Now display the answer on a screen that’s about 14 inches wide and 12 inches tall. Find the answer from among billions and billions of documents. Wait a second - is this for directions or are we talking about the song? Too late. Just find the answer and display it. Now on to the next question. Because you'll have to answer hundreds of millions each day to do well at this test. And in case you find yourself getting too good at it, don’t worry: at least 20% of those questions you get every day you’ll have never seen before. Sound hard? Welcome to the wild world of search at Google. More specifically, welcome to the world of ranking.
Google ranking is a collection of algorithms used to seek out relevant and useful results for a user's query. There's a ton that goes into building a state-of-the-art ranking system like ours. Our algorithms use hundreds of different signals to pick the top results for any given query. Signals are indicators of relevance, and they include items as simple as the words on a webpage or more complex calculations such as the authoritativeness of other sites linking to any given page. Those signals and our algorithms are in constant flux, and are constantly being improved. On average, we make one or two changes to them every day. Lately, I’ve been reading about whether regulators should look into dictating how search engines like Google conduct their ranking. While the debate unfolds about government-regulated search, let me provide some general thinking behind our approach to ranking. Future ranking experts (inside or outside government) might find it helpful. Our philosophy has three main elements:
1. Algorithmically-generated results.
2. No query left behind.
3. Keep it simple.
After nearly two decades, I’ve lost count of how many times I've been asked why Google chooses to generate its search results algorithmically. Here's how we see it: the web is built by people. You are the ones creating pages and linking to pages. We are utilizing all this human contribution through our algorithms to order and rank our results. We think that's a much better solution than a hand-arranged one. Other search engines approach this differently -- selecting some results one at a time, manually curating what you see on the page. We believe that approach, which relies heavily on an individual's tastes and preferences, just doesn't produce the quality and relevant ranking that our algorithms do. And given the hundreds of millions of queries we have to handle every day, it wouldn't be feasible to handle each by hand anyway.
This brings me to the next point: leaving no query behind. Usually once I've explained to people the thinking behind algorithmically-generated results, some will ask me, "But what if you do a search, and the results you see are just plain lousy? Why wouldn't you just go in there by hand and change them?" The part of this question that's valid is in terms of lousy results. It happens. It happens all the time. Every day we get the right answers for people, and every day we get stumped. And we love getting stumped. Because more often than not, a broken query is just a symptom of a potential improvement to be made to our ranking algorithm. Improving the underlying algorithm not only improves that one query, it improves an entire class of queries, and often for all languages around the world in over 100 countries. I should add, however, that we do have clear written policies for websites that are included in our results, and we do take action on sites that are in violation of our policies or for a small number of other reasons (such as legal requirements, child porn, spam, viruses/malware, etc.). But those cases are quite different from the notion of rearranging the page you see one result at a time.
Finally, simplicity. This seems pretty obvious. Isn't it the desire of all system architects to keep their systems simple? We work very hard to keep our system simple without compromising on the quality of results. This is an ongoing effort, and a worthy one. Our commitment to simplicity has allowed us to innovate quickly, and it shows.
Ultimately, search is nowhere near a solved problem. Although I've been at this for almost two decades now, I'd still guess that search isn't quite out of its infancy yet. The science is probably just about at the point where we're crawling. Soon we'll walk. I hope that in my lifetime, I'll see search enter its adolescence.
In the meantime, we're working hard at our ongoing pop quizzes. Here's one last one: "search engine." In 0.14 seconds from among a few hundred million pages, our initial results are: AltaVista, Dogpile Web Search, Bing and Ask.com. I guess I'd better get back to work.
Posted by: Amit Singhal, Google Fellow
Update 2 March, 10:30am
First of all, let me thank everyone for their kind comments and honest views in this discussion. Gary, I love search. After having done search for almost 20 years, I still come into work every morning like a kid going to a candy store. Alongside my passion for search, one fact that keeps me so excited is that what was science fiction in search research twenty years ago is now coming to fruition at Google. The semantic systems we have built are something I didn't expect to build in my lifetime. Secondly, Google has given me an environment where researchers like me can practice search in its pure algorithmic form. I can't put into words how incredibly satisfying this combination is for a search geek like me :-)
Amazing stuff
great insight into the search technology..keep up the good work Google
Great job Amit. I love that you clarified the definition of Google ranking being a collection of algorithms. Too often we think about it (or drive ourselves nuts about it) being based on one algorithm or formula.
My favorite attribute about the entire Google search philosophy is 3. Keep it simple. Even after the YaBing conversion, Google will always have minimalism as a sustaining value proposition - and that's the way I like it.
Amit - you forgot to mention one last thing: those results for "search engine" will vary depending on where you live, where your web host is serving from, what you've searched for and clicked on before, and what Google data center you are hitting at the time. Then, you leave the user to contemplate whether they want to see a web search result, a news result, a social network, a blog post, a book result, or a paid ad. While diversity and options are great, this may leave me, as the user, so much more confused and almost forgetting what I originally wanted to find.
Nice! Love the first paragraph. Well done.
Really exciting to hear just how much more evolution there is in search.
I work in SEO and can't wait for caffeine to 'kick in'. What I look forward to is socialprseo all coming together and link building en masse becoming a thing of the past.
Google really has a great thing, 'Not be stuffy' should be the next mission statement after not be evil that is!
Matthew D. Wright
This was really clear and I appreciate the honesty behind Google's search philosophy.
If you don't mind me asking, why have you spent 20 years in search? You seem to have a huge passion for it and know the landscape for its future. Even though I seem to have answered my own question(?) are there any other reasons?
I don't know if it's the same in other countries, but in France (where the complaints come from) the feeling is that Google is manipulating the search results manually.
Some magazines and websites have demonstrated a "rank boost" for certain websites (e.g. allocine.com for movies, commentcamarche.net for hi-tech topics, etc.).
There are also, in the background, some internal Google docs about manual search result flagging ("mandatory", "relevant", "spammy")...
Believe me, if you experience Google in France, you can sometimes have questions about some search results.
In the end, this inquiry is not a big surprise from a French point of view.
What is such a shame is that the interfering government do-gooders will probably read the first line of this excellent post, consider themselves informed and then go off on a money-wasting spree similar to the UK's bailout of Northern Rock. They will set limits, goals and other useless targets and then conveniently forget what they were meant to be doing in the first place when they get called in front of a parliamentary committee or a congressional oversight committee!
However, without such imbeciles I wouldn't have anyone to rant about and this post would have just said....
Excellent and clear info, keep up the knowledge stream, we are all sponges out here!
"... Here's one last one: "search engine." In 0.14 seconds from among a few hundred million pages, our initial results are: AltaVista, Dogpile Web Search, Bing and Ask.com. I guess I'd better get back to work"
: )))))))))))))))
clear and honest from Google. thanks Amit.
Uncle Bill and his shills have a long-standing credibility problem that extends back to well before Google first lifted Susan Wojcicki's garage door.
I've always been impressed with Google's ability to stick to its philosophies, and continue to pursue the magic.
Even though I started programming before the advent of the PC, I still believe in the magic of what we do and cherish the presence of companies like Google that exemplify magic in motion.
Given the difficulties that governments have in running themselves, I could just imagine what a government search engine would look like, and just how err honest those results might be. :)
Keep to the light!!!
--Doc
I think it is good to have some mechanical approach in having an algorithm working for us.
It is self regulating in the sense that if it doesn't work, it will get punished and have to adapt to give better results.
It also gives us humans an opportunity to take some distance from a problem and not interfere in the process in an attempt to have control over it.
It also relieves us of boring tasks so we can use our creativity for things that are more fun to us.
You are very funny, Google. Your statement sounds clear and now all people understand the algo, but why does Google not explain this to us:
If Company 1 has many landing pages and/or affiliates, all these guys have relevant sites or maybe short URLs, and you display in the first 10 of 5 million results 8 from Company 1 with effectively the same target URL! Then it's not a question of how your indexing and algo are working, it's more a question of why you do not block more results with the same target URL??? Or is it not true, or is it only me who sees this issue???
This looks for me anti competition and nobody can tell me now that have to do with SEO or someone else, it have to do simple Google not block this target URLs and give 8 of 10 first results to the same company, I call this preferred listing!
It's easy to fix: when you crawl, make a limit of 2 same target URLs for the first 100 results, don't display the landing pages and short URLs that have effectively the same target, and nobody will cry.
Will think about !
Great info. It is just a pleasure to read this blog.
[This looks for me anti competition and nobody can tell me now that have to do with SEO or someone else,...]
There are certain situations where my attempts to perform research are hampered by junk result sets.
No matter how clever an algorithm Google devises, there will be folks looking to game the system and conversely folks who inadvertently demote themselves.
It's a delicate balancing act for both the indexer and the indexed in trying to get both the quality content that is oblivious to SEO and the content leveraging every last strategy to weigh out more by their relevance than their cleverness in jousting the crawler.
Overall, I'm pretty happy with the results I get, and when I get junk it's typically pretty obvious to me the nature of the issue and I adjust my queries accordingly.
Great to get a reaction from Google directly, but imho it includes a little bit too few really new instructions ;-)
Especially for renoseo germany
Cheers!
2. No query left behind.
Interesting that you feel fixing one lousy result will fix a host of other issues. I would think the opposite would be true or possible. Identifying a solution specific for one set of results could cause a huge list of problems for many other websites that were probably doing nothing wrong.
This type of thinking really helps me understand why mom and pop shops that are doing nothing wrong get de-indexed or demoted all of the time in your index.
Gary said...
This was really clear and I appreciate the honesty behind Google's search philosophy.
If you don't mind me asking, why have you spent 20 years in search? You seem to have a huge passion for it and know the landscape for its future. Even though I seem to have answered my own question(?) are there any other reasons?
February 25, 2010 6:13 PM
---
First of all, let me thank everyone for their kind comments and honest views in this discussion. Gary, I love search. After having done search for almost 20 years, I still come into work every morning like a kid going to a candy store. Alongside my passion for search, one fact that keeps me so excited is that what was science fiction in search research twenty years ago is now coming to fruition at Google. The semantic systems we have built are something I didn't expect to see in my lifetime. Secondly, Google has given me an environment where researchers like me can practice search in its pure algorithmic form. I can't put into words how incredibly satisfying this combination is for a search geek like me :-).
By representing a heuristic as an algorithm, Amit has attempted a sleight of hand. An algorithm can be tied back to a body of knowledge, such as mathematics, from which the algorithm is obtained.
Search is informed by human intuition - it is heuristics that drive search ranking. Expressed as what programmers tend to call an algorithm, a heuristic embeds social and cultural assumptions in a computer program. Expression as a computer program doesn't make the heuristic and the assumptions behind it transparent, or remove the cultural and social biases. If anything, representing a value-laden heuristic as a neutral algorithm allows Google to conceal US cultural values as eternal truths - deceiving the observer into thinking that human factors play no part in choosing how to rank search results.
Google's problem with Europeans is that at every turn, Google shows it favours the US view of the world and US interests, over non-US. Why then, faced with a search engine based on human informed heuristics, should we not fear implicit US favour in the results?
Explain the technical mechanisms, and demonstrate in actions, that people outside the US are regarded as being of equal value, and you may remove the fear that Google is implicitly favouring US interests.
When you have US residents, paying US taxes in US dollars, subjected to parochial US media, under US law, why would we assume that you're thinking about us, and valuing us? Show us that you genuinely *care* about us, and you'll change opinions.