August 15, 2005
Isn't Smaller Just Better? More on this Stupid "Search Engine Index Size" Controversy
I like to look at trends. And I like to periodically measure the trends I look at.
Perhaps they are misleading but I do find tracking the rates of change in certain concepts on the Internet (and blogosphere) useful. I’d like to think these measurements provide me with some sort of early warning of things to come. (Wishful thinking on my part, no doubt.)
So, for several years now, I have been using a couple of utilities which regularly spider the various search engines to track rates of change in frequency of certain keywords, phrases, buzzwords, links, etc.
Since I only care about changes, the absolute numbers made no difference to me. I had always assumed the differences in search results could be chalked up to different search engine methodologies — and never thought that the sheer size of search results made a significant difference in the utility of a given search engine. If I had, I would have switched to the other search engines long ago.
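To illustrate why absolute numbers drop out when only changes are tracked, here is a minimal sketch in Python. It is not the utilities mentioned above; the function name `rate_of_change` and the weekly hit counts are hypothetical, invented purely for illustration.

```python
def rate_of_change(counts):
    """Return the period-over-period fractional change for a series of hit counts.

    A zero previous count makes the change undefined, so None is recorded.
    """
    changes = []
    for prev, curr in zip(counts, counts[1:]):
        if prev == 0:
            changes.append(None)
        else:
            changes.append((curr - prev) / prev)
    return changes


# Hypothetical weekly hit counts for the same keyword on two engines.
# Engine A reports ten times as many hits as engine B in every week.
engine_a = [1000, 1100, 1210]
engine_b = [100, 110, 121]

print(rate_of_change(engine_a))
print(rate_of_change(engine_b))
```

Both series show the same week-over-week growth even though one engine reports ten times as many hits, which is the sense in which the raw totals make no difference to a trend tracker.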
Yes, I do meta-searches across various search engines when I am really trying to “dig deep” (I still use the Copernic Client for this, but I know some are using Dogpile online these days for meta-searching). And yes, I am using some of the newer blogosphere search engines as well.
But I still use Google as my first cut search engine for just about everything.
So when this controversy over index size began (started by Yahoo, I believe), I was a bit surprised that anyone really cared. And then I realized that:
“what these other search engines are doing is leading with their weakness — by pretending it’s their strength”
One of the things I have consistently noted, but thought nothing of, was that Google consistently produced FAR FEWER search results. In fact, I have just looked over the absolute numbers of hits I have been tracking over the last few years, and I find the following:
In terms of number of hits retrieved:
Yahoo > AllTheWeb > AltaVista >> MSN Search > Google >> Teoma
In other words, you can sort search engines into three groups — based on the size of their search results —
Group I — HIGH RETRIEVAL
Yahoo, AllTheWeb, AltaVista
Group II — MEDIUM RETRIEVAL
MSN, Google
Group III — LOW RETRIEVAL
Teoma
Group I Search Engines (Yahoo, AllTheWeb, and AltaVista) consistently produce 10 times more results than Group II Search Engines (MSN or Google).
Group II Search Engines (MSN and Google) consistently produce 10 times more results than Group III Search Engines (Teoma).
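The three groups above amount to sorting engines by the order of magnitude of their hit counts. A rough sketch of that bucketing follows, assuming made-up counts: the numbers in `hypothetical_hits` are not the tracked figures, only values chosen to mirror the roughly tenfold gaps described, and `group_by_magnitude` is a name invented for this example.

```python
import math
from collections import defaultdict


def group_by_magnitude(hits):
    """Bucket engines by floor(log10(hit count)); counts must be positive."""
    groups = defaultdict(list)
    for engine, count in hits.items():
        groups[math.floor(math.log10(count))].append(engine)
    return dict(groups)


# Hypothetical hit counts for a single query (illustrative only).
hypothetical_hits = {
    "Yahoo": 5_000_000,
    "AllTheWeb": 4_000_000,
    "AltaVista": 3_000_000,
    "MSN Search": 400_000,
    "Google": 300_000,
    "Teoma": 30_000,
}

# Print the buckets from largest to smallest order of magnitude.
for magnitude, engines in sorted(group_by_magnitude(hypothetical_hits).items(), reverse=True):
    print(f"10^{magnitude}: {', '.join(engines)}")
```

With these invented numbers the high, medium, and low retrieval groups fall out as three adjacent powers of ten. Real counts that sit near a power-of-ten boundary would need a less rigid cutoff.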
Now I would argue that, with the notable exception of Teoma —
the size of the search results is inversely proportional to the popularity of the search engine
In other words, while Google Search consistently produces 10 to 15x fewer hits than Yahoo Search, I would argue that Google Search is 10 to 15x more popular than Yahoo.
So what’s the issue? Don’t smaller results mean better targeting or higher relevance? Shouldn’t smaller mean more sophisticated and advanced technically?
Isn’t smaller just plain better?
BTW, I’m not sure of the significance of this, but over the last week or so I’ve actually seen the total number of hits for a variety of Yahoo searches drop by almost half. I assumed this was due to a change in methodology, but who knows? Conspiracy theorists might equally assume that Yahoo had been inflating results and, with the increased scrutiny of their index over the last week, decided they might need to “strip a bit of redundancy” from it.
Related Links
- buygoogle.com - Google investment news, valuation, business prospects
- This is the post that got me thinking about my own search engine trend tracking as possibly relevant to this debate over search engine index size. Of course, anyone with a site called buygoogle.com is likely to be a bit biased. So it’s not surprising that he outlines his methodology, does 10 searches, and, sure enough, comes to the exact opposite conclusion from mine. There is no need to make Google look better through size alone. Perhaps it would have been better to “leave your loved one alone” and not post at all than to post something which is just plain wrong.
Current Links
- Size of Googles Image Index Suddenly (Almost) Doubles - The Unofficial Google Weblog - google.weblogsinc.com
- Yahoo! Index Size Inflated?
- Preoccupations: Yahoo!'s edge?
- Blogator.com - News Aggregator
- In Silicon Valley, a Debate Over the Size of the Web - New York Times
- Brilliant Thinking » Yahoo! bigger than Google
- Yahoo! search claim sparks Google challenge: report - Aug. 15, 2005
- Screw Size! I Dare Google & Yahoo To Report On Relevancy
- AardvarkBusiness.net Business Forum
- Yahoo, Google Tussle Over Search Index Size | Technology Updates
- Search-Science: My index is bigger than yours
- aTypical Joe: A gay New Yorker living in the rural south.: Times technology: Size, spam, & will a fad fizzle?
Previously Noted Links
- Sacred Cow Dung: CONTROVERSY OF THE WEEK -- Does Size Really Matter? Google vs Yahoo
- Yahoo! Search blog: Our Blog is Growing Up And So Has Our Index
- John Battelle's Searchblog: How Many Pages Does Yahoo Index?
- John Battelle's Searchblog: In This Battle, Size Does Matter: Google Responds to Yahoo Index Claims
- John Battelle's Searchblog: More On Yahoo, Google, Index, Size
- aTypical Joe: A gay New Yorker living in the rural south.: Size matters
- More on the Total Database Size Battle and Googlewhacking With Yahoo
- Guillaume's Tech Blog: Google confused by Yahoo!
- Spare us the details
- Preoccupations: Yahoo's search figures
- www.eirikso.com » Blog Archive » The increase of pages indexed by Yahoo shows on Trendmapper?
- Boing Boing: Google responds to Yahoo's search index claims
Posted by cmayaud at 01:20 PM | Permalink| Comments (4)
Posted to Blogging | CONTROVERSY OF THE WEEK | TRENDS | Venture Capital Process
Comments
"was that Google consistently produced FAR FEWER"
Can you prove it?
I mean, with results?
How did you reach this conclusion? Simply by looking at the total number of results displayed on the page?
Usually everybody affirms the exact inverse, that Google produces far more, so I would be interested to learn what makes you think it produces far fewer.
Thx
Posted by: Jean-Philippe at August 15, 2005 03:56 PM
One quick way to prove this for yourself is to use a link analysis tool such as "Link Popularity Check" (which you can download free from www.checkyourlinkpopularity.com). While this isn't a tool I use frequently, its results are consistent with the other spidering tools I use and show the same relative results.
Again, I think this doesn't matter because it is the relevance of the results -- not the raw number produced -- which matters.
I am on Google's side here: we are not comparing apples to apples if we are looking at index sizes, or at search results as proxies for index sizes.
Posted by: Christian Mayaud at August 15, 2005 04:06 PM
Right,
But the size of the index matters, especially because of the marketing impact, or even because a larger index lets you afford different kinds of queries, etc.
Regarding "Link Popularity Check" (argh, Windows software), I'm wondering how you can retrieve useful information from Yahoo:
ie:
http://search.yahoo.com/search?p=linkdomain%3Awww.sacredcowdung.com&prssweb=Search&ei=UTF-8&fl=0&x=wrt
http://www.google.com/search?q=link:www.sacredcowdung.com&hl=en&lr=&safe=off&filter=0
Despite the fact that Yahoo shows an impressive 800-something links, are they all relevant? Sometimes the sacredcowdung link doesn't even appear on those web pages.
Posted by: Jean-Philippe at August 15, 2005 05:52 PM
The trend here is that numbers on the web are becoming ever more meaningless.
Here's a problem I personally have: I'd like to track the expansion of the use of selected terms, but I can't tell if a rising number is due to increased usage or broader indexing or oddball tweaking of algorithms, or even temporarily inaccessible web sites. The search engine penchant for estimating total hits is of course quite reprehensible. Imagine, all that compute power and their software can't even count.
Another problem: two or more people on different machines can hit search at exactly the same moment and get widely different numbers, even though each of them sees the same number again on repeating the search. That's *not* a user-friendly feature of Google.
In truth, the focus should be on relevance, not quantity, but that's too hard to do, so they take the easy, brute force route and give us ever-higher *estimated* numbers, without disclosing the precision of those whacko numbers.
-- Jack Krupansky
Posted by: Jack Krupansky at August 15, 2005 07:29 PM


