A comparison of Google's, Yahoo!'s, and Microsoft's indexes
(Also published on WebGuild Silicon Valley.)
French version / Version française

Contents:
- Introduction
- Index size
- Google's index
- Yahoo!'s index
- Microsoft's index
- Index caching and refreshing
- Conclusion

[Update – October 5, 2007] Five days after this article was posted (in French), 118 pages of the site are indexed on Google, which wins across the board on exhaustiveness, relevance, and speed. No contest!

Yahoo! and Microsoft haven't budged, and the others are worse: the site is unknown to Ask, and Exalead shows a thumbnail of a domain-parking page for my site, which was parked over a year ago. So much for relevance (it's l'exception française)!
* * *
Introduction
A few days ago I put XBRL.name online, a glossary of IFRS terminology in 7 languages.
First, I was surprised to see that the domain name, which had been listed on Studio92.net for over two years, had kept the PR4 (PageRank 4) of the page it appeared on, but that won't last!

As you can imagine, I'm also eagerly watching for the moment my site gets indexed by the search engines. I check every day on GYM (Google, Yahoo!, Microsoft).
The results are revealing! Here is the status as of October 1, eight days after the site went online on September 23.
I should point out that the site is not finished: only 1/7 of it is done, a little under 200 pages out of the roughly 1,400 expected in the final version.
Finally, this post makes no claim to be more than what it is: a simple record of one week in the indexing of a new site. Nothing scientific here, just a personal experience.
* * *
Index size
It goes without saying that each of the three indexes comfortably exceeds 20 billion web pages! If you're feeling nostalgic, click here...
The engines say little on the subject, except Microsoft, which makes a point of letting you know it has caught up by quadrupling its index from 5 billion to 20 billion pages. Fine!
Yet Yahoo! was already claiming more than 19 billion pages in… August 2005 (despite Jean Véronis's doubts), and Google 24 billion pages three months later (see here, end of page 5)!
So while I partly agree with Eric Enge when he states that "At some level, the exact index size is not a big issue, unless, your index is simply too small," I agree less with his idea that a larger index goes hand in hand with better relevance ("In short, Microsoft needed to make a move of this type to improve their relevance").
Relevance does not necessarily depend on coverage ("What's at issue is coverage... and if you don't have the related sites in the index, you can't return the right result"), since an engine may very well have the relevant site in its index and still fail to return it.
And of course, Microsoft gave a demo to illustrate its point, using the query "shelli segal" and the site of the designer of that name, which appears first on Live Search but commits the grave error of being absent from Google's index!
Might one suspect Microsoft of cooking up an ad hoc search just to justify its
relevance, relevance, relevance?
A good way to find out is to test it with xbrl.name, where the three search engines start on an equal footing: the site went online eight days ago and was never deliberately submitted for indexing; I simply put the link on my blog and on a few other sites.
* * *
Google's index
Until yesterday, Google returned 190 results in total and showed the following snippet for the site:
My SPIP site. Search. Home page. My SPIP site. Follow-up of the site's activity RSS 2.0 | Site Map | Private area | SPIP | template.
In other words, it had recorded the SPIP installation I tested before opting for a plain HTML site.

But today – sigh of relief – Google returns 300 results and finally
sees the new version of the site:

Conclusion: Google picked up the site within 8 days, although the glossary content itself does not yet seem to be indexed.
* * *
Yahoo!'s index
Yahoo! returns 30 results and the following snippet:
This is the placeholder for domain xbrl.name. If you see this page after uploading site content ... This page has been automatically generated by Plesk.
Plus one correctly indexed page. What about the 200 or so others?

So Yahoo! returns a tenth as many results as Google, with just one page properly indexed.
* * *
Microsoft's index
Just one result! Period. Same snippet as Yahoo!.

And then the line at the bottom that kills me: "Are you satisfied with Live Search? Tell us."
What can I say? Given all of the above, Microsoft well deserves its third place. Dead last!
The ranking is confirmed by my blog’s visit stats, as you can see in the table below:

Search engines brought 2,826 visits to Adscriptor in September, or 41.21% of total visits (188 visitors and 242 pages viewed per day, with an average time on site of 1'35'' per visit). Not everyone is named Otto, lucky for him ;-)
With 2,575 referrals, Google alone accounts for more than 91% of these visits, versus 5.4% for Yahoo! and a third of that for Microsoft. Google's superiority is overwhelming. Why?
Clearly, if Google weren’t there, I would have a presence on the Internet…with zero visibility on search engines!
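For readers who want to check the arithmetic behind these percentages, here is a minimal Python sketch. It assumes that each of Google's 2,575 referrals counts as one visit, takes Yahoo!'s 5.4% share as reported, and reads "three times less than Yahoo!" as one third of Yahoo!'s share; the function and constant names are hypothetical, chosen only for illustration.

```python
"""Rough re-derivation of Adscriptor's September search traffic split.

Assumption: each of the 2,575 Google referrals counts as one visit.
"""

SEARCH_VISITS = 2826            # visits from all search engines in September
SEARCH_SHARE_OF_TOTAL = 0.4121  # search engines' share of all visits
GOOGLE_VISITS = 2575            # referrals from Google (assumed = visits)
YAHOO_SHARE = 0.054             # Yahoo!'s reported share of search visits


def traffic_split():
    """Return each engine's share (in %) and the implied total visit count."""
    google = GOOGLE_VISITS / SEARCH_VISITS
    yahoo = YAHOO_SHARE
    microsoft = yahoo / 3                  # "three times less than Yahoo!"
    other = 1 - google - yahoo - microsoft  # whatever the remaining engines sent
    total_visits = SEARCH_VISITS / SEARCH_SHARE_OF_TOTAL
    return {
        "google_pct": round(google * 100, 1),
        "yahoo_pct": round(yahoo * 100, 1),
        "microsoft_pct": round(microsoft * 100, 1),
        "other_pct": round(other * 100, 1),
        "total_visits": round(total_visits),
    }


if __name__ == "__main__":
    for key, value in traffic_split().items():
        print(f"{key}: {value}")
```

Under these assumptions Google comes out at about 91.1%, Microsoft at about 1.8%, and all other engines together at under 2%, which is consistent with the ">91%" figure quoted above.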
* * *
Index caching and refreshing
Besides size and relevance, one last aspect of the engines' indexes is how often they are refreshed. Google's cache cycle has shortened considerably lately (I don't use Yahoo! or Microsoft enough to comment on them). Cached pages used to seem to stay around for a while, so you could retrieve information several weeks later; now it's only a matter of days. For example, I was previously able to retrieve practically all of Alexis Debat's fake interviews, but as the days go by, fewer and fewer can be found.
* * *
Conclusion
Regarding the performance Microsoft claims, Eric Enge is right when he says: "Ultimately, the point is, you can't return the right result if the site you should be returning for a given search is not in your index."
Agreed. But it's even worse to have the site in your index and not realize that it is precisely the "right" one!
P.S. Well, it seems that Yahoo! and Microsoft are not giving up. They must have read my post overnight!
I tried Yahoo! Search again (it was recently improved; more details here). The tool still offers no suggestions:

but it has finally indexed the home page correctly. Everything else is unchanged: 31 results in total and only 2 of the site's pages.

On
Live Search, too, the indexing is now correct for 2 of the site’s pages, which are the only results offered.

Meanwhile, Google has gone from 17 to 47 indexed pages: it is now several lengths ahead of the competition.
That said, given the number of web pages on the Internet (???), it's quite remarkable to see a new site indexed within eight days by GYM. And it's easy to see why the next steps for search in 2010 will be:
- search engine verticalization
- personalization of results
- universal search
Not to mention
local search...
Tags: GYM, Google, Yahoo, Microsoft, search engine, index, relevance, Internet