I've been wondering how Google and other search engines store their indexes. I know Google ran a programming contest where one person tried storing entries with similar topics together and then indexing them.
Of course, the internet is growing day by day, and the number of people who have their own websites with "unique" information is amazingly high. Google's index is already above 8 billion pages, so what if there was a way to cut that down? Who knows, the next technological advancement could lead to there being 20 billion indexable pages, which might put a lot of possibly unnecessary work on Google's computers.
Does anyone have any other ideas for storing and indexing pages? Maybe grouping by topic? By subject? Or just mass indexing only XML feeds?
I think it would be cool to see somebody come out with a new breakthrough in indexing methods.
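
To make the "grouping by topic" idea a bit more concrete, here is a rough sketch of what I have in mind. It is only a toy, and it is not how Google or any real search engine stores its index. It assumes every page already comes with a topic label, and the names `TopicIndex`, `add_page`, and `search` are made up for illustration. Each topic gets its own small inverted index (word to page IDs), so a query that knows its topic only has to look in one group.

```python
from collections import defaultdict


class TopicIndex:
    """Toy inverted index kept in one shard per topic (hypothetical design)."""

    def __init__(self):
        # topic -> word -> set of page ids containing that word
        self.shards = defaultdict(lambda: defaultdict(set))

    def add_page(self, page_id, topic, text):
        # Assumption: the topic label for each page is already known.
        for word in text.lower().split():
            self.shards[topic][word].add(page_id)

    def search(self, word, topic=None):
        # With a topic, only that shard is consulted; otherwise every shard is.
        word = word.lower()
        topics = [topic] if topic is not None else list(self.shards)
        hits = set()
        for t in topics:
            shard = self.shards.get(t)
            if shard is not None:
                hits |= shard.get(word, set())
        return sorted(hits)


index = TopicIndex()
index.add_page("a.html", "cooking", "slow roast chicken")
index.add_page("b.html", "programming", "roast my code review")

everywhere = index.search("roast")
cooking_only = index.search("roast", topic="cooking")
```

Grouping like this does not shrink the index by itself, but it might let a search skip most of it when the topic can be guessed from the query.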