Why Is Anyone Still Using Google Search?
Opinion: How many times must Google be shown to be totally irrelevant, inaccurate, and ad-driven when there is Live Search? Quote: Google is a lot like McDonald's. Both are global forces and probably aren't going anywhere soon. But you feel kind of guilty after buying or using anything from either of them, and you usually have a bad taste in your mouth for days.
A search engine? What could be simpler? Do you think so? You may be right. Indexing Internet content, although it requires time and processor resources, is quite a straightforward task. So where is the problem?
What is behind each problem, and how does it show up in search results? OK, let's look:
If the index isn't updated periodically, a search result can contain broken links. This can be worked around by looking in the search cache; however, the actual cause of a broken link may be that the previous version of the page had incomplete or wrong information. The cache also doesn't let you drill down to get more information if the page looks relevant. Another case of an obsolete link is one that is still active but whose content no longer matches the search query.
Although web crawling is a well-known task, many search engines use very simplistic crawlers that can't handle links computed in JavaScript, or links reachable only from navigation trees or other navigation controls with multiple states. The reason is that manipulating a navigation control can just change its state without bringing a new result, so many crawlers give up after a certain number of attempts that bring nothing different. Some sites, such as blogs on LiveJournal, do not provide an index of their content. In this case search engines are not smart enough to perform searching tasks that require elements of AI. Needless to say, protected content can't be reached at all.
Many search engines don't assign any categorization information to indexed web content, so all pages look equal. This makes search results mostly irrelevant to the search query.
Categorizing a search query makes sense only if the index has categorization too. Matching query categories to categories in the index can provide much more relevant results.
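To make the idea concrete, here is a minimal sketch of matching query categories against categories stored in an index. It is only an illustration: the `Page` class, the category names, and the "count the shared categories" score are assumptions of mine, not how any real engine ranks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CategoryMatch {

    /** Hypothetical index entry: a URL plus the categories assigned at indexing time. */
    static final class Page {
        final String url;
        final Set<String> categories;

        Page(String url, String... categories) {
            this.url = url;
            this.categories = new HashSet<>(Arrays.asList(categories));
        }
    }

    /** Counts how many of the query categories the page shares. */
    static int overlap(Set<String> queryCategories, Page page) {
        int shared = 0;
        for (String category : queryCategories) {
            if (page.categories.contains(category)) {
                shared++;
            }
        }
        return shared;
    }

    /** Returns URLs of pages sharing at least one category, best match first. */
    static List<String> rank(Set<String> queryCategories, List<Page> index) {
        List<Page> matched = new ArrayList<>();
        for (Page page : index) {
            if (overlap(queryCategories, page) > 0) {
                matched.add(page);
            }
        }
        // List.sort is stable, so pages with equal overlap keep their index order.
        matched.sort(Comparator.comparingInt((Page p) -> overlap(queryCategories, p)).reversed());
        List<String> urls = new ArrayList<>();
        for (Page page : matched) {
            urls.add(page.url);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<Page> index = Arrays.asList(
                new Page("http://example.com/cooking", "food"),
                new Page("http://example.com/java-news", "java", "news"),
                new Page("http://example.com/build-tool", "java", "build"));
        Set<String> query = new HashSet<>(Arrays.asList("java", "build"));
        for (String url : rank(query, index)) {
            System.out.println(url);
        }
    }
}
```

Pages that share no category with the query are dropped entirely, which is exactly what an uncategorized index cannot do.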
Search results should be sorted by certain criteria, such as last update time or relevance according to the categorization used. Filtering can help eliminate duplicate or similar results coming from the same source.
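A rough sketch of such sorting and filtering follows. The `Hit` class and its relevance score are hypothetical, and I assume here that "the same source" means the same host name; a real engine would probably use a finer notion of similarity.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ResultFilter {

    /** Hypothetical search hit; the relevance score is assumed to come from the engine. */
    static final class Hit {
        final String url;
        final double relevance;
        final long lastUpdated; // epoch milliseconds

        Hit(String url, double relevance, long lastUpdated) {
            this.url = url;
            this.relevance = relevance;
            this.lastUpdated = lastUpdated;
        }
    }

    /** Sorts by relevance, then by freshness, and keeps only the best hit per host. */
    static List<Hit> sortAndFilter(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.relevance).reversed()
                .thenComparing(Comparator.comparingLong((Hit h) -> h.lastUpdated).reversed()));

        Set<String> seenSources = new HashSet<>();
        List<Hit> filtered = new ArrayList<>();
        for (Hit hit : sorted) {
            // URI.create throws IllegalArgumentException for malformed URLs.
            String host = URI.create(hit.url).getHost();
            String source = (host == null) ? hit.url : host.toLowerCase(Locale.ROOT);
            if (seenSources.add(source)) {
                filtered.add(hit); // first, and therefore best, hit from this source
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("http://a.example/2", 0.8, 50L));
        hits.add(new Hit("http://b.example/x", 0.8, 200L));
        hits.add(new Hit("http://a.example/1", 0.9, 100L));
        for (Hit hit : sortAndFilter(hits)) {
            System.out.println(hit.url);
        }
    }
}
```

Sorting before filtering matters: the hit that survives for each source is then its most relevant and most recent one.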
Although the actual number of entries matching a search query can be big, most search engines restrict the temporarily built result to 1000 entries or fewer. This makes it very likely that the final result won't include the search goal, especially if the problems above exist in the search engine.
Indexing can be incremental, for example recording only the changes for updated pages. An activity index is also helpful for finding recently changed information relevant to a search query.
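One possible way to sketch incremental indexing with an activity index is below. Detecting a change by comparing a SHA-256 digest of the page content is my own assumption for the example, as are the class and method names; the point is only that unchanged pages are skipped and change times are remembered.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IncrementalIndex {

    private final Map<String, String> digestByUrl = new HashMap<>();
    private final Map<String, Long> changedAt = new HashMap<>(); // the "activity index"

    /** Hex-encoded SHA-256 digest of the page content. */
    static String sha256(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the page is new or changed and therefore needs re-indexing. */
    boolean update(String url, String content, long nowMillis) {
        String digest = sha256(content);
        String previous = digestByUrl.put(url, digest);
        boolean changed = !digest.equals(previous);
        if (changed) {
            changedAt.put(url, nowMillis);
        }
        return changed;
    }

    /** URLs whose content changed at or after the given time, sorted alphabetically. */
    List<String> changedSince(long sinceMillis) {
        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, Long> entry : changedAt.entrySet()) {
            if (entry.getValue() >= sinceMillis) {
                urls.add(entry.getKey());
            }
        }
        Collections.sort(urls);
        return urls;
    }

    public static void main(String[] args) {
        IncrementalIndex index = new IncrementalIndex();
        System.out.println(index.update("http://example.com/", "version 1", 10L));
        System.out.println(index.update("http://example.com/", "version 1", 20L));
        System.out.println(index.update("http://example.com/", "version 2", 30L));
        System.out.println(index.changedSince(25L));
    }
}
```

The crawler would call `update` on every visit and re-index only when it returns true, while `changedSince` answers "what changed recently" queries.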
Result data shouldn't be driven by advertisement pushing certain results onto the first page. Censorship shouldn't be applied to hide certain results either. The funny thing is that two giants such as Microsoft and Google have contradictory censorship patterns. For example, I wrote a small blog entry about how to use Google Mail with the JavaMail API. Do a search for the words javamail google using Microsoft's Live Search. A link to my blog will be second on the first page. However, if you try Google or Yahoo search, you won't be able to find this link at all. You may guess that Google doesn't index the blog? That isn't possible, because you can find a link to my article from some page found by Google. So certainly some Google filtering is applied to my blog entry. Read more about Google censorship here.
Results shouldn't be built on the requester's location. Some engines, like Google, analyze the requester's IP address or language settings and try to sort results using a location coefficient. A requester should have the choice to disable location involvement.
To get an answer to this question, I wrote a small tool that helps in deeply analyzing the results provided by a search engine. The following scoring system for search engines was introduced:
All of the above tests were performed for 10 different queries in the following categories:
Search engine | Relevant | Actual | Comprehensive | Total |
Yahoo | 1 | 2 | 2 | 5 |
MSN | 3 | 3 | 3 | 9 |
Google | 2 | 2 | 2 | 6 |
Download finesearch and test for yourself to see how bad your favorite search engine Google is. The MSN search engine seems to have been improving lately; its results have become more comprehensive and accurate. Yahoo is slipping. The company is busy reorganizing its software; a friend of mine, Yahoo's chief architect, told me this is the reason. They also hired a lot of new people who need some time to get familiar with the company's products. I have to update the recognition patterns for Yahoo every 2 months, because they still can't settle on an output format. Google is so-so and keeps second place. Google is still very good at software development related searches, but it has very poor blog coverage. Very relevant blog entries listed by MSN on the first page do not appear in Google or Yahoo search results at all. Yahoo's crawler seems the best when working on sites with cumbersome navigation. For educational purposes, the tool supports searching over Craigslist. Please use this feature with caution and do not abuse Craigslist.
Finesearch is easily deployable on app servers such as TJWS, Tomcat, or Sun Java(TM) System Application Server. However, the Finesearch binary is prepackaged with TJWS, so you do not need to bother downloading and installing anything else. Just type java -jar finesearch.jar and open the URL http://localhost:8080/finesearch in your browser. If you have a conflict on port 8080, edit rundescriptor inside the jar to specify a different port number.
Note that due to the instability of SF.net CVS, I decided to move the CVS repository to my home machine, so the SF.net CVS tree doesn't reflect the latest state of the project code. I'll try to provide weekly builds so you can get changes faster.
It's quite easy and will take just a few minutes with Search Director, so even if your time is very valuable, you can still save a hundred dollars on payments to Google. I'll provide step-by-step instructions using a simple example.
I have a Java build tool and want to make it reachable for people who are trying to choose a build tool. I've created a home page for this build tool at http://7bee.j2ee.us/bee/index-bee.html . The tool's name is 7Bee.
Generally this step is optional, because even if your page is not listed in a direct name search, following the next steps of the remedy plan gives you a good chance to get it listed and generate traffic to it.
Run Search Director and use a relevant query like java build tool. Add a filtering rule with the name of your product, like 7bee in our case. Verify that the result produced by the search engine didn't bring back any entry for your product. No surprise there.
Blogs, forums, surveys, visitor lists, encyclopedias, and any other type of page allowing content management are your target. (You do not need to hack in through WebDAV or anything like that.) Use filtering words like blog, forum, survey, and so on to find such pages. You can also filter out direct competitors' pages by specifying their product names with a low count value, like 1-2. Doing that, I found two pages that looked promising:
After the remedy actions defined in the previous steps, I could see my tool listed instantly in search results. Although search engines such as Google still do not provide direct links to the product, that doesn't seem bad, because a mention of the product can be found on the first page of any relevant search.
I used the same technique to promote my servlet container, which is currently listed as the 2nd entry in Google's results. Without this tool it wasn't listed at all. Needless to say, nothing was required to get my products listed by MSN search within the first 2 pages of results. Certainly MSN produces much less biased and more relevant results. MSN search is just lacking when it comes to finding problem-solving pages, so Google wins here. It looks like Google may have a good ranking mechanism internally, but it's completely destroyed by ads.
You're already aware of the multiple problems of widely used search engines. So, is there any solution that avoids these problems and gets it right? Certainly a solution exists, and more than one. My group and I are working on a new generation of search engines. We call it an intelligent search engine, although that's probably too much. I'd like to share some design principles with you and hope to get some feedback, because you will all be users of the new search engine quite soon. I also have openings in my group, so if you feel smart enough, you are welcome to join. So, how does it work?
Contact dmitriy@google.com, or sign up for a jAddressBook account and check the shared folder.
Note that any criticism of the direction of certain companies is aimed at improving the products they develop and is not personal or otherwise colored.