User value and complexity in community search.

By Jari Koister

Search is a continuously evolving field of innovation. Within the last 10 years, mainstream search has improved significantly through the introduction of statistical methods and collective intelligence. Google is of course the main example of how successful an improved search service can be. Improved collective intelligence functions, such as those introduced by Amazon, show how much users appreciate getting relevant and related information automatically. As the amount of information available on the web increases, so does the importance of search and filtering. This development shows in the number of new search companies popping up and starting to apply advanced techniques. The approaches vary widely. Some companies, such as Qihoo, mine communities for questions and answers. Other services, such as Quintura, focus on cognitive models and visually driven search functions. Yet others, such as Hakia and PowerSet, focus on natural language analysis. In addition, new services such as Twine and Freebase attempt to add semantic metadata so that applications can intelligently find semantically related data. These companies all hope to deliver the next revolutionary search function, providing a user experience that will overshadow existing search services.

I have always been a proponent of choosing techniques and technologies to fit the problem at hand. I remember, as a student, meeting researchers who insisted on developing everything in Prolog or Scheme whatever the problem, and object fanatics who could not accept any tool other than Eiffel or Smalltalk. In my mind, the best solutions are built by combining techniques so that each contributes what it does best. I think most developers would agree with me here, but it can be very difficult to build such systems unless you have a good understanding of the problems you are trying to solve. The trick is to delay selecting the specific tool or technology until you know how you want to approach the problem.

In GroupSwim, we use a toolbox including proximity search, natural language processing, tagging and semantic web components to implement search functions. Each of them is used to solve specific aspects of our search problem, and we often combine them. Tag search is obviously a very popular search method in GroupSwim. To facilitate and improve tag search, we automatically perform natural language analysis on the data that is not tagged. We then auto-tag the data so that it will be included in tag search. The quality of these generated tags depends on the language analysis itself, on techniques borrowed from text summarization, on the use of synonyms and hypernyms, and on community-specific ontologies. We use the same techniques to do real-time natural language analysis and suggest appropriate tags, which simplifies tagging for users. But it does not stop there. We use semantic web and linguistic information to find and suggest related information, helping users narrow or widen their searches along semantically meaningful dimensions. Communities in GroupSwim are able to create their own ontologies for that purpose, and even if they do not, we apply universal ontologies to help users.
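To make the auto-tagging idea a little more concrete, here is a minimal Python sketch. It is not GroupSwim's implementation: the tiny ontology, the stopword list and the function names (`auto_tag`, `widen`, `narrow`) are all made up for illustration, and simple term frequency stands in for real language analysis and summarization techniques.

```python
import re
from collections import Counter

# Hypothetical, hand-made ontology for illustration only.
# SYNONYMS maps surface forms to one canonical tag.
# HYPERNYMS maps a tag to its broader ("is a kind of") term.
SYNONYMS = {"automobile": "car", "cars": "car", "trucks": "truck"}
HYPERNYMS = {"car": "vehicle", "truck": "vehicle", "vehicle": "transport"}
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "our", "we"}


def auto_tag(text, max_tags=3):
    """Suggest tags for untagged text by counting canonicalized terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(SYNONYMS.get(w, w) for w in words if w not in STOPWORDS)
    # Most frequent first; ties are broken alphabetically so results are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_tags]]


def widen(tag):
    """Return the broader term for a tag, or None if the ontology has none."""
    return HYPERNYMS.get(tag)


def narrow(tag):
    """Return the more specific terms directly below a tag, sorted by name."""
    return sorted(child for child, parent in HYPERNYMS.items() if parent == tag)


text = "Our automobile fleet: cars and trucks. Trucks need service."
suggested = auto_tag(text, max_tags=2)
```

In this sketch, `widen("car")` moves a search up to the broader term and `narrow("vehicle")` lists the more specific ones, which is the kind of semantically meaningful widening and narrowing described above. A community-specific ontology would simply replace or extend the two dictionaries.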

GroupSwim is also different in how we apply search in very problem-specific ways. Our focus is not on the general search problem. Rather, our objective is to help communities in organizations and companies find information related specifically to their organizational and business needs. This narrower scope makes it easier for us to apply natural language techniques and semantic web technologies. It also means we will be able to provide a superior search experience.

The figure below positions GroupSwim’s search function with respect to other available search functions. To do this, we identify three dimensions along which we characterize the functions. The first dimension is the degree of semantic awareness that the search functions have. We simplified the diagram so that this dimension has basic text search at one extreme and artificial-intelligence-based search at the other. The second dimension concerns whether the search is intended to address the universal search problem at one extreme or one specific problem at the other. The third dimension concerns the domain specificity of the search function. A search service such as Yahoo is very general and applies to any domain. One could easily imagine searches that are very specific to a problem domain, such as medicine, and that use domain knowledge to improve the search. GroupSwim enables communities to continuously add domain data to their community and thereby improve the search over time as the domain knowledge evolves.

http://img266.imageshack.us/img266/7060/semanticsfocussx8.gif

In summary, GroupSwim offers a balanced approach to search technologies, focusing on solving specific business-related search problems in a superior way. We also enable communities and the system itself to improve and evolve search by building up and leveraging semantic web data created within or outside of GroupSwim. We believe this is the best way to introduce next-generation web search and discovery techniques to our customers.
