Status > ANNOUNCED - 28-Jun-2009
Start Date > 30-Jun-2010
Duration > 18 Months
Participating countries > SPAIN, UNITED KINGDOM
Main contact ExperienceOn Ventures S.L. Mr. Carlos Gonzalez-Cadenas > Chief Executive Officer Organisation type > SME
The objective of the project is to create a scalable question-answering engine that is able to answer users' questions expressed in natural language. Although plenty of research has already been conducted in the area of Question Answering systems (from the classic BASEBALL, LUNAR, SHRDLU and ELIZA systems developed in the 60s, to more modern systems such as [1][2][3][4][5], evaluated in forums like [6] and [7] - see links below), most of these systems are very limited in scope (which limits their usefulness) and have been conceived as proofs of concept, not as real-world, scalable systems. The main aim of this applied research and development project is to create a system that can answer real-world questions in broader, mainstream domains, such as travel, education or real estate. The travel domain, one of the richest domains in terms of both the quantity of available information and the diversity of possible questions, will be used as a showcase for the system. Domain question answering systems are the future of search: they complement search engines by being able to answer questions rather than just matching keywords of the query to the document index. This allows users to solve their problems an order of magnitude faster than with conventional search engines. The success of this project is strategic not only for the partners involved in the project, but also for the ability of European companies and research organisations to compete successfully in possibly the most strategic market on the Internet: search.
We anticipate that the biggest difficulties in the development of such a system (and where the vast majority of the research and development will be centred) will be due to the following requirements: 1) The users expect to obtain answers in near real-time, and this is difficult to achieve since: a) the volume of information in broad domains such as travel, education or real estate is very large, b) the users' questions can be very complex, and c) to achieve high-quality results, computationally expensive techniques and algorithms from the area of automated reasoning are needed for query answering. 2) Although a significant body of research in areas such as natural language processing and automated reasoning is available, there is substantially less research in methods for integrating and bridging these disparate technologies. This project aims to have a production-ready engine as its main deliverable. In order not to produce yet another toy system along the lines of those that have been produced from basic research projects, this project will require the combination of applied research with an extensive engineering effort. The objective of obtaining a production-ready engine based on outstanding applied-research results is reflected in the composition of the project team. On the one hand, the team includes EXPERIENCEON - a company with outstanding engineering capabilities, commercial orientation and consumer marketing experience. On the other hand, the team includes two acclaimed researchers from two world-class universities - Dr. Boris Motik from the UNIVERSITY OF OXFORD and Prof. Dr. Lluis Marquez from THE POLYTECHNICAL UNIVERSITY OF CATALONIA. Both Dr. Motik and Prof. Dr. Marquez have a proven track record in world-class applied research as well as technology transfer, having produced fully-working implementations based on their research results.
[1]: TellMe Question Answering System, http://www.ics.mq.edu.au/%7Epizzato/tellme
[2]: START Natural Language Query Answering System, http://start.csail.mit.edu/
[3]: University of Edinburgh QUALIM Question Answering Demo, http://demos.inf.ed.ac.uk:8080/qualim/
[4]: DFKI Open Domain Web QA System, http://experimental-quetal.dfki.de/
[5]: Arizona State University QA System, http://qa.wpcarey.asu.edu/
[6]: Text Retrieval Conference (TREC), http://trec.nist.gov/
[7]: Cross Language Evaluation Forum (CLEF), http://www.clef-campaign.org/
Research in the area of open-domain question answering has generated and still generates a lot of interest from the natural language processing (NLP), cognitive sciences, and knowledge engineering communities. Open-domain question answering is an extremely complex task that needs a formal theory and well-defined question answering methods. The formal theories for questions have been developed in the context of the research made in communities such as NLP, cognitive sciences, and knowledge engineering. This includes, for example, the conceptual theory of question answering proposed by Wendy Lehnert, and the mechanisms for generating questions developed by Graesser et al. These theories, however, have not been developed to handle large-scale data sets, and the resulting technologies have not been implemented using state-of-the-art technologies, such as high-performance parsers, named entity recognisers, information extractors, or reasoning engines. The progress of the research in open-domain question answering suggests that we are still very far from producing a real-world, scalable question answering engine that is actually capable of solving large numbers of users' questions. Given that there is a great user interest (from casual users to professional information analysts) in these systems, a much more realistic approach is to create question answering engines for specific real-world domains that would provide great value to the end-users in the short-term without having to face the extremely hard problems associated with open-domain question answering. Although developing a question answering engine with a particular application domain in mind is a much more tractable problem than developing a general system capable of handling any domain, there are significant research questions to be answered. 
We envisage that many assumptions of the previous research will need to be revisited, completed or extended, as some of the techniques developed in the past will not scale to the massive amounts of information that exist today, especially given the requirement of near real-time query processing. In addition, although plenty of progress has been achieved in the past 20 years in the disciplines related to the two main building blocks of question answering systems, namely NLP (i.e., "understanding" the question) and deductive databases (i.e., "solving" the question), there is still a significant gap between the results of this research and their application in real-world scenarios. Consequently, NLP and deductive databases have so far had only a moderate impact in industry. We will bridge this gap in the context of this project.
Main contact UPC Natural Language Research Group
Universitat Politecnica de Catalunya Assoc. Prof. LLUIS MARQUEZ > Senior researcher at GPLN; Teacher at UPC http://www.lsi.upc.edu/~lluism/ Organisation type > University
The Natural Language Processing Group will contribute to the natural language understanding aspects of the project - that is, all the subtasks going from the analysis of a query to the generation of a semantic representation of its contents. This will require: 1) The adaptation of current linguistic analysers (part-of-speech (POS) tagging, named entity recognition, word sense disambiguation, syntactic and semantic parsers, etc.) to the domain of application of the project, also ensuring robustness to colloquial style, ungrammaticality, and noisy input text, 2) Improving the computational efficiency and reducing the memory requirements of the existing tools, and 3) Working on the core semantic analyser of the project by a) including syntactic information, word sense disambiguation, and semantic role labelling, and b) using machine learning techniques to train the local decisions that guide the search for optimal semantic analyses.
The NATURAL LANGUAGE PROCESSING RESEARCH GROUP (GPLN, http://www.lsi.upc.edu/~nlp) was founded in 1986 and is currently composed of 7 regular professors, 5 doctorate researchers, 10 assistant professors, and 11 PhD students. The professors teach on the computer science and computer engineering degrees at UPC, as well as on the Artificial Intelligence PhD and Master's programmes of the Software department. Ever since its creation the GPLN has been devoted to technologies and applications of automatic natural language processing. The languages dealt with by the group are Spanish, Catalan and English. The group's main areas of activity are the collection and management of multilingual lexical resources, information extraction from documents and question answering applications, statistical machine translation, machine learning for NLP, design of natural language interfaces, acquisition and exploitation of semantic information, and the development of basic techniques for language processing (morphosyntactic and semantic disambiguation, syntactic analysis, named entity recognition, semantic parsing, etc.). The group has focused on building a solid infrastructure of resources and linguistic processors, which forms the basis of the different research and development lines that are being followed at present. Additionally, the group holds the Consolidated Research Group status (2005 SGR00130) of the Catalan Government Research Department. Lluis Marquez obtained his Computer Science Engineering degree from the Universitat Politecnica de Catalunya (UPC) in 1992. He received his Ph.D. degree in Computer Science from the UPC in 1999 and the UPC prize for Doctoral Dissertations in the Computer Science area (2000). Currently, he is an Associate Professor of the Software Department (LSI, UPC) teaching at the Facultat d'Informatica de Barcelona. He is also a senior researcher of the TALP Center for research in Speech and Language Technologies (also at UPC).
His current research interests are focused on Machine Learning architectures for Natural Language structured problems, including parsing, semantic role labelling, named entity extraction, and word sense disambiguation. Regarding applications, he is working on the introduction of high-level linguistic information to Statistical Machine Translation and Oral Question Answering. He has published over 75 refereed papers on these topics in journals and at conferences in the NLP and Machine Learning areas. He also serves regularly on the Programme Committees of major conferences, including ACL, EMNLP, CoNLL, COLING, HLT, IJCAI, AAAI, NIPS and ICML. He was programme chair of CoNLL-2006 and organiser of the SemEval-2007 semantic evaluation competition and workshop. He also organised the shared tasks on syntactic and semantic parsing at CoNLL-2004, 2005, 2008 and 2009, and led the teams that prepared three evaluation tasks at Senseval-3 and SemEval-2007. He has been guest editor of the special issues "Semantic Role Labelling" and "Computational Semantic Analysis of Language" for Computational Linguistics and Language Resources and Evaluation, respectively. Currently, he acts as president of the ACL SIG on Natural Language Learning (SIGNLL) and chairs the 13th Annual Conference of the European Association for Machine Translation (EAMT-2009), and the SEW-2009 NAACL-HLT workshop, "Semantic Evaluations: Recent Achievements and Future Directions".
Main contact Oxford University / Computing Laboratory Dr. rer. pol. Boris Motik > University Lecturer Organisation type > University
The University will be responsible mainly for the knowledge representation and automated reasoning aspects of the project, and will focus on the following tasks: 1) The adaptation of the existing knowledge representation formalisms, such as SWRL (Semantic Web Rule Language), OWL (Web Ontology Language), and KIF (Knowledge Interchange Format), to the use case in the project, 2) The development of scalable query answering and reasoning algorithms, 3) The development of a query engine based on the algorithms identified, and 4) The evaluation of the query engine in a realistic scenario.
OXFORD UNIVERSITY is a leading research university, and the Computing Laboratory employs a number of world-class experts in various fields of computer science. The expertise in knowledge representation is mainly concentrated in the Information Systems group, which is led by Prof. Ian Horrocks, Prof. Georg Gottlob, and Prof. Stephen Pulman. Prof. Horrocks is an internationally recognised expert in the field of knowledge representation, with an established track record in technology transfer, and Prof. Gottlob is a leading expert in the field of database theory - a field closely related to knowledge representation. Dr. Boris Motik is a member of the Information Systems group. In the past, he has worked on the knowledge representation system KAON2, which has subsequently been acquired by the German company ONTOPRISE GMBH. Furthermore, Dr. Motik is a leading figure in the OWL 2 Working Group - an international standards group of the World Wide Web Consortium whose goal is to standardise a knowledge representation language for use on the Web.
Main contact ExperienceOn Ventures S.L. Mr. Carlos Gonzalez-Cadenas > Chief Executive Officer Organisation type > SME
The organisation's contribution to the project will consist of the following: 1) Leading the development and engineering efforts of the NLP and the deductive database components, collaborating with our partners on the different aspects of the research involved in these components, 2) The design and development of the bridging components between the NLP system and the deductive database, 3) Integration and customisation of the different technology components produced by our partners, 4) Leading the optimisation and scalability efforts needed for operating the system in a production environment, 5) Establishing adequate quality and testing procedures, and 6) The creation of the appropriate project management and development infrastructure.
EXPERIENCEON has extensive expertise in several different and very valuable fields. 1) Two years of applied research in question answering technologies: While other research initiatives have focused exclusively on researching one of the various technologies needed to build a QA (question answering) system, EXPERIENCEON has focused on investigating all the technologies needed to produce the finished system, as well as on how best to integrate these technologies. 2) Outstanding engineering capabilities in massively scalable systems: All the core members of our engineering team have over 7 years' experience in building very complex distributed systems that need to operate in extremely demanding environments (lots of users, big volumes of data, short response times). 3) Experience in building and marketing consumer products: Our product team has the knowledge and experience needed for translating outstanding technologies with high potential into widely accepted consumer products. Each core member of the product team has played a key role in the creation and deployment of at least one consumer product in a national or international setting. EXPERIENCEON's core business is monetising consumers' queries by means of existing advertising or affiliate/lead-generation networks. For example, if a user who searches for travel in the search engine goes on to book it, EXPERIENCEON would get a commission from the purchase. EXPERIENCEON focuses on deploying question answering engines in mainstream verticals such as travel, leisure and real estate, which are a) very big in monetary volume and b) close to the transaction - therefore the conversion rates and the total volume of our business are projected to be very high in the mid term.
