Abstract:
Natural Language Interfaces to Databases (NLIDB) is an area of research concerned with expressing users' requests to a database in their native language. Data warehouses are now widely used by enterprises for decision making, and the characteristics of decision-making systems are inherently different from those of transactional systems. The users of decision-making systems are top management (executives), who are normally non-technical and have limited knowledge of the data warehouse schema and of writing technical database queries. An ad-hoc query is an information need that may not be fulfilled by front-end tools with predefined capabilities, such as reporting, OLAP, or data mining tools. Such information needs are not easy to express in a technical query language. This motivated us to propose a Natural Language Based Retrieval System for Data Warehouse (NLRSDW) to support users, especially in ad-hoc query development. A Logical Schema-based Mapping (LSM) technique has been developed, with which a targeted search is performed efficiently over the data instances. For the targeted search, an LSM-oriented mechanism is presented. In addition, three searching strategies are elaborated to retrieve the matching instances for each data value: 1) identified elements searching, 2) proximal elements searching, and 3) level-wise searching. The retrieved instances are ranked according to five criteria, on the basis of which a ranking algorithm has been developed. Furthermore, a solution for accurately identifying the aggregation constructs (i.e. aggregation function, measure, level, and grouping attributes) is presented. Data warehouses maintain aggregated computations to answer queries efficiently over large volumes of data, and it is a very challenging task to interpret accurate aggregation constructs from a keyword-based query written on a Natural Language Interface to a Data Warehouse.
Subsequently, a semi-automatic approach to building a Domain Thesaurus based on the data warehouse logical schema is proposed. This approach takes the data warehouse logical schema as input and generates the Domain Thesaurus using multiple sources, including the schema, data instances, WordNet, the WWW, and a domain repository. A Thesaurus evolution approach is also presented, which shows how the Thesaurus can be expanded at user query time. An in-depth experimental evaluation has been carried out in comparison with existing systems, and the results are encouraging. Using NLRSDW, non-technical users can easily write any ad-hoc information need in natural language. As a result, executives do not have to rely on IT staff, and the time needed to develop a query is negligible.