Research Article

Performance Measurement of Data Fusion Strategies Using Mahalanobis Distance

Kim, Seongho¹ · Cho, Seongbin²

¹ Hanyang University, ² Sogang University

Published: December 2005 · Vol. 34 No. 6 · pp. 1853-1867

Abstract

For successful implementation of customer relationship management, corporations today do their best to understand their customers' needs. Conjoint analysis has been used to analyze consumer behavior toward goods and services. However, it is notorious for the exponentially increasing number of hypothetical products that respondents must evaluate. The most economical and fastest way of gathering information about customers is through questionnaires. Surveys today are increasingly moving to the Internet, where respondents can easily become distracted while answering, so survey results may lose sincerity. This study argues that one way of gathering sincere answers from customers is to give them fewer questions so that they can answer quickly while maintaining constant attention. Data fusion plays a key role in merging two or more databases into an integrated one. Few studies have investigated intentional missing data, where the common attribute variables must be determined before collecting data. This study examines various donor location strategies in the area of intentional, preplanned data fusion. We introduce Mahalanobis distance as a measure of dissimilarity between respondents in data fusion. In particular, the experiments are conducted using the following five strategies: Strategy 1 - locating donors by correlation coefficient; Strategy 2 - by Euclidean distance; Strategy 3 - by Mahalanobis distance; Strategy 4 - by Euclidean distance after employing correspondence analysis; and Strategy 5 - by Mahalanobis distance after employing correspondence analysis. We evaluate the performance of these five strategies as the level of missing data increases. Part-worth data composed of 12 attributes and 35 attribute levels are used for the experiment. The sample is divided into two groups. The common attribute variables are selected by the size of the variance of the attributes. 
The missing variables are randomly selected among the non-common attributes, and all of their attribute levels are deleted. This type of systematic missingness is designed to simulate preplanned missing data. Missing values in one group are substituted from the other group and vice versa. In the experimental design, the concept of an ideal point is introduced. The ideal point means that the maximum attribute level represents the corresponding attribute. Here, continuous variables are converted into categorical variables. To measure distances from these variables, correspondence analysis is performed, in which coordinates in (p - 1) dimensions are computed, where p is the number of attributes. The purpose of the ideal point is to decrease the number of dimensions, since there are 19 common attribute levels. After applying correspondence analysis, Euclidean distance and Mahalanobis distance are measured in the last two strategies, in contrast to measuring both distances directly in Strategies 2 and 3. A Monte Carlo simulation is conducted 20 times for the five strategies and three levels of missing data. The results show that the donor location strategy and the level of missing data are both statistically significant. Comparing the means across the experimental factors, Strategy 5 outperforms the other strategies. The next most accurate strategy is Strategy 4, followed by Strategy 1, Strategy 2, and Strategy 3. In light of this, correspondence analysis seems to play an essential role in decreasing the number of common variables while enhancing explanatory power, since Strategies 4 and 5 perform better than the others. The purpose of introducing Mahalanobis distance is to reflect the statistical dependence structure of the common variables, which has not been considered in existing Euclidean distance-based data fusion techniques. In the analysis, Mahalanobis distance is effective only when a moderate number of common variables are included in the model. 
This study explores the possibility of applying Mahalanobis distance to measure dissimilarity between customers, jointly with correspondence analysis, in the area of data fusion. Future extensions of this study might apply the proposed donor location strategies to various types of data sets in order to reach a more comprehensive conclusion.
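
As an illustration of how donor location by Euclidean and by Mahalanobis distance (Strategies 2 and 3) can differ, the following is a minimal sketch and not the authors' implementation. The function names `locate_donors` and `fuse` are hypothetical, and two choices are assumptions not stated in the abstract: the covariance matrix S is estimated from the pooled common variables of both groups unless one is supplied, and a pseudo-inverse is used in case S is singular. The squared Mahalanobis distance is taken in its standard form, d²(x, y) = (x - y)ᵀ S⁻¹ (x - y).

```python
import numpy as np


def locate_donors(recipients, donors, metric="mahalanobis", cov=None):
    """Return, for each recipient row, the index of the nearest donor row.

    Both arrays hold the common attribute variables, one row per respondent.
    `cov` is an optional covariance matrix; if omitted it is estimated from
    the pooled recipients and donors (an assumption of this sketch).
    """
    recipients = np.asarray(recipients, dtype=float)
    donors = np.asarray(donors, dtype=float)
    # Pairwise differences, shape (n_recipients, n_donors, n_common_vars).
    diff = recipients[:, None, :] - donors[None, :, :]

    if metric == "euclidean":
        d2 = np.einsum("rdp,rdp->rd", diff, diff)
    elif metric == "mahalanobis":
        if cov is None:
            cov = np.cov(np.vstack([recipients, donors]), rowvar=False)
        # Pseudo-inverse guards against a singular covariance matrix.
        cov_inv = np.linalg.pinv(np.atleast_2d(np.asarray(cov, dtype=float)))
        d2 = np.einsum("rdp,pq,rdq->rd", diff, cov_inv, diff)
    else:
        raise ValueError(f"unknown metric: {metric!r}")

    return d2.argmin(axis=1)


def fuse(recipient_common, donor_common, donor_missing, **kwargs):
    """Fill each recipient's missing variables with those of its nearest donor."""
    idx = locate_donors(recipient_common, donor_common, **kwargs)
    return np.asarray(donor_missing)[idx]
```

In the setting described above, `fuse` would be called twice, once in each direction, since missing values in one group are substituted from the other group and vice versa. When the common variables are strongly correlated, the two metrics can select different donors, which is the dependence structure that Mahalanobis distance is intended to capture.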

Keywords: correspondence analysis, data fusion, donor location strategy, Mahalanobis distance