
International Journal of Tourism and Hospitality Research - Vol. 31 , No. 10

[ Article ]
International Journal of Tourism and Hospitality Research - Vol. 31, No. 10, pp. 85-97
ISSN: 1738-3005 (Print)
Print publication date 31 Oct 2017
Received 10 Mar 2017; Revised 04 Sep 2017; Accepted 18 Sep 2017
DOI: https://doi.org/10.21298/IJTHR.2017.10.31.10.85

Grouping hotel restaurant customers based on a behavioral scoring model: An exploratory study

Yukyeong Chong*; Gunhee Lee†

*Professor, College of Hospitality & Tourism, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Republic of Korea (ykchong@sejong.ac.kr)

Correspondence to: †Professor, Sogang Business School, Sogang University, e-mail: ghlee@sogang.ac.kr

Abstract

Segmentation, targeting, and positioning are the most important keys to a successful marketing strategy in the hospitality industry. Among these three keys, segmentation is the first step for a marketer to identify various customer needs and desires. Hospitality operators have been trying to increase customer satisfaction and corporate profits by utilizing mass marketing, database marketing, and individual marketing. Despite the increased interest in scoring consumer behavior, applications of the score remain difficult. The lack of understanding and utilization of scores has been an important issue in the hospitality industry. Analysis of customer behavior is not an easy problem to solve because dynamic modeling is required due to changes to customers’ records over time. The current study explores customer data in hotel restaurants and proposes an individual behavior scoring model (BSM) based on the traditional RFM (recency, frequency, monetary) concept. By comparing it with the traditional profiling scoring model (PSM), it is shown that BSM provides a high prediction power of future consumers' behavior. However, PSM has an important role in a complementary sense to identify potential customers who have low behavior scores. This research proposes how to build and validate BSM and PSM with a focus on the utilization of the two models to identify future potential customers efficiently.

 Keywords: Segmentation, Customer scoring, Behavior scoring model (BSM), Profiling scoring model (PSM), RFM measure

Ⅰ. Introduction

Segmentation, targeting, and positioning (STP) are the most important keys to a successful marketing strategy in the hospitality industry. STP means dividing customers with varied characteristics into similar groups and implementing a marketing strategy tailored to the needs and desires of each group. As the first step, an STP marketer should identify the various customer groups (Kotler, Bowen, & Makens, 2017). Marketers believe that they should identify the customer segments they can satisfy better than their competitors can, direct marketing activities at those segments, and then provide products or services that capture the target segments. The step in this strategic approach that requires the most care is choosing which segments the company will focus on. Hospitality companies use several methods for this purpose, so it is important to know which method to use (Bowen, 2000). For example, while male business people often enjoy hotels with outlets such as bars and clubs, families or housewives may prefer hotels that include large restaurants or bakeries. Therefore, knowing who can be satisfied with the products and services a company offers is a necessary first step in the hospitality business.

Prior to 1950, direct marketers used ‘mail orders’ to accomplish mass marketing. The purpose of mass marketing in the traditional approach was to reach a larger number of customers and a wider customer base. Traditional mass marketing processes have since been challenged by newer one-to-one marketing approaches (Rygielski, Wang, & Yen, 2002). Although the purpose of direct marketing or mass marketing has not changed, current practice has shifted toward database or relationship marketing, which emphasizes the individual customer and focuses on customers’ needs and wants (Petrison, Blattberg, & Wang, 1993). In other words, the recent marketing approach seeks to improve the satisfaction of individual customers by building a deep relationship that meets each customer's needs, rather than by reaching a wide customer base. A deep relationship with customers can be achieved through a more customized approach that utilizes individual customer data. Database marketing is sometimes called integrated marketing, relationship marketing, or even maxi-marketing. Regardless of the name, all of these techniques seek to build information on customer behavior (Nash, 2000). To build a sound relationship with customers, scoring techniques based on historical transaction data are important for differentiating customers and developing relationship marketing strategies. The most common scoring method is to sort customers from those who are profitable to those who are not. Typical customer data available in this case are recency, frequency, and monetary data (Miglautsch, 2000). A more advanced form of customer data is the customer's transaction data.

The current research is an exploratory study that investigates customer transaction data in hotel restaurants, examines how such data can be applied to predicting future customer behavior, and aims to provide a view of scoring modeling in the context of the hospitality industry. In particular, predicted expenditure estimates are used to assign a score to each individual, and customers are segmented based on these scores. This paper is divided into three sections. First, traditional relationship marketing concepts and several scoring techniques are reviewed. Next, an empirical study of the behavior scoring model (BSM) and the profiling scoring model (PSM) is conducted, investigating prediction power and customer segmentation. Finally, conclusions that can provide a new approach to quantifying customer behavior are presented.

Ⅱ. Literature review

It is important to understand data-driven relationship marketing. In general, four aspects of relationship marketing should be considered: a statistical model produced by quantitative analysis, customer information collected at the individual level, a designed linkage between analytic results and marketing activities to increase the effectiveness of customer contact, and the time and effort needed to build relationships (Roberts, 1992). Applying scientific statistical methods that help relationship marketers maintain strong relationships requires well-designed sets of customer records that track historical patterns of buying products or services. Identifying profitable customers in order to expand relationships with them is vital. Building strong relationships with loyal customers is also a key reason for marketing activities (Berson, Smith, & Thearling, 1999).

Recent research has dealt with customer behavior scores. One study identified profitable customers based on RFM (recency, frequency, and monetary) behavior using the SOM (self-organizing map) technique in the u-commerce industry (Cho, Moon, & Ryu, 2014). In that study, the customer behavior score was shown to support recommendation services effectively. Another study comes from the banking industry: it proposed a customer behavior score based on mobile banking transaction history and divided customers into six groups in order to attract and retain customers while keeping customer satisfaction high (Noori, 2015).

In the hospitality industry, data-driven marketing has lately been emphasized as an important marketing issue. The integrated data sets and analytical techniques used by hospitality marketers have been stressed as a way to tailor service to each customer (Dev, Buschman, & Bowen, 2010). Nearly three decades ago, building a customer database for micro-marketing was already practiced in a hotel and was reported to produce far superior financial performance (Francese & Renaghan, 1990). However, as Dev et al. (2010) described, marketing communication has changed with the appearance of mobile marketing technologies. Individualized personal attention, incentives, and recognition have been especially important factors in cultivating brand loyalty in the hotel business (Francese & Renaghan, 1990). Building customer loyalty is an essential factor in creating relationships (Bowen & Shoemaker, 1998; Dube & Renaghan, 1999), and customer information carries great weight in building such relationships between a hospitality business and its customers.

Around the early 1990s, researchers were aware of how important frequent guest programs are in the hotel and airline industries (McCleary & Weaver, 1991; Toh, Rivers, & Withiam, 1991; Tou & Hu, 1988). Frequent guest profiles, that is, the demographic and psychographic characteristics of frequent customers, have been used as pivotal sources for developing marketing strategies and targeting promotions in the restaurant business (Wilbourn, McCleary, & Phadeesuparit, 1997). Bowen (1990) indicated that using information available through existing databases prepares managers to be competitive in a radically changing industry, especially when such valuable customer information is provided through effective loyalty programs. Managers, including hoteliers and restaurateurs, should use databases for strategic purposes and not just for tactical ones (Bowen, 2000).

Database-driven marketing approaches to customer relationship management (CRM) (Berson et al., 1999) can be used interchangeably with relationship marketing or one-to-one marketing (Peppers et al., 1999). A database is a prerequisite of CRM, which requires a managerial philosophy that allows a company to become familiar with its customers. A CRM system also needs data-driven activities in order to run effectively (Piccoli, O'Connor, Capaccioli, & Alvarez, 2003). The major attraction of data mining is its capability to build predictive rather than retrospective models (Shmueli, Bruce, & Patel, 2016). In other words, data mining uses well-established statistical and machine learning techniques to build models that predict customer behavior and help marketers target campaigns more accurately. It also aligns campaigns more closely with the needs, wants, and attitudes of customers and prospects. Therefore, data mining aims to create decision-making models that predict future behavior based on analyses of past activity (Berson et al., 1999; Magnini, Honeycutt, & Hodge, 2003). Although the terms are phrased differently, the ultimate goal of direct marketing, database marketing, CRM, and data mining is the same: to differentiate customers, that is, to identify who are and will be the profitable, valuable customers with whom a company should try to build a strong relationship (Berson et al., 1999).

To make better decisions and identify more profitable customers, direct marketers have been aware of both relationship strength and relationship quality (Schijns & Schröder, 1996). Relationship strength has frequently been measured by behavioral or descriptive indicators (e.g., RFM) that can easily be captured in a database. Such behavioral differences, captured as transaction information, have been used as segmentation variables among different customer groups (Fader, Hardie, & Lee, 2005; Sarvari, Ustundag, & Takci, 2016). It is important to distinguish the least valuable customers from the best ones in order to provide customized service to the best customers. Predictive scoring models are one way to determine who will receive an upcoming personal marketing contact, such as a telephone call or email, by predicting the likelihood of a response or the expected sales from a prospective customer (Schijns & Schröder, 1996).

By using previous contact information, namely recency, frequency, and monetary value, scoring models can predict future revenue, and these predictions serve as scores (Malthouse & Derenthal, 2008). RFM codes and customer lifetime value (CLV) have been studied as ways to quantify customer behavior (Miglautsch, 2000; Borle, Singh, & Jain, 2008). According to Hughes’ calculation (1996), customers are broken down by frequency (e.g., the number of visits to a store), and frequency is categorized into five quintile groups. Customers who visit the store many times are much more likely to visit again than those who seldom visit. Customers are also grouped by their monetary value. Similarly, a quintile categorization of customers by how much each customer spent can be used. After customers are broken down by recency, frequency, and monetary value, each customer is assigned a three-digit RFM code. Since each digit of the RFM code is a quintile number from 1 to 5, the total number of possible three-digit codes is 125 (5 × 5 × 5). That is, the RFM code is a single cell from 125 possible cells such as 555, 554, …, 445, 444, …, 355, 354, …, 113, 112, and 111. The RFM code is not an ordinal scale, which has the property of order, but a nominal scale assigned for the sole purpose of differentiating one object from another. The RFM code by nature, therefore, cannot be treated as a score with an order of high and low (Qiasi et al., 2012).
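The three-digit coding described above can be illustrated with a minimal sketch. It assumes that quintile 5 denotes the best fifth on each measure (most recent, most frequent, highest spending); the function names and the ten customers are hypothetical and are not taken from Hughes (1996) or from the study data.

```python
from itertools import product

def quintile_ranks(values, higher_is_better=True):
    """Assign each value a quintile from 5 (best fifth) down to 1 (worst fifth)."""
    n = len(values)
    # Sort customer indices from best to worst on this measure.
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0] * n
    for position, idx in enumerate(order):
        ranks[idx] = 5 - (position * 5) // n
    return ranks

def rfm_codes(recency_days, frequency, monetary):
    """Combine the three quintile digits into a code such as '555' or '111'."""
    r = quintile_ranks(recency_days, higher_is_better=False)  # fewer days since last visit is better
    f = quintile_ranks(frequency)
    m = quintile_ranks(monetary)
    return [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]

# Hypothetical data for ten customers (illustration only).
recency_days = [3, 40, 12, 90, 7, 25, 60, 1, 33, 18]
frequency = [12, 2, 8, 1, 15, 5, 3, 20, 4, 6]
monetary = [900, 80, 400, 30, 1200, 250, 120, 2000, 150, 300]

codes = rfm_codes(recency_days, frequency, monetary)

# Enumerate every possible cell: 5 x 5 x 5 = 125 codes from '555' to '111'.
all_cells = ["".join(map(str, cell)) for cell in product(range(5, 0, -1), repeat=3)]
```

In this sketch the code identifies only which of the 125 cells a customer falls into, which is why the text treats it as a label and not as a score.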

Miglautsch (2000) discussed two common RFM scoring methods: customer quintile scoring and behavior quintile scoring. In customer quintile scoring, customers are sorted in descending order and broken into five equal groups on each of their RFM measures, generating 125 equal-sized segments. Behavior quintile scoring, on the other hand, forms monetary quintiles that each generate an equal amount of sales, instead of containing an equal number of individuals as in customer quintile scoring. However, these scoring methods still only define cells such as 435 or 233 and fail to assign a score to each individual customer within a cell. He also discussed different weighting methods for converting RFM values to a single score: adding up the three actual numbers, adding the three RFM codes, and multiplying each RFM value by a certain number.
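As a rough illustration of two of the conversions mentioned above (adding the three RFM codes, and multiplying each by a certain number), the sketch below turns a cell code into a single number. The weights (3, 2, 1) are purely illustrative assumptions and are not values given by Miglautsch (2000).

```python
def additive_score(code):
    """Sum the three quintile digits of an RFM code, e.g. '435' -> 4 + 3 + 5."""
    return sum(int(digit) for digit in code)

def weighted_score(code, weights=(3, 2, 1)):
    """Weighted sum of the R, F, and M digits; the weights are illustrative only."""
    return sum(w * int(digit) for w, digit in zip(weights, code))

codes = ["555", "435", "233", "111"]
additive = {code: additive_score(code) for code in codes}
weighted = {code: weighted_score(code) for code in codes}
```

Every customer in the same cell receives the same value under either conversion, which is the limitation noted in the text: the cell is scored, not the individual.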

In Rhee and McIntyre's study (2008), a marketing firm's contact efforts were considered the essential variable in scoring modeling. Such an approach is reasonable in some industries; in hospitality operations, however, the contact efforts of a promotion campaign are largely determined by which customers are valuable. Scoring methods will vary considerably because of their subjectivity, leaving aside whether the methods are right or wrong.

Another type of information, in addition to behavioral transaction data, is customers' demographic data, which can be used for understanding the current market situation. Sheth (1977) criticized the use of demographic factors as determinants or correlates of consumption behavior because of their lack of relevance and poor prediction. However, many researchers have used demographic profiles in various academic fields. For example, research topics include the correlation of demographic variables with consumer alienation in the marketplace (Lambert, 1981), the effect of demographic variables in modeling segment membership using panel data (Gupta & Chintagunta, 1994), the influence of demographic characteristics on consumers' decisions about usage frequency in the banking industry using adoption theory (Branca, 2008), the role and effect of demographic and socioeconomic variables on travel choice (Kattiyapornpong & Miller, 2008), and how demographic profiles affect consumers' online shopping behavior (Hashim, Ghani, & Said, 2009), among many others. Although marketers need much more information than demographic variables to comprehend customers' behavior in marketplaces, demographic profiles serve a basic yet important role in interpreting the characteristics of clusters, groups, or segments of customers (Yeh, Plante, & Agrawal, 2011).

In studies that use customer data, demographic profiles are essential, yet for the most part their use is limited to descriptive analysis. Demographic profiles could be used for far more than describing group characteristics; they could also serve as vehicles for scoring individual customers, in the same way as RFM. This paper introduces how to build a one-dimensional scoring model (BSM) reflecting customers' longitudinal behavior data and a scoring model (PSM) based on demographic profiles. These scoring models are intended to differentiate segments of customers and predict customers’ future contribution to a restaurant. In the following empirical section, after the data and cleaning process are described, monetary value analyses based on expenditures and number of visits are given. The next part compares the BSM and the PSM in terms of model fitting and prediction power. Finally, customers are distinguished by predicted expenditure estimates of their future contribution.

Ⅲ. Methodology
1. Data description

This research uses customer data from a hotel in Seoul, Korea. The hotel is a globally franchised five-star hotel that, at the time the data set was obtained, operated 10 different restaurant outlets: Italian Restaurant, Lounge, Bakery and Beverage, Banquet, Club, French Restaurant, Bar, Chinese Restaurant, Japanese Restaurant, and Buffet. The hotel records the types of restaurants that a customer visits, the gender and occupation of the customer, the month of each visit, and how much money the customer spends on each visit. Most of the customers who have a membership reside domestically, so tourists are not included.

Out of the hotel restaurant customers who have memberships, information on 959 customers (11,466 transactions) was collected to identify customers' behavior. The data include longitudinal information that can reveal a customer's behavior pattern, unlike cross-sectional data that describe only a single transaction. To facilitate analysis, individual expenditure data were transformed to an average expenditure per month. The frequency of visits to each restaurant and the purchasing expenditures were obtained. The frequency of restaurant visits is a discrete variable, and the monetary value is a continuous variable. Gender and occupation are the demographic variables available for this study.

2. Data cleaning process

Data cleaning is the next step after gathering data and refers to the removal of noise, errors, and incorrect input from a database (Adriaans & Zantinge, 1996). These are inevitable problems that analysts encounter as they begin to use a new data set. To some degree, any database system may have inconsistent, incomplete, or erroneous data. As much as 80 percent of the time associated with the data mining process will be spent dealing with these problems (Westphal & Blaxton, 1998). In this study, some fields, such as birth date, contain very little customer data, while other fields, such as joining date, have no data recorded at all even though the field exists. After removal of non-usable fields during the discovery stage, gender and occupation were selected as the usable demographic variables. Frequency of restaurant visits and the expenditure of 959 customers were also collected.

For the scoring modeling analysis, customers who did not indicate their occupation were removed. Because members of this unidentified group might belong to any of the occupation groups, they were not considered in the study. In total, data for 30 customers were deleted because of missing occupation values (23 cases) or unusual transactions (7 cases), leaving 929 customers. Of these, 340 customers (identified as a dormant group) visited fewer than four times during the 12-month period and were not included in the subsequent analysis. In addition, a month with no frequency, meaning that a customer did not visit any restaurant in the hotel in that month, is recorded as zero rather than treated as a missing value. Therefore, 589 active customers who made four or more visits during the 12-month observation period were used for further analysis. Table 1 presents the proportions of removed, active, and dormant customers in this study.

Table 1.
Distribution of active, dormant, and removed customers

| Data group (N=959) | Active customers | Dormant customers | Removed customers |
|---|---|---|---|
| Frequency (%) | 589 (61.4%) | 340 (35.5%) | 30 (3.1%) |
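The cleaning rules described in this section can be sketched as follows. The record layout (an `occupation` field and a list of visit months) and the three sample customers are hypothetical, and the removal of unusual transactions is not modeled; only the counts in the last step are taken from Table 1.

```python
def monthly_visits(months, n_months=12):
    """Count visits per month, recording months with no visit as 0 rather than missing."""
    counts = [0] * n_months
    for month in months:  # each visit is represented by its month number, 1..12
        counts[month - 1] += 1
    return counts

def classify(customer, min_visits=4):
    """Return 'removed', 'dormant', or 'active' following the rules described above."""
    if customer["occupation"] is None:
        return "removed"
    visits = sum(monthly_visits(customer["months"]))
    return "active" if visits >= min_visits else "dormant"

# Hypothetical customers (illustration only).
customers = [
    {"id": 1, "occupation": "Doctor", "months": [1, 1, 3, 7, 11]},
    {"id": 2, "occupation": None, "months": [2, 5, 6, 9]},
    {"id": 3, "occupation": "Lawyer", "months": [4, 10]},
]
groups = {c["id"]: classify(c) for c in customers}

# Recompute the percentage shares from the counts reported in Table 1.
table1 = {"active": 589, "dormant": 340, "removed": 30}
total = sum(table1.values())
shares = {group: round(100 * count / total, 1) for group, count in table1.items()}
```

Recomputing the shares from the reported counts reproduces the percentages shown in Table 1.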

3. Analysis

The monetary value is defined as how much a customer spends during a specified time interval. Unlike frequency, which represents the number of visits, monetary value can be treated as a continuous random variable. There are two types of analytic models in this study: the behavioral scoring model (BSM) and the profiling scoring model (PSM). Both the BSM and the PSM provide an individual customer score that is equivalent to the expected expenditure of a customer for the next month (December in this case). In the BSM, 589 individual scoring models are estimated, while the PSM uses one aggregated scoring model based on the past 12 months of transaction data. All statistical analyses were performed using SAS (Statistical Analysis System).

In the BSM for expenditure analysis, 589 regression models, one per customer, are fitted to the transaction data for the 11 months from January through November. The regression model for each customer is $y_t = \alpha + \beta_1(\text{time}) + \varepsilon_t$, where $y_t$ indicates expenditures per restaurant visit per month. In this model, $\alpha$ and $\beta_1$ represent the intercept and the slope for changes of $y_t$ over time (11 months), respectively. The term $\varepsilon_t$ represents individual variability treated as random error, assumed to have mean zero and constant variance $\sigma^2$.
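A minimal sketch of the per-customer regression is given below, assuming ordinary least squares with time coded as 1 through 11 and the score taken as the fitted line evaluated at month 12. The study itself used SAS; the function names and the two expenditure series here are hypothetical.

```python
def fit_line(y):
    """Ordinary least squares fit of y_t = alpha + beta1 * t for t = 1..len(y)."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    beta1 = sxy / sxx
    alpha = y_mean - beta1 * t_mean
    return alpha, beta1

def behavior_score(y, next_month=12):
    """Predicted expenditure for the next month from one customer's own fitted line."""
    alpha, beta1 = fit_line(y)
    return alpha + beta1 * next_month

# Hypothetical monthly expenditures per visit, January through November.
customers = {
    "A": [52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72],  # steadily increasing
    "B": [80, 0, 75, 0, 60, 0, 55, 0, 40, 0, 35],        # irregular and declining
}
scores = {name: behavior_score(y) for name, y in customers.items()}
```

Because one line is fitted per customer, each customer receives an individual intercept, slope, and score, which is what distinguishes this approach from a single pooled model.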

Means and standard deviations of slopes and intercepts for each gender are shown in Table 2. The negative mean slope for females implies that their expenditures per restaurant visit decreased during the period from January through November, while the positive mean slope for males indicates increased expenditures over time. The high standard deviations of slopes and intercepts indicate that large variability exists among individuals. The average intercept for males is also higher than that for females. On average, therefore, male customers appear to spend more money than female customers at restaurants in this hotel, with expenditures increasing over time. However, because of the large variability among individuals, the differences in slope (p=0.1765) and intercept (p=0.1219) between males and females are not statistically significant at the 5% significance level.

Table 2.
Mean differences of intercept and slope of expenditures between male and female

| Parameter estimate (N=589) | Male (N=445) | Female (N=144) |
|---|---|---|
| Intercept (Mean ± SD) | 62.31 ± 84.49 | 52.50 ± 73.19 |
| Slope (Mean ± SD) | 0.35 ± 10.33 | -0.86 ± 7.41 |

Figure 1 presents average values of slope and intercept estimates for food purchases by occupation. The slopes of four of the occupations (housewives, doctors, business owners, and professors) are located below zero, implying that expenditure per restaurant visit decreases from January through November. The slopes of the four other occupations (government officers, lawyers, presidents/chairmen, and businessmen) are located above zero.

Figure 1.
Means of slope and intercept of expenditure by occupation

Note: Each position represents averages of estimates by each occupation

Table 3 summarizes the comparison of mean values of slope and intercept estimates. Although the slopes and intercepts differ across occupations, the large variability among subjects means that the differences among occupations are not statistically significant (p-value=0.8951 for intercept; p-value=0.8275 for slope) at the 5% significance level.

Table 3.
Means and standard deviation of slope and intercept of expenditure by occupations

| Occupation (N=589) | Parameter estimates: intercept (Mean ± SD) | Parameter estimates: slope (Mean ± SD) |
|---|---|---|
| Businessmen (n=128) | 59.43 ± 81.95 | 0.18 ± 0.58 |
| Housewives (n=108) | 49.88 ± 72.48 | -0.65 ± 7.32 |
| Doctors (n=20) | 63.99 ± 53.54 | -1.08 ± 6.57 |
| Business owners (n=6) | 56.14 ± 86.05 | -3.63 ± 7.99 |
| Government officers (n=2) | 21.33 ± 20.17 | 4.02 ± 6.91 |
| Presidents / Chairmen (n=296) | 63.87 ± 88.37 | 0.42 ± 10.39 |
| Lawyers (n=17) | 55.46 ± 62.34 | 1.16 ± 9.43 |
| Professors (n=12) | 63.61 ± 69.08 | -2.55 ± 7.18 |

In the PSM, the 589 customers with eleven months of data are used to build a predictive model. At first, an analysis of variance (ANOVA) model with two factors, gender and occupation, and one time covariate was used for the analysis of expenditures, including two interaction effects: (time×gender) and (gender×occupation). Since neither interaction effect is significant at the 5% significance level, we consider only the main effects of the ANOVA model, without interactions. Therefore, the final PSM is $y_t = \alpha + \beta_1(\text{time}) + \beta_2(\text{gender}) + \sum_{i=1}^{7}\beta_{i+2}(\text{occupation})_i + \varepsilon_t$, where $y_t$ indicates expenditures per month, $\alpha$ represents an intercept, and $\beta_1$ represents a slope for changes of $y_t$ over time (11 months). The term $(\text{occupation})_i$ represents seven dummy variables. The term $\varepsilon_t$ represents individual variability treated as random error, assumed to have mean zero and constant variance $\sigma^2$. According to the summary of the final PSM presented in Table 4, the main effects of gender and occupation are statistically significant at the 5% significance level. The results confirm that gender and occupation play a major role in building the PSM. The PSM is compared with the BSM in terms of model fitting and prediction power.

Table 4.
Analysis of variance of PSM

| Source | df | Sum of squares | Mean squares | F value | p value |
|---|---|---|---|---|---|
| Month | 1 | 200.70 | 200.70 | 0.03 | 0.8552 |
| Gender | 1 | 58859.15 | 58859.15 | 9.77 | 0.0018** |
| Occupation | 7 | 139520.39 | 19931.48 | 3.31 | 0.0016** |

Note: **p<0.05
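To make the dummy-variable structure of the main-effects PSM concrete, the sketch below builds one design row (intercept, time, gender, and seven occupation dummies) and evaluates a linear predictor. The reference category, the gender coding, and the coefficient values are all assumptions made for illustration; the paper does not report them.

```python
OCCUPATIONS = [
    "Businessmen", "Housewives", "Doctors", "Business owners",
    "Government officers", "Presidents / Chairmen", "Lawyers", "Professors",
]
REFERENCE = "Businessmen"  # assumed reference category; not stated in the paper

def design_row(time, gender, occupation):
    """Build [1, time, gender, d1..d7]; gender is coded 1 for male (an assumption)."""
    dummies = [1 if occupation == o else 0 for o in OCCUPATIONS if o != REFERENCE]
    return [1, time, 1 if gender == "male" else 0] + dummies

def predict(coefs, row):
    """Linear predictor: the dot product of the coefficients and a design row."""
    return sum(c * x for c, x in zip(coefs, row))

# Hypothetical coefficients: intercept, time, gender, then the seven occupation dummies.
coefs = [60.0, 0.1, 10.0, -5.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0]

row = design_row(time=12, gender="female", occupation="Doctors")
score = predict(coefs, row)
```

Under this structure, all customers who share a gender and an occupation receive the same predicted value for a given month, which is why the PSM is described as a single aggregated model.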

Ⅳ. Results
1. Model assessment and validation

The performance of the individual behavior models and the aggregate profiling model is evaluated in two ways: model fitting and assessment of prediction power on a test data set (December data). Model validation is a way to evaluate how well the model predicts the data. The validation process is important because the results of data mining are often used for strategic issues throughout an organization. In data mining, there is a danger of over-fitting the model. That is, a model can be highly predictive for a training set but less accurate with data not used in building the model (Groth, 1998). Therefore, the validation process required for data mining is to build the model on some historical data and then apply it to similar historical data that were not used to build it (Berson et al., 1999).

In the training and test method, the entire data set is divided into two data sets: a training set and a test set (or holdout sample). After the model is fitted using the training data set, it is evaluated on the test set. With this method, the results of model assessment are known to be sensitive to how a small data set is split. To overcome this problem, Malthouse and Derenthal (2008) recommend stratified sampling to reduce the variation across splits. In cross validation, one case is excluded from the original sample, and the model is trained on the remaining sample. The trained model then predicts the excluded case. This procedure is repeated for each case, and the prediction accuracy is accumulated over the entire sample. The cross validation method may provide nearly unbiased estimators of the prediction accuracy (Sung, Chang, & Lee, 1999).
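The cross validation procedure described above, in which each case is held out in turn, can be sketched as follows. A simple one-predictor least squares line stands in for the model, and the data are hypothetical; the present study uses the training and test split shown in Figure 2, not this procedure.

```python
def fit_simple(xs, ys):
    """Least squares intercept and slope for a single predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return y_mean - slope * x_mean, slope

def loo_absolute_errors(xs, ys):
    """Leave-one-out: hold out each case, fit on the rest, predict the held-out case."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_simple(train_x, train_y)
        errors.append(abs(ys[i] - (intercept + slope * xs[i])))
    return errors

# Hypothetical observations (illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [12.0, 15.5, 13.0, 18.0, 17.5, 21.0]

errors = loo_absolute_errors(xs, ys)
loo_mae = sum(errors) / len(errors)  # accuracy accumulated over all held-out cases
```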

Figure 2 illustrates how to evaluate model fitting and prediction power in this study. We used the data from January through November to build predictive models of both BSM and PSM. Using the predictive model, the performance of December is predicted and compared to the actual value. In this case, the data for 11 months acts as a training set and the rest of the data in December works as a test set.

Figure 2.
An example of model fitting and prediction power

Assessment of model fitting is performed using deviance and Pearson’s chi-square for frequency of restaurant visits. The deviance and Pearson’s chi-square provide goodness-of-fit measures indicating the discrepancy between the actual frequency and the predicted frequency generated by the predictive model from the training set. For assessment of monetary model fitting, root MSE (mean square error), $R^2$, and adjusted $R^2$ are used as goodness-of-fit measures. The prediction power of the models for frequency of restaurant visits and monetary value is evaluated using MAE (mean absolute error) and MSE.
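For reference, the measures named above can be written out as a short sketch using their plain textbook definitions. Statistical packages may compute root MSE by dividing the residual sum of squares by the residual degrees of freedom instead of by the number of observations, so values from this sketch need not match package output exactly. The four observations are hypothetical.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error, averaged over the number of observations."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Share of the variation in the actual values explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Adjust R^2 for k predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical actual and predicted expenditures.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]

results = {
    "MAE": mae(actual, predicted),
    "MSE": mse(actual, predicted),
    "RMSE": math.sqrt(mse(actual, predicted)),
    "R2": r_squared(actual, predicted),
}
results["adjusted R2"] = adjusted_r_squared(results["R2"], n=len(actual), k=1)
```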

2. Assessment of prediction power

Three statistics, RMSE (root mean square error), $R^2$, and adjusted $R^2$, are employed to compare the model fitting of the BSM and the PSM. Table 5 shows that the BSM generates a smaller RMSE and larger $R^2$ and adjusted $R^2$ than the PSM does. Therefore, the BSM outperforms the PSM in model fitting. The prediction power of the two models is investigated in the next phase to detect potential over-fitting as well as to validate the individual models.

Table 5.
Assessment of model fitting between PSM and BSM

| Model | RMSE | $R^2$ | Adjusted $R^2$ |
|---|---|---|---|
| PSM | 77.64 | 0.012 | 0.011 |
| BSM | 38.15 ± 35.65 | 0.28 ± 0.25 | 0.19 ± 0.28 |

The two models estimated from eleven months of data were used to predict the expenditures per restaurant visit in December in order to assess prediction power. Each predicted expenditure is compared with the actual expenditure per restaurant visit in December. The results are displayed in Tables 6, 7, and 8. Table 6 shows that the BSM yields a pattern of expenditures similar to the actual expenditures in December. The correlation coefficient between the true value and the predicted value is higher for the BSM (0.5904) than for the PSM (0.1516). Table 7 provides mean values and standard deviations of MAE and MSE for the PSM and BSM. The BSM outperforms the PSM in prediction, with lower mean values of MAE and MSE. Table 8 summarizes the distributions of MAE and MSE in both models. The five-number summaries indicate that the BSM is superior to the PSM.

Table 6.
Mean and standard deviation of predicted and true value

| Variable | N | Mean ± SD | Correlation coefficient¹ |
|---|---|---|---|
| Predicted value by PSM | 589 | 65.55 ± 8.69 | 0.1516 (p-value = 0.0002) |
| Predicted value by BSM | 589 | 62.90 ± 74.50 | 0.5904 (p-value < 0.0001) |
| True value | 589 | 68.91 ± 86.94 | - |

Note: ¹ indicates correlation between predicted value and true value

Table 7.
Evaluation of prediction power with December

| Model | N | MAE ± SD | MSE ± SD |
|---|---|---|---|
| PSM | 589 | 60.26 ± 61.95 | 7462.39 ± 23933.84 |
| BSM | 589 | 44.50 ± 59.27 | 5487.17 ± 16670.38 |

Table 8.
Five number summary of prediction power with December

| Measure | Model | Maximum | 75% (Q3) | 50% (Q2) | 25% (Q1) | Minimum |
|---|---|---|---|---|---|---|
| MAE | PSM | 591.64 | 67.57 | 46.48 | 28.03 | 0.65 |
| MAE | BSM | 468.84 | 59.45 | 24.65 | 6.56 | 0 |
| MSE | PSM | 350042.89 | 4545.63 | 2160.67 | 785.58 | 0.42 |
| MSE | BSM | 21980.95 | 3534.84 | 607.47 | 43.05 | 0 |

3. Customer segmentation by predictive expenditures

Market segmentation describes the division of a market into homogeneous groups that will respond differently to promotions, communications, advertising, and other marketing mix variables. Direct marketers want to move away from mass-marketing campaigns and use a more consumer-oriented approach. This is done based on the behaviors exhibited by customers, such as using similar services and products (Westphal & Blaxton, 1998). Segmenting techniques look for similarities and differences within a data set and group similar rows together into segments or clusters. The assumption is that similarity is high within a segment and differences are large between segments. There have been two traditional approaches to specifying market segments. The first is to classify customers by objective variables such as sex, age, life cycle stage, and personality. The second is based on situation-specific events, such as purchases and users of specific products, brand-loyal versus non-brand-loyal users, and attitude toward the brand (Frank, Massy, & Wind, 1972). The optimal number of segments is a subject of continuing research, although many approaches to segmentation allow the user to decide the number of segments (Groth, 1998).

The customers are first scored by predictive estimates of their expected expenditure for the next month. Segmentation is then performed on the scores with three groups: high value (top 25% of scored customers), middle value (between the top 25% and 50%), and low value (below 50%). The distribution of the 589 scores is summarized in Table 9. The average score is 153.04, implying that the 589 active customers are expected to spend $153.04 each, on average, in the next month (December in this case). Table 9 also shows that the distribution of scores is highly skewed to the right, with a few extremely high scores. It is interesting to note that about 16% of customers can be treated as high value customers spending at least $150 in the next month.

Table 9.
Distribution of customer score by purchasing pattern (Unit: US dollar)
| Mean ± SD | Maximum | 75% (Q3) | 50% (Q2) | 25% (Q1) | Minimum |
|---|---|---|---|---|---|
| 153.04 ± 355.30 | 4466.24 | 149.48 | 35.07 | 3.16 | 0 |
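
The three-group segmentation described above can be sketched with the cut-offs from Table 9 (median 35.07, Q3 149.48). This is a minimal illustration: the function names are hypothetical, the assignment of scores that fall exactly on a cut-off to the higher group is an assumption, and the quartile method used in `quartile_cutoffs` may differ from the one used in the study.

```python
import statistics

def quartile_cutoffs(scores):
    """Return (median, Q3) of the scores; interpolation method is assumed."""
    _q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q2, q3

def segment(score, median, q3):
    """Assign a customer to a value segment from the predicted expenditure."""
    if score >= q3:
        return "high"    # top 25% of scored customers
    if score >= median:
        return "middle"  # between the top 25% and 50%
    return "low"         # below 50%

# Cut-offs reported in Table 9 (US dollars)
MEDIAN, Q3 = 35.07, 149.48
labels = [segment(s, MEDIAN, Q3) for s in (4466.24, 149.48, 35.07, 3.16, 0.0)]
```

Fixed quartile cut-offs keep the segments easy to explain to marketers, at the cost of ignoring the heavy right tail inside the high value group.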

In fact, about 68% of customers, namely the dormant and low value customers, hardly contribute to sales, spending less than $35 in the next month. As shown in Figure 3, the scores are validated through the relationship of the segments with the average expenditure per visit and the number of restaurant visits per month. High value customers clearly have both a higher average expenditure per visit and more visits per month.

Figure 3.
Relationship of segmentations with expenditures per visit and the number of visits

Ⅴ. Conclusions

The purpose of this research was to make efficient use of customers' historical transaction data through scoring models in the context of hotel restaurants. Purchasing history is the source for BSM, while demographic information such as gender and occupation can be important factors in PSM. Unlike traditional behavior scores such as the RFM measure, we proposed a behavior score defined as the predicted expenditure for the next month. The score includes all historical information, with emphasis on recent transactions. It is easy to understand because the score itself is an expenditure. In particular, the proposed behavior score is a powerful index for predicting existing customers' future behavior.

In BSM, a customer's transactions over the past 11 months can be summarized by the intercept and slope of a regression model. A high intercept with a negative slope indicates that the customer is leaving during the given period, so churn analysis is required for further understanding. If a customer has a medium or high intercept with a positive slope, cross-selling or up-selling campaigns might be appropriate to increase expenditure. Figure 1 and Table 3 show the average slopes and intercepts for each occupation. The standard deviations of both intercept and slope are high because individual customers within the same occupation vary widely. This variability in customer behavior explains the poor prediction performance of PSM. It is therefore natural that behavior scores from BSM have higher prediction power.
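
The intercept-and-slope summary can be illustrated with a short sketch. It assumes an ordinary least-squares line of monthly expenditure on the month index (1 to 11), which may differ from the regression actually used in the study, for example in how recent transactions are weighted. The function names, the `min_intercept` cut-off of 50, and the sample spending series are all hypothetical.

```python
def fit_trend(monthly_spend):
    """Least-squares intercept and slope of spend against month index 1..n."""
    n = len(monthly_spend)
    if n < 2:
        raise ValueError("need at least two months of data")
    months = range(1, n + 1)
    mean_x = sum(months) / n
    mean_y = sum(monthly_spend) / n
    sxx = sum((x - mean_x) ** 2 for x in months)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, monthly_spend))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def suggest_action(intercept, slope, min_intercept=50.0):
    """Map a customer's trend to a follow-up; min_intercept is hypothetical."""
    if intercept >= min_intercept and slope < 0:
        return "churn analysis"
    if intercept >= min_intercept and slope > 0:
        return "cross-selling / up-selling"
    return "no specific action"

# Hypothetical customer whose spending falls by 5 dollars every month
leaving = [100.0 - 5.0 * m for m in range(1, 12)]
intercept, slope = fit_trend(leaving)
action = suggest_action(intercept, slope)
```

A single cut-off on the intercept is used here only to keep the sketch short; in practice, separate thresholds for "medium" and "high" intercepts would follow the distinction made in the text more closely.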

However, this BSM study has several limitations. First, handling personally identifiable data when analyzing individual behavior raises an important privacy issue. To comply with privacy laws, all personally identifiable information was deleted during data handling. Because the members of the restaurant studied belong to a particular class of customers, we decided not to name the restaurant, to prevent any possibility of personal identification. We also decided to limit the use of personal behavior data to research purposes only. Consequently, the source of the data cannot be disclosed in detail.

The second limitation is that BSM cannot be applied to new customers who have no historical transaction data. In other words, BSM is only applicable to existing customers. Lastly, it cannot identify potential customers in the low value segment. In these cases, the profile score rather than the behavior score plays an important role. For example, according to Figure 1 and Table 3, government officers, lawyers, presidents/chairmen, and businessmen have high profile scores, so these groups can be targeted as new or potential customers. Although this study compares the prediction power of BSM and PSM as competing models, PSM is an excellent complement to BSM in distinguishing customers. In managing new and existing customers, marketers should consider how to combine BSM, based on individual transaction data, and PSM, based on aggregated demographic data, as a powerful tool for understanding customers and implementing strategies. In practice, BSM can be used to identify and retain loyal customers and prevent churn, while PSM is a useful tool for identifying potential customers who have no history or a poor historical record. Promotion or up-selling campaigns might then be applied to turn them into valued customers.

References