New Developments in Advanced Statistical Methods: Introducing Multiple Regression
*The purpose of this post is to clarify statistical and interpretative issues surrounding common uses of multiple regression in research on brand equity. Given the breadth of multiple regression, this post will function as a primer on the topic.*
Brand Asset Valuator (BAV) has measured brand equity for over two decades and provides an empirical model for understanding changes in brand equity. Moreover, BAV is differentiated from other brand equity models because it is predictive and primarily concentrates on leading indicators (Energized Differentiation and Relevance) rather than lagging indicators (Esteem and Knowledge). Understanding developmental growth in brand equity is critical for explaining how brands ebb and flow over time. Multiple regression is an important statistical analysis in brand equity research because it enables us to predict the value of a dependent variable (e.g., brand loyalty) as accurately as possible from explanatory (predictor) variables (e.g., rewards programs, income, age, gender).
Multiple regression examines the relation between a single outcome and several independent variables. It is applied when an outcome variable (Y) is assumed to be a linear (straight-line) function of a set of predictor variables (X), such that

$$Y = a + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

where k is the number of predictor variables, a is a numerical constant that represents the intercept, and each β is a constant reflecting how much change in Y results from a one-unit change in its associated X variable, holding all other X variables constant. The error term (ε) captures departures of the observed data from this linear relation: a plot of a predictor X against the outcome Y is only approximated by a straight line, and ε represents the scatter around it.

It is imperative that the predictor variables are not very highly correlated with one another (r ≥ .90), because this presents both logical and statistical problems. The logical problem is that, unless you are analyzing structure (factor analysis, principal components, or structural-equation modeling), it is not a good idea to include redundant variables in the same analysis. The statistical problem, known as multicollinearity, is that redundant variables inflate the standard errors of the coefficient estimates, which makes the βs unstable and weakens the analysis. This is worth highlighting because redundant predictors are a common reason why regression-based predictions fail.
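The model and the redundancy check can be sketched in Python with NumPy. This is a minimal illustration on simulated data, assuming standardized predictors and invented effect sizes; the variable names `income`, `discounts`, and `income_copy` are hypothetical. It estimates a and the βs by ordinary least squares, flags predictor pairs correlated at .90 or above, and reports each predictor's variance inflation factor (VIF), the factor by which multicollinearity inflates the sampling variance of that predictor's β.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares estimates of a (intercept) and the betas for y = a + X @ beta + e."""
    design = np.column_stack([np.ones(len(y)), X])  # column of 1s carries the intercept a
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def redundant_pairs(X, names, threshold=0.90):
    """Predictor pairs whose absolute Pearson correlation is at or above the threshold."""
    r = np.corrcoef(X, rowvar=False)
    k = X.shape[1]
    return [(names[i], names[j], round(float(r[i, j]), 3))
            for i in range(k) for j in range(i + 1, k)
            if abs(r[i, j]) >= threshold]

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    predictor j on all of the other predictors."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        a_j, b_j = fit_ols(others, X[:, j])
        resid = X[:, j] - (a_j + others @ b_j)
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1 / (1 - r2))
    return np.array(out)

# Simulated data (hypothetical true model: loyalty = 2 + 0.5*income + 0.3*discounts + e)
rng = np.random.default_rng(0)
n = 500
income = rng.normal(size=n)
discounts = rng.normal(size=n)
income_copy = income + rng.normal(scale=0.1, size=n)  # near-duplicate of income (redundant)
loyalty = 2 + 0.5 * income + 0.3 * discounts + rng.normal(size=n)

X = np.column_stack([income, discounts])
a, betas = fit_ols(X, loyalty)
print("intercept a:", a, "betas:", betas)  # should be close to 2 and [0.5, 0.3]

X_red = np.column_stack([income, discounts, income_copy])
print(redundant_pairs(X_red, ["income", "discounts", "income_copy"]))
print("VIFs:", vif(X_red))
```

Only the `income`/`income_copy` pair should be flagged, and both of those predictors should show VIFs far above the commonly cited rules of thumb of 5 or 10, while `discounts` stays near 1.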
A common use of multiple regression is to assess the effect of a variable on an outcome while statistically controlling for covariates. Simply put, a covariate is a variable that is related to the dependent variable (DV), and often to the independent variables (IVs), but is not the focus of the study; controlling for it keeps its influence from being attributed to the IVs. For example, if a researcher wants to know how well income and promotional discounts predict brand loyalty after the effects of gender and age are controlled for, he/she would enter gender and age in Block 1 and then income and promotional discounts in Block 2. Once all blocks of variables are entered, the overall model is assessed in terms of its ability to predict brand loyalty (the dependent variable). The relative contribution of each block of predictors (IVs) is also assessed.
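A small simulation makes "statistically controlling" concrete. This sketch assumes made-up effect sizes in which age influences both income and brand loyalty; plain NumPy least squares stands in for whatever statistics package a researcher would actually use. It compares the income coefficient with and without age in the model.

```python
import numpy as np

def ols(columns, y):
    """Return [a, b1, ..., bk] for y regressed on the given predictor columns."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 2000
age = rng.normal(size=n)                                  # standardized age (simulated)
income = 0.6 * age + rng.normal(size=n)                   # income partly driven by age
loyalty = 0.2 * income + 0.5 * age + rng.normal(size=n)   # hypothetical true effects

naive = ols([income], loyalty)          # income alone: absorbs part of age's effect
adjusted = ols([income, age], loyalty)  # income with age (the covariate) held constant

print("income effect, no control:    ", naive[1])
print("income effect, age controlled:", adjusted[1])
```

Without the control, the income coefficient picks up part of age's influence and lands well above the simulated 0.2; with age in the model it should come back close to 0.2.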
In hierarchical regression, IVs enter the regression equation in an order specified by the researcher. At Step 1 (Block 1), only the covariates are entered into the equation. At Step 2 (Block 2), the set of focal predictors or drivers (main effects) is added to the equation. Using our example above, when predicting brand loyalty (Y), a researcher might enter variables representing gender (C1) and age (C2) at Step 1 (Block 1), yielding the equation:

$$Y = a + \beta_1 C_1 + \beta_2 C_2 + \varepsilon$$

In Step 2 (Block 2), income (X1) and promotional discounts (X2) are added, yielding:

$$Y = a + \beta_1 C_1 + \beta_2 C_2 + \beta_3 X_1 + \beta_4 X_2 + \varepsilon$$

The intercept and covariate coefficients are re-estimated at Step 2, so their values generally differ from Step 1. Because gender and age are entered first, the Step 2 coefficients for income and promotional discounts describe their relation to brand loyalty (Y) with gender and age held constant. The change in R² (the proportion of total variance accounted for by the regression model) from Step 1 to Step 2, written ΔR², gives a more precise measure of what the focal predictors add. In other words, the possible effect of gender and age has been ‘removed’, and we can determine whether our predictors (IVs) still explain some of the remaining variance in the dependent variable. This gives a more specific picture of what drives overall brand loyalty.
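The two-step procedure can be sketched as a loop that enters blocks cumulatively and records R², ΔR², and the F-change statistic at each step. This is a minimal NumPy illustration under stated assumptions: the data are simulated, the coding of gender (0/1), the scales, and the effect sizes are invented, and `r_squared` and `hierarchical` are hypothetical helper names rather than any package's API.

```python
import numpy as np

def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def hierarchical(blocks, y):
    """Enter blocks cumulatively; return (R^2, delta R^2, F-change) for each step."""
    n = len(y)
    entered, prev_r2, results = [], 0.0, []
    for block in blocks:
        entered += block                    # cumulative: earlier blocks stay in the model
        r2 = r_squared(entered, y)
        q, k = len(block), len(entered)     # predictors added now, total predictors
        f_change = ((r2 - prev_r2) / q) / ((1 - r2) / (n - k - 1))
        results.append((r2, r2 - prev_r2, f_change))
        prev_r2 = r2
    return results

rng = np.random.default_rng(2)
n = 1000
gender = rng.integers(0, 2, size=n).astype(float)   # C1, coded 0/1 (assumed)
age = rng.normal(40, 12, size=n)                    # C2, years
income = rng.normal(60, 15, size=n) + 0.3 * age     # X1, thousands, partly tied to age
discounts = rng.normal(size=n)                      # X2, standardized exposure
loyalty = (1.0 + 0.2 * gender + 0.02 * age
           + 0.03 * income + 0.4 * discounts + rng.normal(size=n))

steps = hierarchical([[gender, age], [income, discounts]], loyalty)
for i, (r2, dr2, f) in enumerate(steps, start=1):
    print(f"Step {i}: R2 = {r2:.3f}, delta R2 = {dr2:.3f}, F-change = {f:.1f}")
```

Step 1's F-change is simply the overall F test for the covariate-only model. To turn an F-change value into a p-value, compare it with an F distribution on q and n − k − 1 degrees of freedom, for example with `scipy.stats.f.sf`.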
Multiple regression is a standard statistical technique in brand equity research. This post highlights selected issues that may affect researchers who use multiple regression. These issues include the need to consider the role of covariates and to evaluate predictor relevance (avoiding redundancy when selecting predictor or driver variables). Considering these issues should improve the actionable insights that multiple regression can offer researchers as they build models of brand equity.