The output is as shown in range F3:J19 of Figure 1.
Now click on Gender from the list and press the Add Code button and, finally, click on Income from the list and press the Add Column button. As you can see, the Party#1 and Party#2 variables have been added to the worksheet. The result is shown on the left side of Figure 5. Since the Ordinary coding option was selected, the 0, 1 coding is used. Next, select Party from the list box and press the Add Code button (as shown on the right side of Figure 5). Now, select Age in the list box and press the Add Column button (to copy the Age data to the output). Now enter A3:D19 into the Input Range in the dialog box (as shown on the right side of Figure 4) and press the OK button.įigure 4 – Coding the Party variable via the data analysis tool You can use the Real Statistics Extract Columns from a Data Range data analysis tool to automate the coding of categorical variables.įor example, to create the coding for the Party and gender variables from Example 1, press Ctrl-m and select Extract Columns from a Data Range from the menu. Similarly, we see that the model forecasts that a 40 year-old man who is Independent will have an income of 52,697 (cell J23).įigure 3 – Forecasting with categorical data From Figure 3, you can see that the model forecasts that this person would have an income of 24,494 (cell J22) as calculated by the formula =TREND(J4:J19,F4:I19,F22:I22). The model can predict the income of a 25 year old woman who is a Democrat, provided you recognize that the coding is Age = 25, Gender1 = 0, Party1 = 0, Party2 = 1. The output from the Real Statistics Linear Regression data analysis tool on this input is shown in Figure 2.įigure 2 – Regression with categorical data We can now perform regression analysis on this range. The resulting coding is as shown in range F3:J19 of Figure 1. Party2 = 1 if Party is Dem and Party2 = 0 otherwise Party1 = 1 if Party is Rep and Party1 = 0 otherwise Since Party takes three values (Rep, Dem, Ind), two dummy variables, called Party1 and Party2, are needed to code Party, defined as follows: Gender1 = 1 if Gender is Male and Gender1 = 0 otherwise (i.e. Since Gender takes two values (Male and Female), one dummy variable, called Gender1, is sufficient to code Gender, defined as follows: In general, if the original data has k categorical values, the model will require k – 1 dummy variables.
There are three possible values for the Party affiliation variable and two possible values for the Gender. Each such dummy variable will only take the value 0 or 1 (although in ANOVA using Regression, we describe an alternative coding that takes values 0, 1 or -1).Įxample 1: Create a regression model for the data in range A3:D19 of Figure 1.Īge is a continuous random variable, while Party affiliation and Gender are categorical random variables. Categorical independent variables can be used in a regression analysis, but first, they need to be coded by one or more dummy variables (also called tag variables).