When you create an analysis model in R, you specify the variables in a formula. R recognises formula objects, which have their own class, `formula`. If, for example, you used the `lm()` command to create a regression result, you can extract the formula from the result.

    mod <- lm(Fertility ~ ., data = swiss)
    formula(mod)

Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality

It can be useful to extract the components of the model formula. For example, you may want to examine how the R² value changes as you add variables to the model.

## Extract the predictor variables

To access the parts of a formula you need the `terms()` command:

terms(formula(mod))

The result contains various components, stored as attributes; you want `term.labels`.

attr(terms(formula(mod)), which = "term.labels")

    [1] "Agriculture"      "Examination"      "Education"        "Catholic"
    [5] "Infant.Mortality"

You now have the predictor variables from the formula. The next step is to get the response variable.
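As a small aside, `term.labels` lists model terms rather than raw variable names, which matters once a formula contains interactions. The following sketch uses a made-up formula (`y ~ a * b` is purely illustrative and not part of the `swiss` example) to show the difference; `all.vars()` is the base R function that returns the variable names themselves.

```r
# Hypothetical formula, used only for illustration
f <- y ~ a * b

# Term labels include the expanded interaction term
tl <- attr(terms(f), which = "term.labels")
tl

# all.vars() returns the variable names, response included
av <- all.vars(f)
av
```

For a model with main effects only, such as the `swiss` regression, the term labels and the predictor names coincide.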

## Extract the response variable

The response variable can be seen using the `terms()` command and the `variables` component, like so:

attr(terms(formula(mod)), which = "variables")

list(Fertility, Agriculture, Examination, Education, Catholic, Infant.Mortality)

The result looks slightly odd: it is an unevaluated call to `list()` rather than an ordinary list. The first element of the call is the function name, `list`, so the 2nd element is the response.

    vv <- attr(terms(formula(mod)), which = "variables")
    rr <- as.character(vv[[2]]) # The response variable name
    rr

[1] "Fertility"
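To see why the second element is the one you want, it can help to inspect the `variables` attribute directly. This sketch assumes the same `swiss` model as above. The `response` attribute of a `terms` object gives the position of the response among the variables (0 if there is none), so adding 1 skips over the leading `list`.

```r
mod <- lm(Fertility ~ ., data = swiss)
tt <- terms(formula(mod))
vv <- attr(tt, which = "variables")

class(vv)   # an unevaluated call to list()
vv[[1]]     # the function name, list
vv[[2]]     # the first variable, here the response

# Position of the response among the variables
ri <- attr(tt, which = "response")
rr <- as.character(vv[[ri + 1]])
rr
```

Using the `response` attribute rather than a hard-coded 2 is a design choice that guards against one-sided formulas, where there is no response to extract.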

Now you have the response variable, and the predictors from earlier, which you can use to “build” a formula.

## Building a formula

For many modelling commands, a character string that “conforms” to the formula syntax (`y ~ x + z`, for example) can stand in for a formula. You can build such a string with the `paste()` command by joining the response, a `~` character and the predictors you want (these themselves separated by `+` characters).

The following example uses the `swiss` dataset, which is built into base R.

    mod <- lm(Fertility ~ ., data = swiss)

    # Get the (predictor) variables
    vars <- attr(terms(formula(mod)), which = "term.labels")

    # Get the response
    vv <- attr(terms(formula(mod)), which = "variables")
    rr <- as.character(vv[[2]]) # The response variable name

    # Now the predictors (keep one of these lines)
    pp <- paste(vars, collapse = " + ")        # All
    # pp <- paste(vars[1], collapse = " + ")   # 1st
    # pp <- paste(vars[1:3], collapse = " + ") # 1,2,3

    # Build a formula (paste() already inserts a space between the parts)
    fml <- paste(rr, "~", pp)
    fml

[1] "Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality"

Once you have your formula as a character object you can use it in place of a regular formula in many modelling commands, such as `lm()`.
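Not every command accepts a character string, so it is often safer to convert the string to a genuine formula object first. A minimal sketch, assuming the `swiss` model from above: `as.formula()` converts the string, while `reformulate()` (also part of base R's stats package) builds a formula directly from the term labels and the response name.

```r
mod <- lm(Fertility ~ ., data = swiss)
vars <- attr(terms(formula(mod)), which = "term.labels")
vv <- attr(terms(formula(mod)), which = "variables")
rr <- as.character(vv[[2]])   # response name

# Route 1: paste a string, then convert it
fml <- paste(rr, "~", paste(vars, collapse = " + "))
f1 <- as.formula(fml)

# Route 2: build the formula object directly
f2 <- reformulate(vars, response = rr)

class(f1)
class(f2)

# Both give the same coefficients as the original model
all.equal(coef(lm(f1, data = swiss)), coef(mod))
all.equal(coef(lm(f2, data = swiss)), coef(mod))
```

Either route gives an object of class `formula`, which should behave like a formula typed by hand.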

## Using a “built” formula

The character string representing a formula can be passed to `lm()` just as you would a “regular” formula:

lm(fml, data = swiss)

    Call:
    lm(formula = fml, data = swiss)

    Coefficients:
         (Intercept)       Agriculture       Examination         Education
             66.9152           -0.1721           -0.2580           -0.8709
            Catholic  Infant.Mortality
              0.1041            1.0770

One use for building a formula is in model testing. For example, you might create a regression model containing five predictors when perhaps only the first three are really necessary. You can rebuild the formula term by term and extract the R² value at each step. This shows how the explained variance changes as you add more variables.
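The term-by-term rebuild can be sketched as a short loop. This is one possible approach rather than a definitive recipe: it assumes the `swiss` model used throughout, adds predictors in the order they appear in `term.labels`, and reads R² from the `r.squared` component of `summary()`.

```r
mod <- lm(Fertility ~ ., data = swiss)
vars <- attr(terms(formula(mod)), which = "term.labels")
vv <- attr(terms(formula(mod)), which = "variables")
rr <- as.character(vv[[2]])

# R-squared for models using the first 1, 2, ... predictors
r2 <- sapply(seq_along(vars), function(i) {
  pp <- paste(vars[1:i], collapse = " + ")
  fml <- as.formula(paste(rr, "~", pp))
  summary(lm(fml, data = swiss))$r.squared
})

# Label each value by the predictor added at that step
names(r2) <- vars
round(r2, 3)
```

The values depend on the order in which the predictors are added, so a different ordering of `vars` will generally give a different sequence, although the final value always matches the full model.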

In another posting I’ll show how this process can be used for cross-validation.

See more tips and tricks at DataAnalytics.org.uk

Good stuff. When someone mentions the beta coefficient, what are they referring to?


The beta coefficients are standardised against one another, so they are essentially in the same scale. The original coefficients are in units corresponding to the individual variables.


Perfect – then in R there is the `lm.beta` package. Thanks for explaining the difference between the two.


See also an article on one of my other sites: http://www.dataanalytics.org.uk/Data%20Analysis/R%20Monographs/BetaCoeff.htm where there is some more explanation and some code.
