Manipulating R formula

When you’ve created an analysis model in R you will have specified the variables in a formula. R “recognises” formula objects, which have their own class, formula. If, for example, you used the lm() command to create a regression model, you can extract the formula from the result with the formula() command:

mod <- lm(Fertility ~ ., data = swiss)
formula(mod)

Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality
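To check that what comes back is a genuine formula object and not just text, you can look at its class. This is a small base-R sketch. all.vars() is an extra base function that the original post does not use; it appears here only as a convenient way to list every variable name in the formula at once.

```r
# Fit a model using every other column of swiss as a predictor
mod <- lm(Fertility ~ ., data = swiss)

# formula() pulls the model formula back out, with the "." expanded
fm <- formula(mod)
class(fm)      # "formula"

# all.vars() lists every variable name used in the formula, response first
all.vars(fm)
```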

It can be useful to extract the components of the model formula. For example, you may want to examine how the R² value alters as you add variables to the model.

Extract the predictor variables

To access the parts of a formula you need the terms() command. Applied to the model formula, terms(formula(mod)) returns an object whose components are stored as attributes; you want the term.labels attribute.

attr(terms(formula(mod)), which = "term.labels")
[1] "Agriculture" "Examination" "Education" "Catholic"
[5] "Infant.Mortality"
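To see what else a terms object holds, list its attributes. This is a minimal base-R sketch; only term.labels is used in the rest of the post, and the other attributes are shown purely for orientation.

```r
mod <- lm(Fertility ~ ., data = swiss)
tt <- terms(formula(mod))

# A terms object stores its components as attributes
names(attributes(tt))

# A few of the more useful ones
attr(tt, "term.labels")   # predictor (term) names
attr(tt, "response")      # 1 if the formula has a response, 0 if not
attr(tt, "intercept")     # 1 if an intercept is included, 0 if not
```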

You now have the predictor variables from the formula. The next step is to get the response variable.

Extract the response variable

The response variable can be seen using the terms() command and its variables attribute, like so:

attr(terms(formula(mod)), which = "variables")
list(Fertility, Agriculture, Examination, Education, Catholic,
    Infant.Mortality)

The result looks slightly odd: it is an unevaluated call to list(). The first element of that call is list itself, so the second element is the response.

vv <- attr(terms(formula(mod)), which = "variables")
rr <- as.character(vv[[2]]) # The response variable name
rr
[1] "Fertility"
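Taking the 2nd element assumes the formula actually has a response. A slightly more defensive sketch reads the response position from the terms object's response attribute instead. It uses deparse() rather than as.character(), an assumption on my part that you may meet transformed responses such as log(Fertility), which as.character() would split into pieces. The second approach relies on a two-sided formula being a call to `~` whose second element is the left-hand side.

```r
mod <- lm(Fertility ~ ., data = swiss)
tt  <- terms(formula(mod))
vv  <- attr(tt, "variables")

# The "response" attribute gives the position of the response among the
# variables (0 means there is none); add 1 to skip the leading list element
resp_idx <- attr(tt, "response")
if (resp_idx == 0) stop("The formula has no response variable")
rr <- deparse(vv[[resp_idx + 1]])
rr

# Alternative: element 2 of a two-sided formula is its left-hand side
rr2 <- deparse(formula(mod)[[2]])
rr2
```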

Now you have the response variable, and the predictors from earlier, which you can use to “build” a formula.

Building a formula

At its simplest, a formula can be written as a character string that “conforms” to the formula syntax, for example y ~ x + z. You can build such a string with the paste() command by joining the response, a ~ character and the predictors you want (these themselves separated by + characters).

The following example uses the swiss dataset, which is built into base R.

mod <- lm(Fertility ~ ., data = swiss)

# Get the (predictor) variables
vars <- attr(terms(formula(mod)), which = "term.labels")

# Get the response
vv <- attr(terms(formula(mod)), which = "variables")
rr <- as.character(vv[[2]]) # The response variable name

# Now the predictors (use one of these lines)
pp <- paste(vars, collapse = " + ")         # All
# pp <- paste(vars[1], collapse = " + ")    # 1st only
# pp <- paste(vars[1:3], collapse = " + ")  # 1st, 2nd and 3rd

# Build a formula (paste() puts a single space between its arguments)
fml <- paste(rr, "~", pp)
fml
[1] "Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality"

Once you have your formula as a character object you can use it in place of a regular formula in commands.

Using a “built” formula

In lm() the character string representing a formula can be used just as you would a “regular” formula:

lm(fml, data = swiss)

Call:
lm(formula = fml, data = swiss)

Coefficients:
     (Intercept)       Agriculture       Examination         Education  
         66.9152           -0.1721           -0.2580           -0.8709  
        Catholic  Infant.Mortality  
          0.1041            1.0770
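Not every function will accept a plain character string where it expects a formula. A cautious option is to convert the string explicitly with as.formula(). Base R's reformulate() (package stats) builds the same formula directly from the term labels; it is a swapped-in alternative to the paste() approach above, not the method used in this post.

```r
mod  <- lm(Fertility ~ ., data = swiss)
vars <- attr(terms(formula(mod)), which = "term.labels")

# Convert a pasted string into a genuine formula object
fml <- paste("Fertility", "~", paste(vars[1:3], collapse = " + "))
f1  <- as.formula(fml)

# reformulate() builds the same formula without any string pasting
f2 <- reformulate(vars[1:3], response = "Fertility")

class(f1)
f2

# Either version can be passed straight to lm()
fit <- lm(f2, data = swiss)
coef(fit)
```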

One use for building a formula is in model testing. For example, you might create a regression model containing five predictors when perhaps only the first three are really necessary. You can rebuild the formula term by term and extract the R² value each time. This shows how the explained variance alters as you add more variables.
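The term-by-term idea can be sketched as a simple loop. It adds the predictors in the order they appear in the formula (an arbitrary choice; a different order gives different intermediate values) and stores each model's R² from summary(). Because each model is nested in the next, R² can only stay the same or rise.

```r
# Fit the full model and pull out the pieces of its formula
mod  <- lm(Fertility ~ ., data = swiss)
tt   <- terms(formula(mod))
vars <- attr(tt, "term.labels")
rr   <- as.character(attr(tt, "variables")[[2]])

# Add predictors one at a time and record the R-squared of each model
r2 <- numeric(length(vars))
for (i in seq_along(vars)) {
  fml <- paste(rr, "~", paste(vars[1:i], collapse = " + "))
  fit <- lm(as.formula(fml), data = swiss)
  r2[i] <- summary(fit)$r.squared
}
names(r2) <- vars

# Label i is the last predictor added in model i
round(r2, 3)
```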

In another posting I’ll show how this process can be used for cross-validation.

See more tips and tricks at


Add more to a histogram

A basic histogram

A histogram is a standard way to present the distribution of a sample of numbers. It is easy to make a histogram using R with the hist() command. For example:

x <- rnorm(n = 50, mean = 10, sd = 1)
hist(x, col = "skyblue")

Produces a histogram resembling this:

[Figure: Basic histogram]
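Besides drawing the plot, hist() returns an object describing the bins. That can help when you plan to layer more on top, for example to see how tall the bars are. This sketch adds set.seed() (not in the original) so the random sample is reproducible, and uses plot = FALSE to get the bin information without drawing anything.

```r
set.seed(1)  # make the random sample reproducible
x <- rnorm(n = 50, mean = 10, sd = 1)

# plot = FALSE returns the bin information without drawing the histogram
h <- hist(x, plot = FALSE)
h$breaks   # bin boundaries
h$counts   # number of values in each bin
```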

Add a rug

A rug plot can be added to more or less any graphic. The rug() command can add the rug to any side of the plot, chosen with the side argument:

  • side = 1 is the bottom axis
  • side = 2 is the left axis
  • side = 3 is the top axis
  • side = 4 is the right axis

You can alter the colour and width of the rug lines using regular graphical parameters such as col and lwd:

rug(x, side = 1, col = "blue")

Adds the rug like so:

[Figure: A rug plot added to a histogram]
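The sketch below varies the rug on two sides of the same histogram. The specific widths, colours and tick length are arbitrary illustrative values. lwd sets the line width and ticksize sets the tick length as a fraction of the plot region.

```r
set.seed(1)  # reproducible sample
x <- rnorm(n = 50, mean = 10, sd = 1)

hist(x, col = "skyblue")

# Thicker rug lines along the bottom axis
rug(x, side = 1, col = "blue", lwd = 2)

# Longer, thinner ticks along the top axis
rug(x, side = 3, col = "darkblue", ticksize = 0.06)
```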

Add a strip chart

A strip chart can also be added to an existing plot with the stripchart() command. You need to specify add = TRUE so that it draws on the current plot rather than starting a new one. A bit of jitter helps to separate points that coincide:

stripchart(x,
           method = "jitter",
           pch = 23,
           bg = "pink",
           add = TRUE)

The final plot looks like so:

[Figure: A histogram with added rug and strip plot]

There are many additional options for the stripchart() command.
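Two options worth knowing are at, which sets the height (y value) of the strip on a horizontal chart, and jitter, which sets how far the points are spread around it. The values used here (at = 2, jitter = 0.3) are arbitrary choices for illustration, not settings from the original post.

```r
set.seed(1)  # reproducible sample
x <- rnorm(n = 50, mean = 10, sd = 1)

hist(x, col = "skyblue")
rug(x, side = 1, col = "blue")

# Raise the strip to y = 2 and spread the points a little more
stripchart(x,
           method = "jitter",
           jitter = 0.3,
           at = 2,
           pch = 23,
           bg = "pink",
           add = TRUE)
```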


Statistics for Ecologists Using R and Excel

This is a book about the scientific process and how you apply it to data in ecology. You will learn how to plan for data collection, how to assemble data, how to analyze data and finally how to present the results. The book uses Microsoft Excel and the powerful Open Source R program to carry out data handling as well as producing graphs.