Functions

Functions provide an efficient way to automate repetitive tasks, leading to cleaner, more maintainable code.

Why Functions are Useful

If you find yourself copying and pasting a code block more than twice, it’s time to consider creating a function.

Consider the following example code. The data are generated with rnorm(), so your numbers will differ from the output shown:

library(tidyverse)

df <- tibble( 
  a = rnorm(5),
  b = rnorm(5),
  c = rnorm(5),
  d = rnorm(5)
)

df |> mutate(
  a = (a - min(a, na.rm = TRUE)) / 
    (max(a, na.rm = TRUE) - min(a, na.rm = TRUE)),
  b = (b - min(b, na.rm = TRUE)) / 
    (max(b, na.rm = TRUE) - min(b, na.rm = TRUE)),
  c = (c - min(c, na.rm = TRUE)) / 
    (max(c, na.rm = TRUE) - min(c, na.rm = TRUE)),
  d = (d - min(d, na.rm = TRUE)) / 
    (max(d, na.rm = TRUE) - min(d, na.rm = TRUE))
)
# A tibble: 5 × 4
      a      b     c     d
  <dbl>  <dbl> <dbl> <dbl>
1 1     0      0.187 0.863
2 0     1      0     0.388
3 0.612 0.0139 1     0    
4 0.597 0.293  0.742 1    
5 0.491 0.859  0.623 0.784

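This code rescales each column to a range from 0 to 1. Copy-and-paste code like this is also error-prone: when copying the b lines, for example, it would be easy to change every a to b except one, leaving min(a) where min(b) belongs. R would give no error or warning, just wrong numbers.

Writing a function streamlines this process and helps you avoid such mistakes.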
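A small sketch makes the danger concrete. It uses two made-up columns in a hypothetical tibble called toy and computes b twice: once correctly (b_ok) and once with min(a) accidentally left in (b_bug). These names are illustrative only. R runs both lines without complaint; the only clue is that b_bug comes out as 0.45, 0.95 and 1.45, outside the 0 to 1 range, while b_ok is 0, 0.5 and 1.

```r
library(dplyr)  # provides tibble() and mutate(); also loaded by library(tidyverse)

# Made-up data, small enough to check by hand
toy <- tibble(
  a = c(1, 2, 3),
  b = c(10, 20, 30)
)

checked <- toy |> mutate(
  # correct: every reference in the formula is to b
  b_ok = (b - min(b, na.rm = TRUE)) /
    (max(b, na.rm = TRUE) - min(b, na.rm = TRUE)),
  # copy-and-paste slip: the first min() still refers to a
  b_bug = (b - min(a, na.rm = TRUE)) /
    (max(b, na.rm = TRUE) - min(b, na.rm = TRUE))
)
checked
```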

Functions have four main advantages:

  • They prevent common copy-and-paste errors, such as failing to update a variable name everywhere (as in the example above).

  • If you need to change code that lives in a function, you only have to change it in one place, rather than in several places around your script.

  • You can easily reuse functions across projects, which saves time in the long run.

  • Functions with clear documentation and clear, expressive names (usually verbs, since a function does something to the data) make your code more readable and easier to understand.

Anatomy of a Function

The function template is as follows:

function_name <- function(Arguments) {
  function_body
  return() # optional return statement
}

Parts of the function:

  1. Function Name: the name under which the function object is stored in your R environment once the definition runs. You use this name to call the function.
  2. Function Arguments: the inputs you pass to the function when you call it.
    • A function’s arguments typically fall into two broad categories: one supplies the data to compute on; the other controls the details of computation.
    • When you call a function, you typically omit the names of data arguments, because they are used so commonly (e.g., mean(vector1)). If you override the default value of an argument, use the full name (e.g., na.rm = TRUE).
  3. Function Body: This is the code that is executed when the function is called. It is enclosed in curly braces.
    • return() is optional in R: without it, a function returns the value of the last expression it evaluates. When return() is called, the function stops executing immediately and returns the value of its argument, which is useful when you want to return a value before reaching the end of the function body.
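A short sketch can tie these parts together. The function summarise_mean() below is hypothetical, written only for illustration: x is its data argument, digits is a detail argument with a default value, one return() exits early for empty input, and otherwise the value of the last expression is returned automatically.

```r
# Function: summarise_mean (hypothetical, for illustration only)
# Purpose: compute the mean of a numeric vector, ignoring NAs, and round it.
# Arguments:
#   x - numeric: the data to summarise (data argument).
#   digits - numeric: decimal places to round to (detail argument, default 2).
# Returns: numeric, the rounded mean (NA if x is empty).
summarise_mean <- function(x, digits = 2) {
  # Early exit: return() stops the function here
  if (length(x) == 0) {
    return(NA_real_)
  }
  # No return() needed: the last expression's value is returned
  round(mean(x, na.rm = TRUE), digits)
}

summarise_mean(c(1.234, 5.678, NA))              # data argument unnamed: 3.46
summarise_mean(c(1.234, 5.678, NA), digits = 1)  # default overridden by name: 3.5
summarise_mean(numeric(0))                       # early return: NA
```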

How to Write a Function

  1. Identify the pattern: Extract the recurring code and distinguish the static parts from the variable ones. In our case, this is the rescaling operation that’s repeated for each column.

For example, in the code above:

df |> mutate(
  a = (a - min(a, na.rm = TRUE)) / 
    (max(a, na.rm = TRUE) - min(a, na.rm = TRUE)),
  b = (b - min(b, na.rm = TRUE)) / 
    (max(b, na.rm = TRUE) - min(b, na.rm = TRUE)),
  c = (c - min(c, na.rm = TRUE)) / 
    (max(c, na.rm = TRUE) - min(c, na.rm = TRUE)),
  d = (d - min(d, na.rm = TRUE)) / 
    (max(d, na.rm = TRUE) - min(d, na.rm = TRUE))
)

The only parts that change are the column names (a, b, c, d). Replacing them with █ shows the pattern:

(█ - min(█, na.rm = TRUE)) / 
  (max(█, na.rm = TRUE) - min(█, na.rm = TRUE))

  2. Choose a function name: rescale01 clearly indicates that this function rescales numbers to a 0-to-1 range.

  3. Define arguments: The only variable element is the input column (a, b, c, or d, denoted by █ above). We name this argument x, the conventional name for a numeric vector, but you can use any valid name (e.g., colname).

  4. Craft the body: This is the logic that runs every time the function is called, in our case the rescaling formula.

    For our example, the function is:

rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / 
    (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

  5. Test the function: Confirm that it works correctly on a few simple example vectors.
rescale01(c(-10, 0, 10))
[1] 0.0 0.5 1.0
rescale01(c(1, 2, 3, NA, 5))
[1] 0.00 0.25 0.50   NA 1.00
  • Update the original mutate() call by applying the rescale01 function to each column:
df |> mutate(
  a = rescale01(a),
  b = rescale01(b),
  c = rescale01(c),
  d = rescale01(d)
)
# A tibble: 5 × 4
      a      b     c     d
  <dbl>  <dbl> <dbl> <dbl>
1 1     0      0.187 0.863
2 0     1      0     0.388
3 0.612 0.0139 1     0    
4 0.597 0.293  0.742 1    
5 0.491 0.859  0.623 0.784
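The testing step can also be made repeatable with base R's stopifnot(), which stays silent when every check is TRUE and stops with an error otherwise. The sketch below reuses the two test vectors from the testing step and adds one edge case they do not cover: a constant vector, where max equals min and the formula divides 0 by 0. Whether rescale01 should handle that case differently is a design decision this sketch leaves open.

```r
# rescale01() as defined above, repeated so this block runs on its own
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) /
    (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Silent if both checks pass; an error message otherwise
stopifnot(
  isTRUE(all.equal(rescale01(c(-10, 0, 10)), c(0, 0.5, 1))),
  isTRUE(all.equal(rescale01(c(1, 2, 3, NA, 5)), c(0, 0.25, 0.5, NA, 1)))
)

# Edge case: a constant vector gives 0 / 0, so every result is NaN
rescale01(c(5, 5, 5))
```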

How to Report and Describe a Function

For each function you write in this course, you must include the following elements:

  1. Function Name: Start by writing the function name at the top in a comment.
# Function: add_numbers
  2. Purpose: Briefly describe what the function does.
# Purpose: This function adds two numbers and returns the result.
  3. Arguments: List each argument, its expected type, and its purpose.
# Arguments:
#   x - numeric: the first number to add.
#   y - numeric: the second number to add.
  4. Return Value: Describe what the function returns and its type.
# Returns: numeric, the sum of x and y (NULL if either input is not numeric).
  5. Function Code: Write the actual function code below the comments.
add_numbers <- function(x, y) {
   if (!is.numeric(x) || !is.numeric(y)) {
       return(NULL)
   }
   return(x + y)
}

So, to put it all together:

# Function: add_numbers
# Purpose: This function adds two numbers and returns the result.
# Arguments:
#   x - numeric: the first number to add.
#   y - numeric: the second number to add.
# Returns: numeric, the sum of x and y (NULL if either input is not numeric).
add_numbers <- function(x, y) {
   if (!is.numeric(x) || !is.numeric(y)) {
       return(NULL)
   }
   return(x + y)
}
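Calling add_numbers() exercises both of its branches: numeric inputs give their sum, and anything else gives NULL. The sketch below also shows add_numbers_strict(), a hypothetical alternative (not part of the course template) that signals an error with base R's stop() instead of returning NULL, a common design choice because a NULL result can slip silently into later code.

```r
# add_numbers() as defined above, repeated so this block runs on its own
add_numbers <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    return(NULL)
  }
  return(x + y)
}

add_numbers(2, 3)    # numeric inputs: returns 5
add_numbers("2", 3)  # character input: returns NULL

# Function: add_numbers_strict (hypothetical alternative design)
# Purpose: add two numbers, stopping with an error on non-numeric input.
# Arguments:
#   x - numeric: the first number to add.
#   y - numeric: the second number to add.
# Returns: numeric, the sum of x and y.
add_numbers_strict <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("x and y must both be numeric")
  }
  x + y
}
```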

And for our rescale01 function:

# Function: rescale01
# Purpose: This function rescales a vector of numbers to a 0-to-1 range.
# Arguments:
#   x - numeric: the vector of numbers to rescale.
# Returns: numeric, the rescaled vector.
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

This practice gets you used to documenting your functions and prepares you for more advanced documentation tools, such as the roxygen2 package.
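For comparison, here is a sketch of how the same information about rescale01 might look as roxygen2 comments. Lines starting with #' are ordinary comments to R, so the block runs as-is; turning them into a help page requires the roxygen2 package, typically inside an R package. The tags used (@param, @return, @examples) are standard roxygen2 tags, but this is not a complete package setup.

```r
#' Rescale a numeric vector to the range 0 to 1
#'
#' Subtracts the minimum and divides by the range, ignoring missing values.
#'
#' @param x A numeric vector.
#' @return A numeric vector the same length as `x`; missing values stay `NA`.
#' @examples
#' rescale01(c(-10, 0, 10))
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) /
    (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
```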

In Class Practice

Copy this into your own R session and follow along to review.

# MES 504 Coding Fundamentals Review: Functions
# Basic Structure of a Function

# function with name, but no body and no arguments 
# note the minimum function documentation here

    # Function: empty_function
    # Purpose: nothing
    # Arguments: none
    # Returns: NULL
    empty_function <- function() {
      # ....
    }

    # call the function (it returns NULL)
    empty_function()

# function with name and body, but no arguments 
    
    # Function: useless_function
    # Purpose: print a statement
    # Arguments: none
    # Returns: printed statement
    useless_function <- function(){
      print("this is all I do")
    }

    # call the function
    useless_function()
    
# function with name, body, and arguments
    # Function: add_function
    # Purpose: add two numbers
    # Arguments: x and y
    # Returns: z, or the sum of x and y
    add_function <- function(x,y){
      z <- x + y
      return(z)
    }
    
    # call the function
    add_function(1,2)

# more complex function with name, body and arguments     

    # Function: area_circle_numeric
    # Purpose: calculates the area of a circle and returns as number
    # Arguments: radius
    # Returns: a numeric value for the area of the circle 
    area_circle_numeric <- function(radius){
      pi * radius^2
    }

    # call the function
    area_circle_numeric(3)
    
    # can calculate area of multiple circles
    area_circle_numeric(c(1,5,29,10))

    # can also save into a variable `areas` for later calculations
    areas <- area_circle_numeric(c(1,5,29,10))

    ###############################
    
    # Function: area_circle_statement
    # Purpose: calculates the area of a circle and returns in a statement
    # Arguments: radius
    # Returns: a statement declaring the area of the circle
    area_circle_statement <- function(radius){
      area <- pi * radius^2
      paste("the area of the circle is" , 
            area)
    }

    # call the function
    area_circle_statement(c(1,5,29,10))
    
    # I want to round the number for the printed message, and add units.
    # what do I need to change/add?
    
    area_circle_statement <- function(radius){
      area <- pi * radius^2
      paste("the area of the circle is" , 
            round(area,2),
            "cm2")
    }
    
    # call the function
    area_circle_statement(c(1,5,29,10))
    
    # Alternative: reuse area_circle_numeric() and combine all areas into one statement
    # paste("the area of the circle is", toString(area_circle_numeric(c(1,5,29,10))))

    # Now I want to name each circle in the statement,
    # so I can tell multiple circles apart, and choose how many digits to round to.
    # what do I need to change/add?
    area_circle_statement <- function(name, radius, digits){
      area <- pi * radius^2
      paste("the area of circle",
            name,
            "is" , 
            round(area,digits),
            "cm2")
    }
    
    # call the function
    area_circle_statement(c("Tom","Jerry","Spike"), c(2,5,7), c(0,1,2))
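One optional extension of the exercise: give digits a default value, so callers can leave it out and name it only when they want something different. This ties back to the earlier point that detail arguments usually have defaults. The default of 2 here is an arbitrary choice for the sketch, and radius is assumed to be in cm, as in the exercise.

```r
# Function: area_circle_statement
# Purpose: calculates the area of each named circle and returns a statement
# Arguments:
#   name - character: the name of each circle.
#   radius - numeric: the radius of each circle, in cm.
#   digits - numeric: decimal places for rounding (default 2).
# Returns: character, one statement per circle
area_circle_statement <- function(name, radius, digits = 2) {
  area <- pi * radius^2
  paste("the area of circle",
        name,
        "is",
        round(area, digits),
        "cm2")
}

# digits omitted: the default is used
area_circle_statement("Tom", 2)              # "the area of circle Tom is 12.57 cm2"

# digits supplied by name: overrides the default
area_circle_statement("Tom", 2, digits = 0)  # "the area of circle Tom is 13 cm2"
```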

References and Further Reading

  • Functions

  • Tweets about functions

  • R as a functional programming language

  • Functions that make functions

  • Metaprogramming

  • Tidyverse style guide

  • roxygen2