Functions provide an efficient way to automate repetitive tasks, leading to cleaner, more maintainable code.
Why Functions are Useful
If you find yourself copying and pasting a code block more than twice, it’s time to consider creating a function.
Consider the example code provided:
library(tidyverse)

df <- tibble(
  a = rnorm(5),
  b = rnorm(5),
  c = rnorm(5),
  d = rnorm(5)
)

df |> mutate(
  a = (a - min(a, na.rm = TRUE)) / (max(a, na.rm = TRUE) - min(a, na.rm = TRUE)),
  b = (b - min(b, na.rm = TRUE)) / (max(b, na.rm = TRUE) - min(b, na.rm = TRUE)),
  c = (c - min(c, na.rm = TRUE)) / (max(c, na.rm = TRUE) - min(c, na.rm = TRUE)),
  d = (d - min(d, na.rm = TRUE)) / (max(d, na.rm = TRUE) - min(d, na.rm = TRUE))
)
# A tibble: 5 × 4
      a      b     c     d
  <dbl>  <dbl> <dbl> <dbl>
1 1     0      0.187 0.863
2 0     1      0     0.388
3 0.612 0.0139 1     0
4 0.597 0.293  0.742 1
5 0.491 0.859  0.623 0.784
This code rescales each column to the range 0 to 1. With this much copying and pasting, a mistake slips in easily: for example, leaving min(a) in the line for b, where min(b) is needed.
Writing a function streamlines this process and avoids such mistakes.
Four main advantages of functions:
Functions prevent common copy-and-paste errors, such as not updating variable names consistently (see the example above).
If you need to change your code and it lives in a function, you only have to change it in one place rather than in several places around your script.
You can easily reuse functions across projects, which saves time in the long run.
Functions that are clearly documented and given clear, expressive names (usually verbs, since you are doing something to the data) make your code more readable and easier to understand.
Function name: The name of the function object. It is stored in the R environment once the function is defined and is used to call the function.
Function arguments: The inputs that are passed to the function when it is called.
A function’s arguments typically fall into two broad categories: one supplies the data to compute on; the other controls the details of the computation.
When you call a function, you typically omit the names of the data arguments because they are used so commonly (e.g., mean(vector1)). If you override the default value of a detail argument, use its full name (e.g., na.rm = TRUE).
Function body: The code that is executed when the function is called. It is enclosed in curly braces.
Return statements are optional in R: a function returns the value of the last expression it evaluates. An explicit return() stops the function immediately and returns the value given to it, which is useful when you want to return a value before the end of the function body.
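The parts described above can be seen together in a small sketch. The function mean_or_na and its inputs are hypothetical, invented here only for illustration; it has one data argument, one detail argument with a default, an early return(), and an implicit return at the end of the body.

```r
# Hypothetical example function, not part of the original notes.
#   x     - data argument: the numeric vector to summarise
#   na.rm - detail argument with a default value
mean_or_na <- function(x, na.rm = TRUE) {
  # Early exit: return() stops the function here for non-numeric input
  if (!is.numeric(x)) {
    return(NA_real_)
  }
  # No return() needed: the value of the last expression is returned
  mean(x, na.rm = na.rm)
}

values <- c(1, 2, NA, 6)

mean_or_na(values)                 # data argument by position, default na.rm
mean_or_na(values, na.rm = FALSE)  # detail argument overridden by name
mean_or_na("text")                 # triggers the early return
```

The first call gives 3, because the missing value is dropped by default; the other two calls give NA.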
How to Write a Function
Identify the pattern: Extract the recurring code and distinguish the static parts from the variable ones. In our case, this is the rescaling operation that’s repeated for each column.
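For example, take the mutate() call from above, shown here with a copy-and-paste slip in the line for b:
df |> mutate(
  a = (a - min(a, na.rm = TRUE)) / (max(a, na.rm = TRUE) - min(a, na.rm = TRUE)),
  # slip: the last min(a) on the next line should be min(b)
  b = (b - min(b, na.rm = TRUE)) / (max(b, na.rm = TRUE) - min(a, na.rm = TRUE)),
  c = (c - min(c, na.rm = TRUE)) / (max(c, na.rm = TRUE) - min(c, na.rm = TRUE)),
  d = (d - min(d, na.rm = TRUE)) / (max(d, na.rm = TRUE) - min(d, na.rm = TRUE))
)
The parts that change are the column names (a, b, c, d), denoted here as █. Everything else stays the same on every line:
(█ - min(█, na.rm = TRUE)) / (max(█, na.rm = TRUE) - min(█, na.rm = TRUE))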
Choose a function name: rescale01 clearly indicates that this function rescales numbers to a 0-to-1 range.
Define arguments: The only variable element is the input column (a, b, c, and d, denoted by █ above). We name this argument x, the conventional name for a numeric vector, but you can use whatever name you want (e.g., colname).
Craft the body: This section holds the logic that is executed every time, here the rescaling formula.
# Function: add_numbers
# Purpose: This function adds two numbers and returns the result.
# Arguments:
#   x - numeric: the first number to add.
#   y - numeric: the second number to add.
# Returns: numeric, the sum of x and y.
add_numbers <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    return(NULL)
  }
  return(x + y)
}
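add_numbers returns NULL when an input is not numeric, which can pass unnoticed further down a script. One possible alternative, sketched here under the assumption that a loud failure is preferable, is to raise an error with stop(). The name add_numbers_strict and the error message are hypothetical and not part of the original notes.

```r
# Hypothetical stricter variant of add_numbers (an illustration only).
# It raises an error instead of returning NULL for non-numeric input.
add_numbers_strict <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("`x` and `y` must both be numeric")
  }
  x + y
}

# Valid input works as before
add_numbers_strict(2, 3)

# Invalid input raises an error; tryCatch() captures its message
# so the script can keep running
result <- tryCatch(
  add_numbers_strict("2", 3),
  error = function(e) conditionMessage(e)
)
result
```

Which behaviour fits better depends on the caller: returning NULL keeps a pipeline running, while stop() points directly at the bad input.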
The same documented layout for our function rescale01:
# Function: rescale01
# Purpose: This function rescales a vector of numbers to a 0-to-1 range.
# Arguments:
#   x - numeric: the vector of numbers to rescale.
# Returns: numeric, the rescaled vector.
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
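To see what the function buys us, here is a sketch of the original mutate() call rewritten with rescale01. The definition is repeated so the snippet runs on its own. The across() version and the object name rescaled are additions for illustration, assuming a dplyr version that provides across().

```r
library(tidyverse)

# Same definition as above, repeated so this snippet runs on its own
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Quick check on a small vector: gives 0.0 0.5 1.0
rescale01(c(0, 5, 10))

df <- tibble(
  a = rnorm(5),
  b = rnorm(5),
  c = rnorm(5),
  d = rnorm(5)
)

# One call per column: the formula itself is no longer retyped
df |> mutate(
  a = rescale01(a),
  b = rescale01(b),
  c = rescale01(c),
  d = rescale01(d)
)

# Or apply the function to every column from a to d at once
rescaled <- df |> mutate(across(a:d, rescale01))
rescaled
```

Each column name now appears once per line (or not at all with across()), so there is far less room for the kind of slip that copy-and-paste invites.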
Copy this into your own environment and follow along to review:
# MES 504 Coding Fundamentals Review: Functions

# Basic Structure of a Function

# function with name, but no body and no arguments
# note function reporting minimums here
# Function: empty_function
# Purpose: nothing
# Arguments: none
# Returns: NULL
empty_function <- function() {
  # ....
}

# call the function
empty_function()

# function with name and body, but no arguments
# Function: useless_function
# Purpose: print a statement
# Arguments: none
# Returns: printed statement
useless_function <- function() {
  print("this is all I do")
}

# call the function
useless_function()

# function with name, body, and arguments
# Function: add_function
# Purpose: add two numbers
# Arguments: x and y
# Returns: z, or the sum of x and y
add_function <- function(x, y) {
  z <- x + y
  return(z)
}

# call the function
add_function(1, 2)

# more complex function with name, body and arguments
# Function: area_circle_numeric
# Purpose: calculates the area of a circle and returns as number
# Arguments: radius
# Returns: a numeric value for the area of the circle
area_circle_numeric <- function(radius) {
  pi * radius^2
}

# call the function
area_circle_numeric(3)

# can calculate area of multiple circles
area_circle_numeric(c(1, 5, 29, 10))

# can also save into a variable `areas` for later calculations
areas <- area_circle_numeric(c(1, 5, 29, 10))

###############################

# Function: area_circle_statement
# Purpose: calculates the area of a circle and returns in a statement
# Arguments: radius
# Returns: a statement declaring the area of the circle
area_circle_statement <- function(radius) {
  area <- pi * radius^2
  paste("the area of the circle is", area)
}

# call the function
area_circle_statement(c(1, 5, 29, 10))

# I want to round the number for the printed message, and add units.
# what do I need to change/add?
area_circle_statement <- function(radius) {
  area <- pi * radius^2
  paste("the area of the circle is", round(area, 2), "cm2")
}

# call the function
area_circle_statement(c(1, 5, 29, 10))

# paste("the area of the circle is", toString(area_circle_numeric(c(1, 5, 29, 10))))

# Now I want to name each circle in the statement,
# so I can tell multiple circles apart.
# what do I need to change/add?
area_circle_statement <- function(name, radius, digits) {
  area <- pi * radius^2
  paste("the area of circle", name, "is", round(area, digits), "cm2")
}

# call the function
area_circle_statement(c("Tom", "Jerry", "Spike"), c(2, 5, 7), c(0, 1, 2))
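As a possible next step in the same review, the detail arguments could be given default values so that callers only name them when they want something different. This is a sketch only: the function name area_circle_statement2, the units argument, and both default values are assumptions made for illustration.

```r
# Hypothetical variant of area_circle_statement (illustration only).
# `digits` and `units` are detail arguments with assumed defaults.
area_circle_statement2 <- function(name, radius, digits = 2, units = "cm2") {
  area <- pi * radius^2
  paste("the area of circle", name, "is", round(area, digits), units)
}

# Data arguments by position, defaults used for the rest
area_circle_statement2("Tom", 2)
# "the area of circle Tom is 12.57 cm2"

# Override one default by naming the argument
area_circle_statement2("Jerry", 5, digits = 0)
# "the area of circle Jerry is 79 cm2"
```

Naming the overridden argument (digits = 0) keeps the call readable without having to remember the argument order.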