R’s gsub() and sub() functions replace text patterns in strings. They are easy to use, combine well with other functions, and fit naturally into data analyses and statistical calculations.

What do gsub() and sub() do in R?

R’s gsub() and sub() functions replace patterns in strings. sub(), short for “substitute”, finds the first instance of a pattern in a string and replaces it with another expression. It makes only a single replacement per string. gsub() stands for “global substitute”: it finds all instances of a pattern in a string and replaces each of them with another expression.
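
The difference between the two functions is easiest to see on one short string. The following is a minimal sketch; the variable names and the sample word are made up for illustration.

```r
# Illustrative input; "fruit" is a made-up variable name.
fruit <- "banana"

# sub() replaces only the first match of "a".
first_only <- sub("a", "o", fruit)

# gsub() replaces every match of "a".
all_matches <- gsub("a", "o", fruit)

print(first_only)   # [1] "bonana"
print(all_matches)  # [1] "bonono"
```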

Both functions have broad applications in data cleaning and transformation. Their main purpose is to delete unwanted patterns and adapt text. They are especially important for text manipulation in statistical analyses and machine learning applications in R. For example, the functions can be used to extract certain patterns or transform data into the form necessary for an analysis.

What is the syntax of R’s gsub() and sub()?

The syntax of R’s gsub() and sub() functions is identical apart from the function name. Both functions take the following main parameters:

  • pattern: The pattern you’re looking for, given as a character string that is interpreted as a regular expression by default
  • replacement: The string each match should be replaced with
  • x: The character vector to search in; other objects are converted to character first

The structure of R’s gsub()

gsub(pattern, replacement, x)
R

The structure of R’s sub()

sub(pattern, replacement, x)
R
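
Because x can be a vector, both functions work on every element at once. The sketch below uses made-up sample values to show this. It also shows fixed = TRUE, an optional argument of both functions that is not covered above and that treats the pattern as literal text instead of a regular expression.

```r
# Illustrative character vector; the values are made up for this sketch.
labels <- c("a-b-c", "d-e", "f")

# sub() replaces the first hyphen in each element.
first_hyphen <- sub("-", "_", labels)

# gsub() replaces every hyphen in each element.
all_hyphens <- gsub("-", "_", labels)

print(first_hyphen)  # [1] "a_b-c" "d_e"   "f"
print(all_hyphens)   # [1] "a_b_c" "d_e"   "f"

# As a regular expression, "." matches any character, so everything is removed.
as_regex <- gsub(".", "", "a.b.c")

# With fixed = TRUE, "." is matched literally, so only the dots are removed.
as_literal <- gsub(".", "", "a.b.c", fixed = TRUE)

print(as_regex)    # [1] ""
print(as_literal)  # [1] "abc"
```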

Examples for gsub() in R

The distinguishing feature of R’s gsub() is that it finds and replaces all instances of a pattern.

Deleting spaces

You can use gsub() to remove extra whitespace from strings.

sentence <- "  Data science  is  powerful.  "
# Collapse each run of whitespace into one space, then trim both ends
clean_sentence <- trimws(gsub("\\s+", " ", sentence))
cat(clean_sentence)
R

This produces the output:

Data science is powerful.
R

The regular expression \\s+ matches one or more consecutive whitespace characters (spaces, tabs, or line breaks). gsub() replaces each such run with a single space, which still leaves one space at the start and at the end of the sentence; trimws() removes those.
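
To see what the whitespace pattern does step by step, the following sketch prints the intermediate results. It assumes the same sample sentence; the variable names collapsed, trimmed, and tabbed are illustrative. The anchored pattern ^\\s+|\\s+$ is one common way to strip whitespace at both ends using gsub() alone.

```r
sentence <- "  Data science  is  powerful.  "

# Collapsing runs of whitespace leaves one space at each end of the string.
collapsed <- gsub("\\s+", " ", sentence)

# Removing whitespace anchored to the start (^) or end ($) trims the edges.
trimmed <- gsub("^\\s+|\\s+$", "", collapsed)

print(collapsed)  # [1] " Data science is powerful. "
print(trimmed)    # [1] "Data science is powerful."

# \\s also matches tabs and line breaks, not only the space character.
tabbed <- gsub("\\s+", " ", "a\t\tb\n c")
print(tabbed)     # [1] "a b c"
```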

Replacing phone numbers

R’s gsub() is also useful for anonymizing or deleting private data such as phone numbers.

text <- "Contact us at 123-456-7890 for more information."
modified_text <- gsub("\\d{3}-\\d{3}-\\d{4}", "redacted phone number", text)
cat(modified_text)
R

Output:

Contact us at redacted phone number for more information.
R

In the above example, the regular expression \\d{3}-\\d{3}-\\d{4} matches phone numbers written as three digits, three digits, and four digits separated by hyphens, and gsub() replaces each match with the string "redacted phone number".
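
Since gsub() replaces every match and works on whole vectors, the same pattern can redact several numbers at once. The following is a small sketch with made-up contact strings; the placeholder text "[redacted]" is an arbitrary choice.

```r
# Made-up contact strings for illustration only.
contacts <- c(
  "Office: 123-456-7890, mobile: 987-654-3210",
  "No phone number on file"
)

# Every phone number in every element is replaced; elements without a
# match are returned unchanged.
redacted <- gsub("\\d{3}-\\d{3}-\\d{4}", "[redacted]", contacts)

print(redacted)
```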

Examples for sub() in R

If you just want to replace the first instance of a pattern, use R’s sub() function.

Replacing the first instance of a word

Let’s say we have a string with a repeated word and want to replace the first instance of that word.

text <- "Data science is powerful. Data analysis is fun."
result_sub <- sub("Data", "Information", text)
cat(result_sub)
R

The output looks as follows:

Information science is powerful. Data analysis is fun.
R

R’s sub() searches the text for the string "Data" and replaces the first instance it finds with "Information".
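
Note that matching is case-sensitive by default. The sketch below searches for a lowercase "data" in the same sample text. It uses ignore.case = TRUE, an optional argument of sub() and gsub() that is not covered above, to match regardless of case.

```r
text <- "Data science is powerful. Data analysis is fun."

# Lowercase "data" does not occur in the text, so nothing is replaced.
unchanged <- sub("data", "Information", text)

# With ignore.case = TRUE, the first "Data" is matched and replaced.
replaced <- sub("data", "Information", text, ignore.case = TRUE)

print(unchanged)  # [1] "Data science is powerful. Data analysis is fun."
print(replaced)   # [1] "Information science is powerful. Data analysis is fun."
```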

Replacing numbers

We can also replace numbers with sub().

numeric_text <- "The cost is $1000. Please pay by 01/02/2024."
result <- sub("\\d+", "2000", numeric_text)
cat(result)
R

Output:

The cost is $2000. Please pay by 01/02/2024.
R

The regular expression \\d+ corresponds to one or more digits. sub() just replaces the first group of digits in the text.
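
For comparison, here is a short sketch of what happens when the same pattern is passed to gsub() instead. The "#" placeholder is an arbitrary choice that makes each replaced group of digits easy to spot.

```r
numeric_text <- "The cost is $1000. Please pay by 01/02/2024."

# sub() touches only the first group of digits (the price).
first_group <- sub("\\d+", "#", numeric_text)

# gsub() also replaces the three groups of digits in the date.
all_groups <- gsub("\\d+", "#", numeric_text)

print(first_group)  # [1] "The cost is $#. Please pay by 01/02/2024."
print(all_groups)   # [1] "The cost is $#. Please pay by #/#/#."
```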

