R is a popular programming language used primarily in science and mathematics for statistical computing. It is an interesting language with some distinctive features, and it is quite fun once you get used to it.

What sets R apart from other pro­gram­ming languages?

R is not a general-purpose pro­gram­ming language like Java or Python. The language is intended for sta­tis­ti­cal computing. R has remained in the top 20 most popular pro­gram­ming languages for years despite some strong com­pe­ti­tion.

R is special because it comes as a complete package. R programming usually takes place in an interactive environment, complete with a read-eval-print loop (REPL) and integrated help. The open-source language is supported by a well-developed ecosystem. The community maintains the package repository “The Comprehensive R Archive Network” (CRAN). Data sets and scientific white papers on new approaches and packages are also continually being submitted.

These features make R the perfect pro­gram­ming en­vi­ron­ment for sta­tis­tics and data science. The in­ter­ac­tive en­vi­ron­ment promotes research and fosters playful learning of both the language and the un­der­ly­ing math­e­mat­ics.

R is a sta­tis­ti­cal pro­gram­ming language used for data analysis

R is a statistical programming language, so concepts such as the normal distribution, statistical tests, models and regression are part of its everyday vocabulary. In addition to R, there are a number of comparable scientific languages, such as the commercial product MATLAB and the more recent language Julia. Python has become another strong competitor in recent years.

Unlike Python, R has native support for statistical programming. The key difference is how the language operates on values. In R, you usually compute with multiple values at once. This is a distinctive feature of R, as most other languages treat a single number as the simplest value.

Let’s have a look at R’s approach to data pro­cess­ing with a simple example. Math­e­mat­i­cal op­er­a­tions can be performed in every pro­gram­ming language. This is also the case in R. Let’s add two numbers:

# returns 15
10 + 5

Nothing unusual so far. However, the same addition operation can be applied to a whole series of numbers in R. We can combine two numbers into a vector and add a constant value:

# returns 15, 25
c(10, 20) + 5

This may be a surprising result for seasoned programmers. Even a modern, dynamic language like Python does not allow this with its built-in lists:

# throws an error
[10, 20] + 5

Two vectors can also be added in R. In this case, the elements are not combined into one longer vector. Instead, the addition is performed element by element:

# returns 42, 69
c(40, 60) + c(2, 9)

A loop is required to process multiple elements of a collection in languages like Java or C++. This is because these languages separate single values, or scalars, from composite data structures, known as vectors. In R, the vector is the basic unit, and a scalar is simply a vector with one element. R shares this design with only a few other array-oriented languages.
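To make the contrast concrete, the following minimal sketch writes the same addition twice in R: once as the element-by-element loop that scalar-oriented languages require, and once in vectorized form. The variable names are chosen for illustration only.

```r
# two vectors of equal length
a <- c(40, 60)
b <- c(2, 9)

# explicit loop, as one would write it in a scalar-oriented language
loop_sum <- numeric(length(a))
for (i in seq_along(a)) {
    loop_sum[i] <- a[i] + b[i]
}

# vectorized form: one expression, same result
vector_sum <- a + b
identical(loop_sum, vector_sum)

# even a single number is a vector of length one
length(5)
```

Both variants produce the same vector. The vectorized form is shorter and leaves no room for indexing mistakes.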

Statistics is special in that it relaxes the demand for mathematical exactness. In statistics, you have to calculate with uncertainties and imperfect data derived from reality. Something can, of course, always go wrong. Fortunately, R is equipped to deal with errors to a certain extent. The language can handle missing values without crashing a running script.

Let’s look at an example of the language’s robustness. In many programming languages, dividing a number by zero raises an error that can crash the program. R is not affected by this. Dividing a positive number by zero results in the value Inf, which can be easily filtered out of the data during a cleanup later on:

# vector of divisors, containing zero
divisors = c(2, 4, 0, 10)
# returns 'c(50, 25, Inf, 10)'
quotients = 100 / divisors
# filter out Inf; returns 'c(50, 25, 10)'
cleaned_quotients = quotients[quotients != Inf]
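Real data often contains more than one kind of problematic value. The following sketch, which uses made-up numbers, shows how missing values (NA) and undefined results (NaN) behave, and how the base function is.finite() removes all of them in one step:

```r
# divisors with a zero and a missing value
divisors <- c(2, 4, 0, 10, NA)
quotients <- 100 / divisors   # 50, 25, Inf, 10, NA

# is.finite() is FALSE for Inf, -Inf, NaN and NA alike
cleaned <- quotients[is.finite(quotients)]

# many summary functions can skip missing values on request
mean_without_na <- mean(c(1, NA, 3), na.rm = TRUE)

# 0 / 0 is undefined and yields NaN rather than Inf
is.nan(0 / 0)
```

Filtering with is.finite() is a design choice for this example. Whether infinite or missing values should be dropped, replaced or reported depends on the analysis.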

R supports OOP and func­tion­al pro­gram­ming

R makes programming extremely flexible. The language doesn’t fit clearly into the hierarchy of programming paradigms. It supports OOP, but its most widely used object system, S3, does without the usual class definitions. In daily use, functional and imperative approaches dominate. The functional features are strongly pronounced, and they are ideal for data processing.

Similar to JavaScript, the object system’s flexibility is its main advantage. The generic functions are comparable to those in Python, in the sense that they can be applied to different types of objects. For example, R’s length() function is similar to len() in Python.
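As a rough illustration of how generic functions work without class definitions, here is a minimal sketch using R’s S3 system. The generic describe() and the class name person are invented for this example and are not part of R:

```r
# a hypothetical generic function; the name 'describe' is made up for this sketch
describe <- function(x, ...) UseMethod("describe")

# fallback for objects without a more specific method
describe.default <- function(x, ...) {
    paste("an object with", length(x), "element(s)")
}

# method for objects of class 'person'
describe.person <- function(x, ...) {
    paste(x$name, "is", x$age, "years old")
}

# an S3 object is ordinary data with a class attribute attached
jack <- structure(list(name = "Jack", age = 42), class = "person")

describe(jack)        # dispatches to describe.person
describe(c(1, 2, 3))  # dispatches to describe.default
length(jack)          # length() works on the list underneath
```

The object is just a list carrying a class attribute. The method is chosen by matching that attribute against function names at call time.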

How does R pro­gram­ming work?

R pro­gram­ming spe­cial­izes in data and sta­tis­tics. In R, you need a data set to develop a solution to a problem. Un­for­tu­nate­ly, this may not always exist at the time of de­vel­op­ment. This means that an R pro­gram­ming project usually begins with simulated data. Users write the code, test the func­tion­al­i­ty, and replace the test data with real data at a later point.
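A workflow like this could look as follows. The sketch is only an assumption about a typical project: the column names, the linear model and the file name measurements.csv are invented for illustration.

```r
# fixed seed so the simulated data is reproducible
set.seed(42)

# simulated stand-in for a real data set: 100 heights (cm) and weights (kg)
simulated <- data.frame(height = rnorm(100, mean = 170, sd = 10))
simulated$weight <- 0.5 * simulated$height - 15 + rnorm(100, sd = 5)

# the analysis only depends on the column names, not on where the data came from
fit_weight_model <- function(data) {
    lm(weight ~ height, data = data)
}

model <- fit_weight_model(simulated)
coef(model)

# later, the real data can be swapped in, e.g. (hypothetical file name):
# real <- read.csv('measurements.csv')
# model <- fit_weight_model(real)
```

Because the analysis is wrapped in a function, replacing the test data with real data only changes the argument, not the code.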

How is R code executed?

R is a dynamic, interpreted scripting language, similar to Ruby and Python. Unlike the programming language C, there is no separation of source code and executable code in R. Development usually takes place interactively, whereby the interpreter is fed line by line with source code, which is executed immediately. Variables are created automatically when needed and names are bound at runtime.

This kind of interactive and dynamic programming is like being inside the running program. Objects can be examined and modified, and new ideas can be tested immediately. The help command grants access to the documentation for syntax and functions:

# view help for 'for' syntax
help('for')
# view help for 'c()' function
help(c)

Script files can be loaded dynamically from the interpreter. The source command works in the same way as the shell command of the same name. The R source code file is read and fed into the running session:

source('path/to/file.r')
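To try this out without an existing script, a small file can be created on the fly. This is a minimal sketch in which the temporary file and the variable greeting are made up for the example:

```r
# write a small script to a temporary file
script_path <- tempfile(fileext = '.R')
writeLines('greeting <- "Hello from the script"', script_path)

# run the script inside the current session
source(script_path)

# the variable defined in the script is now available here
print(greeting)
```

By default, source() evaluates the file in the global environment, which is why the variable is visible afterwards.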

What is the syntax of the R pro­gram­ming language?

The scripting language uses curly braces to delimit the bodies of functions and control statements, like in C and Java. In contrast to Python, indentation has no effect on how the code behaves. Comments start with a hash, like in Ruby and Python, and no semicolon is needed at the end of a statement.

The language has some pe­cu­liar­i­ties, making it easy to recognize R code once you become more familiar with it. The equal sign and two arrow-like operators are used in R pro­gram­ming for as­sign­ments. This allows the as­sign­ment’s direction to be reversed:

# three equivalent forms of assignment
age <- 42
'Jack' -> name
# c() converts the number to the string '42' because a vector holds one type
person = c(age, name)

Another typical feature of R code is the pseudo-object notation following the pattern object.method():

# test if argument is a number
is.numeric(42)

The is.numeric function looks like a numeric() method, which belongs to an object named is. However, this is not the case. In R programming, the dot is a regular character in names. The function could just as well be called is_numeric instead of is.numeric.
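This can be checked directly in the interpreter. In the following sketch, the underscore alias is_numeric is a made-up name and not part of R:

```r
# the dot is part of the name, so no object called 'is' is involved
exists('is.numeric')   # TRUE: a single function with a dotted name

# hypothetical alias with an underscore; it behaves identically
is_numeric <- is.numeric
is_numeric(42)         # TRUE
is_numeric('42')       # FALSE

# dots are equally valid in variable names
people.ages <- c(42, 51, 69)
```

One caveat: although the dot has no syntactic meaning, R’s S3 system uses it by convention to name methods, as in print.data.frame.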

The concatenation function c() is used to create vectors, which are ubiquitous in R programming:

people.ages <- c(42, 51, 69)

Applying the function to vectors will merge them into a single flat vector:

# yields 'c(1, 2, 3, 4)'
c(c(1, 2), c(3, 4))

Unlike most pro­gram­ming languages, indexing a vector’s elements starts at 1 in R. This takes some time to get used to, but it helps to avoid the dreaded off-by-one errors. The highest vector index cor­re­sponds to the vector’s length:

# create a vector of names
people <- c('Jack', 'Jim', 'John')
# the first name has index 1; returns TRUE
people[1] == 'Jack'
# the last name has the vector's length as its index; returns TRUE
people[length(people)] == 'John'

Similar to Python, R pro­gram­ming also uses slicing. A slice can be used to index a vector’s subrange. This is based on sequences, which are natively supported in R. Let’s create a sequence of numbers and select a slice:

# create vector of the whole numbers from 42 to 69
nums = seq(42, 69)
# equivalent assignment using sequence notation
nums = 42:69
# using a sequence, slice elements 3 through 7
sliced = nums[3:7]
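Note that, unlike in Python, both ends of an R slice are inclusive, so nums[3:7] contains five elements. A few further indexing variants are sketched below. The variable names are illustrative:

```r
nums <- 42:69

# negative indices exclude elements instead of selecting them
without_first <- nums[-1]          # 43 through 69

# head() and tail() select from either end
first_three <- head(nums, 3)       # 42, 43, 44
last_two <- tail(nums, 2)          # 68, 69

# seq() accepts a step size for slices with gaps
every_tenth <- nums[seq(1, length(nums), by = 10)]   # 42, 52, 62
```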

How do control struc­tures work in R pro­gram­ming?

Basic operations are defined for vectors in R programming. This means that loops are often not required. Instead, an operation is applied to the entire vector and acts on each individual element. Let’s square the first ten positive integers without a loop:

nums <- seq(10)
# '^' is the usual exponent operator in R
squares <- nums ^ 2
# returns TRUE
squares[3] == 9
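The same principle extends to comparisons and aggregate functions. For operations that are not vectorized out of the box, sapply() applies a function to each element. The following is a minimal sketch with illustrative variable names:

```r
nums <- seq(10)

# vectorized arithmetic and comparison
squares <- nums^2
is_even <- nums %% 2 == 0

# aggregate functions consume a whole vector at once
sum(squares)             # 385
squares[is_even]         # 4, 16, 36, 64, 100

# sapply() applies an arbitrary function to each element
cubes <- sapply(nums, function(n) n^3)
```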

The for loop in R does not work the same way as for loops in C, Java or JavaScript. There is no detour via an index counter. Iteration is performed directly over the elements, like in Python:

people = c('Jim', 'Jack', 'John')
for (person in people) {
    print(paste('Here comes', person, sep = ' '))
}

The if-else branching in R exists as a basic control structure. However, it can often be replaced by filter functions or the logical indexing of vectors. Let’s create a vector of ages and divide the data into two groups: adults aged 18 and over, and children under 18. This can be done without a loop or branching:

# create 20 random whole-number ages from 1 to 98
ages = as.integer(runif(20, 1, 99))
# filter adults (18 and over)
adults = ages[ages >= 18]
# filter children
children = ages[ages < 18]
# make sure everyone is accounted for; returns TRUE
length(adults) + length(children) == length(ages)
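The filter functions mentioned above can be sketched in a similar way. This example uses a fixed set of ages instead of random values so that the result is predictable. The helper is_adult is a made-up name:

```r
# fixed example data instead of random ages
ages <- c(3, 17, 18, 42, 69)

# predicate function: TRUE for adults
is_adult <- function(age) age >= 18

# Filter() keeps the elements for which the predicate is TRUE
adults <- Filter(is_adult, ages)              # 18, 42, 69
children <- Filter(Negate(is_adult), ages)    # 3, 17

# split() produces both groups in one step, keyed by the predicate result
groups <- split(ages, is_adult(ages))
```

Logical indexing is usually the more compact choice for simple conditions. Filter() can be convenient when the condition already exists as a function.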

The same result can be achieved with control structures, at the cost of more code:

# create 20 random whole-number ages from 1 to 98
ages = as.integer(runif(20, 1, 99))
# start with empty vectors
adults = c()
children = c()
# populate vectors
for (age in ages) {
    if (age >= 18) {
        adults = c(adults, age)
    } else {
        children = c(children, age)
    }
}

How to get started with R pro­gram­ming

To get started with R programming, you just need a local R installation. There are installers available for all major operating systems. A standard R installation includes a GUI interpreter with REPL, integrated help and an editor. For efficient coding, we recommend using an established code editor or IDE. RStudio is a popular alternative to the bundled R environment.

Which projects is R suitable for?

R pro­gram­ming is used mainly in science and research, for example, in bioin­for­mat­ics and machine learning. However, the language is suitable for all projects that use math­e­mat­i­cal models or sta­tis­ti­cal modeling. R does not have an advantage when it comes to pro­cess­ing text. This is Python’s area of expertise.

Common cal­cu­la­tions and vi­su­al­iza­tions in spread­sheets can be replaced with R code. Data and code are not mixed in the same cells, allowing for code to be written once and applied to multiple data sets. Fur­ther­more, there is no danger of over­writ­ing a cell’s formula when making manual changes.
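The idea of writing a calculation once and applying it to several data sets can be sketched as follows. The function summarize_sales and the two small data sets are invented for this example:

```r
# hypothetical helper: the same summary for any data set with a 'sales' column
summarize_sales <- function(data) {
    c(total = sum(data$sales), average = mean(data$sales))
}

# two made-up data sets standing in for separate spreadsheets
january <- data.frame(sales = c(100, 250, 150))
february <- data.frame(sales = c(300, 200))

# write the calculation once, apply it to every data set
reports <- lapply(list(january = january, february = february), summarize_sales)
reports$january    # total 500, average about 166.67
reports$february   # total 500, average 250
```

If the calculation changes, only the function needs to be edited. Every report picks up the change the next time the script runs.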

In many fields, R is considered the gold standard for scientific publications. The separation of code and data supports scientific reproducibility. The mature ecosystem of tools and packages allows efficient publication pipelines to be created. Evaluations and visualizations are automatically generated from code and data and then integrated into high-quality LaTeX or R Markdown documents.
