Chapter 9 High performance computing
9.1 Rcpp: using C++ within R
While R is a relatively new language essentially tailored for statisticians and data analysts, C++ is a well-established general-purpose programming language. This chapter does not aim at providing an overview of programming in C++ but rather how it can be used to address specific problems related to R. The point of view taken here is that of an R programmer with no notions of C++ or similar languages. Therefore we will not go into details concerning fundamental notions of C++ but rather provide some necessary concepts in order to start programming with this language and use it within R.
Having said this, R is an interpreted language
which makes it particularly inefficient when performing tasks that require a repetition of operations such as for
and while
loops,
or recursive calling of functions. On the other hand, C++ is a
compiled language which means that,
before being able to use your code, it is translated (compiled) to lower-level
machine language which allows to considerably accelerate the code running time.
Note that C++ is just one of many solutions that can make the execution of R code more efficient such as vectorization and memory management. Profiling also helps to identify which parts of the code are the slowest to execute and therefore provides useful information to understand where to improve code efficiency.
Another note that is of interest is that C++ compiles ahead of time (before using the code), but there are other approaches that compile just in time (at runtime) such as the Julia language.
9.1.1 Installation
Working with C++ in R does not come “out of the box”, but it is nevertheless a well handled procedure due to the latest contributions to R. In order to make use of these tools, you basically need two elements: a C++ compiler and a
program that permits to connect R and C++. For the latter, you need to install
the Rcpp
package:
# For CRAN official release
install.packages("Rcpp")
# For development version
devtools::install_github("RcppCore/Rcpp")
For the former, it depends on your OS:
1. On Windows, install RTools
.
2. On Mac, install Xcode
from the App Store.
3. On Linux, you need to install r-bas-dev
(or similar) from
your package manager.
Verify that your installation works by running:
require(Rcpp)
cppFunction("double sumC(double x, double y){
double s;
s = x + y;
return s;
} ")
sumC(3.5, 6.5)
;
ottherwise the compiler will return an error.
The function sumC(x,y)
takes two doubles (numeric) values (x
and y
)
and returns a double (the sum of the two values). Can you predict what happens if you run the following lines? Can you explain why?
9.1.2 A first program
The above cppFunction()
creates an inline C++ function (within your R code).
However, it is a better practice to create a separate file dedicated to C++ functions.
It is possible then to compile your C++ functions and call them within R.
In RStudio, you can create a C++ file by clicking on File > New File > C++ File. RStudio provides you with an example. At the top of C++ file, you should see:
The first line tells your program that you can access functionalities from the Rcpp
library.
The second line tells your program that you can access those functionalities without specifying/loading
the Rcpp
library. For example, suppose there is function mean()
in Rcpp
.
With the first line, you could use this function only by typing Rcpp::mean()
.
Adding the second line allows you to directly write mean()
in your program.
This is somehow equivalent to using library(Rcpp)
in your R code.
Once the C++ code is written, it should be saved with the appropriate extension which is either
*.cc
or *.cpp
. Now you can compile your first C++ program and for this type the following command in your R file:
To test this, try to write the sumC()
function within a separate C++ file and compile it in R.
By default, C++ functions are not accessible within your R program.
One key element is to use the attribute [[Rcpp::export]]
in comment
just before the function definition. For example, the separate C++ file could look something like this:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sumC(
double x,
double y
) {
double s;
s = x + y;
return s;
}
//
is used for single line comment. The symbol /*
starts a comment
which can possibly cover multiple lines and */
ends the comment.
9.1.3 Data structures
One difference you can immediately notice while working with C++ is that you need to tell the program what type of data structure you intend to use before using it. This makes the writing (and execution) of the code “safer” since the programmer pay more attention to the implementation details thereby preventing the code from being used in a way for which it was not intended. On the other hand, a drawback of this characteristic is that the programmer will have to spend (at least at the beginning) more time conceiving the code.
Here is a list of primitive data types with their keywords:
- int
(4) : integer,
- float
(4) : single precision floating number (numeric),
- double
(8) : double precision floating number (numeric),
- bool
: logical (true or false),
- char
: character,
- void
: without any value.
The numbers in parenthesis represents typical memory (in bytes) required to store the variable.
Single and double precision floating numbers represent how the computer
stores real numbers in the memory. float
requires less memory than double
but is less precise. As a general rule, we only use type double
since this is the default type in R:
R has no single precision data type. All real numbers are stored in double precision format.
help(double)
in R
These data types can be modified by using one (or more) of the following keywords:
- long
: allocate more space in memory,
- short
: allocate less space in memory,
- unsigned
: numbers take only positive value.
For more complex data structures such as vectors and matrices, the story is slightly more complicated.
Luckily for R programmers, Rcpp
provides many predefined structures that make
the transition to C++ easier. Here is a non-exhaustive list:
Table : Common data structure in R with Rcpp equivalent.
Rcpp | R | Role |
---|---|---|
NumericVector |
c(...) or vector(mode = "double", ...) |
A vector of double |
IntegerVector |
c(...) or vector(mode = "integer", ...) |
A vector of integer |
LogicalVector |
c(...) or vector(mode = "bool", ...) |
A vector of boolean |
NumericMatrix |
matrix() |
A matrix of double |
IntegerMatrix |
matrix() |
A matrix of integer |
LogicalMatrix |
matrix() |
A matrix of boolean |
List |
list |
A list |
DataFrame |
data.frame() |
A data frame |
9.1.3.1 Vectors
Before using a vector in C++, you need to create one. For example,
creates an empty vector of double of length 10.
To access elements of a vector, the syntax is the same as in R with one notable exception:
accessor starts at 0! So if you want to access the first element of a vector v
,
you should write v[0]
, the ith element, v[i-1]
. Note that you can equivalently
write v(0)
and v(i-1)
.
The length of the vector v
can be obtained as follows
Here the length of the vector is saved into an integer n
.
Note that length()
is a method for the class NumericVector
and uses the syntax ‘’dot-function’’. Theses concepts are beyond
the scope of this introduction and the only aspect to underline is that the above
notation gives the same result as if you wrote length(v)
in R.
9.1.3.2 Matrices
Creating a matrix is similar to creating a vector. The command
creates an empty matrix with 3 rows and 4 columns. This is equivalent to writing the following in R
The syntax for accessing matrix elements is slightly different. For example, accessing a column or a row can be done as follows
// Copy first column into a vector
NumericVector v = m( _ , 0);
// Copy first row into a vector
NumericVector v = m(0, _);
// Access the ith row, jth column
double x = m(i-1, j-1); // i and j are positive integers already defined
The number of columns/rows of a matrix can be retrieved using
9.1.3.3 List
List creation can be accomplished as follows:
// Create a list with elements e1 and e2
List L = List::create(e1, e2);
// Create a list with elements e1 and e2 with names
List L = List::create(Named("name1") e1, Named("name2") e2);
The elements e1
and e2
can be of any type (e.g. vector, matrix, etc.)
9.1.4 Control structures
Logical operators in R and C++ are similar.
Here is a list where x
and y
are scalar (not vectors!).
Operator | Description |
---|---|
x > y |
x greater than y |
x >= y |
x greater than or equal to y |
x < y |
x less than y |
x <= y |
x less than or equal to y |
x == y |
x equals to y |
x != y |
x not equal to y |
x && y |
x and y |
x || y |
x or y |
!x |
not x |
9.1.4.1 if/else/elseif statements
R and C++ share the same syntax also for if
/else
/elseif
statements
with the exception that else if
has a space in C++ syntax. For example,
the following expression is valid within both R and C++:
9.1.4.2 loops
Loops are much faster in C++ than in R and the syntax is slightly different.
For example, the following R for
loop
can be written in C++ with the following code
Three elements need to be defined:
1. The initialization statement: an iterator (i
here), generally of type integer, needs to be initialized
(here i
is initialized to 1).
2. The test expression: the expression is evaluated and should return a boolean. If false
the loop stops and
if true
, the body of the for loop is evaluated.
3. The update statement: it defines the increment of the iterator, usually set to i++
.
Note that we can also allow the iterators to decrease via the decrement operator i--
.
C++ also offers variants of assignment operators.
Operator | Equivalence | Description |
---|---|---|
x = y; |
Assign the value of y to x |
|
x += y; |
x = x + y; |
Take the value of x and add y , then assign to x |
x -= y; |
x = x - y; |
Take the value of x and substract y , then assign to x |
x *= y; |
x = x * y; |
Take the value of x and multiply by y , then assign to x |
x /= y; |
x = x / y; |
Take the value of x and divide by y , then assign to x |
Therefore, i++
is equivalent to i+=1
. If one therefore wanted to increment by 2 at each loop, one would use i+=2
.
The C++ while
loop has the same syntax as R. The following expression
is the same in both languages.
Note that C++ is also equipped with a do ... while
loop.
This may be useful, for example, in the situation where
you need a first run of the body prior to checking the condition.
9.1.5 Functions
Creating functions in C++ is fundamentally not much different from writing them in R. Neverthelss, C++ requires an additional layer of precision. A function can follow this pattern:
First, the type of return of the function should be specified.
The obj
that is returned should be of the same type as the one specified
at the start of the function declaration and each argument should have a type.
Otherwise, as in R, the body of the function contains statements that define what the function does.
As mentioned earlier, by default a C++ function cannot be used within R and the attribute [[Rcpp::export]]
in comment
just before the function definition needs to be specified. For example
// [[Rcpp::export]]
type function_name(type arg1, ..., type argn){
body of the function;
return obj;
}
will make function_name
available within R.
9.1.6 Comparing with R
Making an effort of writing C++ code instead of R code should deliver advantages.
One way to measure these advantages is by comparing the computational efficiency
of a C++ function with the equivalent R function.
This can be achieved, for example, using the microbenchmark
package.
For example, suppose you create a function that computes the factorial number of a positive integer input. In R, you can write
my_factorialR <- function(n){
factorial <- 1
for(i in 2:n){
factorial <- factorial * i
}
return(factorial)
}
An equivalent C++ function could be
int my_factorialC(int n){
int factorial = 1;
for(int i(2); i<=n; ++i){
factorial *= i;
}
return factorial;
}
## Loading required package: microbenchmark
## Unit: microseconds
## expr min lq mean median uq max
## my_factorialR(500) 13.400 13.5155 13.70697 13.573 13.700 18.937
## my_factorialC(500) 2.307 2.3930 2.57535 2.461 2.527 9.236
## neval cld
## 100 b
## 100 a
It is clear that the computational performance is considerably different.
9.1.7 Example: Buffon’s needle (con’t)
Let’s reconsider Buffon’s needle problem.
We propose to recode the R functions cast_needle()
and
buffon_experiment()
into C++.
For this purpose, first create a C++ file with header
The cast_needle()
function was defined as
cast_needle <- function(plane_width = 20){
available_range <- plane_width/2 - 1 # where 1 is the length of the needle (unit)
x_start <- runif(2, -available_range, available_range)
angle <- runif(1, 0, 2*pi)
x_end <- c(cos(angle), sin(angle)) + x_start # where the angles are multiplied by the needle length which is 1 in this example
cross <- floor(x_start[2]) != floor(x_end[2])
out <- list(start = x_start, end = x_end, cross = cross)
out
}
A possible C++ implementation is
// [[Rcpp::export]]
List cast_needle(
unsigned int plane_width = 20
){
// Variable declaration
int available_range;
double angle,my_pi;
NumericVector x_start(2), x_end(2), tmp_angle(1);
bool cross;
// Computation
my_pi = atan(1) * 4; // defines pi (for simplicity)
available_range = plane_width / 2 - 1;
x_start = runif(2, -available_range, available_range);
tmp_angle = runif(1, 0, 2 * my_pi);
angle = tmp_angle[0];
x_end[0] = cos(angle) + x_start[0];
x_end[1] = sin(angle) + x_start[1];
cross = floor(x_start[2]) != floor(x_end[2]);
return List::create(
Named("start") = x_start,
Named("end") = x_end,
Named("cross") = cross
);
}
Based on the above function, we can make a few remarks:
- For better readibility, it is good practice to separate variable declaration and computation.
- The value of \(\pi\) is not accessible by default so in this case we simply approximate it (but better solutions exist).
- Basic functions such as
atan()
,runif()
,floor()
,cos()
,sin()
are made available byRcpp
(check for Rcpp sugar).
- To avoid an error we have first to save the result of
runif()
asNumericVector
(its default result type).
The buffon_needle()
function was declared in R as follows:
buffon_experiment <- function(B = 2084, plane_width = 10, seed = NULL){
if (!is.null(seed)){
set.seed(seed)
}
X_start <- X_end <- matrix(NA, B, 2)
cross <- rep(NA, B)
for (i in 1:B){
inter <- cast_needle(plane_width = plane_width)
X_start[i, ] <- inter$start
X_end[i, ] <- inter$end
cross[i] <- inter$cross
}
out <- list(start = X_start, end = X_end, cross = cross, plane = plane_width)
class(out) <- "buffon_experiment"
out
}
The C++ alternative we propose is the following
// [[Rcpp::export]]
List buffon_experiment(
int B = 2084,
int plane_width = 10
){
// Variable declaration
NumericMatrix X_start(B,2), X_end(B,2);
LogicalVector cross(B);
List L;
NumericVector tmp(2);
// Computation
for(int i(0); i<B; ++i){
L = cast_needle(plane_width);
tmp = L["start"];
X_start(i,_) = tmp;
tmp = L["end"];
X_end(i,_) = tmp;
cross(i) = L["cross"];
}
return List::create(
Named("start") = X_start,
Named("end") = X_end,
Named("cross") = cross,
Named("plane") = plane_width
);
}
Also in this case we have few remarks:
- C++ does not know the content of
L["start"]
in advance, so for this reason it is first saved into a vectortmp
of the correct dimension (the same forL["end"]
).
- To avoid complexity, the list returned by the C++ implementationis not of the class buffon_experiment. This should be added within R.
9.1.8 C++ in an R package
Writing an R package with C++ code follows essentially the same steps as a regular R package. Here we underline the main differences:
- Create an “empty” R package: the option R package using Rcpp should be selected as
Project Type.
- Edit description file:
Rcpp
has automatically been added inImports
andLinkingTo
fields.
- Move your R scripts into the R folder. In addition, move the C++ files into the
src
folder;src
stands for source.The C++ function must have the attribute// [[Rcpp::export]]
prior to function definition in order to make it available within R.
- Documentation: follow the exact same steps as for R files. For C++ files, documentation
uses the same syntax except that
#
should be replaced by//
(comments in C++).
- The steps that follow are identical.