# Euler Problem 1 In R

**The problem:** *If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.*

*Find the sum of all the multiples of 3 or 5 below 1000.*

## A simple solution using a `for`

loop

A simple solution would be to collect all multiples of 3 and 5 that are less than 1000 by looping over all integers from 1 to 999. For each integer we encounter in the loop, weâ€™ll need to check whether the integer is divisible by 3 or 5: the R modulus operator `%%`

is handy here, because `a %% b`

returns \(a\textrm{mod}(b)\), which if zero, indicates that \(a\) is divisible by \(b\).

```
sum_function <- function(nmax){
# this is where we'll collect the sum
output_sum <- 0
# loop over over number between 1 and the maximum specified
for(i in 1:(nmax - 1)){
# if the remainder of dividing i by 3 or 5 is 0, then
# add to the running sum
if(i %% 3 == 0 | i %% 5 == 0) output_sum <- output_sum + i
}
# return the sum
output_sum
}
```

`sum_function`

accepts the argument `nmax`

which is the postive integer below which we want to sum multiples of 3 and 5. Letâ€™s check that this works for `nmax = 10`

and `nmax = 1000`

(we know the result of the former should be 23, since only 3, 5, 6 and 9 are eligible).

`sum_function(10)`

`## [1] 23`

`sum_function(1000)`

`## [1] 233168`

This seems ok. Although the loop is easy to understand, it is inefficient. This is because we already know that the multiples of 3 and 5 below some bound form a regular sequence, and we therefore actually know in advance that we donâ€™t need to check most of the integers between 1 and 1000. In fact, we can avoid the loop altogether if we can construct the sequences directly, which we can do very easily in R using `seq`

.

## A better solution using `seq`

If we can generate the sequence of integers divisible by 3, and a sequence divisible by 5, such that all values are less than 1000, then we have what we need to solve the problem. *However!* We need to be careful to avoid double counting integers that are divisible by both 3 *and* 5 - as these will appear in both sequences, so we canâ€™t simply add up both sequences. One strategy would be to construct the sequence of integers divisible by 3, and another sequence of integers divisible by 5, sum these, and then subtract all integers they have in common. The integers in common must be divisible by 3 *and* 5, which means that they must be the sequence of integers divisible by 15 - this is also easy to compute. The solution now looks like this:

```
sum_function2 <- function(nmax){
# sequence starting at 3 and increasing in steps of 3
seq_3 <- seq(3, nmax - 1, by = 3)
# sequence starting at 5 and increasing in steps of 5
seq_5 <- seq(5, nmax - 1, by = 5)
# sequence starting at 15 and increasing in steps of 15
# we don't need to do this if nmax - 1 < 15 - hence the if statement
if(nmax - 1 > 15) seq_15 <- seq(15, nmax - 1, by = 15) else seq_15 = 0
# add them up
sum(seq_3) + sum(seq_5) - sum(seq_15)
}
```

Which gives the same result:

`sum_function2(10)`

`## [1] 23`

`sum_function2(1000)`

`## [1] 233168`

So the solution is the same, but other than looking a bit more elegant, is it actually quicker? We can quickly check the performance difference using the function `microbenchmark::microbenchmark`

:

```
library(microbenchmark)
microbenchmark(sum_function(1000), sum_function2(1000))
```

```
## Unit: microseconds
## expr min lq mean median uq max
## sum_function(1000) 473.997 512.195 675.8950 538.266 770.6480 2043.511
## sum_function2(1000) 68.359 86.537 119.2168 106.074 131.3385 315.897
## neval
## 100
## 100
```

The benchmarking is clear: the sequence approach is over 6 times faster than the loop approach! But can we do even better than this?

## A quick mathematical trick

Suppose we have the sum of a sequence of \(n\) integers that are divisible by 3, we can write them as follows: \[\begin{eqnarray*} \sum_{k=1}^n 3k &=& 3 + 6 + 9 + \ldots + 3(n-1) + 3n. \end{eqnarray*}\] If we add the sequence to itself, then the result is \(2\times\) the above. The trick is to add the sequence to itself in*reverse*order: \[\begin{eqnarray*} 2\sum_{k=1}^n 3k = \sum_{k=1}^n 3k + \sum_{k=n}^1 3k & = & (3 + 3n) + (6 + 3(n-1)) + \ldots + (3(n-1) + 6) + (3n + 3) \\ & = & 3(n + 1) + 3(n + 1) + \ldots + 3(n + 1) + 3(n + 1) \\ & = & 3n(n+1). \end{eqnarray*}\] This means that the sum of the sequence of 3â€™s is \(\sum_{k=n}^1 3k = 3n(n+1)/2\). Why does that help? It means that instead of working out the sum by generating a sequence of integers that are divisible by 3 and adding them up (like we did in the previous section), we can work out the sum directly using an expression like the one above. By the same argument, similar expressions hold for sequences of 5â€™s and 15â€™s: \[\begin{eqnarray*} \sum_{k=1}^n 3k &=& 3n(n+1)/2\\ \sum_{k=1}^n 5k &=& 5n(n+1)/2\\ \sum_{k=1}^n 15k &=& 15n(n+1)/2 \end{eqnarray*}\]

The only problem remaining is the value \(n\): this is the number of values in each sequence, and it doesnâ€™t seem like we know what this should be? In other words, how do we know how many values less than 1000 are divisible by 3? Although it isnâ€™t obvious, itâ€™s easy to compute - \(n\) is the number of times 3 divides 1000 without remainder - sometimes called integer division. In R we can calculate this using the binary operator `%/%`

: for example, \(10 %/% 3\) would give 3; \(20 %/% 3\) would give 6. Putting it all together we have the function

```
sum_function3 <- function(nmax){
# find the number of values in each sequence
n_5 <- (nmax - 1) %/% 5
n_3 <- (nmax - 1) %/% 3
n_15 <- (nmax - 1) %/% 15
# this is just: sum(seq_3) + sum(seq_5) - sum(seq_15)
(3 * n_3 * (n_3 + 1) / 2) + (5 * n_5 * (n_5 + 1) / 2) - (15 * n_15 * (n_15 + 1) / 2)
}
```

Which also returns the same output

`sum_function3(10)`

`## [1] 23`

`sum_function3(1000)`

`## [1] 233168`

Benchmarking these with the previous solutions we have:

`microbenchmark(sum_function(1000), sum_function2(1000), sum_function3(1000))`

```
## Unit: microseconds
## expr min lq mean median uq max
## sum_function(1000) 467.017 483.5950 554.29936 518.0600 587.2535 1444.666
## sum_function2(1000) 67.482 75.6280 101.15819 83.9175 95.1265 1086.534
## sum_function3(1000) 1.102 1.5765 2.53894 2.2790 3.0660 8.586
## neval
## 100
## 100
## 100
```

Woop! Thatâ€™s quick. Using the sequence trick is \(215\times\) faster than a loop, and \(25\times\) faster than using sequences. The difference getâ€™s more pronounced as `nmax`

increases (note the scale below uses logs)

```
library(tidyverse)
get_speed <- function(nmax){
z <- summary(microbenchmark(sum_function(nmax), sum_function2(nmax), sum_function3(nmax)))$mean
}
spd_list <- lapply(as.list(round(10^(seq(1, 4, length.out = 20)), 0)), get_speed)
df <- cbind(10^(seq(2, 5, length.out = 20)), do.call("rbind", spd_list))
colnames(df) <- c("nmax", "Loop", "Sequence", "Analytical")
gather(as.tibble(df), type, value, -nmax) %>%
ggplot(aes(x = nmax, y = value, group = type, col = type)) +
coord_trans(x = 'sqrt', y = 'sqrt') +
geom_line(size = 1.2) +
ylab("Computer time (s)") +
ggtitle("Average computer time to sum all integers divisble by 3 or 5 that are less than nmax")
```