R: fastest way of finding out if all elements of a vector are identical?

Jan 01, 2016

There is a question on SO about this: http://stackoverflow.com/questions/4752275/test-for-equality-among-all-elements-of-a-single-vector

But I was curious which approach is actually fastest, so I benchmarked a few candidates.

#test data, large vectors
v1 = rep(1234, 1e6)
v2 = runif(1e6)
#functions to try
all_the_same1 = function(x) {
  diff(range(x)) == 0
}
all_the_same2 = function(x) {
  max(x) == min(x)
}
all_the_same3 = function(x) {
  sd(x) == 0
}
all_the_same4 = function(x) {
  var(x) == 0
}
all_the_same5 = function(x) {
  x = x - mean(x)
  all(x == 0)
}
all_the_same6 = function(x) {
  length(unique(x)) == 1
}

Simple enough, 6 functions to try and some test data. Then we benchmark:

library("microbenchmark")
microbenchmark(all_the_same1(v1),
               all_the_same2(v1),
               all_the_same3(v1),
               all_the_same4(v1),
               all_the_same5(v1),
               all_the_same6(v1))
microbenchmark(all_the_same1(v2),
               all_the_same2(v2),
               all_the_same3(v2),
               all_the_same4(v2),
               all_the_same5(v2),
               all_the_same6(v2))

Results (for me) look like this, for the first vector:

Unit: milliseconds
              expr       min        lq      mean    median        uq       max neval
 all_the_same1(v1)  4.321870  4.393697  5.554033  4.442849  4.690950 73.166929   100
 all_the_same2(v1)  2.467258  2.494175  2.522648  2.509389  2.534696  2.706289   100
 all_the_same3(v1)  3.661536  3.701472  3.783067  3.736434  3.810016  4.496828   100
 all_the_same4(v1)  3.657147  3.708786  3.774908  3.746528  3.804603  4.343520   100
 all_the_same5(v1)  6.850276  7.029768  8.515351  7.227547  9.164957 73.208182   100
 all_the_same6(v1) 15.083830 15.217973 15.977563 15.400977 17.114863 18.679829   100

And the second:

Unit: milliseconds
              expr       min        lq      mean    median        uq       max neval
 all_the_same1(v2)  4.304317  4.393111  4.868236  4.487904  4.730886  6.729151   100
 all_the_same2(v2)  2.468428  2.498563  2.556797  2.521823  2.570829  3.447666   100
 all_the_same3(v2)  3.643104  3.715955  3.822649  3.756476  3.866921  5.887129   100
 all_the_same4(v2)  3.634034  3.704983  3.793290  3.765984  3.827717  4.488051   100
 all_the_same5(v2)  6.031952  6.206910  9.855640  6.447258  8.357751 80.946411   100
 all_the_same6(v2) 67.558621 69.806449 72.405274 71.504097 73.557513 88.433322   100

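In both cases, function #2 (max(x) == min(x)) is the fastest one. Its timing also barely depends on the actual data given: functions 1 to 5 take about the same time on both vectors, while function #6 (unique) is between 4 and 5 times slower on the random vector (median about 71.5 ms versus 15.4 ms).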
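
The max/min check returns NA when the vector contains missing values, and max() and min() warn on an empty vector. Here is a minimal sketch of a more defensive variant. The name all_the_same_safe and its na.rm argument are hypothetical and not part of the benchmark above, and treating vectors of length 0 or 1 as "all the same" is just a convention chosen for this sketch:

```r
# Hypothetical helper (not from the benchmark above): a defensive
# wrapper around the max(x) == min(x) check used in function #2.
all_the_same_safe = function(x, na.rm = FALSE) {
  # Optionally drop missing values before comparing
  if (na.rm) x = x[!is.na(x)]
  # Assumption for this sketch: vectors of length 0 or 1 count as "all the same"
  if (length(x) <= 1) return(TRUE)
  # If NAs are still present, max() and min() return NA, so the result is NA
  max(x) == min(x)
}

all_the_same_safe(rep(1234, 10))
all_the_same_safe(c(1, 2, 3))
all_the_same_safe(c(5, NA, 5), na.rm = TRUE)
all_the_same_safe(numeric(0))
```

The extra checks add some work per call, so this version was not timed and may be slower than function #2 on large vectors.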

Just Emil Kirkegaard Things
