R: fastest way of finding out if all elements of a vector are identical?
There is a question on SO about this: http://stackoverflow.com/questions/4752275/test-for-equality-among-all-elements-of-a-single-vector
But I was a bit more curious, so I benchmarked a few alternatives myself.
#test data, large vectors
v1 = rep(1234, 1e6)
v2 = runif(1e6)
#functions to try
all_the_same1 = function(x) {
#range() returns c(min, max), so take the difference to get a single value
diff(range(x)) == 0
}
all_the_same2 = function(x) {
max(x) == min(x)
}
all_the_same3 = function(x) {
sd(x) == 0
}
all_the_same4 = function(x) {
var(x) == 0
}
all_the_same5 = function(x) {
x = x - mean(x)
all(x == 0)
}
all_the_same6 = function(x) {
length(unique(x)) == 1
}
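Before timing anything, it may be worth confirming that the approaches agree on small inputs. The following is a self-contained sketch: the `checks` list and the small vectors `same` and `mixed` are names made up for this illustration, not part of the benchmark. Note that `range()` returns both the minimum and the maximum, so a range-based test needs `diff()` to collapse them into a single number.

```r
# Hypothetical small inputs for a quick agreement check
same  <- rep(1234, 10)
mixed <- c(1, 2, 3, 2, 1)

# Compact versions of the six approaches, collected in a named list
checks <- list(
  range  = function(x) diff(range(x)) == 0,
  minmax = function(x) max(x) == min(x),
  sd     = function(x) sd(x) == 0,
  var    = function(x) var(x) == 0,
  center = function(x) all(x - mean(x) == 0),
  unique = function(x) length(unique(x)) == 1
)

# Apply every check to each vector; each result is a named logical vector
res_same  <- vapply(checks, function(f) f(same), logical(1))
res_mixed <- vapply(checks, function(f) f(mixed), logical(1))

print(res_same)   # expected to be all TRUE
print(res_mixed)  # expected to be all FALSE
```

The variance-based and mean-centering checks rely on exact floating point equality, so on data with rounding error they could disagree with the others. That is an assumption worth testing on your own data rather than something shown here.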
Simple enough, 6 functions to try and some test data. Then we benchmark:
library("microbenchmark")
microbenchmark(all_the_same1(v1),
all_the_same2(v1),
all_the_same3(v1),
all_the_same4(v1),
all_the_same5(v1),
all_the_same6(v1))
microbenchmark(all_the_same1(v2),
all_the_same2(v2),
all_the_same3(v2),
all_the_same4(v2),
all_the_same5(v2),
all_the_same6(v2))
Results (for me) look like this, for the first vector:
Unit: milliseconds
              expr       min        lq      mean    median        uq       max neval
 all_the_same1(v1)  4.321870  4.393697  5.554033  4.442849  4.690950 73.166929   100
 all_the_same2(v1)  2.467258  2.494175  2.522648  2.509389  2.534696  2.706289   100
 all_the_same3(v1)  3.661536  3.701472  3.783067  3.736434  3.810016  4.496828   100
 all_the_same4(v1)  3.657147  3.708786  3.774908  3.746528  3.804603  4.343520   100
 all_the_same5(v1)  6.850276  7.029768  8.515351  7.227547  9.164957 73.208182   100
 all_the_same6(v1) 15.083830 15.217973 15.977563 15.400977 17.114863 18.679829   100
And the second:
Unit: milliseconds
              expr       min        lq      mean    median        uq       max neval
 all_the_same1(v2)  4.304317  4.393111  4.868236  4.487904  4.730886  6.729151   100
 all_the_same2(v2)  2.468428  2.498563  2.556797  2.521823  2.570829  3.447666   100
 all_the_same3(v2)  3.643104  3.715955  3.822649  3.756476  3.866921  5.887129   100
 all_the_same4(v2)  3.634034  3.704983  3.793290  3.765984  3.827717  4.488051   100
 all_the_same5(v2)  6.031952  6.206910  9.855640  6.447258  8.357751 80.946411   100
 all_the_same6(v2) 67.558621 69.806449 72.405274 71.504097 73.557513 88.433322   100
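In both cases, function #2 (comparing max(x) to min(x)) is the fastest one, with a median of about 2.5 ms. Its timing also barely depends on the actual data given, unlike function #6, whose median goes from about 15.4 ms on the constant vector to about 71.5 ms on the random one.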
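As a possible follow-up, the min/max test can be wrapped to cope with edge cases the benchmark does not cover. This is only a sketch under assumptions of mine: the name `all_the_same_safe`, the `tol` and `na.rm` arguments, and the choice to return `TRUE` for vectors of length 0 or 1 are all illustrative, and none of it was timed above.

```r
# Hypothetical wrapper around the min/max comparison (numeric input assumed)
all_the_same_safe <- function(x, tol = 0, na.rm = FALSE) {
  # Optionally drop missing values before comparing
  if (na.rm) x <- x[!is.na(x)]
  # Assumption: vectors with fewer than two elements count as "all the same"
  if (length(x) < 2) return(TRUE)
  # With NAs left in, max() and min() return NA, so the result is NA as well
  (max(x) - min(x)) <= tol
}

r_const <- all_the_same_safe(rep(1234, 5))                 # TRUE
r_tol   <- all_the_same_safe(c(1, 1 + 1e-12), tol = 1e-9)  # TRUE
r_na    <- all_the_same_safe(c(1, 2, NA))                  # NA
r_narm  <- all_the_same_safe(c(2, 2, NA), na.rm = TRUE)    # TRUE
```

The NA filtering adds an extra pass over the vector, so this version is likely a little slower than the bare `max(x) == min(x)`; I have not measured by how much.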