R: assign() inside nested functions

Dec 30, 2015

Recently, I wrote a function called copy_names(). It does what you think and a little more: it copies names from one object to another. But it can also attempt to do so even when the sizes of the objects' dimensions do not match up perfectly. For instance:

> t = matrix(1:9, nrow=3)
> t2 = t
> rownames(t) = LETTERS[1:3]; colnames(t) = letters[1:3]
> t
  a b c
A 1 4 7
B 2 5 8
C 3 6 9
> t2
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> copy_names(t, t2)
> t2
  a b c
A 1 4 7
B 2 5 8
C 3 6 9

Here we create a matrix and make a copy of it. Then we assign dimension names to the first object and inspect both. Unsurprisingly, only the first has names (because R uses copy-on-modify semantics). Then we call the copy function, and afterwards we see that the second object has had the names copied over. Hooray!

What if there is imperfect matching? The function will first check whether the number of dimensions is the same and if so, it checks each dimension to see if the lengths match in that dimension. If so, the names are copied. If not, nothing is done. For instance:

> t = matrix(1:6, nrow=3)
> t2 = matrix(1:9, nrow=3)
> rownames(t) = LETTERS[1:3]; colnames(t) = letters[1:2]
> t
  a b
A 1 4
B 2 5
C 3 6
> t2
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> copy_names(t, t2)
> t2
  [,1] [,2] [,3]
A    1    4    7
B    2    5    8
C    3    6    9

Here we create two matrices, but not of exactly the same size: the first is 3x2 and the second is 3x3. Then we assign dimnames to the first. Then we copy to the second and inspect. We see that only the dimension that matched in length (i.e. the first) had its names copied.

How does it work?

Before I changed it, the code looked like this (including roxygen2 documentation):

#' Copy names from one object to another.
#'
#' Attempts to copy names that fit the dimensions of vectors, lists, matrices and data.frames.
#' @param x (an object) An object whose dimnames should be copied.
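#' @param y (an object) An object whose dimensions should be renamed.
#' @param partialmatching (logical) Whether to copy names one dimension at a time wherever the lengths match. If FALSE, all dimensions must match.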
#' @keywords names, rownames, colnames, copy
#' @export
#' @examples
#' m = matrix(1:9, nrow=3)
#' n = m
#' rownames(m) = letters[1:3]
#' colnames(m) = LETTERS[1:3]
#' copy_names(m, n)
#' n
copy_names = function(x, y, partialmatching = TRUE) {
  library(stringr)
  #find object dimensions
  x_dims = get_dims(x)
  y_dims = get_dims(y)
  same_n_dimensions = length(x_dims) == length(y_dims)
  #what is the object in y parameter?
  y_obj_name = deparse(substitute(y))
  #perfect matching
  if (!partialmatching) {
    #set names if matching dims
    if (all(x_dims == y_dims)) {
      attr(y, "dimnames") = attr(x, "dimnames")
    } else {
      stop(str_c("Dimensions did not match! ", x_dims, " vs. ", y_dims))
    }
  }
  #if using partial matching and dimensions match in number
  if (same_n_dimensions && partialmatching) {
    #loop over each dimension
    for (dim in 1:length(dimnames(x))) {
      #do lengths match?
      if (x_dims[dim] == y_dims[dim]) {
        dimnames(y)[[dim]] = dimnames(x)[[dim]]
      }
    }
  }
  #assign in the outer envir
  assign(y_obj_name, value = y, pos = 1)
}

The call that does the trick is the last one, namely the one using assign(). Here we modify an object outside the function's own environment. How do we know which one to modify? Well, pos = 1 refers to the first entry on the search path, which is the global environment, so this works as long as copy_names() is called from the top level. Alternatively, one could have used <<-.
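It is worth checking what pos = 1 actually points to. A minimal sketch (set_global() and demo_value are hypothetical names used only for illustration):

```r
# pos = 1 is the first entry on the search path, i.e. the global environment
pos1_is_global = identical(as.environment(1), globalenv())
pos1_is_global

# Hypothetical helper: writes to the global environment,
# no matter where it is called from.
set_global = function() {
  assign("demo_value", value = 42, pos = 1)
}

set_global()
exists("demo_value", envir = globalenv(), inherits = FALSE)
```

Both checks should come back TRUE when run in a fresh session.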

Inside nested functions

However, consider this scenario:

> x = 1
> func1 = function() {
+   x = 2
+   print(paste0("x inside func1 before running func2 is ", x))
+   func2()
+   print(paste0("x inside func1 after running func2 is ", x))
+ }
> 
> func2 = function() {
+   print(paste0("x inside func2 is ", x))
+   print(where("x"))
+   assign("x", value = 3, pos = 1)
+   #x <<- 3
+ }
> 
> x
[1] 1
> func1()
[1] "x inside func1 before running func2 is 2"
[1] "x inside func2 is 1"
<environment: R_GlobalEnv>
[1] "x inside func1 after running func2 is 2"
> x
[1] 3
> 
> x = 1
> func2()
[1] "x inside func2 is 1"
<environment: R_GlobalEnv>
> x
[1] 3

Here we define two functions, one of which calls the other (where() comes from the pryr package). We also define x outside, in the global environment, and inside func1() we define x again with another value. Note the surprising result inside func2(). When asked to fetch x, which doesn't exist in that function's environment, it returns the value from the global environment (x = 1), not from func1()'s environment (x = 2)! This looks odd at first because func2() was called from func1(), so one might expect R to look there before trying the global environment. When we then inspect x in the global environment after the functions finish, we see that x has been changed there, not inside func1() as might be expected. This is a problem because if we call copy_names() inside a function, it is supposed to change the names of the object inside that function, not in the global environment.

Why is this? As far as I can make out, it is due to the difference between the calling environment (where we call the function from) and the enclosing environment (where it was created, in the case above the global environment). R uses lexical scoping, so by default it looks up variables in the enclosing environment, not the calling environment. assign() with pos = 1 does not refer to the calling environment either: pos is a position on the search path, and position 1 is the global environment. Hence it changes the value in the global environment rather than, as intended, in the environment of the function that called it.
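The difference between the two kinds of environment can be made visible with a small sketch. The helper names show_envs() and caller() are made up for this illustration:

```r
# Hypothetical helper: report its enclosing and its calling environment.
show_envs = function() {
  enclosing = environment(show_envs)  # where the function was defined
  calling = parent.frame()            # where the function was called from
  list(enclosing = enclosing, calling = calling)
}

caller = function() {
  envs = show_envs()
  # Compare both against caller()'s own environment
  c(
    calling_is_caller = identical(envs$calling, environment()),
    enclosing_is_caller = identical(envs$enclosing, environment())
  )
}

res = caller()
res
```

Only the calling environment should match caller()'s own environment. The enclosing environment is wherever show_envs() was defined.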

The fix is to use the following line instead:

assign("x", value = 3, envir = parent.frame())

which then assigns the value to the object in the right environment, namely in func1()'s.
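Putting the fix into the earlier example gives, as a minimal sketch (the print() calls and where() are left out for brevity):

```r
x = 1

func2 = function() {
  # Assign in the environment that func2() was called from
  assign("x", value = 3, envir = parent.frame())
}

func1 = function() {
  x = 2
  func2()
  x  # return func1()'s own x after func2() has run
}

inner_x = func1()
inner_x  # func1()'s x, now changed by func2()
x        # the global x, left alone
```

Here func1()'s x should end up as 3 while the global x stays at 1.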

copy_names() part 2

This also means that copy_names() does not work within functions. For instance:

> get_loadings = function(fa) {
+ library(magrittr)
+ df = loadings(fa) %>% as.vector %>% matrix(nrow=nrow(fa$loadings)) %>% as.data.frame
+ loads = loadings(fa)
+ copy_names(loads, df)
+ return(df)
+ }
> library("psych")
> iris_fa = fa(iris[-5])
> get_loadings(iris_fa)
          V1
1  0.8713121
2 -0.4225686
3  0.9975472
4  0.9646774

Above, we define a new function, get_loadings(), that fetches the loadings from a factor analysis object and transforms them into a clean data.frame in a roundabout way.* We see that the returned object did not keep the dimnames despite copy_names() being called. The fix is the same as above: calling assign() with envir = parent.frame().
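As a rough sketch of what the corrected function could look like, here is a simplified version. It is not the actual package code: copy_names2() and wrapper() are hypothetical names, get_dims() is replaced by dim(), and only objects that have dim() (matrices, data.frames) are handled.

```r
# Simplified, hypothetical variant of copy_names() for objects with dim()
copy_names2 = function(x, y) {
  # name of the object passed as y, as written by the caller
  y_obj_name = deparse(substitute(y))
  x_dims = dim(x)
  y_dims = dim(y)
  x_names = dimnames(x)
  y_names = dimnames(y)
  if (is.null(y_names)) y_names = vector("list", length(y_dims))
  # copy names one dimension at a time where the lengths match
  if (!is.null(x_names) && length(x_dims) == length(y_dims)) {
    for (d in seq_along(x_dims)) {
      if (x_dims[d] == y_dims[d] && !is.null(x_names[[d]])) {
        y_names[[d]] = x_names[[d]]
      }
    }
    dimnames(y) = y_names
  }
  # assign in the calling environment, not the global one
  assign(y_obj_name, value = y, envir = parent.frame())
  invisible(NULL)
}

# Call it from inside another function to check that the local object changes
wrapper = function() {
  m = matrix(1:6, nrow = 3, dimnames = list(LETTERS[1:3], letters[1:2]))
  n = matrix(1:9, nrow = 3)
  copy_names2(m, n)
  dimnames(n)
}

res = wrapper()
res
```

The row names should be copied to the local n inside wrapper(), while the column names stay NULL because the column counts differ. A more conventional design would be to return the modified object and let the caller assign it, which avoids writing into other environments altogether.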

* The reason to use the roundabout way is that the loadings extracted have some odd properties that make them unusable in many functions and they also refuse to be converted to a data.frame. But it turns out that one can just change the class to "matrix" and then they are fine! So one doesn't actually need copy_names() in this case after all.
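The class trick can be sketched without psych by using a toy stand-in for a loadings object. The values are the ones printed above; the row names assume the variable order of iris[-5], and the column name F1 is made up.

```r
# Toy stand-in for a loadings object: a matrix with class "loadings"
loads = structure(
  matrix(c(0.8713121, -0.4225686, 0.9975472, 0.9646774), ncol = 1,
         dimnames = list(names(iris)[1:4], "F1")),
  class = "loadings"
)

# Drop the special class so the object is treated as a plain matrix
class(loads) = "matrix"

# Now the conversion keeps the dimnames
df = as.data.frame(loads)
df
```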

Just Emil Kirkegaard Things
