# Excluding missing or bad data in R: not as easy as it should be!

*Ongoing series of posts about functions in my R package (https://github.com/Deleetdk/kirkegaard).*

Suppose you have a list or a simple vector (lists are vectors too) with some data. However, some of it is missing or bad in various ways: *NA*, *NULL*, *NaN*, *Inf* (or *-Inf*). Usually we want to get rid of these data points, but that can be difficult with the built-in functions. R's built-in functions for handling missing (or bad) data are:

- *is.na*
- *is.nan*
- *is.infinite* / *is.finite*
- *is.null*

Unfortunately, they do not behave consistently when given a list, and some of them match more than one kind of value. For instance:

```
x = list(1, NA, 2, NULL, 3, NaN, 4, Inf) #example list
is.na(x)
#> [1] FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
```
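
The same overlap shows up on a plain atomic vector, where the two kinds of value can at least be told apart by combining the two checks. A small sketch (the vector `v` is made up for illustration and is not from the example above):

```r
v <- c(1, NA, NaN, Inf)   # illustrative atomic vector

is.na(v)                  # flags both NA and NaN
#> [1] FALSE  TRUE  TRUE FALSE
is.nan(v)                 # flags only NaN
#> [1] FALSE FALSE  TRUE FALSE
is.na(v) & !is.nan(v)     # flags only "true" NA
#> [1] FALSE  TRUE FALSE FALSE
```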

So, *is.na* actually matches *NaN* as well. What about *is.nan*?

```
is.nan(x)
#> Error in is.nan(x) : default method not implemented for type 'list'
```

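So *is.nan* does not accept a list at all: it works elementwise on atomic vectors, but has no method for lists. And it gets worse when we apply it to each element ourselves: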

```
sapply(x, is.nan)
#> [[1]]
#> [1] FALSE
#>
#> [[2]]
#> [1] FALSE
#>
#> [[3]]
#> [1] FALSE
#>
#> [[4]]
#> logical(0)
#>
#> [[5]]
#> [1] FALSE
#>
#> [[6]]
#> [1] TRUE
#>
#> [[7]]
#> [1] FALSE
#>
#> [[8]]
#> [1] FALSE
```

Note that calling *is.nan* on *NULL* returns an empty logical vector (*logical(0)*) instead of *FALSE*. Because of that one zero-length result, *sapply* cannot simplify its output and returns a list instead of a logical vector we can subset with. *is.infinite* behaves the same way: it fails on lists and gives *logical(0)* for *NULL*.

So suppose you want a function for handling missing data that is both robust and specific, i.e. one that can exclude each kind of bad value separately. I could not find such a function, so I wrote one. Testing it:

```
are_equal(exclude_missing(x), list(1, 2, 3, 4))
#> [1] TRUE
are_equal(exclude_missing(x, .NA = F), list(1, NA, 2, 3, 4))
#> [1] TRUE
are_equal(exclude_missing(x, .NULL = F), list(1, 2, NULL, 3, 4))
#> [1] TRUE
are_equal(exclude_missing(x, .NaN = F), list(1, 2, 3, NaN, 4))
#> [1] TRUE
are_equal(exclude_missing(x, .Inf = F), list(1, 2, 3, 4, Inf))
#> [1] TRUE
```
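
The source of *exclude_missing* is not reproduced in this post. For readers who want to see the general idea, here is a minimal sketch of one way such a filter could be written. It is not the package's implementation: the name `drop_bad_sketch` is hypothetical, the argument names simply mirror the ones used in the tests above, and it assumes every element is either *NULL* or has length one (anything else is kept untouched).

```r
# Hypothetical sketch, NOT the kirkegaard package's exclude_missing().
# Assumes each element of x is NULL or a length-one value.
drop_bad_sketch <- function(x, .NA = TRUE, .NULL = TRUE, .NaN = TRUE, .Inf = TRUE) {
  is_bad <- function(el) {
    if (is.null(el)) return(.NULL)                       # NULL first: is.nan(NULL) is logical(0)
    if (!is.atomic(el) || length(el) != 1) return(FALSE) # leave everything else alone
    if (is.numeric(el) && is.nan(el)) return(.NaN)       # NaN before NA, since is.na(NaN) is TRUE
    if (is.na(el)) return(.NA)
    if (is.numeric(el) && is.infinite(el)) return(.Inf)  # covers Inf and -Inf
    FALSE
  }
  # vapply() insists on exactly one logical per element, so the result
  # is always a plain logical vector we can subset with
  x[!vapply(x, is_bad, logical(1))]
}

x <- list(1, NA, 2, NULL, 3, NaN, 4, Inf)
identical(drop_bad_sketch(x), list(1, 2, 3, 4))
#> [1] TRUE
identical(drop_bad_sketch(x, .NaN = FALSE), list(1, 2, 3, NaN, 4))
#> [1] TRUE
```

Using *vapply* with *logical(1)* rather than *sapply* is the main design choice here: a zero-length result raises an error instead of silently turning the output into a list.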

So, in all cases *exclude_missing* excludes only the types that we want to exclude, and it does not fail on lists the way the base R functions do.

**Edited:** It turns out that there are more problems:

```
is.na(list(NA))
#> [1] TRUE
```

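So *is.na* returns *TRUE* when given a list containing *NA*, even though the list itself is not a missing value. According to the documentation (*?is.na*) this is intended: for a list, the default method checks each element and flags those that are length-one atomic vectors holding *NA* or *NaN*. I still think this shouldn't happen.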
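
A few more calls make the rule easier to see, followed by a guard that avoids the surprise when the question is whether an object itself is a single missing value. The helper name `is_scalar_na` is hypothetical, made up for this sketch:

```r
is.na(list(NA))         # element is a length-one atomic NA
#> [1] TRUE
is.na(list(c(NA, 1)))   # element has length 2, so it is not flagged
#> [1] FALSE
is.na(list(list(NA)))   # element is itself a list, not atomic
#> [1] FALSE
is.na(list(NULL))       # NULL element is not flagged either
#> [1] FALSE

# Hypothetical helper: TRUE only for a length-one atomic NA (or NaN), never for a list
is_scalar_na <- function(obj) is.atomic(obj) && length(obj) == 1 && is.na(obj)

is_scalar_na(NA)
#> [1] TRUE
is_scalar_na(list(NA))
#> [1] FALSE
```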