The %notin% operator

Why is negating %in% such a pain?

Kaija Gahm
07-08-2018

This post has been slightly modified from its original form on woodpeckR.

Problem

I keep forgetting how to select all elements of an object except a few, by name. I get the ! operator confused with the - operator, and I find both of them less than intuitive to use. How can I negate the %in% operator?
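To make the difference between the two operators concrete, here is a minimal sketch using a made-up named vector (`counts` and its names are illustrative assumptions, not part of the survey data). `-` drops elements by integer position, while `!` flips a logical vector.

```r
# Hypothetical named vector, for illustration only
counts <- c(bass = 4, carp = 9, shad = 2, gar = 1)

# `-` works on integer positions: drop the 2nd element
by_position <- counts[-2]

# `!` works on logical vectors: drop elements whose name matches
by_name <- counts[!(names(counts) %in% c("carp", "gar"))]

# Mixing the two styles needs which(), and that fails quietly when nothing
# matches: -integer(0) is still integer(0), which selects nothing at all
no_match <- counts[-which(names(counts) %in% "pike")]

names(by_position)  # "bass" "shad" "gar"
names(by_name)      # "bass" "shad"
length(no_match)    # 0
```

The last case is one reason the logical form with `!` tends to be the safer habit for excluding by name.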

Context

I have a data frame called electrofishing that contains observations from a fish sampling survey. One column, stratum, gives the aquatic habitat type of the sampling site. I’d like to exclude observations sampled in the “Tailwater Zone” or “Impounded-Offshore” aquatic habitats.

electrofishing <- data.frame(stratum = c("Tailwater Zone", "Tailwater Zone", "Impounded", "Main Channel Border", "Side Channel", "Impounded-Offshore", "Side Channel"), 
                             idx = 1:7)

electrofishing
              stratum idx
1      Tailwater Zone   1
2      Tailwater Zone   2
3           Impounded   3
4 Main Channel Border   4
5        Side Channel   5
6  Impounded-Offshore   6
7        Side Channel   7

My instinct would be to do this:

electrofishing <- electrofishing[electrofishing$stratum !%in% 
                                   c("Tailwater Zone", "Impounded-Offshore"),]
Error: <text>:1:57: unexpected '!'
1: electrofishing <- electrofishing[electrofishing$stratum !
                                                            ^

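But that doesn’t work. ! is a prefix operator, so it can’t sit between the left-hand side and %in%. Instead, you have to put the ! in front of the whole %in% expression, which returns the opposite of the original logical vector. The parentheses are optional, since %in% binds more tightly than !, but they make it clear what is being negated: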

electrofishing <- electrofishing[!(electrofishing$stratum %in% 
                                     c("Tailwater Zone", "Impounded-Offshore")),]
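As a small sketch of what that negation does, the snippet below works on a short character vector rather than the full data frame (the vector names `stratum` and `excluded` are assumptions for illustration). It also shows that, as far as R's operator precedence goes, the unparenthesised form gives the same result.

```r
stratum  <- c("Tailwater Zone", "Impounded", "Side Channel", "Impounded-Offshore")
excluded <- c("Tailwater Zone", "Impounded-Offshore")

is_excluded <- stratum %in% excluded      # TRUE FALSE FALSE TRUE
keep_parens <- !(stratum %in% excluded)   # FALSE TRUE TRUE FALSE
keep_bare   <- !stratum %in% excluded     # parsed as !(stratum %in% excluded)

identical(keep_parens, keep_bare)  # TRUE
stratum[keep_parens]               # "Impounded" "Side Channel"
```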

I’m not saying this doesn’t make sense, but I can never remember it. My English-speaking brain would much rather say “rows whose stratum is not included in c("Tailwater Zone", "Impounded-Offshore")” than “not rows whose stratum is included in c("Tailwater Zone", "Impounded-Offshore")”.

Solution

Luckily, it’s pretty easy to negate %in% and create a %notin% operator. I credit this answer to user “catastrophic-failure” on this Stack Overflow question.

`%notin%` <- Negate(`%in%`)

I didn’t even know that the Negate function existed. The more you know.
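For anyone else meeting Negate() for the first time, here is a rough sketch of what it does: it takes a function and returns a new function that calls the original and flips the result with !. The hand-written `%not_in%` below is a hypothetical name used only for comparison, not something from the Stack Overflow answer.

```r
# Negate() wraps %in% in a function that flips its logical result
`%notin%` <- Negate(`%in%`)

# A hand-written equivalent (hypothetical name, for comparison only)
`%not_in%` <- function(x, table) !(x %in% table)

x <- c("a", "b", NA)
via_negate <- x %notin% c("a", NA)    # FALSE TRUE FALSE
by_hand    <- x %not_in% c("a", NA)   # FALSE TRUE FALSE

identical(via_negate, by_hand)  # TRUE
```

Because %in% never returns NA, neither version does, which is convenient when the result is used to index rows.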

Outcome

I know there are lots of ways to negate selections in R. dplyr’s select() can drop columns with -c(), and filter() accepts a negated condition. Or I could just learn to throw a ! in front of my %in% statements. But %notin% seems a little more intuitive.
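For comparison, here is a sketch of two of those alternatives applied to the same data: base R's subset() and, only if the package happens to be installed, dplyr::filter(). The names `excluded`, `kept_subset`, and `kept_dplyr` are assumptions introduced for this example.

```r
electrofishing <- data.frame(
  stratum = c("Tailwater Zone", "Tailwater Zone", "Impounded",
              "Main Channel Border", "Side Channel",
              "Impounded-Offshore", "Side Channel"),
  idx = 1:7,
  stringsAsFactors = FALSE
)
excluded <- c("Tailwater Zone", "Impounded-Offshore")

# Base R: subset() evaluates the condition inside the data frame
kept_subset <- subset(electrofishing, !(stratum %in% excluded))

# dplyr, if available: filter() takes the same negated condition
if (requireNamespace("dplyr", quietly = TRUE)) {
  kept_dplyr <- dplyr::filter(electrofishing, !(stratum %in% excluded))
  stopifnot(identical(kept_dplyr$idx, kept_subset$idx))
}

kept_subset$idx  # 3 4 5 7
```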

Now it’s straightforward to exclude these rows from my data frame.

electrofishing <- electrofishing[electrofishing$stratum %notin% 
                                   c("Tailwater Zone", "Impounded-Offshore"),]
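Put together as one self-contained sketch (rebuilding the example data frame from above, and storing the result in a new object `kept`, a name chosen here so the original is not overwritten), the filter leaves the four rows from the other habitats:

```r
`%notin%` <- Negate(`%in%`)

electrofishing <- data.frame(
  stratum = c("Tailwater Zone", "Tailwater Zone", "Impounded",
              "Main Channel Border", "Side Channel",
              "Impounded-Offshore", "Side Channel"),
  idx = 1:7,
  stringsAsFactors = FALSE
)

kept <- electrofishing[electrofishing$stratum %notin%
                         c("Tailwater Zone", "Impounded-Offshore"), ]

kept$idx        # 3 4 5 7
rownames(kept)  # "3" "4" "5" "7" (the original row names are carried over)
```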

Resources

https://stackoverflow.com/questions/38351820/negation-of-in-in-r

This one does a good job of explaining why !%in% doesn’t work.

Citation

For attribution, please cite this work as

Gahm (2018, July 8). Kaija Gahm: The %notin% operator. Retrieved from https://kaijagahm.netlify.app/posts/2018-07-08-the-notin-operator/

BibTeX citation

@misc{gahm2018the,
  author = {Gahm, Kaija},
  title = {Kaija Gahm: The %notin% operator},
  url = {https://kaijagahm.netlify.app/posts/2018-07-08-the-notin-operator/},
  year = {2018}
}