Why is negating %in% such a pain?
This post has been slightly modified from its original form on woodpeckR.
I keep forgetting how to select all elements of an object except a few, by name. I get the ! operator confused with the - operator, and I find both of them less than intuitive to use. How can I negate the %in% operator?
I have a data frame called electrofishing that contains observations from a fish sampling survey. One column, stratum, gives the aquatic habitat type of the sampling site. I’d like to exclude observations sampled in the “Tailwater Zone” or “Impounded-Offshore” aquatic habitats.
electrofishing <- data.frame(stratum = c("Tailwater Zone", "Tailwater Zone", "Impounded", "Main Channel Border", "Side Channel", "Impounded-Offshore", "Side Channel"),
idx = 1:7)
electrofishing
              stratum idx
1      Tailwater Zone   1
2      Tailwater Zone   2
3           Impounded   3
4 Main Channel Border   4
5        Side Channel   5
6  Impounded-Offshore   6
7        Side Channel   7
My instinct would be to do this:
electrofishing <- electrofishing[electrofishing$stratum !%in%
c("Tailwater Zone", "Impounded-Offshore"),]
Error: <text>:1:57: unexpected '!'
1: electrofishing <- electrofishing[electrofishing$stratum !
                                                           ^
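But that doesn’t work. You can’t negate the %in% operator directly. Instead, you have to negate the entire %in% statement by putting the ! in front of it, which returns the opposite of the original boolean vector. Wrapping the statement in parentheses isn’t strictly required, since %in% binds more tightly than !, but it makes the intent explicit: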
electrofishing <- electrofishing[!(electrofishing$stratum %in%
c("Tailwater Zone", "Impounded-Offshore")),]
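To see what the negation is doing, it can help to look at the logical vectors themselves. The following is a minimal sketch using the same strata as above; the names stratum, excluded, is_excluded, and keep are illustrative and not part of the original example.

```r
# Same strata as the electrofishing example; variable names are illustrative.
stratum <- c("Tailwater Zone", "Tailwater Zone", "Impounded",
             "Main Channel Border", "Side Channel",
             "Impounded-Offshore", "Side Channel")
excluded <- c("Tailwater Zone", "Impounded-Offshore")

# TRUE where the stratum is one of the habitats to drop
is_excluded <- stratum %in% excluded

# Flip every element: TRUE where the row should be kept
keep <- !(stratum %in% excluded)

print(is_excluded)
print(keep)
```

Indexing the data frame with keep in the row position retains only the rows where it is TRUE.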
I’m not saying this doesn’t make sense, but I can never remember it. My English-speaking brain would much rather say “rows whose stratum is not included in c("Tailwater Zone", "Impounded-Offshore")” than “not rows whose stratum is included in c("Tailwater Zone", "Impounded-Offshore")”.
Luckily, it’s pretty easy to negate %in% and create a %notin% operator. I credit this answer to user “catastrophic-failure” on this Stack Overflow question.
`%notin%` <- Negate(`%in%`)
I didn’t even know that the Negate function existed. The more you know.
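Roughly speaking, Negate() takes a function and returns a new function that calls the original and applies ! to the result. A hand-written equivalent might look like the sketch below; `%notin_manual%` is a made-up name used only for comparison, not something from base R.

```r
`%notin%` <- Negate(`%in%`)

# Hypothetical hand-written equivalent, for comparison only
`%notin_manual%` <- function(x, table) !(x %in% table)

x <- c("a", "b", NA)
from_negate <- x %notin% c("a", "z")
from_manual <- x %notin_manual% c("a", "z")

print(from_negate)
print(identical(from_negate, from_manual))
```

One thing to keep in mind: %in% returns FALSE for an NA that is not in the lookup vector, so %notin% returns TRUE for it, and rows with a missing value are kept.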
I know there are lots of ways to negate selections in R. dplyr has select() and filter() functions that are easier to use with -c(). Or I could just learn to throw a ! in front of my %in% statements. But %notin% seems a little more intuitive.
Now it’s straightforward to select these rows from my data frame.
electrofishing <- electrofishing[electrofishing$stratum %notin%
c("Tailwater Zone", "Impounded-Offshore"),]
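Putting the pieces together, here is a self-contained sketch of the whole example. It stores the result in a new variable, kept (an illustrative name), rather than overwriting electrofishing, so the original data frame stays available for comparison.

```r
electrofishing <- data.frame(
  stratum = c("Tailwater Zone", "Tailwater Zone", "Impounded",
              "Main Channel Border", "Side Channel",
              "Impounded-Offshore", "Side Channel"),
  idx = 1:7
)

`%notin%` <- Negate(`%in%`)

# Keep rows whose stratum is not one of the two excluded habitats
kept <- electrofishing[electrofishing$stratum %notin%
                         c("Tailwater Zone", "Impounded-Offshore"), ]
print(kept)
```

The rows with idx 3, 4, 5, and 7 remain, and they keep their original row names.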
https://stackoverflow.com/questions/38351820/negation-of-in-in-r
This one does a good job of explaining why !%in% doesn’t work.
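One way to check the behavior yourself, offered as a sketch rather than a summary of the linked answer: ask R to parse both forms and inspect the result. The variables bad and expr are illustrative names.

```r
# R rejects `x !%in% y` at parse time: `!` is a prefix operator,
# so it cannot directly follow the operand `x`.
bad <- tryCatch(
  parse(text = 'x !%in% c("a", "b")'),
  error = function(e) "syntax error"
)
print(bad)

# `!x %in% y` does parse. The outermost call is `!`, and its argument
# is the `%in%` call, because %in% binds more tightly than `!`.
expr <- quote(!x %in% y)
print(as.character(expr[[1]]))
print(as.character(expr[[2]][[1]]))
```

This is why putting ! in front of the whole %in% expression works with or without the extra parentheses.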
For attribution, please cite this work as
Gahm (2018, July 8). Kaija Gahm: The %notin% operator. Retrieved from https://kaijagahm.netlify.app/posts/2018-07-08-the-notin-operator/
BibTeX citation
@misc{gahm2018the, author = {Gahm, Kaija}, title = {Kaija Gahm: The %notin% operator}, url = {https://kaijagahm.netlify.app/posts/2018-07-08-the-notin-operator/}, year = {2018} }