How to code it more efficiently: deleting multiple rows with a complex condition in R


Below is a sample of a large data set from which I want to delete quadrats (Qm) numbered greater than 3 in parcels (PARCELLE) 1, 3, 4, and 8.

FIELD   SECTOR  PARCELLE    Qm  Total
North   A   1   1   2
North   A   1   2   3
North   A   1   3   0.5
North   A   1   4   0.5
North   A   1   5   1
North   A   1   6   0.5
North   B   2   1   10
North   B   2   2   3
North   B   2   3   4
North   B   2   4   2
North   B   2   5   7
North   B   2   6   25
North   C   3   1   0
North   C   3   2   0
North   C   3   3   2
North   C   3   4   5
North   C   3   5   0.5
North   C   3   6   1
North   D   4   1   0
North   D   4   2   0
North   D   4   3   0
North   D   4   4   0
North   D   4   5   0
North   D   4   6   85
North   E   5   1   0
North   E   5   2   5
North   E   5   3   0.5
North   E   5   4   0
North   E   5   5   0
North   E   5   6   0
North   F   6   1   0.5
North   F   6   2   0.5
North   F   6   3   0.5
North   F   6   4   0
North   F   6   5   0
North   F   6   6   0
North   G   7   1   0.5
North   G   7   2   0.5
North   G   7   3   2
North   G   7   4   2
North   G   7   5   0.5
North   G   7   6   0
North   H   8   1   0.5
North   H   8   2   1
North   H   8   3   60
North   H   8   4   0.5
North   H   8   5   0.5
North   H   8   6   1

I have achieved this manipulation with one statement for each parcel.

New_Data <- Data_Frame[!(Data_Frame$PARCELLE == "1" & Data_Frame$Qm > 3), ]
New_Data <- New_Data[!(New_Data$PARCELLE == "3" & New_Data$Qm > 3), ]
New_Data <- New_Data[!(New_Data$PARCELLE == "4" & New_Data$Qm > 3), ]
New_Data <- New_Data[!(New_Data$PARCELLE == "8" & New_Data$Qm > 3), ]

I want to condense my code but I can't figure out how to specify a condition on the parcel number. I would like my code to resemble something like this:

New_Data <- Data_Frame[!(Data_Frame$PARCELLE == "1 & 3 & 4 & 8" & Data_Frame$Qm > 3), ]

There are 2 best solutions below

Onyambu On BEST ANSWER

Use the %in% operator:

Data_Frame[!(Data_Frame$PARCELLE %in% c(1, 3, 4, 8) & Data_Frame$Qm > 3), ]
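
As a quick check that a single %in% condition gives the same rows as the four separate statements in the question, here is a minimal sketch. It rebuilds only the PARCELLE and Qm columns of the sample (8 parcels, quadrats 1 to 6); the names `targets`, `one_step`, and `four_steps` are made up for this illustration.

```r
# Rebuild the PARCELLE/Qm layout of the sample: 8 parcels x 6 quadrats.
Data_Frame <- data.frame(
  PARCELLE = rep(1:8, each = 6),
  Qm       = rep(1:6, times = 8)
)

targets <- c(1, 3, 4, 8)  # parcels to trim, taken from the question

# One step: drop rows that are in a target parcel AND have Qm > 3.
one_step <- Data_Frame[!(Data_Frame$PARCELLE %in% targets & Data_Frame$Qm > 3), ]

# Parcel by parcel, as in the question, written as a loop for comparison.
four_steps <- Data_Frame
for (p in targets) {
  four_steps <- four_steps[!(four_steps$PARCELLE == p & four_steps$Qm > 3), ]
}

# Both approaches should keep the same rows.
same_rows <- identical(rownames(one_step), rownames(four_steps))
print(same_rows)
print(nrow(one_step))  # 48 rows minus 3 removed in each of 4 parcels = 36
```

Since %in% matches through a common type, the same condition should also work if PARCELLE is stored as character ("1", "3", ...), which the quoted comparisons in the question suggest may be the case.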

You can also use the following:

subset(Data_Frame, !(PARCELLE %in% c(1, 3, 4, 8) & Qm > 3))

The two differ only in how they treat NA: where the condition evaluates to NA (for example, a missing Qm in one of the listed parcels), the first returns a row filled with NA, while the second drops that row.
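
To make the NA difference concrete, here is a small sketch on toy data. The data frame `toy` and the names `keep`, `by_bracket`, `by_subset`, and `by_which` are invented for the illustration and are not part of the original data.

```r
# Toy data: one row in a target parcel has a missing Qm.
toy <- data.frame(
  PARCELLE = c(1, 1, 2),
  Qm       = c(2, NA, 5)
)

# The condition is NA for the second row, because TRUE & NA is NA.
keep <- !(toy$PARCELLE %in% c(1, 3, 4, 8) & toy$Qm > 3)

# Bracket indexing: an NA index produces a row filled with NA.
by_bracket <- toy[keep, ]

# subset(): rows where the condition is NA are dropped.
by_subset <- subset(toy, !(PARCELLE %in% c(1, 3, 4, 8) & Qm > 3))

print(nrow(by_bracket))  # 3, the middle row is all NA
print(nrow(by_subset))   # 2

# Wrapping the condition in which() makes bracket indexing drop NA rows too.
by_which <- toy[which(keep), ]
print(nrow(by_which))    # 2
```

Whether a row with a missing Qm should be kept or dropped depends on the data, so it is worth checking for NA values before choosing between the two forms.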

Juan C On

This should do it, using dplyr (here df is the data frame):

library(dplyr)

df %>% filter(!(PARCELLE %in% c(1, 3, 4, 8) & Qm > 3))


#    FIELD SECTOR PARCELLE Qm Total
# 1  North      A        1  1   2.0
# 2  North      A        1  2   3.0
# 3  North      A        1  3   0.5
# 4  North      B        2  1  10.0
# 5  North      B        2  2   3.0
# 6  North      B        2  3   4.0
# 7  North      B        2  4   2.0
# 8  North      B        2  5   7.0
# 9  North      B        2  6  25.0
# 10 North      C        3  1   0.0
# 11 North      C        3  2   0.0
# 12 North      C        3  3   2.0
# 13 North      D        4  1   0.0
# 14 North      D        4  2   0.0
# 15 North      D        4  3   0.0
# 16 North      E        5  1   0.0
# 17 North      E        5  2   5.0
# 18 North      E        5  3   0.5
# 19 North      E        5  4   0.0
# 20 North      E        5  5   0.0
# 21 North      E        5  6   0.0
# 22 North      F        6  1   0.5
# 23 North      F        6  2   0.5
# 24 North      F        6  3   0.5
# 25 North      F        6  4   0.0
# 26 North      F        6  5   0.0
# 27 North      F        6  6   0.0
# 28 North      G        7  1   0.5
# 29 North      G        7  2   0.5
# 30 North      G        7  3   2.0
# 31 North      G        7  4   2.0
# 32 North      G        7  5   0.5
# 33 North      G        7  6   0.0
# 34 North      H        8  1   0.5
# 35 North      H        8  2   1.0
# 36 North      H        8  3  60.0