I am looking for an Excel formula that looks for a substring and counts unique values.
Assume column B contains:
Kansas_Pink_Fish
Kansas_Pink_Fish
Kansas_Pink_Fish
Kansas_Pink_Bird
Kansas_Pink_Turtle
Missouri_Red_Bird
Missouri_Red_Bird
Missouri_Pink_Bird
Missouri_Pink_Bird
Ohio_Pink_Fish
Ohio_Green_Fish
Ohio_Green_Turtle
Ohio_Green_Turtle
Looking for the number of "Pink" items. Pink would be in the formula, not necessarily a reference to a cell.
The answer would be 4.
Kansas_Pink_Fish Kansas_Pink_Bird Kansas_Pink_Turtle Missouri_Pink_Bird
I am not looking for the list, I am looking for the count.
I have tried a couple of things but I either get an error, #value, or the number 100 (which I think may be the upper limit of one of the functions I was trying to use).
The actual list is Over 38K rows. The answer is just over 1K when doing it by hand:
- Filter > contains Pink
- Copy results to another tab
- Sort
- Remove duplicates
- Count
=COUNTA(UNIQUE(FILTER(B:B,ISNUMBER(SEARCH("_Pink_",B:B)))))
=COUNTA(UNIQUE(FILTER(B:B,ISNUMBER(SEARCH("*_Pink_*",B:B)))))
I was expecting 1,094
This first checks column A for the presence of the string
_Pink_(case sensitive; in case you want it to be case insensitive replace FIND for SEARCH). This is wrapped inIFS. If it is present it returns the value of column A, else it returns an error.This array of values and errors is wrapped in
TOCOLwith argument2to ignore error values.This returns an array of all values containing
_Pink_. Wrapping it inUNIQUEleaves only the distinct uniques from the array. We need to count the number of unique values, but since we also have the ability that no matches are found, for whichUNIQUEreturns one line containing an error. If we then would useROWSorCOUNTAit'd return 1 instead of 0, so we wrap it inISTEXTwhich results in TRUE for each non-error line, or in case of the error it returnsFALSE.Add
--in front of the value(s) converts TRUE to 1 or FALSE to 0. If we nowSUMit we have the expected outcome.Edit: Or shorter using
1-ISERRas suggested by Mayukh:Here
1-ISERRcan be seen asNOT(ISERRand can be used instead ofISNUMBERand--ISTEXT