Excel formula that looks for a substring and counts unique values

63 Views Asked by At

I am looking for an Excel formula that looks for a substring and counts unique values.

Assume column B contains:

Kansas_Pink_Fish

Kansas_Pink_Fish

Kansas_Pink_Fish

Kansas_Pink_Bird

Kansas_Pink_Turtle

Missouri_Red_Bird

Missouri_Red_Bird

Missouri_Pink_Bird

Missouri_Pink_Bird

Ohio_Pink_Fish

Ohio_Green_Fish

Ohio_Green_Turtle

Ohio_Green_Turtle

Looking for the number of "Pink" items. Pink would be in the formula, not necessarily a reference to a cell.

The answer would be 4.

Kansas_Pink_Fish Kansas_Pink_Bird Kansas_Pink_Turtle Missouri_Pink_Bird

I am not looking for the list, I am looking for the count.

I have tried a couple of things but I either get an error, #value, or the number 100 (which I think may be the upper limit of one of the functions I was trying to use).

The actual list is Over 38K rows. The answer is just over 1K when doing it by hand:

  • Filter > contains Pink
  • Copy results to another tab
  • Sort
  • Remove duplicates
  • Count

=COUNTA(UNIQUE(FILTER(B:B,ISNUMBER(SEARCH("_Pink_",B:B)))))

=COUNTA(UNIQUE(FILTER(B:B,ISNUMBER(SEARCH("*_Pink_*",B:B)))))

I was expecting 1,094

1

There are 1 best solutions below

1
P.b On
=SUM(--ISTEXT(UNIQUE(TOCOL(IFS(ISNUMBER(FIND("_Pink_",A:A)),A:A),2))))

This first checks column A for the presence of the string _Pink_ (case sensitive; in case you want it to be case insensitive replace FIND for SEARCH). This is wrapped in IFS. If it is present it returns the value of column A, else it returns an error.

This array of values and errors is wrapped in TOCOL with argument 2 to ignore error values.

This returns an array of all values containing _Pink_. Wrapping it in UNIQUE leaves only the distinct uniques from the array. We need to count the number of unique values, but since we also have the ability that no matches are found, for which UNIQUE returns one line containing an error. If we then would use ROWS or COUNTA it'd return 1 instead of 0, so we wrap it in ISTEXT which results in TRUE for each non-error line, or in case of the error it returns FALSE.

Add -- in front of the value(s) converts TRUE to 1 or FALSE to 0. If we now SUM it we have the expected outcome.

Edit: Or shorter using 1-ISERR as suggested by Mayukh:

=SUM(1-ISERR(UNIQUE(FILTER(A:A,1-ISERR(FIND("_Pink_",A:A))))))

Here 1-ISERR can be seen as NOT(ISERR and can be used instead of ISNUMBER and --ISTEXT