Google sheets query issues with splitting comma separated email addresses

124 Views Asked by At

I have a google sheet which takes form responses. The bulk of these responses are "which project is it" and "which email addresses are involved ?" The responses for email address will be in a comma separated string. The responses for project could be random

[Projects and email addresses]

1

|Who are the users you are sending it to ? | Which Project is this for ? |
| [email protected], [email protected], [email protected] |Project 1
| [email protected], [email protected], [email protected], [email protected], [email protected], [email protected],  [email protected]   | Project 2  |

[query shows what is wrong]

2

I would like to be able to use a query to sort all projects and users so that all project numbers are combined, but all users are unique (with duplicates removed)

I have tried many queries and looked at multiple suggestions from StackOverflow, the best of which only gives me the first email address from the selection, and also doesn't combine the projects. The formula for this is =TRANSPOSE(QUERY({A1:A50, ARRAYFORMULA(SPLIT(B1:B50, ",", true, true)) },"Select Col1, Count(Col2), Col2 WHERE Col1 Is Not Null GROUP BY Col1, Col2 LABEL Count(Col2) ''"))

From my understanding - it must obviously be wrong

The first part in curly braces {A1:A50, ARRAYFORMULA(SPLIT(B1:B50, ",", true, true)) } is the array to look at (which splits the comma separated values in column B and removes white space). The second part is the selection where I look at column1 (project) count column 2 (email addresses)

The expected outcome would be

enter image description here

Where am I going wrong ?? Please help

1

There are 1 best solutions below

5
Osm On BEST ANSWER

Try this fomula; you need one range reference!.

enter image description here

=ArrayFormula(
  LAMBDA(r, 
    { 
      TRANSPOSE(SORT(UNIQUE(INDEX(r,,1)), 1, 1));
      BYCOL(
        TRANSPOSE(SORT(UNIQUE(INDEX(r,,1)), 1, 1)), 
        LAMBDA(f, 
          UNIQUE(FILTER(INDEX(r,,2), INDEX(r,,1) = f))
        )
      )
    }
  )(
    LAMBDA(d, 
      QUERY(
        SPLIT(TOCOL(INDEX(d,,1) & "^" & SPLIT(INDEX(d,,2), ", ")), "^"), 
        "Where Col2 <>''"
      )
    )(
      QUERY({A2:B}, "Where Col1 <> ''")
    )
  )
)

Now, let's break it down:

  1. LAMBDA Functions:

    • The formula starts with two LAMBDA functions. These are user-defined functions that make your formula more modular and easier to understand.
  2. Outermost LAMBDA Function:

    • The outer LAMBDA function takes one argument r. It's essentially a function wrapper that you can call with some data later.
  3. Innermost LAMBDA Function:

    • The inner LAMBDA function takes one argument d.
    • Inside this function, you have a series of operations.
  4. QUERY Function:

    • The first operation is a QUERY function that extracts non-empty rows from columns A and B using the condition "Where Col1 <> ''".
  5. SPLIT and TOCOL Functions:

    • The result of the QUERY is then split and combined into a single column using SPLIT and TOCOL functions, creating a list of strings in the format "ProjectName^Email1, Email2, ...".
  6. QUERY Function (Again):

    • Another QUERY function is applied to this split data, filtering out rows where the second column (Col2) is not empty.
  7. Intermediate Result:

    • At this point, you have a filtered list of strings in the "ProjectName^Email1, Email2, ..." format.
  8. Outer LAMBDA Function (Continued):

    • Now, you invoke the outer LAMBDA function with the filtered data, which is passed as d.
  9. UNIQUE, SORT, and TRANSPOSE:

    • Inside the outer LAMBDA, the formula performs several operations.
      • INDEX(r,,1) extracts the first column from the input r.
      • UNIQUE, SORT, and TRANSPOSE are used to create a sorted list of unique project names from the first column.
  10. BYCOL Function:

  • The BYCOL function is applied to the sorted unique project names. It groups the project names and performs an operation on each group.
  1. Innermost LAMBDA Function (Continued):
  • Inside the BYCOL function, an inner LAMBDA function is used. It takes one argument f, representing a unique project name.
  1. FILTER Function:
  • The FILTER function is applied to the original data (r), extracting email addresses from the second column where the project name in the first column matches the current unique project name f.
  1. Final Result:
  • The result of this complex operation is an array containing two columns.
    • The first column is a sorted list of unique project names.
    • The second column contains corresponding email addresses associated with each project name.

So, this formula essentially takes a dataset containing project names and associated emails, filters it, processes it, and returns a sorted list of unique project names along with their associated email addresses.

I hope this has been helpful :)