Not in Subquery

113 Views Asked by At

The following employee table shows which department(s) an employee is in and which one is their primary department. If an employee only belongs to one department their primary flag will show 'N' (even though that is their primary department as it's their only one) and if they have multiple departments it will flag 'Y' or 'N' to show which one is primary.

Employee:

employee_id department_id primary_flag
1 1 N
2 1 Y
2 2 N
3 3 N
4 2 N
4 3 Y
4 4 N

Expected output--the primary department for employee_id is department_id

employee_id department_id
1 1
2 1
3 3
4 3

My solution returns nothing.

Why does it not work?

There are no null values and when I run the subquery by itself it shows the employee_id, department_ids that should not be in the final output.

select employee_id, department_id 
from Employee
WHERE (employee_id, department_id) not in(
    select employee_id, department_id 
    from (
        SELECT employee_id, department_id, primary_flag,
            count(employee_id) over (partition by employee_id) as cnt
        from employee
    ) a 
    WHERE cnt > 1 and primary_flag = 'n'
);

enter image description here

From Joel Coehoorn's comments below, he was able to produce the same results; however it does not work for me in LeetCode.

4

There are 4 best solutions below

7
Bart McEndree On

I combined Employee and Department together

CREATE TABLE [Employee] 
(
    [employee_id]   INT,
    [department_id] VARCHAR(512),
    [primary_flag]  VARCHAR(512)
);

INSERT INTO [Employee] ([employee_id], [department_id], [primary_flag]) VALUES
    ('1', '1', 'N'),
    ('2', '1', 'Y'),
    ('2', '2', 'N'),
    ('3', '3', 'N'),
    ('4', '2', 'N'),
    ('4', '3', 'Y'),
    ('4', '4', 'N');

Select * from Employee

select employee_id, department_id 
from Employee
WHERE cast(employee_id as varchar(10)) + '|' + cast(department_id as varchar(10)) not in(
    select cast(employee_id as varchar(10)) + '|' + cast(department_id as varchar(10))  
    from (
        SELECT employee_id, department_id, primary_flag,
            count(employee_id) over (partition by employee_id) as cnt
        from employee
    ) a 
    WHERE cnt > 1 and primary_flag = 'n'
);

Fiddle

employee_id department_id
1 1
2 1
3 3
4 3
4
Charlieface On

Your query is much much too complicated. This doesn't need any subqueries or windows functions at all.

All you need is some conditional aggregation.

SELECT
  e.employee_id,
  CASE WHEN COUNT(*) = 1
       THEN MIN(e.department_id)
       ELSE MIN(CASE WHEN e.primary_flag = 'Y' THEN e.department_id END)
  END AS department_id
FROM Employee e
GROUP BY
  e.employee_id;

db<>fiddle


Your existing query has multiple issues:

  • Because the values might have a null, MySQL evaluates the NOT IN clause differently. You can see this in action in this fiddle.
    Generally you should avoid NOT IN, it can always be rewritten as a NOT EXISTS.
  • Row comparators don't work in all DBMSs.
  • In any case you could have simplified out and removed the self-join.
    select employee_id, department_id 
    from (
        SELECT employee_id, department_id, primary_flag,
            count(employee_id) over (partition by employee_id) as cnt
        from Employee
    ) a 
    WHERE cnt = 1 or primary_flag = 'y';
    
    Obviously it's not necessary to use window functions anyway, as I mentioned earlier.
0
ValNik On

I think, the best is using row_number() over (partition by employee_id order by case when primary = 'Y' then 0 else 1 end) as rnk and then only keep the rows where this result is 1 (where rnk=1), as suggested by @JoelCoehoorn.

If you want to use a subquery for condition WHERE IN (NOT IN), see example

select employee_id, department_id 
from Employee
WHERE primary_flag='Y' 
  or employee_id not in(select employee_id from Employee where primary_flag='Y')

Output

employee_id department_id
1 1
2 1
3 3
4 3
0
blabla_bingo On

Your query is complex. I would try to simplify the logic.

Overall, there are only two cases regarding the primary department for a given employee_id:

1, when count(department_id) =1 , then that department_id is the primary department regardless the primary_flag always being 'N'. To conclude, the focus is the count(department_id) =1

2, when count(department_id) >1 , then the department_id whose primary_flag is 'Y' is the primary department . In short, the vital part is primary_flag is 'Y'.

As the two cases are mutually exclusive, using UNION is possible without worrying overlap. Besides, the logic is straight. And the query optimization job is more relaxing as each union statement has its separate use of index.

select * from
    (select employee_id, max(department_id)
    from employee
    group by employee_id
    having count(*) =1
    UNION
    select employee_id, department_id
    from employee
    where primary_flag='Y'
    ) t
order by employee_id
;
-- result:
+-------------+--------------------+
| employee_id | max(department_id) |
+-------------+--------------------+
|           1 | 1                  |
|           2 | 1                  |
|           3 | 3                  |
|           4 | 3                  |
+-------------+--------------------+

Note: The reason of putting UNION statements in a derived table is because UNION statements don't enforce the ORDER BY clause.

Table structure suggestion

The current table Employee (employee_id int, department_id int, primary_flag char(1)) design requires some query depth to get the primary department. And the primary_flag does not check if multiple 'Y' for a given employee_id exist in the data set.

I would suggest dropping the primary_flag column in the current table. Then create a table primary_department as below.

-- Note: Make sure you have a key for (employee_id ,department_id) on table `Employee` or create one now. And in this case , a unique key is recommended which warrants the uniqueness of `employee_id-department_id` pair.

create unique index idx_employeeid_departmentid on Employee (employee_id ,department_id);

create table primary_department
(employee_id int,p_department_id int, primary key (employee_id),
constraint `fk` foreign key (employee_id ,p_department_id) references Employee(employee_id ,department_id) 
);

By doing this, we can directly query for the primary department. In addtion, we can guarantee no conflicting primary departments for a given employee_id exist.