Optimizing Group by performance for EAV databsae model

264 Views Asked by At

I am using the below query and it is currently taking 0.7 sec. But when a performance test was run with 200 concurrent users the query was taking more than 60 sec. I have defined required composite indexes but still group by is not considering the index.

We are using RDS MYSQL service (16 GB RAM 8 vCPU)

Query

SELECT 
    patient.voided AS voided, COUNT(*) AS count FROM patient patient
        INNER JOIN person person ON patient.patient_id = person.person_id
        INNER JOIN person_attribute attr ON patient.patient_id = attr.person_id
        INNER JOIN person_attribute_type attr_type ON attr.person_attribute_type_id = attr_type.person_attribute_type_id
WHERE    
    attr.value = 'd31fe20e-6736-42ff-a3ed-b3e622e80842'
        AND attr_type.name = 'LocationAttribute'
GROUP BY patient.voided;

Execution Plan enter image description here

Is it possible to improve the query performance without changing query and by adding indexes or any db optimizations? Because the query was generated by hibernate framework and any code changes requires testing effort.

Here are the definitions of the tables in question from OpenMRS 1.6.

CREATE TABLE IF NOT EXISTS `openmrs`.`person_attribute` (
  `person_attribute_id` INT(11) NOT NULL AUTO_INCREMENT,
  `person_id` INT(11) NOT NULL DEFAULT '0',
  `value` VARCHAR(50) NOT NULL DEFAULT '',
  `person_attribute_type_id` INT(11) NOT NULL DEFAULT '0',
  `creator` INT(11) NOT NULL DEFAULT '0',
  `date_created` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
  `changed_by` INT(11) NULL DEFAULT NULL,
  `date_changed` DATETIME NULL DEFAULT NULL,
  `voided` SMALLINT(6) NOT NULL DEFAULT '0',
  `voided_by` INT(11) NULL DEFAULT NULL,
  `date_voided` DATETIME NULL DEFAULT NULL,
  `void_reason` VARCHAR(255) NULL DEFAULT NULL,
  `uuid` CHAR(38) NOT NULL,
  PRIMARY KEY (`person_attribute_id`),
  UNIQUE INDEX `person_attribute_uuid_index` (`uuid` ASC) VISIBLE,
  INDEX `identifies_person_idx` (`person_id` ASC) VISIBLE,
  INDEX `defines_attribute_type_idx` (`person_attribute_type_id` ASC) VISIBLE,
  INDEX `attribute_creator` (`creator` ASC) VISIBLE,
  INDEX `attribute_changer` (`changed_by` ASC) VISIBLE,
  INDEX `attribute_voider` (`voided_by` ASC) VISIBLE,
  CONSTRAINT `defines_attribute_type`
    FOREIGN KEY (`person_attribute_type_id`)
    REFERENCES `openmrs`.`person_attribute_type` (`person_attribute_type_id`),
  CONSTRAINT `identifies_person`
    FOREIGN KEY (`person_id`)
    REFERENCES `openmrs`.`person` (`person_id`))
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;
CREATE TABLE IF NOT EXISTS `openmrs`.`person_attribute_type` (
  `person_attribute_type_id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(50) NOT NULL DEFAULT '',
  `description` TEXT NOT NULL,
  `format` VARCHAR(50) NULL DEFAULT NULL,
  `foreign_key` INT(11) NULL DEFAULT NULL,
  `searchable` SMALLINT(6) NOT NULL DEFAULT '0',
  `creator` INT(11) NOT NULL DEFAULT '0',
  `date_created` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
  `changed_by` INT(11) NULL DEFAULT NULL,
  `date_changed` DATETIME NULL DEFAULT NULL,
  `retired` SMALLINT(6) NOT NULL DEFAULT '0',
  `retired_by` INT(11) NULL DEFAULT NULL,
  `date_retired` DATETIME NULL DEFAULT NULL,
  `retire_reason` VARCHAR(255) NULL DEFAULT NULL,
  `edit_privilege` VARCHAR(255) NULL DEFAULT NULL,
  `uuid` CHAR(38) NOT NULL,
  `sort_weight` DOUBLE NULL DEFAULT NULL,
  PRIMARY KEY (`person_attribute_type_id`),
  UNIQUE INDEX `person_attribute_type_uuid_index` (`uuid` ASC) VISIBLE,
  INDEX `name_of_attribute` (`name` ASC) VISIBLE,
  INDEX `type_creator` (`creator` ASC) VISIBLE,
  INDEX `attribute_type_changer` (`changed_by` ASC) VISIBLE,
  INDEX `attribute_is_searchable` (`searchable` ASC) VISIBLE,
  INDEX `user_who_retired_person_attribute_type` (`retired_by` ASC) VISIBLE,
  INDEX `person_attribute_type_retired_status` (`retired` ASC) VISIBLE,
  INDEX `privilege_which_can_edit` (`edit_privilege` ASC) VISIBLE)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;
1

There are 1 best solutions below

1
Rick James On

Over-normalization.

You need to restructure the schema to avoid running off to other tables as much. Note that the WHERE requires tests on two tables. There is virtually no way to optimize that except by moving the name and value into the same table. For that matter, the person_id should be in that table, too.

If the framework is generating the schema as well as the query, then it should be relegated to 'toy' projects, not even medium-sized datasets like yours.

I've added a tag that will link to other discussions of EAV. Even after cleaning up these tables, EAV groans when given millions of rows.

These composite indexes may help:

attr:  INDEX(value, person_id,  person_attribute_type_id)
attr_type:  INDEX(name, person_attribute_type_id)

But, if value is TEXT, that first one cannot be added to the table.

For further discussion, please provide SHOW CREATE TABLE for each table.