INTERNAL_ERROR Input row doesn't have expected number of values required by the schema


I am trying to run the code below to count the number of records in a PySpark DataFrame.

I am not getting the desired result. When I ran the same count with an RDD, I got the expected result. Below is the RDD code:

# We will extract the first element of each split row, assuming it represents the study ID
study_ids = cleaned_data.map(lambda row: row[0].strip('"'))

# Count the number of distinct study IDs
num_studies = study_ids.distinct().count()

print("Number of distinct studies:", num_studies)

I then tried running the code below on the DataFrame:

cleaned_data_df.groupBy('Id').count().orderBy('count', ascending = False).show()

An error occurred while calling o2584.showString. :

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 157.0 failed 1 times, most recent failure: Lost task 2.0 in stage 157.0 (TID 476) (ip-10-172-184-229.us-west-2.compute.internal executor driver): org.apache.spark.SparkException: [INTERNAL_ERROR] Input row doesn't have expected number of values required by the schema. 14 fields are required while 8 values are provided. SQLSTATE: XX000...
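The message says some input rows carry 8 values while the schema expects 14, so one thing worth checking is how many fields each split row actually has. The following is only a sketch in plain Python, not a confirmed fix: `EXPECTED_FIELDS`, `field_count_histogram`, `pad_row`, and `sample_rows` are hypothetical names made up for illustration, and it assumes each row is a list of already-split field values.

```python
from collections import Counter

# Assumption: 14 matches the number of fields the error message says the schema requires.
EXPECTED_FIELDS = 14


def field_count_histogram(rows):
    """Count how many rows have each number of fields."""
    return Counter(len(row) for row in rows)


def pad_row(row, expected=EXPECTED_FIELDS):
    """Pad short rows with None and truncate long rows to `expected` fields."""
    row = list(row)
    return (row + [None] * (expected - len(row)))[:expected]


# Hypothetical stand-in data: one complete row and one short row with 8 values.
sample_rows = [
    ["NCT00000001"] + ["x"] * 13,
    ["NCT00000002"] + ["x"] * 7,
]

histogram = field_count_histogram(sample_rows)
padded = [pad_row(r) for r in sample_rows]

print(histogram)                 # how many rows have each field count
print([len(r) for r in padded])  # every padded row now has EXPECTED_FIELDS entries
```

If `cleaned_data` is an RDD of lists, the same check could presumably be run as `cleaned_data.map(len).countByValue()`, and `cleaned_data.map(pad_row)` could be applied before building the DataFrame. Whether padding with `None` is appropriate depends on why the rows are short in the first place.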

Below is the sample of the data in the dataframe:

Id  Study Title Acronym Status  Conditions  Interventions   Sponsor Collaborators   Enrollment  Funder Type Type    Study Design    Start   Completion
NCT03630471 Effectiveness of a Problem-solving Intervention for Common Adolescent Mental Health Problems in India   PRIDE   COMPLETED   Mental Health Issue (E.G. Depression Psychosis Personality Disorder Substance Abuse)    BEHAVIORAL: PRIDE 'Step 1' problem-solving intervention|BEHAVIORAL: Enhanced usual care Sangath Harvard Medical School (HMS and HSDM)|London School of Hygiene and Tropical Medicine    250.0   OTHER   INTERVENTIONAL  Allocation: RANDOMIZED|Intervention Model: PARALLEL|Masking: DOUBLE (INVESTIGATOR OUTCOMES_ASSESSOR)|Primary Purpose: TREATMENT 2018-08-20  2019-02-28
NCT05992571 Oral Ketone Monoester Supplementation and Resting-state Brain Connectivity      RECRUITING  Cerebrovascular Function|Cognition  OTHER: Placebo|DIETARY_SUPPLEMENT: β-OHB    McMaster University Alzheimer's Society of Brant Haldimand Norfolk Hamilton Halton  30.0    OTHER   INTERVENTIONAL  Allocation: RANDOMIZED|Intervention Model: CROSSOVER|Masking: TRIPLE (PARTICIPANT INVESTIGATOR OUTCOMES_ASSESSOR)|Primary Purpose: BASIC_SCIENCE    2023-10-25  2024-08
NCT00237471 Impact of Tight Glycaemic Control in Acute Myocardial Infarction        TERMINATED  Myocardial Infarct|Hyperglycemia    DRUG: Insulin (tight blood glucose control) Melbourne Health    National Health and Medical Research Council Australia|Bristol-Myers Squibb 40.0    OTHER   INTERVENTIONAL  Allocation: RANDOMIZED|Intervention Model: PARALLEL|Masking: NONE|Primary Purpose: TREATMENT    2005-10 2006-05
NCT03820271 New Prognostic Predictive Models of Mortality of Decompensated Cirrhotic Patients Waiting for Liver Transplantation SUPERMELD   RECRUITING  Decompensated Cirrhosis|Liver Transplantation   OTHER: SuperMELD    Assistance Publique - Hôpitaux de Paris     500.0   OTHER   INTERVENTIONAL  Allocation: NA|Intervention Model: SINGLE_GROUP|Masking: NONE|Primary Purpose: OTHER    2020-10-01  2023-10-01
NCT06229171 InTake Care: Development and Validation of an Innovative Personalized Digital Health Solution for Medication Adherence Support in Cardiovascular Prevention InTakeCare  NOT_YET_RECRUITING  Hypertension|Treatment Adherence and Compliance|Digital Health  OTHER: adherence support system based on a vocal assistant  Istituto Auxologico Italiano    Istituti Clinici Scientifici Maugeri SpA|Politecnico di Milano  206.0   OTHER   INTERVENTIONAL  Allocation: RANDOMIZED|Intervention Model: PARALLEL|Masking: NONE|Primary Purpose: OTHER    2024-10-01  2026-04-01
NCT02945371 Tailored Inhibitory Control Training to Reverse EA-linked Deficits in Mid-life  REV COMPLETED   Smoking|Alcohol Drinking|Prescription Drug Abuse|Substance-Related Disorders|Oral Intake Reduced    BEHAVIORAL: Person-centered inhibitory control training|BEHAVIORAL: Active behavioral response training University of Oregon        103.0   OTHER   INTERVENTIONAL  Allocation: RANDOMIZED|Intervention Model: PARALLEL|Masking: SINGLE (PARTICIPANT)|Primary Purpose: PREVENTION   2014-09 2016-05
NCT01055171 Neuromodulation of Trauma Memories in PTSD & Alcohol Dependence     COMPLETED   Alcohol Dependence|PTSD DRUG: Propranolol|DRUG: Placebo Medical University of South Carolina    National Institute on Alcohol Abuse and Alcoholism (NIAAA)  44.0    OTHER   INTERVENTIONAL  Allocation: RANDOMIZED|Intervention Model: PARALLEL|Masking: QUADRUPLE (PARTICIPANT CARE_PROVIDER INVESTIGATOR OUTCOMES_ASSESSOR)|Primary Purpose: TREATMENT    2010-01 2012-08
NCT01125371 Computerized Brief Alcohol Intervention (BI) for Binge Drinking HIV At-Risk and Infected Women      COMPLETED   Alcohol; Harmful Use|Binge Drinking|Risk Behavior|HIV Infection BEHAVIORAL: Computerized brief alcohol intervention + IVR booster calls|BEHAVIORAL: Computerized brief alcohol intervention|BEHAVIORAL: Attention Control   Johns Hopkins University    National Institute on Alcohol Abuse and Alcoholism (NIAAA)  439.0   OTHER   INTERVENTIONAL  Allocation: RANDOMIZED|Intervention Model: PARALLEL|Masking: DOUBLE (INVESTIGATOR OUTCOMES_ASSESSOR)|Primary Purpose: TREATMENT 2011-10 2016-06-07
NCT02554071 Manitoba Pharmacist Initiated Smoking Cessation Pilot Project       COMPLETED   Smoking Cessation   OTHER: Pharmacist - Smoking Cessation Support   University of Manitoba  Govenment of Manitoba|Canadian Foundation for Pharmacy|Neighbourhood Pharmacy Association of Canada 119.0   OTHER   INTERVENTIONAL  Allocation: NA|Intervention Model: SINGLE_GROUP|Masking: NONE|Primary Purpose: SUPPORTIVE_CARE  2014-01 2014-11
NCT01772771 Molecular Testing for the MD Anderson Cancer Center Personalized Cancer Therapy Program     RECRUITING  Glioma|Hematopoietic and Lymphoid Cell Neoplasm|Malignant Solid Neoplasm|Melanoma|Sarcoma   PROCEDURE: Biospecimen Collection|OTHER: Genetic Testing|OTHER: Medical Chart Review    M.D. Anderson Cancer Center National Cancer Institute (NCI) 12000.0 OTHER   OBSERVATIONAL   Observational Model: |Time Perspective: p   2012-03-01  2033-03-01