Speeding up zero-shot text classification in Python

Asked by Cristian Castillo on 02 January 2024 at 21:19 (59 views)

I'm currently using Hugging Face's transformers library for zero-shot classification to analyze customer reviews of products (in Spanish), but I'm facing a scalability problem.

At first, I was using the model below, but it takes too long to process each review text (I need to process around 5k to 10k reviews daily).

classifier = pipeline("zero-shot-classification", model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli")

Then I switched to this smaller version of the model. Processing time is much better, but the quality of the results is very poor compared to the previous model.

classifier = pipeline("zero-shot-classification", model="MoritzLaurer/multilingual-MiniLMv2-L6-mnli-xnli")

I'd like to know if there are ways to improve this situation, or maybe a completely different approach (this is my first time doing NLP). My main objective is to check each review and see whether it's related to certain topics (good product quality, bad product quality, correct size, wrong size, correct color, wrong color, damaged product, ...) so I can detect problems with the products or publications and see differences between brands, suppliers, categories, etc.
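For a sense of scale: the zero-shot pipeline poses each (review, candidate label) pair as a separate premise/hypothesis input to the NLI model, so the work grows with reviews × labels rather than with reviews alone. A rough back-of-the-envelope sketch follows. The label count of 26 and the latency of 20 ms per pair are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
def estimate_nli_pairs(n_reviews: int, n_labels: int) -> int:
    """Number of (premise, hypothesis) pairs the model has to score."""
    return n_reviews * n_labels


def estimate_seconds(n_pairs: int, seconds_per_pair: float) -> float:
    """Total time if every pair costs the same and nothing runs in parallel."""
    return n_pairs * seconds_per_pair


N_LABELS = 26            # assumption: number of distinct candidate labels
SECONDS_PER_PAIR = 0.02  # assumption: hypothetical per-pair latency

for n_reviews in (5_000, 10_000):
    pairs = estimate_nli_pairs(n_reviews, N_LABELS)
    minutes = estimate_seconds(pairs, SECONDS_PER_PAIR) / 60
    print(f"{n_reviews} reviews -> {pairs} pairs, ~{minutes:.0f} min")
```

Under these assumptions, trimming overlapping labels reduces the cost as directly as switching to a faster model does.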

The code:

from transformers import pipeline

# `df` is assumed to be a pandas DataFrame with the review text in a 'COMENTARIO' column.
# classifier = pipeline("zero-shot-classification", model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli")
classifier = pipeline("zero-shot-classification", model="MoritzLaurer/multilingual-MiniLMv2-L6-mnli-xnli")

# Each label appears once; duplicates only add extra work per review.
candidate_labels = ['buena talla', 'mala talla', 'buen tamaño', 'mal tamaño', 'color equivocado', 'no me gustó el color', 'buen producto', 'mal producto',
                    'producto se encuentra dañado', 'producto no es el que pedí', 'le faltan partes al pedido', 'entrega rápida', 'demora en llegar', 'buena calidad', 'mala calidad',
                    'se rompe', 'talla grande', 'talla pequeña', 'comodo', 'incomodo', 'buena experiencia', 'mala experiencia', 'lo recomiendo',
                    'no lo recomiendo', 'no era lo que esperaba', 'descripción incorrecta']

for index, row in df.iterrows():
    output = classifier(row['COMENTARIO'], candidate_labels, multi_label=True)
       
    df.at[index, 'BUENA_TALLA'] = output['scores'][output['labels'].index('buena talla')]
    df.at[index, 'MALA_TALLA'] = output['scores'][output['labels'].index('mala talla')]
    df.at[index, 'BUEN_TAMANO'] = output['scores'][output['labels'].index('buen tamaño')]
    df.at[index, 'MAL_TAMANO'] = output['scores'][output['labels'].index('mal tamaño')]
    df.at[index, 'COLOR_EQUIVOCADO'] = output['scores'][output['labels'].index('color equivocado')]
    df.at[index, 'NO_GUSTA_COLOR'] = output['scores'][output['labels'].index('no me gustó el color')]
    df.at[index, 'BUEN_PRODUCTO'] = output['scores'][output['labels'].index('buen producto')]
    df.at[index, 'MAL_PRODUCTO'] = output['scores'][output['labels'].index('mal producto')]
    df.at[index, 'PRODUCTO_DANADO'] = output['scores'][output['labels'].index('producto se encuentra dañado')]
    df.at[index, 'NO_CORRESPONDE'] = output['scores'][output['labels'].index('producto no es el que pedí')]
    df.at[index, 'PRODUCTO_INCOMPLETO'] = output['scores'][output['labels'].index('le faltan partes al pedido')]
    df.at[index, 'ENTREGA_RAPIDA'] = output['scores'][output['labels'].index('entrega rápida')]
    df.at[index, 'DEMORA_LLEGAR'] = output['scores'][output['labels'].index('demora en llegar')]
    df.at[index, 'BUENA_CALIDAD'] = output['scores'][output['labels'].index('buena calidad')]
    df.at[index, 'MALA_CALIDAD'] = output['scores'][output['labels'].index('mala calidad')]
    df.at[index, 'SE_ROMPE'] = output['scores'][output['labels'].index('se rompe')]
    df.at[index, 'TALLA_GRANDE'] = output['scores'][output['labels'].index('talla grande')]
    df.at[index, 'TALLA_PEQUENA'] = output['scores'][output['labels'].index('talla pequeña')]
    df.at[index, 'COMODO'] = output['scores'][output['labels'].index('comodo')]
    df.at[index, 'INCOMODO'] = output['scores'][output['labels'].index('incomodo')]
    df.at[index, 'BUENA_EXP'] = output['scores'][output['labels'].index('buena experiencia')]
    df.at[index, 'MALA_EXP'] = output['scores'][output['labels'].index('mala experiencia')]
    df.at[index, 'RECOMIENDO'] = output['scores'][output['labels'].index('lo recomiendo')]
    df.at[index, 'NO_RECOMIENDO'] = output['scores'][output['labels'].index('no lo recomiendo')]
    df.at[index, 'NO_ERA_LO_QUE_ESPERABA'] = output['scores'][output['labels'].index('no era lo que esperaba')]
    df.at[index, 'DESCRIPCION_INCORRECTA'] = output['scores'][output['labels'].index('descripción incorrecta')]
    
    print(f"Review: {row['COMENTARIO']}")
    print("Predicted labels:", output['labels'])
    print("Scores:", output['scores'])
    print("="*50)
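
One thing that may help regardless of the model is to pass the reviews to the pipeline as a list with a `batch_size`, instead of calling it once per row, and to drive the column assignment from a label-to-column mapping. A minimal sketch under those assumptions: `score_reviews`, `LABEL_TO_COLUMN` and `fake_classifier` are hypothetical names, the mapping is abbreviated, and the batch size of 16 is a guess that would need tuning. The classifier is passed in as a parameter so the helper can be tried with a stand-in before loading a real model.

```python
from typing import Callable, Dict, List

# Hypothetical, abbreviated mapping from candidate label to output column.
LABEL_TO_COLUMN: Dict[str, str] = {
    "buena talla": "BUENA_TALLA",
    "mala talla": "MALA_TALLA",
    "buena calidad": "BUENA_CALIDAD",
    "mala calidad": "MALA_CALIDAD",
}


def score_reviews(classifier: Callable, texts: List[str],
                  label_to_column: Dict[str, str],
                  batch_size: int = 16) -> List[Dict[str, float]]:
    """Score all texts in one pipeline call; return one row of scores per text."""
    labels = list(label_to_column)
    outputs = classifier(texts, labels, multi_label=True, batch_size=batch_size)
    if isinstance(outputs, dict):  # guard in case a single result comes back unwrapped
        outputs = [outputs]
    rows = []
    for out in outputs:
        # Labels come back sorted by score, so look scores up by label name.
        by_label = dict(zip(out["labels"], out["scores"]))
        rows.append({col: by_label[label] for label, col in label_to_column.items()})
    return rows


def fake_classifier(texts, labels, multi_label=True, batch_size=1):
    """Stand-in that mimics the output shape of the zero-shot pipeline."""
    return [{"sequence": text,
             "labels": list(reversed(labels)),
             "scores": [0.1 * (i + 1) for i in range(len(labels))]}
            for text in texts]


rows = score_reviews(fake_classifier, ["La talla es perfecta", "Se rompió"], LABEL_TO_COLUMN)
print(rows[0])
```

With the real pipeline in place of the stand-in, the rows could be attached to the data with something like `df.join(pd.DataFrame(rows, index=df.index))`. If a GPU is available, the `device` argument of `pipeline(...)` is also worth checking, since batching tends to pay off mostly there.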
