Tabula: pandas DataFrame loses its first row when converted to a list


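I need to extract a table from a PDF file.

The code is:

```python
import tabula

# `arquivo` is the path to the PDF; read_pdf returns a list of DataFrames, one per table found
pdf = tabula.read_pdf(arquivo, pages=(1, 2), lattice=True)
```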

I convert both DataFrames to lists with the code below:

```python
lista = pdf[1].values.tolist()
lista2 = pdf[2].values.tolist()
```
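For reference, here is a minimal sketch of what the conversion itself does, using a hypothetical frame built from the IDs, codes, units and quantities shown in this question (the column names `id`, `code`, `unit` and `qty` are assumptions). `.values.tolist()` returns one inner list per DataFrame row and leaves out the index and the column labels, so it should not drop a row that is actually part of the data. `DataFrame.to_numpy()` is the accessor current pandas documentation recommends over `.values`; it gives the same list here.

```python
import pandas as pd

# Hypothetical stand-in for pdf[2]: column names are assumptions,
# values are the IDs, codes, units and quantities shown in the question.
df = pd.DataFrame(
    {
        "id": [8, 9, 10],
        "code": [4705050, 3970019, 5717310],
        "unit": ["CP", "CP", "CP"],
        "qty": [360, 540, 360],
    }
)

# One inner list per row; the index and the column labels are not included.
rows = df.values.tolist()
# Same result with the accessor recommended by current pandas docs.
rows_np = df.to_numpy().tolist()

print(len(rows) == len(df))  # True: every row in the data is kept
print(df.columns.tolist())   # the labels are stored separately from the rows
```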

My problem is that the conversion loses the first row of the DataFrame.
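A possible cause (an assumption, not confirmed for this PDF): the first row (ID 7) may never have been a data row, because the table reader promoted it to column labels. `pandas.read_csv` does exactly that with its default `header=0`, so this minimal sketch applies it to hypothetical text with four rows and no real header line to show the effect. Printing `pdf[2].columns` would show whether the same thing happened with the extracted table.

```python
import io

import pandas as pd

# Hypothetical table text: four data rows (IDs 7 to 10) and no header line.
RAW = "7,item 7\n8,item 8\n9,item 9\n10,item 10\n"

# Default header=0: the first line becomes the column labels, not a row.
with_header = pd.read_csv(io.StringIO(RAW))
print(with_header.columns.tolist())      # ['7', 'item 7']  <- row 7 ended up here
print(len(with_header.values.tolist()))  # 3

# header=None: every line is data and the columns are labelled 0, 1, ...
no_header = pd.read_csv(io.StringIO(RAW), header=None)
print(len(no_header.values.tolist()))    # 4
```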

The result of converting `lista2` contains only IDs 8, 9 and 10:

```
[[8,
  'vitamínicos e/ou minerais /\rVitaminas: C (45mg), E (10mg),\rNiacina (16mg), A (600mcg), ac.\rpantotênico (5mg), D (5mcg), B6\r(1,3mg), B1 (1,2mg), B2 (1,3 mg),\rB12 (1mcg), ác. fólico (200mcg),\rbiotina (30mcg): Minerais: cálcio\r(90mg), fósforo (38mg),\rmanganês (45mg), ferro (5mg),\rzinco (5mg), selênio (30 mcg),\rmanganês (1,2mg), selênio\r(30mcg), iodo (100mcg):\rProbiótico: Lactobacillus\racidophilus / COMPRIMIDO /\rSEM MARCA',
  4705050,
  'CP',
  360,
  nan],
 [9,
  'vitaminas + minerais /\rpolivitaminas + poliminerais /\rCOMPRIMIDO REVESTIDO\r/ ZIRVIT MULTI - POR MARCA',
  3970019,
  'CP',
  540,
  nan],
 [10,
  'suplemento alimentar / óleo de\rmicroalgas e lecitina de soja /\rCÁPSULA / SEM MARCA',
  5717310,
  'CP',
  360,
  nan]]
```

When I print the original pandas DataFrame `pdf[2]` (before calling `.values.tolist()`), I get:

```
8   vitamínicos e/ou minerais /\rVitaminas: C (45m...   4705050 CP  360 NaN
9   vitaminas + minerais /\rpolivitaminas + polimi...   3970019 CP  540 NaN
10  suplemento alimentar / óleo de\rmicroalgas e l...   5717310 CP  360 NaN
```

I have 4 products in the DataFrame (IDs 7, 8, 9 and 10), but when I convert it to a list I lose the first one, product ID 7.
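If the missing product did end up in the column labels, the cleaner fix is to re-extract the table without a header row. tabula-py's `read_pdf` accepts a `pandas_options` dict, and its documentation gives `{'header': None}` as an example value; check the docs for your installed version before relying on it. For frames that were already read, this minimal sketch uses a hypothetical helper (`rows_including_header` is not part of pandas or tabula-py) that puts the labels back in front of the data rows:

```python
import pandas as pd


def rows_including_header(df: pd.DataFrame) -> list:
    """Hypothetical helper: return the column labels as the first row, then the data rows.

    Only correct when the labels really are a data row that was promoted to a header.
    The labels come back as stored, so numeric IDs may be strings such as '7'.
    """
    return [list(df.columns)] + df.values.tolist()


# Hypothetical frame in the suspected shape: row 7 sits in the column labels.
df = pd.DataFrame([[8, "item 8"], [9, "item 9"], [10, "item 10"]], columns=["7", "item 7"])
rows = rows_including_header(df)
print(rows[0])    # ['7', 'item 7']
print(len(rows))  # 4
```

Empty cells in a promoted row (like the trailing `NaN` column in this table) may come back as placeholder labels instead of `NaN`, for example `Unnamed: 5` in `pandas.read_csv`, which is another reason to prefer re-extracting with `header=None`.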

Any idea how to solve this? Thank you.

