I've extracted some PDF tables using Camelot.
The first column contains merged cells, which is often problematic.
Despite tweaking some of the advanced configuration options, the merged cells in the first column still span multiple rows.
I'd like to iterate through the first column rows to achieve the following:
- Start from the top
- When you find an empty cell, concatenate the strings that precede it, in order and separated by a space, into the first non-empty cell of that group.

| Row | What I have now | What I'd like |
|---|---|---|
| 1 | A | A B C D |
| 2 | B | |
| 3 | C | |
| 4 | D | |
| 5 | | |
| 6 | F | F G |
| 7 | G | |
| 8 | | |
Code
df:
Example Code