I'm trying to run an LLM embedding model on tabular data in Python on Databricks (Microsoft Azure). The code runs without errors in Jupyter, but in Databricks it fails with:
AttributeError: 'pyarrow.lib.Table' object has no attribute 'to_reader'
The code is:
from arize.pandas.embeddings.tabular_generators import EmbeddingGeneratorForTabularFeatures
import arize.pandas.embeddings.base_generators
# EmbeddingGeneratorForTabularFeatures.list_pretrained_models()
generator = EmbeddingGeneratorForTabularFeatures(
    model_name="distilbert-base-uncased",
    tokenizer_max_length=512,
    #, dropout=0 # Remove Drop-out
)
tabular_vector_columns = [] # list of tabular vectors
prompt_columns = [] # list of prompt columns
# Iterate over each column_set
for i in range(split_prompt_n):
    tab_vec_col_name_i = 'tabular_vector_' + str(i)
    prompt_col_name_i = 'prompts_' + str(i)
    tabular_vector_columns += [tab_vec_col_name_i]
    prompt_columns += [prompt_col_name_i]
    # train_X
    train_X[tab_vec_col_name_i], train_X[prompt_col_name_i] = generator.generate_embeddings(
        train_X,
        selected_columns=cols_per[str(i)],
        return_prompt_col=True
    )
    # test_X
    test_X[tab_vec_col_name_i], test_X[prompt_col_name_i] = generator.generate_embeddings(
        test_X,
        selected_columns=cols_per[str(i)],
        return_prompt_col=True
    )
The error is raised inside the loop, at the first generate_embeddings call (the train_X assignment). I haven't found a solution for it.
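
Since the same code behaves differently in the two environments and the error comes from a pyarrow object, one possible way to narrow it down is to compare installed library versions in Jupyter and in Databricks. The snippet below is only a diagnostic sketch under that assumption. The package list is a guess at what might be relevant, and the helper names installed_versions and table_has_to_reader are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def table_has_to_reader():
    """Return True/False if pyarrow is importable, None if it is not."""
    try:
        import pyarrow as pa
    except ImportError:
        return None
    # The traceback complains that Table has no 'to_reader' attribute,
    # so check for that attribute directly on the class.
    return hasattr(pa.Table, "to_reader")


# Assumed list of packages worth comparing; adjust as needed.
packages = ["pyarrow", "datasets", "transformers", "arize", "pandas"]
for name, ver in installed_versions(packages).items():
    print(f"{name}: {ver}")
print("pyarrow.Table has to_reader:", table_has_to_reader())
```

Running this in both environments and comparing the output should show whether the versions differ. If the last line prints False only in Databricks, the pyarrow installed on the cluster is probably older than what the library calling to_reader expects.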