Intel Server Unavailable after executing the code

124 Views Asked by Adarsh Wase At 01 September 2023 at 19:10

I am on Intel DevCloud, using Intel oneAPI. This is my code so far:

# first block of jupyter notebook
import modin.pandas as pd

# second block of jupyter notebook
df = pd.read_csv('dataset/dataset.csv')
df.head()

# output of second block

UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:

    import ray
    ray.init()

2023-09-01 12:00:16,471 INFO worker.py:1636 -- Started a local Ray instance.

The first block runs properly, but when I read my dataset I get this warning followed by a "server unavailable" error.

If I use import pandas as pd, the code runs fine, but modin.pandas does not work. My dataset is a ~1 GB CSV file. Why is this happening?

How to reproduce this?

Step 1 - Go to https://devcloud.intel.com/oneapi/
Step 2 - Click on the Getting Started tab.
Step 3 - Scroll down and click on Launch JupyterLab. (It is like Google Colab or Kaggle Notebooks.)
Step 4 - Create an .ipynb notebook and use wget to download this data.
Data - !wget https://s3-ap-southeast-1.amazonaws.com/he-public-data/datasetab75fb3.zip
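
Step 4 downloads a zip archive, which has to be unpacked before read_csv can find the file. A minimal sketch of that step using Python's standard zipfile module follows. The dataset destination folder matches the path used in the question, but the names of the files inside the archive are not stated here, so the function simply returns whatever it extracted. The helper name extract_archive is made up for illustration.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="dataset"):
    """Unpack zip_path into dest_dir and return the extracted member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Assumes the archive from the wget command above is in the working directory.
    archive_name = "datasetab75fb3.zip"
    if Path(archive_name).exists():
        print(extract_archive(archive_name))
```

Checking the returned names against the path passed to read_csv is a quick way to rule out a wrong file path before looking at Modin or Ray.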

System Information

OS - Linux 90-Ubuntu 5.4.0-80-generic
CPU - Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz
RAM - 188 GB


There are 2 best solutions below

xPetersue On 01 September 2023 at 19:28

Step 1: Check that Modin is installed properly. If you are unsure, reinstall Modin and its dependencies:

pip install "modin[all]"  # Modin plus its execution engines and their dependencies

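Step 2: Import Modin and read the file from the notebook's working directory rather than from a subfolder:

import modin.pandas as pd

df = pd.read_csv('dataset.csv')  # keep the .csv file next to the notebook, not under a folder

Then see whether the error persists.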

Reference: The pandas library offers user-friendly data structures, including DataFrames, for data analysis. However, it can be slow on very large datasets (e.g., 100 GB or 1 TB) because it was not designed for such volumes. Modin addresses this by letting you scale pandas workflows by changing a single line of code, the import.
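
Because the failure appears while Modin is starting Ray, one way to narrow it down is to run the same read on a different Modin engine. The sketch below relies on Modin's MODIN_ENGINE environment variable, which has to be set before modin.pandas is first imported. It assumes the Dask engine is installed (the modin[all] extra includes it), and the helper name select_modin_engine is hypothetical. Whether this avoids the DevCloud error is not confirmed here; it only tests whether the problem is specific to Ray.

```python
import os


def select_modin_engine(engine="dask"):
    """Set MODIN_ENGINE so that a later `import modin.pandas` uses `engine`."""
    # A subset of the engine names Modin accepts; extend if you need another one.
    allowed = {"ray", "dask", "python"}
    if engine not in allowed:
        raise ValueError(f"unsupported engine: {engine!r}")
    os.environ["MODIN_ENGINE"] = engine
    return engine


select_modin_engine("dask")

# Run the lines below in a fresh kernel, before any other Modin import:
# import modin.pandas as pd
# df = pd.read_csv('dataset/dataset.csv')
```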

Igor Zamyatin On 06 October 2023 at 18:39

Install Ray 2.6.1 by running

pip uninstall ray
pip install ray==2.6.1

then re-export the ipykernel and run the notebook with the following as its first block:

# first block of jupyter notebook
import ray
ray.shutdown()
ray.init(_memory=16000 * 1024 * 1024, object_store_memory=500 * 1024 * 1024, _driver_object_store_memory=500 * 1024 * 1024)

Put the desired code in the next blocks:

# second block of jupyter notebook
import modin.pandas as pd

# third block of jupyter notebook
df = pd.read_csv('dataset/dataset.csv')
df.head()

This should help to avoid the issue.
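
The three sizes passed to ray.init above are given in bytes, which makes them hard to read at a glance. As a rough sketch, the same values can be written in MiB and converted. The helper mib and the dictionary name ray_init_kwargs are illustrative only; the keyword names and numbers are copied from the call above.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def mib(n):
    """Convert a size in MiB to bytes."""
    return n * MIB


# Same values as the ray.init call above, expressed in MiB.
ray_init_kwargs = {
    "_memory": mib(16000),
    "object_store_memory": mib(500),
    "_driver_object_store_memory": mib(500),
}

# Print each limit in bytes and in GiB for a quick sanity check.
for name, size in ray_init_kwargs.items():
    print(f"{name}: {size} bytes ({size / 1024**3:.2f} GiB)")

# In the notebook, after `import ray` and `ray.shutdown()`:
# ray.init(**ray_init_kwargs)
```

Written this way, it is easier to see that both object-store limits are 500 MiB, a small fraction of the 188 GB of RAM reported in the question.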

You can also check Intel DevCloud support for the related discussion.
