hdfs library will not import in an HDInsight Jupyter notebook


I have a problem in an HDInsight Jupyter notebook.

I cannot access files outside the notebook. I am trying to read files on the HDInsight cluster head node, which I can reach over SSH with my username and password from a remote terminal.

If I try to install the hdfs library, pip reports that it is already installed:

% pip install hdfs   # python code

Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: hdfs in /home/spark/.local/lib/python3.8/site-packages (2.7.3)
Requirement already satisfied: docopt in /home/spark/.local/lib/python3.8/site-packages (from hdfs) (0.6.2)
... more statements like this follow ...
Note: you may need to restart the kernel to use updated packages.

But when I try to import from the hdfs library, the import fails:

from hdfs import InsecureClient         # python code
from pyspark.sql import SparkSession    # python code

An error was encountered:
No module named 'hdfs'
Traceback (most recent call last):
ModuleNotFoundError: No module named 'hdfs'

What is the problem?
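
To narrow this down, here is a minimal diagnostic sketch. It assumes, without confirmation from the output above, that the cell doing the import may run under a different Python interpreter than the one pip installed into. `describe_module` is a hypothetical helper name introduced only for this example. Running it in the same kind of cell as the failing import shows which interpreter is active and whether it can see `hdfs`.

```python
import importlib.util
import site
import sys


def describe_module(name):
    """Return the file a top-level module would be imported from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Interpreter that is executing this cell.
print("interpreter:", sys.executable)
# Per-user site-packages directory for this interpreter; pip's
# "user installation" for this interpreter would land here.
print("user site-packages:", site.getusersitepackages())
# Location of the hdfs package as seen by this interpreter (None if not importable).
print("hdfs found at:", describe_module("hdfs"))
```

If the interpreter or user site-packages path printed here differs from the /home/spark/.local/lib/python3.8/site-packages path in the pip output, the install and the import are probably happening in different environments.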

Is there a different way to access the files on the HDInsight cluster head node?
