I have an on-demand HDInsight cluster that is launched from a Spark activity within Azure Data Factory and runs PySpark 3.1. To test my code, I normally launch Jupyter Notebook from the page of the created HDInsight cluster.
Now, I would like to pass some parameters to that Spark activity and retrieve them from within the Jupyter notebook code. I've tried doing so in two ways, but neither of them worked for me (see the sketch after this list):
Method A. As Arguments, then tried to retrieve them using sys.argv.
Method B. As Spark configuration, then tried to retrieve them using sc.getConf().getAll().
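For reference, this is roughly what I ran in the notebook (a simplified sketch; the comments describe what I observed):

```python
import sys

# Method A: I expected the activity's Arguments to show up here, but in
# the Jupyter kernel sys.argv only contains the kernel's own launch args.
print(sys.argv)

# Method B: I expected my custom configuration keys to appear here, but
# they were not present. (`sc` is predefined by the PySpark kernel.)
for key, value in sc.getConf().getAll():
    print(key, "=", value)
```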
I suspect that either:
- I am not specifying the parameters correctly,
- or I am using the wrong way to retrieve them in the Jupyter notebook code,
- or the parameters are only valid for the Python *.py scripts specified in the "File path" field, but not for Jupyter notebooks.
Any pointers on how to pass parameters into HDInsight Spark activity within Azure Data Factory would be much appreciated.

The issue is with the entryFilePath. In the Spark activity of an HDInsight cluster, the entryFilePath must point to either a .jar file or a .py file. When you follow this, you can successfully pass arguments to the activity, and the script can read them using sys.argv. A sample nb1.py is shown below, followed by a query you can run from a Jupyter notebook to inspect the Spark configuration.
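A minimal sketch of such a script (the argument names are illustrative, and it assumes the activity passes two values):

```python
# nb1.py -- the .py file referenced by the Spark activity's entryFilePath
import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("nb1").getOrCreate()

# The values listed under Arguments on the activity arrive positionally:
# sys.argv[0] is the script path; sys.argv[1] onward are the argument values.
print("All arguments:", sys.argv)
first_arg = sys.argv[1]
second_arg = sys.argv[2]
print("first_arg =", first_arg, "second_arg =", second_arg)

spark.stop()
```

To double-check which configuration values actually reached the cluster, you can run a query like this from a Jupyter notebook (where `sc` is predefined by the PySpark kernel):

```python
# Lists every Spark configuration key/value pair the session can see.
sorted(sc.getConf().getAll())
```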
NOTE: Please ensure that you are passing the arguments to a Python script (a .py file), not to a Python notebook.