I am getting an error in Azure Databricks: a built-in function cannot be found.
[UNRESOLVED_ROUTINE] Cannot resolve function md5 on search path [system.builtin, system.session, spark_catalog.default]
It happens only intermittently in a workflow, and after a rerun everything works. It may be caused by the fact that I have many tasks that use the same notebook with different parameters. Do you know how to resolve this? Should I add an init script at the job cluster level, or add libraries at the task level?
This error indicates that the md5 function cannot be resolved on the current search path. In Python, md5 is available in the pyspark.sql.functions module, so you need to import it before using it in your code. I have tried this approach: importing md5 and applying it to a column in a DataFrame.
In SQL, md5 is normally a built-in function (system.builtin is the first entry on the search path shown in the error). As a workaround when it intermittently fails to resolve, you can register your own temporary SQL function with spark.udf.register and call it under a different name. I have tried the below example:
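A minimal sketch of both routes, assuming a PySpark session is available as `spark` (as in a Databricks notebook) and a DataFrame `df` with a string column. The helper names `md5_hex`, `register_md5_udf` and `add_md5_column`, and the SQL function name `md5_udf`, are hypothetical. The UDF hashes with Python's hashlib and is registered through `spark.udf.register` with a `StringType` return type; a distinct name is used because a temporary function called `md5` would likely be shadowed by (or clash with) the built-in that comes first on the search path.

```python
import hashlib
from typing import Optional


def md5_hex(value: Optional[str]) -> Optional[str]:
    """Return the lowercase hex MD5 digest of a string, or None for NULL input.

    The string is encoded as UTF-8 before hashing, so the 32-character
    hex result should match Spark's built-in md5 for string input.
    """
    if value is None:
        return None
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def register_md5_udf(spark, name: str = "md5_udf") -> None:
    """Register md5_hex as a temporary SQL function for this SparkSession.

    'md5_udf' is a hypothetical name, chosen so it does not collide with
    the built-in md5 listed under system.builtin.
    """
    # Imported lazily so md5_hex can be used and tested without pyspark installed
    from pyspark.sql.types import StringType

    spark.udf.register(name, md5_hex, StringType())


def add_md5_column(df, source_col: str, target_col: str = "md5_hash"):
    """DataFrame-API route: apply md5 from pyspark.sql.functions to a column."""
    from pyspark.sql import functions as F

    # F.md5 expects binary input; Spark implicitly casts a string column to binary
    return df.withColumn(target_col, F.md5(F.col(source_col)))
```

Usage in a notebook might look like `register_md5_udf(spark)`, then `df.createOrReplaceTempView("my_table")` and `spark.sql("SELECT md5_udf(name) AS hash FROM my_table")`, where `my_table` and `name` are placeholders. A Python UDF is slower than the built-in md5 because rows are serialized to a Python worker, so it is best kept as a fallback.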
In the example, I imported the hashlib module to calculate the MD5 hash and the StringType class from pyspark.sql.types. The code registers a temporary SQL function that takes a string input and returns its MD5 hash as a string. Then I created a temporary view of the DataFrame df and used the registered function in a SQL query.