I am using the OpenAPI of DataHub to create new datasets. It works fine when I create a dataset with just a name, description, etc. However, when I try to include a schema, I can't get the API call to work. I'm working with Parquet files, so ideally it should use the ParquetSchema. Currently I'm using this JSON in my POST request:
[
  {
    "urn": "urn:li:dataset:(urn:li:dataPlatform:spark,test,DEV)",
    "datasetProperties": {
      "value": {
        "__type": "DatasetProperties",
        "description": "Dataset generated from namespacefile",
        "name": "test",
        "qualifiedName": "file",
        "externalUrl": "file"
      }
    },
    "schemaMetadata": {
      "value": {
        "__type": "SchemaMetadata",
        "schemaName": "testschema",
        "platform": "urn:li:dataPlatform:spark",
        "version": 1,
        "hash": "cf83e1357eefb8bdf154b14372c1e5e251e7f09e3d1194e7b55a11d68a9179e9",
        "platformSchema": {
          "com.linkedin.schema.ParquetSchema": {
            "fields": [
              {
                "name": "test",
                "type": {
                  "typeName": "STRING"
                },
                "repetition": "OPTIONAL"
              }
            ]
          }
        }
      }
    }
  }
]
When I exclude everything that is schema-related, the request works fine. Does anyone know what the issue might be?
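
For anyone who wants to reproduce this, here is a minimal sketch of how such a request could be sent so that the server's error body is visible. The URL (`http://localhost:8080/openapi/entities/v1/`), the optional bearer token, and the helper names `build_request` and `send` are assumptions for illustration, not details of my actual setup. The payload shown is the trimmed version without the schema part, which is the variant that works.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint, not taken from a real deployment: adjust host, port and path.
GMS_URL = "http://localhost:8080/openapi/entities/v1/"


def build_request(payload, url=GMS_URL, token=None):
    """Build (but do not send) a JSON POST request for the given payload."""
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if token:
        # Hypothetical: only needed if token authentication is enabled.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request):
    """Send the request and return (status, body), also for HTTP error responses."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Read the error body too, since it may explain why the payload was rejected.
        return err.code, err.read().decode("utf-8")


# Trimmed payload: the dataset without the schemaMetadata aspect.
payload = [
    {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:spark,test,DEV)",
        "datasetProperties": {
            "value": {
                "__type": "DatasetProperties",
                "description": "Dataset generated from namespacefile",
                "name": "test",
            }
        },
    }
]

request = build_request(payload)
```

Calling `status, body = send(request)` and printing both values against a running instance shows the HTTP status together with whatever message the server returns, which should make it easier to compare the working payload with the one that includes `schemaMetadata`.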