DataHub OpenAPI /v2/entity/dataset POST request: how to include a schema


I am using DataHub's OpenAPI to create new datasets. Creating a dataset with just a name, description, etc. works fine. However, as soon as I include a schema, I can't get the request to work. I'm working with Parquet files, so ideally it should use the ParquetSchema. Currently I'm using this JSON in my POST request:

[
    {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:spark,test,DEV)",
        "datasetProperties": {
            "value": {
                "__type": "DatasetProperties",
                "description": "Dataset generated from namespacefile",
                "name": "test",
                "qualifiedName": "file",
                "externalUrl": "file"
            }
        },
        "schemaMetadata": {
            "value": {
                "__type": "SchemaMetadata",
                "schemaName": "testschema",
                "platform": "urn:li:dataPlatform:spark",
                "version": 1,
                "hash": "cf83e1357eefb8bdf154b14372c1e5e251e7f09e3d1194e7b55a11d68a9179e9",
                "platformSchema": {
                    "com.linkedin.schema.ParquetSchema": {
                        "fields": [
                            {
                                "name": "test",
                                "type": {
                                    "typeName": "STRING"
                                },
                                "repetition": "OPTIONAL"
                            }
                        ]
                    }
                }
            }
        }
    }
]

When I exclude everything that is schema related it works fine. Does anyone know what the issue might be?

