There are several similar questions on the internet, but none of them has an answer.
I am using the following code to save MongoDB data to Hive, but the exception shown at the end occurs. How can I work around this problem?
I am using:
spark-mongo-connector (Spark 2.1.0, Scala 2.11)
java-mongo-driver 3.10.2
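For reference, a minimal build.sbt sketch for this setup might look like the following; the exact artifact coordinates and versions are assumptions based on the versions listed above, so adjust them to whatever you actually have on the classpath.

// Assumed sbt coordinates for Spark 2.1.0 / Scala 2.11 with the MongoDB connector.
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-sql"             % "2.1.0",
  "org.apache.spark"  %% "spark-hive"            % "2.1.0",
  "org.mongodb.spark" %% "mongo-spark-connector" % "2.1.0",
  "org.mongodb"       %  "mongo-java-driver"     % "3.10.2"
)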
import com.mongodb.spark.MongoSpark
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

object MongoConnector_Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.mongodb.input.uri", "mongodb://user:pass@mongo1:123456/db1.t1")
      .setMaster("local[4]")
      .setAppName("MongoConnectorTest")
    val session = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()

    // Declare every field as string.
    val schema: StructType = new StructType()
      .add("_id", "string")
      .add("x", "string")
      .add("y", "string")
      .add("z", "string")

    val df = MongoSpark.read(session).schema(schema).load()
    df.write.saveAsTable("MongoConnector_Test" + System.currentTimeMillis())
  }
}
But the following exception occurs:
Caused by: org.bson.BsonInvalidOperationException: Invalid state INITIAL
at org.bson.json.StrictCharacterStreamJsonWriter.checkState(StrictCharacterStreamJsonWriter.java:395)
at org.bson.json.StrictCharacterStreamJsonWriter.writeNull(StrictCharacterStreamJsonWriter.java:192)
at org.bson.json.JsonNullConverter.convert(JsonNullConverter.java:24)
at org.bson.json.JsonNullConverter.convert(JsonNullConverter.java:21)
at org.bson.json.JsonWriter.doWriteNull(JsonWriter.java:206)
at org.bson.AbstractBsonWriter.writeNull(AbstractBsonWriter.java:557)
at org.bson.codecs.BsonNullCodec.encode(BsonNullCodec.java:38)
at org.bson.codecs.BsonNullCodec.encode(BsonNullCodec.java:28)
at org.bson.codecs.EncoderContext.encodeWithChildContext(EncoderContext.java:91)
at org.bson.codecs.BsonValueCodec.encode(BsonValueCodec.java:62)
at com.mongodb.spark.sql.BsonValueToJson$.apply(BsonValueToJson.scala:29)
at com.mongodb.spark.sql.MapFunctions$.bsonValueToString(MapFunctions.scala:103)
at com.mongodb.spark.sql.MapFunctions$.com$mongodb$spark$sql$MapFunctions$$convertToDataType(MapFunctions.scala:78)
at com.mongodb.spark.sql.MapFunctions$$anonfun$3.apply(MapFunctions.scala:39)
at com.mongodb.spark.sql.MapFunctions$$anonfun$3.apply(MapFunctions.scala:37)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at com.mongodb.spark.sql.MapFunctions$.documentToRow(MapFunctions.scala:37)
at com.mongodb.spark.sql.MongoRelation$$anonfun$buildScan$2.apply(MongoRelation.scala:45)
at com.mongodb.spark.sql.MongoRelation$$anonfun$buildScan$2.apply(MongoRelation.scala:45)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:243)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193)
... 8 more
MongoDB stores data as documents, and the schema is not fixed across all documents in a collection. Some documents may be missing fields (or have them set to null), and that is the root cause of your issue: the connector fails when it tries to convert such a missing/null value into the string type declared in your schema. Ignore the fields that are not present in every document and the issue will be fixed.
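Concretely, applying that workaround to your code would look something like the sketch below. It assumes z is the field that is absent (or null) in some documents, so it is simply left out of the requested schema; adjust the field names to match your actual collection.

import com.mongodb.spark.MongoSpark
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

object MongoConnector_Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.mongodb.input.uri", "mongodb://user:pass@mongo1:123456/db1.t1")
      .setMaster("local[4]")
      .setAppName("MongoConnectorTest")
    val session = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()

    // Only request fields that exist (and are non-null) in every document.
    // "z" is assumed to be the field missing from some documents and is left out here.
    val schema: StructType = new StructType()
      .add("_id", "string")
      .add("x", "string")
      .add("y", "string")

    val df = MongoSpark.read(session).schema(schema).load()
    df.write.saveAsTable("MongoConnector_Test" + System.currentTimeMillis())
  }
}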