In a Hive 3.0 metastore cluster that uses CachedStore, can phantom reads occur?

This is a question about the Hive Metastore (HMS). According to the description on the official website, CachedStore synchronizes its cache once per minute by default. In a cluster with multiple HMS instances, if table partitions are modified concurrently, can the inconsistent caches cause ABA problems?

Looking at the code, the transaction runs at the REPEATABLE READ (RR) isolation level, but the corresponding read does not use FOR UPDATE. For example, modifying partitions through org.apache.hadoop.hive.metastore.RawStore#alterPartitions starts a transaction, issues a direct SQL query to look up the partitions (a plain snapshot read, not a locking "current" read), then updates MySQL and marks the cache dirty. Suppose another transaction writes concurrently after that read: couldn't the metadata end up corrupted? The direct SQL query is built like this:
private List<Long> getPartitionIdsViaSqlFilter(
    String catName, String dbName, String tblName, String sqlFilter,
    List<? extends Object> paramsForFilter, List<String> joinsForFilter, Integer max)
    throws MetaException {
  ...
  String queryText =
      "select " + PARTITIONS + ".\"PART_ID\" from " + PARTITIONS + ""
      + " inner join " + TBLS + " on " + PARTITIONS + ".\"TBL_ID\" = " + TBLS + ".\"TBL_ID\" "
      + " and " + TBLS + ".\"TBL_NAME\" = ? "
      + " inner join " + DBS + " on " + TBLS + ".\"DB_ID\" = " + DBS + ".\"DB_ID\" "
      + " and " + DBS + ".\"NAME\" = ? "
      + join(joinsForFilter, ' ')
      + " where " + DBS + ".\"CTLG_NAME\" = ? "
      + (StringUtils.isBlank(sqlFilter) ? "" : (" and " + sqlFilter)) + orderForFilter;
  ...
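
To make the concern concrete, here is a minimal sketch of the interleaving I am worried about. It is not Hive code: the class LostUpdateSketch, the in-memory dbRow map and the parameter names are all made up for illustration, and the "lock" case simply replays the order that a locking read (such as SELECT ... FOR UPDATE) would force. Whether alterPartitions actually behaves like the first case depends on what it writes back, which is exactly what I am asking.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for one partition row; none of this is Hive code.
class LostUpdateSketch {
    // Plays the role of the partition's parameters in the backing database.
    private static final Map<String, String> dbRow = new TreeMap<>();

    private static void reset() {
        dbRow.clear();
        dbRow.put("numFiles", "1");
    }

    // Non-locking read: the caller gets a private copy, like a snapshot read.
    private static Map<String, String> read() {
        return new TreeMap<>(dbRow);
    }

    // Writes the caller's whole view back, replacing whatever is stored.
    private static void writeBack(Map<String, String> params) {
        dbRow.clear();
        dbRow.putAll(params);
    }

    // Two writers read the same state, then both write: A's change is lost.
    static Map<String, String> interleavedWithoutLock() {
        reset();
        Map<String, String> a = read();   // writer A reads
        Map<String, String> b = read();   // writer B reads the same state
        a.put("location", "/warehouse/a");
        writeBack(a);                     // A commits
        b.put("numFiles", "2");
        writeBack(b);                     // B commits from its stale copy
        return read();
    }

    // Order a locking read would force: B cannot read until A has committed.
    static Map<String, String> serializedWithLock() {
        reset();
        Map<String, String> a = read();
        a.put("location", "/warehouse/a");
        writeBack(a);                     // A commits first
        Map<String, String> b = read();   // B now sees A's change
        b.put("numFiles", "2");
        writeBack(b);
        return read();
    }

    public static void main(String[] args) {
        System.out.println("without lock: " + interleavedWithoutLock());
        System.out.println("with lock:    " + serializedWithLock());
    }
}
```

In the first case the final row no longer contains the location that writer A set. In the second case both changes survive, because B's read happens only after A's write.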