How to implement ColumnFamilies in RocksDB in Java?


I am trying to use column families in RocksDB through the Java binding.

RocksDB.loadLibrary();
        String threat = "threat_data";
        String ipRange = "ip_range";
        options = new DBOptions();
        options.setCreateIfMissing(true);
        options.setCreateMissingColumnFamilies(true);
        ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction();
        List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
                new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts),
                new ColumnFamilyDescriptor(threat.getBytes(), cfOpts),
                new ColumnFamilyDescriptor(ipRange.getBytes(),cfOpts)
        );
        List<ColumnFamilyHandle> cfHandles = new ArrayList<>();
        rocksDb = RocksDB.open(options, new File("/tmp/benchmark", "rockdb-threat-detection.db").getAbsolutePath(),cfDescriptors,cfHandles);
        
        // RocksDB.open fills cfHandles in the same order as cfDescriptors,
        // so the handles can be picked by index: 0 = default, 1 = threat_data, 2 = ip_range.
        cfHandleThreat = cfHandles.get(1);
        cfHandleIp = cfHandles.get(2);

I am creating two column families, threat_data and ip_range. But when I read from them with get(), performance drops noticeably.

mapThreat.get(ipToLong("157.49.194.173"))

Read performance differs drastically between using column families and not using them. Am I doing something wrong, and how can I improve performance?

Answer by Asad Awadia:

Are all gets slow, or only the first one? There isn't much you can do, as column families are just virtual dataspaces.

The only alternative is to not use column families and instead prefix your keys with the column family name.
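A minimal sketch of what key prefixing could look like, in case it helps. The class name PrefixedKeys, the ":" separator, and the 8-byte big-endian encoding of the long key are all assumptions of mine, not anything RocksDB prescribes; any unambiguous prefix scheme would do.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not part of RocksDB): emulates column families
// by prepending "<namespace>:" to every key stored in a single keyspace.
public final class PrefixedKeys {
    private PrefixedKeys() {}

    /** Builds "<namespace>:" followed by the key as 8 big-endian bytes. */
    public static byte[] prefixedKey(String namespace, long key) {
        byte[] prefix = (namespace + ":").getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(key)   // ByteBuffer is big-endian by default
                .array();
    }

    /** Returns true if the stored key starts with "<namespace>:". */
    public static boolean inNamespace(byte[] storedKey, String namespace) {
        byte[] prefix = (namespace + ":").getBytes(StandardCharsets.UTF_8);
        return storedKey.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(storedKey, prefix.length), prefix);
    }

    public static void main(String[] args) {
        // 2637284013L is 157.49.194.173 packed into a long, as an example key.
        byte[] key = prefixedKey("threat_data", 2637284013L);
        System.out.println(inNamespace(key, "threat_data")); // true
        System.out.println(inNamespace(key, "ip_range"));    // false
    }
}
```

With a database opened without extra column families, the resulting byte[] would then be passed to the plain put(byte[], byte[]) and get(byte[]) calls. Whether this is actually faster than column families for your workload is something you would have to measure.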