I'm switching a Java application from com.netflix.astyanax:astyanax-core:1.56.44 to com.datastax.cassandra:cassandra-driver-core:3.1.0.
Right off the bat, with a simple test that inserts a row with a randomly-generated key and then reads the row 1000 times, I'm seeing terrible performance compared to the Astyanax-based code. This is against a single-node Cassandra instance running locally. The table I'm testing is simple -- just a blob primary-key column named uuid and an int column named date.
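For reference, the table was created with something roughly equivalent to this (paraphrasing from memory, so treat the exact DDL as an approximation of my real schema):

// Minimal one-time schema setup matching the table described above.
// Assumption: the keyspace "mykeyspace" already exists; names and types
// are inferred from the description, not copied from the actual schema.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

class SchemaSetup
{
    public static void main(final String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoints("127.0.0.1").build();
             Session session = cluster.connect())
        {
            session.execute(
                "CREATE TABLE IF NOT EXISTS \"mykeyspace\".\"mytable\" (" +
                "\"uuid\" blob PRIMARY KEY, " +
                "\"date\" int)");
        }
    }
}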
Here's the basic code with the DataStax driver:
import com.datastax.driver.core.*;
import com.datastax.driver.core.policies.*;

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class DataStaxCassandra
{
    final Session session;
    final PreparedStatement preparedIDWriteCmd;
    final PreparedStatement preparedIDReadCmd;

    DataStaxCassandra()
    {
        final PoolingOptions poolingOptions = new PoolingOptions()
            .setConnectionsPerHost(HostDistance.LOCAL, 1, 2)
            .setConnectionsPerHost(HostDistance.REMOTE, 1, 1)
            .setMaxRequestsPerConnection(HostDistance.LOCAL, 128)
            .setMaxRequestsPerConnection(HostDistance.REMOTE, 128)
            .setPoolTimeoutMillis(0); // Don't ever wait for a connection to one host.
        final QueryOptions queryOptions = new QueryOptions()
            .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE)
            .setPrepareOnAllHosts(true)
            .setReprepareOnUp(true);
        final LoadBalancingPolicy dcAwareRRPolicy = DCAwareRoundRobinPolicy.builder()
            .withLocalDc("my_laptop")
            .withUsedHostsPerRemoteDc(0)
            .build();
        final LoadBalancingPolicy loadBalancingPolicy = new TokenAwarePolicy(dcAwareRRPolicy);
        final SocketOptions socketOptions = new SocketOptions()
            .setConnectTimeoutMillis(1000)
            .setReadTimeoutMillis(1000);
        final RetryPolicy retryPolicy = new LoggingRetryPolicy(DefaultRetryPolicy.INSTANCE);

        Cluster.Builder clusterBuilder = Cluster.builder()
            .withClusterName("test cluster")
            .withPort(9042)
            .addContactPoints("127.0.0.1")
            .withPoolingOptions(poolingOptions)
            .withQueryOptions(queryOptions)
            .withLoadBalancingPolicy(loadBalancingPolicy)
            .withSocketOptions(socketOptions)
            .withRetryPolicy(retryPolicy);
        // I've tried both V3 and V2, with lower connections/host and higher reqs/connection
        // settings with V3, and it doesn't noticeably affect the test performance. Leaving it
        // at V2 because the Astyanax version is using V2.
        clusterBuilder.withProtocolVersion(ProtocolVersion.V2);

        final Cluster cluster = clusterBuilder.build();
        session = cluster.connect();

        preparedIDWriteCmd = session.prepare(
            "INSERT INTO \"mykeyspace\".\"mytable\" (\"uuid\", \"date\") VALUES (?, ?) USING TTL 38880000");
        preparedIDReadCmd = session.prepare(
            "SELECT \"date\" from \"mykeyspace\".\"mytable\" WHERE \"uuid\"=?");
    }

    public List<Row> execute(final Statement statement, final int timeout)
        throws InterruptedException, ExecutionException, TimeoutException
    {
        final ResultSetFuture future = session.executeAsync(statement);
        try
        {
            final ResultSet readRows = future.get(timeout, TimeUnit.MILLISECONDS);
            // Only consume rows already in the current page, so iterating
            // never triggers a blocking fetch of the next page.
            int remainingInPage = readRows.getAvailableWithoutFetching();
            final List<Row> resultRows = new ArrayList<>(remainingInPage);
            while (remainingInPage-- > 0)
            {
                resultRows.add(readRows.one());
            }
            return resultRows;
        }
        catch (final TimeoutException e)
        {
            future.cancel(true);
            throw e;
        }
    }

    public void insertRow(final byte[] id, final int date)
        throws InterruptedException, ExecutionException, TimeoutException
    {
        final ByteBuffer idKey = ByteBuffer.wrap(id);
        final BoundStatement writeCmd = preparedIDWriteCmd.bind(idKey, date);
        writeCmd.setRoutingKey(idKey);
        execute(writeCmd, 1000);
    }

    public int readRow(final byte[] id)
        throws InterruptedException, ExecutionException, TimeoutException
    {
        final ByteBuffer idKey = ByteBuffer.wrap(id);
        final BoundStatement readCmd = preparedIDReadCmd.bind(idKey);
        readCmd.setRoutingKey(idKey);
        final List<Row> idRows = execute(readCmd, 1000);
        if (idRows.isEmpty()) return 0;
        return idRows.get(0).getInt("date");
    }
}
void perfTest()
{
    final DataStaxCassandra ds = new DataStaxCassandra();
    final int perfTestCount = 10000;
    final long startTime = System.nanoTime();
    for (int i = 0; i < perfTestCount; ++i)
    {
        final String id = UUIDUtils.generateRandomUUIDString();
        final byte[] idBytes = Utils.hexStringToByteArray(id);
        final int date = (int)(System.currentTimeMillis() / 1000);
        try
        {
            ds.insertRow(idBytes, date);
            final int dateRead = ds.readRow(idBytes);
            assert dateRead == date : "Inserted ID with date " + date + " but date read is " + dateRead;
        }
        catch (final InterruptedException | ExecutionException | TimeoutException e)
        {
            System.err.println("ERROR reading ID (test " + (i + 1) + ") - " + e);
        }
    }
    System.out.println(
        perfTestCount + " insert+reads took " +
        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime) + " ms");
}
Is there something I'm doing wrong that would yield poor performance? I was hoping for a decent speed improvement, given how old the version of Astyanax I'm replacing is.
I've also tried not wrapping the load balancing policy with TokenAwarePolicy and getting rid of the "setRoutingKey" lines, since I know these things definitely shouldn't make a difference while I'm only using a single node, as I currently am.
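That unwrapped variant looked roughly like this (same builder as above, minus the TokenAwarePolicy wrapper):

// Use the DC-aware policy directly instead of wrapping it in
// TokenAwarePolicy -- token awareness can't help on a single node.
final LoadBalancingPolicy loadBalancingPolicy = DCAwareRoundRobinPolicy.builder()
    .withLocalDc("my_laptop")
    .withUsedHostsPerRemoteDc(0)
    .build();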
My local Cassandra version is 2.1.15 (which supports native protocol V3), but the machines in our production environment are running Cassandra 2.0.12.156 (which only supports V2).
Keep in mind that this is targeted at an environment with a bunch of nodes and several data centers, which is why I've got the settings the way I do (with the actual values being set from a config file), even though I know that for this test I could skip things like DCAwareRoundRobinPolicy.
Any help would be greatly appreciated! I can also post the code that's using Astyanax; I just figured it'd be good to first make sure nothing's blatantly wrong with my new code. Thanks!
Tests of 10,000 writes+reads are taking around 30 seconds with the DataStax driver, while with Astyanax they're in the 15-20 second range (roughly 3 ms per insert+read pair versus 1.5-2 ms).
I upped the test count to 100,000 to see if maybe there's some overhead with the DataStax driver that just consumes ~10 seconds at startup, after which the two might perform more similarly. But even with 100,000 insert+reads:
AstyanaxCassandra 100,000 insert+reads took 156593 ms
DataStaxCassandra 100,000 insert+reads took 294340 ms
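To rule out one-time startup cost more directly, I suppose I could also warm the driver up before starting the timer, something like this (hypothetical; not in the test above):

// Hypothetical warm-up before the timed loop: run a few untimed
// insert+read pairs so connection setup and statement preparation
// don't count against the measured time.
for (int i = 0; i < 100; ++i)
{
    final byte[] warmupId = Utils.hexStringToByteArray(UUIDUtils.generateRandomUUIDString());
    try
    {
        ds.insertRow(warmupId, (int)(System.currentTimeMillis() / 1000));
        ds.readRow(warmupId);
    }
    catch (final Exception e)
    {
        // Ignore warm-up failures; they don't affect the measurement.
    }
}

But given that the gap persists at 100,000 operations, the difference doesn't look like fixed startup overhead.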