I am running an Apache Giraph job, which ultimately runs as a Hadoop MapReduce job. The job is launched with the command hadoop jar lib/giraph_2.12.jar org.apache.giraph.GiraphRunner.
I'm trying to set a few JVM flags/system properties using the -ca flag, which looks like this:
"-ca mapreduce.map.java.opts=\"-Xmx30456m -Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty\""
but it looks like it's not possible to set these configs this way. Here's the code that reads the -ca configs in the GiraphConfiguration class:
if (cmd.hasOption("ca")) {
    String[] var11 = cmd.getOptionValues("ca");
    int var5 = var11.length;
    for(int var6 = 0; var6 < var5; ++var6) {
        String caOptionValue = var11[var6];
        String[] parts;
        for(Iterator var8 = Splitter.on(',').split(caOptionValue).iterator(); var8.hasNext(); conf.set(parts[0], parts[1])) {
            String paramValue = (String)var8.next();
            parts = (String[])Iterables.toArray(Splitter.on('=').split(paramValue), String.class);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Unable to parse custom argument: " + paramValue);
            }
            if (LOG.isInfoEnabled()) {
                LOG.info("Setting custom argument [" + parts[0] + "] to [" + parts[1] + "] in GiraphConfiguration");
            }
        }
    }
}
I've been setting the java.opts memory in my job, but does anyone know how to set multiple flags?
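The -ca option in Giraph accepts simple key=value pairs. Going by the code you posted, each -ca value is split on ',' and then on '=', and anything that does not produce exactly two parts is rejected with "Unable to parse custom argument". Your value contains extra '=' characters (for example in -Dzookeeper.client.secure=true), so it splits into more than two parts and cannot be passed this way.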
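To illustrate why the -ca value from the question gets rejected, here is a minimal sketch that mimics the quoted parsing loop. It is not Giraph's code: String.split stands in for Guava's Splitter to keep the example dependency-free, and the class and method names (CaParseDemo, parseCa) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CaParseDemo {
    // Mimics the quoted loop: split on ',', then on '=', and require exactly two parts.
    // String.split is used in place of Guava's Splitter (an assumption for illustration).
    static List<String[]> parseCa(String caOptionValue) {
        List<String[]> pairs = new ArrayList<>();
        for (String paramValue : caOptionValue.split(",")) {
            String[] parts = paramValue.split("=");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                    "Unable to parse custom argument: " + paramValue);
            }
            pairs.add(parts);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // A single key=value pair with no extra '=' parses fine.
        System.out.println(parseCa("mapreduce.map.java.opts=-Xmx30456m").get(0)[1]);

        // The extra '=' inside the value yields three parts, so this is rejected.
        String value = "mapreduce.map.java.opts=-Xmx30456m -Dzookeeper.client.secure=true";
        try {
            parseCa(value);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Setting only -Xmx through -ca works because that value contains no '='; as soon as a -Dname=value flag is added, the pair no longer splits into exactly two parts.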
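To set multiple JVM flags, pass the whole string as a Hadoop -D configuration option instead of via -ca. GiraphRunner is launched through Hadoop's ToolRunner, so generic options placed right after the class name (before the Giraph-specific arguments) should be picked up. For example: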
    hadoop jar lib/giraph_2.12.jar org.apache.giraph.GiraphRunner \
      -Dmapreduce.map.java.opts="-Xmx30456m -Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty"
Passing the value through -Dmapreduce.map.java.opts lets you specify multiple JVM arguments together in one quoted string.
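The reason a -D option can carry a value that itself contains '=' comes down to where the split happens. The sketch below is not Hadoop's code; it assumes the -D argument is split on the first '=' only, and the names (FirstEqualsSplitDemo, splitKeyValue) are hypothetical.

```java
public class FirstEqualsSplitDemo {
    // Splits "key=value" on the first '=' only, so any later '=' stays in the value.
    static String[] splitKeyValue(String arg) {
        return arg.split("=", 2);
    }

    public static void main(String[] args) {
        String arg = "mapreduce.map.java.opts=-Xmx30456m -Dzookeeper.client.secure=true";
        String[] kv = splitKeyValue(arg);
        System.out.println(kv[0]); // the configuration key
        System.out.println(kv[1]); // the full JVM option string, inner '=' intact
    }
}
```

With a limit of 2, the key is everything before the first '=' and the rest is kept as a single value, which is what a multi-flag java.opts string needs.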
Another option is to set mapreduce.map.java.opts in your Hadoop config (mapred-site.xml, etc.) rather than passing it on the command line each time.
So in summary:
-ca is for key-value pairs, not multiple JVM args.
Pass JVM args directly through -Dmapreduce.map.java.opts.
Or set them in Hadoop config for reuse.
Let me know if this helps explain how to set multiple JVM options for your Giraph job!