Set JVM flags in an Apache Giraph job


I am running an Apache Giraph job, which ultimately runs as a Hadoop MapReduce job. The job is launched with the command hadoop jar lib/giraph_2.12.jar org.apache.giraph.GiraphRunner.

I'm trying to set a few JVM flags/system properties using the -ca flag, like this:

"-ca mapreduce.map.java.opts=\"-Xmx30456m -Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty\""

but it looks like it's not possible to set these configs this way. Here's the code that reads the -ca options in the GiraphConfiguration class:

      if (cmd.hasOption("ca")) {
        String[] var11 = cmd.getOptionValues("ca");
        int var5 = var11.length;

        for(int var6 = 0; var6 < var5; ++var6) {
          String caOptionValue = var11[var6];

          String[] parts;
          for(Iterator var8 = Splitter.on(',').split(caOptionValue).iterator(); var8.hasNext(); conf.set(parts[0], parts[1])) {
            String paramValue = (String)var8.next();
            parts = (String[])Iterables.toArray(Splitter.on('=').split(paramValue), String.class);
            if (parts.length != 2) {
              throw new IllegalArgumentException("Unable to parse custom  argument: " + paramValue);
            }

            if (LOG.isInfoEnabled()) {
              LOG.info("Setting custom argument [" + parts[0] + "] to [" + parts[1] + "] in GiraphConfiguration");
            }
          }
        }
      }

I've been setting the java.opts memory in my job, but does anyone know how to set multiple flags?

Augusto Cesar On

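It looks like the -ca option in Giraph is designed to accept simple key=value pairs, rather than a list of JVM arguments. The code you quoted splits each -ca value on ',' and then on '=', and throws an IllegalArgumentException unless it gets exactly two parts. A value such as -Dzookeeper.client.secure=true contains an extra '=', so the split yields more than two parts and the argument is rejected. A single flag like -Xmx30456m has no '=', which is why setting only the memory works.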
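To see why the value is rejected, here is a minimal sketch of the same splitting logic. It is not Giraph's code: the class name CaOptionDemo and the method parseCa are made up for illustration, it uses plain String.split instead of Guava's Splitter, and it stores results in a map instead of a GiraphConfiguration. The giraph.example.* keys are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the quoted -ca loop (not the real Giraph class).
class CaOptionDemo {

    // Splits on ',' and then on '=', and insists on exactly two parts,
    // mirroring the check in the quoted code.
    static Map<String, String> parseCa(String caOptionValue) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String paramValue : caOptionValue.split(",")) {
            String[] parts = paramValue.split("=");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                    "Unable to parse custom argument: " + paramValue);
            }
            conf.put(parts[0], parts[1]);
        }
        return conf;
    }

    public static void main(String[] args) {
        // Plain key=value pairs are accepted.
        System.out.println(parseCa("giraph.example.a=1,giraph.example.b=2"));

        // A single JVM flag without '=' in its value is accepted too.
        System.out.println(parseCa("mapreduce.map.java.opts=-Xmx30456m"));

        // A value that itself contains '=' splits into three parts and is rejected.
        try {
            parseCa("mapreduce.map.java.opts=-Xmx30456m -Dzookeeper.client.secure=true");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Quoting the value on the command line does not help here, because the split on '=' happens after the shell has already removed the quotes.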

To set multiple JVM flags, pass them through Hadoop's generic -D option rather than via -ca. GiraphRunner is launched through Hadoop's ToolRunner, so a -D option placed right after the class name, before the Giraph-specific arguments, should be applied to the job configuration. For example:

hadoop jar lib/giraph_2.12.jar org.apache.giraph.GiraphRunner -Dmapreduce.map.java.opts="-Xmx30456m -Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty"

Because the whole quoted string is the value of mapreduce.map.java.opts, it can carry several JVM arguments at once.

Another option is to set the JVM options in your Hadoop config (mapred-site.xml, etc.) rather than passing them on the command line each time.

So in summary:

-ca is for simple key=value pairs, not for values that contain multiple JVM args.
Pass JVM args through -Dmapreduce.map.java.opts instead.
Or set them in your Hadoop config for reuse.

Let me know if this helps explain how to set multiple JVM options for your Giraph job!