Running the "tar" command from Java takes too long to generate the tar file


In my project, I need to generate an agent for each user. The agent includes some JARs (about 22 MB), some source classes (about 200 KB), and an XML file. The server generates a different XML file for each user, so I have to generate the agent dynamically. I use Runtime.getRuntime().exec("tar...") to tar the files into an agent archive.

When I run the unit test, generation takes a long time: about 2 minutes per agent. I can't make a user wait on a page for 2 minutes. Is there a way to make this program more efficient, or another way to generate the agent quickly and smoothly? Thanks!

/**
 * Tar the agent files.
 * @param inputFiles agent files
 * @param outputFile agent tar
 * @param baseDir the directory in which to run the "tar" command
 */
public static void tarFile(String[] inputFiles, String outputFile, String baseDir) {
    String cmd = "tar -zcf " + outputFile + " ";
    for (int i = 0; i < inputFiles.length; i++) {
        cmd += inputFiles[i] + " ";
    }
    System.out.println(cmd);
    try {
        Process process = Runtime.getRuntime().exec(cmd, null, new File(baseDir));
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        String s;
        while ((s = bufferedReader.readLine()) != null) {
            System.out.println(s);
        }
        process.waitFor();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

This is my tar method. The baseDir parameter sets the directory in which the command is executed.


There are 2 best solutions below


Use Apache Commons Compress to create the tar archive directly in Java.
See TarArchiveOutputStream.

Here is a complete servlet that creates the tar output on the fly:

import java.io.*;
import javax.servlet.http.*;
import org.apache.commons.compress.archivers.tar.*;
import org.apache.commons.compress.utils.IOUtils;

public class Web extends HttpServlet {
    private static final long serialVersionUID = 1L;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String[] files = { 
                "/path/to/file1",
                "/path/to/file2",
        };

        resp.setContentType("application/x-tar");
        try (TarArchiveOutputStream out = new TarArchiveOutputStream(resp.getOutputStream())) {
            for (String name: files) {
                File file = new File(name);
                TarArchiveEntry entry = new TarArchiveEntry(file);
                out.putArchiveEntry(entry);
                try (InputStream in = new FileInputStream(file)) {
                    IOUtils.copy(in, out);
                }
                out.closeArchiveEntry();
            }
        }
    }
}

The client will start receiving the archive immediately, without having to wait for the tar file to be created in advance. Also, this does not require writing the archive to disk.
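Note that the tar command in the question uses the z option, while the servlet above streams a plain, uncompressed tar. If a gzip-compressed stream is wanted, one possible approach is to wrap the response stream before handing it to TarArchiveOutputStream. The sketch below uses only java.util.zip; the class name FastGzip and the choice of Deflater.BEST_SPEED are assumptions for illustration, not part of the original answer.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper; the name and the chosen level are illustrative assumptions. */
final class FastGzip {
    private FastGzip() {}

    /**
     * Wraps a stream so that everything written to it is gzip-compressed
     * at the fastest Deflater level instead of the default level.
     */
    static OutputStream wrap(OutputStream target) throws IOException {
        return new GZIPOutputStream(target) {
            {
                // 'def' is the protected Deflater inherited from DeflaterOutputStream.
                def.setLevel(Deflater.BEST_SPEED);
            }
        };
    }
}
```

In the servlet this would presumably be used as new TarArchiveOutputStream(FastGzip.wrap(resp.getOutputStream())), with the content type changed to something like application/gzip. Closing the outer stream should also close the wrapped one, which writes the gzip trailer.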


I would need to see the actual tar command which you pass to Runtime.getRuntime().exec.

There are 4 factors which really influence the speed of zipping up files into a tarball:

  1. CPU speed on the box
  2. I/O speed on the box
  3. Amount of source data to archive
  4. Compression

rzymek pointed out that you can use Apache Commons Compress to generate the file directly in Java. However, I am not sure that would be any faster unless we understand what the bottleneck in the command-line approach is.

If you run the same tar command yourself in a shell, is the speed comparable? If so, try lowering the compression level or even turning compression off entirely (removing the "z" option from tar czf).
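To compare the two variants from Java itself, a rough timing harness might look like the sketch below. It assumes a tar binary on the PATH; the class name TarTiming and the output names agent.tar.gz and agent.tar are made up for illustration. It passes arguments as a list rather than one concatenated string, and lets the child process inherit the console so it cannot stall on an unread output pipe.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical timing harness; names are illustrative assumptions. */
final class TarTiming {
    private TarTiming() {}

    /** Builds the tar argument list; gzip=false drops the "z" option. */
    static List<String> tarCommand(boolean gzip, String outputFile, List<String> inputFiles) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tar");
        cmd.add(gzip ? "-czf" : "-cf");
        cmd.add(outputFile);
        cmd.addAll(inputFiles);
        return cmd;
    }

    /** Runs the command in workDir and returns the elapsed wall-clock time in milliseconds. */
    static long runAndTimeMillis(List<String> command, File workDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command)
                .directory(workDir)
                .inheritIO(); // child writes straight to our console, so no pipe can fill up
        long start = System.nanoTime();
        int exitCode = pb.start().waitFor();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (exitCode != 0) {
            throw new IOException("tar exited with code " + exitCode);
        }
        return elapsedMs;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = base directory, remaining args = files to archive (relative to it)
        File baseDir = new File(args[0]);
        List<String> inputs = Arrays.asList(args).subList(1, args.length);
        System.out.println("gzip:  " + runAndTimeMillis(tarCommand(true, "agent.tar.gz", inputs), baseDir) + " ms");
        System.out.println("plain: " + runAndTimeMillis(tarCommand(false, "agent.tar", inputs), baseDir) + " ms");
    }
}
```

If the two timings are close, compression is probably not the bottleneck and the I/O side deserves a look instead.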

If the tar file really does need to be compressed and the XML is small, note that the JAR files are probably compressed already (I think they are, since a JAR is a ZIP archive, but you may want to check their compression ratio). In that case you could leave the tarball uncompressed, since compressing already-compressed data gains almost nothing (and sometimes is slower!).
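One way to check that ratio is to read the entry sizes recorded in the JAR itself. The sketch below uses java.util.zip.ZipFile; the class name JarRatio is an assumption for illustration. A result near 1.0 suggests the entries are mostly stored uncompressed, while a clearly smaller value suggests they are already deflated.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper; the name is an illustrative assumption. */
final class JarRatio {
    private JarRatio() {}

    /** Returns total compressed size divided by total uncompressed size over all file entries. */
    static double compressionRatio(File jar) throws IOException {
        long compressed = 0;
        long uncompressed = 0;
        try (ZipFile zip = new ZipFile(jar)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                if (e.isDirectory()) {
                    continue;
                }
                // getSize()/getCompressedSize() return -1 when a size is unknown; skip those entries.
                if (e.getSize() < 0 || e.getCompressedSize() < 0) {
                    continue;
                }
                compressed += e.getCompressedSize();
                uncompressed += e.getSize();
            }
        }
        // An archive with no measurable entries is reported as "not compressed".
        return uncompressed == 0 ? 1.0 : (double) compressed / uncompressed;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of a JAR to inspect.
        for (String path : args) {
            System.out.printf("%s: %.2f%n", path, compressionRatio(new File(path)));
        }
    }
}
```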