Extracting an archive created via Java RandomAccessFile with PHP


I'm trying to recreate a long-lost PHP website. One of its pages allowed employees to upload archive files created by a script they ran locally. The web server would then extract the contents into separate files and store them in different folders for other purposes.

Thankfully, I still have the script that created the archives, but it is written in Java. I imagine the format can be reversed, though? The script essentially just called the addFile method below for each of several file paths.

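import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class Archive {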
    static void create(File f) throws IOException {
        BufferedOutputStream w = new BufferedOutputStream(new FileOutputStream(f));
        w.write(new byte[]{1, 3, 3, 7});
        w.write(new byte[4]);
        w.close();
    }

    static int addFile(File archive, File add, String name) throws IOException {
        if (!add.exists()) {
            throw new IOException("File to be added does not exist!");
        }
        if (add.isDirectory()) {
            throw new IOException("Cannot add directories!");
        }
        if (!archive.exists()) {
            Archive.create(archive);
        }
        if (archive.isDirectory()) {
            throw new IOException("Archive is no valid archive!");
        }
        RandomAccessFile r = new RandomAccessFile(archive, "rw");
        int code = r.readInt();
        if (code != 16974599) {
            throw new IOException("Archive is no valid archive!");
        }
        int fileCount = r.readInt();
        r.seek(4);
        r.writeInt(fileCount + 1);
        r.seek(r.length());
        RandomAccessFile bi = new RandomAccessFile(add, "r");
        r.writeInt((int)bi.length());
        r.writeBytes(name);
        r.write(0);
        byte[] swap = new byte[(int)bi.length()];
        bi.readFully(swap);
        r.write(swap);
        bi.close();
        r.close();
        return fileCount + 1;
    }

    public static void main(String[] args) throws IOException {
    }
}

Update:

I have written a function using fread(), but it runs out of memory after the first file, even with the memory limit temporarily raised to 512 MB. Is there an alternative?

Ruslan Osmanov On BEST ANSWER

According to the Java code, the file format is as follows. All integers are written with RandomAccessFile.writeInt(), which always uses big-endian byte order:

  • Magic number 0x01030307 (16974599 in decimal), stored as the bytes 01 03 03 07
  • 32-bit file count, big endian
  • File 1 32-bit length, big endian
  • File 1 name followed by 0x00
  • File 1 bytes
  • ...
  • File N 32-bit length, big endian
  • File N name followed by 0x00
  • File N bytes
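
As a quick check of this layout, the sketch below builds a tiny archive in memory with pack() and decodes it again with unpack(). The 'N' format code means unsigned 32-bit big-endian, which is the byte order Java's writeInt() produces. The entry name hello.txt and its contents are made up for illustration.

```php
<?php
// Hypothetical sample: one entry named "hello.txt" containing "Hi!".
$payload = 'Hi!';
$sample  = "\x01\x03\x03\x07"              // magic bytes written by Archive.create()
         . pack('N', 1)                    // file count, 32-bit big-endian
         . pack('N', strlen($payload))     // entry length, 32-bit big-endian
         . "hello.txt\0"                   // NUL-terminated name
         . $payload;                       // raw entry bytes

// Decode the 8-byte header: magic number and file count.
$header = unpack('Nmagic/Ncount', substr($sample, 0, 8));

// Decode the first entry, which starts at offset 8.
$length  = unpack('N', substr($sample, 8, 4))[1];
$nameEnd = strpos($sample, "\0", 12);      // the name runs up to the NUL byte
$name    = substr($sample, 12, $nameEnd - 12);
$data    = substr($sample, $nameEnd + 1, $length);

printf("magic=0x%08X count=%d name=%s data=%s\n",
    $header['magic'], $header['count'], $name, $data);
```

Reading the same bytes with 'V' (little-endian) instead would turn the magic number into 0x07030301 and a length of 3 into 50331648, so the byte order matters for both validation and memory use.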

It is not an archive format but a simple concatenation of files with some metadata.

To extract the files from such an 'archive', we can use PHP code like this:

<?php
class MyArchiveHeader {
    public function __construct(
        private int $typeCode,
        private int $fileCount
    ) {}

    public function getTypeCode(): int
    {
        return $this->typeCode;
    }

    public function getFileCount(): int
    {
        return $this->fileCount;
    }
}

class MyArchiveFile {
    public function __construct(
        private string $filename,
        private string $contents
    ) {}

    public function getFilename(): string
    {
        return $this->filename;
    }

    public function getContents(): string
    {
        return $this->contents;
    }
}

class MyArchive {

    public function __construct(private string $filename) {}

    public function extractFiles(string $outputDirectory): void
    {
        if (!is_dir($outputDirectory)) {
            throw new \InvalidArgumentException('Output directory does not exist');
        }

        $file = new \SplFileObject($this->filename, 'rb');

        $header = $this->parseHeader($file);

        $fileCount = $header->getFileCount();
        for ($i = 0; $i < $fileCount; $i++) {
            $parsedFile = $this->parseFile($file);

            $outputFilename = $outputDirectory . DIRECTORY_SEPARATOR . $parsedFile->getFilename();
            file_put_contents($outputFilename, $parsedFile->getContents());
        }
    }

    private function parseHeader(\SplFileObject $file): MyArchiveHeader
    {
        $typeCodeBytes = $file->fread(4);
        if ($typeCodeBytes === false || strlen($typeCodeBytes) !== 4) {
            throw new \RuntimeException('Could not read file type code');
        }

        // 'N' = unsigned 32-bit big-endian, the byte order Java's readInt()/writeInt() use
        $typeCode = unpack('N', $typeCodeBytes)[1];
        if ($typeCode !== 0x01030307) {
            throw new \RuntimeException('Invalid file type code');
        }

        $fileCountBytes = $file->fread(4);
        if ($fileCountBytes === false || strlen($fileCountBytes) !== 4) {
            throw new \RuntimeException('Could not read file count');
        }

        $fileCount = unpack('N', $fileCountBytes)[1]; // 32-bit big-endian file count

        return new MyArchiveHeader($typeCode, $fileCount);
    }

    private function parseFile(\SplFileObject $file): MyArchiveFile
    {
        $fileLengthBytes = $file->fread(4);
        if ($fileLengthBytes === false || strlen($fileLengthBytes) !== 4) {
            throw new \RuntimeException('Could not read file length');
        }

        $fileLength = unpack('N', $fileLengthBytes)[1]; // 32-bit big-endian length

        $filename = "";
        while (!$file->eof()) {
            $char = $file->fread(1);
            if ($char === "\0") {
                break;
            }
            $filename .= $char;
        }

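        // Java's writeBytes() keeps only the low 8 bits of each char, so names are
        // effectively ISO-8859-1; convert to UTF-8 here if non-ASCII names are possible.

        // SplFileObject::fread() rejects a length of 0, so handle empty files explicitly.
        $contents = $fileLength > 0 ? $file->fread($fileLength) : '';
        if ($contents === false || strlen($contents) !== $fileLength) {
            throw new \RuntimeException('Could not read file contents');
        }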

        return new MyArchiveFile($filename, $contents);
    }
}

I haven't tested the code, but it should give you a good starting point. You can use it like this:

$archiveFilename = 'archive';
$outputDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'extracted';
if (!is_dir($outputDir)) {
    mkdir($outputDir, 0777, true); // create the directory only if it is missing
}

echo "Extracting archive to $outputDir\n";
$archive = new MyArchive($archiveFilename);
$archive->extractFiles($outputDir);
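
The class above returns each file's contents as a single string, so peak memory grows with the largest entry. If that is a problem, as in the question's update, one alternative is to copy each entry straight from the archive to its output file with stream_copy_to_stream(). The following is a minimal sketch under that assumption; extractStreaming and readUInt32 are hypothetical helper names, and the use of basename() assumes that stored names should not be allowed to point outside the output directory.

```php
<?php
// Reads one unsigned 32-bit big-endian integer (Java writeInt() byte order).
function readUInt32($in): int
{
    $bytes = fread($in, 4);
    if ($bytes === false || strlen($bytes) !== 4) {
        throw new RuntimeException('Unexpected end of archive');
    }
    return unpack('N', $bytes)[1];
}

// Streams every entry into $outputDir and returns the number of entries written.
function extractStreaming(string $archivePath, string $outputDir): int
{
    $in = fopen($archivePath, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $archivePath");
    }
    try {
        if (readUInt32($in) !== 0x01030307) {
            throw new RuntimeException('Invalid file type code');
        }
        $count = readUInt32($in);
        for ($i = 0; $i < $count; $i++) {
            $length = readUInt32($in);

            // Read the NUL-terminated name one byte at a time.
            $name = '';
            while (($char = fgetc($in)) !== false && $char !== "\0") {
                $name .= $char;
            }
            // Assumption: drop any directory part so an entry cannot escape $outputDir.
            $name = basename($name);
            if ($name === '') {
                throw new RuntimeException('Entry has no usable name');
            }

            $out = fopen($outputDir . DIRECTORY_SEPARATOR . $name, 'wb');
            if ($out === false) {
                throw new RuntimeException("Cannot create $name");
            }
            // Copy exactly $length bytes without loading them into a PHP string.
            $copied = $length > 0 ? stream_copy_to_stream($in, $out, $length) : 0;
            fclose($out);
            if ($copied !== $length) {
                throw new RuntimeException("Entry $name is truncated");
            }
        }
        return $count;
    } finally {
        fclose($in);
    }
}
```

It can be called as extractStreaming('archive', $outputDir). If entries are expected to carry subdirectories in their names, the basename() call would need to be replaced with a stricter path check instead.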