Jena Fuseki on Mac M1 slow on loading multiple small turtle files


I am trying to load some data into Jena Fuseki 4.7.0 on my Mac M1 and noticed that loading multiple small files is very slow compared with a Linux machine.

To investigate, I ran some tests by loading a file containing the single triple below:

<http://ex.com/1> <http://ex.com/p> "Test".

Please note that for each of the tests below I created a new dataset and then loaded the file using the Fuseki UI. To rule out slow Java startup, I loaded the file more than once into the same dataset without clearing the dataset or restarting Fuseki.
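The repeated-upload test described above can also be scripted, which takes the UI out of the picture. The following is a minimal sketch, not the method used for the logs in this question: it assumes a local Fuseki with a dataset named `test` and assumes the server accepts a direct Turtle body on the `/test/data` endpoint. The function names are made up for illustration.

```python
import time
import urllib.request

# Assumptions: local Fuseki, dataset "test", same one-triple payload as above.
DATA_URL = "http://localhost:3030/test/data"
TURTLE = b'<http://ex.com/1> <http://ex.com/p> "Test".\n'


def build_request(url, turtle_bytes):
    """Build a POST that sends Turtle directly as the request body."""
    return urllib.request.Request(
        url,
        data=turtle_bytes,
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


def time_uploads(url, turtle_bytes, repeats=5):
    """POST the same payload several times; return client-side times in ms."""
    timings = []
    for _ in range(repeats):
        request = build_request(url, turtle_bytes)
        start = time.perf_counter()
        with urllib.request.urlopen(request) as response:
            response.read()  # drain the body so the full round trip is timed
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings
```

Calling `time_uploads(DATA_URL, TURTLE)` against a running server returns one client-side duration per upload. These include HTTP round-trip time, so they are best compared with the durations Fuseki prints in its own log rather than treated as equivalent to them.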

With a Fuseki 4.7.0 TDB dataset on a VM (Standard D2s v5, 2 vCPUs, 8 GiB memory) running in Azure, I get the times below:

15:58:51 INFO  Server          ::   Memory: 4.0 GiB
15:58:51 INFO  Server          ::   Java:   11.0.19
15:58:51 INFO  Server          ::   OS:     Linux 5.4.0-1106-azure amd64


16:15:25 INFO  Fuseki          :: [3411] POST http://***:3030/test/data
16:15:25 INFO  Fuseki          :: [3411] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
16:15:25 INFO  Fuseki          :: [3411] 200 OK (71 ms)
16:16:28 INFO  Fuseki          :: [3412] POST http://***:3030/test/data
16:16:28 INFO  Fuseki          :: [3412] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
16:16:28 INFO  Fuseki          :: [3412] 200 OK (43 ms)
16:16:34 INFO  Fuseki          :: [3413] POST http://***:3030/test/data
16:16:34 INFO  Fuseki          :: [3413] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
16:16:34 INFO  Fuseki          :: [3413] 200 OK (51 ms)

Fuseki 4.7.0 TDB dataset on a Mac M1 Max with 10 cores (8 performance and 2 efficiency) and 64 GB RAM:

17:26:48 INFO  Server          ::   Memory: 4.0 GiB
17:26:48 INFO  Server          ::   Java:   11.0.18
17:26:48 INFO  Server          ::   OS:     Mac OS X 12.6 aarch64


17:10:27 INFO  Fuseki          :: [217] POST http://localhost:3030/test/data
17:10:27 INFO  Fuseki          :: [217] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
17:10:27 INFO  Fuseki          :: [217] 200 OK (486 ms)
17:11:04 INFO  Fuseki          :: [218] POST http://localhost:3030/test/data
17:11:04 INFO  Fuseki          :: [218] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
17:11:04 INFO  Fuseki          :: [218] 200 OK (319 ms)
17:11:20 INFO  Fuseki          :: [219] POST http://localhost:3030/test/data
17:11:20 INFO  Fuseki          :: [219] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
17:11:20 INFO  Fuseki          :: [219] 200 OK (328 ms)

With a Fuseki 4.7.0 in-memory dataset on a VM (Standard D2s v5, 2 vCPUs, 8 GiB memory) running in Azure, I get the times below:

14:33:04 INFO  Server          ::   Memory: 4.0 GiB
14:33:04 INFO  Server          ::   Java:   11.0.19
14:33:04 INFO  Server          ::   OS:     Linux 5.4.0-1106-azure amd64


14:51:20 INFO  Fuseki          :: [121] POST http://****:3030/test-in-mem/data
14:51:20 INFO  Fuseki          :: [121] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
14:51:20 INFO  Fuseki          :: [121] 200 OK (24 ms)
14:51:28 INFO  Fuseki          :: [122] POST http://****:3030/test-in-mem/data
14:51:28 INFO  Fuseki          :: [122] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
14:51:28 INFO  Fuseki          :: [122] 200 OK (3 ms)

Fuseki 4.7.0 in-memory dataset on a Mac M1 Max with 10 cores (8 performance and 2 efficiency) and 64 GB RAM:

15:42:42 INFO  Server          ::   Memory: 4.0 GiB
15:42:42 INFO  Server          ::   Java:   11.0.18
15:42:42 INFO  Server          ::   OS:     Mac OS X 12.6 aarch64


15:47:58 INFO  Fuseki          :: [107] POST http://localhost:3030/test-in-mem/data
15:47:58 INFO  Fuseki          :: [107] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
15:47:58 INFO  Fuseki          :: [107] 200 OK (35 ms)
15:48:38 INFO  Fuseki          :: [108] POST http://localhost:3030/test-in-mem/data
15:48:38 INFO  Fuseki          :: [108] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
15:48:38 INFO  Fuseki          :: [108] 200 OK (13 ms)

Java version on M1 Mac

******** % java -version

openjdk version "11.0.18" 2023-01-17 LTS
OpenJDK Runtime Environment Corretto-11.0.18.10.1 (build 11.0.18+10-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.18.10.1 (build 11.0.18+10-LTS, mixed mode)

Java version on Azure VM Standard D2s v5

******$ java -version
openjdk version "11.0.19" 2023-04-18 LTS
OpenJDK Runtime Environment Corretto-11.0.19.7.1 (build 11.0.19+7-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.19.7.1 (build 11.0.19+7-LTS, mixed mode)

I have also tried JDK 17, with no improvement. Please see the results below:

Fuseki 4.7.0 TDB dataset on a Mac M1 Max with 10 cores (8 performance and 2 efficiency) and 64 GB RAM:

openjdk version "17.0.7" 2023-04-18 LTS
OpenJDK Runtime Environment Zulu17.42+19-CA (build 17.0.7+7-LTS)
OpenJDK 64-Bit Server VM Zulu17.42+19-CA (build 17.0.7+7-LTS, mixed mode, sharing)
11:30:17 INFO  Server          ::   Memory: 4.0 GiB
11:30:17 INFO  Server          ::   Java:   17.0.7
11:30:17 INFO  Server          ::   OS:     Mac OS X 12.6 aarch64


11:30:55 INFO  Fuseki          :: [9] POST http://localhost:3030/test/data
11:30:55 INFO  Fuseki          :: [9] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
11:30:55 INFO  Fuseki          :: [9] 200 OK (661 ms)
11:31:01 INFO  Fuseki          :: [10] POST http://localhost:3030/test/data
11:31:01 INFO  Fuseki          :: [10] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
11:31:01 INFO  Fuseki          :: [10] 200 OK (315 ms)
11:31:26 INFO  Fuseki          :: [11] POST http://localhost:3030/test/data
11:31:26 INFO  Fuseki          :: [11] Filename: load-test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=1 Triples=1 Quads=0
11:31:26 INFO  Fuseki          :: [11] 200 OK (342 ms)

Java is not running in emulated mode; please see the details below:

10:43:03 INFO  Server          ::   Memory: 4.0 GiB
10:43:03 INFO  Server          ::   Java:   11.0.18
10:43:03 INFO  Server          ::   OS:     Mac OS X 12.6 aarch64
10:43:03 INFO  Server          ::   PID:    29753


I have searched for this but could not find anything. There seems to be an overhead of a few hundred milliseconds regardless of the file size. Does anyone have any idea what the reason could be, or how to debug it?
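To put numbers on that overhead, the per-request durations can be pulled out of the Fuseki log and summarized per machine. This is a small sketch under the assumption that each request ends with a line of the form `[id] 200 OK (N ms)`, as in the logs above; it does not handle durations printed in other units, and the function names are illustrative.

```python
import re
from statistics import mean

# Matches the closing line of a request, e.g. "[217] 200 OK (486 ms)".
DURATION_RE = re.compile(r"\[(\d+)\]\s+200 OK \((\d+) ms\)")


def request_durations(log_text):
    """Return {request_id: duration_ms} for every '200 OK (N ms)' line."""
    return {int(m.group(1)): int(m.group(2))
            for m in DURATION_RE.finditer(log_text)}


def summarize(log_text):
    """Return count/min/mean/max of the durations, or None if none found."""
    durations = list(request_durations(log_text).values())
    if not durations:
        return None
    return {
        "count": len(durations),
        "min_ms": min(durations),
        "mean_ms": mean(durations),
        "max_ms": max(durations),
    }
```

Running `summarize` on the log captured from each machine gives a like-for-like comparison of the TDB and in-memory runs without reading the timings off by hand.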


There is 1 best solution below

RobV On

To eliminate the possibility of the JVM being emulated on your Mac, please start up the Fuseki server, then go to Activity Monitor and find the associated java process. There will be a Kind column that says either Intel or Apple. If it says Intel, then you've installed a JVM that's not built for the Apple architecture, so it's running via Apple's emulation layer (Rosetta 2).

For example on my M1 MacBook I see the following process:

[Screenshot: Java process running with native architecture]

That is, my JVM is of kind Apple, which means it is running natively. If yours says Intel in that column, then you are running via emulation.
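The same check can be approximated from a terminal by asking the JVM which architecture it reports. The sketch below is only an illustration: it assumes the `java` on your PATH is the one Fuseki uses, and relies on the launcher's `-XshowSettings:properties` option printing an `os.arch` line. A JVM built for Apple silicon is expected to report `aarch64`, while an Intel build running under emulation is expected to report `x86_64`. The helper names are invented for this example.

```python
import re
import subprocess

# Property lines look like "    os.arch = aarch64".
ARCH_RE = re.compile(r"^\s*os\.arch\s*=\s*(\S+)", re.MULTILINE)


def parse_os_arch(settings_text):
    """Extract the os.arch value from -XshowSettings output, or None."""
    match = ARCH_RE.search(settings_text)
    return match.group(1) if match else None


def jvm_os_arch(java_cmd="java"):
    """Run the JVM and return the os.arch it reports (None if not found)."""
    result = subprocess.run(
        [java_cmd, "-XshowSettings:properties", "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Check both streams, since the settings may be written to either one.
    return parse_os_arch(result.stderr + result.stdout)


def looks_native_on_apple_silicon(os_arch):
    """True if the reported architecture is an ARM64 one."""
    return os_arch in ("aarch64", "arm64")
```

For example, `looks_native_on_apple_silicon(jvm_os_arch())` should be true for a native JVM. The `OS:` line in the Fuseki startup log (for example `Mac OS X 12.6 aarch64`) appears to reflect the same property, so it can serve as a quick cross-check.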

I then tried your test and got the following outputs:

09:46:31 INFO  Server          :: Apache Jena Fuseki 4.7.0
09:46:31 INFO  Config          :: FUSEKI_HOME=/Users/rvesse/Documents/Apps/fuseki-4.7.0/.
09:46:31 INFO  Config          :: FUSEKI_BASE=/Users/rvesse/Documents/Apps/fuseki-4.7.0/run
09:46:31 INFO  Config          :: Shiro file: file:///Users/rvesse/Documents/Apps/fuseki-4.7.0/run/shiro.ini
09:46:31 INFO  Config          :: Template file: templates/config-mem
09:46:32 INFO  Server          :: Database: in-memory
09:46:32 INFO  Server          :: Path = /ds
09:46:32 INFO  Server          ::   Memory: 4.0 GiB
09:46:32 INFO  Server          ::   Java:   17.0.4
09:46:32 INFO  Server          ::   OS:     Mac OS X 12.6.5 aarch64
09:46:32 INFO  Server          ::   PID:    3199
09:46:32 INFO  Server          :: Started 2023/05/02 09:46:32 BST on port 3030
09:51:58 INFO  Fuseki          :: [1] POST http://localhost:3030/ds/
09:51:58 INFO  Fuseki          :: [1] Body: Content-Length=44, Content-Type=application/turtle, Charset=null => Turtle : Count=1 Triples=1 Quads=0
09:51:58 INFO  Fuseki          :: [1] 200 OK (10 ms)
09:52:06 INFO  Fuseki          :: [2] POST http://localhost:3030/ds/
09:52:06 INFO  Fuseki          :: [2] Body: Content-Length=44, Content-Type=application/turtle, Charset=null => Turtle : Count=1 Triples=1 Quads=0
09:52:06 INFO  Fuseki          :: [2] 200 OK (2 ms)
09:52:11 INFO  Fuseki          :: [3] POST http://localhost:3030/ds/
09:52:11 INFO  Fuseki          :: [3] Body: Content-Length=44, Content-Type=application/turtle, Charset=null => Turtle : Count=1 Triples=1 Quads=0
09:52:11 INFO  Fuseki          :: [3] 200 OK (2 ms)

So I think we need to rule out emulation before we dive into any further debugging.

For reference I am using the following JDK:

openjdk version "17.0.4" 2022-07-19
OpenJDK Runtime Environment Homebrew (build 17.0.4+0)
OpenJDK 64-Bit Server VM Homebrew (build 17.0.4+0, mixed mode, sharing)

Installed via brew install openjdk