Compile time in Julia when training a model in Flux is too high


I'm training a simple model in Julia using Flux, and I'm trying to compare training times on the GPU vs. the CPU (and also Julia vs. Python, using the same model in TensorFlow, though I can't get TensorFlow to run on the GPU for some reason). When I time the cell where the model is trained, most of the time goes into compiling the training loop, at least on the GPU:

40.026761 seconds (90.14 M allocations: 6.862 GiB, 2.40% gc time, 86.56% compilation time)

That seems a bit excessive, so is there a way to pre-compile the loop so that the cell runs faster? This is my model:

model = Chain(
    Conv((3, 3), 3 => 16, relu, pad=SamePad(), stride=(1, 1)),
    MaxPool((2, 2), pad=SamePad()),
    Conv((3, 3), 16 => 32, relu, pad=SamePad(), stride=(1, 1)),
    MaxPool((2, 2), pad=SamePad()),
    Conv((3, 3), 32 => 64, relu, pad=SamePad(), stride=(1, 1)),
    MaxPool((2, 2), pad=SamePad()),
    Flux.flatten,
    Dense(16384 => 32, relu),
    Dense(32 => 16, relu),
    Dense(16 => 1),
)
model2 = gpu(model)
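For reference, the 16384 in the first Dense layer comes from the input size: three 2×2 max-pools shrink the 128×128 images to 16×16, and 16 × 16 × 64 = 16384 features after flattening. Flux's shape inference can confirm this (just a sanity-check sketch; model[1:7] is everything up to and including Flux.flatten):

Flux.outputsize(model[1:7], (128, 128, 3, 1))  # (16384, 1)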

Then I move the dataset onto the GPU and train the model:

train_set_gpu = Flux.DataLoader((X_train, Y_train), batchsize=32) |> gpu
opt = Flux.setup(Adam(), model2)
loss_history = Float32[]

@time for epoch = 1:20
    Flux.train!(model2, train_set_gpu, opt) do m, x, y
        err = loss(m(x), y)  # `loss` is my loss function, defined earlier
        # log the loss without differentiating through the push!
        ChainRules.ignore_derivatives() do  # needs `using ChainRules`
            push!(loss_history, err)
        end
        return err
    end
end
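To clarify what I mean by pre-compiling: something like the sketch below, where the callback is defined once as a named function (so re-running the cell doesn't create a fresh anonymous function for Zygote to differentiate through again) and is warmed up on a single batch before timing. train_step and warmup are hypothetical names, and loss is my loss function from above:

# Define the callback once so repeated runs reuse the compiled gradient code
function train_step(m, x, y)
    err = loss(m(x), y)
    ChainRules.ignore_derivatives() do
        push!(loss_history, err)
    end
    return err
end

# Warm-up: one real optimiser step on a single batch triggers compilation
warmup = [first(train_set_gpu)]
Flux.train!(train_step, model2, warmup, opt)

# Now the timed loop should be mostly pure training time
@time for epoch = 1:20
    Flux.train!(train_step, model2, train_set_gpu, opt)
end

Is that the idiomatic way to do it, or is there a proper precompilation mechanism I should use instead?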

Since it's a small dataset (600 images of 128×128×3), the training itself should take only about 5 seconds once the compilation time is excluded.
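One more detail I'm unsure about: CUDA kernels launch asynchronously, so when timing GPU code I've seen CUDA.@sync recommended to make sure the work has actually finished before @time stops the clock. Assuming that applies here, the timing would look like:

using CUDA

@time CUDA.@sync for epoch = 1:20
    Flux.train!(train_step, model2, train_set_gpu, opt)
end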
