I am working through a Python book, but using Julia instead in order to learn the language, and I have come upon another area where I am not quite clear: when I start tossing more complex matrices at my code, things fall apart.
include("activation_function_exercise/spiral_data.jl")
include("activation_function_exercise/dense_layer.jl")
include("activation_function_exercise/activation_relu.jl")
include("activation_function_exercise/activation_softmax.jl")
coords, color = spiral_data(100, 3)
dense1 = LayerDense(2,3)
dense2 = LayerDense(3,3)
forward(dense1, coords)
println("Forward 1 layer")
activated_output = relu_activation(dense1.output)
forward(dense2, activated_output)
println("Forward 2 layer")
activated_output2 = softmax_activation(dense2.output)
println("\n", activated_output2)
I get a proper matrix back
julia> activated_output2
300×3 Matrix{Float64}:
0.00333346 0.00333337 0.00333335
0.00333345 0.00333337 0.00333335
0.00333345 0.00333336 0.00333335
0.00333344 0.00333336 0.00333335
0.00333343 0.00333336 0.00333334
⋮
0.00333311 0.00333321 0.00333322
but the book has
>>>
[[0.33333 0.3333 0.3333]
...
It seems I am a couple of orders of magnitude lower than the book (0.0033 vs. 0.333), even when using FluxML's softmax function.
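For reference, the row and column sums make the pattern visible (this is my own check, assuming each row is supposed to be one sample's probability distribution):

# Each row should be one sample's probability distribution and sum to 1.
sum(activated_output2, dims=2)   # ≈ 0.01 per row   -> rows are NOT normalized
sum(activated_output2, dims=1)   # ≈ 1.0 per column -> each column of 300 samples sums to 1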
EDIT:
I thought maybe my ReLU activation code was causing the discrepancy, and tried switching to the FluxML NNlib version, but I get the same activated_output2 with 0.0033333 instead of 0.333333. I will keep checking other parts, like my forward function.
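(For context, a minimal sketch of the kind of ReLU I mean, assuming relu_activation is just an element-wise max against zero, equivalent to broadcasting NNlib.relu; my actual file may differ:)

# Element-wise ReLU: clamp negative values to zero.
relu_activation(x::AbstractMatrix) = max.(0.0, x)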
EDIT2:
Adding my LayerDense implementation for completeness.
# see https://github.com/FluxML/Flux.jl/blob/b78a27b01c9629099adb059a98657b995760b617/src/layers/basic.jl#L71-L111
using Base: Integer, Float64

mutable struct LayerDense
    weights::Matrix{Float64}
    biases::Matrix{Float64}
    num_inputs::Integer
    num_neurons::Integer
    output::Matrix{Float64}   # left undefined by the constructor; set by forward()
    LayerDense(num_inputs::Integer, num_neurons::Integer) =
        new(0.01 * randn(num_inputs, num_neurons), zeros((1, num_neurons)), num_inputs, num_neurons)
end

function forward(layer::LayerDense, inputs::Matrix{Float64})
    # one row per sample: (n × num_inputs) * (num_inputs × num_neurons) .+ (1 × num_neurons) biases
    layer.output = inputs * layer.weights .+ layer.biases
end
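A quick usage check of the layer on a tiny hand-made input (the numbers here are just illustrative):

layer = LayerDense(2, 3)        # 2 inputs -> 3 neurons
x = [1.0 2.0; 3.0 4.0]          # 2 samples x 2 features
forward(layer, x)
size(layer.output)              # (2, 3): one row per sample, one column per neuron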
EDIT3:
Comparing against the nnfs library, I started inspecting my spiral_data implementation; it seems within reason.
Python
import numpy as np
import nnfs
from nnfs.datasets import spiral_data
nnfs.init()
X, y = spiral_data(samples=100, classes=3)
print(X[:4])  # just check the first couple
>>>
[[0. 0. ]
[0.00299556 0.00964661]
[0.01288097 0.01556285]
[0.02997479 0.0044481 ]]
JuliaLang
include("activation_function_exercise/spiral_data.jl")
coords, color = spiral_data(100, 3)
julia> coords
300×2 Matrix{Float64}:
0.0 0.0
-0.00133462 0.0100125
0.00346739 0.0199022
-0.00126302 0.0302767
0.00184948 0.0403617
0.0113095 0.0492225
0.0397276 0.0457691
0.0144484 0.0692151
0.0181726 0.0787382
0.0320308 0.0850793
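For anyone comparing, this is roughly the shape of my spiral_data.jl, sketched as a direct port of the nnfs generator (assumed; my actual file may differ in details):

function spiral_data(samples::Integer, classes::Integer)
    coords = zeros(samples * classes, 2)
    color = zeros(Int, samples * classes)
    for class in 0:(classes - 1)
        ix = (samples * class + 1):(samples * (class + 1))
        r = range(0.0, 1.0; length=samples)                 # radius grows from 0 to 1
        t = range(class * 4.0, (class + 1) * 4.0; length=samples) .+ randn(samples) .* 0.2  # angle + noise
        coords[ix, 1] = r .* sin.(t .* 2.5)
        coords[ix, 2] = r .* cos.(t .* 2.5)
        color[ix] .= class
    end
    return coords, color
end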
Turned out I was applying the NNlib softmax over the whole matrix at once (so each column of 300 samples was normalized together), which is NOT what the Python book does. All I needed to do was modify my softmax() call (sketched below), and then the output at the end of my big long example comes out as expected.
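A minimal sketch of the kind of change I mean, assuming softmax_activation wraps NNlib.softmax (which normalizes along dims=1, i.e. down the columns, by default); the key is passing dims=2 so each row of class scores is normalized on its own:

using NNlib

# Normalize across the 3 class scores in each row (one row per sample),
# instead of down each 300-element column.
activated_output2 = NNlib.softmax(dense2.output; dims=2)

sum(activated_output2, dims=2)   # each row now sums to ≈ 1.0, with entries near 0.333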