How do I get only the first number in a string inside a matrix in Julia?


I'm new to Julia and I have a dataset given as the matrix below. I need to extract only the first number of each of the middle strings. Since each pair is saved as one string, I don't know how to do that.

4×3 Matrix{Any}:
 123.12  "[123.3, 15.4]"   5
 223.12  "[523.3, 85.4]"   6
 323.12  "[623.3, 95.4]"   7
 423.12  "[723.3, 115.4]"  8

I tried to convert the string into a float like this and then get the number but it doesn't really work for me.

number = parse.(Float64, split(str, ","))

There are 3 best solutions below

Przemyslaw Szufel On

I assume that your data comes from a file such as this:

dat="""123.12;[123.3, 15.4];5
223.12;[523.3, 85.4];6
323.12;[623.3, 95.4];7
423.12;[723.3, 115.4];8"""

You can read this data with readdlm from the DelimitedFiles standard library (after using DelimitedFiles):

julia> mx = readdlm(IOBuffer(dat),';')
4×3 Matrix{Any}:
 123.12  "[123.3, 15.4]"   5
 223.12  "[523.3, 85.4]"   6
 323.12  "[623.3, 95.4]"   7
 423.12  "[723.3, 115.4]"  8

Since the data in the second column is in JSON format, you can parse that column using JSON3 (after using JSON3):

julia> JSON3.read.(mx[:,2], Vector{Float64})
4-element Vector{Vector{Float64}}:
 [123.3, 15.4]
 [523.3, 85.4]
 [623.3, 95.4]
 [723.3, 115.4]
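
To get only the first number of each pair, which is what the question asks for, one option is to broadcast first over the parsed vectors. The sketch below is not part of the original answer's code: it writes the parsed result out literally so that it runs without JSON3, and the names parsed and firsts are hypothetical.

```julia
# Stand-in for the result of JSON3.read.(mx[:,2], Vector{Float64}),
# written out literally so this sketch needs no packages.
parsed = [[123.3, 15.4], [523.3, 85.4], [623.3, 95.4], [723.3, 115.4]]

# Broadcasting `first` takes the first element of each inner vector.
firsts = first.(parsed)

println(firsts)
```

getindex.(parsed, 1) should give the same result; first is used here only because it reads a little more directly.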

If you want to combine this into a matrix, you could do:

julia> hcat(Float64.(mx[:,1]),vcat(transpose.(JSON3.read.(mx[:,2], Vector{Float64}))...),Float64.(mx[:, 3]))
4×4 Matrix{Float64}:
 123.12  123.3   15.4  5.0
 223.12  523.3   85.4  6.0
 323.12  623.3   95.4  7.0
 423.12  723.3  115.4  8.0
Dan Getz On

Przemyslaw's method is the comprehensive and flexible way. To get the values quickly:

parse.(Float64,replace.(readlines(IOBuffer(dat)), r".*\[(.*),.*"=>s"\1"))

will also work. Using dat as defined in Przemyslaw's answer it gives:

4-element Vector{Float64}:
 123.3
 523.3
 623.3
 723.3
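
If the data is already in a matrix rather than in raw text lines, a similar regex idea can be applied to the second column directly. This is only a sketch under that assumption, using Base alone; the matrix is written out literally, and the names rx, first_number and firsts are hypothetical.

```julia
# The 4×3 Matrix{Any} from the question, written out literally.
mx = Any[
    123.12 "[123.3, 15.4]"  5
    223.12 "[523.3, 85.4]"  6
    323.12 "[623.3, 95.4]"  7
    423.12 "[723.3, 115.4]" 8
]

# Capture everything after "[" up to the first "," or "]".
rx = r"\[([^,\]]+)"

# Assumes every string matches; `match` returns `nothing` otherwise,
# and accessing `.captures` would then throw an error.
first_number(s) = parse(Float64, match(rx, s).captures[1])

firsts = first_number.(mx[:, 2])
println(firsts)
```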
Andre Wildberg On

An approach using a regular expression to find the number.

The regex matches an optional minus sign and one or more digits (-?\d+), followed by zero or more dots (\.*) and zero or more digits (\d*).

rx = r"-?\d+\.*\d*"

Use match inside an array comprehension to find the number in the first part, i[1], of each split string and parse it to a number, then concatenate the result with the other columns using hcat.

hcat(
  dat[:,1], 
  [parse(Float64, match(rx, String(i[1])).match) for i in split.(dat[:,2], ", ")],
  dat[:,3]
)
4×3 Matrix{Any}:
 123.12  123.3  5
 223.12  523.3  6
 323.12  623.3  7
 423.12  723.3  8

This assumes the data looks like this:

dat
4×3 Matrix{Any}:
 123.12  "[123.3, 15.4]"   5
 223.12  "[523.3, 85.4]"   6
 323.12  "[623.3, 95.4]"   7
 423.12  "[723.3, 115.4]"  8