I just started using this year's Advent of Code to learn F# and I immediately stepped on a rake by trying to reuse the IEnumerable from File.ReadLines.
Here are all of the ways I see to solve this:
open System.IO

// Read all lines immediately into array/list
let linesAll = File.ReadAllLines "file.txt"
let linesArray = File.ReadLines "file.txt" |> Array.ofSeq
let linesList = File.ReadLines "file.txt" |> List.ofSeq
// Lazily load and cache for replays
let linesCache = File.ReadLines "file.txt" |> Seq.cache
// Start new filesystem read for every replay
let linesDelay = (fun () -> File.ReadLines "file.txt") |> Seq.delay
let linesSeqExpr = seq { yield! File.ReadLines "file.txt" }
- Are these all semantically identical (for a read-only file)?
- Are linesDelay and linesSeqExpr the only ones that don't read the entire file into memory?
- Is linesList slowed down by having to assemble the list backwards?
- Are any of these considered more or less idiomatic?
Edit
Here is code that reproduces my issue:
let lines = System.IO.File.ReadLines("alphabet.txt")
for i = 0 to 5 do
    let arr = Seq.zip lines (Seq.skip 1 lines) |> Array.ofSeq
    printfn "%A %A" i arr
gives output:
0 [|("A", "C"); ("D", "E"); ("F", "G"); ("H", "I"); ("J", "K"); ("L", "M");
("N", "O"); ("P", "Q"); ("R", "S"); ("T", "U"); ("V", "W"); ("X", "Y")|]
1 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
("Y", "Z")|]
2 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
("Y", "Z")|]
3 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
("Y", "Z")|]
4 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
("Y", "Z")|]
5 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
("Y", "Z")|]
It looks like the Seq.zip lines (Seq.skip 1 lines) expression triggers a bug by running two enumerations of the same sequence at the same time.
Edit 2
Reproduction in C#. Slightly different order because I'm not skipping one on the right side.
var lines = File.ReadLines("alphabet.txt");
for (int i = 0; i < 5; i++)
{
    var zipped = new List<(string, string)>();
    var enum1 = lines.GetEnumerator();
    var enum2 = lines.GetEnumerator();
    while (enum1.MoveNext() && enum2.MoveNext())
    {
        zipped.Add((enum1.Current, enum2.Current));
    }
    Console.WriteLine($"{i} [{string.Join(',', zipped)}]");
}
0 [(A, B),(C, D),(E, F),(G, H),(I, J),(K, L),(M, N),(O, P),(Q, R),(S, T),(U, V),(W, X),(Y, Z)]
1 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]
2 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]
3 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]
4 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]
Edit 3
This is a known issue and will not be fixed, in order to preserve compatibility. From the .NET source:
// - IEnumerator<T> instances from the same IEnumerable<T> party on the same underlying
// reader.
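For anyone hitting the same thing, here is a minimal sketch of three ways to get the adjacent pairs without two live enumerators sharing one File.ReadLines result. It assumes a small file of A to Z, one letter per line; writeAlphabet is a hypothetical helper that exists only to make the snippet self-contained, and the temp file stands in for alphabet.txt.

```fsharp
open System.IO

// Hypothetical helper: writes A..Z, one letter per line, to a temp file.
let writeAlphabet () =
    let path = Path.GetTempFileName()
    File.WriteAllLines(path, [| for c in 'A' .. 'Z' -> string c |])
    path

let path = writeAlphabet ()

// Option 1: materialize once. An array can be enumerated any number of times.
let fromArray =
    let lines = File.ReadAllLines path
    Seq.zip lines (Seq.skip 1 lines) |> Array.ofSeq

// Option 2: a single pass over the lazy sequence, so only one enumerator exists.
let fromPairwise = File.ReadLines path |> Seq.pairwise |> Array.ofSeq

// Option 3: the seq expression calls File.ReadLines again for every enumeration,
// so each side of the zip gets its own independent read of the file.
let fresh = seq { yield! File.ReadLines path }
let fromFresh = Seq.zip fresh (Seq.skip 1 fresh) |> Array.ofSeq

// All three should agree on the same pairs.
printfn "%b" (fromArray = fromPairwise && fromPairwise = fromFresh)

File.Delete path
```

Seq.pairwise is probably the simplest fit for this particular shape of problem, since it never needs a second enumerator at all.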
What problem did you have by reusing the sequence from File.ReadLines? The following code works fine for me:
Anyway, here's my take on the answers to your questions:
They're similar, but not identical, because they have different types. For example, an array and a list don't have exactly the same semantics. (Also, keep in mind that even a read-only file can be deleted, which will affect the lazy versions.)
Are linesDelay and linesSeqExpr the only ones that don't read the entire file into memory? No, linesCache should also only read as many lines as are needed.
I don't think linesList is slowed down that way. See source of the List.ofSeq primitive here.
I think they're all fine, depending on the circumstance. Personally, I often just use File.ReadAllLines unless I have reason to believe the file is huge.
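The point about linesCache being lazy can be checked without touching the filesystem. The sketch below is only an illustration: source is a hypothetical stand-in for File.ReadLines that counts how many elements have actually been produced, and the names produced, cached and the after... values are made up for this example.

```fsharp
// Counter of how many "lines" the underlying source has produced so far.
let produced = ref 0

// Hypothetical stand-in for File.ReadLines: every enumeration starts over,
// and each element pulled from it bumps the counter.
let source =
    seq {
        for i in 1 .. 1000 do
            produced.Value <- produced.Value + 1
            yield sprintf "line %d" i
    }

let cached = Seq.cache source

// First pass: only as many elements as Seq.take asks for are pulled.
let firstThree = cached |> Seq.take 3 |> List.ofSeq
let afterFirstPass = produced.Value

// Second pass over the same prefix: served from the cache, nothing new is pulled.
let firstThreeAgain = cached |> Seq.take 3 |> List.ofSeq
let afterSecondPass = produced.Value

// Without the cache, the same request re-runs the source from the beginning.
let uncachedThree = source |> Seq.take 3 |> List.ofSeq
let afterUncached = produced.Value

printfn "%d %d %d" afterFirstPass afterSecondPass afterUncached
```

The trade-off is that Seq.cache keeps every element it has seen in memory, so fully enumerating linesCache ends up holding the whole file, much like the array and list versions.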