I would like to programmatically extract the code comments from a Scala source file.
I have access to both the source file and objects of the classes whose comments I am interested in. I am also open to writing the comments in the Scala source file in a specific form to facilitate extraction (though still following Scaladoc conventions).
Specifically, I am not looking for HTML or similar output.
However, a JSON object I can then traverse to get the comments for each field would be perfectly fine (though JSON is not a requirement). Ideally, I would like to get a class or class member comment given its "fully qualified" name or an object of the class.
How do I best do this? I am hoping for a solution that is maintainable (without too much effort) from Scala 2.11 to Scala 3.
Appreciate all help!
By "access to the source file" I assume you have the path to the file, which my code takes as its starting point.
TL;DR
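A minimal sketch of the whole approach, assuming the path arrives as a string. The names `CommentExtractor`, `readLines`, `comments` and `pathToFile` are hypothetical and chosen only for illustration. It also assumes that the `/*` and `*/` of each block comment sit on different lines and that block comments are not nested.

```scala
import scala.io.Source

object CommentExtractor {
  // Read every line eagerly, then close the file handle.
  def readLines(pathToFile: String): List[String] = {
    val source = Source.fromFile(pathToFile)
    try source.getLines().toList
    finally source.close()
  }

  // Any line containing "//" counts, so trailing comments are caught too.
  // Caveat: this also matches "//" inside string literals such as URLs.
  def slashComments(lines: List[String]): List[String] =
    lines.filter(_.contains("//"))

  // Collect the indices of lines holding "/*" or "*/", pair them up,
  // and take every line from the opening index to the closing index.
  // Assumption: opening and closing markers are on different lines.
  def javaDocComments(lines: List[String]): List[String] =
    lines.zipWithIndex
      .collect { case (line, lineNumber) if line.contains("/*") || line.contains("*/") => lineNumber }
      .grouped(2)
      .toList
      .flatMap {
        case List(start, end) => lines.slice(start, end + 1)
        case _                => Nil // an unmatched marker is ignored
      }

  // Single-line comments first, then block comments (not in source order).
  def comments(pathToFile: String): List[String] = {
    val lines = readLines(pathToFile)
    slashComments(lines) ++ javaDocComments(lines)
  }
}
```

Calling `CommentExtractor.comments(...)` with the path of a Scala file returns a `List[String]` of comment lines.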
Full explanation
First thing to do is to read the contents of the file:
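A minimal sketch of this step. Taking the path as a parameter named `pathToFile` is my own choice for illustration, as is the wrapper object `ReadSource`.

```scala
import scala.io.Source

object ReadSource {
  // A `def`, so every call opens the file again and builds a fresh list
  // instead of reusing an iterator that has already been consumed.
  def lines(pathToFile: String): List[String] = {
    val source = Source.fromFile(pathToFile)
    try source.getLines().toList // materialise before the file is closed
    finally source.close()
  }
}
```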
I have made `lines` a `def` to prevent unintended results if `lines` is called multiple times. This is due to the return type of `Source.fromFile` and how it handles iterating over the file: it yields a single-pass iterator, so a second traversal of the same instance would find it already exhausted. This comment here adds an explanation. Since you are reading source code files, I think rereading the file is a safe operation to perform and won't lead to memory or performance issues.

Now that we have the `content` of the file, we can begin to filter out the lines we don't care about. Another way of viewing the problem is that we only want to keep (filter in) the lines that are comments.

Edit:
As @jwvh rightly pointed out, where I was using `.trim.startsWith` I ignored comments that do not begin the line, such as one trailing a statement (`val x = 1 // like this`). To address this I've replaced `.trim.startsWith` with `.contains`.

For single line comments this is simple:
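A minimal sketch of that filter, assuming the file has already been read into a `List[String]`. The wrapper object `SlashComments` is hypothetical.

```scala
object SlashComments {
  // Keep every line that contains "//" anywhere, including trailing comments.
  // Caveat: "//" inside a string literal (for example a URL) matches as well.
  def slashComments(lines: List[String]): List[String] =
    lines.filter(_.contains("//"))
}
```

For example, given the lines `val x = 1 // one`, an indented `// note` and `val y = 2`, the first two are kept and the third is dropped.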
The earlier version called `.trim` first, because developers often indent comments to match the indentation of the code and `.trim` removes any whitespace characters at the start of the string. It now uses `.contains`, which catches any line with a comment starting anywhere.

Now we'll find multi-line comments, or JavaDoc; for example (the content is not important):
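A hypothetical example of such a comment, here a Scaladoc block on an illustrative `greet` method (the object and method are invented for this sketch):

```scala
object Greeter {
  /**
    * Builds a greeting for the given name.
    *
    * @param name who to greet
    * @return the greeting text
    */
  def greet(name: String): String = s"Hello, $name!"
}
```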
The safest thing to do is to find the lines that `/*` and `*/` appear on and include all of the lines in between.

`.zipWithIndex` gives us an incrementing number alongside each line. We can use these to represent the line numbers of the source file. At the moment this will give us a list of lines containing `/*` and `*/`. We need to group these into groups of 2, as all of these kinds of comments will have a matching pair of `/*` and `*/`. This assumes the two markers of a comment sit on different lines; a one-line `/* ... */` contributes a single index and would shift the pairing. Once we have these groups we can select, using `slice`, all of the `lines` starting from the first index until the last. We want to include the last line, so we add 1 to it.

Finally we can combine `slashComments` and `javaDocComments`.

Regardless of the order in which we join them, they won't appear in source order. An improvement that could be made here would be to preserve `lineNumber` and order by this at the end.

I will include a "too long; didn't read" (TL;DR) version at the top so anyone can just copy the code in full without the step by step explanation.
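The multi-line step and the final combination, including the line-number ordering suggested above, could be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the object `OrderedComments`, the `NumberedLine` alias and the helpers `number` and `allComments` are hypothetical, and each block comment is assumed to open and close on different lines without nesting.

```scala
object OrderedComments {
  // (zero-based line number, text of the line)
  type NumberedLine = (Int, String)

  def number(lines: List[String]): List[NumberedLine] =
    lines.zipWithIndex.map { case (line, lineNumber) => (lineNumber, line) }

  def slashComments(lines: List[String]): List[NumberedLine] =
    number(lines).filter { case (_, line) => line.contains("//") }

  def javaDocComments(lines: List[String]): List[NumberedLine] = {
    val numbered = number(lines)
    numbered
      // indices of the lines that open or close a block comment
      .collect { case (lineNumber, line) if line.contains("/*") || line.contains("*/") => lineNumber }
      .grouped(2) // pair each opening index with the following closing index
      .toList
      .flatMap {
        case List(start, end) => numbered.slice(start, end + 1) // + 1 keeps the closing line
        case _                => Nil                            // unmatched marker: ignore
      }
  }

  // Merge both kinds, drop lines found twice, and restore source order.
  def allComments(lines: List[String]): List[String] =
    (slashComments(lines) ++ javaDocComments(lines))
      .distinct
      .sortBy { case (lineNumber, _) => lineNumber }
      .map { case (_, line) => line }
}
```

Keeping the line number next to each line until the very end is what makes the final `sortBy` possible; `distinct` covers lines matched by both filters, such as a URL inside a block comment.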
I hope I have answered your question and provided a useful solution. You mentioned a JSON file as output. What I've provided is a `List[String]` in memory which you can process. If output to JSON is required I can update my answer with this.