I have a text file, and I wrote a Linux script that counts all the characters (including spaces), lines, and words. I also need to count the number of paragraphs, but I don't know how. Any help would be really appreciated.
This is my script:
```bash
#!/bin/bash

# Rough heuristic: treat the file as text if at least one line contains an ASCII character.
is_text_file() {
    if [[ $(grep -c -P '[\x01-\x7F]' "$1") -gt 0 ]]; then
        return 0
    else
        return 1
    fi
}

if [ $# -lt 2 ]; then
    echo "Usage: $0 -file FILE_PATH [-occurrences NUMBER]"
    exit 1
fi

# Parse the -file and -occurrences options.
while [ "$#" -gt 0 ]; do
    case "$1" in
        -file)
            file="$2"
            shift 2
            ;;
        -occurrences)
            occurrences="$2"
            shift 2
            ;;
        *)
            echo "Invalid flag: $1"
            exit 1
            ;;
    esac
done

if ! is_text_file "$file"; then
    echo "The specified file is not a text file."
    exit 1
fi

word_count=$(wc -w < "$file")
line_count=$(grep -c '^' "$file")
character_count=$(wc -m < "$file")   # -m counts characters; -c would count bytes
# Paragraph mode (RS=""): each record is a paragraph, so NR in the END block is the paragraph count.
paragraph_count=$(awk 'BEGIN { RS = "" } END { print NR }' "$file")

echo "Word count: $word_count"
echo "Line count: $line_count"
echo "Character count: $character_count"
echo "Paragraph count: $paragraph_count"

# Optionally list the N most frequent words (punctuation removed, case folded).
if [ -n "$occurrences" ]; then
    echo "Most frequent words:"
    tr -s '[:space:]' '\n' < "$file" | tr -d '[:punct:]' | tr '[:upper:]' '[:lower:]' |
        sort | uniq -c | sort -nr | head -n "$occurrences"
fi
```
This is my text file:
This is a sample text file.
It contains some words and paragraphs.
Feel free to add more, content for testing.
It should say that there are 2 paragraphs, but it says 1.
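It appears you may have DOS line endings (`\r\n`). Those interfere with awk's paragraph mode, which is otherwise the easiest way to solve your issue: in a DOS file the "blank" line between paragraphs actually contains a lone `\r`, so awk does not see it as empty and treats the whole file as a single paragraph.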
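To see the effect, here is a small bash sketch. The temporary file and its contents are assumptions modeled on the question's sample (with the blank line assumed to sit before "Feel free..."); it counts paragraphs with `RS=""` before and after stripping the carriage returns:

```shell
#!/bin/bash
# Hypothetical demo file: the question's three sentences, two paragraphs, DOS (CRLF) line endings.
demo=$(mktemp)
printf 'This is a sample text file.\r\nIt contains some words and paragraphs.\r\n\r\nFeel free to add more, content for testing.\r\n' > "$demo"

# The separating line is really "\r", not empty, so paragraph mode sees a single record.
crlf_count=$(awk 'BEGIN { RS = "" } END { print NR }' "$demo")

# Delete every \r first; the separating line becomes truly empty.
lf_count=$(tr -d '\r' < "$demo" | awk 'BEGIN { RS = "" } END { print NR }')

echo "CRLF file: $crlf_count paragraph(s)"
echo "after removing \\r: $lf_count paragraph(s)"
rm -f "$demo"
```

On the CRLF file paragraph mode reports 1, which matches what you are seeing; after `tr -d '\r'` it reports 2.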
Given:
You can use GNU awk or a recent awk that supports more than one character for a record separator:
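A minimal sketch of that approach, assuming your `awk` treats a multi-character `RS` as a regular expression (gawk and mawk do; a minimal or old awk may not). The test file is the same hypothetical CRLF sample:

```shell
#!/bin/bash
# Hypothetical CRLF test file modeled on the question's sample.
demo=$(mktemp)
printf 'This is a sample text file.\r\nIt contains some words and paragraphs.\r\n\r\nFeel free to add more, content for testing.\r\n' > "$demo"

# Records are separated by two or more line breaks, each optionally preceded by \r,
# so the same command works for DOS (\r\n) and Unix (\n) line endings.
paragraphs=$(awk -v RS='\r?\n(\r?\n)+' 'END { print NR }' "$demo")

echo "$paragraphs"
rm -f "$demo"
```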
(Note: setting `RS="\r?\n(\r?\n)+"` is not the same as `RS=""`. See here and here.)

If you want to use classic paragraph mode in awk (with `RS=""`), first remove the `\r` at the line endings. The best way is either a nested awk or sed:

Or Ruby:
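A minimal bash sketch of the sed and nested-awk variants, again on a hypothetical CRLF sample file. Both strip a trailing `\r` from each line and then hand the result to awk in paragraph mode:

```shell
#!/bin/bash
# Hypothetical CRLF test file modeled on the question's sample.
demo=$(mktemp)
printf 'This is a sample text file.\r\nIt contains some words and paragraphs.\r\n\r\nFeel free to add more, content for testing.\r\n' > "$demo"

# Variant 1: sed removes a trailing \r from each line, then awk counts paragraphs.
by_sed=$(sed $'s/\r$//' "$demo" | awk 'BEGIN { RS = "" } END { print NR }')

# Variant 2: a first awk (line mode) strips the \r and prints each line,
# a second awk counts records in paragraph mode.
by_awk=$(awk '{ sub(/\r$/, "") } 1' "$demo" | awk 'BEGIN { RS = "" } END { print NR }')

echo "sed: $by_sed  awk: $by_awk"
rm -f "$demo"
```

`$'\r'` is bash ANSI-C quoting, which passes a literal carriage return to sed, so the sed variant does not depend on sed itself understanding `\r`.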
Any of those should print `2` regardless of DOS or Unix line endings.

If you want total paragraphs, words, and punctuation marks, I would do that in Ruby:
Or in awk:
Either prints:
Either of those can easily be modified (using the same skeleton) to count whatever else you need in the text.
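As a rough illustration of such a skeleton in awk: strip the `\r`, switch to paragraph mode, and accumulate counts per record. The sample file is hypothetical, and a "punctuation mark" is assumed to mean any character in the `[:punct:]` class:

```shell
#!/bin/bash
# Hypothetical CRLF test file modeled on the question's sample.
demo=$(mktemp)
printf 'This is a sample text file.\r\nIt contains some words and paragraphs.\r\n\r\nFeel free to add more, content for testing.\r\n' > "$demo"

read -r paragraphs words punct < <(
    tr -d '\r' < "$demo" | awk '
        BEGIN { RS = "" }                      # paragraph mode: one record per paragraph
        {
            words += NF                        # newlines also separate fields in paragraph mode
            punct += gsub(/[[:punct:]]/, "&")  # gsub returns the match count; "&" keeps the text
        }
        END { print NR, words, punct }'
)

echo "Paragraphs: $paragraphs"
echo "Words: $words"
echo "Punctuation marks: $punct"
rm -f "$demo"
```

For this sample it prints 2 paragraphs, 20 words, and 4 punctuation marks. In the question's script, the same `tr -d '\r' < "$file" | awk ...` pipeline could replace the `paragraph_count=...` line.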