I am building an experimental PHP app that processes poems in Cyrillic UTF-8 characters. I want to achieve the following:
- Count the occurrences of every character, plus total counts for categories like "all consonants". The input might include special characters and punctuation, as long as I can hide some of them in the output. I work with UTF-8, so I can only use multibyte-safe functions. count_chars() is not an option because it counts bytes, not characters :(
- Preserve line breaks and capitalization. I keep multiple copies of the original text with different formatting. They may look redundant, but I want to preserve as much information as possible.
- Change the HTML formatting of certain characters based on a condition, e.g. give vowels and consonants different background colors, or highlight every occurrence of a chosen character. As far as I understand, I first need to split my string into lines (to preserve the breaks), then turn each line into an array of 1-character chunks. For the output I would join() the lines back together. Unfortunately, I couldn't find any ideas on how to apply HTML to array values for a problem like mine.
What I tried
On top of not knowing how to do some things, I encountered some minor problems. Here's step by step what I do now.
I collect a poem through the POST method. The poem is in English for illustration purposes only.
Text sample:
We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time.
I numbered the steps hoping to make commenting easier.
1. Getting the value with and without tags
This is how it looks in htmlentities() after being submitted through textarea:
$string = "We shall not cease from exploration<br /> And the end of all our exploring<br /> Will be to arrive where we started<br /> And know the place for the first time.";
How I output line breaks:
$poem = nl2br($string);
Here's a copy without tags:
$droptags = strip_tags($poem);
2. Counting characters
This is my rudimentary attempt at count_chars() that lacks counting loop(s):
$poem2array = preg_split('//u', $droptags, null, PREG_SPLIT_NO_EMPTY);
$unique_characters = array_unique($poem2array);
The output is as follows:
(
[0] => W
[1] => e
[2] =>
...
)
3. Splitting lines into arrays
Splitting into lines:
$lines = preg_split('<br />', $showtags);
My problem here is that the array looks like this:
(
[0] => We shall not cease from exploration<
[1] => >
And the end of all our exploring<
[2] => >
Will be to arrive where we started<
[3] => >
And know the place for the first time.
)
Here is my attempt to split the text into nested arrays. I know it's broken because I can only get the last line.
foreach($lines as $line) {
$line = preg_split('//u', $line, null, PREG_SPLIT_NO_EMPTY);
}
4. HTML styling
As for HTML styling of arrays, I have no ideas. My reference arrays would look like this:
$vowels = array("a", "e", "i");
$consonants = array("b", "c", "d");
$fontcolor = array("vowels" => "blue",
"consonants" => "orange");
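One possible way to let reference arrays like these drive the markup is to build a character-to-category lookup and wrap every matching character in a span. This is only a minimal sketch: the helper name colorize_line() and the $categories array are hypothetical, and the match is case-sensitive.

```php
<?php
// Hypothetical helper: wraps each character of one line in a coloured <span>.
function colorize_line(string $line, array $categories, array $fontcolor): string
{
    // Build a character => category lookup once, e.g. "a" => "vowels".
    $lookup = [];
    foreach ($categories as $name => $members) {
        foreach ($members as $member) {
            $lookup[$member] = $name;
        }
    }

    $html = '';
    // The /u modifier splits on UTF-8 characters rather than bytes.
    $chars = preg_split('//u', $line, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $char) {
        // Escape the character so "<" or "&" in the poem cannot break the markup.
        $safe = htmlspecialchars($char, ENT_QUOTES, 'UTF-8');
        if (isset($lookup[$char])) {
            $color = $fontcolor[$lookup[$char]] ?? 'inherit';
            $html .= '<span style="background-color: ' . $color . '">' . $safe . '</span>';
        } else {
            $html .= $safe; // spaces, punctuation and unlisted letters stay unstyled
        }
    }
    return $html;
}

$vowels = array("a", "e", "i");
$consonants = array("b", "c", "d");
$fontcolor = array("vowels" => "blue", "consonants" => "orange");
// Assumed glue array that ties category names to their members.
$categories = array("vowels" => $vowels, "consonants" => $consonants);

// Style each line separately, then join the lines back with <br />.
$rows = preg_split('/\R/u', "bad\nice");
$styled = array();
foreach ($rows as $row) {
    $styled[] = colorize_line($row, $categories, $fontcolor);
}
echo implode("<br />\n", $styled);
```

To colour capitals the same way as lowercase letters, the lookup key could be passed through mb_strtolower() while the original character is still the one printed. Highlighting a single chosen character is the same idea with a one-element category.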
Counting characters
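A minimal sketch of the counting step, assuming the poem has already been stripped of tags. The helper names count_chars_utf8() and count_category() are hypothetical, and the vowel list is a small sample rather than a full alphabet. The idea is to replace array_unique() with array_count_values(), which keeps the number of occurrences for every character.

```php
<?php
// Hypothetical helper: a multibyte-safe stand-in for count_chars().
function count_chars_utf8(string $text): array
{
    // Split into single UTF-8 characters; fall back to an empty array on invalid input.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    // Map every distinct character to its number of occurrences.
    $counts = array_count_values($chars);
    arsort($counts); // most frequent first
    return $counts;
}

// Hypothetical helper: total for a category such as "all vowels".
function count_category(array $counts, array $category): int
{
    $total = 0;
    foreach ($category as $char) {
        $total += $counts[$char] ?? 0; // characters absent from the text count as 0
    }
    return $total;
}

// Counting is case-sensitive; lowercase a copy first (e.g. with mb_strtolower())
// if "М" and "м" should be counted together.
$counts = count_chars_utf8("мама мыла раму");
$vowels = array("а", "ы", "у"); // sample subset only
echo count_category($counts, $vowels);
```

Characters that should stay hidden in the output, such as spaces or punctuation, can be removed from $counts with unset() before printing.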
Splitting lines into arrays
In this case I had to use a trick: I changed the marker from <br /> to a different marker; otherwise the > will always appear.
In your code I suggest changing the name of the variable; otherwise it will clash with the variable $line used in the loop.
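The stray < and > appear because preg_split() treats the first character of the pattern as its delimiter, so '<br />' is read as the pattern "br /" enclosed in < and >. As an alternative to a substitute marker, here is a minimal sketch that splits the raw text on newlines before nl2br() is applied and collects the characters of each line in a separate array. The helper name poem_to_char_lines() is hypothetical.

```php
<?php
// Hypothetical helper: turns a poem into an array of lines,
// each line being an array of single UTF-8 characters.
function poem_to_char_lines(string $text): array
{
    // \R matches any newline sequence (\n, \r\n, \r).
    $rows = preg_split('/\R/u', $text) ?: [];
    $result = [];
    foreach ($rows as $row) {
        // Append to $result instead of overwriting the loop variable,
        // which is why only the last line survived in the original loop.
        $result[] = preg_split('//u', $row, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    }
    return $result;
}

$nested = poem_to_char_lines("Аб\nвг");
print_r($nested);
```

If the text already contains the tags, a properly delimited pattern such as '/<br \/>/' or a plain explode('<br />', $poem) would split on the whole tag; the pieces may then need trim(), since nl2br() keeps the original newline after each tag.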