PHP: Splitting a string into an array around words wrapped in tildes & keeping those words

Question

90 Views Asked by indextwo At 28 February 2023 at 20:33

It's very late and I think I've been staring at this too long to figure it out, but: I have been provided a bunch of raw text where anything within tildes (~) is a title, and everything else is just plain text. However, the text may or may not include newlines; for example:

Title & text on the same line: ~THE BURGER MINI~A tiny little burger patty in a tiny little bun.

Title & text on different lines:

~THE BURGER MAX~
A gigantic hunk of steak in between two toasted baguettes, each stuffed with beef & cheese

A combination of both:

~THE BURGER ZERO~
No burger, no bun, just air.

~THE BURGER ITALIANO~
A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.~NOTE~This is basically giant ravioli.

Ultimately the kind of output I'm trying to achieve would be something like:

Array
(
    [0] => Array
        (
            [title] => THE BURGER ZERO
        )

    [1] => Array
        (
            [text] => No burger, no bun, just air.
        )

    [2] => Array
        (
            [title] => THE BURGER ITALIANO
        )

    [3] => Array
        (
            [text] => A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.
        )

    [4] => Array
        (
            [title] => NOTE
        )

    [5] => Array
        (
            [text] => This is basically giant ravioli.
        )

)

...so I can then differentiate between titles & text, but crucially in the order they appear.

I can split the string on newlines into an array with the following:

$tempArray = preg_split('/\s*\R\s*/', trim($str), -1, PREG_SPLIT_NO_EMPTY);

But after that, I get stuck. Using preg_split on any group within tildes (preg_split('/~(.*?)~/uim', $line);) will give me all of the paragraph text, but loses the titles (as they're being used for the split). I've been banging my head against various forms of preg_match & preg_match_all but all I'm getting is a headache.

Is there a straightforward way to get what I'm after that would work with all of the above examples?

There are 3 best solutions below

**Alex Howansky** · Answer 1 · 2023-02-28T20:50:35.847000

preg_match_all('/~([^~]+)~\n*([^~\n]+)/', $str, $match);

So, match a tilde, followed by one or more of anything but a tilde, followed by another tilde. Capture what's between the tildes:

~([^~]+)~

Followed by zero or more newlines:

\n*

Followed by one or more of anything but tildes and newlines, which is also captured:

([^~\n]+)

This will give you the titles in $match[1] and the descriptions in $match[2]:

print_r($match[1]);

Array
(
    [0] => THE BURGER ZERO
    [1] => THE BURGER ITALIANO
    [2] => NOTE
)

print_r($match[2]);

Array
(
    [0] => No burger, no bun, just air.
    [1] => A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.
    [2] => This is basically giant ravioli.
)

Which you might then combine into a single array:

$items = array_combine($match[1], $match[2]);
print_r($items);

Array
(
    [THE BURGER ZERO] => No burger, no bun, just air.
    [THE BURGER ITALIANO] => A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.
    [NOTE] => This is basically giant ravioli.
)
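Note that array_combine keys the result by title, so a title that occurs twice (say, two NOTE entries) would overwrite the earlier one. A minimal sketch of how the same pattern could instead produce the ordered title/text list from the question, assuming the PREG_SET_ORDER flag is used so each match arrives as one title/description pair; the variable names $matches and $result are just illustrative:

```php
<?php
$str = '~THE BURGER ZERO~
No burger, no bun, just air.

~THE BURGER ITALIANO~
A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.~NOTE~This is basically giant ravioli.';

// PREG_SET_ORDER groups the captures per match:
// $m[1] is the title, $m[2] is the description that follows it.
preg_match_all('/~([^~]+)~\n*([^~\n]+)/', $str, $matches, PREG_SET_ORDER);

$result = [];
foreach ($matches as $m) {
    // Append in document order, one entry per title and one per text.
    $result[] = ['title' => $m[1]];
    $result[] = ['text' => trim($m[2])];
}

print_r($result);
```

Because the description group stops at the first newline, this sketch only keeps the first line of text after each title; text spanning several lines would need a different pattern.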

**pr1nc3** · Answer 2 · 2023-02-28T20:51:57.720000

<?php
$input = '~THE BURGER ZERO~
No burger, no bun, just air.

~THE BURGER ITALIANO~
A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.~NOTE~This is basically giant ravioli.';

$splittedText = array_values(array_filter(explode ("~", $input)));

foreach($splittedText as $key => $value){
    if (ctype_upper(str_replace(' ', '', $value))){
        $splittedText[$key] = ['title' => $value];
    }
    else{
        $splittedText[$key] = ['text' => $value];
    }
}

print_r($splittedText);

This solution does not use any regex.

How it works:

First, explode the whole string on the tilde.
Then remove the empty entries from the array, reindex the keys, and iterate over the array.
Check whether the current value is all capitals (ignoring spaces); if it is, the key is set to "title", otherwise to "text", as in the expected output.

The output is:

Array
(
    [0] => Array
        (
            [title] => THE BURGER ZERO
        )

    [1] => Array
        (
            [text] => 
No burger, no bun, just air.


        )

    [2] => Array
        (
            [title] => THE BURGER ITALIANO
        )

    [3] => Array
        (
            [text] => 
A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.
        )

    [4] => Array
        (
            [title] => NOTE
        )

    [5] => Array
        (
            [text] => This is basically giant ravioli.
        )

)
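The text entries above still carry the newlines that surrounded them in the input. A possible variation, sketched here under the assumption that surrounding whitespace is never meaningful, trims each piece before filtering; the names $pieces and $result are illustrative only:

```php
<?php
$input = '~THE BURGER ZERO~
No burger, no bun, just air.

~THE BURGER ITALIANO~
A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.~NOTE~This is basically giant ravioli.';

// Trim every piece before filtering, so whitespace-only pieces are dropped
// and the kept pieces carry no leading or trailing newlines.
$pieces = array_map('trim', explode('~', $input));
$pieces = array_values(array_filter($pieces, static function (string $piece): bool {
    return $piece !== '';
}));

$result = [];
foreach ($pieces as $piece) {
    // Same heuristic as above: all capitals (ignoring spaces) means title.
    $key = ctype_upper(str_replace(' ', '', $piece)) ? 'title' : 'text';
    $result[] = [$key => $piece];
}

print_r($result);
```

The all-capitals check remains a heuristic: ctype_upper returns false as soon as a character is not an uppercase letter, so a title containing digits or punctuation would be labelled as text.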

**Casimir et Hippolyte** · Answer 3 · 2023-03-26T21:50:47.580000

One way is preg_split with the PREG_SPLIT_DELIM_CAPTURE option, which also returns the captured parts of the delimiter:

$str = <<<TEXT
~THE BURGER ZERO~
No burger, no bun, just air.

~TRICKY TEST~
Meet me ~5pm.

~THE BURGER ITALIANO~
A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.~NOTE~This is basically giant ravioli.

~THE BURGER MINI~A tiny little burger patty in a tiny little bun.
TEXT;

$pattern = '/ \s* ~ ( [\p{Lu} ]+ ) ~ \s* /ux';

$arr = preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r(array_chunk($arr, 2));
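To get the title/text labels from the question rather than pairs, one option is to drop PREG_SPLIT_NO_EMPTY and rely on position: with PREG_SPLIT_DELIM_CAPTURE alone, the captured titles always land on the odd indexes. The following is a rough sketch under that assumption; labelSegments is a hypothetical helper name, not something from the answer:

```php
<?php
// Hypothetical helper: splits on ~TITLE~ delimiters and labels each part.
function labelSegments(string $str): array
{
    $pattern = '/ \s* ~ ( [\p{Lu} ]+ ) ~ \s* /ux';

    // Without PREG_SPLIT_NO_EMPTY the parts strictly alternate:
    // even indexes hold text, odd indexes hold the captured titles.
    $parts = preg_split($pattern, trim($str), -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return []; // the regex failed, e.g. on invalid UTF-8 input
    }

    $result = [];
    foreach ($parts as $i => $part) {
        if ($part === '') {
            continue; // skip empty text, e.g. before the first title
        }
        $key = ($i % 2 === 1) ? 'title' : 'text';
        $result[] = [$key => $part];
    }
    return $result;
}

print_r(labelSegments("~THE BURGER ZERO~\nNo burger, no bun, just air.\n\n~NOTE~Meet me ~5pm."));
```

Labelling by index rather than by pairs keeps the result correct when two titles follow each other with no text in between, a case where chunking a filtered array into pairs would misalign.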

