Parsing Rich Text Content into Logical Array Elements: How to Divide Text and Images?

I am working on a blog platform using a Rails backend as a CMS and a React frontend. In my Rails app, I'm using Action Text with the Trix rich text editor. I pass the rich text content from the backend to the frontend to populate my blog pages.

In my Rails application, I have a PostSerializer that defines how the data is sent to the frontend. Here's the relevant part of the code:

class PostSerializer
    include FastJsonapi::ObjectSerializer
    attributes :id, :created_at, :title, :summary

    attributes :body do |object|
        object.content.to_s.gsub(/\A<div class="trix-content">(.*)<\/div>\z/m, '\1').strip.html_safe
    end

    attributes :heroimage do |object|
        unless object.heroimage.nil?
            object.heroimage.url&.split("?")&.first
        end
    end

    attributes :tags do |object|
        object.tags
    end

    attributes :Author do |object|
        {
            name: object.user.first_name + " " + object.user.last_name,
            id: object.user.id,
            avatar: object.user.member_detail.avatar.url&.split("?")&.first
        }
    end
end
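A possible reason the body can still arrive wrapped in <div class="trix-content"> (the sample further down still starts with it): \z matches only at the very end of the string, so if the rendered content ends with a newline after the closing </div>, the gsub never matches and the later .strip comes too late. The trailing newline is an assumption here; printing object.content.to_s.inspect would confirm it. A minimal sketch that strips first, with unwrap_trix and TRIX_WRAPPER as hypothetical names:

```ruby
# Same pattern as in the serializer's body block.
TRIX_WRAPPER = /\A<div class="trix-content">(.*)<\/div>\z/m

# Hypothetical helper: strip surrounding whitespace *before* matching,
# because \z does not match in front of a trailing "\n".
def unwrap_trix(html)
  html.strip.sub(TRIX_WRAPPER, '\1').strip
end

# Assumed shape of the rendered content: wrapper div plus a trailing newline.
rendered = %(<div class="trix-content">\n  <div>Hello</div>\n</div>\n)

p rendered.gsub(TRIX_WRAPPER, '\1').strip  # original order: wrapper survives
p unwrap_trix(rendered)                    # => "<div>Hello</div>"
```

In the serializer this would amount to calling .strip before .gsub instead of only after it.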

On the frontend, I fetch the content from the API and populate my blog pages. Here's the corresponding React code:

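import { useEffect } from "react";
import { Box, Skeleton } from "@chakra-ui/react"; // assumed: the props used below match Chakra UI
// slug, details, dispatch, api and setStoryDetails are defined elsewhere in the app
// (presumably route params, the Redux store and the API client); omitted here.

function BlogPage() {
    useEffect(() => {
        async function fetchData() {
            const resp = await api.postApi.getStoryPage(slug);
            dispatch(setStoryDetails({ data: resp?.data?.data?.data, slug }));
        }
        // only fetch when this post is not already cached in the store
        if (!details[slug]) fetchData();
    }, [slug]); // re-run when navigating to a different post

    return (
        <Skeleton borderRadius={6} h={"100%"} isLoaded={Boolean(details[slug])} minH={"24rem"}>
            <Box fontFamily={"nunito"} fontSize={"18px"} fontWeight={"400"} className="content" dangerouslySetInnerHTML={{ __html: details[slug]?.attributes?.body }} />
        </Skeleton>
    );
}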

Here is an example of the rich text content:

<div class="trix-content"> <div> <strong>Rails World 2023: Highlights</strong><br><em><br>Rails World 2023 in Amsterdam, a vibrant two-day community conference, showcased the best in Rails development. With 700+ attendees, 29 speakers, 3 keynotes, workshops, and more, the event buzzed with excitement.<br></em><br> </div><div> <br><action-text-attachment sgid="BAh7CEkiCGdpZAY6BkVUSSI0Z2lkOi8vYmFja2VuZC9BY3RpdmVTdG9yYWdlOjpCbG9iLzEzP2V4cGlyZXNfaW4GOwBUSSIMcHVycG9zZQY7AFRJIg9hdHRhY2hhYmxlBjsAVEkiD2V4cGlyZXNfYXQGOwBUMA==--a52fb5dd95e4f5a24e2d78a0de5522b6f12ce5f1" content-type="image/jpeg" url="https://boiling-beyond-54226-89c5a928bfaf.herokuapp.com/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBFZz09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--7663062e81ac06df91a0cc2e4d8d8002550fbc5b/RailsWorld2023-audience.jpeg" filename="RailsWorld2023-audience.jpeg" filesize="405313" width="1440" height="960" previewable="true" presentation="gallery"><figure class="attachment attachment--preview attachment--jpeg"> <img src="https://tv-platform-test.s3.ap-northeast-1.amazonaws.com/y2b46w8an3p3nysvkrxk4yshfkn5"> <figcaption class="attachment__caption"> <span class="attachment__name">RailsWorld2023-audience.jpeg</span> <span class="attachment__size">396 KB</span> </figcaption> </figure> </action-text-attachment><br><br> </div><div><br></div><div class="attachment-gallery attachment-gallery--2"> <action-text-attachment sgid="BAh7CEkiCGdpZAY6BkVUSSI0Z2lkOi8vYmFja2VuZC9BY3RpdmVTdG9yYWdlOjpCbG9iLzE0P2V4cGlyZXNfaW4GOwBUSSIMcHVycG9zZQY7AFRJIg9hdHRhY2hhYmxlBjsAVEkiD2V4cGlyZXNfYXQGOwBUMA==--8b9a8375f9643d4fa6706c677ec88283dc9781a7" content-type="image/jpeg" url="https://boiling-beyond-54226-89c5a928bfaf.herokuapp.com/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBFdz09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--0713e15959a2165c160a4bb7e04b9c5044e8c2da/RailsWorld2023-sponsorlounge.jpeg" filename="RailsWorld2023-sponsorlounge.jpeg" filesize="664576" 
width="1440" height="960" previewable="true" presentation="gallery"><figure class="attachment attachment--preview attachment--jpeg"> <img src="https://tv-platform-test.s3.ap-northeast-1.amazonaws.com/3dmhfrh1nt6eohj6mfdr1z2a9yc1"> <figcaption class="attachment__caption"> <span class="attachment__name">RailsWorld2023-sponsorlounge.jpeg</span> <span class="attachment__size">649 KB</span> </figcaption> </figure> </action-text-attachment><action-text-attachment sgid="BAh7CEkiCGdpZAY6BkVUSSI0Z2lkOi8vYmFja2VuZC9BY3RpdmVTdG9yYWdlOjpCbG9iLzE1P2V4cGlyZXNfaW4GOwBUSSIMcHVycG9zZQY7AFRJIg9hdHRhY2hhYmxlBjsAVEkiD2V4cGlyZXNfYXQGOwBUMA==--bf959983f11f26237de76e914f7af0bc284ed774" content-type="image/jpeg" url="https://boiling-beyond-54226-89c5a928bfaf.herokuapp.com/rails/active_storage/blobs/redirect/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBGQT09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--23e36e544189161d5632228fcaf03bffeb77c96b/RailsWorld2023-wafris.jpeg" filename="RailsWorld2023-wafris.jpeg" filesize="654453" width="1440" height="960" previewable="true" presentation="gallery"><figure class="attachment attachment--preview attachment--jpeg"> <img src="https://tv-platform-test.s3.ap-northeast-1.amazonaws.com/sun455kpa4rv7xt5q6g3kcm5t8ry"> <figcaption class="attachment__caption"> <span class="attachment__name">RailsWorld2023-wafris.jpeg</span> <span class="attachment__size">639 KB</span> </figcaption> </figure> </action-text-attachment> </div><div> <br>Rails Foundation Core &amp; Contributing members were thrilled to connect with the community, gaining valuable insights and sharing the enthusiasm for Rails' future.</div> </div>

I've tried splitting the content string, but the result isn't correct. I also tried using regular expressions to extract the attachments, but I need to keep them in the same order in which they appear in the content.
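On keeping the order: in Ruby, String#split with a pattern that contains a capturing group includes each match in the result array, so attachments stay interleaved with the surrounding text exactly as they appear. A minimal sketch, assuming attachment elements are never nested inside each other (as in the sample); split_in_order and ATTACHMENT_RE are hypothetical names:

```ruby
# One whole attachment element, matched lazily and across newlines (/m).
# The surrounding parentheses form a capturing group, which is what makes
# String#split keep the attachments in its result instead of dropping them.
ATTACHMENT_RE = %r{(<action-text-attachment\b.*?</action-text-attachment>)}m

# Hypothetical helper: ordered list of chunks in which text and attachment
# elements alternate exactly as they do in the source HTML.
def split_in_order(html)
  html.split(ATTACHMENT_RE).reject { |chunk| chunk.strip.empty? }
end

html = '<div>Intro</div>' \
       '<action-text-attachment url="a.jpg"><figure></figure></action-text-attachment>' \
       '<div>Middle</div>' \
       '<action-text-attachment url="b.jpg"><figure></figure></action-text-attachment>'

split_in_order(html).each_with_index { |chunk, i| puts "#{i}: #{chunk}" }
```

One caveat: the text chunks are not guaranteed to be balanced HTML. In the sample, the attachment-gallery div opens before its two attachments and closes after them, so the neighbouring chunks would each carry half of that div. An HTML parser such as Nokogiri, which works on the element tree, avoids this.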

Could you please help me find a way to split this content into an ordered array of text and image elements, so I can properly handle and display it on my blog pages?

Answer by zaphodbln_:

I did something similar: I used the Trix editor to create rich-text emails. I would recommend using an HTML parser; in my case, I used Nokogiri. Writing your own HTML parser that is safe against malicious input can become quite laborious.

I did this on the Rails backend. Maybe you want to use this approach to extract the attachments and send them through your API.
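For the "send them through your API" part, each attachment element could be reduced to a small hash before serialization, so the React side receives plain fields instead of markup. This is a stdlib-only sketch under stated assumptions: attributes are read with a regex (double-quoted values, no ">" inside a value), which is cruder than working on a Nokogiri node, where att["url"] and friends give the same values more robustly. attachment_to_hash, the type/url/filename/width/height keys and the example.com URL are all hypothetical:

```ruby
# Hypothetical helper (not part of Action Text or Nokogiri): reduce one
# <action-text-attachment ...>...</action-text-attachment> element to a hash.
# Assumes double-quoted attribute values with no ">" inside them.
# HTML entities such as &amp; inside values are NOT decoded here.
def attachment_to_hash(element_html)
  opening_tag = element_html[/<action-text-attachment\b[^>]*>/]
  return nil unless opening_tag

  # [["content-type", "image/jpeg"], ["url", "..."], ...] => Hash
  attrs = opening_tag.scan(/([\w-]+)="([^"]*)"/).to_h

  {
    type: attrs["content-type"].to_s.start_with?("image/") ? "image" : "file",
    url: attrs["url"],
    filename: attrs["filename"],
    width: attrs["width"]&.to_i,
    height: attrs["height"]&.to_i
  }
end

element = '<action-text-attachment content-type="image/jpeg" ' \
          'url="https://example.com/photo.jpeg" filename="photo.jpeg" ' \
          'width="1440" height="960"><figure>...</figure></action-text-attachment>'

p attachment_to_hash(element)
```

Text chunks could then be sent as { type: "html", html: ... } entries in the same array, so the frontend can render text and images in order without dangerouslySetInnerHTML on the attachments.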

Using Nokogiri, you could do something like this:

attributes :body do |object|
    # parse the Action Text HTML and send only the markup inside <body> as a string
    Nokogiri::HTML(object.content.to_s, nil, 'UTF-8').at("body").inner_html
end

To find the attachments use:

require "digest"

doc = Nokogiri::HTML(object.content.to_s, nil, 'UTF-8').search("body")   # or wherever your HTML code resides

doc.search("action-text-attachment").each do |att|

  # in my case I needed the actual files on the hard drive; adapt as necessary
  if att["filename"]
    filename = att["filename"]
  else
    # no filename attribute: name the file after a SHA-256 hash of its URL, keeping the extension
    match = att["url"].match(/\.[a-z]*$/i)
    extension = match ? match[0] : ""
    filename = Digest::SHA2.hexdigest(att["url"]) + extension
  end

  # save the file to hdd -- omitted for the sake of clarity

  # use att["content-type"].match(/^image/) to treat images etc...
end

Perhaps this example helps you find your solution.