Attempting to build a publication reference converter - cannot get it to function correctly


Our general reference style is NLM (numerical) and should follow the below format:

  • List no more than three author names; use et al. after this
  • For articles with fewer than three authors, list their names (in the case of an article with two authors, include ‘and’ in between their names)
  • Publication title in sentence case
  • Use PubMed journal abbreviations
  • Do not italicise journal names
  • Add a full stop between journal name and year
  • Only include the year of publication (do not include day or month)
  • Semicolon after the year, with no space before the volume; colon after the volume (and issue), with no space before the page range
  • Use hyphens in page range with a full stop after (e.g. 26-28.)
  • Do not abridge page numbers
  • All references end in a full stop
  • No use of bold and italics

When I input a reference copied from the PubMed website, I want the code to convert it from PubMed's default format to the format above.

First example:

Pubmed reference:

McInnes IB, Asahina A, Coates LC, Landewé R, Merola JF, Ritchlin CT, Tanaka Y, Gossec L, Gottlieb AB, Warren RB, Ink B, Assudani D, Bajracharya R, Shende V, Coarse J, Mease PJ. Bimekizumab in patients with psoriatic arthritis, naive to biologic treatment: a randomised, double-blind, placebo-controlled, phase 3 trial (BE OPTIMAL). Lancet. 2023 Jan 7;401(10370):25-37. doi: 10.1016/S0140-6736(22)02302-9. Epub 2022 Dec 6. PMID: 36493791.

to convert to

New format:

McInnes IB, Asahina A, Coates LC, et al. Bimekizumab in patients with psoriatic arthritis, naïve to biologic treatment: a randomised, double-blind, placebo-controlled, phase 3 trial (BE OPTIMAL). Lancet. 2023;401(10370):25-37.

Second example:

Pubmed reference:

Reich K, Warren RB, Lebwohl M, Gooderham M, Strober B, Langley RG, Paul C, De Cuyper D, Vanvoorden V, Madden C, Cioffi C, Peterson L, Blauvelt A. Bimekizumab versus Secukinumab in Plaque Psoriasis. N Engl J Med. 2021 Jul 8;385(2):142-152. doi: 10.1056/NEJMoa2102383. Epub 2021 Apr 23. PMID: 33891380.

to convert to

New format:

Reich K, Warren RB, Lebwohl M, et al. Bimekizumab versus secukinumab in plaque psoriasis. N Engl J Med. 2021;385(2):142-152.

However, the code I have written does not produce this output.

I have tried troubleshooting with ChatGPT, but none of the suggested fixes worked.

Here is the code:

import re

def format_pubmed_reference(reference):
    # Improved regex pattern to handle a wider range of reference formats
    pattern = re.compile(
        r'^(?P<authors>.+?)\.(?P<title>.+?)\.\s*(?P<journal>[A-Za-z\s.&]+)\s*(?P<year>\d{4})\s*(;(?P<volume>\d+)(\((?P<issue>\d+)\))?)?:?(?P<pages>\d+-\d+)?\.?'
    )

    match = pattern.search(reference)
    if not match:
        return "Invalid reference format. Please ensure it matches PubMed standards."
    
    parts = match.groupdict()
    
    # Handling authors
    authors_list = [author.strip() for author in parts['authors'].split(',')]
    if len(authors_list) > 3:
        authors_formatted = ', '.join(authors_list[:3]) + ', et al'
    elif len(authors_list) == 2:
        authors_formatted = ' and '.join(authors_list)
    else:
        authors_formatted = parts['authors']
    
    # Formatting the output
    formatted_reference = f"{authors_formatted}. {parts['title']} {parts['journal']}. {parts['year']}"
    if parts.get('volume'):
        formatted_reference += f";{parts['volume']}"
        if parts.get('issue'):
            formatted_reference += f"({parts['issue']})"
    if parts.get('pages'):
        formatted_reference += f":{parts['pages']}."
    else:
        formatted_reference += "."
    
    return formatted_reference

if __name__ == "__main__":
    # Prompting user for input
    reference_input = input("Enter PubMed reference: ")
    formatted_reference = format_pubmed_reference(reference_input)
    print(formatted_reference)
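
For comparison, here is a minimal sketch of one way the conversion could be written. It is not a definitive implementation. It assumes the input always has the shape shown in the two examples (authors, title, journal, then "YYYY Mon D;volume(issue):pages.") and that neither the author list nor the journal abbreviation contains a full stop. Two things it handles that the code above does not: PubMed places the month and day between the year and the semicolon ("2023 Jan 7;"), so a pattern that expects the semicolon straight after the year never captures volume, issue, or pages; and the title needs converting to sentence case. The helper names (`format_authors`, `sentence_case`, `expand_pages`) are made up for this illustration. The sentence-case rule is a rough heuristic that will also lower-case proper nouns, and nothing here adds diacritics (so "naive" stays "naive").

```python
import re

# Assumed input shape (taken from the two examples in the question):
# Authors. Title. Journal. YYYY Mon D;Volume(Issue):Pages. doi: ... PMID: ...
CITATION = re.compile(
    r'^(?P<authors>[^.]+)\.\s+'        # author list, assumed to contain no full stop
    r'(?P<title>.+?)\.\s+'             # title, up to the full stop before the journal
    r'(?P<journal>[^.]+)\.\s+'         # PubMed journal abbreviation
    r'(?P<year>\d{4})[^;]*;'           # keep the year, skip month/day up to ';'
    r'(?P<volume>[^(:.]+)'             # volume
    r'(?:\((?P<issue>[^)]+)\))?'       # optional issue in parentheses
    r':(?P<pages>[^.\s]+)\.'           # page range, ended by a full stop
)


def format_authors(authors):
    """First three names plus 'et al' if more than three; 'and' for two."""
    names = [n.strip() for n in authors.split(',') if n.strip()]
    if len(names) > 3:
        return ', '.join(names[:3]) + ', et al'
    if len(names) == 2:
        return ' and '.join(names)
    return ', '.join(names)


def sentence_case(title):
    """Heuristic: lower-case 'Capitalised' words after the first word.

    All-caps tokens such as 'BE' or 'OPTIMAL)' are left untouched.
    Proper nouns are NOT detected and will be lower-cased too.
    """
    words = title.split(' ')
    result = [words[0]]
    for word in words[1:]:
        if word[:1].isupper() and word[1:].islower():
            result.append(word.lower())
        else:
            result.append(word)
    return ' '.join(result)


def expand_pages(pages):
    """Expand an abridged numeric range, e.g. '142-52' becomes '142-152'."""
    start, sep, end = pages.partition('-')
    if not sep:
        return pages
    if start.isdigit() and end.isdigit() and len(end) < len(start):
        end = start[:len(start) - len(end)] + end
    return f"{start}-{end}"


def format_pubmed_reference(reference):
    match = CITATION.match(reference.strip())
    if match is None:
        raise ValueError("Reference does not match the assumed PubMed layout.")
    parts = match.groupdict()
    issue = f"({parts['issue']})" if parts['issue'] else ""
    return (
        f"{format_authors(parts['authors'])}. "
        f"{sentence_case(parts['title'])}. "
        f"{parts['journal']}. "
        f"{parts['year']};{parts['volume'].strip()}{issue}:"
        f"{expand_pages(parts['pages'])}."
    )


if __name__ == "__main__":
    example = (
        "Reich K, Warren RB, Lebwohl M, Gooderham M, Strober B, Langley RG, "
        "Paul C, De Cuyper D, Vanvoorden V, Madden C, Cioffi C, Peterson L, "
        "Blauvelt A. Bimekizumab versus Secukinumab in Plaque Psoriasis. "
        "N Engl J Med. 2021 Jul 8;385(2):142-152. doi: 10.1056/NEJMoa2102383. "
        "Epub 2021 Apr 23. PMID: 33891380."
    )
    print(format_pubmed_reference(example))
```

Titles that end in a question mark, author names containing a full stop, and references without a volume or page range would need extra branches in the pattern; those cases are outside what this sketch assumes.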