Extracting text and comments from a Google Doc with Python

41 Views Asked by Soham Deshpande At 04 March 2024 at 01:30

I need help extracting the comments from one of my Google Docs. Basically, I want to get both the text that was commented on and the content of the comment box. For example, if I commented "This is out of place" on the sentence "Hello World", I want to get both texts. If it isn't possible to get both, the content of the comment box matters more. The code I have so far is this:

def read_comments(comments):
    comment_text = ''
    for comment in comments:
        comment_text += comment['content']
    return comment_text

def main():
    credentials = get_credentials()
    http = credentials.authorize(Http())
    docs_service = discovery.build(
        'docs', 'v1', http=http, discoveryServiceUrl=DISCOVERY_DOC)
    
    doc = docs_service.documents().get(documentId=DOCUMENT_ID_2).execute()
    doc_content = doc.get('body').get('content')

    comments = docs_service.documents().get(documentId=DOCUMENT_ID_2).execute().get('comments', [])
    comments_text = read_comments(comments)

    print(comments_text)

    sentences = sent_tokenize(comments_text)
    for sentence in sentences:
        sentence = "{This is a PB}" + sentence + "{This is a PB}"
        print(sentence)

if __name__ == '__main__':
    main()

When I run this, I get no error, but nothing is returned: the comments list is empty.

There is 1 best solution below

msamsami On 04 March 2024 at 03:25

You need to use the Google Drive API to fetch the comments of a Google Doc. Comments are not part of the document's content; they are metadata attached to the file, so the Docs API's documents().get() response has no 'comments' field, which is why your list is always empty. Here is a modified script that uses the Drive API to fetch each comment's content and the quoted file content:

def main():
    credentials = get_credentials()
    http = credentials.authorize(Http())
    # Build a Drive v3 client. The discoveryServiceUrl argument is dropped here
    # because DISCOVERY_DOC in the question presumably points at the Docs API.
    gdrive_service = discovery.build("drive", "v3", http=http)

    # A Google Doc's document ID is also its Drive file ID.
    results = gdrive_service.comments().list(
        fileId=DOCUMENT_ID_2, fields="*"
    ).execute()
    comments = results.get("comments", [])

    # Now, each item in `comments` is a dictionary, with the following fields:
    # 'content', 'quotedFileContent', 'replies', 'author', 'deleted', 'htmlContent', ...
    # The 'content' field contains the comment text
    # The 'quotedFileContent' field is a dictionary whose 'value' key holds
    # the text that was commented on

    comments_text = read_comments(comments)

    # Rest of the code
    ...
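
If you want both pieces of text side by side, a small helper can pair them up. This is only a sketch: `extract_comment_pairs` is a name I made up, and the sample dictionaries are invented to mirror the shape of a Drive API v3 comment (where `quotedFileContent` is a dictionary whose `value` key holds the commented-on text, and the key may be missing for comments that are not anchored to a selection).

```python
def extract_comment_pairs(comments):
    """Return (quoted_text, comment_text) tuples for non-deleted comments."""
    pairs = []
    for comment in comments:
        # Skip comments flagged as deleted, in case they were requested.
        if comment.get("deleted"):
            continue
        # 'quotedFileContent' may be absent, so fall back to an empty string.
        quoted = comment.get("quotedFileContent", {}).get("value", "")
        pairs.append((quoted, comment.get("content", "")))
    return pairs


# Made-up sample data shaped like Drive API v3 comment resources.
sample_comments = [
    {
        "content": "This is out of place",
        "quotedFileContent": {"mimeType": "text/html", "value": "Hello World"},
    },
    {"content": "General remark on the whole file"},
    {"content": "", "deleted": True},
]

for quoted, text in extract_comment_pairs(sample_comments):
    print(f"{quoted!r} -> {text!r}")
```

In the script above you would call it as `extract_comment_pairs(comments)` instead of, or alongside, `read_comments(comments)`.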

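Note that the Google Drive API must be enabled for your project, and your credentials need a Drive scope (for example https://www.googleapis.com/auth/drive.readonly), since a Docs-only scope will not authorize this call. If you authenticate with a service account, the document must also be shared with the service account's email address.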
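
One more thing to watch: `comments.list` is paginated, so a document with many comments comes back in several pages. A rough sketch of following `nextPageToken` is below. `list_all_comments` is a hypothetical helper name, and `service` is assumed to be a Drive v3 client built with `discovery.build("drive", "v3", ...)`.

```python
def list_all_comments(service, file_id, page_size=100):
    """Collect every comment on a file by following nextPageToken."""
    comments = []
    page_token = None  # None means "start from the first page"
    while True:
        response = service.comments().list(
            fileId=file_id,
            fields="*",
            pageSize=page_size,
            pageToken=page_token,
        ).execute()
        comments.extend(response.get("comments", []))
        # The token is only present when more pages remain.
        page_token = response.get("nextPageToken")
        if not page_token:
            return comments
```

You could then replace the single `list(...).execute()` call with `comments = list_all_comments(gdrive_service, DOCUMENT_ID_2)`.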
