I have hundreds of word documents that needs to be processed but need to organized them first by versions in subfolders.
I basically get a drop of these word documents within a single folder and need to automate the organization moving forward before I get nuts.
So I have a script that basically creates a folder with the same name of the file and moves the file inside that folder, this part is done.
Now I need to go into each subfolder, and get the document version from within the first word page of each document, then create a sub-folder withe version number and move the word file into that subfolder.
The structure should be as follows (taking two folders as examples):
(Folder) Test
(Subfolder) 12.0
Test.docx
(Folder) Test1
(Subfolder) 13.0
Test1.docx
Luckily I was able to figure it out that "doc.paragraphs[6].text" will always return the version information in a single line as follows:
>>> doc.paragraphs[6].text
'Version Number: 12.0'
Would appreciate if someone can point me out to the right direction.
This is the script I have so far:
#!/usr/bin/env python3
import glob, os, shutil, docx, sys
folder = sys.argv[1]
#print(folder)
for file_path in glob.glob(os.path.join(folder, '*.docx')):
new_dir = file_path.rsplit('.', 1)[0]
#print(new_dir)
try:
os.mkdir(os.path.join(folder, new_dir))
except WindowsError:
# Handle the case where the target dir already exist.
pass
shutil.move(file_path, os.path.join(new_dir, os.path.basename(file_path)))
Please see below the complete solution to your requirement.
Note: To know about re.search go through https://www.geeksforgeeks.org/python-regex-re-search-vs-re-findall/
Before execution:
After Execution