Example URL: https://bioconductor.org/packages/release/bioc/VIEWS
Currently I'm splitting the file into individual clumps of package metadata at every blank line, then converting each clump to a dictionary by splitting on the first colon, using the string before it as the key and the string after it as the value. The issue I'm running into is that, as I go line by line through each package's metadata, some lines do not have colons, and I want to append those lines to the previous value so that it forms one complete string.
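import requests

response = requests.get(
    'https://bioconductor.org/packages/release/bioc/VIEWS')
# Each package's metadata is separated from the next by a blank line
package_list = response.text.split('\n\n')
# Split each clump on the first colon only: key before, value after
package_dict = {
    package.split(':', 1)[0]: package.split(':', 1)[1]
    for package in package_list if ':' in package
}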
Try using regex to parse the data:
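The original snippet for this answer is not preserved here, so what follows is only a minimal sketch of one regex-based approach, not necessarily the code the answer used. It assumes the file follows the usual `Key: value` layout in which continuation lines are indented with spaces or tabs; an unindented line without a colon would be skipped by this pattern. The names `FIELD_RE`, `parse_views`, and the sample text are made up for illustration.

```python
import re

# Matches "Key: value" at the start of a line, plus any following
# indented continuation lines (assumed layout, see the note above).
FIELD_RE = re.compile(r"^([^\s:]+):[ \t]*(.*(?:\n[ \t]+.*)*)", re.MULTILINE)


def parse_views(text):
    """Return one dict per blank-line-separated record."""
    records = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for key, value in FIELD_RE.findall(chunk):
            # Join continuation lines onto the value as a single string.
            fields[key] = re.sub(r"\s*\n\s*", " ", value).strip()
        if fields:
            records.append(fields)
    return records


# Made-up sample in the same layout, not real Bioconductor data.
sample = """Package: demoPkg
Version: 1.0.0
Description: First line of the description
        continued on a second line with a URL: https://example.org

Package: otherPkg
Version: 2.1.0
"""

packages = parse_views(sample)
```

Because the key is only matched at the start of an unindented line, colons inside a continuation line (such as the URL in the sample) stay part of the value. To run this against the real file, `parse_views(response.text)` should work with the `response` object from the question, under the same layout assumption.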
Prints: