I'm currently using Apache Nutch to crawl a website. I usually dump the crawled data with this command:
bin/nutch dump -segment crawl/segments -outputDir test_data
However, the folders in the output directory have names like this: a1, af, d3, etc.
I want to configure the crawl so that the folders are named after the website sections, such as "About Us" or "News", instead of "ca" or "d3". Thank you.
I tried editing nutch-site.xml and adding a property, but as a first-timer I lack the know-how to make it work properly.
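To make the goal concrete, here is a rough Python sketch of the URL-to-folder mapping I have in mind. This is not Nutch code or a Nutch setting: `section_folder` is a hypothetical helper, the example.com URLs are placeholders, and it assumes a page's section is simply the first path segment of its URL.

```python
from urllib.parse import urlparse


def section_folder(url: str, default: str = "home") -> str:
    """Return a folder name taken from the first path segment of the URL.

    Assumption: the site section is the first path segment,
    e.g. /about-us/team.html belongs to "about-us".
    """
    path = urlparse(url).path
    # Drop empty pieces produced by leading, trailing, or doubled slashes.
    segments = [s for s in path.split("/") if s]
    # Pages at the site root have no section, so fall back to the default.
    return segments[0] if segments else default


if __name__ == "__main__":
    for url in (
        "https://example.com/about-us/team.html",
        "https://example.com/news/2020/01/post.html",
        "https://example.com/",
    ):
        print(url, "->", section_folder(url))
```

So a page under /about-us/ would end up in an "about-us" folder and a page under /news/ in a "news" folder, rather than in hash-like folders such as "a1" or "d3".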