I am using Python to develop a JSON parser. The idea is to write a JSON file that holds a specific token ($$INSERT_VAR$$). Since this is only a single token whose value(s) I obtain through a command, I think this could be a great environment to learn multiprocessing in Python.
My idea is a parent process that reads from the input file and appends variables to the JSON, launching a child process that runs the command which obtains the values to be written. When a child finishes (SIGCHLD), the parent handles the appropriate data collection from the (correct) child through a pipe and continues with its main read loop.
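Roughly, the mechanism I am picturing looks like this simplified sketch (the file names and the command are placeholders, and this version blocks on each child instead of continuing the read loop, which is what I actually want to avoid):

```python
import subprocess
from multiprocessing import Pipe, Process

TOKEN = "$$INSERT_VAR$$"

def fetch_value(conn):
    # Child: run the external command and send its output back through the pipe.
    # "my_command" is a placeholder for whatever actually produces the value.
    result = subprocess.run(["my_command"], capture_output=True, text=True)
    conn.send(result.stdout.strip())
    conn.close()

def main():
    # "template.json" and "result.json" are placeholder file names.
    with open("template.json") as src, open("result.json", "w") as dst:
        for line in src:
            if TOKEN in line:
                parent_conn, child_conn = Pipe()
                child = Process(target=fetch_value, args=(child_conn,))
                child.start()
                value = parent_conn.recv()  # blocks until the child sends the value
                child.join()
                line = line.replace(TOKEN, value)
            dst.write(line)

if __name__ == "__main__":
    main()
```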
So my first design issue is inserting data into a file. Since that is basically not possible in place, my thought is to write text to the resulting file while there isn't a child fetching values, and to store in a variable the text that follows a token; that text would then be appended to the obtained result and written out to the file as a whole.
My question would be: Which is better?
- Doing many small reads, obtaining the text character by character
- Doing fewer, larger reads (e.g. a line or N characters at a time), which the parent then processes.
I am personally leaning towards the second one, but I would like to know if there are any advantages and/or drawbacks to my approach.
The right size is system-dependent, but reading one character at a time will be the slowest everywhere.
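As a rough illustration (timings will vary by system and file size), you can compare the two approaches on any reasonably large text file; the file name here is just an example:

```python
import time

PATH = "a_text_file.txt"  # any reasonably large text file

def read_by_char(path):
    # Read the file one character at a time.
    with open(path) as f:
        while f.read(1):
            pass

def read_by_line(path):
    # Read the file one line at a time, letting Python buffer underneath.
    with open(path) as f:
        for _ in f:
            pass

for func in (read_by_char, read_by_line):
    start = time.perf_counter()
    func(PATH)
    print(f"{func.__name__}: {time.perf_counter() - start:.3f}s")
```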
The following is a bit rough but might be a useful starting point. It implements enough of the methods to behave like a file and acts as an interface between your file-reading code and the actual file. You can expand the `line.replace` with something a bit more general.
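A minimal sketch of such a wrapper, saved as `script.py` and reading a sample input file `a_text_file.txt` (the replacement value is a fixed placeholder here; in your case it would come from the command):

```python
TOKEN = "$$INSERT_VAR$$"


class ReplacingFile:
    """Wrap a real file and substitute TOKEN in every line that is read."""

    def __init__(self, path, get_value):
        self._file = open(path)
        self._get_value = get_value  # callable producing the replacement value

    def readline(self):
        line = self._file.readline()
        if TOKEN in line:
            # Expand this replace with something more general if needed.
            line = line.replace(TOKEN, self._get_value())
        return line

    def __iter__(self):
        # Enough of the file protocol to allow "for line in f:".
        line = self.readline()
        while line:
            yield line
            line = self.readline()

    def close(self):
        self._file.close()


if __name__ == "__main__":
    # The value is hard-coded for the demo; plug in your command here.
    f = ReplacingFile("a_text_file.txt", lambda: "some_value")
    for line in f:
        print(line, end="")
    f.close()
```

When you run this, the output is the contents of `a_text_file.txt` with every occurrence of the token replaced.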