I have a high-level idea of how arithmetic coding works, but in the sources I've read (https://neptune.ai/blog/lossless-data-compression-using-arithmetic-encoding-in-python-and-its-applications-in-deep-learning and https://go-compression.github.io/algorithms/arithmetic/) it's not clear to me what the relationship is between the frequency table and the message we're compressing.
My understanding is that the frequency table is determined from some universe of messages, and should therefore be representative, in expectation, of the messages you'll encode. Is this understanding correct?
My second question is why characters with larger frequencies (and thus larger probabilities) take up a larger section of the interval. I don't believe this has anything to do with accuracy. Does it somehow speed up the algorithm? Why not give every character the same probability?
I'll start with the second question: no, it has nothing to do with speed. Using the probabilities is the very thing that does the compression!
The main idea of compression is to use few bits for common messages and many bits for rare ones. Ideally, a message (or symbol) with probability p should cost about -log2(p) bits, so one with 50% probability takes exactly 1 bit. Compression therefore crucially depends on knowing the frequencies.
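As a quick illustration of the -log2(p) rule, here's a small Python sketch; the probability table is made up for the example:

```python
import math

# Made-up probability table, just for the example.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

for sym, p in probs.items():
    # A symbol with probability p ideally costs -log2(p) bits.
    print(f"{sym}: p = {p}, ideal bits = {-math.log2(p):.2f}")

# Average bits per symbol under this model (the Shannon entropy).
entropy = -sum(p * math.log2(p) for p in probs.values())
print(f"average: {entropy:.3f} bits/symbol")
```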
Arithmetic coding implements a nearly ideal code: it produces roughly one bit each time the probability interval is halved. Low-probability symbols shrink the interval more, so they output more bits; high-probability symbols shrink it less, so they output fewer bits. A symbol with probability above 50% may produce no bits at all; it can take several such symbols to emit a single bit!
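To make the interval-halving intuition concrete, here's a toy Python sketch that only does the interval narrowing. It uses plain floats and skips the integer renormalization a real coder needs, so treat it purely as an illustration (the model `probs` is again made up):

```python
import math

# Toy interval narrowing with floats; a real arithmetic coder uses integer
# ranges plus renormalization, so this is only an illustration.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}  # made-up model for the example

# Give each symbol a sub-interval of [0, 1) proportional to its probability.
cum, start = {}, 0.0
for sym, p in probs.items():
    cum[sym] = (start, start + p)
    start += p

def encode_interval(message):
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        s_lo, s_hi = cum[sym]
        # Narrow the current interval to this symbol's slice of it.
        low, high = low + width * s_lo, low + width * s_hi
    return low, high

low, high = encode_interval("aab")
width = high - low
print(f"final interval: [{low}, {high}), width = {width}")
# Each halving of the width costs about one bit, so the message needs
# roughly -log2(width) bits in total.
print(f"approx. bits needed: {-math.log2(width):.2f}")
```

Running this on "aab" gives a width of 0.0625 = 0.5 * 0.5 * 0.25, i.e. about 4 bits, which matches summing -log2(p) over the symbols.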
There are many ways to build a frequency table, but it should match the real distribution of the input as closely as possible: the closer the match, the better the compression. Note that the encoder and decoder must use the same frequency table!
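For your first question, one common approach is a static table counted from the message itself (or from a representative corpus), which then has to be stored or sent alongside the compressed data, or rebuilt adaptively by both sides in lockstep. A minimal sketch, where `build_freq_table` is just a hypothetical helper name:

```python
from collections import Counter

def build_freq_table(message):
    # Count each symbol and normalize the counts into probabilities.
    counts = Counter(message)
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()}

freqs = build_freq_table("abracadabra")
print(freqs)  # e.g. 'a' -> 5/11, 'b' -> 2/11, 'r' -> 2/11, 'c' -> 1/11, 'd' -> 1/11
# The decoder needs this exact table, so it is either transmitted with the
# compressed data or rebuilt adaptively by both encoder and decoder.
```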