I am studying APL and I am facing this problem: in real-life cases it is common to deal with non-rectangular datasets (imagine importing data from a SQL database where a "count(*) ... group by ..." gives different counts for different groups). I have set up an example where I import a time series of currency exchange rates (both with ⎕CSV and ⎕JSON) and I end up with a table with many rows and three columns: the ISO code, the reference date and the rate. I find this table useless in many cases, for instance if one wants to calculate daily return rates. I have managed to transform it into a two-way table with ISO codes in rows and dates in columns. I'd like to know if there are better ways to achieve this (or a better) result.
(d h)←⎕CSV 'dayliesjan.csv' '' (0 1 0 4 0 1) 1 ⍝ trailing 1 splits the header row off into h
This creates table d with rows that look like this:
├───┼──────────┼──────────┤
│FJD│2.451 │2024-01-10│
├───┼──────────┼──────────┤
│FJD│2.4514 │2024-01-29│
├───┼──────────┼──────────┤
│FJD│2.4523 │2024-01-12│
├───┼──────────┼──────────┤
Then
dates←∪d[;3] ⍝ create a vector of unique dates
isos←∪d[;1] ⍝ create a vector of unique isos
hdates←(⊂'---'),dates ⍝ prepare a header row for the two-way table
Print the two-way table:
hdates⍪isos,(≢isos) (≢dates) ⍴ d[{⊃⍵}⌸d[;1 3];2]
┌───┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬
│---│2024-01-02│2024-01-03│2024-01-04│2024-01-05│2024-01-08│2024-01-09│2024-01-10│
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│AFN│76.7668 │76.7351 │77.1061 │76.8808 │77.2931 │77.1918 │77.3973 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│ALL│103.17 │103.2 │103.88 │104.3 │104.47 │104.29 │104.28 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│DZD│147.147 │146.9074 │147.3182 │146.7768 │147.0278 │146.9367 │147.0292 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│AOA│917.42 │914.384 │917.186 │914.427 │916.706 │916.444 │917.425 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│XCD│2.9581 │2.9481 │2.9573 │2.9487 │2.9554 │2.9538 │2.9554 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│ARS│886.1662 │885.2383 │888.5621 │886.6029 │889.3809 │890.4455 │891.496 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│AMD│443.13 │441.54 │443.1 │441.75 │442.66 │442.4 │443.14 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
│AWG│1.9611 │1.9545 │1.9606 │1.9549 │1.9593 │1.9583 │1.9593 │
├───┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼
On such a table I could easily (so I think) calculate daily return rates.
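For instance, taking just the numeric part of that table (the reshape above, before the headers are attached, here named rates for illustration) and assuming the columns come out in chronological order (otherwise sort dates first), a daily-return sketch could be:

```apl
rates←(≢isos) (≢dates) ⍴ d[{⊃⍵}⌸d[;1 3];2] ⍝ bare numeric matrix, no headers
returns←¯1+(1↓[2]rates)÷¯1↓[2]rates        ⍝ rate[t]÷rate[t-1] - 1, column-wise
```

Each column of returns then holds one day's return for every currency.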
Does this make sense? Is there a better way?
Really, all you want is the exchange rate for all combinations of isos and dates. This calls for an outer product:
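A sketch of that outer product, using the isos and dates vectors from the question (combos is a name introduced here for illustration):

```apl
combos←isos ∘., dates ⍝ (≢isos)×(≢dates) matrix; each cell is an (iso date) pair
```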
For each combination, we want the row number in d. From there it is trivial to get the rate:
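A sketch of both steps, reusing d, isos and dates from the question; it assumes every (iso date) combination actually occurs in d (a missing pair would leave an out-of-range index and the rate lookup would fail):

```apl
rows←isos ∘.{d[;1 3]⍳⍺ ⍵} dates ⍝ row of d holding each (iso date) pair
rates←d[rows;2]                 ⍝ index column 2 with the whole matrix of rows
```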
From here, you can add the headers if you want, but that'll just make usage of the table harder.
You can maybe make the code easier to follow by naming the intermediate results:
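For instance (the names are illustrative, and d, isos and dates are as in the question):

```apl
keys  ←⊂⍤1⊢d[;1 3]    ⍝ each row's (iso date) pair as one item
combos←isos ∘., dates ⍝ every (iso date) combination
rows  ←keys⍳combos    ⍝ where each combination occurs in d
rates ←d[rows;2]
```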
Or even by creating a custom lookup function:
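One possible shape for such a function, again assuming d as loaded in the question:

```apl
Rate←{d[d[;1 3]⍳⍺ ⍵;2]}  ⍝ ⍺: ISO code, ⍵: date
'FJD' Rate '2024-01-10'  ⍝ one rate (2.451 in the sample data)
rates←isos ∘.Rate dates  ⍝ or the whole two-way table at once
```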