Flattening Tuples in Haskell (pointfree)

166 Views Asked by At

I've recently picked up Haskell and I'm trying writing a simple program to sort acronyms. Basically, I have a list of acronyms - each acronym is a pair of Strings, for example ("DH", "Diffie-Hellman") - and I want to sort them and print them in the following format (Acronym, Meaning, Index).

For example, if

acronyms = [("ECDLP", "elliptic curve discrete logarithm problem"),
            ("DH", "Diffie-Hellman"),
            ("KDF", "key derivation function")]

then the output should be

("DH","Diffie-Hellman",1)
("ECDLP","Elliptic Curve Discrete Logarithm Problem",2)
("KDF","Key Derivation Function",3)

Currently, the code I have is the following.

main :: IO ()
main =
    mapM_
        print
        (zipWith
             (curry (\((a, b), c) -> (a, b, c)))
             (sort . nub $ map (capitalise <$>) acronyms)
             [1 ..])

Here capitalise is a function that I wrote that capitalises the first character of each word in a string. Overall, this program is working but I would like to get some feedback on how I can improve it. Finally, is there any way I could make this curry (\((a, b), c) -> (a, b, c)) part in the pointfree style?

Thanks

3

There are 3 best solutions below

0
amalloy On

I would start by making it less point-free! That curry is a mess, and accomplishing nothing. Just write a lambda that combines a tuple with an index directly.

main :: IO ()
main = mapM_ print . zipWith output [1 ..] . sort . nub . map (fmap capitalise) $ acronyms
  where output idx (acronym, meaning) = (acronym, meaning, idx)

This is fine, but I think the fmap instance for tuples is surprising enough that, in code I wanted anyone else to read, I'd name that function too:

main :: IO ()
main = mapM_ print . zipWith output [1 ..] . sort . nub . map capitaliseDefinition $ acronyms
  where output idx (acronym, meaning) = (acronym, meaning, idx)
        capitaliseDefinition (acronym, meaning) = (acronym, capitalise meaning)

You could also be more modern, and use traverse_ instead of mapM_. I'm sympathetic to the clingers-on of mapM_, though - the name is more obviously indicative of what it does.

2
Silvio Mayolo On

Code Review

First off, the curry really is just adding complexity. You can write a lambda that takes multiple arguments without it. Rather than

curry (\((a, b), c) -> (a, b, c))

consider

\(a, b) c -> (a, b, c)

Now, you seem to have the idea that everything should be pointfree. That's a neat exercise for recreational programming and can be useful in code golf (where the goal is to shorten the code as much as possible), but when you're writing general-purpose software it's often a poor design choice (where the goal is to make the code as readable as possible). So I'd recommend introducing some local variables and some additional functions to take care of the intermediaries.

We can split the zipping part off into a separate function

enumerateAcronyms :: [(a, b)] -> [(a, b, Int)]
enumerateAcronyms xs = zipWith (\(a, b) c -> (a, b, c)) xs [1..]

Note that, although we use this function as [(String, String)] -> [(String, String, Int)], I write it as [(a, b)] -> [(a, b, Int)]. This more general type provides very strong guarantees about what we do with our input. Namely, we won't modify the first two elements of the tuple; all we will do is move them around and associate integers with them.

The Functor instance for (,) a is weird. It took me a minute to realize that (capitalize <$>) meant "do it to the second element of the tuple". So you should either do so explicitly

\(x, y) -> (x, capitalize y)

or use a name that more clearly matches your intention. In this case, we actually have two choices: Control.Arrow.second or Data.Bifunctor.second. They do the same thing and are simply based on different abstractions (the former abstracts on the function type and the latter on the data structure).

second capitalize

My personal preference is to never use the list-specific functions like map and to always use the general functor fmap. That way, if you want to expand this function to be generic and work over, say, sequences or some other data structure, you have less work to do.

nub is O(n^2), which is fine if you're preserving the order of the list. But if you're sorting the list, we can get it in O(n) by rolling it ourselves. So let's write our own nub.

Likewise, I might replace mapM_, which requires Monad, with the identical traverse_, which only requires Applicative and hence works in more general situations.

Finally, I'm not a huge fan of chaining together long sequences of operations. Some people do it, but I don't like reading my code right-to-left, so rather than

main :: IO ()
main = traverse_ print (enumerateAcronyms (uniqSort $ map (second capitalise) acronyms))

I would probably write

main :: IO ()
main =
  let uniqAcronyms = uniqSort $ map (second capitalise) acronyms
  in traverse_ print (enumerateAcronyms uniqAcronyms)

For a grand total of something like

import Data.Char
import Data.List
import Data.Bifunctor
import Data.Foldable

acronyms :: [(String, String)]
acronyms = [("ECDLP", "elliptic curve discrete logarithm problem"),
            ("DH", "Diffie-Hellman"),
            ("KDF", "key derivation function")]

capitaliseFirst :: String -> String
capitaliseFirst [] = []
capitaliseFirst (x:xs) = toUpper x : xs

capitalise :: String -> String
capitalise = unwords . fmap capitaliseFirst . words

enumerateAcronyms :: [(a, b)] -> [(a, b, Int)]
enumerateAcronyms xs = zipWith (\(a, b) c -> (a, b, c)) xs [1..]

uniqSort :: Ord a => [a] -> [a]
uniqSort = go . sort
    where go [] = []
          go [x] = [x]
          go (x:x':xs)
             | x == x' = go (x':xs)
             | otherwise = x : go (x':xs)

main :: IO ()
main =
  let uniqAcronyms = uniqSort $ map (second capitalise) acronyms
  in traverse_ print (enumerateAcronyms uniqAcronyms)

This is much longer than what you wrote, but it's also going to be much clearer to the average Haskell programmer. There's a lot less to have to keep in your head when reading any particular function, so the code is much easier to break down. If you're coming from Java or some other languages, you've seen the opposite side of this: Code is far too verbose and split across ten files, which makes it hard to see what's going on. But in Haskell, we run into the opposite issue: Code can get too short and snazzy and it gets hard to reason about. There's a happy medium to be reached in the middle.

Pointfree

Now, to answer your initial question: Can we pointfree curry (\((a, b), c) -> (a, b, c))? I don't recommend doing it in production code, as it's definitely not readable, but let's explore. First, as mentioned, the curry isn't necessary. This is just

\(a, b) c -> (a, b, c)

which is

\(a, b) c -> (,,) a b c

then it's just a matter of moving variables around and getting rid of the tuple. c can be eliminated, hence

\(a, b) -> (,,) a b

Now we have a 2-tuple on the left and two arguments on the right. The difference between those is uncurry, hence

\(a, b) -> uncurry (,,) (a, b)

and then we eliminate the last element

uncurry (,,)

but of course, I have no idea what that does at a glance, whereas \(a, b) c -> (a, b, c) is pretty self-explanatory.

In general, we can pointfree any Haskell function provided we have catamorphisms for any data structures used (this is maybe for Maybe, or foldr for lists), as well as some basic quality-of-life functions in the prelude. You can use other tricks like the reader monad or liftA2 to make things easier or shorter, but fundamentally you don't need things like that. It's just a matter of being able to move variables around the various Haskell syntax elements. You convert recursion into fix, pattern matching into catamorphisms, and arguments that are in the "wrong" order into several flip calls.

Lambdabot used to be able to automatically pointfree most Haskell expressions using this systematic approach. I'm not sure if it's still around or not.

0
Daniel Wagner On

Parallel list comprehensions completely eliminate the futzing about with tuples (zipWith, curry, the lambda, and (<$>)), which I think makes it much more readable. Also, since you're sorting anyway, you don't need nub -- simply grouping adjacent equal elements will be enough. So:

{-# Language ParallelListComp #-}

main = mapM_ print
    [ (acronym, capitalise meaning, i)
    | (acronym, meaning):_ <- group (sort acronyms)
    | i <- [1..]
    ]