User-defined tuple-based data constructors

192 Views Asked by At

Me trying to grok The Little MLer again. TLMLer has this SML code

datatype a pizza = Bottom | Topping of (a * (a pizza))
datatype fish = Anchovy | Lox | Tuna

which I've translated as

data PizzaSh a = CrustSh | ToppingSh a (PizzaSh a)
data FishPSh = AnchovyPSh | LoxPSh | TunaPSh

and then an alternative closer to TLMLer perhaps

data PizzaSh2 a = CrustSh2 | ToppingSh2 (a, PizzaSh2 a)

And from each I create a pizza

fpizza1 = ToppingSh AnchovyPSh (ToppingSh TunaPSh (ToppingSh LoxPSh CrustSh))
fpizza2 = ToppingSh2 (AnchovyPSh, ToppingSh2 (LoxPSh, ToppingSh2 (TunaPSh, CrustSh2)))

respectively, which are of type PizzaSh FishPSh and PizzaSh2 FishPSh respectively.

But the second version (which is arguably the closer to the original ML version) seems "offbeat." It's as if I'm creating a 2-tuple when I "cons" toppings together where the second member recursively expands . May I assume the parametric data constructor "function" of PizzaSh2 doesn't literally build a tuple, it's just borrowing the tuple as a cons strategy, correct? Which is preferable in Haskell, PizzaSh or PizzaSh2? As I understand, a tuple (cartesian product) data type will have a single constructor, e.g., data Point a b = Pt a b, not the disjoint union of ored-together (|) constructors. In SML the "*" indicates product, i.e., tuple, but, again, is this just a "tuple-like thing," i.e., it's just a tuple-looking way to cons a pizza together?

2

There are 2 best solutions below

3
chi On BEST ANSWER

In Haskell, we prefer this style:

data PizzaSh a = CrustSh | ToppingSh a (PizzaSh a)

There is no need in Haskell to use a tuple there, since a data constructor like ToppingSh can take multiple arguments.

Using an additional pair as in

data PizzaSh2 a = CrustSh2 | ToppingSh2 (a, PizzaSh2 a)

creates a type which is almost isomorphic to the previous one, but is more cumbersome to handle since it requires to use more parentheses. E.g.

foo (ToppingSh x y)
-- vs
foo (ToppingSh2 (x, y))

bar :: PizzaSh a -> ...
bar (ToppingSh x y) = ....
-- vs
bar (ToppingSh2 (x, y)) = ...

Further, the type is indeed only almost isomorphic. When using an additional pair, because of laziness, we have one more value which can be represented in the type: we have a correpondence

ToppingSh x y     <->    ToppingSh2 (x, y)

which breaks down in the case

???               <->    ToppingSh2 undefined

That is, ToppinggSh2 can be applied to a non-terminating (or otherwise exceptional), pair-valued expression, and that constructs a value which can not be represented using ToppingSh.

Operationally, to achieve that GHC uses a double indirection (roughly pointer-to-pointer, or thunk-returning-pair-of-thunks), which further slows down the code. Hence, it's also a bad choice from a performance point of view, if one cares about such micro-optimizations.

0
Carl On

As far as the Haskell side, it absolutely is nesting a (,) constructor inside the ToppingSh constructor. It would violate Haskell's non-strict semantics to not do the nesting you requested. If the nesting was removed, you'd be unable to distinguish between undefined :: PizzaSh2 () and ToppingSh undefined :: PizzaSh2 (). And yes, most of the time, that isn't what you want. PizzaSh is the much more natural formulation in Haskell unless you have a particular need to be able to introduce another bottom into the evaluation process.

I can't address what's going on behind the scenes in any particular ML implementation. Though I can say that with strict evaluation semantics, there isn't a behavioral difference to observe, meaning compilers are free to use a wider variety of approaches.