Most Pythonic way to handle exception caused by functools.reduce when the iterable yields no elements?


Python's functools.reduce raises a TypeError when the iterable passed to it yields no elements and no initial value is given.

Here's how I currently use it:

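import functools
import pandas

some_list = []  # empty list should be permissible

# Outer-merge every DataFrame in the list on the shared 'idx' column
functools.reduce(
    lambda left, right: pandas.merge(left, right, on=['idx'], how='outer'),
    some_list
)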

This raises a TypeError if the list contains no elements.
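The failure does not depend on pandas. As a minimal sketch, using operator.add as a stand-in for the merge function (an assumption made only to keep the example self-contained), the same error appears with any empty iterable:

```python
from functools import reduce
from operator import add

error = None
try:
    # No elements and no initial value: reduce has nothing to return.
    reduce(add, [])
except TypeError as exc:
    error = exc

# A single element is returned as-is; the function is never called.
single = reduce(add, [5])
```

Here error ends up holding the TypeError and single is 5, so only the completely empty case needs special handling.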

What I actually want it to do is return None if the list is empty. But that can't be achieved by setting the initial value to None because None cannot be merged with a DataFrame type in the call to pandas.merge.

I could wrap this statement in a function and perform a return-early check like so:

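def f(some_list):
    # Return early: reduce raises TypeError on an empty iterable
    if not some_list:
        return None
    return functools.reduce(
        lambda left, right: pandas.merge(left, right, on=['idx'], how='outer'),
        some_list
    )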

But this doesn't seem like a great solution. Is there a more elegant way to do it?

Answer by chepner

You can provide an initial value as the third argument to reduce. The initial value is used in two ways:

  1. If the iterable is empty, reduce simply returns the initial value.
  2. If the iterable is non-empty, reduce behaves as if the initial value were prepended to the iterable, so it becomes the first left-hand argument.

That is, with an initial value x, reduce(f, iterable, x) is effectively the same as reduce(f, itertools.chain([x], iterable))*.

>>> from operator import add
>>> from functools import reduce
>>> reduce(add, [], "bar")
'bar'
>>> reduce(add, ["foo", "bar"])
'foobar'
>>> reduce(add, ["bar"], "foo")
'foobar'
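The equivalence with itertools.chain can be checked directly. This is a small sketch with made-up string values, not anything specific to the DataFrame case:

```python
from functools import reduce
from itertools import chain
from operator import add

values = ["b", "c"]
initial = "a"

# Passing the initial value as the third argument...
with_initial = reduce(add, values, initial)
# ...gives the same result as prepending it to the iterable.
with_chain = reduce(add, chain([initial], values))

# With an empty iterable, both forms fall back to the initial value.
empty_with_initial = reduce(add, [], initial)
empty_with_chain = reduce(add, chain([initial], []))
```

In both pairs the two results are equal: "abc" for the non-empty case and "a" for the empty one.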

In your case, you can provide an empty DataFrame containing the merge column as the initial value. An empty list then yields that empty DataFrame rather than None. For example,

import pandas as pd
from functools import partial, reduce


l = pd.DataFrame(zip([1,2,3], ["foo", "bar", "baz"]), columns=["idx", "name"])
r = pd.DataFrame([4,5,6], columns=["idx"])

identity = pd.DataFrame(columns=["idx"])
merger = partial(pd.merge, on='idx', how='outer')

Then

>>> reduce(merger, [], identity)
Empty DataFrame
Columns: [idx]
Index: []
>>> reduce(merger, [l], identity)
   idx name
0    1  foo
1    2  bar
2    3  baz
>>> reduce(merger, [l])
   idx name
0    1  foo
1    2  bar
2    3  baz
>>> reduce(merger, [r])
   idx
0    4
1    5
2    6
>>> reduce(merger, [r], identity)
   idx
0    4
1    5
2    6
>>> reduce(merger, [l, r], identity)
   idx name
0    1  foo
1    2  bar
2    3  baz
3    4  NaN
4    5  NaN
5    6  NaN
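If None really is the desired result for an empty input, one possible alternative is to pull the first element off the iterator and use it as the initial value. The sketch below is only an illustration: reduce_or_none and _MISSING are hypothetical names, and operator.add stands in for the merge function so that the example runs without pandas.

```python
from functools import reduce
from operator import add

_MISSING = object()  # hypothetical sentinel marking "no first element"


def reduce_or_none(function, iterable):
    """Like reduce without an initial value, but return None if empty."""
    it = iter(iterable)
    first = next(it, _MISSING)
    if first is _MISSING:
        return None
    # The first element seeds the reduction over the remaining elements.
    return reduce(function, it, first)


empty_result = reduce_or_none(add, [])
joined = reduce_or_none(add, ["foo", "bar"])
total = reduce_or_none(add, iter([1, 2, 3]))
```

Because it works on an iterator rather than calling len, this also handles generators. The trade-off is that callers must be prepared for None instead of an empty DataFrame.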

* itertools.chain is basically the iterator version of +. Chaining two iterables produces an iterator that yields the values from the first, followed by the values from the second.
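As a quick illustration of the footnote (the values are arbitrary), chain also accepts inputs that + cannot combine, such as a list and a generator:

```python
from itertools import chain

first = [1, 2]
second = (n * 10 for n in range(3, 5))  # a generator yielding 30, 40

# first + second would raise TypeError (list + generator),
# but chain only needs each argument to be iterable.
combined = chain(first, second)
result = list(combined)
```

Here result is [1, 2, 30, 40]. The chained iterator is lazy, so nothing is consumed from second until the values are actually requested.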