I would like to control the output dtypes for apply on a row. foo and bar below have multiple outputs.

    import pandas as pd

    def foo(x):
        return x['a'] * x['b'], None, x['a'] > x['b']

    def bar(x):
        return x['a'] * x['b'], None

    df = pd.DataFrame([{'a': 10, 'b': 2}, {'a': 10, 'b': 20}])
    df2 = df.copy()
    df[['product', 'dummy', 'greater']] = df.apply(foo, axis=1, result_type='expand')
    df2[['product', 'dummy']] = df2.apply(bar, axis=1, result_type='expand')

The output dtypes are:
| col | df | df2 |
|---|---|---|
| a | int64 | int64 |
| b | int64 | int64 |
| product | int64 | float64 |
| dummy | object | float64 |
| greater | bool | - |
A comment on the question "pandas apply changing dtype" suggests that apply returns a Series with a single dtype. That may be the case with bar, since its outputs can be cast to float. But it doesn't seem to be the case for foo, because then all of its outputs would need to be object.
Is it possible to control the output dtypes of apply? I.e. get/specify the output dtypes (int, object) for bar, or do I need to cast the dtype at the end?
Background: I have a dataframe where the dummy column has values True, False and None and dtype 'object'. The apply function runs on some corner cases, and introduces NaN instead of None. I'm replacing the NaN with None after apply, but it seems overly complicated.
pandas version 1.5.2
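IIUC, you're asking why `product` and `dummy` have different dtypes after applying `foo` and `bar`, even though the values returned by those functions are the same for those new columns?

If so, that's because when `result_type == "expand"`, a specific transformation is done behind the scenes by `infer_to_same_shape`. Roughly, it builds a DataFrame from the per-row results (one column per original row), transposes it, and calls `infer_objects` on the result. For `foo` this gives `int64`, `object` and `bool` columns; for `bar` it gives `float64` for both columns.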
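The transformation described above can be sketched as follows. This is only an approximation of what pandas does internally, not its actual code: `expand_sketch` is a hypothetical helper name, and the sketch assumes the applied function returns a plain tuple for every row.

```python
import pandas as pd

df = pd.DataFrame([{'a': 10, 'b': 2}, {'a': 10, 'b': 20}])

def foo(x):
    return x['a'] * x['b'], None, x['a'] > x['b']

def bar(x):
    return x['a'] * x['b'], None

def expand_sketch(frame, func):
    # Hypothetical helper (not a pandas API): mimics the "expand" step.
    # 1. Call func on every row and collect the tuples, keyed by row label.
    results = {label: func(row) for label, row in frame.iterrows()}
    # 2. Build a frame in which each row's tuple becomes one *column*.
    #    Dtype inference happens here, once per original row.
    wide = pd.DataFrame(results)
    # 3. Transpose so each tuple is a row again, then re-infer the dtypes
    #    of whatever columns are still object.
    return wide.T.infer_objects()

foo_dtypes = expand_sketch(df, foo).dtypes
bar_dtypes = expand_sketch(df, bar).dtypes
print(foo_dtypes)
print(bar_dtypes)
```

In step 2 the tuples from `bar` (an integer and `None`) are inferred as `float64` with `NaN`, whereas the tuples from `foo` also contain a bool and therefore stay `object` until step 3. With the pandas version from the question, this should reproduce the dtypes in the question's table.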
As you can see, `infer_objects` leaves the expanded `bar` result as `float64` for both columns, since it only tries to convert object-dtype columns (if this is unintuitive, see GH28318).

So the output dtypes depend on the computation made by the applied function and the values it returns. In that sense you do have some control, but you can always add `convert_dtypes` or `astype` at the end.
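As a minimal sketch of casting at the end, assuming the target dtypes are `int64` for `product` and `object` for `dummy` as in the question: `astype` fixes the dtypes, but the missing values in `dummy` stay `NaN`, so the last step swaps them for `None` explicitly. The list comprehension is just one possible way to do that.

```python
import pandas as pd

df2 = pd.DataFrame([{'a': 10, 'b': 2}, {'a': 10, 'b': 20}])

def bar(x):
    return x['a'] * x['b'], None

df2[['product', 'dummy']] = df2.apply(bar, axis=1, result_type='expand')

# Cast the new columns to the dtypes we actually want.
df2 = df2.astype({'product': 'int64', 'dummy': 'object'})

# astype(object) keeps NaN as the missing marker, so replace it with None.
df2['dummy'] = pd.Series(
    [None if pd.isna(v) else v for v in df2['dummy']],
    index=df2.index,
    dtype=object,
)

print(df2.dtypes)
```

Passing `dtype=object` when building the Series keeps pandas from inferring `float64` again and turning the `None` values back into `NaN`.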