BUG: Segfault on np.maximum(series, ...) #60611

Open
2 of 3 tasks
ssche opened this issue Dec 27, 2024 · 6 comments
Labels
Bug Regression Functionality that used to work in a prior pandas version ufuncs __array_ufunc__ and __array_function__

Comments

@ssche
Contributor

ssche commented Dec 27, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd
a = [-3.22, 4]
x = pd.Series(a)
np.maximum(x, 0, where=x > 2)

Issue Description

Running the code above causes a segmentation fault (core dumped).

np.maximum(...) enters an infinite recursion that eventually exceeds the maximum stack size.

Call stack (bottom up):

...
array_ufunc, arraylike.py:399
__array_ufunc__, generic.py:2171
array_ufunc, arraylike.py:399
__array_ufunc__, generic.py:2171
array_ufunc, arraylike.py:399
__array_ufunc__, generic.py:2171

__array_ufunc__, generic.py:2171 (core/generic.py):

class NDFrame:
    ...
    @final
    def __array_ufunc__(
        self, ufunc: np.ufunc, method: str, *inputs: Any, **kwargs: Any
    ):
        return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)  # <--

array_ufunc, arraylike.py:399 (core/arraylike.py):


    elif self.ndim == 1:
        # ufunc(series, ...)
        inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
        result = getattr(ufunc, method)(*inputs, **kwargs)  # <--
    else:
        # ufunc(dataframe)
        if method == "__call__" and not kwargs:

Expected Behavior

The code should run without recursion and complete successfully. This used to work in pandas==2.1.1 (and possibly in later versions).

Installed Versions

INSTALLED VERSIONS

commit : 0691c5c
python : 3.13.1
python-bits : 64
OS : Linux
OS-release : 6.12.5-200.fc41.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Sun Dec 15 16:48:23 UTC 2024
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8

pandas : 2.2.3
numpy : 2.2.1
pytz : 2020.4
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : 3.0.11
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : 1.4.2
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
psycopg2 : 2.9.10
pymysql : None
pyarrow : 18.1.0
pyreadstat : None
pytest : 8.3.4
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.1
sqlalchemy : None
tables : 3.10.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlsxwriter : None
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None

@ssche ssche added Bug Needs Triage Issue that has not been reviewed by a pandas team member Regression Functionality that used to work in a prior pandas version ufuncs __array_ufunc__ and __array_function__ labels Dec 27, 2024
@rhshadrach
Member

Thanks for the report. I am not able to get the example working on pandas 2.1.1. Can you post the details of the environment where it works for you?

Versions
INSTALLED VERSIONS
------------------
commit              : e86ed377639948c64c429059127bcf5b359ab6be
python              : 3.11.11.final.0
python-bits         : 64
OS                  : Linux
OS-release          : 6.8.0-49-generic
Version             : #49~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov  6 17:42:15 UTC 2
machine             : x86_64
processor           : x86_64
byteorder           : little
LC_ALL              : None
LANG                : en_US.UTF-8
LOCALE              : en_US.UTF-8

pandas              : 2.1.1
numpy               : 1.26.4
pytz                : 2024.2
dateutil            : 2.9.0.post0
setuptools          : 59.6.0
pip                 : 24.2
Cython              : 3.0.11
pytest              : 8.3.3
hypothesis          : 6.112.1
sphinx              : 8.0.2
blosc               : 1.11.2
feather             : None
xlsxwriter          : 3.2.0
lxml.etree          : 5.3.0
html5lib            : 1.1
pymysql             : 1.4.6
psycopg2            : 2.9.9
jinja2              : 3.1.4
IPython             : 8.27.0
pandas_datareader   : None
bs4                 : 4.12.3
bottleneck          : 1.4.0
dataframe-api-compat: None
fastparquet         : 2024.5.0
fsspec              : 2024.9.0
gcsfs               : 2024.9.0post1
matplotlib          : 3.9.2
numba               : 0.60.0
numexpr             : 2.10.1
odfpy               : None
openpyxl            : 3.1.5
pandas_gbq          : None
pyarrow             : 17.0.0
pyreadstat          : 1.2.7
pyxlsb              : 1.0.10
s3fs                : 2024.9.0
scipy               : 1.14.1
sqlalchemy          : 2.0.35
tables              : 3.10.1
tabulate            : 0.9.0
xarray              : 2024.9.0
xlrd                : 2.0.1
zstandard           : 0.23.0
tzdata              : 2024.1
qtpy                : None
pyqt5               : None

@rhshadrach rhshadrach added the Needs Info Clarification about behavior needed to assess issue label Dec 28, 2024
@ssche
Contributor Author

ssche commented Dec 29, 2024

Interesting. It works for me right away. See this:

>>> import numpy as np
>>> import pandas as pd
>>> a = [-3.22, 4]
>>> x = pd.Series(a)
>>> np.maximum(x, 0, where=x > 2)
0    6.900705e-310
1     4.000000e+00
dtype: float64
>>> 
>>> pd.show_versions()
virtualenv/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit              : e86ed377639948c64c429059127bcf5b359ab6be
python              : 3.11.11.final.0
python-bits         : 64
OS                  : Linux
OS-release          : 6.12.5-200.fc41.x86_64
Version             : #1 SMP PREEMPT_DYNAMIC Sun Dec 15 16:48:23 UTC 2024
machine             : x86_64
processor           : 
byteorder           : little
LC_ALL              : None
LANG                : en_AU.UTF-8
LOCALE              : en_AU.UTF-8

pandas              : 2.1.1
numpy               : 1.24.3
pytz                : 2020.4
dateutil            : 2.8.2
setuptools          : 67.7.2
pip                 : 24.0
Cython              : 0.29.34
pytest              : 7.3.1
hypothesis          : None
sphinx              : None
blosc               : None
feather             : None
xlsxwriter          : 0.9.6
lxml.etree          : None
html5lib            : None
pymysql             : None
psycopg2            : 2.9.6
jinja2              : 2.11.2
IPython             : None
pandas_datareader   : None
bs4                 : None
bottleneck          : 1.3.5
dataframe-api-compat: None
fastparquet         : None
fsspec              : None
gcsfs               : None
matplotlib          : 3.9.2
numba               : None
numexpr             : 2.8.4
odfpy               : None
openpyxl            : 3.1.2
pandas_gbq          : None
pyarrow             : 11.0.0
pyreadstat          : None
pyxlsb              : None
s3fs                : None
scipy               : 1.10.1
sqlalchemy          : 1.3.23
tables              : 3.8.0
tabulate            : None
xarray              : None
xlrd                : 2.0.1
zstandard           : None
tzdata              : 2023.4
qtpy                : None
pyqt5               : None

I'm using numpy 1.24.3, whereas you tried numpy 1.26.4. With numpy 1.26.4, I run into the same issue I described (which is probably what you are seeing in your venv as well).

@ssche ssche removed the Needs Info Clarification about behavior needed to assess issue label Dec 29, 2024
@ssche
Contributor Author

ssche commented Dec 29, 2024

I tested pandas 2.1.1 against several numpy versions. The issue first appears with numpy 1.25.0, so numpy 1.24.4 is the last version that works with pandas 2.1.1.

There were some changes around __array_ufunc__ in numpy 1.25.0 that may have contributed to the regression. One that looks relevant is https://numpy.org/doc/stable/release/1.25.0-notes.html#array-likes-that-define-array-ufunc-can-now-override-ufuncs-if-used-as-where

If the where keyword argument of a numpy.ufunc is a subclass of numpy.ndarray or is a duck type that defines numpy.class.__array_ufunc__ it can override the behavior of the ufunc using the same mechanism as the input and output arguments. Note that for this to work properly, the where.__array_ufunc__ implementation will have to unwrap the where argument to pass it into the default implementation of the ufunc or, for numpy.ndarray subclasses before using super().__array_ufunc__.
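To illustrate the mechanism this release note describes, here is a minimal sketch using a hypothetical duck-typed wrapper (`Wrapped`, standing in for a Series; it is not a pandas API). Its `__array_ufunc__` unwraps the positional inputs, as pandas' `array_ufunc` does with `extract_array(...)`, and also unwraps `where`. If the `where` unwrapping were dropped, the inner ufunc call would still see a `Wrapped` mask and, on numpy >= 1.25, dispatch straight back to the same method, which matches the repeating frames in the stack trace above.

```python
import numpy as np


class Wrapped:
    """Hypothetical minimal duck array wrapping an ndarray (stands in for a Series)."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap positional inputs, analogous to extract_array(...) in pandas.
        inputs = tuple(x.values if isinstance(x, Wrapped) else x for x in inputs)
        # Since numpy 1.25, `where` also takes part in __array_ufunc__ dispatch,
        # so it must be unwrapped too; otherwise the call below re-enters this method.
        where = kwargs.get("where")
        if isinstance(where, Wrapped):
            kwargs["where"] = where.values
        result = getattr(ufunc, method)(*inputs, **kwargs)
        return Wrapped(result)


x = Wrapped([-3.22, 4.0])
# `out` is pre-filled so positions where the mask is False are well defined.
res = np.maximum(x, 0, where=Wrapped(x.values > 2), out=np.zeros(2))
print(res.values)  # [0. 4.]
```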

Indeed, when I pass plain numpy arrays instead of Series for both the first argument and the where mask, the problem goes away:

>>> import numpy as np
>>> import pandas as pd
>>> a = [-3.22, 4]
>>> x = pd.Series(a)
>>> np.maximum(x.values, 0, where=(x > 2).values)
array([0., 4.])
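A caveat on this workaround: the numpy ufunc documentation states that when `out=None`, positions where the `where` condition is False are left uninitialized. That explains the `6.900705e-310` in the pandas 2.1.1 output earlier in this thread, and the `0.` in `array([0., 4.])` is not guaranteed either. A sketch of the same workaround with an initialized `out`, assuming 0 is the desired value for masked-out positions:

```python
import numpy as np
import pandas as pd

x = pd.Series([-3.22, 4])
mask = (x > 2).to_numpy()

# Pre-fill `out` so that positions where `mask` is False hold a defined value
# instead of whatever happened to be in freshly allocated memory.
out = np.zeros(len(x))
np.maximum(x.to_numpy(), 0, where=mask, out=out)

# Re-attach the original index to get a Series back.
result = pd.Series(out, index=x.index)
print(result.tolist())  # [0.0, 4.0]
```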

@rhshadrach
Member

Thanks @ssche - agreed that appears to be it. Further investigations and PRs to fix are welcome!

@rhshadrach rhshadrach removed the Needs Triage Issue that has not been reviewed by a pandas team member label Dec 29, 2024
@ssche
Contributor Author

ssche commented Jan 2, 2025

This discussion in the PR for numpy/numpy#23219 about compatibility with Dask (and downstream libraries in general) may be relevant. I'll try to see whether the arguments passed to __array_ufunc__ differ in a way that reveals a where-call, so that the behaviour can be changed in that case to avoid the recursion.
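One possible signal, based on the code excerpts earlier in the issue: by the time the ufunc is called again, `array_ufunc` has already extracted numpy arrays from the positional inputs, so in the repeated calls the only pandas object left is the `where` mask. A sketch of such a check, written as a hypothetical standalone helper (`is_where_only_dispatch` is not part of pandas):

```python
import numpy as np
import pandas as pd

# Hypothetical helper, not a pandas API.
PANDAS_TYPES = (pd.Series, pd.DataFrame)


def is_where_only_dispatch(inputs, kwargs):
    """Return True if no positional input is a pandas object but `where` is."""
    inputs_are_plain = not any(isinstance(x, PANDAS_TYPES) for x in inputs)
    return inputs_are_plain and isinstance(kwargs.get("where"), PANDAS_TYPES)


mask = pd.Series([False, True])
print(is_where_only_dispatch((np.array([-3.22, 4.0]), 0), {"where": mask}))  # True
print(is_where_only_dispatch((pd.Series([-3.22, 4.0]), 0), {"where": mask}))  # False
```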

@ssche
Contributor Author

ssche commented Jan 6, 2025

Would something like this be a viable starting point for a fix in arraylike.py (see the `if "where" in kwargs and ...` block)?

def array_ufunc(self, ufunc: np.ufunc, method: str, *inputs: Any, **kwargs: Any):
    ...
    if method == "reduce":
        # e.g. test.series.test_ufunc.test_reduce
        result = dispatch_reduction_ufunc(self, ufunc, method, *inputs, **kwargs)
        if result is not NotImplemented:
            return result

    # We still get here with kwargs `axis` for e.g. np.maximum.accumulate
    #  and `dtype` and `keepdims` for np.ptp

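    # Since numpy 1.25, a Series passed as `where` is dispatched back to
    # __array_ufunc__; unwrap it so the ufunc call below does not recurse.
    if "where" in kwargs and isinstance(kwargs["where"], Series):
        kwargs["where"] = kwargs["where"].values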

    if self.ndim > 1 and (len(inputs) > 1 or ufunc.nout > 1):
        # Just give up on preserving types in the complex case.
        # In theory we could preserve them for them.
        # * nout>1 is doable if BlockManager.apply took nout and
        #   returned a Tuple[BlockManager].
        # * len(inputs) > 1 is doable when we know that we have
        #   aligned blocks / dtypes.

        # e.g. my_ufunc, modf, logaddexp, heaviside, subtract, add
        inputs = tuple(np.asarray(x) for x in inputs)
        # Note: we can't use default_array_ufunc here bc reindexing means
        #  that `self` may not be among `inputs`
        result = getattr(ufunc, method)(*inputs, **kwargs)
    ...
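The proposed check only handles a Series mask. A sketch of a slightly broader variant that also covers DataFrame masks, written as a hypothetical standalone helper (`unwrap_where` is not a pandas function). Note that `.to_numpy()` discards the index, so the mask is applied by position rather than aligned by label; whether that is the desired semantics would need to be decided in a PR.

```python
import numpy as np
import pandas as pd


def unwrap_where(kwargs):
    """Hypothetical helper: replace a pandas `where` mask with a plain ndarray.

    The mask is applied positionally afterwards; no index alignment happens.
    """
    where = kwargs.get("where")
    if isinstance(where, (pd.Series, pd.DataFrame)):
        kwargs["where"] = where.to_numpy()
    return kwargs


kwargs = unwrap_where({"where": pd.Series([False, True])})
print(type(kwargs["where"]))  # <class 'numpy.ndarray'>
```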
