I have a dataframe with 19M rows covering about 10K customers and their daily consumption over different date ranges. I resampled this data to weekly consumption, and the resulting dataframe has 2M rows. I want to find the ranges of consecutive dates for each customer and select those with the maximum range. Any ideas? Thank you!
How to select a range of consecutive dates of a dataframe with many users in pandas
111 Views Asked by dogo At
There is 1 best solution below
It would be great if you could post some example code, so the replies will be more specific.
You probably want to do something like
    earliest = df.groupby('Customer_ID')['Consumption_date'].min()

to get the earliest consumption date per customer, and

    latest = df.groupby('Customer_ID')['Consumption_date'].max()

for the latest consumption date, and then take the difference

    time_span = latest - earliest

to get the time span per customer. Knowing the actual dataframe and column names would be great.
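Note that the min/max difference measures the overall span and ignores gaps inside it. If "consecutive" means no missing weeks, one possible approach is to label runs of week-to-week rows per customer and keep the longest run. The following is only a sketch: the column names `Customer_ID` and `Consumption_date` are carried over from the snippet above, the sample data is made up, and it assumes one row per customer per week with dates exactly 7 days apart.

```python
import pandas as pd

# Hypothetical weekly data; the column names are assumptions, not the asker's real schema.
df = pd.DataFrame({
    "Customer_ID": ["A"] * 5 + ["B"] * 4,
    "Consumption_date": pd.to_datetime([
        "2021-01-03", "2021-01-10", "2021-01-24", "2021-01-31", "2021-02-07",
        "2021-01-03", "2021-01-10", "2021-01-17", "2021-02-07",
    ]),
})

df = df.sort_values(["Customer_ID", "Consumption_date"])

# Gap to the previous row of the same customer (NaT for each customer's first row).
gap = df.groupby("Customer_ID")["Consumption_date"].diff()

# A new run starts whenever the gap is not exactly one week (this includes NaT).
df["run_id"] = (gap != pd.Timedelta(weeks=1)).cumsum()

# Summarise each run: first date, last date, and number of weekly rows.
runs = (
    df.groupby(["Customer_ID", "run_id"])["Consumption_date"]
      .agg(start="min", end="max", n_weeks="count")
      .reset_index()
)

# Keep the longest run per customer (ties resolve to the earliest run).
longest = runs.loc[runs.groupby("Customer_ID")["n_weeks"].idxmax()].reset_index(drop=True)
print(longest[["Customer_ID", "start", "end", "n_weeks"]])
```

Everything here is vectorised groupby work, so it should scale to a few million rows, though that has not been measured on the data in question. If the weekly dates are not exactly 7 days apart, the `pd.Timedelta(weeks=1)` comparison would need adjusting.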