ParetoNBDModel
class pymc_marketing.clv.models.pareto_nbd.ParetoNBDModel(data=None, *, model_config=None, sampler_config=None)
Pareto Negative Binomial Model (Pareto/NBD).
Model for continuous, non-contractual customers, first introduced by Schmittlein et al. [1], with additional derivations and predictive methods by Hardie & Fader [2] [3] [4] [5].
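The Pareto/NBD model assumes that each customer's unobserved active lifetime is exponentially distributed, with a dropout rate that follows a Gamma distribution across customers. While a customer is still active, purchases follow a Poisson process whose rate is also Gamma-distributed across customers.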
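These assumptions can be read as a generative process. The following is a minimal simulation sketch, not the library's implementation. It assumes the standard Pareto/NBD formulation from [1]: Gamma-distributed purchase and dropout rates across customers, exponential lifetimes, and Poisson purchasing while active. The function name `simulate_customer` and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def simulate_customer(r, alpha, s, beta, T, rng):
    """Simulate one customer's repeat purchases over a window of length T.

    Assumed setup (standard Pareto/NBD, not taken from the library code):
    purchase rate ~ Gamma(shape=r, rate=alpha),
    dropout rate  ~ Gamma(shape=s, rate=beta).
    Time 0 is the customer's first purchase.
    """
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)
    dropout_rate = rng.gammavariate(s, 1.0 / beta)

    # Unobserved lifetime: the customer can only purchase before this time.
    lifetime = rng.expovariate(dropout_rate)
    active_until = min(lifetime, T)

    # Poisson process: exponential gaps between purchases while active.
    purchase_times = []
    t = rng.expovariate(purchase_rate)
    while t < active_until:
        purchase_times.append(t)
        t += rng.expovariate(purchase_rate)

    return {
        "frequency": len(purchase_times),  # number of repeat purchases
        "recency": purchase_times[-1] if purchase_times else 0.0,
        "T": T,
    }


# Illustrative (hypothetical) parameter values and a fixed seed.
rng = random.Random(42)
sample = [
    simulate_customer(r=0.5, alpha=10.0, s=0.6, beta=12.0, T=40.0, rng=rng)
    for _ in range(5)
]
```

Each simulated row has the same frequency, recency, and T fields the model expects as input, and T >= recency holds by construction.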
This model requires data to be summarized by recency, frequency, and T for each customer, using clv.rfm_summary() or equivalent. Covariates impacting customer dropouts and transaction rates are optional.
Parameters:
- data (DataFrame): DataFrame containing the following columns:
  - customer_id: Unique customer identifier
  - frequency: Number of repeat purchases
  - recency: Time between the first and the last purchase
  - T: Time between the first purchase and the end of the observation period. Model assumptions require T >= recency
  Optional covariate columns may also be included.
- model_config (dict, optional): Dictionary containing model parameters and covariate column names:
  - r: Shape parameter of time between purchases; defaults to Weibull(alpha=2, beta=1)
  - alpha: Scale parameter of time between purchases; defaults to Weibull(alpha=2, beta=10)
  - s: Shape parameter of time until dropout; defaults to Weibull(alpha=2, beta=1)
  - beta: Scale parameter of time until dropout; defaults to Weibull(alpha=2, beta=10)
  - purchase_covariates: Coefficients for purchase rate covariates; defaults to Normal(0, 3)
  - dropout_covariates: Coefficients for dropout covariates; defaults to Normal.dist(0, 3)
  - purchase_covariate_cols: List containing column names of covariates for customer purchase rates.
  - dropout_covariate_cols: List containing column names of covariates for customer dropouts.
  If not provided, the model will use default priors specified in the default_model_config class attribute.
- sampler_config (dict, optional): Dictionary of sampler parameters. Defaults to None.
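The column definitions for data can be illustrated with a small standard-library sketch that summarizes raw transactions into frequency, recency, and T. This is not how `clv.rfm_summary()` is implemented. The helper name `summarize_rfm`, the choice of days as the time unit, and the rule that same-day purchases count once are all assumptions made for this example.

```python
from datetime import date


def summarize_rfm(transactions, observation_end):
    """Summarize (customer_id, purchase_date) pairs into frequency, recency, T (days).

    Assumption for this sketch: purchases on the same day count as one
    purchase, so frequency = number of distinct purchase dates - 1.
    """
    dates_by_customer = {}
    for customer_id, purchase_date in transactions:
        # Ignore anything after the end of the observation period.
        if purchase_date <= observation_end:
            dates_by_customer.setdefault(customer_id, set()).add(purchase_date)

    summary = []
    for customer_id, dates in sorted(dates_by_customer.items()):
        first, last = min(dates), max(dates)
        row = {
            "customer_id": customer_id,
            "frequency": len(dates) - 1,          # repeat purchases only
            "recency": (last - first).days,       # first to last purchase
            "T": (observation_end - first).days,  # first purchase to end of period
        }
        # The model assumes T >= recency for every customer.
        assert row["T"] >= row["recency"]
        summary.append(row)
    return summary


# Hypothetical transaction log for two customers.
transactions = [
    ("a", date(2024, 1, 1)),
    ("a", date(2024, 1, 15)),
    ("a", date(2024, 2, 1)),
    ("b", date(2024, 1, 10)),
]
rows = summarize_rfm(transactions, observation_end=date(2024, 3, 1))
```

A customer with a single purchase, such as "b" here, gets frequency 0 and recency 0, which is still valid input because T >= recency holds.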
References
[1] David C. Schmittlein, Donald G. Morrison and Richard Colombo. “Counting Your Customers: Who Are They and What Will They Do Next”. Management Science, Vol. 33, No. 1 (Jan., 1987), pp. 1-24.
[2] Fader, Peter & Hardie, Bruce G. S. (2005). “A Note on Deriving the Pareto/NBD Model and Related Expressions”. http://brucehardie.com/notes/009/pareto_nbd_derivations_2005-11-05.pdf
[3] Fader, Peter & Hardie, Bruce G. S. (2014). “Additional Results for the Pareto/NBD Model”. https://www.brucehardie.com/notes/015/additional_pareto_nbd_results.pdf
[4] Fader, Peter & Hardie, Bruce G. S. (2014). “Deriving the Conditional PMF of the Pareto/NBD Model”. https://www.brucehardie.com/notes/028/pareto_nbd_conditional_pmf.pdf
[5] Fader, Peter & Hardie, Bruce G. S. (2007). “Incorporating Time-Invariant Covariates into the Pareto/NBD and BG/NBD Models”. https://www.brucehardie.com/notes/019/time_invariant_covariates.pdf
Examples
    import pandas as pd
    import pymc as pm
    from pymc_extras.prior import Prior

    from pymc_marketing.clv import ParetoNBDModel, rfm_summary

    rfm_df = rfm_summary(raw_data, "id_col_name", "date_col_name")

    # Initialize model; the `model_config` parameter is optional
    model = ParetoNBDModel(
        model_config={
            "r": Prior("Weibull", alpha=2, beta=1),
            "alpha": Prior("Weibull", alpha=2, beta=10),
            "s": Prior("Weibull", alpha=2, beta=1),
            "beta": Prior("Weibull", alpha=2, beta=10),
        },
    )

    # Fit model quickly to large datasets via the default Maximum a Posteriori method
    model.fit(data=rfm_df, method="map")
    print(model.fit_summary())

    # Use 'demz' for more informative predictions and reliable performance on smaller datasets
    model.fit(data=rfm_df, method="demz")
    print(model.fit_summary())

    # Predict number of purchases for customers over the next 10 time periods
    expected_purchases = model.expected_purchases(
        data=rfm_df,
        future_t=10,
    )

    # Predict probability of customer making 'n' purchases over 't' time periods
    # Data parameter is omitted here because predictions are run on the original dataset
    expected_num_purchases = model.expected_purchase_probability(
        n=[0, 1, 2, 3],
        future_t=[10, 20, 30, 40],
    )

    new_data = pd.DataFrame(
        data={
            "customer_id": [0, 1, 2, 3],
            "frequency": [5, 2, 1, 8],
            "recency": [7, 4, 2.5, 11],
            "T": [10, 8, 10, 22],
        }
    )

    # Predict probability customers will still be active in 'future_t' time periods
    probability_alive = model.expected_probability_alive(
        data=new_data,
        future_t=[0, 3, 6, 9],
    )

    # Predict number of purchases for a new customer over 't' time periods
    expected_purchases_new_customer = model.expected_purchases_new_customer(
        t=[2, 5, 7, 10],
    )
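The last call in the example predicts purchases for a new customer. For intuition, the quantity has a closed form for fixed parameter values, derived in [2]: E[X(t)] = r·beta / (alpha·(s - 1)) · [1 - (beta / (beta + t))^(s - 1)]. The sketch below evaluates that expression with the standard library only. It is not the library's implementation, which works with fitted posterior values rather than single point values. The function name `pareto_nbd_expected_purchases` and the parameter values are hypothetical.

```python
import math


def pareto_nbd_expected_purchases(r, alpha, s, beta, t):
    """Expected purchases in (0, t] for a new customer, for point parameter values.

    Closed form from the Pareto/NBD derivations in reference [2]; the
    s == 1 branch is the limiting case of the general expression.
    """
    if math.isclose(s, 1.0):
        # Limit as s -> 1: (r * beta / alpha) * ln((beta + t) / beta)
        return (r * beta / alpha) * math.log((beta + t) / beta)
    decay = (beta / (beta + t)) ** (s - 1.0)
    return (r * beta) / (alpha * (s - 1.0)) * (1.0 - decay)


# Illustrative (hypothetical) parameter values, evaluated at several horizons.
horizons = [2, 5, 7, 10]
expected = [
    pareto_nbd_expected_purchases(r=0.5, alpha=10.0, s=0.6, beta=12.0, t=t)
    for t in horizons
]
```

The expected count starts at zero for t = 0 and grows with t, but more slowly than a constant purchase rate would imply, because some customers drop out along the way.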
Methods
ParetoNBDModel.__init__([data, ...]): Initialize model configuration and sampler configuration for the model.
Convert the model configuration and sampler configuration from the attributes to keyword arguments.
Build the model from the InferenceData object.
ParetoNBDModel.build_model([data]): Build the model.
Create attributes for the inference data.
Compute posterior predictive samples of dropout, purchase rate and frequency/recency of new customers.
Sample from the Gamma distribution representing dropout times for new customers.
ParetoNBDModel.distribution_new_customer_purchase_rate([...]): Sample from the Gamma distribution representing purchase rates for new customers.
ParetoNBDModel.distribution_new_customer_recency_frequency([...]): Pareto/NBD process representing purchases across the customer population.
ParetoNBDModel.expected_probability_alive: Compute expected probability of being alive.
ParetoNBDModel.expected_purchase_probability: Compute expected probability of n_purchases over future_t time periods.
ParetoNBDModel.expected_purchases([data, ...]): Compute expected number of future purchases.
ParetoNBDModel.expected_purchases_new_customer: Compute the expected number of purchases for a new customer across t time periods.
ParetoNBDModel.fit([data, method, fit_method]): Infer posteriors of model parameters to run predictions.
ParetoNBDModel.fit_summary(**kwargs): Compute the summary of the fit result.
ParetoNBDModel.graphviz(**kwargs): Get the graphviz representation of the model.
Create the initialization kwargs from an InferenceData object.
ParetoNBDModel.load(fname[, check]): Create a ModelBuilder instance from a file.
ParetoNBDModel.load_from_idata(idata[, check]): Create a ModelBuilder instance from an InferenceData object.
ParetoNBDModel.save(fname, **kwargs): Save the model's inference data to a file.
ParetoNBDModel.set_idata_attrs([idata]): Set attributes on an InferenceData object.
ParetoNBDModel.table(**model_table_kwargs): Get the summary table of the model.
ParetoNBDModel.thin_fit_result(keep_every): Return a copy of the model with a thinned fit result.
Attributes
covariate_cols: All covariate column names.
default_model_config: Default model configuration.
default_sampler_config: Default sampler configuration.
dropout_covariate_cols: Dropout covariate column names from model_config.
fit_result: Get the posterior fit_result.
id: Generate a unique hash value for the model.
posterior
posterior_predictive
predictions
prior
prior_predictive
purchase_covariate_cols: Purchase covariate column names from model_config.
version
idata
sampler_config
model_config