ParetoNBDModel

class pymc_marketing.clv.models.pareto_nbd.ParetoNBDModel(data=None, *, model_config=None, sampler_config=None)

Pareto Negative Binomial Model (Pareto/NBD).

Model for customers in continuous, non-contractual settings, first introduced by Schmittlein et al. [1], with additional derivations and predictive methods by Fader & Hardie [2] [3] [4] [5].

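The Pareto/NBD model assumes that, while a customer is active, purchases follow a Poisson process with rate λ (so times between purchases are exponentially distributed), and that each customer's active lifetime is exponentially distributed with dropout rate μ. Purchase rates and dropout rates vary across customers according to independent Gamma distributions, λ ~ Gamma(r, alpha) and μ ~ Gamma(s, beta). Mixing over these Gamma distributions gives the model its name: lifetimes follow a Pareto (type II) distribution across the population, and purchase counts while active follow a negative binomial distribution (NBD).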
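To make the generative story concrete, the sketch below simulates customers under the standard formulation of Schmittlein et al. [1]: Gamma-distributed purchase and dropout rates, exponential lifetimes, and Poisson purchasing while active. It uses NumPy only; `simulate_pareto_nbd` is a hypothetical helper written for illustration and is not part of pymc_marketing. Time starts at each customer's first purchase, so `frequency` counts repeat purchases and `recency` is the time of the last one (0 if there were none), matching the data columns described below.

```python
import numpy as np


def simulate_pareto_nbd(r, alpha, s, beta, T, n_customers, seed=0):
    """Simulate (frequency, recency, lifetime) over the window (0, T].

    Hypothetical illustration only; not part of pymc_marketing.
    """
    rng = np.random.default_rng(seed)
    # Heterogeneity across customers: lambda ~ Gamma(shape=r, rate=alpha) and
    # mu ~ Gamma(shape=s, rate=beta). NumPy parameterizes Gamma by scale = 1 / rate.
    lam = rng.gamma(shape=r, scale=1.0 / alpha, size=n_customers)
    mu = rng.gamma(shape=s, scale=1.0 / beta, size=n_customers)
    # Each customer's lifetime is exponential with rate mu.
    lifetime = rng.exponential(scale=1.0 / mu)

    frequency = np.zeros(n_customers, dtype=int)
    recency = np.zeros(n_customers)
    for i in range(n_customers):
        # Purchases can only be observed while alive and before the window ends.
        active_until = min(lifetime[i], T)
        # Poisson process: exponential waiting times between purchases.
        t = rng.exponential(1.0 / lam[i])
        while t < active_until:
            frequency[i] += 1
            recency[i] = t
            t += rng.exponential(1.0 / lam[i])
    return frequency, recency, lifetime
```

A quick sanity check of the Pareto claim: across simulated customers, the share with `lifetime > t` should be close to `(beta / (beta + t)) ** s`, the survival function of the Pareto (type II) distribution obtained by mixing exponential lifetimes over a Gamma-distributed dropout rate.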

This model requires data to be summarized by recency, frequency, and T for each customer, using clv.rfm_summary() or an equivalent. Time-invariant covariates affecting customer dropout and purchase rates are optional.
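The summary itself is simple to compute from a transaction log. The sketch below is a simplified, hypothetical stand-in for clv.rfm_summary(), written with pandas; it is not the library function and does not reproduce its options. It assumes time is measured in days and, following a common convention, counts multiple purchases on the same day as one.

```python
import pandas as pd


def summarize_rfm(transactions, customer_col, date_col, observation_end):
    """Hypothetical helper: per-customer frequency, recency and T in days."""
    df = transactions[[customer_col, date_col]].copy()
    # Truncate timestamps to whole days so same-day purchases collapse to one.
    df[date_col] = pd.to_datetime(df[date_col]).dt.normalize()
    df = df.drop_duplicates()

    dates = df.groupby(customer_col)[date_col]
    first, last, n = dates.min(), dates.max(), dates.count()
    one_day = pd.Timedelta(days=1)
    end = pd.Timestamp(observation_end)
    return pd.DataFrame(
        {
            "customer_id": first.index.to_numpy(),
            # The first purchase is not a repeat purchase.
            "frequency": (n - 1).to_numpy(),
            "recency": ((last - first) / one_day).to_numpy(),
            "T": ((end - first) / one_day).to_numpy(),
        }
    )
```

For example, a customer who bought twice on 2024-01-01 and once on 2024-01-11, observed until 2024-01-31, gets frequency 1, recency 10 and T 30.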

Parameters:

data: DataFrame

DataFrame containing the following columns:

customer_id: Unique customer identifier
frequency: Number of repeat purchases
recency: Time between the first and the last purchase
T: Time between the first purchase and the end of the observation period. Model assumptions require T >= recency.

Along with optional covariate columns.
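Before fitting, it can help to check these requirements explicitly. The function below is a hypothetical pre-flight check written for illustration, not something pymc_marketing provides; it verifies the required columns (plus any covariate columns you name), unique customer IDs, non-negative frequencies, and the T >= recency assumption.

```python
import pandas as pd

REQUIRED_COLUMNS = ("customer_id", "frequency", "recency", "T")


def check_rfm_data(data: pd.DataFrame, covariate_cols=()) -> None:
    """Hypothetical validation helper; raises ValueError on the first problem."""
    missing = [c for c in (*REQUIRED_COLUMNS, *covariate_cols) if c not in data.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    if data["customer_id"].duplicated().any():
        raise ValueError("customer_id values must be unique")
    if (data["frequency"] < 0).any():
        raise ValueError("frequency must be non-negative")
    if (data["T"] < data["recency"]).any():
        raise ValueError("Model assumptions require T >= recency")
```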

model_config: dict, optional

Dictionary containing model parameters and covariate column names:

r: Shape parameter of the Gamma distribution of customer purchase rates; defaults to Weibull(alpha=2, beta=1)
alpha: Scale parameter of the Gamma distribution of customer purchase rates; defaults to Weibull(alpha=2, beta=10)
s: Shape parameter of the Gamma distribution of customer dropout rates; defaults to Weibull(alpha=2, beta=1)
beta: Scale parameter of the Gamma distribution of customer dropout rates; defaults to Weibull(alpha=2, beta=10)
purchase_covariates: Coefficients for purchase rate covariates; defaults to Normal(0, 3)
dropout_covariates: Coefficients for dropout covariates; defaults to Normal(0, 3)
purchase_covariate_cols: List containing column names of covariates for customer purchase rates.
dropout_covariate_cols: List containing column names of covariates for customer dropouts.

If not provided, the model will use default priors specified in the default_model_config class attribute.
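When covariate columns are supplied, Fader & Hardie [5] incorporate time-invariant covariates by making the Gamma scale parameters customer-specific: alpha_i = alpha0 * exp(-gamma1 · x_i) for purchases and beta_i = beta0 * exp(-gamma2 · x_i) for dropout. The NumPy sketch below computes these per-customer values. Whether ParetoNBDModel uses exactly this parameterization internally is an assumption here, and `customer_level_scales` is a hypothetical name. With this sign convention, a positive purchase coefficient lowers alpha_i and therefore raises the expected purchase rate r / alpha_i.

```python
import numpy as np


def customer_level_scales(alpha0, beta0, purchase_coefs, dropout_coefs, purchase_X, dropout_X):
    """Per-customer Gamma scale parameters from time-invariant covariates.

    Sketch of the alpha_i = alpha0 * exp(-x_i @ gamma1) form from reference [5];
    the exact parameterization inside ParetoNBDModel is assumed, not confirmed.
    """
    # Rows of purchase_X / dropout_X are customers, columns are covariates.
    alpha_i = alpha0 * np.exp(-np.asarray(purchase_X) @ np.asarray(purchase_coefs))
    beta_i = beta0 * np.exp(-np.asarray(dropout_X) @ np.asarray(dropout_coefs))
    return alpha_i, beta_i
```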

sampler_config: dict, optional

Dictionary of sampler parameters. Defaults to None.
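As an illustration, a sampler_config might look like the dictionary below. This assumes the entries are forwarded as keyword arguments to pymc.sample() when fitting with MCMC, which is an assumption to verify against the fit documentation; the keys shown are genuine pymc.sample() arguments, and the values are only examples.

```python
# Assumption: these entries are passed through to pymc.sample() during MCMC fitting.
sampler_config = {
    "draws": 1000,  # posterior draws kept per chain
    "tune": 1000,  # warm-up steps per chain, discarded
    "chains": 4,  # number of independent chains
    "target_accept": 0.9,  # NUTS target acceptance rate; higher means smaller steps
    "random_seed": 42,  # for reproducible draws
}
```

It would then be supplied at construction time, for example ParetoNBDModel(sampler_config=sampler_config).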

References

[1] David C. Schmittlein, Donald G. Morrison and Richard Colombo (1987). “Counting Your Customers: Who Are They and What Will They Do Next?”. Management Science, Vol. 33, No. 1 (Jan. 1987), pp. 1-24.

[2] Peter Fader and Bruce G. S. Hardie (2005). “A Note on Deriving the Pareto/NBD Model and Related Expressions”. http://brucehardie.com/notes/009/pareto_nbd_derivations_2005-11-05.pdf

[3] Peter Fader and Bruce G. S. Hardie (2014). “Additional Results for the Pareto/NBD Model”. https://www.brucehardie.com/notes/015/additional_pareto_nbd_results.pdf

[4] Peter Fader and Bruce G. S. Hardie (2014). “Deriving the Conditional PMF of the Pareto/NBD Model”. https://www.brucehardie.com/notes/028/pareto_nbd_conditional_pmf.pdf

[5] Peter Fader and Bruce G. S. Hardie (2007). “Incorporating Time-Invariant Covariates into the Pareto/NBD and BG/NBD Models”. https://www.brucehardie.com/notes/019/time_invariant_covariates.pdf

Examples

import pandas as pd

from pymc_extras.prior import Prior
from pymc_marketing.clv import ParetoNBDModel, rfm_summary

# raw_data is a transaction log with one row per purchase
rfm_df = rfm_summary(raw_data, 'id_col_name', 'date_col_name')

# Initialize the model; the `model_config` parameter is optional
model = ParetoNBDModel(
    model_config={
        "r": Prior("Weibull", alpha=2, beta=1),
        "alpha": Prior("Weibull", alpha=2, beta=10),
        "s": Prior("Weibull", alpha=2, beta=1),
        "beta": Prior("Weibull", alpha=2, beta=10),
    },
)

# Fit the model quickly to large datasets via Maximum a Posteriori (MAP) estimation
model.fit(data=rfm_df, method='map')
print(model.fit_summary())

# Use 'demz' (DEMetropolisZ sampling) to obtain full posterior distributions,
# which give more informative predictions and more reliable results on smaller datasets
model.fit(data=rfm_df, method='demz')
print(model.fit_summary())

# Predict number of purchases for customers over the next 10 time periods
expected_purchases = model.expected_purchases(
    data=rfm_df,
    future_t=10,
)

# Predict the probability of each customer making 'n_purchases' purchases over 'future_t' time periods
# The data parameter is omitted here because predictions are run on the original dataset
purchase_probability = model.expected_purchase_probability(
    n_purchases=[0, 1, 2, 3],
    future_t=[10, 20, 30, 40],
)

new_data = pd.DataFrame(
    data={
        "customer_id": [0, 1, 2, 3],
        "frequency": [5, 2, 1, 8],
        "recency": [7, 4, 2.5, 11],
        "T": [10, 8, 10, 22],
    }
)

# Predict probability customers will still be active in 'future_t' time periods
probability_alive = model.expected_probability_alive(
    data=new_data,
    future_t=[0, 3, 6, 9],
)

# Predict number of purchases for a new customer over 't' time periods.
expected_purchases_new_customer = model.expected_purchases_new_customer(
    t=[2, 5, 7, 10],
)
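For a new customer, the expected number of purchases over t periods has a closed form that follows directly from the model assumptions: conditional on the rates, E[X(t) | λ, μ] = (λ / μ)(1 - exp(-μt)), and averaging over λ ~ Gamma(r, alpha) and μ ~ Gamma(s, beta) gives E[X(t)] = r·beta / (alpha·(s - 1)) · [1 - (beta / (beta + t))^(s - 1)], with the limit (r·beta / alpha)·ln(1 + t / beta) when s = 1. The NumPy sketch below evaluates it for fixed parameter values; `expected_purchases_new_customer_closed_form` is a hypothetical name, and it is an assumption that expected_purchases_new_customer evaluates an equivalent expression across posterior draws.

```python
import numpy as np


def expected_purchases_new_customer_closed_form(r, alpha, s, beta, t):
    """E[X(t)] for a randomly chosen new customer under the Pareto/NBD model."""
    t = np.asarray(t, dtype=float)
    if np.isclose(s, 1.0):
        # Limit of the general expression as s -> 1.
        return r * beta / alpha * np.log1p(t / beta)
    return r * beta / (alpha * (s - 1.0)) * (1.0 - (beta / (beta + t)) ** (s - 1.0))
```

Evaluating it at t=[2, 5, 7, 10] with point estimates of r, alpha, s and beta gives an increasing sequence, since expected purchases accumulate over time while the chance of still being active shrinks.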

Methods

`ParetoNBDModel.__init__`([data, ...])	Initialize model configuration and sampler configuration for the model.
`ParetoNBDModel.attrs_to_init_kwargs`(attrs)	Convert the model configuration and sampler configuration from the attributes to keyword arguments.
`ParetoNBDModel.build_from_idata`(idata)	Build the model from the InferenceData object.
`ParetoNBDModel.build_model`([data])	Build the model.
`ParetoNBDModel.create_idata_attrs`()	Create attributes for the inference data.
`ParetoNBDModel.distribution_new_customer`([...])	Compute posterior predictive samples of dropout, purchase rate and frequency/recency of new customers.
`ParetoNBDModel.distribution_new_customer_dropout`([...])	Sample from the Gamma distribution representing dropout rates for new customers.
`ParetoNBDModel.distribution_new_customer_purchase_rate`([...])	Sample from the Gamma distribution representing purchase rates for new customers.
`ParetoNBDModel.distribution_new_customer_recency_frequency`([...])	Pareto/NBD process representing purchases across the customer population.
`ParetoNBDModel.expected_probability_alive`([...])	Compute expected probability of being alive.
`ParetoNBDModel.expected_purchase_probability`([...])	Compute expected probability of n_purchases over future_t time periods.
`ParetoNBDModel.expected_purchases`([data, ...])	Compute expected number of future purchases.
`ParetoNBDModel.expected_purchases_new_customer`([...])	Compute the expected number of purchases for a new customer across t time periods.
`ParetoNBDModel.fit`([data, method, fit_method])	Infer posteriors of model parameters to run predictions.
`ParetoNBDModel.fit_summary`(**kwargs)	Compute the summary of the fit result.
`ParetoNBDModel.graphviz`(**kwargs)	Get the graphviz representation of the model.
`ParetoNBDModel.idata_to_init_kwargs`(idata)	Create the initialization kwargs from an InferenceData object.
`ParetoNBDModel.load`(fname[, check])	Create a ModelBuilder instance from a file.
`ParetoNBDModel.load_from_idata`(idata[, check])	Create a ModelBuilder instance from an InferenceData object.
`ParetoNBDModel.save`(fname, **kwargs)	Save the model's inference data to a file.
`ParetoNBDModel.set_idata_attrs`([idata])	Set attributes on an InferenceData object.
`ParetoNBDModel.table`(**model_table_kwargs)	Get the summary table of the model.
`ParetoNBDModel.thin_fit_result`(keep_every)	Return a copy of the model with a thinned fit result.

Attributes

`covariate_cols`	All covariate column names.
`default_model_config`	Default model configuration.
`default_sampler_config`	Default sampler configuration.
`dropout_covariate_cols`	Dropout covariate column names from model_config.
`fit_result`	Get the posterior fit_result.
`id`	Generate a unique hash value for the model.
`posterior`
`posterior_predictive`
`predictions`
`prior`
`prior_predictive`
`purchase_covariate_cols`	Purchase covariate column names from model_config.
`version`
`idata`
`sampler_config`
`model_config`