Causal Counter Module
The causal_counter module provides counterfactual explanation generation and evaluation.
Classes
CausalCounterfactual
- class CausalCounterfactual(model, treatment: DataFrame, control: DataFrame, feature: str, label: str)
Bases:
object
Measure a single feature’s causal impact via counterfactual prediction.
The class implements the T-Learner variant of the potential-outcomes framework:
- Treatment model: trained on all features, including the one under study. This represents the factual world.
- Control model: trained on every feature except the one under study. This represents the counterfactual world in which the feature never existed.
Comparing the two models’ predictions yields individual causal effects (ICE), average causal effects (ACE), and classical hypothesis-test statistics for whether the feature matters.
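The two-model comparison described above can be sketched without scikit-learn. This is a minimal illustration under stated assumptions, not the module's implementation: GroupMeanModel is a hypothetical toy estimator that predicts the mean label of matching rows, and the rows, column names, and values are made up.

```python
from collections import defaultdict
from statistics import fmean


class GroupMeanModel:
    """Hypothetical toy estimator: predicts the mean label of rows
    that share the same values for the selected features."""

    def __init__(self, features):
        self.features = list(features)

    def _key(self, row):
        return tuple(row[f] for f in self.features)

    def fit(self, rows, label):
        groups = defaultdict(list)
        for row in rows:
            groups[self._key(row)].append(row[label])
        self.means_ = {key: fmean(vals) for key, vals in groups.items()}
        # Fallback for feature combinations never seen during fit.
        self.default_ = fmean(row[label] for row in rows)
        return self

    def predict(self, rows):
        return [self.means_.get(self._key(r), self.default_) for r in rows]


# Made-up training data.
rows = [
    {"region": "north", "degree": 1, "income": 60.0},
    {"region": "north", "degree": 0, "income": 40.0},
    {"region": "south", "degree": 1, "income": 50.0},
    {"region": "south", "degree": 0, "income": 30.0},
]
feature, label = "degree", "income"
all_features = ["region", "degree"]

# Treatment model sees every feature; control model omits the one under study.
treatment_model = GroupMeanModel(all_features).fit(rows, label)
control_model = GroupMeanModel(
    [f for f in all_features if f != feature]
).fit(rows, label)

# New rows to explain (no label needed for prediction).
X_new = [
    {"region": "north", "degree": 1},
    {"region": "south", "degree": 1},
]

# Individual causal effects: treatment prediction minus control prediction.
ice = [
    t - c
    for t, c in zip(treatment_model.predict(X_new), control_model.predict(X_new))
]
# Average causal effect: mean of the individual effects.
ace = fmean(ice)
```

In this toy data, knowing degree raises the predicted income by 10 for both new rows, so the average effect is also 10.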
- Parameters:
model (sklearn estimator) – An unfitted sklearn-compatible estimator. It is cloned internally, so the caller’s object is never mutated.
treatment (pandas.DataFrame) – Dataset for the treatment arm (all features + label).
control (pandas.DataFrame) – Dataset for the control arm (same column schema as treatment).
feature (str) – The feature whose causal impact is measured.
label (str) – Name of the target / label column (must match in both datasets).
- treatment_model_
The fitted treatment model. None until fit() is called.
- Type:
sklearn estimator
- control_model_
The fitted control model. None until fit() is called.
- Type:
sklearn estimator
Examples
>>> from sklearn.linear_model import LinearRegression
>>> cc = CausalCounterfactual(
...     model=LinearRegression(),
...     treatment=df_treat,
...     control=df_ctrl,
...     feature="age",
...     label="income",
... )
>>> cc.fit()
>>> cc.average_causal_effect(X_new)
1234.56
Attributes:
treatment_model_ (sklearn estimator): The fitted treatment model.
control_model_ (sklearn estimator): The fitted control model.
- fit() → CausalCounterfactual
Fit both the treatment and control models.
- Returns:
self
- Return type:
CausalCounterfactual
- individual_causal_effects(X: DataFrame) → ndarray
Per-sample causal effects: treatment − control.
- Parameters:
X (pandas.DataFrame) – Feature matrix including self.feature.
- Returns:
Per-sample causal effects.
- Return type:
numpy.ndarray
- individual_causal_effects_resid(X: DataFrame) → ndarray
Per-sample residual causal effects: control − treatment.
- Parameters:
X (pandas.DataFrame) – Feature matrix including self.feature.
- Returns:
Per-sample residual causal effects.
- Return type:
numpy.ndarray
- average_causal_effect(X: DataFrame) → float
Mean of the individual causal effects (treatment − control).
- Parameters:
X (pandas.DataFrame) – Feature matrix including self.feature.
- Returns:
Average causal effect.
- Return type:
float
- average_causal_effect_resid(X: DataFrame) → float
Mean of the residual individual causal effects (control − treatment).
- Parameters:
X (pandas.DataFrame) – Feature matrix including self.feature.
- Returns:
Average residual causal effect.
- Return type:
float
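The sign conventions of the four methods above can be illustrated with plain lists. This is a sketch only: the prediction values are made up, and it assumes the residual variants are computed from the same two sets of predictions as the plain ones.

```python
from statistics import fmean

# Hypothetical predictions from the two fitted models for the same rows of X.
treatment_preds = [52.0, 48.0, 61.0]
control_preds = [50.0, 49.0, 55.0]

# individual_causal_effects: treatment − control
ice = [t - c for t, c in zip(treatment_preds, control_preds)]

# individual_causal_effects_resid: control − treatment
ice_resid = [c - t for t, c in zip(treatment_preds, control_preds)]

# average_causal_effect and average_causal_effect_resid: means of the above
ace = fmean(ice)
ace_resid = fmean(ice_resid)
```

Under that assumption, each residual value is the negation of its plain counterpart, so ace and ace_resid differ only in sign.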
- correlation_confidence_interval(alpha: float = 0.05) → Dict[str, float]
Confidence interval for the average causal effect.
Uses a normal approximation with Welch-style variance estimation.
- Parameters:
alpha (float) – Significance level (default 0.05 → 95% CI).
- Returns:
Dictionary with lower_bound and upper_bound.
- Return type:
Dict[str, float]
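One common reading of "normal approximation with Welch-style variance" is sketched below. The exact formula used by the module is not stated here, so treat this as an assumption: the helper name ace_confidence_interval, its inputs (two lists of predictions), and the sample values are all hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def ace_confidence_interval(treatment_preds, control_preds, alpha=0.05):
    """Hypothetical helper: normal-approximation CI for the difference in means,
    using the unpooled (Welch-style) standard error."""
    ace = fmean(treatment_preds) - fmean(control_preds)
    # Welch-style: each arm contributes its own sample variance over its own n.
    se = math.sqrt(
        variance(treatment_preds) / len(treatment_preds)
        + variance(control_preds) / len(control_preds)
    )
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return {"lower_bound": ace - z * se, "upper_bound": ace + z * se}


# Made-up predictions for illustration.
ci = ace_confidence_interval(
    [12.0, 14.0, 16.0, 18.0],
    [10.0, 11.0, 12.0, 13.0],
)
```

A smaller alpha widens the interval; an interval that excludes zero suggests the feature has a non-zero average effect under these assumptions.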
CausalCounterfactualClassifier
- class CausalCounterfactualClassifier(model, treatment: DataFrame, control: DataFrame, feature: str, label: str)
Bases:
CausalCounterfactual
Causal counterfactual analysis specifically tailored for classification models.
This class inherits from CausalCounterfactual and adds methods to evaluate how a feature’s presence affects model predictions and probabilities across multiple classification metrics (e.g., accuracy, ROC AUC, Log Loss, and confusion matrix elements).
- Parameters:
model (sklearn estimator) – An unfitted sklearn-compatible estimator. It is cloned internally.
treatment (pandas.DataFrame) – Dataset for the treatment arm (all features + label).
control (pandas.DataFrame) – Dataset for the control arm (same column schema as treatment).
feature (str) – The feature whose causal impact is measured.
label (str) – Name of the target / label column (must match in both datasets).
- individual_causal_effects_probs(X: DataFrame) → ndarray
Calculate the individual causal effects of the predicted probabilities.
- Parameters:
X (pandas.DataFrame) – X values stored in a dataframe that will be used for the predictions.
- Returns:
An array of individual causal effects for the probabilities.
- Return type:
numpy.ndarray
- average_causal_effect_probs(X: DataFrame) → float
Calculate the average causal effect for the predicted probabilities.
- Parameters:
X (pandas.DataFrame) – X values stored in a dataframe that will be used for the predictions.
- Returns:
The average causal effect for the probabilities.
- Return type:
float
- causal_effect_metric(X: DataFrame, y, metric: str) → float
Measure the impact of the feature on a specific classification metric.
Computes the difference: metric_with_feature - metric_without_feature.
- Parameters:
X (pandas.DataFrame) – The input values used for prediction.
y (numpy.ndarray or pandas.Series or pandas.DataFrame) – The corresponding true labels.
metric (str) – The machine learning metric to account for. Supports: ‘accuracy’, ‘precision’, ‘ppv’, ‘f1-score’, ‘f1_score’, ‘recall’, ‘sensitivity’, ‘tpr’, ‘balanced_accuracy’, ‘balanced-accuracy’, ‘balanced_accuracy_score’, ‘balanced-accuracy-score’, ‘fpr’, ‘fnr’, ‘specificity’, ‘fdr’, ‘youden-index’, ‘youden_index’.
- Returns:
The difference in the metric with vs without the feature.
- Return type:
float
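The difference metric_with_feature - metric_without_feature can be sketched for two of the listed metrics. This is an illustration, not the module's code: the helper names and the binary labels and predictions are made up, and pred_with / pred_without stand in for the treatment and control models' predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """True positives divided by all actual positives (binary labels 0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


def metric_effect(metric_fn, y_true, pred_with, pred_without):
    """Hypothetical helper: metric with the feature minus metric without it."""
    return metric_fn(y_true, pred_with) - metric_fn(y_true, pred_without)


# Made-up labels and predictions.
y_true = [1, 0, 1, 1, 0, 0]
pred_with = [1, 0, 1, 0, 0, 0]      # treatment model (feature included)
pred_without = [1, 1, 0, 0, 0, 0]   # control model (feature excluded)

accuracy_effect = metric_effect(accuracy, y_true, pred_with, pred_without)
recall_effect = metric_effect(recall, y_true, pred_with, pred_without)
```

Here both effects are positive (about 0.33 each), meaning the feature improves accuracy and recall on this toy data.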
- causal_effect_probs_metric(X: DataFrame, y, metric: str) → float
Measure the impact of the feature on a probability-related classification metric.
- Parameters:
X (pandas.DataFrame) – The input values used for prediction.
y (numpy.ndarray or pandas.Series or pandas.DataFrame) – The corresponding true labels.
metric (str) – The machine learning metric to account for. Supports: ‘log-loss’, ‘log_loss’, ‘brier_score’, ‘brier-score’, ‘brier_loss’, ‘brier-loss’, ‘auc_score’, ‘auc-score’, ‘average_precision’, ‘average-precision’, ‘average_precision_score’, ‘average-precision-score’.
- Returns:
The difference in the metric with vs without the feature.
- Return type:
float
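For probability-based metrics, the comparison can be sketched with binary log loss and the Brier score. The direction of the difference is not stated for this method, so the sketch assumes the same with-minus-without convention as causal_effect_metric; the helper names and probabilities are made up.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Made-up labels and positive-class probabilities.
y_true = [1, 0, 1, 0]
probs_with = [0.9, 0.2, 0.8, 0.1]      # treatment model (feature included)
probs_without = [0.6, 0.4, 0.5, 0.5]   # control model (feature excluded)

# Assumed convention: metric with the feature minus metric without it.
log_loss_effect = log_loss(y_true, probs_with) - log_loss(y_true, probs_without)
brier_effect = brier_score(y_true, probs_with) - brier_score(y_true, probs_without)
```

Both metrics are losses, so under this convention a negative value means the feature makes the predicted probabilities better calibrated to the outcomes.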
- causal_effect_loss_metric(X: DataFrame, y, metric: str) → float
Measure the impact of the feature on a loss function.
- Parameters:
X (pandas.DataFrame) – The input values used for prediction.
y (numpy.ndarray or pandas.Series or pandas.DataFrame) – The corresponding true labels.
metric (str) – The loss metric to account for. Supports: ‘hamming_loss’, ‘hamming-loss’, ‘brier_score’, ‘brier-score’, ‘brier_loss’, ‘brier-loss’, ‘log-loss’, ‘log_loss’.
- Returns:
The difference in the loss with vs without the feature.
- Return type:
float
- causal_effect_confusion(X: DataFrame, y, metric: str) → float
Measure the impact of the feature on a confusion matrix element.
- Parameters:
X (pandas.DataFrame) – The input values used for prediction.
y (numpy.ndarray or pandas.Series or pandas.DataFrame) – The corresponding true labels.
metric (str) – The confusion matrix element to compare. Supports: ‘tn’, ‘tp’, ‘fn’, ‘fp’.
- Returns:
The difference (treatment - control) for the given metric.
- Return type:
float
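The treatment − control difference for confusion matrix elements can be sketched as follows. The helper confusion_counts and the labels and predictions are hypothetical, and the sketch assumes binary 0/1 labels.

```python
def confusion_counts(y_true, y_pred):
    """Count TN, FP, FN, and TP for binary labels (0 = negative, 1 = positive)."""
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts


# Made-up labels and predictions.
y_true = [1, 0, 1, 1, 0, 0]
pred_with = [1, 0, 1, 0, 0, 0]      # treatment model (feature included)
pred_without = [1, 1, 0, 0, 0, 0]   # control model (feature excluded)

with_counts = confusion_counts(y_true, pred_with)
without_counts = confusion_counts(y_true, pred_without)

# Difference (treatment − control) for each element.
confusion_effect = {
    key: with_counts[key] - without_counts[key] for key in with_counts
}
```

In this toy case the feature adds one true positive and one true negative while removing one false positive and one false negative.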
- causal_effect_metric_tab(X: DataFrame, y) → DataFrame
Summarize and tabulate classification performance metrics.
- Parameters:
X (pandas.DataFrame) – The input values used for prediction.
y (numpy.ndarray or pandas.Series or pandas.DataFrame) – The corresponding true labels.
- Returns:
A DataFrame comparing classification metrics with and without the feature.
- Return type:
pandas.DataFrame
- causal_effect_loss_metric_tab(X: DataFrame, y) → DataFrame
Summarize and tabulate classification loss metrics.
- Parameters:
X (pandas.DataFrame) – The input values used for prediction.
y (numpy.ndarray or pandas.Series or pandas.DataFrame) – The corresponding true labels.
- Returns:
A DataFrame comparing classification losses with and without the feature.
- Return type:
pandas.DataFrame
- causal_effect_confusion_matrix_tab(X: DataFrame, y) → DataFrame
Summarize and tabulate confusion matrix elements.
- Parameters:
X (pandas.DataFrame) – The input values used for prediction.
y (numpy.ndarray or pandas.Series or pandas.DataFrame) – The corresponding true labels.
- Returns:
A DataFrame comparing TN, FP, FN, and TP counts with and without the feature.
- Return type:
pandas.DataFrame
CausalCounterfactualRegression
- class CausalCounterfactualRegression(model, treatment: DataFrame, control: DataFrame, feature: str, label: str)
Bases:
CausalCounterfactual
Causal counterfactual analysis specifically tailored for regression models.
This class inherits from CausalCounterfactual and adds methods to evaluate how a feature’s presence affects model performance across multiple regression metrics (e.g., R2 score, MSE, MAE).
- Parameters:
model (sklearn estimator) – An unfitted sklearn-compatible estimator. It is cloned internally.
treatment (pandas.DataFrame) – Dataset for the treatment arm (all features + label).
control (pandas.DataFrame) – Dataset for the control arm (same column schema as treatment).
feature (str) – The feature whose causal impact is measured.
label (str) – Name of the target / label column (must match in both datasets).
- causal_effect_metric(X: DataFrame, y, metric: str) → float
Measure the impact of the feature on a specific regression metric.
Computes the difference: metric_without_feature - metric_with_feature.
- Parameters:
X (pandas.DataFrame) – The input values used for prediction.
y (numpy.ndarray or pandas.Series or pandas.DataFrame) – The corresponding true target values.
metric (str) – The machine learning metric to account for. Supports: ‘r2_score’, ‘mean_squared_error’, ‘mean_absolute_error’, ‘mean_squared_log_error’, ‘mean_absolute_percentage_error’, ‘median_absolute_error’, ‘root_mean_squared_error’, ‘root_mean_squared_log_error’.
- Returns:
The difference in the metric with vs without the feature.
- Return type:
float
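The regression difference metric_without_feature - metric_with_feature can be sketched for two error metrics. This is an illustration only: the helper names, targets, and predictions are made up.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def regression_metric_effect(metric_fn, y_true, pred_with, pred_without):
    """Hypothetical helper: metric without the feature minus metric with it."""
    return metric_fn(y_true, pred_without) - metric_fn(y_true, pred_with)


# Made-up targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
pred_with = [3.5, 5.0, 6.5, 9.0]      # treatment model (feature included)
pred_without = [4.0, 6.0, 6.0, 8.0]   # control model (feature excluded)

mse_effect = regression_metric_effect(
    mean_squared_error, y_true, pred_with, pred_without
)
mae_effect = regression_metric_effect(
    mean_absolute_error, y_true, pred_with, pred_without
)
```

With this ordering, a positive value for an error metric means the error is larger without the feature, i.e. the feature helps. For a score such as r2_score, where higher is better, the sign reads the other way.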
- causal_effect_metric_tab(X: DataFrame, y) → DataFrame
Summarize and tabulate all regression metrics.
- Parameters:
X (pandas.DataFrame) – The input values used for prediction.
y (numpy.ndarray or pandas.Series or pandas.DataFrame) – The corresponding true target values.
- Returns:
A DataFrame comparing metrics with and without the feature.
- Return type:
pandas.DataFrame