API Reference

Performs adversarial validation on the train & test datasets provided.
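
As a rough illustration of the idea behind adversarial validation: rows are labelled by origin (train = 0, test = 1), and the question is how well that origin can be predicted from the features. A ROC AUC near 0.5 suggests the two sets are hard to tell apart, while a value far from 0.5 in either direction suggests they differ. The sketch below is a simplified, standard-library stand-in, not the library's implementation: instead of a trained classifier it scores rows by a single feature value, and the helper name `separation_auc` is hypothetical.

```python
from typing import Sequence


def separation_auc(train_values: Sequence[float], test_values: Sequence[float]) -> float:
    """ROC AUC obtained when the raw feature value is used as the score
    for telling test rows (label 1) apart from train rows (label 0).

    This equals the probability that a random test value exceeds a random
    train value, with ties counted as one half.
    """
    if not train_values or not test_values:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for t in test_values:
        for r in train_values:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5
    return wins / (len(train_values) * len(test_values))


# An 'id'-like column: every train id precedes every test id,
# so the origin of a row is fully identifiable from this feature.
train_ids = list(range(0, 100))
test_ids = list(range(100, 150))
id_auc = separation_auc(train_ids, test_ids)

# A feature with the same value mix in both sets: origin is not identifiable.
train_x = [1, 2, 3, 4] * 25
test_x = [1, 2, 3, 4] * 10
x_auc = separation_auc(train_x, test_x)

print(id_auc, x_auc)  # 1.0 0.5
```

A column like `id` in this sketch is the kind of feature the `smart` option describes as having "strongly identifiable properties"; how the library actually detects and prunes such features is not shown here.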

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| trainset | DataFrame | The training dataset. | required |
| testset | DataFrame | The test dataset. | required |
| target | str | The target column name. | None |
| smart | bool | Whether to prune features with strongly identifiable properties. | True |
| n_splits | int | The number of splits to perform. | 5 |
| verbose | bool | Whether to print informative messages to the standard output. | True |
| random_state | Union[int, RandomState] | Seed or RandomState instance used to make output reproducible across multiple function calls. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| dict | dict | A key-value summary of the validation result. |
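
A minimal sketch of how the returned dictionary might be consumed. The three keys used here are taken from the example output on this page; whether the response always contains exactly these keys is an assumption, so the hypothetical helper `summarize` reads them defensively.

```python
def summarize(result: dict) -> str:
    """Turn a validate()-style result into a one-line summary.

    The keys are assumed from the documented example output and are
    read with .get() in case any of them is missing.
    """
    same = result.get("datasets_follow_same_distribution")
    auc = result.get("mean_roc_auc")
    features = result.get("adversarial_features") or []

    if same is None:
        verdict = "unknown"
    elif same:
        verdict = "same distribution"
    else:
        verdict = "different distributions"

    auc_text = f"{auc:.3f}" if isinstance(auc, (int, float)) else "n/a"
    feature_text = ", ".join(map(str, features)) if features else "none"
    return f"{verdict} (mean ROC AUC {auc_text}); adversarial features: {feature_text}"


# The dictionary below copies the example output shown on this page.
example = {
    "datasets_follow_same_distribution": True,
    "mean_roc_auc": 0.5021320833333334,
    "adversarial_features": ["id"],
}
summary = summarize(example)
print(summary)
```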

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the provided parameters fail validation. |
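
One possible way to handle the `ValueError` at a call site is sketched below. Which parameter values actually trigger the error is not stated on this page, so the sketch uses a stand-in function (`fake_validate`, hypothetical) with an invented `n_splits` check purely so that it runs without the library; in real use, `advertion.validate` would be passed in its place.

```python
from typing import Any, Callable, Optional


def run_validation(validate_fn: Callable[..., dict], **kwargs: Any) -> Optional[dict]:
    """Call validate_fn(**kwargs) and return None instead of propagating ValueError."""
    try:
        return validate_fn(**kwargs)
    except ValueError as exc:
        print(f"validation rejected the inputs: {exc}")
        return None


# Stand-in for advertion.validate. The n_splits rule and its message are
# assumptions made for illustration, not the library's documented behaviour.
def fake_validate(trainset, testset, n_splits=5, **_):
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    return {"datasets_follow_same_distribution": True}


ok = run_validation(fake_validate, trainset=[], testset=[])
rejected = run_validation(fake_validate, trainset=[], testset=[], n_splits=1)
```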

Examples:

>>> import pandas as pd
>>> from advertion import validate
>>>
>>> train = pd.read_csv("...")
>>> test = pd.read_csv("...")
>>>
>>> validate(
...     trainset=train,
...     testset=test,
... )
{
    'datasets_follow_same_distribution': True,
    'mean_roc_auc': 0.5021320833333334,
    'adversarial_features': ['id'],
}

Source code in advertion/public.py
def validate(
    trainset: pd.DataFrame,
    testset: pd.DataFrame,
    target: str = None,
    smart: bool = True,
    n_splits: int = 5,
    verbose: bool = True,
    random_state: Union[int, np.random.RandomState] = None,
) -> dict:
    """Performs adversarial validation on the train & test datasets provided.

    Args:
        trainset (pd.DataFrame): The training dataset.
        testset (pd.DataFrame): The test dataset.
        target (str, optional): The target column name. Default is None.
        smart (bool, optional): Whether to prune features with strongly identifiable properties. Default is True.
        n_splits (int, optional): The number of splits to perform. Default is 5.
        verbose (bool, optional): Whether to print informative messages to the standard output. Default is True.
        random_state (Union[int, np.random.RandomState], optional): Seed or RandomState instance used to make \
        output reproducible across multiple function calls. Default is None.

    Returns:
        dict: An informative key-valued response.

    Raises:
        ValueError: If a validation error occurs, based on the provided parameters.

    Examples:
        >>> import pandas as pd
        >>> from advertion import validate
        >>>
        >>> train = pd.read_csv("...")
        >>> test = pd.read_csv("...")
        >>>
        >>> validate(
        ...     trainset=train,
        ...     testset=test,
        ... )
        {
            'datasets_follow_same_distribution': True,
            'mean_roc_auc': 0.5021320833333334,
            'adversarial_features': ['id'],
        }

    """
    return AdversarialValidation(
        smart=smart,
        n_splits=n_splits,
        verbose=verbose,
        random_state=random_state,
    ).perform(trainset=trainset, testset=testset, target=target)