API Reference

advertion.validate(trainset, testset, target, smart=True, n_splits=5, verbose=True, random_state=None)

Performs adversarial validation on the train & test datasets provided.
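
For readers new to the technique, the general idea behind adversarial validation can be sketched as follows: label every row by its dataset of origin, train a classifier to tell the two apart, and read the cross-validated ROC AUC. This is a minimal illustration built on scikit-learn, not advertion's own implementation; the helper name `adversarial_auc`, the choice of a random forest, and the assumption that all feature columns are numeric are illustrative and do not come from this reference.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def adversarial_auc(train, test, target, n_splits=5, random_state=0):
    """Mean ROC AUC of a classifier trained to tell train rows from test rows.

    Hypothetical helper for illustration; assumes numeric feature columns.
    """
    # The target is not a feature; the test set may not contain it at all.
    features = train.drop(columns=[target])
    test_features = test.drop(columns=[target], errors="ignore")[features.columns]

    X = pd.concat([features, test_features], ignore_index=True)
    # Origin label: 0 for rows from the training set, 1 for rows from the test set.
    y = np.concatenate([
        np.zeros(len(features), dtype=int),
        np.ones(len(test_features), dtype=int),
    ])

    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    model = RandomForestClassifier(n_estimators=50, random_state=random_state)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    return float(scores.mean())


# Synthetic data: one test set drawn like the training set, one shifted.
rng = np.random.default_rng(0)
train = pd.DataFrame({"x": rng.normal(size=500), "label": rng.integers(0, 2, size=500)})
same = pd.DataFrame({"x": rng.normal(size=500)})
shifted = pd.DataFrame({"x": rng.normal(loc=3.0, size=500)})

auc_same = adversarial_auc(train, same, target="label")
auc_shifted = adversarial_auc(train, shifted, target="label")
print(f"same distribution: {auc_same:.3f}, shifted: {auc_shifted:.3f}")
```

A mean AUC near 0.5 means the classifier cannot separate the two datasets, which suggests they follow the same distribution; a value close to 1.0 means the rows are easy to tell apart.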

Parameters:

Name          Type                               Default   Description
trainset      pd.DataFrame                       required  The training dataset.
testset       pd.DataFrame                       required  The test dataset.
target        str                                required  The target column name.
smart         bool                               True      Whether to prune features with strongly identifiable properties.
n_splits      int                                5         The number of splits to perform.
verbose       bool                               True      Whether to print informative messages to the standard output.
random_state  Union[int, np.random.RandomState]  None      Set to ensure reproducible output across multiple function calls.

Returns:

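Type  Description
dict  A dictionary summarizing the validation result. The example below shows the keys datasets_follow_same_distribution, mean_roc_auc, and adversarial_features.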

Raises:

Type        Description
ValueError  If the provided parameters fail validation.

Examples:

>>> import pandas as pd
>>> from advertion import validate
>>>
>>> train = pd.read_csv("...")
>>> test = pd.read_csv("...")
>>>
>>> validate(
...     trainset=train,
...     testset=test,
...     target="label",
... )
# {
#     "datasets_follow_same_distribution": True,
#     "mean_roc_auc": 0.5021320833333334,
#     "adversarial_features": ["id"],
# }
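
The returned dictionary can be consumed like any other `dict`. The sketch below formats the three keys shown in the example above into a one-line summary. The helper name `summarize` is hypothetical, and the code relies only on the key names and value types visible in the example; this reference does not define their exact semantics.

```python
def summarize(result: dict) -> str:
    """Turn a validate()-style result dictionary into a one-line summary."""
    verdict = "same" if result["datasets_follow_same_distribution"] else "different"
    # Join the flagged column names, or report "none" when the list is empty.
    flagged = ", ".join(result["adversarial_features"]) or "none"
    return (
        f"distribution: {verdict}; "
        f"mean ROC AUC: {result['mean_roc_auc']:.3f}; "
        f"adversarial features: {flagged}"
    )


# The dictionary from the example above, copied as a literal.
example = {
    "datasets_follow_same_distribution": True,
    "mean_roc_auc": 0.5021320833333334,
    "adversarial_features": ["id"],
}

summary = summarize(example)
print(summary)
# distribution: same; mean ROC AUC: 0.502; adversarial features: id
```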
Source code in advertion/public.py
def validate(
    trainset: pd.DataFrame,
    testset: pd.DataFrame,
    target: str,
    smart: bool = True,
    n_splits: int = 5,
    verbose: bool = True,
    random_state: Union[int, np.random.RandomState] = None,
) -> dict:
    """Performs adversarial validation on the train & test datasets provided.

    Args:
        trainset (pd.DataFrame): The training dataset.
        testset (pd.DataFrame): The test dataset.
        target (str): The target column name.
        smart (bool, optional): Whether to prune features with strongly identifiable properties.
        n_splits (int, optional): The number of splits to perform.
        verbose (bool, optional): Whether to print informative messages to the standard output.
        random_state (Union[int, np.random.RandomState], optional): Set to ensure reproducible output
            across multiple function calls.

    Returns:
        dict: An informative key-valued response.

    Raises:
        ValueError: If a validation error occurs, based on the provided parameters.

    Examples:
        >>> import pandas as pd
        >>> from advertion import validate
        >>>
        >>> train = pd.read_csv("...")
        >>> test = pd.read_csv("...")
        >>>
        >>> validate(
        ...     trainset=train,
        ...     testset=test,
        ...     target="label",
        ... )
        # {
        #     "datasets_follow_same_distribution": True,
        #     "mean_roc_auc": 0.5021320833333334,
        #     "adversarial_features": ["id"],
        # }

    """
    return AdversarialValidation(
        smart=smart,
        n_splits=n_splits,
        verbose=verbose,
        random_state=random_state,
    ).perform(trainset=trainset, testset=testset, target=target)