Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog (strictly from v0.1.1 and onwards - before v0.1.1, format was a bit freestyle), and this project adheres to Semantic Versioning.
[1.0.1] - 2024-12-12
Fixed
Minor issues related to docs and docs dependencies, in order for the https://advertion.readthedocs.io/en/latest/ to be built successfully.
[1.0.0] - 2024-12-12
Changed
validate function had a required parameter called target. This parameter is now optional.
This means that if the trainset you supply does not have a target column, you do not need to pass the target
parameter. On the other hand, if the trainset you supply does have a target column, you can (and should!) denote the
target parameter, in order to be excluded from the adversarial validation process.
At the same time, this release brings some minor upgrades to project dependencies.
[0.1.1] - 2023-07-22
Fixed
-
wrap preprocessing INFO statement, printed to the stdout, under
verbosefunctionality - as expected. This particular statement got printed even whenverbose=Falsewas passed to thevalidatefunction.INFO: Working only with available numerical features, categorical features are not yet supported.
[0.1.0] - 2023-07-22
The first non pre-release of the package. 🎉
v0.1.0 is still considered a beta release, as the API has not been tested extensively across many and diverse datasets. I have tested it with 3 different Kaggle datasets up to this point.
No changes to the functionality are introduced, only the article https://ilias-ant.github.io/blog/adversarial-validation/ is referenced in the README, meant to serve as additional contextual documentation.
[0.1.0-beta] - 2023-07-20
This is considered the beta pre-release version, introducing some minor additions after a bit of personal testing on 2-3 kaggle datasets.
Features:
Passing explicitly a random_state is now propagated to the underlying classifier as well.
Documentation:
Added short README/homepage introduction on the concept of adversarial validation and where this package stands.
Also, added a homemade package logo (available in README + homepage https://advertion.readthedocs.io/en/latest/)
[0.1.0-alpha] - 2023-07-19
This is considered the alpha pre-release version, introducing some backwards-incompatible changes w.r.t. the previous release.
Features:
Response of the main public object, advertion.validate, has changed from bool to dict:
from advertion import validate
train = pd.read_csv("...")
test = pd.read_csv("...")
validate(
trainset=train,
testset=test,
target="label",
)
# // {
# // "datasets_follow_same_distribution": True,
# // 'mean_roc_auc': 0.5021320833333334,
# // "adversarial_features': ['id'],
# // }
Also, upon selecting smart=True (is actually the default case), an improved identification logic of adversarial features has been introduced, based on the Kolmogorov–Smirnov test. Having verbose=True prints to the standard output the statistic value and the p-value of the test for every feature that is deemed as adversarial.
Documentation:
New page on adversarial features: https://advertion.readthedocs.io/en/latest/adversarial-features/. It is also referenced on the standard output when smart=True and verbose=True.
Tests:
Tests have been developed for the package's public interface, reaching 100% test coverage on the project.
CI/CD:
Continuous Integration - enabled through Github Actions - enriched with 2 additional linters:
Also, test suite now runs against the following combinations:
python-version: ['3.8', '3.9', '3.10', '3.11']
os: [ubuntu-latest, macos-latest, windows-latest]
Last but not least, codecov has been introduced.
For more details, see:
.github/workflows/ci.yml
[0.1.0-alpha2] - 2023-07-16
A follow-up, pre-alpha release that introduces continuous documentation capabilities to the project, through MkDocs + readthedocs. Material for MkDocs has been utilized as the theme.
URL: https://advertion.readthedocs.io/en/latest/
No change to the functionality since inaugural pre-release v0.1.0-alpha1.
[0.1.0-alpha1] - 2023-07-16
This inaugural pre-alpha release introduces the core functionality of adversarial validation, exposed to the end user through the following method:
from advertion import validate
train = pd.read_csv("...") # let's say target variable is "label"
test = pd.read_csv("...")
are_similar = validate(
train=train,
test=test,
target="label",
)
# are_similar = True: train and test are following the same underlying distribution.
# are_similar = False: test dataset exhibits a different underlying distribution than train dataset.
- passing
smart=Trueemploys a pruning strategy of design matrix features based on feature importance - this helps remove featutes with strongly identifiable properties such as IDs, timestamps etc. - passing an
n_splitsvalue controls the number of cross-validation folds that take place internally. - passing
verbose=Trueprints to the standard output informative messages on the adversarial validation strategy. - passing a
random_statevalue ensures reproducible output across multiple function calls.