Code intrsospection into autosklearn components#1530
Code intrsospection into autosklearn components#1530eddiebergman wants to merge 92 commits intodevelopmentfrom
Conversation
* only active if kernel == 'poly' * adapt the metadata to reflect this
* black checker * Simplified * add examples to black format check Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>
* re-structure manual and use 'collapse' * ADD link to auto-sklearn-talks * unifying titles * Clarify default memory and cpu usage * FIX sphinx_gallery to <=0.10.0 0.10.1 would raise an error for '-D plot_gallery=0' * Re-structure faq * FIX comments by mfeurer * boldface items * merge manual into FAQ * FIX minor * FIX typo * Update doc/faq.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/faq.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/faq.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/faq.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/manual.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/manual.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/faq.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * FIX link Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com>
If you're only exposure to using... -> If your only exposure to using...
* np.bool deprecation * Invalid escape sequence \_ * Series specify dtype * drop na requires keyword args deprecation * unspecified np.int size deprecated, use int instead * deprecated unspeicifed np.int precision * Element wise comparison failed, will raise error in the future * Specify explicit dtype for empty series * metric warnings for mismatch between y_pred and y_true label count * Quantile transformer n_quantiles larger than n_samples warning ignored * Silenced convergence warnings * pass sklearn args as keywords * np.bool deprecation * Invalid escape sequence \_ * Series specify dtype * drop na requires keyword args deprecation * unspecified np.int size deprecated, use int instead * deprecated unspeicifed np.int precision * Element wise comparison failed, will raise error in the future * Specify explicit dtype for empty series * metric warnings for mismatch between y_pred and y_true label count * Quantile transformer n_quantiles larger than n_samples warning ignored * Silenced convergence warnings * pass sklearn args as keywords * flake8'd * flake8'd * Fixed CategoricalImputation not accounting for sparse matrices * Updated to use distro for linux distribution * Ignore convergence warnings for gaussian process regressor * Averaging metrics now use zero_division parameter * Readded scorers to module scope * flake8'd * Fix * Fixed dtype for metalearner no run * Catch gaussian process iterative fit warning * Moved ignored warnings to tests * Correctly type pd.Series * Revert back to usual iterative fit * Readded missing iteration increment * Removed odd backslash * Fixed imputer for sparse matrices * Ignore warnings we are aware about in tests * Flake'd: * Revert "Fixed imputer for sparse matrices" This reverts commit 05675ad. * Revert "Revert "Fixed imputer for sparse matrices"" This reverts commit d031b0d. * Back to default values * Reverted to default behaviour with comment * Added xfail test to document * flaked * Fixed test, moved to np.testing for assertion * Update autosklearn/pipeline/components/data_preprocessing/categorical_encoding/encoding.py Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de> Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>
* Added manual dispatch to tests * Removed parameters to manual dispatch
…tors (#1332) * Update docstrings and types * doc typo fix * flake'd
* added python 3.10 to versions * Added quotes around versions * Trigger tests
* Add submodule * Port to abstract_ensemble, backend from automl_common * Updated workflow files * Update imports * Trigger actions * Another import fix * update import * m * Backend fixes * Backend parameter update * fixture fix for backend * Fix tests * readd old abstract ensemble for now * flake8'd * Added install from source to readme * Moved installation w.r.t submodules to the docs * Temporarily remove submodule * Readded submodule * Updated to use automl_common under autosklearn * Updated MANIFEST * Removed uneeded statements from MANIFEST * Fixed import * Fixed comment line in MANIFEST.in * Added automl_common/setup.py to MANIFEST * Added prefix to script * Re-added removed title # * Added note for submodule for CONTRIBUTING * Made the submodule step a bit more clear for contributing.md * CONTRIBUTING fixes
* Added versioning for sphinx, docutils - introduced by sphinxtoolbox * Fixed bug with config value for `plot_gallery` in doc makefile * Update linkcheck command as well
* Added ignored_warnings file * Use ignored_warnings file * Test regressors with 1d, 1d as 2d and 2d targets * Flake'd * Fix broken relative imports to ignore_warnings * Removed print and updated parameter type for tests * Type import fix
* Added random state to classifiers * Added some doc strings * Removed random_state again * flake'd * Fix some test issues * Re-added seed to test * Updated test doc for unknown test * flake'd
* Added ignored_warnings file * Use ignored_warnings file * Test regressors with 1d, 1d as 2d and 2d targets * Flake'd * Fix broken relative imports to ignore_warnings * Removed print and updated parameter type for tests * Added warning catches to fit methods in tests * Added more warning catches * Flake'd * Created top-level module to allow relativei imports * Deleted blank line in __init__ * Remove uneeded ignore warnings from tests * Fix bad indent * Fix github merge conflict editor whitespaces and indents
* update workflow files * typo fix * Update pytest * remove bad semi-colon * Fix test runner command * Remove explicit steps required from older version * Explicitly add Conda python to path for subprocess command in test * Fix the mypy compliance check * Added PEP 561 compliance * Add py.typed to MANIFEST for dist * Remove py.typed from setup.py
* rename OSX -> macOS as it is the new name rename OSX -> macOS as it is the new name for the operating system. e.g. see https://www.apple.com/macos * Update doc/installation.rst Co-authored-by: Matthias Feurer <lists@matthiasfeurer.de> * Update doc/installation.rst Co-authored-by: Matthias Feurer <lists@matthiasfeurer.de> Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de> Co-authored-by: Matthias Feurer <lists@matthiasfeurer.de>
…semble (#1321) * Changed show_models() function to return a dictionary of models in the ensemble instead of a string
* Remove flaky dep * Remove unused pytest import
* Fix: MLPRegressor tests * Fix: Ordering of statements in test * Fix: MLP n_calls
* Fix: Raises errors with the config * Add: Skip error for kernal_pca Seems kernel_pca emits the error: * `"zero-size array to reduction operation maximum which has no identity"` This is gotten on the line `max_eig = lambdas.max()` which makes me assume it emits a matrix with no real eigen values, not something we can really control for
…ures (#1250) * Moved to new splitter, moved to util file * flake8'd * Fixed errors, added test specifically for CustomStratifiedShuffleSplit * flake8'd * Updated docstring * Updated types in docstring * reduce_dataset_size_if_too_large supports more types * flake8'd * flake8'd * Updated docstring * Seperated out the data subsampling into individual functions * Improved typing from Automl.fit to reduce_dataset_size_if_too_large * flak8'd * subsample tested * Finished testing and flake8'd * Cleaned up transform function that was touched * ^ * Removed double typing * Cleaned up typing of convert_if_sparse * Cleaned up splitters and added size test * Cleanup doc in data * rogue line added was removed * Test fix * flake8'd * Typo fix * Fixed ordering of things * Fixed typing and tests of target_validator fit, transform, inv_transform * Updated doc * Updated Type return * Removed elif gaurd * removed extraneuous overload * Updated return type of feature validator * Type fixes for target validator fit * flake8'd * Moved to new splitter, moved to util file * flake8'd * Fixed errors, added test specifically for CustomStratifiedShuffleSplit * flake8'd * Updated docstring * Updated types in docstring * reduce_dataset_size_if_too_large supports more types * flake8'd * flake8'd * Updated docstring * Seperated out the data subsampling into individual functions * Improved typing from Automl.fit to reduce_dataset_size_if_too_large * flak8'd * subsample tested * Finished testing and flake8'd * Cleaned up transform function that was touched * ^ * Removed double typing * Cleaned up typing of convert_if_sparse * Cleaned up splitters and added size test * Cleanup doc in data * rogue line added was removed * Test fix * flake8'd * Typo fix * Fixed ordering of things * Fixed typing and tests of target_validator fit, transform, inv_transform * Updated doc * Updated Type return * Removed elif gaurd * removed extraneuous overload * Updated return type of feature validator * Type fixes for target validator fit * flake8'd * Fixed err message str and automl sparse y tests * Flak8'd * Fix sort indices * list type to List * Remove uneeded comment * Updated comment to make it more clear * Comment update * Fixed warning message for reduce_dataset_if_too_large * Fix test * Added check for error message in tests * Test Updates * Fix error msg * reinclude csr y to test * Reintroduced explicit subsample values test * flaked * Missed an uncomment * Update the comment for test of splitters * Updated warning message in CustomSplitter * Update comment in test * Update tests * Removed overloads * Narrowed type of subsample * Removed overload import * Fix `todense` giving np.matrix, using `toarray` * Made subsampling a little less aggresive * Changed multiplier back to 10 * Allow argument to specfiy how auto-sklearn handles compressing dataset size (#1341) * Added dataset_compression parameter and validation * Fix docstring * Updated docstring for `resampling_strategy` * Updated param def and memory_allocation can now be absolute * insert newline * Fix params into one line * fix indentation in docs * fix import breaks * Allow absolute memory_allocation * Tests * Update test on for precision omitted from methods * Update test for akslearn2 with same args * Update to use TypedDict for better Mypy parsing * Added arg to asklearn2 * Updated tests to remove some warnings * flaked * Fix broken link? * Remove TypedDict as it's not supported in Python3.7 * Missing import * Review changes * Fix magic mock for python < 3.9 * Fixed bad merge
* commit meta learning data bases * commit changed files * commit new files * fixed experimental settings * implemented last comments on old PR * adapted metalearning to last commit * add a text preprocessing example * intigrated feedback * new changes on *.csv files * reset changes * add changes for merging * add changes for merging * add changes for merging * try to merge * fixed string representation for metalearning (some sort of hot fix, maybe this needs to be fixed in a bigger scale) * fixed string representation for metalearning (some sort of hot fix, maybe this needs to be fixed in a bigger scale) * fixed string representation for metalearning (some sort of hot fix, maybe this needs to be fixed in a bigger scale) * init * init * commit changes for text preprocessing * text prepreprocessing commit * fix metalearning * fix metalearning * adapted test to new text feature * fix style guide issues * integrate PR comments * integrate PR comments * implemented the comments to the last PR * fitted operation is not in place therefore we have to assgin the fitted self.preprocessor again to it self * add first text processing tests * add first text processing tests * including comments from 01.25. * including comments from 01.28. * including comments from 01.28. * including comments from 01.28. * including comments from 01.31.
* Update FAQ with text stuff * Take suggestions into account
* Push * `fit_ensemble` now has priority for kwargs to take * Change ordering of prefernce for ensemble params * Add TODO note for metrics * Add `metrics` arg to `fit_ensemble` * Add test for pareto front sizes * Remove uneeded file * Re-added tests to `test_pareto_front` * Add descriptions to test files * Add test to ensure argument priority * Add test to make sure X_data only loaded when required * Remove part of test required for performance history * Default to `self._metrics` if `metrics` not available
* Create simple example and doc for naive early stopping * Fix doc, pass through SMAC callbacks directly * Fix `isinstance` check * Add test for early stopping * Fix signature of early stopping example/test * Fix doc build
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 3 to 4. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v3...v4) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 2 to 3. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](actions/download-artifact@v2...v3) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 2 to 3. - [Release notes](https://github.com/codecov/codecov-action/releases) - [Changelog](https://github.com/codecov/codecov-action/blob/master/CHANGELOG.md) - [Commits](codecov/codecov-action@v2...v3) --- updated-dependencies: - dependency-name: codecov/codecov-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 2 to 3. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@v2...v3) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Fix logging server cleanup * Add comment relating to the `try: finally:` * Remove nested try: except: from `fit`
Bumps [peter-evans/find-comment](https://github.com/peter-evans/find-comment) from 1 to 2. - [Release notes](https://github.com/peter-evans/find-comment/releases) - [Commits](peter-evans/find-comment@v1...v2) --- updated-dependencies: - dependency-name: peter-evans/find-comment dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/stale](https://github.com/actions/stale) from 4 to 5. - [Release notes](https://github.com/actions/stale/releases) - [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md) - [Commits](actions/stale@v4...v5) --- updated-dependencies: - dependency-name: actions/stale dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Init commit * Fix logging server cleanup (#1503) * Fix logging server cleanup * Add comment relating to the `try: finally:` * Remove nested try: except: from `fit` * Bump peter-evans/find-comment from 1 to 2 (#1520) Bumps [peter-evans/find-comment](https://github.com/peter-evans/find-comment) from 1 to 2. - [Release notes](https://github.com/peter-evans/find-comment/releases) - [Commits](peter-evans/find-comment@v1...v2) --- updated-dependencies: - dependency-name: peter-evans/find-comment dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump actions/stale from 4 to 5 (#1521) Bumps [actions/stale](https://github.com/actions/stale) from 4 to 5. - [Release notes](https://github.com/actions/stale/releases) - [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md) - [Commits](actions/stale@v4...v5) --- updated-dependencies: - dependency-name: actions/stale dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Init commit * Update evaluation module * Clean up other occurences of the word validation * Re-add test for test predictions Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add debug statements and 30s timeouts * Fix formatting * Update internal timeout param * +timeout, use allocated tmpdir * +timeout, use allocated tmpdir * Remove another occurence of explicit `tmp` * Increase timelimits once again * Remove incomplete comment
* Init commit * Fix DummyClassifiers in _load_pareto_set * Add test for dummy only in classifiers * Update no ensemble docstring * Add automl case where automl only has dummy * Remove tmp file * Fix `include` statement to be regressor
* Create PR * Update MLP regressor values
* Make docker file install from `setup.py` * Add pytest cache to gitignore * Up timeouts on test_metadata_generation
This reverts commit 4f691a1.
…sklearn into document_model_capabilities
This reverts commit b03dddd.
Codecov Report
@@ Coverage Diff @@
## development #1530 +/- ##
===============================================
- Coverage 83.79% 83.75% -0.04%
===============================================
Files 152 154 +2
Lines 11667 11730 +63
Branches 2037 2049 +12
===============================================
+ Hits 9776 9825 +49
- Misses 1343 1359 +16
+ Partials 548 546 -2 |
d813838 to
259ed3d
Compare
auvipy
left a comment
There was a problem hiding this comment.
It is a huge undertake in one go! it would be better to split the changes in smaller reviewable chunks
|
Codecov Report❌ Patch coverage is ❌ Your patch status has failed because the patch coverage (80.32%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## development #1530 +/- ##
===============================================
- Coverage 88.13% 83.75% -4.38%
===============================================
Files 140 154 +14
Lines 11163 11730 +567
Branches 0 2049 +2049
===============================================
- Hits 9839 9825 -14
- Misses 1324 1359 +35
- Partials 0 546 +546 🚀 New features to boost your workflow:
|
This is a first draft in an attempt to address #1429. Redone to get clean history after rebasing.
Sample output:
The
classifiers()returnsdict[str, ClassifierInfo]The same exists for all
classifiers,regressors,data_preprocessorsandfeature_preprocessorsIssues
Data preprocessors are their own beast, there's technically only one datapreprocessor component
FeatTypeSplitThere's a lot of duplication existing already, i.e.
handles_spares = Trueandinput = (SPARSE, ...)inget_properties()..notation instead of["this"].autosklearn.infotoo.I would prefer to do what's below long term, it's a bit nicer to look at. The problem now is that if a user adds a custom component, I would like it to show up when they do
components. This is fixable, just noting that it could be changed.