Deploy a new doc website (#338)

A new documentation website, plus:

* add actions for docs
* update docstrings
* installation instructions for doc development
* unify README and Getting Started
* rename notebook
* doc about `best_model_for_estimator` (#340)
* docstring for `keep_search_state` (#340)
* DNN

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: Z.sk <shaokunzhang@psu.edu>
File: website/docs/Contribute.md (new file, +88 lines)
# Contributing

This project welcomes (and encourages) all forms of contributions, including but not limited to:

- Pushing patches.
- Code review of pull requests.
- Documentation, examples and test cases.
- Readability improvements, e.g., improvements to docstrings and comments.
- Community participation in [issues](https://github.com/microsoft/FLAML/issues), [discussions](https://github.com/microsoft/FLAML/discussions), and [gitter](https://gitter.im/FLAMLer/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge).
- Tutorials, blog posts, and talks that promote the project.
- Sharing application scenarios and/or related research.

You can take a look at the [Roadmap for Upcoming Features](https://github.com/microsoft/FLAML/wiki/Roadmap-for-Upcoming-Features) to identify potential things to work on.
Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit <https://cla.opensource.microsoft.com>.

If you are new to GitHub, [here](https://help.github.com/categories/collaborating-with-issues-and-pull-requests/) is a detailed help source on getting involved with development on GitHub.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Becoming a Reviewer

There is currently no formal reviewer solicitation process. Current reviewers identify new reviewers from among active contributors. If you are willing to become a reviewer, you are welcome to let us know on gitter.
## Developing

### Setup

```bash
git clone https://github.com/microsoft/FLAML.git
cd FLAML
pip install -e ".[test,notebook]"
```

### Docker

We provide a simple [Dockerfile](https://github.com/microsoft/FLAML/blob/main/Dockerfile).

```bash
docker build git://github.com/microsoft/FLAML -t flaml-dev
docker run -it flaml-dev
```
### Develop in a Remote Container

If you use VS Code, you can open the FLAML folder in a [Container](https://code.visualstudio.com/docs/remote/containers).
We have provided the configuration in [devcontainer](https://github.com/microsoft/FLAML/blob/main/.devcontainer).

### Pre-commit

Run `pre-commit install` to install pre-commit into your git hooks. Before you commit, run
`pre-commit run` to check whether you meet the pre-commit requirements. If you use Windows (without WSL) and can't commit after installing pre-commit, you can run `pre-commit uninstall` to remove the hooks; they are expected to work in WSL and Linux.
### Coverage

Any code you commit should not decrease coverage. To run all unit tests:

```bash
coverage run -m pytest test
```

Then you can view the coverage report via
`coverage report -m` or `coverage html`.
If all tests pass, please also run [notebook/automl_classification](https://github.com/microsoft/FLAML/blob/main/notebook/automl_classification.ipynb) to make sure your commit does not break the notebook example.
### Documentation

To build and test documentation locally, install [Node.js](https://nodejs.org/en/download/).

Then:

```console
npm install --global yarn
pip install pydoc-markdown
cd website
yarn install
pydoc-markdown
yarn start
```

The last command starts a local development server and opens up a browser window.
Most changes are reflected live without having to restart the server.
File: website/docs/Examples/AutoML-Classification.md (new file, +62 lines)
# AutoML - Classification

### A basic classification example

```python
from flaml import AutoML
from sklearn.datasets import load_iris

# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 1,  # in seconds
    "metric": 'accuracy',
    "task": 'classification',
    "log_file_name": "iris.log",
}
X_train, y_train = load_iris(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict
print(automl.predict_proba(X_train))
# Print the best model
print(automl.model.estimator)
```
#### Sample output

```
[flaml.automl: 11-12 18:21:44] {1485} INFO - Data split method: stratified
[flaml.automl: 11-12 18:21:44] {1489} INFO - Evaluation method: cv
[flaml.automl: 11-12 18:21:44] {1540} INFO - Minimizing error metric: 1-accuracy
[flaml.automl: 11-12 18:21:44] {1577} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'catboost', 'xgboost', 'extra_tree', 'lrl1']
[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 0, current learner lgbm
[flaml.automl: 11-12 18:21:44] {1944} INFO - Estimated sufficient time budget=1285s. Estimated necessary time budget=23s.
[flaml.automl: 11-12 18:21:44] {2029} INFO - at 0.2s, estimator lgbm's best error=0.0733, best estimator lgbm's best error=0.0733
[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 1, current learner lgbm
[flaml.automl: 11-12 18:21:44] {2029} INFO - at 0.3s, estimator lgbm's best error=0.0733, best estimator lgbm's best error=0.0733
[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 2, current learner lgbm
[flaml.automl: 11-12 18:21:44] {2029} INFO - at 0.4s, estimator lgbm's best error=0.0533, best estimator lgbm's best error=0.0533
[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 3, current learner lgbm
[flaml.automl: 11-12 18:21:44] {2029} INFO - at 0.6s, estimator lgbm's best error=0.0533, best estimator lgbm's best error=0.0533
[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 4, current learner lgbm
[flaml.automl: 11-12 18:21:44] {2029} INFO - at 0.6s, estimator lgbm's best error=0.0533, best estimator lgbm's best error=0.0533
[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 5, current learner xgboost
[flaml.automl: 11-12 18:21:45] {2029} INFO - at 0.9s, estimator xgboost's best error=0.0600, best estimator lgbm's best error=0.0533
[flaml.automl: 11-12 18:21:45] {1826} INFO - iteration 6, current learner lgbm
[flaml.automl: 11-12 18:21:45] {2029} INFO - at 1.0s, estimator lgbm's best error=0.0533, best estimator lgbm's best error=0.0533
[flaml.automl: 11-12 18:21:45] {1826} INFO - iteration 7, current learner extra_tree
[flaml.automl: 11-12 18:21:45] {2029} INFO - at 1.1s, estimator extra_tree's best error=0.0667, best estimator lgbm's best error=0.0533
[flaml.automl: 11-12 18:21:45] {2242} INFO - retrain lgbm for 0.0s
[flaml.automl: 11-12 18:21:45] {2247} INFO - retrained model: LGBMClassifier(learning_rate=0.2677050123105203, max_bin=127,
               min_child_samples=12, n_estimators=4, num_leaves=4,
               reg_alpha=0.001348364934537134, reg_lambda=1.4442580148221913,
               verbose=-1)
[flaml.automl: 11-12 18:21:45] {1608} INFO - fit succeeded
[flaml.automl: 11-12 18:21:45] {1610} INFO - Time taken to find the best model: 0.3756711483001709
```
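The `Minimizing error metric: 1-accuracy` line in the log means every reported "best error" is one minus the validation accuracy. A quick check of the final figure above:

```python
# FLAML minimizes 1 - accuracy for this task, so the final
# "best error=0.0533" corresponds to a CV accuracy of about 94.7%
best_error = 0.0533
accuracy = 1 - best_error
assert round(accuracy, 4) == 0.9467
```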
### A more advanced example including custom learner and metric

[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/flaml_automl.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/flaml_automl.ipynb)
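As a rough sketch of what the notebook's custom metric looks like: a custom metric function returns the value for FLAML to minimize plus a dict of metrics to log. The exact argument list is an assumption here (it can differ across FLAML versions), so check the current docs before relying on it.

```python
def custom_metric(
    X_val, y_val, estimator, labels,
    X_train, y_train, weight_val=None, weight_train=None,
    *args,
):
    # the signature above is an assumption; FLAML calls the metric
    # with validation and training data plus the fitted estimator
    y_pred = estimator.predict(X_val)
    val_acc = sum(p == t for p, t in zip(y_pred, y_val)) / len(y_val)
    # return (value to minimize, dict of metrics to log)
    return 1 - val_acc, {"val_accuracy": val_acc}
```

Such a function would then be passed via `automl.fit(..., metric=custom_metric)`.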
File: website/docs/Examples/AutoML-NLP.md (new file, +89 lines)
# AutoML - NLP

### Requirements

This example requires a GPU. Install the [nlp] option:

```bash
pip install "flaml[nlp]"
```

### A simple sequence classification example

```python
from flaml import AutoML
from datasets import load_dataset

train_dataset = load_dataset("glue", "mrpc", split="train").to_pandas()
dev_dataset = load_dataset("glue", "mrpc", split="validation").to_pandas()
test_dataset = load_dataset("glue", "mrpc", split="test").to_pandas()
custom_sent_keys = ["sentence1", "sentence2"]
label_key = "label"
X_train, y_train = train_dataset[custom_sent_keys], train_dataset[label_key]
X_val, y_val = dev_dataset[custom_sent_keys], dev_dataset[label_key]
X_test = test_dataset[custom_sent_keys]

automl = AutoML()
automl_settings = {
    "time_budget": 100,
    "task": "seq-classification",
    "custom_hpo_args": {"output_dir": "data/output/"},
    "gpu_per_trial": 1,  # set to 0 if no GPU is available
}
automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings)
automl.predict(X_test)
```
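Here `X_train` is simply a two-column pandas DataFrame of sentence pairs selected by `custom_sent_keys`. A tiny stand-in frame (the rows below are made up; the real data comes from `load_dataset`) shows the shape FLAML receives:

```python
import pandas as pd

# a tiny stand-in for the MRPC frame above (rows are fabricated)
df = pd.DataFrame({
    "sentence1": ["a cat sat on the mat", "dogs bark loudly"],
    "sentence2": ["a cat was sitting on the mat", "cats purr softly"],
    "label": [1, 0],
})
custom_sent_keys = ["sentence1", "sentence2"]
X, y = df[custom_sent_keys], df["label"]
print(X.shape)  # two rows, two sentence columns
```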
#### Sample output

```
[flaml.automl: 12-06 08:21:39] {1943} INFO - task = seq-classification
[flaml.automl: 12-06 08:21:39] {1945} INFO - Data split method: stratified
[flaml.automl: 12-06 08:21:39] {1949} INFO - Evaluation method: holdout
[flaml.automl: 12-06 08:21:39] {2019} INFO - Minimizing error metric: 1-accuracy
[flaml.automl: 12-06 08:21:39] {2071} INFO - List of ML learners in AutoML Run: ['transformer']
[flaml.automl: 12-06 08:21:39] {2311} INFO - iteration 0, current learner transformer
{'data/output/train_2021-12-06_08-21-53/train_8947b1b2_1_n=1e-06,s=9223372036854775807,e=1e-05,s=-1,s=0.45765,e=32,d=42,o=0.0,y=0.0_2021-12-06_08-21-53/checkpoint-53': 53}
[flaml.automl: 12-06 08:22:56] {2424} INFO - Estimated sufficient time budget=766860s. Estimated necessary time budget=767s.
[flaml.automl: 12-06 08:22:56] {2499} INFO - at 76.7s, estimator transformer's best error=0.1740, best estimator transformer's best error=0.1740
[flaml.automl: 12-06 08:22:56] {2606} INFO - selected model: <flaml.nlp.huggingface.trainer.TrainerForAuto object at 0x7f49ea8414f0>
[flaml.automl: 12-06 08:22:56] {2100} INFO - fit succeeded
[flaml.automl: 12-06 08:22:56] {2101} INFO - Time taken to find the best model: 76.69802761077881
[flaml.automl: 12-06 08:22:56] {2112} WARNING - Time taken to find the best model is 77% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
```

### A simple sequence regression example

```python
from flaml import AutoML
from datasets import load_dataset

train_dataset = (
    load_dataset("glue", "stsb", split="train[:1%]").to_pandas().iloc[0:4]
)
dev_dataset = (
    load_dataset("glue", "stsb", split="train[1%:2%]").to_pandas().iloc[0:4]
)
custom_sent_keys = ["sentence1", "sentence2"]
label_key = "label"
X_train = train_dataset[custom_sent_keys]
y_train = train_dataset[label_key]
X_val = dev_dataset[custom_sent_keys]
y_val = dev_dataset[label_key]

automl = AutoML()
automl_settings = {
    "gpu_per_trial": 0,
    "time_budget": 20,
    "task": "seq-regression",
    "metric": "rmse",
}
automl_settings["custom_hpo_args"] = {
    "model_path": "google/electra-small-discriminator",
    "output_dir": "data/output/",
    "ckpt_per_epoch": 5,
    "fp16": False,
}
automl.fit(
    X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings
)
```
File: website/docs/Examples/AutoML-Rank.md (new file, +96 lines)
# AutoML - Rank

### A simple learning-to-rank example

```python
from sklearn.datasets import fetch_openml
from flaml import AutoML

X_train, y_train = fetch_openml(name="credit-g", return_X_y=True, as_frame=True)
y_train = y_train.cat.codes  # requires a pandas categorical Series, hence as_frame=True
# not a real learning-to-rank dataset
groups = [200] * 4 + [100] * 2  # group counts
automl = AutoML()
automl.fit(
    X_train, y_train, groups=groups,
    task='rank', time_budget=10,  # in seconds
)
```
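The `groups` argument lists how many consecutive rows belong to each query group. A small stdlib sketch (the helper name is ours, not a FLAML API) of how the group sizes expand into per-row query ids:

```python
def expand_group_sizes(group_sizes):
    """Turn group sizes like [200, 200, ...] into a per-row query id list."""
    ids = []
    for qid, size in enumerate(group_sizes):
        ids.extend([qid] * size)
    return ids

groups = [200] * 4 + [100] * 2   # the same six groups as above
row_qids = expand_group_sizes(groups)
assert len(row_qids) == 1000     # one id per row of credit-g
assert row_qids[0] == 0 and row_qids[-1] == 5
```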
#### Sample output

```
[flaml.automl: 11-15 07:14:30] {1485} INFO - Data split method: group
[flaml.automl: 11-15 07:14:30] {1489} INFO - Evaluation method: holdout
[flaml.automl: 11-15 07:14:30] {1540} INFO - Minimizing error metric: 1-ndcg
[flaml.automl: 11-15 07:14:30] {1577} INFO - List of ML learners in AutoML Run: ['lgbm', 'xgboost']
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 0, current learner lgbm
[flaml.automl: 11-15 07:14:30] {1944} INFO - Estimated sufficient time budget=679s. Estimated necessary time budget=1s.
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.1s, estimator lgbm's best error=0.0248, best estimator lgbm's best error=0.0248
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 1, current learner lgbm
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.1s, estimator lgbm's best error=0.0248, best estimator lgbm's best error=0.0248
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 2, current learner lgbm
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.2s, estimator lgbm's best error=0.0248, best estimator lgbm's best error=0.0248
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 3, current learner lgbm
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.2s, estimator lgbm's best error=0.0248, best estimator lgbm's best error=0.0248
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 4, current learner xgboost
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.2s, estimator xgboost's best error=0.0315, best estimator lgbm's best error=0.0248
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 5, current learner xgboost
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.2s, estimator xgboost's best error=0.0315, best estimator lgbm's best error=0.0248
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 6, current learner lgbm
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.3s, estimator lgbm's best error=0.0248, best estimator lgbm's best error=0.0248
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 7, current learner lgbm
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.3s, estimator lgbm's best error=0.0248, best estimator lgbm's best error=0.0248
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 8, current learner xgboost
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.4s, estimator xgboost's best error=0.0315, best estimator lgbm's best error=0.0248
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 9, current learner xgboost
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.4s, estimator xgboost's best error=0.0315, best estimator lgbm's best error=0.0248
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 10, current learner xgboost
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.4s, estimator xgboost's best error=0.0233, best estimator xgboost's best error=0.0233
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 11, current learner xgboost
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.4s, estimator xgboost's best error=0.0233, best estimator xgboost's best error=0.0233
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 12, current learner xgboost
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.4s, estimator xgboost's best error=0.0233, best estimator xgboost's best error=0.0233
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 13, current learner xgboost
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.4s, estimator xgboost's best error=0.0233, best estimator xgboost's best error=0.0233
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 14, current learner lgbm
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.5s, estimator lgbm's best error=0.0225, best estimator lgbm's best error=0.0225
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 15, current learner xgboost
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.5s, estimator xgboost's best error=0.0233, best estimator lgbm's best error=0.0225
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 16, current learner lgbm
[flaml.automl: 11-15 07:14:30] {2029} INFO - at 0.5s, estimator lgbm's best error=0.0225, best estimator lgbm's best error=0.0225
[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 17, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 0.5s, estimator lgbm's best error=0.0225, best estimator lgbm's best error=0.0225
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 18, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 0.6s, estimator lgbm's best error=0.0225, best estimator lgbm's best error=0.0225
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 19, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 0.6s, estimator lgbm's best error=0.0201, best estimator lgbm's best error=0.0201
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 20, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 0.6s, estimator lgbm's best error=0.0201, best estimator lgbm's best error=0.0201
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 21, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 0.7s, estimator lgbm's best error=0.0201, best estimator lgbm's best error=0.0201
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 22, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 0.7s, estimator lgbm's best error=0.0201, best estimator lgbm's best error=0.0201
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 23, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 0.8s, estimator lgbm's best error=0.0201, best estimator lgbm's best error=0.0201
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 24, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 0.8s, estimator lgbm's best error=0.0201, best estimator lgbm's best error=0.0201
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 25, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 0.8s, estimator lgbm's best error=0.0201, best estimator lgbm's best error=0.0201
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 26, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 0.9s, estimator lgbm's best error=0.0197, best estimator lgbm's best error=0.0197
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 27, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 0.9s, estimator lgbm's best error=0.0197, best estimator lgbm's best error=0.0197
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 28, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 1.0s, estimator lgbm's best error=0.0197, best estimator lgbm's best error=0.0197
[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 29, current learner lgbm
[flaml.automl: 11-15 07:14:31] {2029} INFO - at 1.0s, estimator lgbm's best error=0.0197, best estimator lgbm's best error=0.0197
[flaml.automl: 11-15 07:14:31] {2242} INFO - retrain lgbm for 0.0s
[flaml.automl: 11-15 07:14:31] {2247} INFO - retrained model: LGBMRanker(colsample_bytree=0.9852774042640857,
           learning_rate=0.034918421933217675, max_bin=1023,
           min_child_samples=22, n_estimators=6, num_leaves=23,
           reg_alpha=0.0009765625, reg_lambda=21.505295697527654, verbose=-1)
[flaml.automl: 11-15 07:14:31] {1608} INFO - fit succeeded
[flaml.automl: 11-15 07:14:31] {1610} INFO - Time taken to find the best model: 0.8846545219421387
[flaml.automl: 11-15 07:14:31] {1624} WARNING - Time taken to find the best model is 88% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
```
File: website/docs/Examples/AutoML-Regression.md (new file, +101 lines)
# AutoML - Regression

### A basic regression example

```python
from flaml import AutoML
from sklearn.datasets import fetch_california_housing

# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 1,  # in seconds
    "metric": 'r2',
    "task": 'regression',
    "log_file_name": "california.log",
}
X_train, y_train = fetch_california_housing(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict
print(automl.predict(X_train))
# Print the best model
print(automl.model.estimator)
```
#### Sample output

```
[flaml.automl: 11-15 07:08:19] {1485} INFO - Data split method: uniform
[flaml.automl: 11-15 07:08:19] {1489} INFO - Evaluation method: holdout
[flaml.automl: 11-15 07:08:19] {1540} INFO - Minimizing error metric: 1-r2
[flaml.automl: 11-15 07:08:19] {1577} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'catboost', 'xgboost', 'extra_tree']
[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 0, current learner lgbm
[flaml.automl: 11-15 07:08:19] {1944} INFO - Estimated sufficient time budget=846s. Estimated necessary time budget=2s.
[flaml.automl: 11-15 07:08:19] {2029} INFO - at 0.2s, estimator lgbm's best error=0.7393, best estimator lgbm's best error=0.7393
[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 1, current learner lgbm
[flaml.automl: 11-15 07:08:19] {2029} INFO - at 0.3s, estimator lgbm's best error=0.7393, best estimator lgbm's best error=0.7393
[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 2, current learner lgbm
[flaml.automl: 11-15 07:08:19] {2029} INFO - at 0.3s, estimator lgbm's best error=0.5446, best estimator lgbm's best error=0.5446
[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 3, current learner lgbm
[flaml.automl: 11-15 07:08:19] {2029} INFO - at 0.4s, estimator lgbm's best error=0.2807, best estimator lgbm's best error=0.2807
[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 4, current learner lgbm
[flaml.automl: 11-15 07:08:19] {2029} INFO - at 0.5s, estimator lgbm's best error=0.2712, best estimator lgbm's best error=0.2712
[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 5, current learner lgbm
[flaml.automl: 11-15 07:08:19] {2029} INFO - at 0.5s, estimator lgbm's best error=0.2712, best estimator lgbm's best error=0.2712
[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 6, current learner lgbm
[flaml.automl: 11-15 07:08:20] {2029} INFO - at 0.6s, estimator lgbm's best error=0.2712, best estimator lgbm's best error=0.2712
[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 7, current learner lgbm
[flaml.automl: 11-15 07:08:20] {2029} INFO - at 0.7s, estimator lgbm's best error=0.2197, best estimator lgbm's best error=0.2197
[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 8, current learner xgboost
[flaml.automl: 11-15 07:08:20] {2029} INFO - at 0.8s, estimator xgboost's best error=1.4958, best estimator lgbm's best error=0.2197
[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 9, current learner xgboost
[flaml.automl: 11-15 07:08:20] {2029} INFO - at 0.8s, estimator xgboost's best error=1.4958, best estimator lgbm's best error=0.2197
[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 10, current learner xgboost
[flaml.automl: 11-15 07:08:20] {2029} INFO - at 0.9s, estimator xgboost's best error=0.7052, best estimator lgbm's best error=0.2197
[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 11, current learner xgboost
[flaml.automl: 11-15 07:08:20] {2029} INFO - at 0.9s, estimator xgboost's best error=0.3619, best estimator lgbm's best error=0.2197
[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 12, current learner xgboost
[flaml.automl: 11-15 07:08:20] {2029} INFO - at 0.9s, estimator xgboost's best error=0.3619, best estimator lgbm's best error=0.2197
[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 13, current learner xgboost
[flaml.automl: 11-15 07:08:20] {2029} INFO - at 1.0s, estimator xgboost's best error=0.3619, best estimator lgbm's best error=0.2197
[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 14, current learner extra_tree
[flaml.automl: 11-15 07:08:20] {2029} INFO - at 1.1s, estimator extra_tree's best error=0.7197, best estimator lgbm's best error=0.2197
[flaml.automl: 11-15 07:08:20] {2242} INFO - retrain lgbm for 0.0s
[flaml.automl: 11-15 07:08:20] {2247} INFO - retrained model: LGBMRegressor(colsample_bytree=0.7610534336273627,
              learning_rate=0.41929025492645006, max_bin=255,
              min_child_samples=4, n_estimators=45, num_leaves=4,
              reg_alpha=0.0009765625, reg_lambda=0.009280655005879943,
              verbose=-1)
[flaml.automl: 11-15 07:08:20] {1608} INFO - fit succeeded
[flaml.automl: 11-15 07:08:20] {1610} INFO - Time taken to find the best model: 0.7289648056030273
[flaml.automl: 11-15 07:08:20] {1624} WARNING - Time taken to find the best model is 73% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
```
### Multi-output regression

We can combine `sklearn.multioutput.MultiOutputRegressor` and `flaml.AutoML` to do AutoML for multi-output regression.

```python
from flaml import AutoML
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# create regression data
X, y = make_regression(n_targets=3)

# split into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

# train the model
model = MultiOutputRegressor(AutoML(task="regression", time_budget=60))
model.fit(X_train, y_train)

# predict
print(model.predict(X_test))
```

It will perform AutoML for each target, with a 60-second budget per target.
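`MultiOutputRegressor` fits one clone of the inner estimator per target column, which is why each target gets its own AutoML search above. The mechanics, sketched with a plain sklearn regressor substituted for `AutoML`:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=50, n_targets=3, random_state=0)
# LinearRegression stands in for AutoML(...) to show the wrapper's behavior
model = MultiOutputRegressor(LinearRegression()).fit(X, y)
# one independently fitted estimator per target column
assert len(model.estimators_) == 3
assert model.predict(X).shape == (50, 3)
```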
File: website/docs/Examples/AutoML-Time series forecast.md (new file, +203 lines)
# AutoML - Time Series Forecast

### Prerequisites

Install the [ts_forecast] option.

```bash
pip install "flaml[ts_forecast]"
```

### Univariate time series

```python
import numpy as np
from flaml import AutoML

X_train = np.arange('2014-01', '2021-01', dtype='datetime64[M]')  # 84 monthly timestamps
y_train = np.random.random(size=72)
automl = AutoML()
automl.fit(X_train=X_train[:72],  # a single column of timestamps
           y_train=y_train,  # a value for each timestamp
           period=12,  # time horizon to forecast, e.g., 12 months
           task='ts_forecast', time_budget=15,  # time budget in seconds
           log_file_name="ts_forecast.log",
           )
print(automl.predict(X_train[72:]))
```
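The timestamp array above spans 84 months; fitting on the first 72 leaves exactly one 12-month horizon for `predict`, matching `period=12`. A quick numpy check of the arithmetic:

```python
import numpy as np

ts = np.arange('2014-01', '2021-01', dtype='datetime64[M]')
assert len(ts) == 84        # 7 years of monthly timestamps
train, horizon = ts[:72], ts[72:]
assert len(horizon) == 12   # matches period=12 above
```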
#### Sample output

```
[flaml.automl: 11-15 18:44:49] {1485} INFO - Data split method: time
[flaml.automl: 11-15 18:44:49] {1489} INFO - Evaluation method: cv
[flaml.automl: 11-15 18:44:49] {1540} INFO - Minimizing error metric: mape
[flaml.automl: 11-15 18:44:49] {1577} INFO - List of ML learners in AutoML Run: ['prophet', 'arima', 'sarimax']
[flaml.automl: 11-15 18:44:49] {1826} INFO - iteration 0, current learner prophet
[flaml.automl: 11-15 18:45:00] {1944} INFO - Estimated sufficient time budget=104159s. Estimated necessary time budget=104s.
[flaml.automl: 11-15 18:45:00] {2029} INFO - at 10.5s, estimator prophet's best error=1.5681, best estimator prophet's best error=1.5681
[flaml.automl: 11-15 18:45:00] {1826} INFO - iteration 1, current learner arima
[flaml.automl: 11-15 18:45:00] {2029} INFO - at 10.7s, estimator arima's best error=2.3515, best estimator prophet's best error=1.5681
[flaml.automl: 11-15 18:45:00] {1826} INFO - iteration 2, current learner arima
[flaml.automl: 11-15 18:45:01] {2029} INFO - at 11.5s, estimator arima's best error=2.1774, best estimator prophet's best error=1.5681
[flaml.automl: 11-15 18:45:01] {1826} INFO - iteration 3, current learner arima
[flaml.automl: 11-15 18:45:01] {2029} INFO - at 11.9s, estimator arima's best error=2.1774, best estimator prophet's best error=1.5681
[flaml.automl: 11-15 18:45:01] {1826} INFO - iteration 4, current learner arima
[flaml.automl: 11-15 18:45:02] {2029} INFO - at 12.9s, estimator arima's best error=1.8560, best estimator prophet's best error=1.5681
[flaml.automl: 11-15 18:45:02] {1826} INFO - iteration 5, current learner arima
[flaml.automl: 11-15 18:45:04] {2029} INFO - at 14.4s, estimator arima's best error=1.8560, best estimator prophet's best error=1.5681
[flaml.automl: 11-15 18:45:04] {1826} INFO - iteration 6, current learner sarimax
[flaml.automl: 11-15 18:45:04] {2029} INFO - at 14.7s, estimator sarimax's best error=2.3515, best estimator prophet's best error=1.5681
[flaml.automl: 11-15 18:45:04] {1826} INFO - iteration 7, current learner sarimax
[flaml.automl: 11-15 18:45:04] {2029} INFO - at 15.0s, estimator sarimax's best error=1.6371, best estimator prophet's best error=1.5681
[flaml.automl: 11-15 18:45:05] {2242} INFO - retrain prophet for 0.5s
[flaml.automl: 11-15 18:45:05] {2247} INFO - retrained model: <prophet.forecaster.Prophet object at 0x7f042ba1da50>
[flaml.automl: 11-15 18:45:05] {1608} INFO - fit succeeded
[flaml.automl: 11-15 18:45:05] {1610} INFO - Time taken to find the best model: 10.450132608413696
0    0.384715
```
1 0.191349
|
||||
2 0.372324
|
||||
3 0.814549
|
||||
4 0.269616
|
||||
5 0.470667
|
||||
6 0.603665
|
||||
7 0.256773
|
||||
8 0.408787
|
||||
9 0.663065
|
||||
10 0.619943
|
||||
11 0.090284
|
||||
Name: yhat, dtype: float64
|
||||
```

### Multivariate time series

```python
import statsmodels.api as sm

data = sm.datasets.co2.load_pandas().data
# data is given in weeks, but the task is to predict monthly, so use monthly averages instead
data = data['co2'].resample('MS').mean()
data = data.fillna(data.bfill())  # makes sure there are no missing values
data = data.to_frame().reset_index()
num_samples = data.shape[0]
time_horizon = 12
split_idx = num_samples - time_horizon
train_df = data[:split_idx]  # train_df is a dataframe with two columns: timestamp and label
X_test = data[split_idx:]['index'].to_frame()  # X_test is a dataframe with dates for prediction
y_test = data[split_idx:]['co2']  # y_test is a series of the values corresponding to the dates for prediction

from flaml import AutoML

automl = AutoML()
settings = {
    "time_budget": 10,  # total running time in seconds
    "metric": 'mape',  # primary metric for validation: 'mape' is generally used for forecast tasks
    "task": 'ts_forecast',  # task type
    "log_file_name": 'CO2_forecast.log',  # flaml log file
    "eval_method": "holdout",  # validation method can be chosen from ['auto', 'holdout', 'cv']
    "seed": 7654321,  # random seed
}

automl.fit(dataframe=train_df,  # training data
           label='co2',  # label column
           period=time_horizon,  # keyword argument 'period' must be included for forecast task
           **settings)
```
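The split above simply reserves the last `time_horizon` rows as the test period while preserving temporal order (no shuffling). A minimal stand-alone sketch of that arithmetic, using a hypothetical toy series instead of the CO2 data:

```python
# Toy stand-in for the monthly series: one value per month (hypothetical).
data = list(range(100))   # 100 monthly observations
time_horizon = 12         # forecast the final year

split_idx = len(data) - time_horizon
train, test = data[:split_idx], data[split_idx:]

# The test period is exactly the last `time_horizon` points.
print(len(train), len(test))  # 88 12
```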

#### Sample output

```
[flaml.automl: 11-15 18:54:12] {1485} INFO - Data split method: time
[flaml.automl: 11-15 18:54:12] {1489} INFO - Evaluation method: holdout
[flaml.automl: 11-15 18:54:13] {1540} INFO - Minimizing error metric: mape
[flaml.automl: 11-15 18:54:13] {1577} INFO - List of ML learners in AutoML Run: ['prophet', 'arima', 'sarimax']
[flaml.automl: 11-15 18:54:13] {1826} INFO - iteration 0, current learner prophet
[flaml.automl: 11-15 18:54:15] {1944} INFO - Estimated sufficient time budget=25297s. Estimated necessary time budget=25s.
[flaml.automl: 11-15 18:54:15] {2029} INFO - at 2.6s, estimator prophet's best error=0.0008, best estimator prophet's best error=0.0008
[flaml.automl: 11-15 18:54:15] {1826} INFO - iteration 1, current learner prophet
[flaml.automl: 11-15 18:54:18] {2029} INFO - at 5.2s, estimator prophet's best error=0.0008, best estimator prophet's best error=0.0008
[flaml.automl: 11-15 18:54:18] {1826} INFO - iteration 2, current learner arima
[flaml.automl: 11-15 18:54:18] {2029} INFO - at 5.5s, estimator arima's best error=0.0047, best estimator prophet's best error=0.0008
[flaml.automl: 11-15 18:54:18] {1826} INFO - iteration 3, current learner arima
[flaml.automl: 11-15 18:54:18] {2029} INFO - at 5.6s, estimator arima's best error=0.0047, best estimator prophet's best error=0.0008
[flaml.automl: 11-15 18:54:18] {1826} INFO - iteration 4, current learner prophet
[flaml.automl: 11-15 18:54:21] {2029} INFO - at 8.1s, estimator prophet's best error=0.0005, best estimator prophet's best error=0.0005
[flaml.automl: 11-15 18:54:21] {1826} INFO - iteration 5, current learner arima
[flaml.automl: 11-15 18:54:21] {2029} INFO - at 8.9s, estimator arima's best error=0.0047, best estimator prophet's best error=0.0005
[flaml.automl: 11-15 18:54:21] {1826} INFO - iteration 6, current learner arima
[flaml.automl: 11-15 18:54:22] {2029} INFO - at 9.7s, estimator arima's best error=0.0047, best estimator prophet's best error=0.0005
[flaml.automl: 11-15 18:54:22] {1826} INFO - iteration 7, current learner sarimax
[flaml.automl: 11-15 18:54:23] {2029} INFO - at 10.1s, estimator sarimax's best error=0.0047, best estimator prophet's best error=0.0005
[flaml.automl: 11-15 18:54:23] {2242} INFO - retrain prophet for 0.9s
[flaml.automl: 11-15 18:54:23] {2247} INFO - retrained model: <prophet.forecaster.Prophet object at 0x7f0418e21f50>
[flaml.automl: 11-15 18:54:23] {1608} INFO - fit succeeded
[flaml.automl: 11-15 18:54:23] {1610} INFO - Time taken to find the best model: 8.118467330932617
[flaml.automl: 11-15 18:54:23] {1624} WARNING - Time taken to find the best model is 81% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
```

#### Compute and plot predictions

```python
flaml_y_pred = automl.predict(X_test)
import matplotlib.pyplot as plt

plt.plot(X_test, y_test, label='Actual level')
plt.plot(X_test, flaml_y_pred, label='FLAML forecast')
plt.xlabel('Date')
plt.ylabel('CO2 Levels')
plt.legend()
```
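The `'mape'` metric optimized above is the mean absolute percentage error. As a sanity check, it can be computed directly for a forecast; a stand-alone sketch with made-up numbers (since `y_test` and `flaml_y_pred` come from the run above):

```python
def mape(actual, predicted):
    # mean absolute percentage error, assuming no zero actuals
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# hypothetical CO2 levels vs. forecast
actual = [370.0, 371.5, 372.8]
predicted = [369.0, 372.0, 373.8]
print(mape(actual, predicted))
```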

![png](images/CO2.png)

[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_time_series_forecast.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_time_series_forecast.ipynb)

website/docs/Examples/AutoML-for-LightGBM.md

# AutoML for LightGBM

### Use built-in LGBMEstimator

```python
from flaml import AutoML
from flaml.data import load_openml_dataset

# Download [houses dataset](https://www.openml.org/d/537) from OpenML. The task is to predict the median price of a house in a region based on the demographic composition and the state of the housing market in the region.
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir='./')

automl = AutoML()
settings = {
    "time_budget": 60,  # total running time in seconds
    "metric": 'r2',  # primary metrics for regression can be chosen from: ['mae', 'mse', 'r2']
    "estimator_list": ['lgbm'],  # list of ML learners; we tune LightGBM in this example
    "task": 'regression',  # task type
    "log_file_name": 'houses_experiment.log',  # flaml log file
    "seed": 7654321,  # random seed
}
automl.fit(X_train=X_train, y_train=y_train, **settings)
```

#### Sample output

```
[flaml.automl: 11-15 19:46:44] {1485} INFO - Data split method: uniform
[flaml.automl: 11-15 19:46:44] {1489} INFO - Evaluation method: cv
[flaml.automl: 11-15 19:46:44] {1540} INFO - Minimizing error metric: 1-r2
[flaml.automl: 11-15 19:46:44] {1577} INFO - List of ML learners in AutoML Run: ['lgbm']
[flaml.automl: 11-15 19:46:44] {1826} INFO - iteration 0, current learner lgbm
[flaml.automl: 11-15 19:46:44] {1944} INFO - Estimated sufficient time budget=3232s. Estimated necessary time budget=3s.
[flaml.automl: 11-15 19:46:44] {2029} INFO - at 0.5s, estimator lgbm's best error=0.7383, best estimator lgbm's best error=0.7383
[flaml.automl: 11-15 19:46:44] {1826} INFO - iteration 1, current learner lgbm
[flaml.automl: 11-15 19:46:44] {2029} INFO - at 0.6s, estimator lgbm's best error=0.4774, best estimator lgbm's best error=0.4774
[flaml.automl: 11-15 19:46:44] {1826} INFO - iteration 2, current learner lgbm
[flaml.automl: 11-15 19:46:44] {2029} INFO - at 0.7s, estimator lgbm's best error=0.4774, best estimator lgbm's best error=0.4774
[flaml.automl: 11-15 19:46:44] {1826} INFO - iteration 3, current learner lgbm
[flaml.automl: 11-15 19:46:44] {2029} INFO - at 0.9s, estimator lgbm's best error=0.2985, best estimator lgbm's best error=0.2985
[flaml.automl: 11-15 19:46:44] {1826} INFO - iteration 4, current learner lgbm
[flaml.automl: 11-15 19:46:45] {2029} INFO - at 1.3s, estimator lgbm's best error=0.2337, best estimator lgbm's best error=0.2337
[flaml.automl: 11-15 19:46:45] {1826} INFO - iteration 5, current learner lgbm
[flaml.automl: 11-15 19:46:45] {2029} INFO - at 1.4s, estimator lgbm's best error=0.2337, best estimator lgbm's best error=0.2337
[flaml.automl: 11-15 19:46:45] {1826} INFO - iteration 6, current learner lgbm
[flaml.automl: 11-15 19:46:46] {2029} INFO - at 2.5s, estimator lgbm's best error=0.2219, best estimator lgbm's best error=0.2219
[flaml.automl: 11-15 19:46:46] {1826} INFO - iteration 7, current learner lgbm
[flaml.automl: 11-15 19:46:46] {2029} INFO - at 2.9s, estimator lgbm's best error=0.2219, best estimator lgbm's best error=0.2219
[flaml.automl: 11-15 19:46:46] {1826} INFO - iteration 8, current learner lgbm
[flaml.automl: 11-15 19:46:48] {2029} INFO - at 4.5s, estimator lgbm's best error=0.1764, best estimator lgbm's best error=0.1764
[flaml.automl: 11-15 19:46:48] {1826} INFO - iteration 9, current learner lgbm
[flaml.automl: 11-15 19:46:54] {2029} INFO - at 10.5s, estimator lgbm's best error=0.1630, best estimator lgbm's best error=0.1630
[flaml.automl: 11-15 19:46:54] {1826} INFO - iteration 10, current learner lgbm
[flaml.automl: 11-15 19:46:56] {2029} INFO - at 12.4s, estimator lgbm's best error=0.1630, best estimator lgbm's best error=0.1630
[flaml.automl: 11-15 19:46:56] {1826} INFO - iteration 11, current learner lgbm
[flaml.automl: 11-15 19:47:13] {2029} INFO - at 29.0s, estimator lgbm's best error=0.1630, best estimator lgbm's best error=0.1630
[flaml.automl: 11-15 19:47:13] {1826} INFO - iteration 12, current learner lgbm
[flaml.automl: 11-15 19:47:15] {2029} INFO - at 31.1s, estimator lgbm's best error=0.1630, best estimator lgbm's best error=0.1630
[flaml.automl: 11-15 19:47:15] {1826} INFO - iteration 13, current learner lgbm
[flaml.automl: 11-15 19:47:29] {2029} INFO - at 45.8s, estimator lgbm's best error=0.1564, best estimator lgbm's best error=0.1564
[flaml.automl: 11-15 19:47:33] {2242} INFO - retrain lgbm for 3.2s
[flaml.automl: 11-15 19:47:33] {2247} INFO - retrained model: LGBMRegressor(colsample_bytree=0.8025848209352517,
              learning_rate=0.09100963138990374, max_bin=255,
              min_child_samples=42, n_estimators=363, num_leaves=216,
              reg_alpha=0.001113000336715291, reg_lambda=76.50614276906414,
              verbose=-1)
[flaml.automl: 11-15 19:47:33] {1608} INFO - fit succeeded
[flaml.automl: 11-15 19:47:33] {1610} INFO - Time taken to find the best model: 45.75616669654846
[flaml.automl: 11-15 19:47:33] {1624} WARNING - Time taken to find the best model is 76% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
```

#### Retrieve best config

```python
print('Best hyperparameter config:', automl.best_config)
print('Best r2 on validation data: {0:.4g}'.format(1 - automl.best_loss))
print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))
print(automl.model.estimator)
# Best hyperparameter config: {'n_estimators': 363, 'num_leaves': 216, 'min_child_samples': 42, 'learning_rate': 0.09100963138990374, 'log_max_bin': 8, 'colsample_bytree': 0.8025848209352517, 'reg_alpha': 0.001113000336715291, 'reg_lambda': 76.50614276906414}
# Best r2 on validation data: 0.8436
# Training duration of best run: 3.229 s
# LGBMRegressor(colsample_bytree=0.8025848209352517,
#               learning_rate=0.09100963138990374, max_bin=255,
#               min_child_samples=42, n_estimators=363, num_leaves=216,
#               reg_alpha=0.001113000336715291, reg_lambda=76.50614276906414,
#               verbose=-1)
```

#### Plot feature importance

```python
import matplotlib.pyplot as plt

plt.barh(automl.model.estimator.feature_name_, automl.model.estimator.feature_importances_)
```

![png](images/feature_importance.png)

#### Compute predictions of testing dataset
```python
y_pred = automl.predict(X_test)
print('Predicted labels', y_pred)
# Predicted labels [143391.65036562 245535.13731811 153171.44071629 ... 184354.52735963
#  235510.49470445 282617.22858956]
```

#### Compute different metric values on testing dataset

```python
from flaml.ml import sklearn_metric_loss_score

print('r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
print('mse', '=', sklearn_metric_loss_score('mse', y_pred, y_test))
print('mae', '=', sklearn_metric_loss_score('mae', y_pred, y_test))
# r2 = 0.8505434326526395
# mse = 1975592613.138005
# mae = 29471.536046068788
```

#### Compare with untuned LightGBM

```python
from lightgbm import LGBMRegressor

lgbm = LGBMRegressor()
lgbm.fit(X_train, y_train)
y_pred = lgbm.predict(X_test)
from flaml.ml import sklearn_metric_loss_score
print('default lgbm r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
# default lgbm r2 = 0.8296179648694404
```

#### Plot learning curve

How does the model accuracy improve as we search for different hyperparameter configurations?

```python
from flaml.data import get_output_from_log
import matplotlib.pyplot as plt
import numpy as np

time_history, best_valid_loss_history, valid_loss_history, config_history, metric_history = \
    get_output_from_log(filename=settings['log_file_name'], time_budget=60)
plt.title('Learning Curve')
plt.xlabel('Wall Clock Time (s)')
plt.ylabel('Validation r2')
plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')
plt.show()
```
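The best-so-far curve returned by `get_output_from_log` is monotone because it tracks the lowest validation loss found up to each point in time; the same curve can be derived from raw per-iteration losses with a running minimum. A stand-alone sketch with hypothetical loss values:

```python
from itertools import accumulate

valid_loss_history = [0.74, 0.48, 0.52, 0.30, 0.33, 0.22]  # hypothetical per-iteration losses
best_so_far = list(accumulate(valid_loss_history, min))     # running minimum
print(best_so_far)  # [0.74, 0.48, 0.48, 0.3, 0.3, 0.22]
```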

![png](images/curve.png)

### Use a customized LightGBM learner

The native API of LightGBM allows one to specify a custom objective function in the model constructor. You can easily enable it by adding a customized LightGBM learner in FLAML. In the following example, we show how to add such a customized LightGBM learner with a custom objective function.

#### Create a customized LightGBM learner with a custom objective function
```python
import numpy as np


# define your customized objective function
def my_loss_obj(y_true, y_pred):
    c = 0.5
    residual = y_pred - y_true
    grad = c * residual / (np.abs(residual) + c)
    hess = c ** 2 / (np.abs(residual) + c) ** 2
    # rmse grad and hess
    grad_rmse = residual
    hess_rmse = 1.0

    # mae grad and hess
    grad_mae = np.array(residual)
    grad_mae[grad_mae > 0] = 1.
    grad_mae[grad_mae <= 0] = -1.
    hess_mae = 1.0

    coef = [0.4, 0.3, 0.3]
    return coef[0] * grad + coef[1] * grad_rmse + coef[2] * grad_mae, \
        coef[0] * hess + coef[1] * hess_rmse + coef[2] * hess_mae


from flaml.model import LGBMEstimator


class MyLGBM(LGBMEstimator):
    """LGBMEstimator with my_loss_obj as the objective function"""

    def __init__(self, **config):
        super().__init__(objective=my_loss_obj, **config)
```
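Before handing a custom objective to a booster, it can help to sanity-check that it returns gradient and Hessian arrays of the right shape, and that the Hessian stays positive. A small check of the blended objective above (copied here so the snippet is self-contained, with made-up labels; assumes numpy):

```python
import numpy as np

# same blended objective as above, repeated for self-containment
def my_loss_obj(y_true, y_pred):
    c = 0.5
    residual = y_pred - y_true
    grad = c * residual / (np.abs(residual) + c)
    hess = c ** 2 / (np.abs(residual) + c) ** 2
    grad_rmse = residual
    hess_rmse = 1.0
    grad_mae = np.array(residual)
    grad_mae[grad_mae > 0] = 1.
    grad_mae[grad_mae <= 0] = -1.
    hess_mae = 1.0
    coef = [0.4, 0.3, 0.3]
    return (coef[0] * grad + coef[1] * grad_rmse + coef[2] * grad_mae,
            coef[0] * hess + coef[1] * hess_rmse + coef[2] * hess_mae)

y_true = np.array([3.0, -0.5, 2.0])   # hypothetical targets
y_pred = np.array([2.5, 0.0, 2.0])    # hypothetical raw predictions
grad, hess = my_loss_obj(y_true, y_pred)
assert grad.shape == y_true.shape and hess.shape == y_true.shape
assert np.all(hess > 0)  # a valid Hessian for boosting must be positive
```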

#### Add the customized learner and tune it

```python
automl = AutoML()
automl.add_learner(learner_name='my_lgbm', learner_class=MyLGBM)
settings["estimator_list"] = ['my_lgbm']  # change the estimator list
automl.fit(X_train=X_train, y_train=y_train, **settings)
```

[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_lightgbm.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_lightgbm.ipynb)

website/docs/Examples/AutoML-for-XGBoost.md

# AutoML for XGBoost

### Use built-in XGBoostSklearnEstimator

```python
from flaml import AutoML
from flaml.data import load_openml_dataset

# Download [houses dataset](https://www.openml.org/d/537) from OpenML. The task is to predict the median price of a house in a region based on the demographic composition and the state of the housing market in the region.
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir='./')

automl = AutoML()
settings = {
    "time_budget": 60,  # total running time in seconds
    "metric": 'r2',  # primary metrics for regression can be chosen from: ['mae', 'mse', 'r2']
    "estimator_list": ['xgboost'],  # list of ML learners; we tune XGBoost in this example
    "task": 'regression',  # task type
    "log_file_name": 'houses_experiment.log',  # flaml log file
    "seed": 7654321,  # random seed
}
automl.fit(X_train=X_train, y_train=y_train, **settings)
```

#### Sample output

```
[flaml.automl: 09-29 23:06:46] {1446} INFO - Data split method: uniform
[flaml.automl: 09-29 23:06:46] {1450} INFO - Evaluation method: cv
[flaml.automl: 09-29 23:06:46] {1496} INFO - Minimizing error metric: 1-r2
[flaml.automl: 09-29 23:06:46] {1533} INFO - List of ML learners in AutoML Run: ['xgboost']
[flaml.automl: 09-29 23:06:46] {1763} INFO - iteration 0, current learner xgboost
[flaml.automl: 09-29 23:06:47] {1880} INFO - Estimated sufficient time budget=2621s. Estimated necessary time budget=3s.
[flaml.automl: 09-29 23:06:47] {1952} INFO - at 0.3s, estimator xgboost's best error=2.1267, best estimator xgboost's best error=2.1267
[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 1, current learner xgboost
[flaml.automl: 09-29 23:06:47] {1952} INFO - at 0.5s, estimator xgboost's best error=2.1267, best estimator xgboost's best error=2.1267
[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 2, current learner xgboost
[flaml.automl: 09-29 23:06:47] {1952} INFO - at 0.6s, estimator xgboost's best error=0.8485, best estimator xgboost's best error=0.8485
[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 3, current learner xgboost
[flaml.automl: 09-29 23:06:47] {1952} INFO - at 0.8s, estimator xgboost's best error=0.3799, best estimator xgboost's best error=0.3799
[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 4, current learner xgboost
[flaml.automl: 09-29 23:06:47] {1952} INFO - at 1.0s, estimator xgboost's best error=0.3799, best estimator xgboost's best error=0.3799
[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 5, current learner xgboost
[flaml.automl: 09-29 23:06:47] {1952} INFO - at 1.2s, estimator xgboost's best error=0.3799, best estimator xgboost's best error=0.3799
[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 6, current learner xgboost
[flaml.automl: 09-29 23:06:48] {1952} INFO - at 1.5s, estimator xgboost's best error=0.2992, best estimator xgboost's best error=0.2992
[flaml.automl: 09-29 23:06:48] {1763} INFO - iteration 7, current learner xgboost
[flaml.automl: 09-29 23:06:48] {1952} INFO - at 1.9s, estimator xgboost's best error=0.2992, best estimator xgboost's best error=0.2992
[flaml.automl: 09-29 23:06:48] {1763} INFO - iteration 8, current learner xgboost
[flaml.automl: 09-29 23:06:49] {1952} INFO - at 2.2s, estimator xgboost's best error=0.2992, best estimator xgboost's best error=0.2992
[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 9, current learner xgboost
[flaml.automl: 09-29 23:06:49] {1952} INFO - at 2.5s, estimator xgboost's best error=0.2513, best estimator xgboost's best error=0.2513
[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 10, current learner xgboost
[flaml.automl: 09-29 23:06:49] {1952} INFO - at 2.8s, estimator xgboost's best error=0.2513, best estimator xgboost's best error=0.2513
[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 11, current learner xgboost
[flaml.automl: 09-29 23:06:49] {1952} INFO - at 3.0s, estimator xgboost's best error=0.2513, best estimator xgboost's best error=0.2513
[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 12, current learner xgboost
[flaml.automl: 09-29 23:06:50] {1952} INFO - at 3.3s, estimator xgboost's best error=0.2113, best estimator xgboost's best error=0.2113
[flaml.automl: 09-29 23:06:50] {1763} INFO - iteration 13, current learner xgboost
[flaml.automl: 09-29 23:06:50] {1952} INFO - at 3.5s, estimator xgboost's best error=0.2113, best estimator xgboost's best error=0.2113
[flaml.automl: 09-29 23:06:50] {1763} INFO - iteration 14, current learner xgboost
[flaml.automl: 09-29 23:06:50] {1952} INFO - at 4.0s, estimator xgboost's best error=0.2090, best estimator xgboost's best error=0.2090
[flaml.automl: 09-29 23:06:50] {1763} INFO - iteration 15, current learner xgboost
[flaml.automl: 09-29 23:06:51] {1952} INFO - at 4.5s, estimator xgboost's best error=0.2090, best estimator xgboost's best error=0.2090
[flaml.automl: 09-29 23:06:51] {1763} INFO - iteration 16, current learner xgboost
[flaml.automl: 09-29 23:06:51] {1952} INFO - at 5.2s, estimator xgboost's best error=0.1919, best estimator xgboost's best error=0.1919
[flaml.automl: 09-29 23:06:51] {1763} INFO - iteration 17, current learner xgboost
[flaml.automl: 09-29 23:06:52] {1952} INFO - at 5.5s, estimator xgboost's best error=0.1919, best estimator xgboost's best error=0.1919
[flaml.automl: 09-29 23:06:52] {1763} INFO - iteration 18, current learner xgboost
[flaml.automl: 09-29 23:06:54] {1952} INFO - at 8.0s, estimator xgboost's best error=0.1797, best estimator xgboost's best error=0.1797
[flaml.automl: 09-29 23:06:54] {1763} INFO - iteration 19, current learner xgboost
[flaml.automl: 09-29 23:06:55] {1952} INFO - at 9.0s, estimator xgboost's best error=0.1797, best estimator xgboost's best error=0.1797
[flaml.automl: 09-29 23:06:55] {1763} INFO - iteration 20, current learner xgboost
[flaml.automl: 09-29 23:07:08] {1952} INFO - at 21.8s, estimator xgboost's best error=0.1797, best estimator xgboost's best error=0.1797
[flaml.automl: 09-29 23:07:08] {1763} INFO - iteration 21, current learner xgboost
[flaml.automl: 09-29 23:07:11] {1952} INFO - at 24.4s, estimator xgboost's best error=0.1797, best estimator xgboost's best error=0.1797
[flaml.automl: 09-29 23:07:11] {1763} INFO - iteration 22, current learner xgboost
[flaml.automl: 09-29 23:07:16] {1952} INFO - at 30.0s, estimator xgboost's best error=0.1782, best estimator xgboost's best error=0.1782
[flaml.automl: 09-29 23:07:16] {1763} INFO - iteration 23, current learner xgboost
[flaml.automl: 09-29 23:07:20] {1952} INFO - at 33.5s, estimator xgboost's best error=0.1782, best estimator xgboost's best error=0.1782
[flaml.automl: 09-29 23:07:20] {1763} INFO - iteration 24, current learner xgboost
[flaml.automl: 09-29 23:07:29] {1952} INFO - at 42.3s, estimator xgboost's best error=0.1782, best estimator xgboost's best error=0.1782
[flaml.automl: 09-29 23:07:29] {1763} INFO - iteration 25, current learner xgboost
[flaml.automl: 09-29 23:07:30] {1952} INFO - at 43.2s, estimator xgboost's best error=0.1782, best estimator xgboost's best error=0.1782
[flaml.automl: 09-29 23:07:30] {1763} INFO - iteration 26, current learner xgboost
[flaml.automl: 09-29 23:07:50] {1952} INFO - at 63.4s, estimator xgboost's best error=0.1663, best estimator xgboost's best error=0.1663
[flaml.automl: 09-29 23:07:50] {2059} INFO - selected model: <xgboost.core.Booster object at 0x7f6399005910>
[flaml.automl: 09-29 23:07:55] {2122} INFO - retrain xgboost for 5.4s
[flaml.automl: 09-29 23:07:55] {2128} INFO - retrained model: <xgboost.core.Booster object at 0x7f6398fc0eb0>
[flaml.automl: 09-29 23:07:55] {1557} INFO - fit succeeded
[flaml.automl: 09-29 23:07:55] {1558} INFO - Time taken to find the best model: 63.427649974823
[flaml.automl: 09-29 23:07:55] {1569} WARNING - Time taken to find the best model is 106% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
```

#### Retrieve best config

```python
print('Best hyperparameter config:', automl.best_config)
print('Best r2 on validation data: {0:.4g}'.format(1 - automl.best_loss))
print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))
print(automl.model.estimator)
# Best hyperparameter config: {'n_estimators': 473, 'max_leaves': 35, 'max_depth': 0, 'min_child_weight': 0.001, 'learning_rate': 0.26865031351923346, 'subsample': 0.9718245679598786, 'colsample_bylevel': 0.7421362469066445, 'colsample_bytree': 1.0, 'reg_alpha': 0.06824336834995245, 'reg_lambda': 250.9654222583276}
# Best r2 on validation data: 0.8384
# Training duration of best run: 2.194 s
# XGBRegressor(base_score=0.5, booster='gbtree',
#              colsample_bylevel=0.7421362469066445, colsample_bynode=1,
#              colsample_bytree=1.0, gamma=0, gpu_id=-1, grow_policy='lossguide',
#              importance_type='gain', interaction_constraints='',
#              learning_rate=0.26865031351923346, max_delta_step=0, max_depth=0,
#              max_leaves=35, min_child_weight=0.001, missing=nan,
#              monotone_constraints='()', n_estimators=473, n_jobs=-1,
#              num_parallel_tree=1, random_state=0, reg_alpha=0.06824336834995245,
#              reg_lambda=250.9654222583276, scale_pos_weight=1,
#              subsample=0.9718245679598786, tree_method='hist',
#              use_label_encoder=False, validate_parameters=1, verbosity=0)
```

#### Plot feature importance

```python
import matplotlib.pyplot as plt

plt.barh(X_train.columns, automl.model.estimator.feature_importances_)
```

![png](images/feature_importance.png)

#### Compute predictions of testing dataset
```python
y_pred = automl.predict(X_test)
print('Predicted labels', y_pred)
# Predicted labels [139062.95 237622.   140522.03 ... 182125.5  252156.36 264884.5 ]
```

#### Compute different metric values on testing dataset

```python
from flaml.ml import sklearn_metric_loss_score

print('r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
print('mse', '=', sklearn_metric_loss_score('mse', y_pred, y_test))
print('mae', '=', sklearn_metric_loss_score('mae', y_pred, y_test))
# r2 = 0.8456494234135888
# mse = 2040284106.2781258
# mae = 30212.830996680445
```

#### Compare with untuned XGBoost

```python
from xgboost import XGBRegressor

xgb = XGBRegressor()
xgb.fit(X_train, y_train)
y_pred = xgb.predict(X_test)
from flaml.ml import sklearn_metric_loss_score
print('default xgboost r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
# default xgboost r2 = 0.8265451174596482
```

#### Plot learning curve

How does the model accuracy improve as we search for different hyperparameter configurations?

```python
from flaml.data import get_output_from_log
import matplotlib.pyplot as plt
import numpy as np

time_history, best_valid_loss_history, valid_loss_history, config_history, metric_history = \
    get_output_from_log(filename=settings['log_file_name'], time_budget=60)
plt.title('Learning Curve')
plt.xlabel('Wall Clock Time (s)')
plt.ylabel('Validation r2')
plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')
plt.show()
```
|
||||

|
||||
|
||||
### Use a customized XGBoost learner
|
||||
|
||||
You can easily enable a custom objective function by adding a customized XGBoost learner (inherit XGBoostEstimator or XGBoostSklearnEstimator) in FLAML. In the following example, we show how to add such a customized XGBoost learner with a custom objective function.
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
|
||||
# define your customized objective function
|
||||
def logregobj(preds, dtrain):
|
||||
labels = dtrain.get_label()
|
||||
preds = 1.0 / (1.0 + np.exp(-preds)) # transform raw leaf weight
|
||||
grad = preds - labels
|
||||
hess = preds * (1.0 - preds)
|
||||
return grad, hess
|
||||
|
||||
from flaml.model import XGBoostEstimator
|
||||
|
||||
class MyXGB1(XGBoostEstimator):
|
||||
'''XGBoostEstimator with the logregobj function as the objective function
|
||||
'''
|
||||
|
||||
def __init__(self, **config):
|
||||
super().__init__(objective=logregobj, **config)
|
||||
|
||||
class MyXGB2(XGBoostEstimator):
|
||||
'''XGBoostEstimator with 'reg:squarederror' as the objective function
|
||||
'''
|
||||
|
||||
def __init__(self, **config):
|
||||
super().__init__(objective='reg:gamma', **config)
|
||||
```
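`logregobj` returns the first and second derivatives of the logistic loss with respect to the raw margin, which is what XGBoost expects from a custom objective. A standalone numerical check (not part of the FLAML example) that the analytic gradient matches finite differences:

```python
import numpy as np

def logistic_loss(margin, label):
    # -log p for label 1, -log(1 - p) for label 0, written stably
    return np.log1p(np.exp(-margin)) if label == 1 else np.log1p(np.exp(margin))

def logregobj_point(margin, label):
    # same math as logregobj above, for a single (margin, label) pair
    p = 1.0 / (1.0 + np.exp(-margin))
    return p - label, p * (1.0 - p)  # grad, hess

margin, label, eps = 0.7, 1.0, 1e-5
grad, hess = logregobj_point(margin, label)
num_grad = (logistic_loss(margin + eps, label)
            - logistic_loss(margin - eps, label)) / (2 * eps)
print(abs(grad - num_grad))  # should be tiny
```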

#### Add the customized learners and tune them

```python
automl = AutoML()
automl.add_learner(learner_name='my_xgb1', learner_class=MyXGB1)
automl.add_learner(learner_name='my_xgb2', learner_class=MyXGB2)
settings["estimator_list"] = ['my_xgb1', 'my_xgb2']  # change the estimator list
automl.fit(X_train=X_train, y_train=y_train, **settings)
```

[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_xgboost.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_xgboost.ipynb)

# Integrate - AzureML

FLAML can be used together with AzureML and mlflow.

### Prerequisites

Install the [azureml] option.
```bash
pip install "flaml[azureml]"
```

Set up an AzureML workspace:
```python
from azureml.core import Workspace

ws = Workspace.create(
    name='myworkspace',
    subscription_id='<azure-subscription-id>',
    resource_group='myresourcegroup',
)
```

### Enable mlflow in AzureML workspace

```python
import mlflow
from azureml.core import Workspace

ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
```

### Start an AutoML run

```python
from flaml.data import load_openml_dataset

# Download [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure.
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir="./")

from flaml import AutoML

automl = AutoML()
settings = {
    "time_budget": 60,  # total running time in seconds
    "metric": "accuracy",  # metric to optimize
    "task": "classification",  # task type
    "log_file_name": "airlines_experiment.log",  # flaml log file
}
mlflow.set_experiment("flaml")  # the experiment name in AzureML workspace
with mlflow.start_run() as run:  # create an mlflow run
    automl.fit(X_train=X_train, y_train=y_train, **settings)
```

The metrics in the run will be automatically logged in an experiment named "flaml" in your AzureML workspace.

[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_azureml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_azureml.ipynb)

# Integrate - Scikit-learn Pipeline

As FLAML's AutoML module can be used as an estimator in the final step of a scikit-learn pipeline, we can get all the benefits of a pipeline.

### Load data

```python
from flaml.data import load_openml_dataset

# Download [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure.
X_train, X_test, y_train, y_test = load_openml_dataset(
    dataset_id=1169, data_dir='./', random_state=1234, dataset_format='array')
```

### Create a pipeline

```python
from sklearn import set_config
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from flaml import AutoML

set_config(display='diagram')

imputer = SimpleImputer()
standardizer = StandardScaler()
automl = AutoML()

automl_pipeline = Pipeline([
    ("imputer", imputer),
    ("standardizer", standardizer),
    ("automl", automl)
])
automl_pipeline
```

![png](images/pipeline.png)

### Run AutoML in the pipeline

```python
settings = {
    "time_budget": 60,  # total running time in seconds
    "metric": 'accuracy',  # primary metric can be chosen from: ['accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'f1', 'log_loss', 'mae', 'mse', 'r2']
    "task": 'classification',  # task type
    "estimator_list": ['xgboost', 'catboost', 'lgbm'],
    "log_file_name": 'airlines_experiment.log',  # flaml log file
}
# parameters for a pipeline step are passed with the `<step>__<param>` syntax
automl_pipeline.fit(X_train, y_train,
                    automl__time_budget=60,
                    automl__metric="accuracy")
```

### Get the automl object from the pipeline

```python
automl = automl_pipeline.steps[2][1]
# Get the best config and best learner
print('Best ML learner:', automl.best_estimator)
print('Best hyperparameter config:', automl.best_config)
print('Best accuracy on validation data: {0:.4g}'.format(1 - automl.best_loss))
print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))
```
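Indexing the step by position (`steps[2][1]`) works, but `named_steps` is less brittle if the pipeline changes. A standalone sketch with a plain scikit-learn pipeline, where `LogisticRegression` stands in for the AutoML step:

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
pipe = Pipeline([
    ("imputer", SimpleImputer()),
    ("standardizer", StandardScaler()),
    ("clf", LogisticRegression()),  # stand-in for the AutoML step
])
pipe.fit(X, y)

# Both lookups return the same fitted step object.
same = pipe.named_steps["clf"] is pipe.steps[2][1]
print(same, pipe.score(X, y))
```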

[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_sklearn.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_sklearn.ipynb)

# Tune - HuggingFace

This example uses flaml to finetune a transformer model from the Huggingface transformers library.

### Requirements

This example requires a GPU. Install the dependencies:
```bash
pip install torch transformers datasets "flaml[blendsearch,ray]"
```

### Prepare for tuning

#### Tokenizer

```python
from transformers import AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
COLUMN_NAME = "sentence"


def tokenize(examples):
    return tokenizer(examples[COLUMN_NAME], truncation=True)
```

#### Define training method

```python
import numpy as np
import flaml
import datasets
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

TASK = "cola"
NUM_LABELS = 2


def train_distilbert(config: dict):
    # Load CoLA dataset and apply tokenizer
    cola_raw = datasets.load_dataset("glue", TASK)
    cola_encoded = cola_raw.map(tokenize, batched=True)
    train_dataset, eval_dataset = cola_encoded["train"], cola_encoded["validation"]

    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=NUM_LABELS
    )
    metric = datasets.load_metric("glue", TASK)

    def compute_metrics(eval_pred):
        predictions, labels = eval_pred
        predictions = np.argmax(predictions, axis=1)
        return metric.compute(predictions=predictions, references=labels)

    training_args = TrainingArguments(
        output_dir='.',
        do_eval=False,
        disable_tqdm=True,
        logging_steps=20000,
        save_total_limit=0,
        **config,
    )

    trainer = Trainer(
        model,
        training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )

    # train model
    trainer.train()

    # evaluate model
    eval_output = trainer.evaluate()

    # report the metric to optimize & the metric to log
    flaml.tune.report(
        loss=eval_output["eval_loss"],
        matthews_correlation=eval_output["eval_matthews_correlation"],
    )
```

### Define the search

We are now ready to define our search. This includes:

- The `search_space` for our hyperparameters
- The `metric` and the `mode` ('max' or 'min') for optimization
- The constraints (`n_cpus`, `n_gpus`, `num_samples`, and `time_budget_s`)

```python
max_num_epoch = 64
search_space = {
    # You can mix constants with search space objects.
    "num_train_epochs": flaml.tune.loguniform(1, max_num_epoch),
    "learning_rate": flaml.tune.loguniform(1e-6, 1e-4),
    "adam_epsilon": flaml.tune.loguniform(1e-9, 1e-7),
    "adam_beta1": flaml.tune.uniform(0.8, 0.99),
    "adam_beta2": flaml.tune.loguniform(98e-2, 9999e-4),
}

# optimization objective
HP_METRIC, MODE = "matthews_correlation", "max"

# resources
num_cpus = 4
num_gpus = 4  # change according to your GPU resources

# constraints
num_samples = -1  # number of trials, -1 means unlimited
time_budget_s = 3600  # time budget in seconds
```

### Launch the tuning

We are now ready to launch the tuning using `flaml.tune.run`:

```python
import time
import ray

start_time = time.time()
ray.init(num_cpus=num_cpus, num_gpus=num_gpus)
print("Tuning started...")
analysis = flaml.tune.run(
    train_distilbert,
    search_alg=flaml.CFO(
        space=search_space,
        metric=HP_METRIC,
        mode=MODE,
        low_cost_partial_config={"num_train_epochs": 1}),
    resources_per_trial={"gpu": num_gpus, "cpu": num_cpus},
    local_dir='logs/',
    num_samples=num_samples,
    time_budget_s=time_budget_s,
    use_ray=True,
)
```

This will run the tuning for one hour. At the end, we will see a summary:
```
== Status ==
Memory usage on this node: 32.0/251.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/4 GPUs, 0.0/150.39 GiB heap, 0.0/47.22 GiB objects (0/1.0 accelerator_type:V100)
Result logdir: /home/chiw/FLAML/notebook/logs/train_distilbert_2021-05-07_02-35-58
Number of trials: 22/infinite (22 TERMINATED)
Trial name status loc adam_beta1 adam_beta2 adam_epsilon learning_rate num_train_epochs iter total time (s) loss matthews_correlation
train_distilbert_a0c303d0 TERMINATED 0.939079 0.991865 7.96945e-08 5.61152e-06 1 1 55.6909 0.587986 0
train_distilbert_a0c303d1 TERMINATED 0.811036 0.997214 2.05111e-09 2.05134e-06 1.44427 1 71.7663 0.603018 0
train_distilbert_c39b2ef0 TERMINATED 0.909395 0.993715 1e-07 5.26543e-06 1 1 53.7619 0.586518 0
train_distilbert_f00776e2 TERMINATED 0.968763 0.990019 4.38943e-08 5.98035e-06 1.02723 1 56.8382 0.581313 0
train_distilbert_11ab3900 TERMINATED 0.962198 0.991838 7.09296e-08 5.06608e-06 1 1 54.0231 0.585576 0
train_distilbert_353025b6 TERMINATED 0.91596 0.991892 8.95426e-08 6.21568e-06 2.15443 1 98.3233 0.531632 0.388893
train_distilbert_5728a1de TERMINATED 0.926933 0.993146 1e-07 1.00902e-05 1 1 55.3726 0.538505 0.280558
train_distilbert_9394c2e2 TERMINATED 0.928106 0.990614 4.49975e-08 3.45674e-06 2.72935 1 121.388 0.539177 0.327295
train_distilbert_b6543fec TERMINATED 0.876896 0.992098 1e-07 7.01176e-06 1.59538 1 76.0244 0.527516 0.379177
train_distilbert_0071f998 TERMINATED 0.955024 0.991687 7.39776e-08 5.50998e-06 2.90939 1 126.871 0.516225 0.417157
train_distilbert_2f830be6 TERMINATED 0.886931 0.989628 7.6127e-08 4.37646e-06 1.53338 1 73.8934 0.551629 0.0655887
train_distilbert_7ce03f12 TERMINATED 0.984053 0.993956 8.70144e-08 7.82557e-06 4.08775 1 174.027 0.523732 0.453549
train_distilbert_aaab0508 TERMINATED 0.940707 0.993946 1e-07 8.91979e-06 3.40243 1 146.249 0.511288 0.45085
train_distilbert_14262454 TERMINATED 0.99 0.991696 4.60093e-08 4.83405e-06 3.4954 1 152.008 0.53506 0.400851
train_distilbert_6d211fe6 TERMINATED 0.959277 0.994556 5.40791e-08 1.17333e-05 6.64995 1 271.444 0.609851 0.526802
train_distilbert_c980bae4 TERMINATED 0.99 0.993355 1e-07 5.21929e-06 2.51275 1 111.799 0.542276 0.324968
train_distilbert_6d0d29d6 TERMINATED 0.965773 0.995182 9.9752e-08 1.15549e-05 13.694 1 527.944 0.923802 0.549474
train_distilbert_b16ea82a TERMINATED 0.952781 0.993931 2.93182e-08 1.19145e-05 3.2293 1 139.844 0.533466 0.451307
train_distilbert_eddf7cc0 TERMINATED 0.99 0.997109 8.13498e-08 1.28515e-05 15.5807 1 614.789 0.983285 0.56993
train_distilbert_43008974 TERMINATED 0.929089 0.993258 1e-07 1.03892e-05 12.0357 1 474.387 0.857461 0.520022
train_distilbert_b3408a4e TERMINATED 0.99 0.993809 4.67441e-08 1.10418e-05 11.9165 1 474.126 0.828205 0.526164
train_distilbert_cfbfb220 TERMINATED 0.979454 0.9999 1e-07 1.49578e-05 20.3715
```

### Retrieve the results

```python
best_trial = analysis.get_best_trial(HP_METRIC, MODE, "all")
metric = best_trial.metric_analysis[HP_METRIC][MODE]
print(f"n_trials={len(analysis.trials)}")
print(f"time={time.time()-start_time}")
print(f"Best model eval {HP_METRIC}: {metric:.4f}")
print(f"Best model parameters: {best_trial.config}")
# n_trials=22
# time=3999.769361972809
# Best model eval matthews_correlation: 0.5699
# Best model parameters: {'num_train_epochs': 15.580684188655825, 'learning_rate': 1.2851507818900338e-05, 'adam_epsilon': 8.134982521948352e-08, 'adam_beta1': 0.99, 'adam_beta2': 0.9971094424784387}
```

[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/tune_huggingface.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/tune_huggingface.ipynb)

# Tune - PyTorch

This example uses flaml to tune a pytorch model on CIFAR10.

## Prepare for tuning

### Requirements
```bash
pip install torchvision "flaml[blendsearch,ray]"
```

Before we are ready for tuning, we first need to define the neural network that we would like to tune.

### Network Specification

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms


class Net(nn.Module):

    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```

### Data

```python
def load_data(data_dir="data"):
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform)

    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform)

    return trainset, testset
```

### Training

```python
import logging
import os

from ray import tune

logger = logging.getLogger(__name__)


def train_cifar(config, checkpoint_dir=None, data_dir=None):
    if "l1" not in config:
        logger.warning(config)
    net = Net(2**config["l1"], 2**config["l2"])

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    # The `checkpoint_dir` parameter gets passed by Ray Tune when a checkpoint
    # should be restored.
    if checkpoint_dir:
        checkpoint = os.path.join(checkpoint_dir, "checkpoint")
        model_state, optimizer_state = torch.load(checkpoint)
        net.load_state_dict(model_state)
        optimizer.load_state_dict(optimizer_state)

    trainset, testset = load_data(data_dir)

    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs])

    trainloader = torch.utils.data.DataLoader(
        train_subset,
        batch_size=int(2**config["batch_size"]),
        shuffle=True,
        num_workers=4)
    valloader = torch.utils.data.DataLoader(
        val_subset,
        batch_size=int(2**config["batch_size"]),
        shuffle=True,
        num_workers=4)

    for epoch in range(int(round(config["num_epochs"]))):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1,
                                                running_loss / epoch_steps))
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        # Here we save a checkpoint. It is automatically registered with
        # Ray Tune and will potentially be passed as the `checkpoint_dir`
        # parameter in future iterations.
        with tune.checkpoint_dir(step=epoch) as checkpoint_dir:
            path = os.path.join(checkpoint_dir, "checkpoint")
            torch.save(
                (net.state_dict(), optimizer.state_dict()), path)

        tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
    print("Finished Training")
```

### Test Accuracy

```python
def _test_accuracy(net, device="cpu"):
    trainset, testset = load_data()

    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2)

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total
```

## Hyperparameter Optimization

```python
import numpy as np
import flaml
import os

data_dir = os.path.abspath("data")
load_data(data_dir)  # Download data for all trials before starting the run
```

### Search space

```python
max_num_epoch = 100
config = {
    "l1": tune.randint(2, 9),  # log transformed with base 2
    "l2": tune.randint(2, 9),  # log transformed with base 2
    "lr": tune.loguniform(1e-4, 1e-1),
    "num_epochs": tune.loguniform(1, max_num_epoch),
    "batch_size": tune.randint(1, 5)  # log transformed with base 2
}
```
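The integer hyperparameters above are searched as base-2 exponents, and the training function applies the transform itself (e.g., `Net(2**config["l1"], ...)`). A standalone illustration of the mapping, using made-up sampled values:

```python
# A hypothetical sampled configuration: integers are exponents, the
# trainable turns them into actual widths / batch size via 2**value.
sampled = {"l1": 8, "l2": 8, "batch_size": 3}

layer1_width = 2 ** sampled["l1"]
layer2_width = 2 ** sampled["l2"]
batch_size = 2 ** sampled["batch_size"]
print(layer1_width, layer2_width, batch_size)  # 256 256 8
```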

### Budget and resource constraints

```python
time_budget_s = 600  # time budget in seconds
gpus_per_trial = 0.5  # number of gpus for each trial; 0.5 means two training jobs can share one gpu
num_samples = 500  # maximal number of trials
np.random.seed(7654321)
```

### Launch the tuning

```python
import time

start_time = time.time()
result = flaml.tune.run(
    tune.with_parameters(train_cifar, data_dir=data_dir),
    config=config,
    metric="loss",
    mode="min",
    low_cost_partial_config={"num_epochs": 1},
    max_resource=max_num_epoch,
    min_resource=1,
    scheduler="asha",  # Use asha scheduler to perform early stopping based on intermediate results reported
    resources_per_trial={"cpu": 1, "gpu": gpus_per_trial},
    local_dir='logs/',
    num_samples=num_samples,
    time_budget_s=time_budget_s,
    use_ray=True)
```

### Check the result

```python
print(f"#trials={len(result.trials)}")
print(f"time={time.time()-start_time}")
best_trial = result.get_best_trial("loss", "min", "all")
print("Best trial config: {}".format(best_trial.config))
print("Best trial final validation loss: {}".format(
    best_trial.metric_analysis["loss"]["min"]))
print("Best trial final validation accuracy: {}".format(
    best_trial.metric_analysis["accuracy"]["max"]))

best_trained_model = Net(2**best_trial.config["l1"],
                         2**best_trial.config["l2"])
device = "cpu"
if torch.cuda.is_available():
    device = "cuda:0"
    if gpus_per_trial > 1:
        best_trained_model = nn.DataParallel(best_trained_model)
best_trained_model.to(device)

checkpoint_path = os.path.join(best_trial.checkpoint.value, "checkpoint")

model_state, optimizer_state = torch.load(checkpoint_path)
best_trained_model.load_state_dict(model_state)

test_acc = _test_accuracy(best_trained_model, device)
print("Best trial test set accuracy: {}".format(test_acc))
```

### Sample of output

```
#trials=44
time=1193.913584947586
Best trial config: {'l1': 8, 'l2': 8, 'lr': 0.0008818671030627281, 'num_epochs': 55.9513429004283, 'batch_size': 3}
Best trial final validation loss: 1.0694482081472874
Best trial final validation accuracy: 0.6389
Files already downloaded and verified
Files already downloaded and verified
Best trial test set accuracy: 0.6294
```

[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/tune_pytorch.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/tune_pytorch.ipynb)
# Frequently Asked Questions

### About `low_cost_partial_config` in `tune`

- Definition and purpose: The `low_cost_partial_config` is a dictionary of a subset of the hyperparameter coordinates whose values correspond to a configuration with known low cost (i.e., low computation cost for training the corresponding model). The concept of low/high cost is meaningful in the case where a subset of the hyperparameters to tune directly affects the computation cost for training the model. For example, `n_estimators` and `max_leaves` are known to affect the training cost of tree-based learners. We call this subset of hyperparameters *cost-related hyperparameters*. In such scenarios, if you are aware of low-cost configurations for the cost-related hyperparameters, you are recommended to set them as the `low_cost_partial_config`. Using the tree-based method example again, since we know that small `n_estimators` and `max_leaves` generally correspond to simpler models and thus lower cost, we set `{'n_estimators': 4, 'max_leaves': 4}` as the `low_cost_partial_config` by default (note that `4` is the lower bound of the search space for these two hyperparameters), e.g., in [LGBM](https://github.com/microsoft/FLAML/blob/main/flaml/model.py#L215). Configuring `low_cost_partial_config` helps the search algorithms make more cost-efficient choices.
In AutoML, the `low_cost_init_value` in the `search_space()` function for each estimator serves the same role.

- Usage in practice: It is recommended to configure it if there are cost-related hyperparameters in your tuning task and you happen to know the low-cost values for them, but it is not required (it is fine to leave it at the default value, i.e., `None`).

- How does it work: `low_cost_partial_config`, if configured, will be used as an initial point of the search. It also affects the search trajectory. For more details about how it plays a role in the search algorithms, please refer to the papers about the search algorithms used: Section 2 of [Frugal Optimization for Cost-related Hyperparameters (CFO)](https://arxiv.org/pdf/2005.01571.pdf) and Section 3 of [Economical Hyperparameter Optimization with Blended Search Strategy (BlendSearch)](https://openreview.net/pdf?id=VbLH04pRA3).

### How does FLAML handle imbalanced data (unequal distribution of target classes in classification tasks)?

Currently FLAML does several things for imbalanced data.

1. When a class contains fewer than 20 examples, we repeatedly add these examples to the training data until the count is at least 20.
2. We use stratified sampling when splitting data for holdout and k-fold cross-validation.
3. We make sure no class is empty in both training and holdout data.
4. We allow users to pass `sample_weight` to `AutoML.fit()`.
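
For item 4, class-balanced weights can be computed with scikit-learn and passed along. A sketch (the `automl.fit` call is shown commented out so the snippet stands alone):

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# A 9:1 imbalanced label vector.
y_train = np.array([0] * 90 + [1] * 10)

# "balanced" gives each class the same total weight:
# weight = n_samples / (n_classes * class_count)
weights = compute_sample_weight(class_weight="balanced", y=y_train)
print(weights[0], weights[-1])  # minority-class examples get the larger weight

# Passed to FLAML as:
# automl.fit(X_train=X_train, y_train=y_train, task="classification",
#            sample_weight=weights)
```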

### How to interpret model performance? Is it possible for me to visualize feature importance, SHAP values, optimization history?

You can use `automl.model.estimator.feature_importances_` to get the `feature_importances_` for the best model found by automl. See an [example](Examples/AutoML-for-XGBoost#plot-feature-importance).

Packages such as `azureml-interpret` and `sklearn.inspection.permutation_importance` can be used on `automl.model.estimator` to explain the selected model.
Model explanation is a frequently asked question, and adding native support may be a good feature. Suggestions/contributions are welcome.

Optimization history can be checked from the [log](Use-Cases/Task-Oriented-AutoML#log-the-trials). You can also [retrieve the log and plot the learning curve](Use-Cases/Task-Oriented-AutoML#plot-learning-curve).
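
A standalone sketch of `permutation_importance` on a plain fitted estimator, where a `RandomForestClassifier` stands in for `automl.model.estimator`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
est = RandomForestClassifier(random_state=0).fit(X, y)  # stand-in for automl.model.estimator

# Shuffle each feature column and measure the score drop it causes.
result = permutation_importance(est, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)  # one entry per feature; larger = more important
```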
# Getting Started
|
||||
|
||||
<!-- ### Welcome to FLAML, a Fast Library for Automated Machine Learning & Tuning! -->
|
||||
|
||||
FLAML is a lightweight Python library that finds accurate machine
|
||||
learning models automatically, efficiently and economically. It frees users from selecting learners and hyperparameters for each learner.
|
||||
|
||||
### Main Features
|
||||
|
||||
1. For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It supports both classifcal machine learning models and deep neural networks.
|
||||
|
||||
2. It is easy to customize or extend. Users can choose their desired customizability: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code).
|
||||
|
||||
3. It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping. FLAML is powered by a new, [cost-effective
|
||||
hyperparameter optimization](Use-Cases/Tune-User-Defined-Function#hyperparameter-optimization-algorithm)
|
||||
and learner selection method invented by Microsoft Research.
|
||||
|
||||
### Quickstart
|
||||
|
||||
Install FLAML from pip: `pip install flaml`. Find more options in [Installation](Installation).
|
||||
|
||||
There are two ways of using flaml:
|
||||
|
||||
#### [Task-oriented AutoML](Use-Cases/task-oriented-automl)
|
||||
|
||||
For example, with three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.
|
||||
|
||||
```python
|
||||
from flaml import AutoML
|
||||
automl = AutoML()
|
||||
automl.fit(X_train, y_train, task="classification")
|
||||
```
|
||||
|
||||
It automatically tunes the hyparparameters and selects the best model from default learners such as LightGBM, XGBoost, random forest etc. [Customizing](Use-Cases/task-oriented-automl#customize-automlfit) the optimization metrics, learners and search spaces etc. is very easy. For example,
|
||||
|
||||
```python
|
||||
automl.add_learner("mylgbm", MyLGBMEstimator)
|
||||
automl.fit(X_train, y_train, task="classification", metric=custom_metric, estimator_list=["mylgbm"])
|
||||
```
|
||||
|
||||

#### [Tune user-defined function](Use-Cases/Tune-User-Defined-Function)

You can run generic hyperparameter tuning for a custom function (machine learning or beyond). For example,

```python
import lightgbm
from sklearn.metrics import mean_squared_error

from flaml import tune
from flaml.model import LGBMEstimator

# X_train, y_train, X_test, y_test are prepared beforehand


def train_lgbm(config: dict) -> dict:
    # convert config dict to lgbm params
    params = LGBMEstimator(**config).params
    num_boost_round = params.pop("n_estimators")
    # train the model
    train_set = lightgbm.Dataset(X_train, y_train)
    model = lightgbm.train(params, train_set, num_boost_round)
    # evaluate the model
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    # return eval results as a dictionary
    return {"mse": mse}


# load a built-in search space from flaml
flaml_lgbm_search_space = LGBMEstimator.search_space(X_train.shape)
# specify the search space as a dict from hp name to domain; you can define your own search space in the same way
config_search_space = {hp: space["domain"] for hp, space in flaml_lgbm_search_space.items()}
# give guidance about hp values corresponding to low training cost, i.e., {"n_estimators": 4, "num_leaves": 4}
low_cost_partial_config = {
    hp: space["low_cost_init_value"]
    for hp, space in flaml_lgbm_search_space.items()
    if "low_cost_init_value" in space
}
# run the tuning, minimizing mse, with a total time budget of 3 seconds
analysis = tune.run(
    train_lgbm, metric="mse", mode="min", config=config_search_space,
    low_cost_partial_config=low_cost_partial_config, time_budget_s=3, num_samples=-1,
)
```

### Where to Go Next?

* Understand the use cases for [Task-oriented AutoML](Use-Cases/task-oriented-automl) and [Tune user-defined function](Use-Cases/Tune-User-Defined-Function).
* Find code examples under "Examples": from [AutoML - Classification](Examples/AutoML-Classification) to [Tune - PyTorch](Examples/Tune-PyTorch).
* Watch [video tutorials](https://www.youtube.com/channel/UCfU0zfFXHXdAd5x-WvFBk5A).
* Learn about [research](Research) around FLAML.
* Refer to [SDK](reference/automl) and [FAQ](FAQ).

If you like our project, please give it a [star](https://github.com/microsoft/FLAML/stargazers) on GitHub. If you are interested in contributing, please read [Contributor's Guide](Contribute).
63
website/docs/Installation.md
Normal file
@@ -0,0 +1,63 @@
# Installation

FLAML requires **Python version >= 3.6**. It can be installed from pip:

```bash
pip install flaml
```

or conda:
```bash
conda install flaml -c conda-forge
```

FLAML also has a .NET implementation, available via [ML.NET Model Builder](https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet/model-builder) in [Visual Studio](https://visualstudio.microsoft.com/) 2022.

## Optional Dependencies

### Notebook
To run the [notebook examples](https://github.com/microsoft/FLAML/tree/main/notebook),
install flaml with the [notebook] option:

```bash
pip install flaml[notebook]
```

### Extra learners
* catboost
```bash
pip install flaml[catboost]
```
* vowpal wabbit
```bash
pip install flaml[vw]
```
* time series forecaster: prophet, statsmodels
```bash
pip install flaml[forecast]
```

### Distributed tuning
* ray
```bash
pip install flaml[ray]
```
* nni
```bash
pip install flaml[nni]
```
* blendsearch
```bash
pip install flaml[blendsearch]
```

### Test and Benchmark
* test
```bash
pip install flaml[test]
```
* benchmark
```bash
pip install flaml[benchmark]
```
20
website/docs/Research.md
Normal file
@@ -0,0 +1,20 @@
# Research in FLAML

For technical details, please check our research publications.

* [FLAML: A Fast and Lightweight AutoML Library](https://www.microsoft.com/en-us/research/publication/flaml-a-fast-and-lightweight-automl-library/). Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu. MLSys 2021.

```bibtex
@inproceedings{wang2021flaml,
    title={FLAML: A Fast and Lightweight AutoML Library},
    author={Chi Wang and Qingyun Wu and Markus Weimer and Erkang Zhu},
    year={2021},
    booktitle={MLSys},
}
```

* [Frugal Optimization for Cost-related Hyperparameters](https://arxiv.org/abs/2005.01571). Qingyun Wu, Chi Wang, Silu Huang. AAAI 2021.
* [Economical Hyperparameter Optimization With Blended Search Strategy](https://www.microsoft.com/en-us/research/publication/economical-hyperparameter-optimization-with-blended-search-strategy/). Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.
* [ChaCha for Online AutoML](https://www.microsoft.com/en-us/research/publication/chacha-for-online-automl/). Qingyun Wu, Chi Wang, John Langford, Paul Mineiro and Marco Rossi. ICML 2021.

Many researchers and engineers have contributed to the technology development. In alphabetical order: Vijay Aski, Sebastien Bubeck, Surajit Chaudhuri, Kevin Chen, Yi Wei Chen, Nadiia Chepurko, Ofer Dekel, Alex Deng, Anshuman Dutt, Nicolo Fusi, Jianfeng Gao, Johannes Gehrke, Niklas Gustafsson, Silu Huang, Moe Kayali, Dongwoo Kim, Christian Konig, John Langford, Menghao Li, Mingqin Li, Xueqing Liu, Zhe Liu, Naveen Gaur, Paul Mineiro, Vivek Narasayya, Jake Radzikowski, Marco Rossi, Amin Saied, Neil Tenenholtz, Olga Vrousgou, Chi Wang, Yue Wang, Markus Weimer, Qingyun Wu, Qiufeng Yin, Haozhe Zhang, Minjia Zhang, XiaoYun Zhang, Eric Zhu.
478
website/docs/Use-Cases/Task-Oriented-AutoML.md
Normal file
@@ -0,0 +1,478 @@
# Task Oriented AutoML

## Overview

`flaml.AutoML` is a class for task-oriented AutoML. It can be used as a scikit-learn style estimator with the standard `fit` and `predict` functions. The minimal inputs from users are the training data and the task type.

* Training data:
- numpy array. When the input data are stored in numpy array, they are passed to `fit()` as `X_train` and `y_train`.
- pandas dataframe. When the input data are stored in pandas dataframe, they are passed to `fit()` either as `X_train` and `y_train`, or as `dataframe` and `label`.
* Tasks (specified via `task`):
- 'classification': classification.
- 'regression': regression.
- 'ts_forecast': time series forecasting.
- 'rank': learning to rank.
- 'seq-classification': sequence classification.
- 'seq-regression': sequence regression.

An optional input is `time_budget` for searching models and hyperparameters. When not specified, a default budget of 60 seconds will be used.

A typical way to use `flaml.AutoML`:

```python
import pickle
from flaml import AutoML

# Prepare training data
# ...
automl = AutoML()
automl.fit(X_train, y_train, task="regression", time_budget=60, **other_settings)
# Save the model
with open("automl.pkl", "wb") as f:
    pickle.dump(automl, f, pickle.HIGHEST_PROTOCOL)

# At prediction time
with open("automl.pkl", "rb") as f:
    automl = pickle.load(f)
pred = automl.predict(X_test)
```

If users provide only the minimal inputs, `AutoML` uses the default settings for time budget, optimization metric, estimator list, etc.

## Customize AutoML.fit()

### Optimization metric

The optimization metric is specified via the `metric` argument. It can be either a string which refers to a built-in metric, or a user-defined function.

* Built-in metric.
- 'accuracy': 1 - accuracy as the corresponding metric to minimize.
- 'log_loss': default metric for multiclass classification.
- 'r2': 1 - r2_score as the corresponding metric to minimize. Default metric for regression.
- 'rmse': root mean squared error.
- 'mse': mean squared error.
- 'mae': mean absolute error.
- 'mape': mean absolute percentage error.
- 'roc_auc': minimize 1 - roc_auc_score. Default metric for binary classification.
- 'roc_auc_ovr': minimize 1 - roc_auc_score with `multi_class="ovr"`.
- 'roc_auc_ovo': minimize 1 - roc_auc_score with `multi_class="ovo"`.
- 'f1': minimize 1 - f1_score.
- 'micro_f1': minimize 1 - f1_score with `average="micro"`.
- 'macro_f1': minimize 1 - f1_score with `average="macro"`.
- 'ap': minimize 1 - average_precision_score.
- 'ndcg': minimize 1 - ndcg_score.
- 'ndcg@k': minimize 1 - ndcg_score@k. k is an integer.
* User-defined function.
A customized metric function requires the following (input) signature. It returns the input config's value in terms of the metric you want to minimize, plus a dictionary of auxiliary information at your choice:

```python
def custom_metric(
    X_val, y_val, estimator, labels,
    X_train, y_train, weight_val=None, weight_train=None,
    config=None, groups_val=None, groups_train=None,
):
    return metric_to_minimize, metrics_to_log
```

For example,
```python
def custom_metric(
    X_val, y_val, estimator, labels,
    X_train, y_train, weight_val=None, weight_train=None,
    **args,
):
    from sklearn.metrics import log_loss
    import time

    start = time.time()
    y_pred = estimator.predict_proba(X_val)
    pred_time = (time.time() - start) / len(X_val)
    val_loss = log_loss(y_val, y_pred, labels=labels, sample_weight=weight_val)
    y_pred = estimator.predict_proba(X_train)
    train_loss = log_loss(y_train, y_pred, labels=labels, sample_weight=weight_train)
    alpha = 0.5
    return val_loss * (1 + alpha) - alpha * train_loss, {
        "val_loss": val_loss,
        "train_loss": train_loss,
        "pred_time": pred_time,
    }
```
It returns the validation loss penalized by the gap between validation and training loss as the metric to minimize, and three metrics to log: val_loss, train_loss and pred_time. The arguments `config`, `groups_val` and `groups_train` are not used in the function.

### Estimator and search space

The estimator list can contain one or more estimator names, each corresponding to a built-in estimator or a custom estimator. Each estimator has a search space for hyperparameter configurations. FLAML supports both classical machine learning models and deep neural networks.

#### Estimator
* Built-in estimator.
- 'lgbm': LGBMEstimator. Hyperparameters: n_estimators, num_leaves, min_child_samples, learning_rate, log_max_bin (logarithm of (max_bin + 1) with base 2), colsample_bytree, reg_alpha, reg_lambda.
- 'xgboost': XGBoostSklearnEstimator. Hyperparameters: n_estimators, max_leaves, max_depth, min_child_weight, learning_rate, subsample, colsample_bylevel, colsample_bytree, reg_alpha, reg_lambda.
- 'rf': RandomForestEstimator. Hyperparameters: n_estimators, max_features, max_leaves, criterion (for classification only).
- 'extra_tree': ExtraTreesEstimator. Hyperparameters: n_estimators, max_features, max_leaves, criterion (for classification only).
- 'lrl1': LRL1Classifier (sklearn.LogisticRegression with L1 regularization). Hyperparameters: C.
- 'lrl2': LRL2Classifier (sklearn.LogisticRegression with L2 regularization). Hyperparameters: C.
- 'catboost': CatBoostEstimator. Hyperparameters: early_stopping_rounds, learning_rate, n_estimators.
- 'kneighbor': KNeighborsEstimator. Hyperparameters: n_neighbors.
- 'prophet': Prophet. Hyperparameters: changepoint_prior_scale, seasonality_prior_scale, holidays_prior_scale, seasonality_mode.
- 'arima': ARIMA. Hyperparameters: p, d, q.
- 'sarimax': SARIMAX. Hyperparameters: p, d, q, P, D, Q, s.
- 'transformer': Huggingface transformer models. Hyperparameters: learning_rate, num_train_epochs, per_device_train_batch_size, warmup_ratio, weight_decay, adam_epsilon, seed.
* Custom estimator. Use a custom estimator for:
- tuning an estimator that is not built-in;
- customizing the search space for a built-in estimator.

To tune a custom estimator that is not built-in, you need to:

1. Build a custom estimator by inheriting `flaml.model.BaseEstimator` or a derived class.
For example, if you have an estimator class with scikit-learn style `fit()` and `predict()` functions, you only need to set `self.estimator_class` to be that class in your constructor.

```python
from flaml import tune
from flaml.data import CLASSIFICATION
from flaml.model import SKLearnEstimator
# SKLearnEstimator is derived from BaseEstimator
import rgf


class MyRegularizedGreedyForest(SKLearnEstimator):
    def __init__(self, task="binary", **config):
        super().__init__(task, **config)

        if task in CLASSIFICATION:
            from rgf.sklearn import RGFClassifier

            self.estimator_class = RGFClassifier
        else:
            from rgf.sklearn import RGFRegressor

            self.estimator_class = RGFRegressor

    @classmethod
    def search_space(cls, data_size, task):
        space = {
            "max_leaf": {
                "domain": tune.lograndint(lower=4, upper=data_size),
                "low_cost_init_value": 4,
            },
            "n_iter": {
                "domain": tune.lograndint(lower=1, upper=data_size),
                "low_cost_init_value": 1,
            },
            "learning_rate": {"domain": tune.loguniform(lower=0.01, upper=20.0)},
            "min_samples_leaf": {
                "domain": tune.lograndint(lower=1, upper=20),
                "init_value": 20,
            },
        }
        return space
```

In the constructor, we set `self.estimator_class` as `RGFClassifier` or `RGFRegressor` according to the task type. If the estimator you want to tune does not have a scikit-learn style `fit()` and `predict()` API, you can override the `fit()` and `predict()` functions of `flaml.model.BaseEstimator`, like [XGBoostEstimator](https://github.com/microsoft/FLAML/blob/59083fbdcb95c15819a0063a355969203022271c/flaml/model.py#L511).

2. Give the custom estimator a name and add it in AutoML. E.g.,

```python
from flaml import AutoML

automl = AutoML()
automl.add_learner("rgf", MyRegularizedGreedyForest)
```

This registers the `MyRegularizedGreedyForest` class in AutoML, with the name "rgf".

3. Tune the newly added custom estimator in either of the following two ways depending on your needs:
- tune rgf alone: `automl.fit(..., estimator_list=["rgf"])`; or
- mix it with other built-in learners: `automl.fit(..., estimator_list=["rgf", "lgbm", "xgboost", "rf"])`.

#### Search space

Each estimator class, built-in or not, must have a `search_space` function. The `search_space` function returns a dictionary about the hyperparameters: the keys are the names of the hyperparameters to tune, and each value is a dictionary of detailed search configurations for the corresponding hyperparameter. A search configuration dictionary includes the following fields:
* `domain`, which specifies the possible values of the hyperparameter and their distribution. Please refer to [more details about the search space domain](Tune-User-Defined-Function#more-details-about-the-search-space-domain).
* `init_value` (optional), which specifies the initial value of the hyperparameter.
* `low_cost_init_value` (optional), which specifies the value of the hyperparameter that is associated with low computation cost. See [cost related hyperparameters](Tune-User-Defined-Function#cost-related-hyperparameters) or [FAQ](../FAQ#about-low_cost_partial_config-in-tune) for more details.

In the example above, we tune four hyperparameters, three integers and one float. They all follow a log-uniform distribution. "max_leaf" and "n_iter" have "low_cost_init_value" specified, as their values heavily influence the training cost.

To customize the search space for a built-in estimator, use a similar approach to define a class that inherits the existing estimator. For example,

```python
import numpy as np
from flaml.model import XGBoostEstimator


def logregobj(preds, dtrain):
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))  # transform raw leaf weight
    grad = preds - labels
    hess = preds * (1.0 - preds)
    return grad, hess


class MyXGB1(XGBoostEstimator):
    """XGBoostEstimator with logregobj as the objective function"""

    def __init__(self, **config):
        super().__init__(objective=logregobj, **config)
```

We override the constructor and set the training objective as a custom function `logregobj`. The hyperparameters and their search range do not change. For another example,

```python
from flaml import tune
from flaml.model import XGBoostSklearnEstimator


class XGBoost2D(XGBoostSklearnEstimator):
    @classmethod
    def search_space(cls, data_size, task):
        upper = min(32768, int(data_size))
        return {
            "n_estimators": {
                "domain": tune.lograndint(lower=4, upper=upper),
                "low_cost_init_value": 4,
            },
            "max_leaves": {
                "domain": tune.lograndint(lower=4, upper=upper),
                "low_cost_init_value": 4,
            },
        }
```

We override the `search_space` function to tune only two hyperparameters, "n_estimators" and "max_leaves". They are both random integers in the log space, ranging from 4 to a data-dependent upper bound. The lower bound for each corresponds to low training cost, hence the "low_cost_init_value" for each is set to 4.

### Constraint

There are several types of constraints you can impose.

1. End-to-end constraints on the AutoML process.

- `time_budget`: constrains the wall-clock time (seconds) used by the AutoML process. We provide some tips on [how to set time budget](#how-to-set-time-budget).

- `max_iter`: constrains the maximal number of models to try in the AutoML process.

2. Constraints on the (hyperparameters of the) estimators.

Some constraints on the estimator can be implemented via a custom learner. For example,

```python
class MonotonicXGBoostEstimator(XGBoostSklearnEstimator):
    @classmethod
    def search_space(cls, **args):
        space = super().search_space(**args)
        space.update({"monotone_constraints": "(1, -1)"})
        return space
```

It adds a monotonicity constraint to XGBoost. This approach can be used to set any constraint that is a parameter in the underlying estimator's constructor.

3. Constraints on the models tried in AutoML.

Users can set constraints such as the maximal number of models to try, and limits on the training time and prediction time per model.
* `train_time_limit`: training time in seconds.
* `pred_time_limit`: prediction time per instance in seconds.

For example,
```python
automl.fit(X_train, y_train, max_iter=100, train_time_limit=1, pred_time_limit=1e-3)
```

### Ensemble

To use stacked ensemble after the model search, set `ensemble=True` or a dict. When `ensemble=True`, the final estimator and `passthrough` in the stacker will be automatically chosen. You can specify a customized final estimator or passthrough option via a dict:
* "final_estimator": an instance of the final estimator in the stacker.
* "passthrough": True (default) or False, whether to pass the original features to the stacker.

For example,
```python
automl.fit(
    X_train, y_train, task="classification",
    ensemble={
        "final_estimator": LogisticRegression(),
        "passthrough": False,
    },
)
```

### Resampling strategy

By default, flaml decides the resampling automatically according to the data size and the time budget. If you would like to enforce a certain resampling strategy, you can set `eval_method` to be "holdout" or "cv" for holdout or cross-validation.

For holdout, you can also set:
* `split_ratio`: the fraction for validation data, 0.1 by default.
* `X_val`, `y_val`: a separate validation dataset. When they are passed, the validation metrics will be computed against this given validation dataset. If they are not passed, then a validation dataset will be split from the training data and held out from training during the model search. After the model search, flaml will retrain the model with the best configuration on the full training data.
You can set `retrain_full` to be `False` to skip the final retraining, or to "budget" to ask flaml to do its best to retrain within the time budget.

For cross-validation, you can also set `n_splits`, the number of folds. By default it is 5.

#### Data split method

By default, flaml uses the following method to split the data:
* stratified split for classification;
* uniform split for regression;
* time-based split for time series forecasting;
* group-based split for learning to rank.

The data split method for classification can be changed into uniform split by setting `split_type="uniform"`. For both classification and regression, time-based split can be enforced if the data are sorted by timestamps, by setting `split_type="time"`.

### Parallel tuning

When you have parallel resources, you can either spend them in training and keep the model search sequential, or perform parallel search. Following scikit-learn, the parameter `n_jobs` specifies how many CPU cores to use for each training job. The number of parallel trials is specified via the parameter `n_concurrent_trials`. By default, `n_jobs=-1, n_concurrent_trials=1`. That is, all the CPU cores (in a single compute node) are used for training a single model and the search is sequential. When you have more resources than what each single training job needs, you can consider increasing `n_concurrent_trials`.

To do parallel tuning, install the `ray` and `blendsearch` options:
```bash
pip install flaml[ray,blendsearch]
```

`ray` is used to manage the resources. For example,
```python
import ray

ray.init(num_cpus=16)
```
allocates 16 CPU cores. Then, when you run:
```python
automl.fit(X_train, y_train, n_jobs=4, n_concurrent_trials=4)
```
flaml will perform 4 trials in parallel, each consuming 4 CPU cores. The parallel tuning uses the [BlendSearch](Tune-User-Defined-Function#blendsearch-economical-hyperparameter-optimization-with-blended-search-strategy) algorithm.

### Warm start

We can warm start the AutoML by providing starting points of hyperparameter configurations for each estimator. For example, if you have run AutoML for one hour and, after checking the results, you would like to run it for another two hours, you can use the best configurations found for each estimator as the starting points for the new run.

```python
automl1 = AutoML()
automl1.fit(X_train, y_train, time_budget=3600)
automl2 = AutoML()
automl2.fit(X_train, y_train, time_budget=7200, starting_points=automl1.best_config_per_estimator)
```

`starting_points` is a dictionary. The keys are the estimator names. If you do not need to specify starting points for an estimator, exclude its name from the dictionary. The value for each key can be either a dictionary or a list of dictionaries, corresponding to one hyperparameter configuration or multiple hyperparameter configurations, respectively.

### Log the trials

The trials are logged in a file if a `log_file_name` is passed.
Each trial is logged as a json record in one line. The best trial's id is logged in the last line. For example,
```
{"record_id": 0, "iter_per_learner": 1, "logged_metric": null, "trial_time": 0.12717914581298828, "wall_clock_time": 0.1728971004486084, "validation_loss": 0.07333333333333332, "config": {"n_estimators": 4, "num_leaves": 4, "min_child_samples": 20, "learning_rate": 0.09999999999999995, "log_max_bin": 8, "colsample_bytree": 1.0, "reg_alpha": 0.0009765625, "reg_lambda": 1.0}, "learner": "lgbm", "sample_size": 150}
{"record_id": 1, "iter_per_learner": 3, "logged_metric": null, "trial_time": 0.07027268409729004, "wall_clock_time": 0.3756711483001709, "validation_loss": 0.05333333333333332, "config": {"n_estimators": 4, "num_leaves": 4, "min_child_samples": 12, "learning_rate": 0.2677050123105203, "log_max_bin": 7, "colsample_bytree": 1.0, "reg_alpha": 0.001348364934537134, "reg_lambda": 1.4442580148221913}, "learner": "lgbm", "sample_size": 150}
{"curr_best_record_id": 1}
```

1. `iter_per_learner` means how many models have been tried for each learner. The reason you see records like `iter_per_learner=3` for `record_id=1` is that flaml only logs configs better than those in previous iterations by default, i.e., `log_type='better'`. If you use `log_type='all'` instead, all the trials will be logged.
1. `trial_time` means the time taken to train and evaluate one config in that trial. `wall_clock_time` is the total time spent since the beginning of `fit()`.
1. flaml will adjust the `n_estimators` for lightgbm etc. according to the remaining budget, and checks the time budget constraint in several places so that it can stop. Most of the time this makes `fit()` stop before the given budget. Occasionally it may run over the time budget slightly. The log file always contains the best config info, so you can recover the best model up to any time point using `retrain_from_log()`.

We can also use mlflow for logging:
```python
import mlflow

mlflow.set_experiment("flaml")
with mlflow.start_run():
    automl.fit(X_train=X_train, y_train=y_train, **settings)
```

### Extra fit arguments

Extra fit arguments that are needed by the estimators can be passed to `AutoML.fit()`. For example, if there is a weight associated with each training example, it can be passed via `sample_weight`. For another example, `period` can be passed for a time series forecaster. Any extra keyword argument passed to `AutoML.fit()` which is not explicitly listed in the function signature will be passed to the underlying estimators' `fit()` as is.

## Retrieve and analyze the outcomes of AutoML.fit()

### Get best model

The best model can be obtained by the `model` property of an `AutoML` instance. For example,

```python
automl.fit(X_train, y_train, task="regression")
print(automl.model)
# <flaml.model.LGBMEstimator object at 0x7f9b502c4550>
```

`flaml.model.LGBMEstimator` is a wrapper class for LightGBM models. To access the underlying model, use the `estimator` property of the `flaml.model.LGBMEstimator` instance.

```python
print(automl.model.estimator)
'''
LGBMRegressor(colsample_bytree=0.7610534336273627,
              learning_rate=0.41929025492645006, max_bin=255,
              min_child_samples=4, n_estimators=45, num_leaves=4,
              reg_alpha=0.0009765625, reg_lambda=0.009280655005879943,
              verbose=-1)
'''
```

Just like a normal LightGBM model, we can inspect it. For example, we can plot the feature importance:
```python
import matplotlib.pyplot as plt

plt.barh(automl.model.estimator.feature_name_, automl.model.estimator.feature_importances_)
```
![png](images/feature_importance.png)

### Get best configuration

We can find the best estimator's name and best configuration by:

```python
print(automl.best_estimator)
# lgbm
print(automl.best_config)
# {'n_estimators': 148, 'num_leaves': 18, 'min_child_samples': 3, 'learning_rate': 0.17402065726724145, 'log_max_bin': 8, 'colsample_bytree': 0.6649148062238498, 'reg_alpha': 0.0009765625, 'reg_lambda': 0.0067613624509965}
```

We can also find the best configuration per estimator.

```python
print(automl.best_config_per_estimator)
# {'lgbm': {'n_estimators': 148, 'num_leaves': 18, 'min_child_samples': 3, 'learning_rate': 0.17402065726724145, 'log_max_bin': 8, 'colsample_bytree': 0.6649148062238498, 'reg_alpha': 0.0009765625, 'reg_lambda': 0.0067613624509965}, 'rf': None, 'catboost': None, 'xgboost': {'n_estimators': 4, 'max_leaves': 4, 'min_child_weight': 1.8630223791106992, 'learning_rate': 1.0, 'subsample': 0.8513627344387318, 'colsample_bylevel': 1.0, 'colsample_bytree': 0.946138073111236, 'reg_alpha': 0.0018311776973217073, 'reg_lambda': 0.27901659190538414}, 'extra_tree': {'n_estimators': 4, 'max_features': 1.0, 'max_leaves': 4}}
```

The `None` value corresponds to the estimators which have not been tried.

Other useful information:
```python
print(automl.best_config_train_time)
# 0.24841618537902832
print(automl.best_iteration)
# 10
print(automl.best_loss)
# 0.15448622217577546
print(automl.time_to_find_best_model)
# 0.4167296886444092
print(automl.config_history)
# {0: ('lgbm', {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20, 'learning_rate': 0.09999999999999995, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 0.0009765625, 'reg_lambda': 1.0}, 1.2300517559051514)}
# Meaning: at iteration 0, the config tried is {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20, 'learning_rate': 0.09999999999999995, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 0.0009765625, 'reg_lambda': 1.0} for lgbm, and the wallclock time is 1.23s when this trial is finished.
```

### Plot learning curve

To plot how the loss is improved over time during the model search, first load the search history from the log file:

```python
from flaml.data import get_output_from_log

time_history, best_valid_loss_history, valid_loss_history, config_history, metric_history = \
    get_output_from_log(filename=settings["log_file_name"], time_budget=120)
```

Then, assuming the optimization metric is "accuracy", we can plot the accuracy versus wallclock time:

```python
import matplotlib.pyplot as plt
import numpy as np

plt.title("Learning Curve")
plt.xlabel("Wall Clock Time (s)")
plt.ylabel("Validation Accuracy")
plt.step(time_history, 1 - np.array(best_valid_loss_history), where="post")
plt.show()
```

![png](images/curve.png)

The curve suggests that increasing the time budget may further improve the accuracy.
### How to set time budget

* If you have an exact constraint for the total search time, set it as the time budget.
* If you have flexible time constraints, for example, your desirable time budget is t1=60s, and the longest time budget you can tolerate is t2=3600s, you can try the following two ways:
1. set t1 as the time budget, and check the message in the console log at the end. If the budget is too small, you will see a warning like
> WARNING - Time taken to find the best model is 91% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
2. set t2 as the time budget, and also set `early_stop=True`. If early stopping is triggered, you will see a warning like
> WARNING - All estimator hyperparameters local search has converged at least once, and the total search time exceeds 10 times the time taken to find the best model.

> WARNING - Stopping search as early_stop is set to True.
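
The second option can be sketched in `automl.fit` terms as follows (a hedged sketch: the settings dict is illustrative, and `automl`, `X_train`, `y_train` are assumed to exist already):

```python
# hypothetical settings; keys follow the flaml.AutoML.fit arguments used above
flexible_settings = {
    "time_budget": 3600,  # t2, the longest time budget you can tolerate
    "early_stop": True,   # stop once the search is deemed to have converged
}
# automl.fit(X_train=X_train, y_train=y_train, **flexible_settings)
```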
### How much time is needed to find the best model

If you want to get a sense of how much time is needed to find the best model, you can use `max_iter=2` to perform two trials first. The message will be like:
> INFO - iteration 0, current learner lgbm

> INFO - Estimated sufficient time budget=145194s. Estimated necessary time budget=2118s.

> INFO - at 2.6s, estimator lgbm's best error=0.4459, best estimator lgbm's best error=0.4459

You will see that the time to finish the first and cheapest trial is 2.6 seconds. The estimated necessary time budget is 2118 seconds, and the estimated sufficient time budget is 145194 seconds. Note that this is only an estimated range to help you decide your budget.
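
A minimal sketch of such a probing run (assuming `automl`, `X_train`, `y_train` exist; the log file name is hypothetical):

```python
# hypothetical settings; keys follow the flaml.AutoML.fit arguments used above
probe_settings = {
    "max_iter": 2,                 # only two trials, to probe the per-trial cost
    "log_file_name": "probe.log",  # hypothetical log file name
}
# automl.fit(X_train=X_train, y_train=y_train, **probe_settings)
```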

website/docs/Use-Cases/Tune-User-Defined-Function.md

# Tune User Defined Function

`flaml.tune` is a module for economical hyperparameter tuning. It is used internally by `flaml.AutoML`. It can also be used to directly tune a user-defined function (UDF), which is not limited to machine learning model training. You can use `flaml.tune` instead of `flaml.AutoML` if one of the following is true:

1. Your machine learning task is not one of the built-in tasks from `flaml.AutoML`.
1. Your input cannot be represented as X_train + y_train or dataframe + label.
1. You want to tune a function that may not even be a machine learning procedure.

## Basic Tuning Procedure

There are three essential steps (assuming the knowledge of the set of hyperparameters to tune) to use `flaml.tune` to finish a basic tuning task:
1. Specify the [tuning objective](#tuning-objective) with respect to the hyperparameters.
1. Specify a [search space](#search-space) of the hyperparameters.
1. Specify [tuning constraints](#tuning-constraints), including constraints on the resource budget to do the tuning, constraints on the configurations, or/and constraints on a (or multiple) particular metric(s).

With these steps, you can [perform a basic tuning task](#put-together) accordingly.

### Tuning objective

Related arguments:
- `evaluation_function`: A user-defined evaluation function.
- `metric`: A string of the metric name to optimize for.
- `mode`: A string in ['min', 'max'] to specify the objective as minimization or maximization.

The first step is to specify your tuning objective.
To do it, you should first specify your evaluation procedure (e.g., perform a machine learning model training and validation) with respect to the hyperparameters in a user-defined function `evaluation_function`.
The function takes a hyperparameter configuration as input, and can return either a scalar metric value or a dictionary of metric name and metric value pairs.

In the following code, we define an evaluation function with respect to two hyperparameters named `x` and `y` according to $obj := (x-85000)^2 - x/y$. Note that we use this toy example here for more accessible demonstration purposes. In real use cases, the evaluation function usually cannot be written in this closed form, but instead involves a black-box and expensive evaluation procedure. Please check out [Tune HuggingFace](../Examples/Tune-HuggingFace), [Tune PyTorch](../Examples/Tune-PyTorch) and [Tune LightGBM](../Getting-Started#tune-user-defined-function) for real examples of tuning tasks.

```python
import time

def evaluate_config(config: dict):
    """Evaluate a hyperparameter configuration."""
    score = (config["x"] - 85000) ** 2 - config["x"] / config["y"]
    # usually the evaluation incurs a non-negligible cost,
    # and the cost can be related to certain hyperparameters;
    # here we simulate this cost by calling time.sleep(),
    # assuming the cost is proportional to x
    faked_evaluation_cost = config["x"] / 100000
    time.sleep(faked_evaluation_cost)
    # we can return a single float as the score of the input config:
    # return score
    # or, we can return a dictionary that maps metric name to metric value:
    return {"score": score, "evaluation_cost": faked_evaluation_cost, "constraint_metric": config["x"] * config["y"]}
```

When the evaluation function returns a dictionary of metrics, you need to specify the name of the metric to optimize via the argument `metric` (this can be skipped when the function returns a scalar). In addition, you need to specify the mode of your optimization/tuning task (maximization or minimization) via the argument `mode` by choosing from "min" or "max".

For example,

```python
flaml.tune.run(evaluation_function=evaluate_config, metric="score", mode="min", ...)
```

### Search space

Related arguments:
- `config`: A dictionary to specify the search space.
- `low_cost_partial_config` (optional): A dictionary from a subset of controlled dimensions to the initial low-cost values.
- `cat_hp_cost` (optional): A dictionary from a subset of categorical dimensions to the relative cost of each choice.

The second step is to specify a search space of the hyperparameters through the argument `config`. In the search space, you need to specify valid values for your hyperparameters and can specify how these values are sampled (e.g., from a uniform distribution or a log-uniform distribution).

In the following code example, we include a search space for the two hyperparameters `x` and `y` as introduced above. The valid values for both are integers in the range of [1, 100000]. The values for `x` are sampled uniformly in the logarithmic space of the specified range (using `tune.lograndint(lower=1, upper=100000)`), and the values for `y` are sampled uniformly in the specified range (using `tune.randint(lower=1, upper=100000)`).

```python
from flaml import tune

# construct a search space for the hyperparameters x and y.
config_search_space = {
    "x": tune.lograndint(lower=1, upper=100000),
    "y": tune.randint(lower=1, upper=100000)
}

# provide the search space to flaml.tune
flaml.tune.run(..., config=config_search_space, ...)
```

#### More details about the search space domain

The corresponding value of a particular hyperparameter in the search space dictionary is called a domain, for example, `tune.randint(lower=1, upper=100000)` is the domain for the hyperparameter `y`. The domain specifies a type and valid range to sample parameters from. Supported types include float, integer, and categorical. You can also specify how to sample values from certain distributions in linear scale or log scale.
It is a common practice to sample in log scale if the valid value range is large and the evaluation function changes more regularly with respect to the log domain.
See the example below for the commonly used types of domains.

```python
config = {
    # Sample a float uniformly between -5.0 and -1.0
    "uniform": tune.uniform(-5, -1),

    # Sample a float uniformly between 3.2 and 5.4,
    # rounding to increments of 0.2
    "quniform": tune.quniform(3.2, 5.4, 0.2),

    # Sample a float uniformly between 0.0001 and 0.01, while
    # sampling in log space
    "loguniform": tune.loguniform(1e-4, 1e-2),

    # Sample a float uniformly between 0.0001 and 0.1, while
    # sampling in log space and rounding to increments of 0.00005
    "qloguniform": tune.qloguniform(1e-4, 1e-1, 5e-5),

    # Sample a random float from a normal distribution with
    # mean=10 and sd=2
    "randn": tune.randn(10, 2),

    # Sample a random float from a normal distribution with
    # mean=10 and sd=2, rounding to increments of 0.2
    "qrandn": tune.qrandn(10, 2, 0.2),

    # Sample an integer uniformly between -9 (inclusive) and 15 (exclusive)
    "randint": tune.randint(-9, 15),

    # Sample an integer uniformly between -21 (inclusive) and 12 (inclusive (!)),
    # rounding to increments of 3 (includes 12)
    "qrandint": tune.qrandint(-21, 12, 3),

    # Sample an integer uniformly between 1 (inclusive) and 10 (exclusive),
    # while sampling in log space
    "lograndint": tune.lograndint(1, 10),

    # Sample an integer uniformly between 1 (inclusive) and 10 (inclusive (!)),
    # while sampling in log space and rounding to increments of 2
    "qlograndint": tune.qlograndint(1, 10, 2),

    # Sample an option uniformly from the specified choices
    "choice": tune.choice(["a", "b", "c"]),
}
```
<!-- Please refer to [ray.tune](https://docs.ray.io/en/latest/tune/api_docs/search_space.html#overview) for a more comprehensive introduction about possible choices of the domain. -->
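
To make the log-scale intuition concrete, here is a small self-contained sketch (plain Python, not FLAML code) that emulates `lograndint`-style sampling: draw uniformly in log space, then exponentiate. Roughly half of the draws land below the geometric mean of the bounds, so small values are explored as often as large ones:

```python
import math
import random

def log_uniform_int(lower, upper, rng):
    # draw uniformly in log space, then exponentiate and round down
    u = rng.uniform(math.log(lower), math.log(upper))
    return int(math.exp(u))

rng = random.Random(0)
samples = [log_uniform_int(1, 100000, rng) for _ in range(10000)]
# the median sits near the geometric mean sqrt(1 * 100000) ~= 316,
# not near the arithmetic mean ~50000
below_geometric_mean = sum(s < 316 for s in samples) / len(samples)
```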
#### Cost-related hyperparameters

Cost-related hyperparameters are a subset of the hyperparameters which directly affect the computation cost incurred in the evaluation of any hyperparameter configuration. For example, the number of estimators (`n_estimators`) and the maximum number of leaves (`max_leaves`) are known to affect the training cost of tree-based learners. So they are cost-related hyperparameters for tree-based learners.

When cost-related hyperparameters exist, the evaluation cost in the search space is heterogeneous.
In this case, designing a search space with proper ranges of the hyperparameter values is highly non-trivial. Classical tuning algorithms such as Bayesian optimization and random search are typically sensitive to such ranges. If the ranges are too large, it may take them a very high cost to find a good choice. If the ranges are too small, the optimal choice(s) may not be included and thus cannot be found. With our method, you can use a search space with larger ranges in the case of heterogeneous cost.

Our search algorithms are designed to finish the tuning process at a low total cost when the evaluation cost in the search space is heterogeneous.
So in such scenarios, if you are aware of low-cost configurations for the cost-related hyperparameters, you are encouraged to set them as the `low_cost_partial_config`, which is a dictionary of a subset of the hyperparameter coordinates whose value corresponds to a configuration with known low cost. Using the example of the tree-based methods again, since we know that small `n_estimators` and `max_leaves` generally correspond to simpler models and thus lower cost, we set `{'n_estimators': 4, 'max_leaves': 4}` as the `low_cost_partial_config` by default (note that 4 is the lower bound of the search space for these two hyperparameters), e.g., in LGBM. Please find more details on how the algorithm works [here](#cfo-frugal-optimization-for-cost-related-hyperparameters).

In addition, if you are aware of the cost relationship between different categorical hyperparameter choices, you are encouraged to provide this information through `cat_hp_cost`. It also helps the search algorithm to reduce the total cost.
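
As a hedged sketch of these two hints (the hyperparameter name `booster` and the exact `cat_hp_cost` format below are illustrative assumptions, not taken verbatim from this document):

```python
# known low-cost values for a subset of the cost-related hyperparameters
low_cost_partial_config = {"n_estimators": 4, "max_leaves": 4}

# assumed format: relative evaluation cost of each categorical choice,
# aligned with the order of the choices in the search space
cat_hp_cost = {"booster": [2, 1]}  # e.g., the first choice costs ~2x the second

# both would then be passed to the tuner, e.g.:
# tune.run(..., low_cost_partial_config=low_cost_partial_config,
#          cat_hp_cost=cat_hp_cost, ...)
```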
### Tuning constraints

Related arguments:
- `time_budget_s`: The time budget in seconds.
- `num_samples`: An integer of the number of configs to try.
- `config_constraints` (optional): A list of config constraints to be satisfied.
- `metric_constraints` (optional): A list of metric constraints to be satisfied, e.g., `('precision', '>=', 0.9)`.

The third step is to specify constraints of the tuning task. One notable property of `flaml.tune` is that it is able to finish the tuning process (obtaining good results) within a required resource constraint. A user can either provide the resource constraint in terms of wall-clock time (in seconds) through the argument `time_budget_s`, or in terms of the number of trials through the argument `num_samples`. The following example shows three use cases:

```python
# Set a resource constraint of 60 seconds wall-clock time for the tuning.
flaml.tune.run(..., time_budget_s=60, ...)

# Set a resource constraint of 100 trials for the tuning.
flaml.tune.run(..., num_samples=100, ...)

# Use at most 60 seconds and at most 100 trials for the tuning.
flaml.tune.run(..., time_budget_s=60, num_samples=100, ...)
```

Optionally, you can provide a list of config constraints to be satisfied through the argument `config_constraints` and provide a list of metric constraints to be satisfied through the argument `metric_constraints`. We provide more details about related use cases in the [Advanced Tuning Options](#more-constraints-on-the-tuning) section.

### Put together

After the aforementioned key steps, one is ready to perform a tuning task by calling `flaml.tune.run()`. Below is a quick sequential tuning example using the pre-defined search space `config_search_space` and a minimization (`mode='min'`) objective for the `score` metric evaluated in `evaluate_config`, using the default search algorithm in flaml. The time budget is 10 seconds (`time_budget_s=10`).

```python
# require: pip install flaml[blendsearch]
analysis = tune.run(
    evaluate_config,  # the function to evaluate a config
    config=config_search_space,  # the search space defined
    metric="score",
    mode="min",  # the optimization mode, "min" or "max"
    num_samples=-1,  # the maximal number of configs to try, -1 means infinite
    time_budget_s=10,  # the time budget in seconds
)
```

### Result analysis

Once the tuning process finishes, it returns an [ExperimentAnalysis](../reference/tune/analysis) object, which provides methods to analyze the tuning.

In the following code example, we retrieve the best configuration found during the tuning, and retrieve the best trial's result from the returned `analysis`.

```python
analysis = tune.run(
    evaluate_config,  # the function to evaluate a config
    config=config_search_space,  # the search space defined
    metric="score",
    mode="min",  # the optimization mode, "min" or "max"
    num_samples=-1,  # the maximal number of configs to try, -1 means infinite
    time_budget_s=10,  # the time budget in seconds
)
print(analysis.best_config)  # the best config
print(analysis.best_trial.last_result)  # the best trial's result
```

## Advanced Tuning Options

There are several advanced tuning options worth mentioning.

### More constraints on the tuning

A user can specify constraints on the configurations to be satisfied via the argument `config_constraints`. The `config_constraints` receives a list of such constraints to be satisfied. Specifically, each constraint is a tuple that consists of (1) a function that takes a configuration as input and returns a numerical value; (2) an operation chosen from "<=" or ">"; (3) a numerical threshold.

In the following code example, we constrain the output of `area`, which takes a configuration as input and outputs a numerical value, to be no larger than 1000.

```python
def area(config):
    return config["width"] * config["height"]

flaml.tune.run(evaluation_function=evaluate_config, mode="min",
               config=config_search_space,
               config_constraints=[(area, "<=", 1000)], ...)
```

You can also specify a list of metric constraints to be satisfied via the argument `metric_constraints`. Each element in the `metric_constraints` list is a tuple that consists of (1) a string specifying the name of the metric (the metric name must be defined and returned in the user-defined `evaluation_function`); (2) an operation chosen from "<=" or ">"; (3) a numerical threshold.

In the following code example, we constrain the metric `score` to be no larger than 0.4.

```python
flaml.tune.run(evaluation_function=evaluate_config, mode="min",
               config=config_search_space,
               metric_constraints=[("score", "<=", 0.4)], ...)
```

### Parallel tuning

Related arguments:

- `use_ray`: A boolean of whether to use ray as the backend.
- `resources_per_trial`: A dictionary of the hardware resources to allocate per trial, e.g., `{'cpu': 1}`. Only valid when using the ray backend.

You can perform parallel tuning by specifying `use_ray=True` (requiring the flaml[ray] option installed). You can also limit the amount of resources allocated per trial by specifying `resources_per_trial`, e.g., `resources_per_trial={'cpu': 2}`.

```python
# require: pip install flaml[ray]
analysis = tune.run(
    evaluate_config,  # the function to evaluate a config
    config=config_search_space,  # the search space defined
    metric="score",
    mode="min",  # the optimization mode, "min" or "max"
    num_samples=-1,  # the maximal number of configs to try, -1 means infinite
    time_budget_s=10,  # the time budget in seconds
    use_ray=True,
    resources_per_trial={"cpu": 2}  # limit resources allocated per trial
)
print(analysis.best_trial.last_result)  # the best trial's result
print(analysis.best_config)  # the best config
```

**A heads-up about computation overhead.** When parallel tuning is used, there is a certain amount of computation overhead in each trial. In case each trial's original cost is much smaller than the overhead, parallel tuning can underperform sequential tuning. Sequential tuning is recommended when compute resources are limited, and each trial can consume all of the resources.

### Trial scheduling

Related arguments:
- `scheduler`: A scheduler for executing the trials.
- `resource_attr`: A string to specify the resource dimension used by the scheduler.
- `min_resource`: A float of the minimal resource to use for the resource_attr.
- `max_resource`: A float of the maximal resource to use for the resource_attr.
- `reduction_factor`: A float of the reduction factor used for incremental pruning.

A scheduler can help manage the trials' execution. It can be used to perform multi-fidelity evaluation, or/and early stopping. You can use two different types of schedulers in `flaml.tune` via `scheduler`.

#### 1. An authentic scheduler implemented in FLAML (`scheduler='flaml'`).

This scheduler is authentic to the new search algorithms provided by FLAML. In a nutshell, it starts the search with the minimum resource. It switches between HPO with the current resource and increasing the resource for evaluation depending on which leads to faster improvement.

If this scheduler is used, you need to
- Specify a resource dimension. Conceptually, a 'resource dimension' is a factor that affects the cost of the evaluation (e.g., sample size, the number of epochs). You need to specify the name of the resource dimension via `resource_attr`. For example, if `resource_attr="sample_size"`, then the config dict passed to the `evaluation_function` would contain a key "sample_size" with the value suggested by the search algorithm. That value should be used in the evaluation function to control the compute cost. The larger the value, the more expensive the evaluation is.

- Provide the lower and upper limit of the resource dimension via `min_resource` and `max_resource`, and optionally provide `reduction_factor`, which determines the magnitude of the (multiplicative) resource increase when we decide to increase the resource.
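
For intuition only, a geometric "ladder" of resource levels that such a multiplicative increase could step through (a sketch, not FLAML's actual schedule) looks like:

```python
def resource_ladder(min_resource, max_resource, reduction_factor):
    # geometric schedule of resource levels, capped at max_resource
    levels, r = [], min_resource
    while r < max_resource:
        levels.append(r)
        r *= reduction_factor
    levels.append(max_resource)
    return levels

ladder = resource_ladder(1000, 10000, 2)  # [1000, 2000, 4000, 8000, 10000]
```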
In the following code example, we consider the sample size as the resource dimension. It determines how much data is used to perform training, as reflected in the `evaluation_function`. We set the `min_resource` and `max_resource` to 1000 and the size of the full training dataset, respectively.

```python
from functools import partial
from flaml import tune
from flaml.data import load_openml_task

def obj_from_resource_attr(resource_attr, X_train, X_test, y_train, y_test, config):
    from lightgbm import LGBMClassifier
    from sklearn.metrics import accuracy_score

    # in this example sample size is our resource dimension
    resource = int(config[resource_attr])
    sampled_X_train = X_train.iloc[:resource]
    sampled_y_train = y_train[:resource]

    # construct a LGBM model from the config
    # note that you need to first remove the resource_attr field
    # from the config as it is not part of the original search space
    model_config = config.copy()
    del model_config[resource_attr]
    model = LGBMClassifier(**model_config)

    model.fit(sampled_X_train, sampled_y_train)
    y_test_predict = model.predict(X_test)
    test_loss = 1.0 - accuracy_score(y_test, y_test_predict)
    return {resource_attr: resource, "loss": test_loss}

X_train, X_test, y_train, y_test = load_openml_task(task_id=7592, data_dir="test/")
max_resource = len(y_train)
resource_attr = "sample_size"
min_resource = 1000
analysis = tune.run(
    partial(obj_from_resource_attr, resource_attr, X_train, X_test, y_train, y_test),
    config={
        "n_estimators": tune.lograndint(lower=4, upper=32768),
        "max_leaves": tune.lograndint(lower=4, upper=32768),
        "learning_rate": tune.loguniform(lower=1 / 1024, upper=1.0),
    },
    metric="loss",
    mode="min",
    resource_attr=resource_attr,
    scheduler="flaml",
    max_resource=max_resource,
    min_resource=min_resource,
    reduction_factor=2,
    time_budget_s=10,
    num_samples=-1,
)
```

You can find more details about this scheduler in [this paper](https://arxiv.org/pdf/1911.04706.pdf).

#### 2. A scheduler of the [`TrialScheduler`](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-schedulers) class from `ray.tune`.

There is a handful of schedulers of this type implemented in `ray.tune`, for example, [ASHA](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#asha-tune-schedulers-ashascheduler), [HyperBand](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-original-hyperband), [BOHB](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-scheduler-bohb), etc.

To use this type of scheduler, you can either (1) set `scheduler='asha'`, which will automatically create an [ASHAScheduler](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#asha-tune-schedulers-ashascheduler) instance using the provided inputs (`resource_attr`, `min_resource`, `max_resource`, and `reduction_factor`); or (2) create an instance by yourself and provide it via `scheduler`, as shown in the following code example:

```python
# require: pip install flaml[ray]
from ray.tune.schedulers import HyperBandScheduler
my_scheduler = HyperBandScheduler(time_attr="sample_size", max_t=max_resource, reduction_factor=2)
tune.run(..., scheduler=my_scheduler, ...)
```
- Similar to the case where the `flaml` scheduler is used, you need to specify the resource dimension, use the resource dimension accordingly in your `evaluation_function`, and provide the necessary information needed for scheduling, such as `min_resource`, `max_resource` and `reduction_factor` (depending on the requirements of the specific scheduler).

- Different from the case when the `flaml` scheduler is used, the amount of resources to use at each iteration is not suggested by the search algorithm through the `resource_attr` in a configuration. You need to specify the evaluation schedule explicitly by yourself in the `evaluation_function` and report intermediate results (using `tune.report()`) accordingly. In the following code example, we use the ASHA scheduler by setting `scheduler="asha"`, and we specify `resource_attr`, `min_resource`, `max_resource` and `reduction_factor` the same way as in the previous example (when "flaml" is used as the scheduler). We perform the evaluation in a customized schedule.

```python
def obj_w_intermediate_report(resource_attr, X_train, X_test, y_train, y_test, min_resource, max_resource, config):
    from lightgbm import LGBMClassifier
    from sklearn.metrics import accuracy_score

    # a customized schedule to perform the evaluation
    eval_schedule = [res for res in range(min_resource, max_resource, 5000)] + [max_resource]
    for resource in eval_schedule:
        sampled_X_train = X_train.iloc[:resource]
        sampled_y_train = y_train[:resource]

        # construct a LGBM model from the config
        model = LGBMClassifier(**config)

        model.fit(sampled_X_train, sampled_y_train)
        y_test_predict = model.predict(X_test)
        test_loss = 1.0 - accuracy_score(y_test, y_test_predict)
        # need to report the resource attribute used and the corresponding intermediate results
        tune.report(sample_size=resource, loss=test_loss)

resource_attr = "sample_size"
min_resource = 1000
max_resource = len(y_train)
analysis = tune.run(
    partial(obj_w_intermediate_report, resource_attr, X_train, X_test, y_train, y_test, min_resource, max_resource),
    config={
        "n_estimators": tune.lograndint(lower=4, upper=32768),
        "learning_rate": tune.loguniform(lower=1 / 1024, upper=1.0),
    },
    metric="loss",
    mode="min",
    resource_attr=resource_attr,
    scheduler="asha",
    max_resource=max_resource,
    min_resource=min_resource,
    reduction_factor=2,
    time_budget_s=10,
    num_samples=-1,
)
```

### Warm start

Related arguments:

- `points_to_evaluate`: A list of initial hyperparameter configurations to run first.
- `evaluated_rewards`: If you have previously evaluated the parameters passed in as `points_to_evaluate`, you can avoid re-running those trials by passing in the reward attributes as a list so the optimizer can be told the results without needing to re-compute the trial. Must be the same length as `points_to_evaluate`.

If you are aware of some good hyperparameter configurations, you are encouraged to provide them via `points_to_evaluate`. The search algorithm will try them first and use them to bootstrap the search.

You can use previously evaluated configurations to warm-start your tuning. For example, the following code tells `tune.run()` that the rewards of the two configs in `points_to_evaluate` are known to be 3.99 and 2.99, respectively.

```python
def simple_obj(config):
    return config["a"] + config["b"]

from flaml import tune
config_search_space = {
    "a": tune.uniform(lower=0, upper=0.99),
    "b": tune.uniform(lower=0, upper=3)
}

points_to_evaluate = [
    {"a": .99, "b": 3},
    {"a": .99, "b": 2},
]
evaluated_rewards = [3.99, 2.99]

analysis = tune.run(
    simple_obj,
    config=config_search_space,
    mode="max",
    points_to_evaluate=points_to_evaluate,
    evaluated_rewards=evaluated_rewards,
    time_budget_s=10,
    num_samples=-1,
)
```

### Reproducibility

By default, there is randomness in our tuning process. If reproducibility is desired, you could manually set a random seed before calling `tune.run()`. For example, in the following code, we call `np.random.seed(100)` to set the random seed. With this random seed, running the following code multiple times will generate exactly the same search trajectory.

```python
import numpy as np

np.random.seed(100)
analysis = tune.run(
    simple_obj,
    config=config_search_space,
    mode="max",
    num_samples=10,
)
```

## Hyperparameter Optimization Algorithm

To tune the hyperparameters toward your objective, you will want to use a hyperparameter optimization algorithm which can help suggest hyperparameters with better performance (regarding your objective). `flaml` offers two HPO methods: CFO and BlendSearch. `flaml.tune` uses BlendSearch by default when the option [blendsearch] is installed.

<!--  | 
:---:|:---: -->

### CFO: Frugal Optimization for Cost-related Hyperparameters

CFO uses the randomized direct search method FLOW<sup>2</sup> with adaptive stepsize and random restart.
It requires a low-cost initial point as input if such a point exists.
The search begins with the low-cost initial point and gradually moves to the high-cost region if needed. The local search method has a provable convergence rate and bounded cost.

About FLOW<sup>2</sup>: FLOW<sup>2</sup> is a simple yet effective randomized direct search method.
It is an iterative optimization method that can optimize for black-box functions.
FLOW<sup>2</sup> only requires pairwise comparisons between function values to perform iterative updates. Compared to existing HPO methods, FLOW<sup>2</sup> has the following appealing properties:

1. It is applicable to general black-box functions with a good convergence rate in terms of loss.
1. It provides theoretical guarantees on the total evaluation cost incurred.
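
To illustrate the flavor of such a pairwise-comparison direct search, here is a toy sketch on a simple quadratic (an illustrative simplification only, not FLAML's actual FLOW<sup>2</sup> implementation):

```python
import random

def direct_search(f, x0, stepsize=1.0, iters=200, seed=0):
    """Toy randomized direct search: compare f at x +/- a random direction,
    move to the better point, and shrink the step after repeated failures."""
    rng = random.Random(seed)
    x, fx, fails = list(x0), f(x0), 0
    for _ in range(iters):
        d = [rng.gauss(0, 1) for _ in x]
        improved = False
        for sign in (1, -1):  # pairwise comparison in both directions
            cand = [xi + sign * stepsize * di for xi, di in zip(x, d)]
            fc = f(cand)
            if fc < fx:
                x, fx, improved = cand, fc, True
                break
        if not improved:
            fails += 1
            if fails >= 5:       # adaptive stepsize: shrink on stagnation
                stepsize /= 2
                fails = 0
        else:
            fails = 0
    return x, fx

best_x, best_f = direct_search(lambda v: sum(t * t for t in v), [5.0, -3.0])
```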
The GIFs attached below demonstrate an example search trajectory of FLOW<sup>2</sup> shown in the loss and evaluation cost (i.e., the training time) space respectively. FLOW<sup>2</sup> is used in tuning the # of leaves and the # of trees for XGBoost. The two background heatmaps show the loss and cost distribution of all configurations. The black dots are the points evaluated in FLOW<sup>2</sup>. Black dots connected by lines are points that yield better loss performance when evaluated.

 | 
:---:|:---:

From the demonstration, we can see that (1) FLOW<sup>2</sup> can quickly move toward the low-loss region, showing good convergence property and (2) FLOW<sup>2</sup> tends to avoid exploring the high-cost region until necessary.

Example:

```python
from flaml import CFO
tune.run(...,
         search_alg=CFO(low_cost_partial_config=low_cost_partial_config),
)
```

**Recommended scenario**: There exist cost-related hyperparameters and a low-cost initial point is known before optimization. If the search space is complex and CFO gets trapped into local optima, consider using BlendSearch.

### BlendSearch: Economical Hyperparameter Optimization With Blended Search Strategy
|
||||
|
||||
BlendSearch combines local search with global search. It leverages the frugality
of CFO and the space exploration ability of global search methods such as
Bayesian optimization. Like CFO, BlendSearch requires a low-cost initial point
as input if such a point exists, and starts the search from there. Different from
CFO, BlendSearch will not wait for the local search to fully converge before
trying new start points. The new start points are suggested by the global search
method and filtered based on their distance to the existing points in the
cost-related dimensions. BlendSearch still gradually increases the trial cost.
It prioritizes among the global search thread and multiple local search threads
based on optimism in the face of uncertainty.
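
The prioritization rule can be pictured with a small sketch. This is illustrative only; the data structure and the projection formula here are assumptions, not FLAML's internals. Each search thread keeps its best loss so far and an estimate of its recent improvement speed, and the scheduler optimistically picks the thread whose projected loss is lowest:

```python
from dataclasses import dataclass

@dataclass
class SearchThread:
    name: str
    best_loss: float   # best loss observed on this thread so far
    speed: float       # estimated recent improvement per unit of cost

def projected_loss(t: SearchThread, horizon: float = 1.0) -> float:
    # optimism in the face of uncertainty: assume the thread keeps
    # improving at its recent speed over the next `horizon` of cost
    return t.best_loss - t.speed * horizon

def pick_thread(threads):
    # lower projected loss -> higher priority
    return min(threads, key=projected_loss)

threads = [
    SearchThread("global", best_loss=0.30, speed=0.02),
    SearchThread("local-1", best_loss=0.25, speed=0.10),
    SearchThread("local-2", best_loss=0.22, speed=0.01),
]
chosen = pick_thread(threads)
```

Under this rule a thread that has been improving quickly can win even when another thread holds the lower current best, which is the optimistic flavor of the prioritization.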

Example:

```python
# requires: pip install flaml[blendsearch]
from flaml import BlendSearch, tune

tune.run(...
    search_alg=BlendSearch(low_cost_partial_config=low_cost_partial_config),
)
```

**Recommended scenario**: Cost-related hyperparameters exist, a low-cost
initial point is known, and the search space is complex such that local search
is prone to get stuck at local optima.

**Suggestion about using larger search space in BlendSearch**.
In hyperparameter optimization, a larger search space is desirable because it is more likely to include the optimal configuration (or one of the optimal configurations) in hindsight. However, the performance (especially the anytime performance) of most existing HPO methods is undesirable if the cost of the configurations in the search space has a large variation. Thus hand-crafted small search spaces (with relatively homogeneous cost) are often used in practice for these methods, a workaround that is subject to idiosyncrasy. BlendSearch combines the benefits of local search and global search, which enables a smart (economical) way of deciding where to explore in the search space even though it is larger than necessary. This allows users to specify a larger search space in BlendSearch, which is often easier and a better practice than narrowing down the search space by hand.

For more technical details, please check our papers.

* [Frugal Optimization for Cost-related Hyperparameters](https://arxiv.org/abs/2005.01571). Qingyun Wu, Chi Wang, Silu Huang. AAAI 2021.

```bibtex
@inproceedings{wu2021cfo,
    title={Frugal Optimization for Cost-related Hyperparameters},
    author={Qingyun Wu and Chi Wang and Silu Huang},
    year={2021},
    booktitle={AAAI'21},
}
```

* [Economical Hyperparameter Optimization With Blended Search Strategy](https://www.microsoft.com/en-us/research/publication/economical-hyperparameter-optimization-with-blended-search-strategy/). Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.

```bibtex
@inproceedings{wang2021blendsearch,
    title={Economical Hyperparameter Optimization With Blended Search Strategy},
    author={Chi Wang and Qingyun Wu and Silu Huang and Amin Saied},
    year={2021},
    booktitle={ICLR'21},
}
```
BIN website/docs/Use-Cases/images/BlendSearch.png (new file, 23 KiB)
BIN website/docs/Use-Cases/images/CFO.png (new file, 9.6 KiB)
BIN website/docs/Use-Cases/images/curve.png (new file, 8.9 KiB)
BIN website/docs/Use-Cases/images/feature_importance.png (new file, 8.0 KiB)
BIN website/docs/Use-Cases/images/heatmap_cost_cfo_12s.gif (new file, 8.9 MiB)
BIN website/docs/Use-Cases/images/heatmap_loss_cfo_12s.gif (new file, 8.1 MiB)