cleanup

2026-05-13 03:00:55 -04:00 · 2023-09-16 10:57:57 +00:00
parent 4f8e30786c
commit bc4473fe8a
318 changed files with 56 additions and 70662 deletions
--- a/website/docs/Examples/AutoML-Classification.md
+++ b/website/docs/Examples/AutoML-Classification.md
@@ -1,69 +0,0 @@
-# AutoML - Classification
-
-### Prerequisites
-
-Install the [automl] option.
-```bash
-pip install "flaml[automl]"
-```
-
-### A basic classification example
-
-```python
-from flaml import AutoML
-from sklearn.datasets import load_iris
-
-# Initialize an AutoML instance
-automl = AutoML()
-# Specify automl goal and constraint
-automl_settings = {
-    "time_budget": 1,  # in seconds
-    "metric": 'accuracy',
-    "task": 'classification',
-    "log_file_name": "iris.log",
-}
-X_train, y_train = load_iris(return_X_y=True)
-# Train with labeled input data
-automl.fit(X_train=X_train, y_train=y_train,
-           **automl_settings)
-# Predict
-print(automl.predict_proba(X_train))
-# Print the best model
-print(automl.model.estimator)
-```
-
-#### Sample output
-```
-[flaml.automl: 11-12 18:21:44] {1485} INFO - Data split method: stratified
-[flaml.automl: 11-12 18:21:44] {1489} INFO - Evaluation method: cv
-[flaml.automl: 11-12 18:21:44] {1540} INFO - Minimizing error metric: 1-accuracy
-[flaml.automl: 11-12 18:21:44] {1577} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'catboost', 'xgboost', 'extra_tree', 'lrl1']
-[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 0, current learner lgbm
-[flaml.automl: 11-12 18:21:44] {1944} INFO - Estimated sufficient time budget=1285s. Estimated necessary time budget=23s.
-[flaml.automl: 11-12 18:21:44] {2029} INFO -  at 0.2s,	estimator lgbm's best error=0.0733,	best estimator lgbm's best error=0.0733
-[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 1, current learner lgbm
-[flaml.automl: 11-12 18:21:44] {2029} INFO -  at 0.3s,	estimator lgbm's best error=0.0733,	best estimator lgbm's best error=0.0733
-[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 2, current learner lgbm
-[flaml.automl: 11-12 18:21:44] {2029} INFO -  at 0.4s,	estimator lgbm's best error=0.0533,	best estimator lgbm's best error=0.0533
-[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 3, current learner lgbm
-[flaml.automl: 11-12 18:21:44] {2029} INFO -  at 0.6s,	estimator lgbm's best error=0.0533,	best estimator lgbm's best error=0.0533
-[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 4, current learner lgbm
-[flaml.automl: 11-12 18:21:44] {2029} INFO -  at 0.6s,	estimator lgbm's best error=0.0533,	best estimator lgbm's best error=0.0533
-[flaml.automl: 11-12 18:21:44] {1826} INFO - iteration 5, current learner xgboost
-[flaml.automl: 11-12 18:21:45] {2029} INFO -  at 0.9s,	estimator xgboost's best error=0.0600,	best estimator lgbm's best error=0.0533
-[flaml.automl: 11-12 18:21:45] {1826} INFO - iteration 6, current learner lgbm
-[flaml.automl: 11-12 18:21:45] {2029} INFO -  at 1.0s,	estimator lgbm's best error=0.0533,	best estimator lgbm's best error=0.0533
-[flaml.automl: 11-12 18:21:45] {1826} INFO - iteration 7, current learner extra_tree
-[flaml.automl: 11-12 18:21:45] {2029} INFO -  at 1.1s,	estimator extra_tree's best error=0.0667,	best estimator lgbm's best error=0.0533
-[flaml.automl: 11-12 18:21:45] {2242} INFO - retrain lgbm for 0.0s
-[flaml.automl: 11-12 18:21:45] {2247} INFO - retrained model: LGBMClassifier(learning_rate=0.2677050123105203, max_bin=127,
-               min_child_samples=12, n_estimators=4, num_leaves=4,
-               reg_alpha=0.001348364934537134, reg_lambda=1.4442580148221913,
-               verbose=-1)
-[flaml.automl: 11-12 18:21:45] {1608} INFO - fit succeeded
-[flaml.automl: 11-12 18:21:45] {1610} INFO - Time taken to find the best model: 0.3756711483001709
-```
-
-### A more advanced example including custom learner and metric
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_classification.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_classification.ipynb)
--- a/website/docs/Examples/AutoML-NLP.md
+++ b/website/docs/Examples/AutoML-NLP.md
@@ -1,376 +0,0 @@
-# AutoML - NLP
-
-### Requirements
-
-This example requires a GPU. Install the [automl,hf] option:
-```bash
-pip install "flaml[automl,hf]"
-```
-
-### A simple sequence classification example
-
-```python
-from flaml import AutoML
-from datasets import load_dataset
-
-train_dataset = load_dataset("glue", "mrpc", split="train").to_pandas()
-dev_dataset = load_dataset("glue", "mrpc", split="validation").to_pandas()
-test_dataset = load_dataset("glue", "mrpc", split="test").to_pandas()
-custom_sent_keys = ["sentence1", "sentence2"]
-label_key = "label"
-X_train, y_train = train_dataset[custom_sent_keys], train_dataset[label_key]
-X_val, y_val = dev_dataset[custom_sent_keys], dev_dataset[label_key]
-X_test = test_dataset[custom_sent_keys]
-
-automl = AutoML()
-automl_settings = {
-    "time_budget": 100,
-    "task": "seq-classification",
-    "fit_kwargs_by_estimator": {
-        "transformer":
-       {
-           "output_dir": "data/output/"  # if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base
-       }
-    },  # setting the huggingface arguments: output directory
-    "gpu_per_trial": 1,                         # set to 0 if no GPU is available
-}
-automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings)
-automl.predict(X_test)
-```
-
-After you run `automl.fit`, the intermediate checkpoints are saved under the specified `output_dir`, `data/output/`. If they take up too much storage space, you can delete them with the following code:
-
-```python
-import os
-import shutil
-if os.path.exists("data/output/"):
-    shutil.rmtree("data/output/")
-```
-
-#### Sample output
-
-```
-[flaml.automl: 12-06 08:21:39] {1943} INFO - task = seq-classification
-[flaml.automl: 12-06 08:21:39] {1945} INFO - Data split method: stratified
-[flaml.automl: 12-06 08:21:39] {1949} INFO - Evaluation method: holdout
-[flaml.automl: 12-06 08:21:39] {2019} INFO - Minimizing error metric: 1-accuracy
-[flaml.automl: 12-06 08:21:39] {2071} INFO - List of ML learners in AutoML Run: ['transformer']
-[flaml.automl: 12-06 08:21:39] {2311} INFO - iteration 0, current learner transformer
-{'data/output/train_2021-12-06_08-21-53/train_8947b1b2_1_n=1e-06,s=9223372036854775807,e=1e-05,s=-1,s=0.45765,e=32,d=42,o=0.0,y=0.0_2021-12-06_08-21-53/checkpoint-53': 53}
-[flaml.automl: 12-06 08:22:56] {2424} INFO - Estimated sufficient time budget=766860s. Estimated necessary time budget=767s.
-[flaml.automl: 12-06 08:22:56] {2499} INFO -  at 76.7s, estimator transformer's best error=0.1740,      best estimator transformer's best error=0.1740
-[flaml.automl: 12-06 08:22:56] {2606} INFO - selected model: <flaml.nlp.huggingface.trainer.TrainerForAuto object at 0x7f49ea8414f0>
-[flaml.automl: 12-06 08:22:56] {2100} INFO - fit succeeded
-[flaml.automl: 12-06 08:22:56] {2101} INFO - Time taken to find the best model: 76.69802761077881
-[flaml.automl: 12-06 08:22:56] {2112} WARNING - Time taken to find the best model is 77% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
-```
-
-### A simple sequence regression example
-
-```python
-from flaml import AutoML
-from datasets import load_dataset
-
-train_dataset = (
-    load_dataset("glue", "stsb", split="train").to_pandas()
-)
-dev_dataset = (
-    load_dataset("glue", "stsb", split="validation").to_pandas()
-)
-custom_sent_keys = ["sentence1", "sentence2"]
-label_key = "label"
-X_train = train_dataset[custom_sent_keys]
-y_train = train_dataset[label_key]
-X_val = dev_dataset[custom_sent_keys]
-y_val = dev_dataset[label_key]
-
-automl = AutoML()
-automl_settings = {
-    "gpu_per_trial": 0,
-    "time_budget": 20,
-    "task": "seq-regression",
-    "metric": "rmse",
-}
-automl_settings["fit_kwargs_by_estimator"] = {  # setting the huggingface arguments
-    "transformer": {
-        "model_path": "google/electra-small-discriminator", # if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base
-        "output_dir": "data/output/",                       # setting the output directory
-        "fp16": False,                                      # setting whether to use FP16
-    }
-}
-automl.fit(
-    X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings
-)
-```
-
-#### Sample output
-
-```
-[flaml.automl: 12-20 11:47:28] {1965} INFO - task = seq-regression
-[flaml.automl: 12-20 11:47:28] {1967} INFO - Data split method: uniform
-[flaml.automl: 12-20 11:47:28] {1971} INFO - Evaluation method: holdout
-[flaml.automl: 12-20 11:47:28] {2063} INFO - Minimizing error metric: rmse
-[flaml.automl: 12-20 11:47:28] {2115} INFO - List of ML learners in AutoML Run: ['transformer']
-[flaml.automl: 12-20 11:47:28] {2355} INFO - iteration 0, current learner transformer
-```
-
-### A simple summarization example
-
-```python
-from flaml import AutoML
-from datasets import load_dataset
-
-train_dataset = (
-    load_dataset("xsum", split="train").to_pandas()
-)
-dev_dataset = (
-    load_dataset("xsum", split="validation").to_pandas()
-)
-custom_sent_keys = ["document"]
-label_key = "summary"
-
-X_train = train_dataset[custom_sent_keys]
-y_train = train_dataset[label_key]
-
-X_val = dev_dataset[custom_sent_keys]
-y_val = dev_dataset[label_key]
-
-automl = AutoML()
-automl_settings = {
-    "gpu_per_trial": 1,
-    "time_budget": 20,
-    "task": "summarization",
-    "metric": "rouge1",
-}
-automl_settings["fit_kwargs_by_estimator"] = {      # setting the huggingface arguments
-    "transformer": {
-        "model_path": "t5-small",             # if model_path is not set, the default model is t5-small: https://huggingface.co/t5-small
-        "output_dir": "data/output/",         # setting the output directory
-        "fp16": False,                        # setting whether to use FP16
-    }
-}
-automl.fit(
-    X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings
-)
-```
-#### Sample Output
-
-```
-[flaml.automl: 12-20 11:44:03] {1965} INFO - task = summarization
-[flaml.automl: 12-20 11:44:03] {1967} INFO - Data split method: uniform
-[flaml.automl: 12-20 11:44:03] {1971} INFO - Evaluation method: holdout
-[flaml.automl: 12-20 11:44:03] {2063} INFO - Minimizing error metric: -rouge
-[flaml.automl: 12-20 11:44:03] {2115} INFO - List of ML learners in AutoML Run: ['transformer']
-[flaml.automl: 12-20 11:44:03] {2355} INFO - iteration 0, current learner transformer
-loading configuration file https://huggingface.co/t5-small/resolve/main/config.json from cache at /home/xliu127/.cache/huggingface/transformers/fe501e8fd6425b8ec93df37767fcce78ce626e34cc5edc859c662350cf712e41.406701565c0afd9899544c1cb8b93185a76f00b31e5ce7f6e18bbaef02241985
-Model config T5Config {
-  "_name_or_path": "t5-small",
-  "architectures": [
-    "T5WithLMHeadModel"
-  ],
-  "d_ff": 2048,
-  "d_kv": 64,
-  "d_model": 512,
-  "decoder_start_token_id": 0,
-  "dropout_rate": 0.1,
-  "eos_token_id": 1,
-  "feed_forward_proj": "relu",
-  "initializer_factor": 1.0,
-  "is_encoder_decoder": true,
-  "layer_norm_epsilon": 1e-06,
-  "model_type": "t5",
-  "n_positions": 512,
-  "num_decoder_layers": 6,
-  "num_heads": 8,
-  "num_layers": 6,
-  "output_past": true,
-  "pad_token_id": 0,
-  "relative_attention_num_buckets": 32,
-  "task_specific_params": {
-    "summarization": {
-      "early_stopping": true,
-      "length_penalty": 2.0,
-      "max_length": 200,
-      "min_length": 30,
-      "no_repeat_ngram_size": 3,
-      "num_beams": 4,
-      "prefix": "summarize: "
-    },
-    "translation_en_to_de": {
-      "early_stopping": true,
-      "max_length": 300,
-      "num_beams": 4,
-      "prefix": "translate English to German: "
-    },
-    "translation_en_to_fr": {
-      "early_stopping": true,
-      "max_length": 300,
-      "num_beams": 4,
-      "prefix": "translate English to French: "
-    },
-    "translation_en_to_ro": {
-      "early_stopping": true,
-      "max_length": 300,
-      "num_beams": 4,
-      "prefix": "translate English to Romanian: "
-    }
-  },
-  "transformers_version": "4.14.1",
-  "use_cache": true,
-  "vocab_size": 32128
-}
-```
-
-### A simple token classification example
-
-There are two ways to define the label for a token classification task. The first is to define the token labels:
-
-```python
-from flaml import AutoML
-import pandas as pd
-
-train_dataset = {
-    "id": ["0", "1"],
-    "ner_tags": [
-        ["B-ORG", "O", "B-MISC", "O", "O", "O", "B-MISC", "O", "O"],
-        ["B-PER", "I-PER"],
-    ],
-    "tokens": [
-        [
-            "EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", ".",
-        ],
-        ["Peter", "Blackburn"],
-    ],
-}
-dev_dataset = {
-    "id": ["0"],
-    "ner_tags": [
-        ["O"],
-    ],
-    "tokens": [
-        ["1996-08-22"]
-    ],
-}
-test_dataset = {
-    "id": ["0"],
-    "ner_tags": [
-        ["O"],
-    ],
-    "tokens": [
-        ['.']
-    ],
-}
-custom_sent_keys = ["tokens"]
-label_key = "ner_tags"
-
-train_dataset = pd.DataFrame(train_dataset)
-dev_dataset = pd.DataFrame(dev_dataset)
-test_dataset = pd.DataFrame(test_dataset)
-
-X_train, y_train = train_dataset[custom_sent_keys], train_dataset[label_key]
-X_val, y_val = dev_dataset[custom_sent_keys], dev_dataset[label_key]
-X_test = test_dataset[custom_sent_keys]
-
-automl = AutoML()
-automl_settings = {
-    "time_budget": 10,
-    "task": "token-classification",
-    "fit_kwargs_by_estimator": {
-        "transformer":
-            {
-                "output_dir": "data/output/"
-                # if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base
-            }
-    },  # setting the huggingface arguments: output directory
-    "gpu_per_trial": 1,  # set to 0 if no GPU is available
-    "metric": "seqeval:overall_f1"
-}
-
-automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings)
-automl.predict(X_test)
-```
-
-The second is to define the labels as integer ids and pass a token [label list](https://microsoft.github.io/FLAML/docs/reference/nlp/huggingface/training_args) that maps each id to its tag:
-
-```python
-from flaml import AutoML
-import pandas as pd
-
-train_dataset = {
-        "id": ["0", "1"],
-        "ner_tags": [
-            [3, 0, 7, 0, 0, 0, 7, 0, 0],
-            [1, 2],
-        ],
-        "tokens": [
-            [
-                "EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", ".",
-            ],
-            ["Peter", "Blackburn"],
-        ],
-    }
-dev_dataset = {
-    "id": ["0"],
-    "ner_tags": [
-        [0],
-    ],
-    "tokens": [
-        ["1996-08-22"]
-    ],
-}
-test_dataset = {
-    "id": ["0"],
-    "ner_tags": [
-        [0],
-    ],
-    "tokens": [
-        ['.']
-    ],
-}
-custom_sent_keys = ["tokens"]
-label_key = "ner_tags"
-
-train_dataset = pd.DataFrame(train_dataset)
-dev_dataset = pd.DataFrame(dev_dataset)
-test_dataset = pd.DataFrame(test_dataset)
-
-X_train, y_train = train_dataset[custom_sent_keys], train_dataset[label_key]
-X_val, y_val = dev_dataset[custom_sent_keys], dev_dataset[label_key]
-X_test = test_dataset[custom_sent_keys]
-
-automl = AutoML()
-automl_settings = {
-    "time_budget": 10,
-    "task": "token-classification",
-    "fit_kwargs_by_estimator": {
-        "transformer":
-            {
-                "output_dir": "data/output/",
-                # if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base
-                "label_list": [ "O","B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC" ]
-            }
-    },  # setting the huggingface arguments: output directory
-    "gpu_per_trial": 1,  # set to 0 if no GPU is available
-    "metric": "seqeval:overall_f1"
-}
-
-automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings)
-automl.predict(X_test)
-```
-
-#### Sample Output
-
-```
-[flaml.automl: 06-30 03:10:02] {2423} INFO - task = token-classification
-[flaml.automl: 06-30 03:10:02] {2425} INFO - Data split method: stratified
-[flaml.automl: 06-30 03:10:02] {2428} INFO - Evaluation method: holdout
-[flaml.automl: 06-30 03:10:02] {2497} INFO - Minimizing error metric: seqeval:overall_f1
-[flaml.automl: 06-30 03:10:02] {2637} INFO - List of ML learners in AutoML Run: ['transformer']
-[flaml.automl: 06-30 03:10:02] {2929} INFO - iteration 0, current learner transformer
-```
-
-For tasks that are not currently supported, use `flaml.tune` for [customized tuning](Tune-HuggingFace).
-
-### Link to Jupyter notebook
-
-To run more examples, especially examples using Ray Tune, please go to:
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_nlp.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_nlp.ipynb)
--- a/website/docs/Examples/AutoML-Rank.md
+++ b/website/docs/Examples/AutoML-Rank.md
@@ -1,103 +0,0 @@
-# AutoML - Rank
-
-### Prerequisites
-
-Install the [automl] option.
-```bash
-pip install "flaml[automl]"
-```
-
-### A simple learning-to-rank example
-
-```python
-from sklearn.datasets import fetch_openml
-from flaml import AutoML
-
-X_train, y_train = fetch_openml(name="credit-g", return_X_y=True, as_frame=True)  # as_frame=True so that y_train is a categorical Series
-y_train = y_train.cat.codes
-# not a real learning-to-rank dataset
-groups = [200] * 4 + [100] * 2    # group counts
-automl = AutoML()
-automl.fit(
-    X_train, y_train, groups=groups,
-    task='rank', time_budget=10,    # in seconds
-)
-```
-
-#### Sample output
-
-```
-[flaml.automl: 11-15 07:14:30] {1485} INFO - Data split method: group
-[flaml.automl: 11-15 07:14:30] {1489} INFO - Evaluation method: holdout
-[flaml.automl: 11-15 07:14:30] {1540} INFO - Minimizing error metric: 1-ndcg
-[flaml.automl: 11-15 07:14:30] {1577} INFO - List of ML learners in AutoML Run: ['lgbm', 'xgboost']
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 0, current learner lgbm
-[flaml.automl: 11-15 07:14:30] {1944} INFO - Estimated sufficient time budget=679s. Estimated necessary time budget=1s.
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.1s,  estimator lgbm's best error=0.0248,     best estimator lgbm's best error=0.0248
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 1, current learner lgbm
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.1s,  estimator lgbm's best error=0.0248,     best estimator lgbm's best error=0.0248
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 2, current learner lgbm
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.2s,  estimator lgbm's best error=0.0248,     best estimator lgbm's best error=0.0248
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 3, current learner lgbm
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.2s,  estimator lgbm's best error=0.0248,     best estimator lgbm's best error=0.0248
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 4, current learner xgboost
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.2s,  estimator xgboost's best error=0.0315,  best estimator lgbm's best error=0.0248
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 5, current learner xgboost
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.2s,  estimator xgboost's best error=0.0315,  best estimator lgbm's best error=0.0248
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 6, current learner lgbm
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.3s,  estimator lgbm's best error=0.0248,     best estimator lgbm's best error=0.0248
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 7, current learner lgbm
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.3s,  estimator lgbm's best error=0.0248,     best estimator lgbm's best error=0.0248
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 8, current learner xgboost
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.4s,  estimator xgboost's best error=0.0315,  best estimator lgbm's best error=0.0248
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 9, current learner xgboost
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.4s,  estimator xgboost's best error=0.0315,  best estimator lgbm's best error=0.0248
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 10, current learner xgboost
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.4s,  estimator xgboost's best error=0.0233,  best estimator xgboost's best error=0.0233
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 11, current learner xgboost
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.4s,  estimator xgboost's best error=0.0233,  best estimator xgboost's best error=0.0233
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 12, current learner xgboost
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.4s,  estimator xgboost's best error=0.0233,  best estimator xgboost's best error=0.0233
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 13, current learner xgboost
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.4s,  estimator xgboost's best error=0.0233,  best estimator xgboost's best error=0.0233
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 14, current learner lgbm
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.5s,  estimator lgbm's best error=0.0225,     best estimator lgbm's best error=0.0225
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 15, current learner xgboost
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.5s,  estimator xgboost's best error=0.0233,  best estimator lgbm's best error=0.0225
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 16, current learner lgbm
-[flaml.automl: 11-15 07:14:30] {2029} INFO -  at 0.5s,  estimator lgbm's best error=0.0225,     best estimator lgbm's best error=0.0225
-[flaml.automl: 11-15 07:14:30] {1826} INFO - iteration 17, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 0.5s,  estimator lgbm's best error=0.0225,     best estimator lgbm's best error=0.0225
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 18, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 0.6s,  estimator lgbm's best error=0.0225,     best estimator lgbm's best error=0.0225
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 19, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 0.6s,  estimator lgbm's best error=0.0201,     best estimator lgbm's best error=0.0201
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 20, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 0.6s,  estimator lgbm's best error=0.0201,     best estimator lgbm's best error=0.0201
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 21, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 0.7s,  estimator lgbm's best error=0.0201,     best estimator lgbm's best error=0.0201
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 22, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 0.7s,  estimator lgbm's best error=0.0201,     best estimator lgbm's best error=0.0201
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 23, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 0.8s,  estimator lgbm's best error=0.0201,     best estimator lgbm's best error=0.0201
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 24, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 0.8s,  estimator lgbm's best error=0.0201,     best estimator lgbm's best error=0.0201
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 25, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 0.8s,  estimator lgbm's best error=0.0201,     best estimator lgbm's best error=0.0201
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 26, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 0.9s,  estimator lgbm's best error=0.0197,     best estimator lgbm's best error=0.0197
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 27, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 0.9s,  estimator lgbm's best error=0.0197,     best estimator lgbm's best error=0.0197
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 28, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 1.0s,  estimator lgbm's best error=0.0197,     best estimator lgbm's best error=0.0197
-[flaml.automl: 11-15 07:14:31] {1826} INFO - iteration 29, current learner lgbm
-[flaml.automl: 11-15 07:14:31] {2029} INFO -  at 1.0s,  estimator lgbm's best error=0.0197,     best estimator lgbm's best error=0.0197
-[flaml.automl: 11-15 07:14:31] {2242} INFO - retrain lgbm for 0.0s
-[flaml.automl: 11-15 07:14:31] {2247} INFO - retrained model: LGBMRanker(colsample_bytree=0.9852774042640857,
-           learning_rate=0.034918421933217675, max_bin=1023,
-           min_child_samples=22, n_estimators=6, num_leaves=23,
-           reg_alpha=0.0009765625, reg_lambda=21.505295697527654, verbose=-1)
-[flaml.automl: 11-15 07:14:31] {1608} INFO - fit succeeded
-[flaml.automl: 11-15 07:14:31] {1610} INFO - Time taken to find the best model: 0.8846545219421387
-[flaml.automl: 11-15 07:14:31] {1624} WARNING - Time taken to find the best model is 88% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
-```
--- a/website/docs/Examples/AutoML-Regression.md
+++ b/website/docs/Examples/AutoML-Regression.md
@@ -1,108 +0,0 @@
-# AutoML - Regression
-
-### Prerequisites
-
-Install the [automl] option.
-```bash
-pip install "flaml[automl]"
-```
-
-### A basic regression example
-
-```python
-from flaml import AutoML
-from sklearn.datasets import fetch_california_housing
-
-# Initialize an AutoML instance
-automl = AutoML()
-# Specify automl goal and constraint
-automl_settings = {
-    "time_budget": 1,  # in seconds
-    "metric": 'r2',
-    "task": 'regression',
-    "log_file_name": "california.log",
-}
-X_train, y_train = fetch_california_housing(return_X_y=True)
-# Train with labeled input data
-automl.fit(X_train=X_train, y_train=y_train,
-           **automl_settings)
-# Predict
-print(automl.predict(X_train))
-# Print the best model
-print(automl.model.estimator)
-```
-
-#### Sample output
-
-```
-[flaml.automl: 11-15 07:08:19] {1485} INFO - Data split method: uniform
-[flaml.automl: 11-15 07:08:19] {1489} INFO - Evaluation method: holdout
-[flaml.automl: 11-15 07:08:19] {1540} INFO - Minimizing error metric: 1-r2
-[flaml.automl: 11-15 07:08:19] {1577} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'catboost', 'xgboost', 'extra_tree']
-[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 0, current learner lgbm
-[flaml.automl: 11-15 07:08:19] {1944} INFO - Estimated sufficient time budget=846s. Estimated necessary time budget=2s.
-[flaml.automl: 11-15 07:08:19] {2029} INFO -  at 0.2s,  estimator lgbm's best error=0.7393,     best estimator lgbm's best error=0.7393
-[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 1, current learner lgbm
-[flaml.automl: 11-15 07:08:19] {2029} INFO -  at 0.3s,  estimator lgbm's best error=0.7393,     best estimator lgbm's best error=0.7393
-[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 2, current learner lgbm
-[flaml.automl: 11-15 07:08:19] {2029} INFO -  at 0.3s,  estimator lgbm's best error=0.5446,     best estimator lgbm's best error=0.5446
-[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 3, current learner lgbm
-[flaml.automl: 11-15 07:08:19] {2029} INFO -  at 0.4s,  estimator lgbm's best error=0.2807,     best estimator lgbm's best error=0.2807
-[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 4, current learner lgbm
-[flaml.automl: 11-15 07:08:19] {2029} INFO -  at 0.5s,  estimator lgbm's best error=0.2712,     best estimator lgbm's best error=0.2712
-[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 5, current learner lgbm
-[flaml.automl: 11-15 07:08:19] {2029} INFO -  at 0.5s,  estimator lgbm's best error=0.2712,     best estimator lgbm's best error=0.2712
-[flaml.automl: 11-15 07:08:19] {1826} INFO - iteration 6, current learner lgbm
-[flaml.automl: 11-15 07:08:20] {2029} INFO -  at 0.6s,  estimator lgbm's best error=0.2712,     best estimator lgbm's best error=0.2712
-[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 7, current learner lgbm
-[flaml.automl: 11-15 07:08:20] {2029} INFO -  at 0.7s,  estimator lgbm's best error=0.2197,     best estimator lgbm's best error=0.2197
-[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 8, current learner xgboost
-[flaml.automl: 11-15 07:08:20] {2029} INFO -  at 0.8s,  estimator xgboost's best error=1.4958,  best estimator lgbm's best error=0.2197
-[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 9, current learner xgboost
-[flaml.automl: 11-15 07:08:20] {2029} INFO -  at 0.8s,  estimator xgboost's best error=1.4958,  best estimator lgbm's best error=0.2197
-[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 10, current learner xgboost
-[flaml.automl: 11-15 07:08:20] {2029} INFO -  at 0.9s,  estimator xgboost's best error=0.7052,  best estimator lgbm's best error=0.2197
-[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 11, current learner xgboost
-[flaml.automl: 11-15 07:08:20] {2029} INFO -  at 0.9s,  estimator xgboost's best error=0.3619,  best estimator lgbm's best error=0.2197
-[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 12, current learner xgboost
-[flaml.automl: 11-15 07:08:20] {2029} INFO -  at 0.9s,  estimator xgboost's best error=0.3619,  best estimator lgbm's best error=0.2197
-[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 13, current learner xgboost
-[flaml.automl: 11-15 07:08:20] {2029} INFO -  at 1.0s,  estimator xgboost's best error=0.3619,  best estimator lgbm's best error=0.2197
-[flaml.automl: 11-15 07:08:20] {1826} INFO - iteration 14, current learner extra_tree
-[flaml.automl: 11-15 07:08:20] {2029} INFO -  at 1.1s,  estimator extra_tree's best error=0.7197,       best estimator lgbm's best error=0.2197
-[flaml.automl: 11-15 07:08:20] {2242} INFO - retrain lgbm for 0.0s
-[flaml.automl: 11-15 07:08:20] {2247} INFO - retrained model: LGBMRegressor(colsample_bytree=0.7610534336273627,
-              learning_rate=0.41929025492645006, max_bin=255,
-              min_child_samples=4, n_estimators=45, num_leaves=4,
-              reg_alpha=0.0009765625, reg_lambda=0.009280655005879943,
-              verbose=-1)
-[flaml.automl: 11-15 07:08:20] {1608} INFO - fit succeeded
-[flaml.automl: 11-15 07:08:20] {1610} INFO - Time taken to find the best model: 0.7289648056030273
-[flaml.automl: 11-15 07:08:20] {1624} WARNING - Time taken to find the best model is 73% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
-```
-
-### Multi-output regression
-
-We can combine `sklearn.multioutput.MultiOutputRegressor` and `flaml.AutoML` to do AutoML for multi-output regression.
-
-```python
-from flaml import AutoML
-from sklearn.datasets import make_regression
-from sklearn.model_selection import train_test_split
-from sklearn.multioutput import MultiOutputRegressor
-
-# create regression data
-X, y = make_regression(n_targets=3)
-
-# split into train and test data
-X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
-
-# train the model
-model = MultiOutputRegressor(AutoML(task="regression", time_budget=60))
-model.fit(X_train, y_train)
-
-# predict
-print(model.predict(X_test))
-```
-
-This runs AutoML once per target with a 60-second budget each, so the three targets here take roughly 180 seconds in total.
--- a/website/docs/Examples/AutoML-Time
+++ b/website/docs/Examples/AutoML-Time
--- a/website/docs/Examples/AutoML-for-LightGBM.md
+++ b/website/docs/Examples/AutoML-for-LightGBM.md
@@ -1,207 +0,0 @@
-# AutoML for LightGBM
-
-### Prerequisites for this example
-
-Install the [automl] option.
-```bash
-pip install "flaml[automl]" matplotlib openml
-```
-
-### Use built-in LGBMEstimator
-
-```python
-from flaml import AutoML
-from flaml.automl.data import load_openml_dataset
-
-# Download the [houses dataset](https://www.openml.org/d/537) from OpenML. The task is to predict the median house price in a region based on the region's demographic composition and the state of its housing market.
-X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir='./')
-
-automl = AutoML()
-settings = {
-    "time_budget": 60,  # total running time in seconds
-    "metric": 'r2',  # primary metrics for regression can be chosen from: ['mae','mse','r2']
-    "estimator_list": ['lgbm'],  # list of ML learners; we tune lightgbm in this example
-    "task": 'regression',  # task type
-    "log_file_name": 'houses_experiment.log',  # flaml log file
-    "seed": 7654321,  # random seed
-}
-automl.fit(X_train=X_train, y_train=y_train, **settings)
-```
-
-#### Sample output
-
-```
-[flaml.automl: 11-15 19:46:44] {1485} INFO - Data split method: uniform
-[flaml.automl: 11-15 19:46:44] {1489} INFO - Evaluation method: cv
-[flaml.automl: 11-15 19:46:44] {1540} INFO - Minimizing error metric: 1-r2
-[flaml.automl: 11-15 19:46:44] {1577} INFO - List of ML learners in AutoML Run: ['lgbm']
-[flaml.automl: 11-15 19:46:44] {1826} INFO - iteration 0, current learner lgbm
-[flaml.automl: 11-15 19:46:44] {1944} INFO - Estimated sufficient time budget=3232s. Estimated necessary time budget=3s.
-[flaml.automl: 11-15 19:46:44] {2029} INFO -  at 0.5s,	estimator lgbm's best error=0.7383,	best estimator lgbm's best error=0.7383
-[flaml.automl: 11-15 19:46:44] {1826} INFO - iteration 1, current learner lgbm
-[flaml.automl: 11-15 19:46:44] {2029} INFO -  at 0.6s,	estimator lgbm's best error=0.4774,	best estimator lgbm's best error=0.4774
-[flaml.automl: 11-15 19:46:44] {1826} INFO - iteration 2, current learner lgbm
-[flaml.automl: 11-15 19:46:44] {2029} INFO -  at 0.7s,	estimator lgbm's best error=0.4774,	best estimator lgbm's best error=0.4774
-[flaml.automl: 11-15 19:46:44] {1826} INFO - iteration 3, current learner lgbm
-[flaml.automl: 11-15 19:46:44] {2029} INFO -  at 0.9s,	estimator lgbm's best error=0.2985,	best estimator lgbm's best error=0.2985
-[flaml.automl: 11-15 19:46:44] {1826} INFO - iteration 4, current learner lgbm
-[flaml.automl: 11-15 19:46:45] {2029} INFO -  at 1.3s,	estimator lgbm's best error=0.2337,	best estimator lgbm's best error=0.2337
-[flaml.automl: 11-15 19:46:45] {1826} INFO - iteration 5, current learner lgbm
-[flaml.automl: 11-15 19:46:45] {2029} INFO -  at 1.4s,	estimator lgbm's best error=0.2337,	best estimator lgbm's best error=0.2337
-[flaml.automl: 11-15 19:46:45] {1826} INFO - iteration 6, current learner lgbm
-[flaml.automl: 11-15 19:46:46] {2029} INFO -  at 2.5s,	estimator lgbm's best error=0.2219,	best estimator lgbm's best error=0.2219
-[flaml.automl: 11-15 19:46:46] {1826} INFO - iteration 7, current learner lgbm
-[flaml.automl: 11-15 19:46:46] {2029} INFO -  at 2.9s,	estimator lgbm's best error=0.2219,	best estimator lgbm's best error=0.2219
-[flaml.automl: 11-15 19:46:46] {1826} INFO - iteration 8, current learner lgbm
-[flaml.automl: 11-15 19:46:48] {2029} INFO -  at 4.5s,	estimator lgbm's best error=0.1764,	best estimator lgbm's best error=0.1764
-[flaml.automl: 11-15 19:46:48] {1826} INFO - iteration 9, current learner lgbm
-[flaml.automl: 11-15 19:46:54] {2029} INFO -  at 10.5s,	estimator lgbm's best error=0.1630,	best estimator lgbm's best error=0.1630
-[flaml.automl: 11-15 19:46:54] {1826} INFO - iteration 10, current learner lgbm
-[flaml.automl: 11-15 19:46:56] {2029} INFO -  at 12.4s,	estimator lgbm's best error=0.1630,	best estimator lgbm's best error=0.1630
-[flaml.automl: 11-15 19:46:56] {1826} INFO - iteration 11, current learner lgbm
-[flaml.automl: 11-15 19:47:13] {2029} INFO -  at 29.0s,	estimator lgbm's best error=0.1630,	best estimator lgbm's best error=0.1630
-[flaml.automl: 11-15 19:47:13] {1826} INFO - iteration 12, current learner lgbm
-[flaml.automl: 11-15 19:47:15] {2029} INFO -  at 31.1s,	estimator lgbm's best error=0.1630,	best estimator lgbm's best error=0.1630
-[flaml.automl: 11-15 19:47:15] {1826} INFO - iteration 13, current learner lgbm
-[flaml.automl: 11-15 19:47:29] {2029} INFO -  at 45.8s,	estimator lgbm's best error=0.1564,	best estimator lgbm's best error=0.1564
-[flaml.automl: 11-15 19:47:33] {2242} INFO - retrain lgbm for 3.2s
-[flaml.automl: 11-15 19:47:33] {2247} INFO - retrained model: LGBMRegressor(colsample_bytree=0.8025848209352517,
-              learning_rate=0.09100963138990374, max_bin=255,
-              min_child_samples=42, n_estimators=363, num_leaves=216,
-              reg_alpha=0.001113000336715291, reg_lambda=76.50614276906414,
-              verbose=-1)
-[flaml.automl: 11-15 19:47:33] {1608} INFO - fit succeeded
-[flaml.automl: 11-15 19:47:33] {1610} INFO - Time taken to find the best model: 45.75616669654846
-[flaml.automl: 11-15 19:47:33] {1624} WARNING - Time taken to find the best model is 76% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
-```
-
-#### Retrieve best config
-
-```python
-print('Best hyperparameter config:', automl.best_config)
-print('Best r2 on validation data: {0:.4g}'.format(1-automl.best_loss))
-print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))
-print(automl.model.estimator)
-# Best hyperparameter config: {'n_estimators': 363, 'num_leaves': 216, 'min_child_samples': 42, 'learning_rate': 0.09100963138990374, 'log_max_bin': 8, 'colsample_bytree': 0.8025848209352517, 'reg_alpha': 0.001113000336715291, 'reg_lambda': 76.50614276906414}
-# Best r2 on validation data: 0.8436
-# Training duration of best run: 3.229 s
-# LGBMRegressor(colsample_bytree=0.8025848209352517,
-#               learning_rate=0.09100963138990374, max_bin=255,
-#               min_child_samples=42, n_estimators=363, num_leaves=216,
-#               reg_alpha=0.001113000336715291, reg_lambda=76.50614276906414,
-#               verbose=-1)
-```
-
-#### Plot feature importance
-
-```python
-import matplotlib.pyplot as plt
-plt.barh(automl.feature_names_in_, automl.feature_importances_)
-```
-![png](../Use-Cases/images/feature_importance.png)
-
-#### Compute predictions of testing dataset
-
-```python
-y_pred = automl.predict(X_test)
-print('Predicted labels', y_pred)
-# Predicted labels [143391.65036562 245535.13731811 153171.44071629 ... 184354.52735963
-#  235510.49470445 282617.22858956]
-```
-
-#### Compute different metric values on testing dataset
-
-```python
-from flaml.automl.ml import sklearn_metric_loss_score
-
-print('r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
-print('mse', '=', sklearn_metric_loss_score('mse', y_pred, y_test))
-print('mae', '=', sklearn_metric_loss_score('mae', y_pred, y_test))
-# r2 = 0.8505434326526395
-# mse = 1975592613.138005
-# mae = 29471.536046068788
-```
-
-#### Compare with untuned LightGBM
-
-```python
-from lightgbm import LGBMRegressor
-
-lgbm = LGBMRegressor()
-lgbm.fit(X_train, y_train)
-y_pred = lgbm.predict(X_test)
-from flaml.automl.ml import sklearn_metric_loss_score
-
-print('default lgbm r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
-# default lgbm r2 = 0.8296179648694404
-```
-
-#### Plot learning curve
-
-How does the model accuracy improve as we search for different hyperparameter configurations?
-
-```python
-from flaml.automl.data import get_output_from_log
-import numpy as np
-
-time_history, best_valid_loss_history, valid_loss_history, config_history, metric_history = \
-    get_output_from_log(filename=settings['log_file_name'], time_budget=60)
-plt.title('Learning Curve')
-plt.xlabel('Wall Clock Time (s)')
-plt.ylabel('Validation r2')
-plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')
-plt.show()
-```
-![png](images/lgbm_curve.png)
-
-### Use a customized LightGBM learner
-
-The native API of LightGBM allows one to specify a custom objective function in the model constructor. You can easily enable it by adding a customized LightGBM learner in FLAML. In the following example, we show how to add such a customized LightGBM learner with a custom objective function.
-
-#### Create a customized LightGBM learner with a custom objective function
-
-```python
-import numpy as np
-
-
-# define your customized objective function
-def my_loss_obj(y_true, y_pred):
-    c = 0.5
-    residual = y_pred - y_true
-    grad = c * residual / (np.abs(residual) + c)
-    hess = c ** 2 / (np.abs(residual) + c) ** 2
-    # rmse grad and hess
-    grad_rmse = residual
-    hess_rmse = 1.0
-
-    # mae grad and hess
-    grad_mae = np.array(residual)
-    grad_mae[grad_mae > 0] = 1.
-    grad_mae[grad_mae <= 0] = -1.
-    hess_mae = 1.0
-
-    coef = [0.4, 0.3, 0.3]
-    return (coef[0] * grad + coef[1] * grad_rmse + coef[2] * grad_mae,
-            coef[0] * hess + coef[1] * hess_rmse + coef[2] * hess_mae)
-
-
-from flaml.automl.model import LGBMEstimator
-
-
-class MyLGBM(LGBMEstimator):
-    """LGBMEstimator with my_loss_obj as the objective function"""
-
-    def __init__(self, **config):
-        super().__init__(objective=my_loss_obj, **config)
-```
-
-#### Add the customized learner and tune it
-
-```python
-automl = AutoML()
-automl.add_learner(learner_name='my_lgbm', learner_class=MyLGBM)
-settings["estimator_list"] = ['my_lgbm']  # change the estimator list
-automl.fit(X_train=X_train, y_train=y_train, **settings)
-```
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_lightgbm.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_lightgbm.ipynb)
--- a/website/docs/Examples/AutoML-for-XGBoost.md
+++ b/website/docs/Examples/AutoML-for-XGBoost.md
@@ -1,232 +0,0 @@
-# AutoML for XGBoost
-
-### Prerequisites for this example
-
-Install the [automl] option.
-```bash
-pip install "flaml[automl]" matplotlib openml
-```
-
-### Use built-in XGBoostSklearnEstimator
-
-```python
-from flaml import AutoML
-from flaml.automl.data import load_openml_dataset
-
-# Download the [houses dataset](https://www.openml.org/d/537) from OpenML. The task is to predict the median house price in a region based on the region's demographic composition and the state of its housing market.
-X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir='./')
-
-automl = AutoML()
-settings = {
-    "time_budget": 60,  # total running time in seconds
-    "metric": 'r2',  # primary metrics for regression can be chosen from: ['mae','mse','r2']
-    "estimator_list": ['xgboost'],  # list of ML learners; we tune XGBoost in this example
-    "task": 'regression',  # task type
-    "log_file_name": 'houses_experiment.log',  # flaml log file
-    "seed": 7654321,  # random seed
-}
-automl.fit(X_train=X_train, y_train=y_train, **settings)
-```
-
-#### Sample output
-
-```
-[flaml.automl: 09-29 23:06:46] {1446} INFO - Data split method: uniform
-[flaml.automl: 09-29 23:06:46] {1450} INFO - Evaluation method: cv
-[flaml.automl: 09-29 23:06:46] {1496} INFO - Minimizing error metric: 1-r2
-[flaml.automl: 09-29 23:06:46] {1533} INFO - List of ML learners in AutoML Run: ['xgboost']
-[flaml.automl: 09-29 23:06:46] {1763} INFO - iteration 0, current learner xgboost
-[flaml.automl: 09-29 23:06:47] {1880} INFO - Estimated sufficient time budget=2621s. Estimated necessary time budget=3s.
-[flaml.automl: 09-29 23:06:47] {1952} INFO -  at 0.3s,	estimator xgboost's best error=2.1267,	best estimator xgboost's best error=2.1267
-[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 1, current learner xgboost
-[flaml.automl: 09-29 23:06:47] {1952} INFO -  at 0.5s,	estimator xgboost's best error=2.1267,	best estimator xgboost's best error=2.1267
-[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 2, current learner xgboost
-[flaml.automl: 09-29 23:06:47] {1952} INFO -  at 0.6s,	estimator xgboost's best error=0.8485,	best estimator xgboost's best error=0.8485
-[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 3, current learner xgboost
-[flaml.automl: 09-29 23:06:47] {1952} INFO -  at 0.8s,	estimator xgboost's best error=0.3799,	best estimator xgboost's best error=0.3799
-[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 4, current learner xgboost
-[flaml.automl: 09-29 23:06:47] {1952} INFO -  at 1.0s,	estimator xgboost's best error=0.3799,	best estimator xgboost's best error=0.3799
-[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 5, current learner xgboost
-[flaml.automl: 09-29 23:06:47] {1952} INFO -  at 1.2s,	estimator xgboost's best error=0.3799,	best estimator xgboost's best error=0.3799
-[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 6, current learner xgboost
-[flaml.automl: 09-29 23:06:48] {1952} INFO -  at 1.5s,	estimator xgboost's best error=0.2992,	best estimator xgboost's best error=0.2992
-[flaml.automl: 09-29 23:06:48] {1763} INFO - iteration 7, current learner xgboost
-[flaml.automl: 09-29 23:06:48] {1952} INFO -  at 1.9s,	estimator xgboost's best error=0.2992,	best estimator xgboost's best error=0.2992
-[flaml.automl: 09-29 23:06:48] {1763} INFO - iteration 8, current learner xgboost
-[flaml.automl: 09-29 23:06:49] {1952} INFO -  at 2.2s,	estimator xgboost's best error=0.2992,	best estimator xgboost's best error=0.2992
-[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 9, current learner xgboost
-[flaml.automl: 09-29 23:06:49] {1952} INFO -  at 2.5s,	estimator xgboost's best error=0.2513,	best estimator xgboost's best error=0.2513
-[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 10, current learner xgboost
-[flaml.automl: 09-29 23:06:49] {1952} INFO -  at 2.8s,	estimator xgboost's best error=0.2513,	best estimator xgboost's best error=0.2513
-[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 11, current learner xgboost
-[flaml.automl: 09-29 23:06:49] {1952} INFO -  at 3.0s,	estimator xgboost's best error=0.2513,	best estimator xgboost's best error=0.2513
-[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 12, current learner xgboost
-[flaml.automl: 09-29 23:06:50] {1952} INFO -  at 3.3s,	estimator xgboost's best error=0.2113,	best estimator xgboost's best error=0.2113
-[flaml.automl: 09-29 23:06:50] {1763} INFO - iteration 13, current learner xgboost
-[flaml.automl: 09-29 23:06:50] {1952} INFO -  at 3.5s,	estimator xgboost's best error=0.2113,	best estimator xgboost's best error=0.2113
-[flaml.automl: 09-29 23:06:50] {1763} INFO - iteration 14, current learner xgboost
-[flaml.automl: 09-29 23:06:50] {1952} INFO -  at 4.0s,	estimator xgboost's best error=0.2090,	best estimator xgboost's best error=0.2090
-[flaml.automl: 09-29 23:06:50] {1763} INFO - iteration 15, current learner xgboost
-[flaml.automl: 09-29 23:06:51] {1952} INFO -  at 4.5s,	estimator xgboost's best error=0.2090,	best estimator xgboost's best error=0.2090
-[flaml.automl: 09-29 23:06:51] {1763} INFO - iteration 16, current learner xgboost
-[flaml.automl: 09-29 23:06:51] {1952} INFO -  at 5.2s,	estimator xgboost's best error=0.1919,	best estimator xgboost's best error=0.1919
-[flaml.automl: 09-29 23:06:51] {1763} INFO - iteration 17, current learner xgboost
-[flaml.automl: 09-29 23:06:52] {1952} INFO -  at 5.5s,	estimator xgboost's best error=0.1919,	best estimator xgboost's best error=0.1919
-[flaml.automl: 09-29 23:06:52] {1763} INFO - iteration 18, current learner xgboost
-[flaml.automl: 09-29 23:06:54] {1952} INFO -  at 8.0s,	estimator xgboost's best error=0.1797,	best estimator xgboost's best error=0.1797
-[flaml.automl: 09-29 23:06:54] {1763} INFO - iteration 19, current learner xgboost
-[flaml.automl: 09-29 23:06:55] {1952} INFO -  at 9.0s,	estimator xgboost's best error=0.1797,	best estimator xgboost's best error=0.1797
-[flaml.automl: 09-29 23:06:55] {1763} INFO - iteration 20, current learner xgboost
-[flaml.automl: 09-29 23:07:08] {1952} INFO -  at 21.8s,	estimator xgboost's best error=0.1797,	best estimator xgboost's best error=0.1797
-[flaml.automl: 09-29 23:07:08] {1763} INFO - iteration 21, current learner xgboost
-[flaml.automl: 09-29 23:07:11] {1952} INFO -  at 24.4s,	estimator xgboost's best error=0.1797,	best estimator xgboost's best error=0.1797
-[flaml.automl: 09-29 23:07:11] {1763} INFO - iteration 22, current learner xgboost
-[flaml.automl: 09-29 23:07:16] {1952} INFO -  at 30.0s,	estimator xgboost's best error=0.1782,	best estimator xgboost's best error=0.1782
-[flaml.automl: 09-29 23:07:16] {1763} INFO - iteration 23, current learner xgboost
-[flaml.automl: 09-29 23:07:20] {1952} INFO -  at 33.5s,	estimator xgboost's best error=0.1782,	best estimator xgboost's best error=0.1782
-[flaml.automl: 09-29 23:07:20] {1763} INFO - iteration 24, current learner xgboost
-[flaml.automl: 09-29 23:07:29] {1952} INFO -  at 42.3s,	estimator xgboost's best error=0.1782,	best estimator xgboost's best error=0.1782
-[flaml.automl: 09-29 23:07:29] {1763} INFO - iteration 25, current learner xgboost
-[flaml.automl: 09-29 23:07:30] {1952} INFO -  at 43.2s,	estimator xgboost's best error=0.1782,	best estimator xgboost's best error=0.1782
-[flaml.automl: 09-29 23:07:30] {1763} INFO - iteration 26, current learner xgboost
-[flaml.automl: 09-29 23:07:50] {1952} INFO -  at 63.4s,	estimator xgboost's best error=0.1663,	best estimator xgboost's best error=0.1663
-[flaml.automl: 09-29 23:07:50] {2059} INFO - selected model: <xgboost.core.Booster object at 0x7f6399005910>
-[flaml.automl: 09-29 23:07:55] {2122} INFO - retrain xgboost for 5.4s
-[flaml.automl: 09-29 23:07:55] {2128} INFO - retrained model: <xgboost.core.Booster object at 0x7f6398fc0eb0>
-[flaml.automl: 09-29 23:07:55] {1557} INFO - fit succeeded
-[flaml.automl: 09-29 23:07:55] {1558} INFO - Time taken to find the best model: 63.427649974823
-[flaml.automl: 09-29 23:07:55] {1569} WARNING - Time taken to find the best model is 106% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
-```
-
-#### Retrieve best config
-
-```python
-print('Best hyperparameter config:', automl.best_config)
-print('Best r2 on validation data: {0:.4g}'.format(1-automl.best_loss))
-print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))
-print(automl.model.estimator)
-# Best hyperparameter config: {'n_estimators': 473, 'max_leaves': 35, 'max_depth': 0, 'min_child_weight': 0.001, 'learning_rate': 0.26865031351923346, 'subsample': 0.9718245679598786, 'colsample_bylevel': 0.7421362469066445, 'colsample_bytree': 1.0, 'reg_alpha': 0.06824336834995245, 'reg_lambda': 250.9654222583276}
-# Best r2 on validation data: 0.8384
-# Training duration of best run: 2.194 s
-# XGBRegressor(base_score=0.5, booster='gbtree',
-#              colsample_bylevel=0.7421362469066445, colsample_bynode=1,
-#              colsample_bytree=1.0, gamma=0, gpu_id=-1, grow_policy='lossguide',
-#              importance_type='gain', interaction_constraints='',
-#              learning_rate=0.26865031351923346, max_delta_step=0, max_depth=0,
-#              max_leaves=35, min_child_weight=0.001, missing=nan,
-#              monotone_constraints='()', n_estimators=473, n_jobs=-1,
-#              num_parallel_tree=1, random_state=0, reg_alpha=0.06824336834995245,
-#              reg_lambda=250.9654222583276, scale_pos_weight=1,
-#              subsample=0.9718245679598786, tree_method='hist',
-#              use_label_encoder=False, validate_parameters=1, verbosity=0)
-```
-
-#### Plot feature importance
-
-```python
-import matplotlib.pyplot as plt
-
-plt.barh(automl.feature_names_in_, automl.feature_importances_)
-```
-![png](images/xgb_feature_importance.png)
-
-#### Compute predictions of testing dataset
-
-```python
-y_pred = automl.predict(X_test)
-print('Predicted labels', y_pred)
-# Predicted labels [139062.95 237622.   140522.03 ... 182125.5  252156.36 264884.5 ]
-```
-
-#### Compute different metric values on testing dataset
-
-```python
-from flaml.automl.ml import sklearn_metric_loss_score
-
-print('r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
-print('mse', '=', sklearn_metric_loss_score('mse', y_pred, y_test))
-print('mae', '=', sklearn_metric_loss_score('mae', y_pred, y_test))
-# r2 = 0.8456494234135888
-# mse = 2040284106.2781258
-# mae = 30212.830996680445
-```
-
-#### Compare with untuned XGBoost
-
-```python
-from xgboost import XGBRegressor
-
-xgb = XGBRegressor()
-xgb.fit(X_train, y_train)
-y_pred = xgb.predict(X_test)
-from flaml.automl.ml import sklearn_metric_loss_score
-
-print('default xgboost r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))
-# default xgboost r2 = 0.8265451174596482
-```
-
-#### Plot learning curve
-
-How does the model accuracy improve as we search for different hyperparameter configurations?
-
-```python
-from flaml.automl.data import get_output_from_log
-import numpy as np
-
-time_history, best_valid_loss_history, valid_loss_history, config_history, metric_history = \
-    get_output_from_log(filename=settings['log_file_name'], time_budget=60)
-plt.title('Learning Curve')
-plt.xlabel('Wall Clock Time (s)')
-plt.ylabel('Validation r2')
-plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')
-plt.show()
-```
-![png](images/xgb_curve.png)
-
-### Use a customized XGBoost learner
-
-You can enable a custom objective function by adding a customized XGBoost learner (inheriting from XGBoostEstimator or XGBoostSklearnEstimator) in FLAML. The following example shows how to add such a customized XGBoost learner with a custom objective function.
-
-```python
-import numpy as np
-
-
-# define your customized objective function
-def logregobj(preds, dtrain):
-    labels = dtrain.get_label()
-    preds = 1.0 / (1.0 + np.exp(-preds))  # transform raw leaf weight
-    grad = preds - labels
-    hess = preds * (1.0 - preds)
-    return grad, hess
-
-
-from flaml.automl.model import XGBoostEstimator
-
-
-class MyXGB1(XGBoostEstimator):
-    '''XGBoostEstimator with the logregobj function as the objective function
-    '''
-
-    def __init__(self, **config):
-        super().__init__(objective=logregobj, **config)
-
-
-class MyXGB2(XGBoostEstimator):
-    '''XGBoostEstimator with 'reg:gamma' as the objective function
-    '''
-
-    def __init__(self, **config):
-        super().__init__(objective='reg:gamma', **config)
-```
-
-#### Add the customized learners and tune them
-
-```python
-automl = AutoML()
-automl.add_learner(learner_name='my_xgb1', learner_class=MyXGB1)
-automl.add_learner(learner_name='my_xgb2', learner_class=MyXGB2)
-settings["estimator_list"] = ['my_xgb1', 'my_xgb2']  # change the estimator list
-automl.fit(X_train=X_train, y_train=y_train, **settings)
-```
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_xgboost.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_xgboost.ipynb)
--- a/website/docs/Examples/Default-Flamlized.md
+++ b/website/docs/Examples/Default-Flamlized.md
@@ -1,109 +0,0 @@
-# Default - Flamlized Estimator
-
-Flamlized estimators automatically use data-dependent default hyperparameter configurations for each estimator, offering a unique zero-shot AutoML capability, or "no tuning" AutoML.
-
-## Flamlized LGBMRegressor
-
-### Prerequisites
-
-This example requires the [autozero] option.
-
-```bash
-pip install "flaml[autozero]" lightgbm openml
-```
-
-### Zero-shot AutoML
-
-```python
-from flaml.automl.data import load_openml_dataset
-from flaml.default import LGBMRegressor
-from flaml.automl.ml import sklearn_metric_loss_score
-
-X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir="./")
-lgbm = LGBMRegressor()
-lgbm.fit(X_train, y_train)
-y_pred = lgbm.predict(X_test)
-print("flamlized lgbm r2", "=", 1 - sklearn_metric_loss_score("r2", y_pred, y_test))
-print(lgbm)
-```
-
-#### Sample output
-
-```
-load dataset from ./openml_ds537.pkl
-Dataset name: houses
-X_train.shape: (15480, 8), y_train.shape: (15480,);
-X_test.shape: (5160, 8), y_test.shape: (5160,)
-flamlized lgbm r2 = 0.8537444671194614
-LGBMRegressor(colsample_bytree=0.7019911744574896,
-              learning_rate=0.022635758411078528, max_bin=511,
-              min_child_samples=2, n_estimators=4797, num_leaves=122,
-              reg_alpha=0.004252223402511765, reg_lambda=0.11288241427227624,
-              verbose=-1)
-```
-
-### Suggest hyperparameters without training
-
-```python
-from flaml.automl.data import load_openml_dataset
-from flaml.default import LGBMRegressor
-from flaml.automl.ml import sklearn_metric_loss_score
-
-X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir="./")
-lgbm = LGBMRegressor()
-hyperparams, estimator_name, X_transformed, y_transformed = lgbm.suggest_hyperparams(X_train, y_train)
-print(hyperparams)
-```
-
-#### Sample output
-```
-load dataset from ./openml_ds537.pkl
-Dataset name: houses
-X_train.shape: (15480, 8), y_train.shape: (15480,);
-X_test.shape: (5160, 8), y_test.shape: (5160,)
-{'n_estimators': 4797, 'num_leaves': 122, 'min_child_samples': 2, 'learning_rate': 0.022635758411078528, 'colsample_bytree': 0.7019911744574896, 'reg_alpha': 0.004252223402511765, 'reg_lambda': 0.11288241427227624, 'max_bin': 511, 'verbose': -1}
-```
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/zeroshot_lightgbm.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/zeroshot_lightgbm.ipynb)
-
-## Flamlized XGBClassifier
-
-### Prerequisites
-
-This example requires xgboost, sklearn, openml==0.10.2.
-
-### Zero-shot AutoML
-
-```python
-from flaml.automl.data import load_openml_dataset
-from flaml.default import XGBClassifier
-from flaml.automl.ml import sklearn_metric_loss_score
-
-X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir="./")
-xgb = XGBClassifier()
-xgb.fit(X_train, y_train)
-y_pred = xgb.predict(X_test)
-print("flamlized xgb accuracy", "=", 1 - sklearn_metric_loss_score("accuracy", y_pred, y_test))
-print(xgb)
-```
-
-#### Sample output
-
-```
-load dataset from ./openml_ds1169.pkl
-Dataset name: airlines
-X_train.shape: (404537, 7), y_train.shape: (404537,);
-X_test.shape: (134846, 7), y_test.shape: (134846,)
-flamlized xgb accuracy = 0.6729009388487608
-XGBClassifier(base_score=0.5, booster='gbtree',
-              colsample_bylevel=0.4601573737792679, colsample_bynode=1,
-              colsample_bytree=1.0, gamma=0, gpu_id=-1, grow_policy='lossguide',
-              importance_type='gain', interaction_constraints='',
-              learning_rate=0.04039771837785377, max_delta_step=0, max_depth=0,
-              max_leaves=159, min_child_weight=0.3396294979905001, missing=nan,
-              monotone_constraints='()', n_estimators=540, n_jobs=4,
-              num_parallel_tree=1, random_state=0,
-              reg_alpha=0.0012362430984376035, reg_lambda=3.093428791531145,
-              scale_pos_weight=1, subsample=1.0, tree_method='hist',
-              use_label_encoder=False, validate_parameters=1, verbosity=0)
-```
--- a/website/docs/Examples/Integrate
+++ b/website/docs/Examples/Integrate
@@ -1,168 +0,0 @@
-FLAML can be used together with AzureML. On top of that, using mlflow and ray is easy too.
-
-### Prerequisites
-
-Install the [automl,azureml] option.
-```bash
-pip install "flaml[automl,azureml]"
-```
-
-Set up an AzureML workspace:
-```python
-from azureml.core import Workspace
-
-ws = Workspace.create(name='myworkspace', subscription_id='<azure-subscription-id>', resource_group='myresourcegroup')
-```
-
-### Enable mlflow in AzureML workspace
-
-```python
-import mlflow
-from azureml.core import Workspace
-
-ws = Workspace.from_config()
-mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
-```
-
-### Start an AutoML run
-
-```python
-from flaml.automl.data import load_openml_dataset
-from flaml import AutoML
-
-# Download the [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given information about the scheduled departure.
-X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir="./")
-
-automl = AutoML()
-settings = {
-    "time_budget": 60,  # total running time in seconds
-    "metric": "accuracy",  # metric to optimize
-    "task": "classification",  # task type
-    "log_file_name": "airlines_experiment.log",  # flaml log file
-}
-experiment = mlflow.set_experiment("flaml")  # the experiment name in AzureML workspace
-with mlflow.start_run() as run:  # create an mlflow run
-    automl.fit(X_train=X_train, y_train=y_train, **settings)
-    mlflow.sklearn.log_model(automl, "automl")
-```
-
-The metrics in the run will be automatically logged in an experiment named "flaml" in your AzureML workspace. They can be retrieved by `mlflow.search_runs`:
-
-```python
-mlflow.search_runs(experiment_ids=[experiment.experiment_id], filter_string="params.learner = 'xgboost'")
-```
-
-The logged model can be loaded and used to make predictions:
-```python
-automl = mlflow.sklearn.load_model(f"{run.info.artifact_uri}/automl")
-print(automl.predict(X_test))
-```
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_azureml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_azureml.ipynb)
-
-### Use ray to distribute across a cluster
-
-When you have a compute cluster in AzureML, you can distribute `flaml.AutoML` or `flaml.tune` with ray.
-
-#### Build a ray environment in AzureML
-
-Create a docker file such as [.Docker/Dockerfile-cpu](https://github.com/microsoft/FLAML/blob/main/test/.Docker/Dockerfile-cpu). Make sure `RUN pip install flaml[blendsearch,ray]` is included in the docker file.
-
-Then build an AzureML environment in the workspace `ws`.
-
-```python
-from azureml.core import Environment
-
-ray_environment_name = "aml-ray-cpu"
-ray_environment_dockerfile_path = "./Docker/Dockerfile-cpu"
-
-# Build CPU image for Ray
-ray_cpu_env = Environment.from_dockerfile(name=ray_environment_name, dockerfile=ray_environment_dockerfile_path)
-ray_cpu_env.register(workspace=ws)
-ray_cpu_build_details = ray_cpu_env.build(workspace=ws)
-
-import time
-while ray_cpu_build_details.status not in ["Succeeded", "Failed"]:
-    print(f"Awaiting completion of ray CPU environment build. Current status is: {ray_cpu_build_details.status}")
-    time.sleep(10)
-```
-
-You only need to do this step once for one workspace.
-
-#### Create a compute cluster with multiple nodes
-
-```python
-from azureml.core.compute import AmlCompute, ComputeTarget
-
-compute_target_name = "cpucluster"
-node_count = 2
-
-# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6
-compute_target_size = "STANDARD_D2_V2"
-
-if compute_target_name in ws.compute_targets:
-    compute_target = ws.compute_targets[compute_target_name]
-    if compute_target and type(compute_target) is AmlCompute:
-        if compute_target.provisioning_state == "Succeeded":
-            print("Found compute target; using it:", compute_target_name)
-        else:
-            raise Exception(
-                "Found compute target but it is in state", compute_target.provisioning_state)
-else:
-    print("creating a new compute target...")
-    provisioning_config = AmlCompute.provisioning_configuration(
-        vm_size=compute_target_size,
-        min_nodes=0,
-        max_nodes=node_count)
-
-    # Create the cluster
-    compute_target = ComputeTarget.create(ws, compute_target_name, provisioning_config)
-
-    # Can poll for a minimum number of nodes and for a specific timeout.
-    # If no min node count is provided it will use the scale settings for the cluster
-    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
-
-    # For a more detailed view of current AmlCompute status, use get_status()
-    print(compute_target.get_status().serialize())
-```
-
-If the compute target "cpucluster" already exists, it will not be recreated.
-
-#### Run distributed AutoML job
-
-Assume you have an AutoML script like [ray/distribute_automl.py](https://github.com/microsoft/FLAML/blob/main/test/ray/distribute_automl.py). It uses `n_concurrent_trials=k` to tell `AutoML.fit()` to run k trials concurrently.
-
-Submit an AzureML job as follows:
-
-```python
-from azureml.core import Workspace, Experiment, ScriptRunConfig, Environment
-from azureml.core.runconfig import RunConfiguration, DockerConfiguration
-
-command = ["python distribute_automl.py"]
-ray_environment_name = "aml-ray-cpu"
-env = Environment.get(workspace=ws, name=ray_environment_name)
-aml_run_config = RunConfiguration(communicator="OpenMpi")
-aml_run_config.target = compute_target
-aml_run_config.docker = DockerConfiguration(use_docker=True)
-aml_run_config.environment = env
-aml_run_config.node_count = 2
-config = ScriptRunConfig(
-    source_directory="ray/",
-    command=command,
-    run_config=aml_run_config,
-)
-
-exp = Experiment(ws, "distribute-automl")
-run = exp.submit(config)
-
-print(run.get_portal_url())  # link to ml.azure.com
-run.wait_for_completion(show_output=True)
-```
-
-#### Run distributed tune job
-
-Prepare a script like [ray/distribute_tune.py](https://github.com/microsoft/FLAML/blob/main/test/ray/distribute_tune.py). Replace the command in the above example with:
-
-```python
-command = ["python distribute_tune.py"]
-```
-
-Everything else is the same.
--- a/website/docs/Examples/Integrate
+++ b/website/docs/Examples/Integrate
@@ -1,72 +0,0 @@
-Because FLAML's AutoML module can be used as a transformer in a scikit-learn pipeline, we get all the benefits of pipelines.
-
-### Prerequisites
-
-Install the [automl] option.
-```bash
-pip install "flaml[automl]" openml
-```
-
-### Load data
-
-```python
-from flaml.automl.data import load_openml_dataset
-
-# Download [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure.
-X_train, X_test, y_train, y_test = load_openml_dataset(
-    dataset_id=1169, data_dir='./', random_state=1234, dataset_format='array')
-```
-
-### Create a pipeline
-
-```python
-from sklearn import set_config
-from sklearn.pipeline import Pipeline
-from sklearn.impute import SimpleImputer
-from sklearn.preprocessing import StandardScaler
-from flaml import AutoML
-
-set_config(display='diagram')
-
-imputer = SimpleImputer()
-standardizer = StandardScaler()
-automl = AutoML()
-
-automl_pipeline = Pipeline([
-    ("imputer", imputer),
-    ("standardizer", standardizer),
-    ("automl", automl)
-])
-automl_pipeline
-```
-
-![png](images/pipeline.png)
-
-### Run AutoML in the pipeline
-
-```python
-automl_settings = {
-    "time_budget": 60,  # total running time in seconds
-    "metric": "accuracy",  # primary metrics can be chosen from: ['accuracy', 'roc_auc', 'roc_auc_weighted', 'roc_auc_ovr', 'roc_auc_ovo', 'f1', 'log_loss', 'mae', 'mse', 'r2'] Check the documentation for more details (https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#optimization-metric)
-    "task": "classification",  # task type
-    "estimator_list": ["xgboost", "catboost", "lgbm"],
-    "log_file_name": "airlines_experiment.log",  # flaml log file
-}
-pipeline_settings = {
-    f"automl__{key}": value for key, value in automl_settings.items()
-}
-automl_pipeline.fit(X_train, y_train, **pipeline_settings)
-```
-
-### Get the automl object from the pipeline
-
-```python
-automl = automl_pipeline.steps[2][1]
-# Get the best config and best learner
-print('Best ML learner:', automl.best_estimator)
-print('Best hyperparameter config:', automl.best_config)
-print('Best accuracy on validation data: {0:.4g}'.format(1 - automl.best_loss))
-print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))
-```
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_sklearn.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_sklearn.ipynb)
--- a/website/docs/Examples/Integrate
+++ b/website/docs/Examples/Integrate
@@ -1,118 +0,0 @@
-# Integrate - Spark
-
-FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:
- Use Spark ML estimators for AutoML.
- Use Spark to run training in parallel spark jobs.
-
-## Spark ML Estimators
-
-FLAML integrates estimators based on Spark ML models. These models are trained in parallel using Spark, so we call them Spark estimators. To use these models, you first need to organize your data in the required format.
-
-### Data
-
-For Spark estimators, AutoML only consumes Spark data. FLAML provides a convenient function `to_pandas_on_spark` in the `flaml.automl.spark.utils` module to convert your data into a pandas-on-spark (`pyspark.pandas`) dataframe/series, which Spark estimators require.
-
-This utility function takes data in the form of a `pandas.DataFrame` or `pyspark.sql.DataFrame` and converts it into a pandas-on-spark dataframe. It also takes a `pandas.Series` or `pyspark.sql.DataFrame` and converts it into a [pandas-on-spark](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html) series. If you pass in a `pyspark.pandas.DataFrame`, it is returned unchanged.
-
-This function also accepts optional arguments `index_col` and `default_index_type`.
- `index_col` is the column name to use as the index, default is None.
- `default_index_type` is the default index type, default is "distributed-sequence". More information about the default index type can be found in the official Spark [documentation](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/options.html#default-index-type).
-
-Here is an example code snippet for Spark Data:
-
-```python
-import pandas as pd
-from flaml.automl.spark.utils import to_pandas_on_spark
-# Creating a dictionary
-data = {"Square_Feet": [800, 1200, 1800, 1500, 850],
-      "Age_Years": [20, 15, 10, 7, 25],
-      "Price": [100000, 200000, 300000, 240000, 120000]}
-
-# Creating a pandas DataFrame
-dataframe = pd.DataFrame(data)
-label = "Price"
-
-# Convert to pandas-on-spark dataframe
-psdf = to_pandas_on_spark(dataframe)
-```
-
-To use Spark ML models you need to format your data appropriately. Specifically, use [`VectorAssembler`](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.VectorAssembler.html) to merge all feature columns into a single vector column.
-
-Here is an example of how to use it:
-```python
-from pyspark.ml.feature import VectorAssembler
-columns = psdf.columns
-feature_cols = [col for col in columns if col != label]
-featurizer = VectorAssembler(inputCols=feature_cols, outputCol="features")
-psdf = featurizer.transform(psdf.to_spark(index_col="index"))["index", "features"]
-```
-
-Later, when running the experiment, pass your pandas-on-spark data just like non-Spark data, using `X_train, y_train` or `dataframe, label`.
-
-### Estimators
-#### Model List
- `lgbm_spark`: The class for fine-tuning Spark version LightGBM models, using [SynapseML](https://microsoft.github.io/SynapseML/docs/features/lightgbm/about/) API.
-
-#### Usage
-First, prepare your data in the required format as described in the previous section.
-
-When you include the models you intend to try in the `estimator_list` argument to `flaml.AutoML`, FLAML will start trying configurations for these models. If your input is Spark data, FLAML will also use estimators with the `_spark` suffix by default, even if you haven't specified them.
-
-Here is an example code snippet using SparkML models in AutoML:
-
-```python
-import flaml
-# prepare your data in pandas-on-spark format as we previously mentioned
-
-automl = flaml.AutoML()
-settings = {
-    "time_budget": 30,
-    "metric": "r2",
-    "estimator_list": ["lgbm_spark"],  # this setting is optional
-    "task": "regression",
-}
-
-automl.fit(
-    dataframe=psdf,
-    label=label,
-    **settings,
-)
-```
-
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb)
-
-## Parallel Spark Jobs
-You can activate Spark as the parallel backend during parallel tuning in both [AutoML](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning) by setting `use_spark` to `True`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).
-
-Please note that you should not set `use_spark` to `True` when applying AutoML and Tuning to Spark data. This is because only SparkML models will be used for Spark data in AutoML and Tuning. As SparkML models already run in parallel, there is no need to distribute them again with `use_spark`.
-
-All the Spark-related arguments are stated below. These arguments are available in both Hyperparameter Tuning and AutoML:
-
-
- `use_spark`: boolean, default=False | Whether to use spark to run the training in parallel spark jobs. This can be used to accelerate training on large models and large datasets, but will incur more overhead in time and thus slow down training in some cases. GPU training is not supported yet when use_spark is True. For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`.
- `n_concurrent_trials`: int, default=1 | The number of concurrent trials. When n_concurrent_trials > 1, FLAML performs parallel tuning.
- `force_cancel`: boolean, default=False | Whether to forcibly cancel Spark jobs if the search time exceeds the time budget. Spark jobs include parallel tuning jobs and Spark-based model training jobs.
-
-An example code snippet for using parallel Spark jobs:
-```python
-import flaml
-automl_experiment = flaml.AutoML()
-automl_settings = {
-    "time_budget": 30,
-    "metric": "r2",
-    "task": "regression",
-    "n_concurrent_trials": 2,
-    "use_spark": True,
-    "force_cancel": True, # Activating the force_cancel option can immediately halt Spark jobs once they exceed the allocated time_budget.
-}
-
-automl_experiment.fit(
-    dataframe=dataframe,
-    label=label,
-    **automl_settings,
-)
-```
-
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb)
--- a/website/docs/Examples/Tune-AzureML-pipeline.md
+++ b/website/docs/Examples/Tune-AzureML-pipeline.md
@@ -1,216 +0,0 @@
-# Tune - AzureML pipeline
-
-This example uses flaml to tune an Azure ML pipeline that fits a lightgbm classifier on the [sklearn breast cancer dataset](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)).
-If you already have an Azure ML pipeline, you can use this approach to tune your pipeline with flaml.
-
-## Prepare for tuning
-
-### Requirements
-
-We recommend using conda or venv to create a virtual env to install the dependencies.
-
-```bash
-# set up new conda environment
-conda create -n pipeline_tune python=3.8 pip=20.2 -y
-conda activate pipeline_tune
-
-# install azureml packages for running AzureML pipelines
-pip install azureml-core==1.39.0
-pip install azure-ml-component[notebooks]==0.9.10.post1
-pip install azureml-dataset-runtime==1.39.0
-
-# install hydra-core for passing AzureML pipeline parameters
-pip install hydra-core==1.1.1
-
-# install flaml
-pip install flaml[blendsearch,ray]==1.0.9
-```
-
-### Azure ML training pipeline
-
-Before we are ready for tuning, we must first have an Azure ML pipeline.
-In this example, we use the following toy pipeline for illustration.
-The pipeline consists of two steps: (1) data preparation and (2) model training.
-
-![png](images/AzureML_train_pipeline.png).
-
-The [code example](https://github.com/microsoft/FLAML/tree/main/test/pipeline_tuning_example) discussed on this page is included in
-`test/pipeline_tuning_example/`.
-We will use paths relative to that directory in the rest of the page.
-
-### Data
-
-The example data is in `data/data.csv`.
-It will be uploaded to AzureML workspace to be consumed by the training pipeline
-using the following code.
-
-```python
-Dataset.File.upload_directory(
-    src_dir=to_absolute_path(LOCAL_DIR / "data"),
-    target=(datastore, "classification_data"),
-    overwrite=True,
-)
-
-dataset = Dataset.File.from_files(path=(datastore, 'classification_data'))
-```
-
-### Configurations for the pipeline
-
-The pipeline configuration is defined in
-`configs/train_config.yaml`.
-
-```yaml
-hydra:
-  searchpath:
-    - file://.
-
-aml_config:
-  workspace_name: your_workspace_name
-  resource_group: your_resource_group
-  subscription_id: your_subscription_id
-  cpu_target: cpucluster
-
-train_config:
-  exp_name: sklearn_breast_cancer_classification
-  test_train_ratio: 0.4
-  learning_rate: 0.05
-  n_estimators: 50
-```
-
-### Define and submit the pipeline
-
-The pipeline is defined in
-`submit_train_pipeline.py`.
-
-To submit the pipeline, please specify your AzureML resources
-in the `configs/train_config.yaml` and run
-
-```bash
-cd test/pipeline_tuning_example
-python submit_train_pipeline.py
-```
-
-To get the pipeline ready for HPO, in the training step,
-we need to log the metrics of interest to AzureML using
-
-```python
-run.log(f"{data_name}_{eval_name}", result)
-```
-
-## Hyperparameter Optimization
-
-We are now ready to set up the HPO job for the AzureML pipeline, including:
-
- configure the HPO job,
- set up the interaction between the HPO job and the training job.
-
-These two steps are done in `tuner/tuner_func.py`.
-
-### Set up the tune job
-
-`tuner_func.tune_pipeline` sets up the search space, metric to optimize, mode, etc.
-
-```python
-def tune_pipeline(concurrent_run=1):
-    start_time = time.time()
-
-    # configure the HPO job
-    search_space = {
-        "train_config.n_estimators": flaml.tune.randint(50, 200),
-        "train_config.learning_rate": flaml.tune.uniform(0.01, 0.5),
-    }
-
-    hp_metric = "eval_binary_error"
-    mode = "min"  # eval_binary_error is an error rate, so lower is better
-    num_samples = 2
-
-
-    if concurrent_run > 1:
-        import ray  # For parallel tuning
-
-        ray.init(num_cpus=concurrent_run)
-        use_ray = True
-    else:
-        use_ray = False
-
-    # launch the HPO job
-    analysis = flaml.tune.run(
-        run_with_config,
-        config=search_space,
-        metric=hp_metric,
-        mode=mode,
-        num_samples=num_samples,  # number of trials
-        use_ray=use_ray,
-    )
-
-    # get the best config
-    best_trial = analysis.get_best_trial(hp_metric, mode, "all")
-    metric = best_trial.metric_analysis[hp_metric][mode]
-    print(f"n_trials={len(analysis.trials)}")
-    print(f"time={time.time()-start_time}")
-    print(f"Best {hp_metric}: {metric:.4f}")
-    print(f"Best configuration: {best_trial.config}")
-```
-
-### Interact with AzureML pipeline jobs
-
-The interaction between FLAML and AzureML pipeline jobs is in `tuner_func.run_with_config`.
-
-```python
-def run_with_config(config: dict):
-    """Run the pipeline with a given config dict
-    """
-
-    # pass the hyperparameters to AzureML jobs by overwriting the config file.
-    overrides = [f"{key}={value}" for key, value in config.items()]
-
-    print(overrides)
-    run = submit_train_pipeline.build_and_submit_aml_pipeline(overrides)
-
-    print(run.get_portal_url())
-
-    # retrieving the metrics to optimize before the job completes.
-    stop = False
-    while not stop:
-        # get status
-        status = run._core_run.get_status()
-        print(f'status: {status}')
-
-        # get metrics
-        metrics = run._core_run.get_metrics(recursive=True)
-        if metrics:
-            run_metrics = list(metrics.values())
-
-            new_metric = run_metrics[0]['eval_binary_error']
-
-            if isinstance(new_metric, list):
-                new_metric = new_metric[-1]
-
-            print(f'eval_binary_error: {new_metric}')
-
-            tune.report(eval_binary_error=new_metric)
-
-        time.sleep(5)
-
-        if status in ("Failed", "Completed"):  # AzureML reports run statuses in title case
-            stop = True
-
-    print("The run is terminated.")
-    print(status)
-
-    return
-```
-
-Overall, to tune the hyperparameters of the AzureML pipeline, run:
-
-```bash
-# the training job will run remotely as an AzureML job in both choices
-# run the tuning job locally
-python submit_tune.py --local
-# run the tuning job remotely
-python submit_tune.py --remote --subscription_id <your subscription_id> --resource_group <your resource_group> --workspace <your workspace>
-```
-
-The local option runs `tuner/tuner_func.py` on your local machine.
-The remote option wraps up the `tuner/tuner_func.py` as an AzureML component and
-starts another AzureML job to tune the AzureML pipeline.
--- a/website/docs/Examples/Tune-HuggingFace.md
+++ b/website/docs/Examples/Tune-HuggingFace.md
@@ -1,191 +0,0 @@
-# Tune - HuggingFace
-
-This example uses flaml to finetune a transformer model from the Hugging Face transformers library.
-
-*Note*: `flaml.AutoML` has built-in support for certain finetuning tasks with a
-[higher-level API](AutoML-NLP).
-It may be easier to use that API unless you have special requirements not handled by that API.
-
-### Requirements
-
-This example requires a GPU. Install dependencies:
-```bash
-pip install torch transformers datasets "flaml[blendsearch,ray]"
-```
-
-### Prepare for tuning
-
-#### Tokenizer
-
-```python
-from transformers import AutoTokenizer
-
-MODEL_NAME = "distilbert-base-uncased"
-tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
-COLUMN_NAME = "sentence"
-
-def tokenize(examples):
-    return tokenizer(examples[COLUMN_NAME], truncation=True)
-```
-
-#### Define training method
-
-```python
-import datasets
-import flaml
-import numpy as np
-from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
-
-TASK = "cola"
-NUM_LABELS = 2
-
-def train_distilbert(config: dict):
-    # Load CoLA dataset and apply tokenizer
-    cola_raw = datasets.load_dataset("glue", TASK)
-    cola_encoded = cola_raw.map(tokenize, batched=True)
-    train_dataset, eval_dataset = cola_encoded["train"], cola_encoded["validation"]
-
-    model = AutoModelForSequenceClassification.from_pretrained(
-        MODEL_NAME, num_labels=NUM_LABELS
-    )
-    metric = datasets.load_metric("glue", TASK)
-
-    def compute_metrics(eval_pred):
-        predictions, labels = eval_pred
-        predictions = np.argmax(predictions, axis=1)
-        return metric.compute(predictions=predictions, references=labels)
-
-    training_args = TrainingArguments(
-        output_dir='.',
-        do_eval=False,
-        disable_tqdm=True,
-        logging_steps=20000,
-        save_total_limit=0,
-        **config,
-    )
-
-    trainer = Trainer(
-        model,
-        training_args,
-        train_dataset=train_dataset,
-        eval_dataset=eval_dataset,
-        tokenizer=tokenizer,
-        compute_metrics=compute_metrics,
-    )
-
-    # train model
-    trainer.train()
-
-    # evaluate model
-    eval_output = trainer.evaluate()
-
-    # report the metric to optimize & the metric to log
-    flaml.tune.report(
-        loss=eval_output["eval_loss"],
-        matthews_correlation=eval_output["eval_matthews_correlation"],
-    )
-```
-
-### Define the search
-
-We are now ready to define our search. This includes:
-
- The `search_space` for our hyperparameters
- The `metric` and the `mode` ('max' or 'min') for optimization
- The resources and constraints (`num_cpus`, `num_gpus`, `num_samples`, and `time_budget_s`)
-
-```python
-max_num_epoch = 64
-search_space = {
-        # You can mix constants with search space objects.
-        "num_train_epochs": flaml.tune.loguniform(1, max_num_epoch),
-        "learning_rate": flaml.tune.loguniform(1e-6, 1e-4),
-        "adam_epsilon": flaml.tune.loguniform(1e-9, 1e-7),
-        "adam_beta1": flaml.tune.uniform(0.8, 0.99),
-        "adam_beta2": flaml.tune.loguniform(98e-2, 9999e-4),
-}
-
-# optimization objective
-HP_METRIC, MODE = "matthews_correlation", "max"
-
-# resources
-num_cpus = 4
-num_gpus = 4  # change according to your GPU resources
-
-# constraints
-num_samples = -1  # number of trials, -1 means unlimited
-time_budget_s = 3600  # time budget in seconds
-```
-
-### Launch the tuning
-
-We are now ready to launch the tuning using `flaml.tune.run`:
-
-```python
-import time
-
-import ray
-
-start_time = time.time()  # used later to report the elapsed tuning time
-
-ray.init(num_cpus=num_cpus, num_gpus=num_gpus)
-print("Tuning started...")
-analysis = flaml.tune.run(
-    train_distilbert,
-    search_alg=flaml.CFO(
-        space=search_space,
-        metric=HP_METRIC,
-        mode=MODE,
-        low_cost_partial_config={"num_train_epochs": 1}),
-    resources_per_trial={"gpu": num_gpus, "cpu": num_cpus},
-    local_dir='logs/',
-    num_samples=num_samples,
-    time_budget_s=time_budget_s,
-    use_ray=True,
-)
-```
-
-This will run tuning for one hour. At the end we will see a summary.
-```
-== Status ==
-Memory usage on this node: 32.0/251.6 GiB
-Using FIFO scheduling algorithm.
-Resources requested: 0/4 CPUs, 0/4 GPUs, 0.0/150.39 GiB heap, 0.0/47.22 GiB objects (0/1.0 accelerator_type:V100)
-Result logdir: /home/chiw/FLAML/notebook/logs/train_distilbert_2021-05-07_02-35-58
-Number of trials: 22/infinite (22 TERMINATED)
-Trial name	status	loc	adam_beta1	adam_beta2	adam_epsilon	learning_rate	num_train_epochs	iter	total time (s)	loss	matthews_correlation
-train_distilbert_a0c303d0	TERMINATED		0.939079	0.991865	7.96945e-08	5.61152e-06	1	1	55.6909	0.587986	0
-train_distilbert_a0c303d1	TERMINATED		0.811036	0.997214	2.05111e-09	2.05134e-06	1.44427	1	71.7663	0.603018	0
-train_distilbert_c39b2ef0	TERMINATED		0.909395	0.993715	1e-07	5.26543e-06	1	1	53.7619	0.586518	0
-train_distilbert_f00776e2	TERMINATED		0.968763	0.990019	4.38943e-08	5.98035e-06	1.02723	1	56.8382	0.581313	0
-train_distilbert_11ab3900	TERMINATED		0.962198	0.991838	7.09296e-08	5.06608e-06	1	1	54.0231	0.585576	0
-train_distilbert_353025b6	TERMINATED		0.91596	0.991892	8.95426e-08	6.21568e-06	2.15443	1	98.3233	0.531632	0.388893
-train_distilbert_5728a1de	TERMINATED		0.926933	0.993146	1e-07	1.00902e-05	1	1	55.3726	0.538505	0.280558
-train_distilbert_9394c2e2	TERMINATED		0.928106	0.990614	4.49975e-08	3.45674e-06	2.72935	1	121.388	0.539177	0.327295
-train_distilbert_b6543fec	TERMINATED		0.876896	0.992098	1e-07	7.01176e-06	1.59538	1	76.0244	0.527516	0.379177
-train_distilbert_0071f998	TERMINATED		0.955024	0.991687	7.39776e-08	5.50998e-06	2.90939	1	126.871	0.516225	0.417157
-train_distilbert_2f830be6	TERMINATED		0.886931	0.989628	7.6127e-08	4.37646e-06	1.53338	1	73.8934	0.551629	0.0655887
-train_distilbert_7ce03f12	TERMINATED		0.984053	0.993956	8.70144e-08	7.82557e-06	4.08775	1	174.027	0.523732	0.453549
-train_distilbert_aaab0508	TERMINATED		0.940707	0.993946	1e-07	8.91979e-06	3.40243	1	146.249	0.511288	0.45085
-train_distilbert_14262454	TERMINATED		0.99	0.991696	4.60093e-08	4.83405e-06	3.4954	1	152.008	0.53506	0.400851
-train_distilbert_6d211fe6	TERMINATED		0.959277	0.994556	5.40791e-08	1.17333e-05	6.64995	1	271.444	0.609851	0.526802
-train_distilbert_c980bae4	TERMINATED		0.99	0.993355	1e-07	5.21929e-06	2.51275	1	111.799	0.542276	0.324968
-train_distilbert_6d0d29d6	TERMINATED		0.965773	0.995182	9.9752e-08	1.15549e-05	13.694	1	527.944	0.923802	0.549474
-train_distilbert_b16ea82a	TERMINATED		0.952781	0.993931	2.93182e-08	1.19145e-05	3.2293	1	139.844	0.533466	0.451307
-train_distilbert_eddf7cc0	TERMINATED		0.99	0.997109	8.13498e-08	1.28515e-05	15.5807	1	614.789	0.983285	0.56993
-train_distilbert_43008974	TERMINATED		0.929089	0.993258	1e-07	1.03892e-05	12.0357	1	474.387	0.857461	0.520022
-train_distilbert_b3408a4e	TERMINATED		0.99	0.993809	4.67441e-08	1.10418e-05	11.9165	1	474.126	0.828205	0.526164
-train_distilbert_cfbfb220	TERMINATED		0.979454	0.9999	1e-07	1.49578e-05	20.3715
-```
-
-### Retrieve the results
-
-```python
-best_trial = analysis.get_best_trial(HP_METRIC, MODE, "all")
-metric = best_trial.metric_analysis[HP_METRIC][MODE]
-print(f"n_trials={len(analysis.trials)}")
-print(f"time={time.time()-start_time}")
-print(f"Best model eval {HP_METRIC}: {metric:.4f}")
-print(f"Best model parameters: {best_trial.config}")
-# n_trials=22
-# time=3999.769361972809
-# Best model eval matthews_correlation: 0.5699
-# Best model parameters: {'num_train_epochs': 15.580684188655825, 'learning_rate': 1.2851507818900338e-05, 'adam_epsilon': 8.134982521948352e-08, 'adam_beta1': 0.99, 'adam_beta2': 0.9971094424784387}
-```
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/tune_huggingface.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/tune_huggingface.ipynb)
--- a/website/docs/Examples/Tune-Lexicographic-objectives.md
+++ b/website/docs/Examples/Tune-Lexicographic-objectives.md
@@ -1,171 +0,0 @@
-# Tune - Lexicographic Objectives
-
-## Requirements
-
-```bash
-pip install "flaml>=1.1.0" thop torchvision torch
-```
-Tuning multiple objectives with lexicographic preference is a new feature added in version 1.1.0 and is subject to change in future versions.
-
-## Tuning accurate and efficient neural networks with lexicographic preference
-
-### Data
-
-```python
-import torch
-import thop
-import torch.nn as nn
-from flaml import tune
-import torch.nn.functional as F
-import torchvision
-import numpy as np
-import os
-
-DEVICE = torch.device("cpu")
-BATCHSIZE = 128
-N_TRAIN_EXAMPLES = BATCHSIZE * 30
-N_VALID_EXAMPLES = BATCHSIZE * 10
-data_dir = os.path.abspath("data")
-
-train_dataset = torchvision.datasets.FashionMNIST(
-    data_dir,
-    train=True,
-    download=True,
-    transform=torchvision.transforms.ToTensor(),
-)
-
-train_loader = torch.utils.data.DataLoader(
-    torch.utils.data.Subset(train_dataset, list(range(N_TRAIN_EXAMPLES))),
-    batch_size=BATCHSIZE,
-    shuffle=True,
-)
-
-val_dataset = torchvision.datasets.FashionMNIST(
-    data_dir, train=False, transform=torchvision.transforms.ToTensor()
-)
-
-val_loader = torch.utils.data.DataLoader(
-    torch.utils.data.Subset(val_dataset, list(range(N_VALID_EXAMPLES))),
-    batch_size=BATCHSIZE,
-    shuffle=True,
-)
-```
-
-### Specify the model
-
-```python
-def define_model(configuration):
-    n_layers = configuration["n_layers"]
-    layers = []
-    in_features = 28 * 28
-    for i in range(n_layers):
-        out_features = configuration["n_units_l{}".format(i)]
-        layers.append(nn.Linear(in_features, out_features))
-        layers.append(nn.ReLU())
-        p = configuration["dropout_{}".format(i)]
-        layers.append(nn.Dropout(p))
-        in_features = out_features
-    layers.append(nn.Linear(in_features, 10))
-    layers.append(nn.LogSoftmax(dim=1))
-    return nn.Sequential(*layers)
-```
-
-### Train
-
-```python
-def train_model(model, optimizer, train_loader):
-    model.train()
-    for batch_idx, (data, target) in enumerate(train_loader):
-        data, target = data.view(-1, 28 * 28).to(DEVICE), target.to(DEVICE)
-        optimizer.zero_grad()
-        F.nll_loss(model(data), target).backward()
-        optimizer.step()
-```
-
-### Metrics
-
-```python
-def eval_model(model, valid_loader):
-    model.eval()
-    correct = 0
-    with torch.no_grad():
-        for batch_idx, (data, target) in enumerate(valid_loader):
-            data, target = data.view(-1, 28 * 28).to(DEVICE), target.to(DEVICE)
-            pred = model(data).argmax(dim=1, keepdim=True)
-            correct += pred.eq(target.view_as(pred)).sum().item()
-
-    accuracy = correct / N_VALID_EXAMPLES
-    flops, params = thop.profile(
-        model, inputs=(torch.randn(1, 28 * 28).to(DEVICE),), verbose=False
-    )
-    return np.log2(flops), 1 - accuracy, params
-```
-
-
-
-### Evaluation function
-
-```python
-def evaluate_function(configuration):
-    model = define_model(configuration).to(DEVICE)
-    optimizer = torch.optim.Adam(model.parameters(), configuration["lr"])
-    n_epoch = configuration["n_epoch"]
-    for epoch in range(n_epoch):
-        train_model(model, optimizer, train_loader)
-    flops, error_rate, params = eval_model(model, val_loader)
-    return {"error_rate": error_rate, "flops": flops, "params": params}
-```
-
-### Search space
-```python
-search_space = {
-    "n_layers": tune.randint(lower=1, upper=3),
-    "n_units_l0": tune.randint(lower=4, upper=128),
-    "n_units_l1": tune.randint(lower=4, upper=128),
-    "n_units_l2": tune.randint(lower=4, upper=128),
-    "dropout_0": tune.uniform(lower=0.2, upper=0.5),
-    "dropout_1": tune.uniform(lower=0.2, upper=0.5),
-    "dropout_2": tune.uniform(lower=0.2, upper=0.5),
-    "lr": tune.loguniform(lower=1e-5, upper=1e-1),
-    "n_epoch": tune.randint(lower=1, upper=20),
-}
-```
-
-### Launch the tuning process
-
-```python
-
-# Low cost initial point
-low_cost_partial_config = {
-    "n_layers": 1,
-    "n_units_l0": 4,
-    "n_units_l1": 4,
-    "n_units_l2": 4,
-    "n_epoch": 1,
-}
-
-# Specify lexicographic preference
-lexico_objectives = {}
-lexico_objectives["metrics"] = ["error_rate", "flops"]
-lexico_objectives["tolerances"] = {"error_rate": 0.02, "flops": 0.0}
-lexico_objectives["targets"] = {"error_rate": 0.0, "flops": 0.0}
-lexico_objectives["modes"] = ["min", "min"]
-
-# launch the tuning process
-analysis = tune.run(
-    evaluate_function,
-    num_samples=-1,
-    time_budget_s=100,
-    config=search_space, # search space of NN
-    use_ray=False,
-    lexico_objectives=lexico_objectives,
-    low_cost_partial_config=low_cost_partial_config, # low cost initial point
-)
-```
-
-We also support providing percentage tolerance as shown below.
-
-```python
-lexico_objectives["tolerances"] = {"error_rate": "5%", "flops": "0%"}
-```
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/tune_lexicographic.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/tune_lexicographic.ipynb)
--- a/website/docs/Examples/Tune-PyTorch.md
+++ b/website/docs/Examples/Tune-PyTorch.md
@@ -1,287 +0,0 @@
-# Tune - PyTorch
-
-This example uses flaml to tune a PyTorch model on CIFAR10.
-
-## Prepare for tuning
-
-### Requirements
-```bash
-pip install torchvision "flaml[blendsearch,ray]"
-```
-
-Before we are ready for tuning, we first need to define the neural network that we would like to tune.
-
-### Network Specification
-
-```python
-import torch
-import torch.nn as nn
-import torch.nn.functional as F
-import torch.optim as optim
-from torch.utils.data import random_split
-import torchvision
-import torchvision.transforms as transforms
-
-
-class Net(nn.Module):
-
-    def __init__(self, l1=120, l2=84):
-        super(Net, self).__init__()
-        self.conv1 = nn.Conv2d(3, 6, 5)
-        self.pool = nn.MaxPool2d(2, 2)
-        self.conv2 = nn.Conv2d(6, 16, 5)
-        self.fc1 = nn.Linear(16 * 5 * 5, l1)
-        self.fc2 = nn.Linear(l1, l2)
-        self.fc3 = nn.Linear(l2, 10)
-
-    def forward(self, x):
-        x = self.pool(F.relu(self.conv1(x)))
-        x = self.pool(F.relu(self.conv2(x)))
-        x = x.view(-1, 16 * 5 * 5)
-        x = F.relu(self.fc1(x))
-        x = F.relu(self.fc2(x))
-        x = self.fc3(x)
-        return x
-```
-
-### Data
-
-```python
-def load_data(data_dir="data"):
-    transform = transforms.Compose([
-        transforms.ToTensor(),
-        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
-    ])
-
-    trainset = torchvision.datasets.CIFAR10(
-        root=data_dir, train=True, download=True, transform=transform)
-
-    testset = torchvision.datasets.CIFAR10(
-        root=data_dir, train=False, download=True, transform=transform)
-
-    return trainset, testset
-```
-
-### Training
-
-```python
-import os
-from ray import tune
-
-def train_cifar(config, checkpoint_dir=None, data_dir=None):
-    # l1 and l2 are stored as log2 of the layer widths (see the search space)
-    net = Net(2**config["l1"], 2**config["l2"])
-
-    device = "cpu"
-    if torch.cuda.is_available():
-        device = "cuda:0"
-        if torch.cuda.device_count() > 1:
-            net = nn.DataParallel(net)
-    net.to(device)
-
-    criterion = nn.CrossEntropyLoss()
-    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)
-
-    # The `checkpoint_dir` parameter gets passed by Ray Tune when a checkpoint
-    # should be restored.
-    if checkpoint_dir:
-        checkpoint = os.path.join(checkpoint_dir, "checkpoint")
-        model_state, optimizer_state = torch.load(checkpoint)
-        net.load_state_dict(model_state)
-        optimizer.load_state_dict(optimizer_state)
-
-    trainset, testset = load_data(data_dir)
-
-    train_size = int(len(trainset) * 0.8)  # 80/20 train/validation split
-    train_subset, val_subset = random_split(
-        trainset, [train_size, len(trainset) - train_size])
-
-    trainloader = torch.utils.data.DataLoader(
-        train_subset,
-        batch_size=int(2**config["batch_size"]),
-        shuffle=True,
-        num_workers=4)
-    valloader = torch.utils.data.DataLoader(
-        val_subset,
-        batch_size=int(2**config["batch_size"]),
-        shuffle=True,
-        num_workers=4)
-
-    for epoch in range(int(round(config["num_epochs"]))):  # loop over the dataset multiple times
-        running_loss = 0.0
-        epoch_steps = 0
-        for i, data in enumerate(trainloader, 0):
-            # get the inputs; data is a list of [inputs, labels]
-            inputs, labels = data
-            inputs, labels = inputs.to(device), labels.to(device)
-
-            # zero the parameter gradients
-            optimizer.zero_grad()
-
-            # forward + backward + optimize
-            outputs = net(inputs)
-            loss = criterion(outputs, labels)
-            loss.backward()
-            optimizer.step()
-
-            # print statistics
-            running_loss += loss.item()
-            epoch_steps += 1
-            if i % 2000 == 1999:  # print every 2000 mini-batches
-                print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1,
-                                                running_loss / epoch_steps))
-                running_loss = 0.0
-
-        # Validation loss
-        val_loss = 0.0
-        val_steps = 0
-        total = 0
-        correct = 0
-        for i, data in enumerate(valloader, 0):
-            with torch.no_grad():
-                inputs, labels = data
-                inputs, labels = inputs.to(device), labels.to(device)
-
-                outputs = net(inputs)
-                _, predicted = torch.max(outputs.data, 1)
-                total += labels.size(0)
-                correct += (predicted == labels).sum().item()
-
-                loss = criterion(outputs, labels)
-                val_loss += loss.item()
-                val_steps += 1
-
-        # Here we save a checkpoint. It is automatically registered with
-        # Ray Tune and will potentially be passed as the `checkpoint_dir`
-        # parameter in future iterations.
-        with tune.checkpoint_dir(step=epoch) as checkpoint_dir:
-            path = os.path.join(checkpoint_dir, "checkpoint")
-            torch.save(
-                (net.state_dict(), optimizer.state_dict()), path)
-
-        tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
-    print("Finished Training")
-```
-
-### Test Accuracy
-
-```python
-def _test_accuracy(net, device="cpu"):
-    trainset, testset = load_data()
-
-    testloader = torch.utils.data.DataLoader(
-        testset, batch_size=4, shuffle=False, num_workers=2)
-
-    correct = 0
-    total = 0
-    with torch.no_grad():
-        for data in testloader:
-            images, labels = data
-            images, labels = images.to(device), labels.to(device)
-            outputs = net(images)
-            _, predicted = torch.max(outputs.data, 1)
-            total += labels.size(0)
-            correct += (predicted == labels).sum().item()
-
-    return correct / total
-```
-
-## Hyperparameter Optimization
-
-```python
-import numpy as np
-import flaml
-import os
-
-data_dir = os.path.abspath("data")
-load_data(data_dir)  # Download data for all trials before starting the run
-```
-
-### Search space
-
-```python
-max_num_epoch = 100
-config = {
-    "l1": tune.randint(2, 9),   # log transformed with base 2
-    "l2": tune.randint(2, 9),   # log transformed with base 2
-    "lr": tune.loguniform(1e-4, 1e-1),
-    "num_epochs": tune.loguniform(1, max_num_epoch),
-    "batch_size": tune.randint(1, 5)    # log transformed with base 2
-}
-```
-
-### Budget and resource constraints
-
-```python
-time_budget_s = 600     # time budget in seconds
-gpus_per_trial = 0.5    # number of gpus for each trial; 0.5 means two training jobs can share one gpu
-num_samples = 500       # maximal number of trials
-np.random.seed(7654321)
-```
-
-### Launch the tuning
-
-```python
-import time
-start_time = time.time()
-result = flaml.tune.run(
-    tune.with_parameters(train_cifar, data_dir=data_dir),
-    config=config,
-    metric="loss",
-    mode="min",
-    low_cost_partial_config={"num_epochs": 1},
-    max_resource=max_num_epoch,
-    min_resource=1,
-    scheduler="asha",  # Use asha scheduler to perform early stopping based on intermediate results reported
-    resources_per_trial={"cpu": 1, "gpu": gpus_per_trial},
-    local_dir='logs/',
-    num_samples=num_samples,
-    time_budget_s=time_budget_s,
-    use_ray=True)
-```
-
-### Check the result
-
-```python
-print(f"#trials={len(result.trials)}")
-print(f"time={time.time()-start_time}")
-best_trial = result.get_best_trial("loss", "min", "all")
-print("Best trial config: {}".format(best_trial.config))
-print("Best trial final validation loss: {}".format(
-    best_trial.metric_analysis["loss"]["min"]))
-print("Best trial final validation accuracy: {}".format(
-    best_trial.metric_analysis["accuracy"]["max"]))
-
-best_trained_model = Net(2**best_trial.config["l1"],
-                         2**best_trial.config["l2"])
-device = "cpu"
-if torch.cuda.is_available():
-    device = "cuda:0"
-    if gpus_per_trial > 1:
-        best_trained_model = nn.DataParallel(best_trained_model)
-best_trained_model.to(device)
-
-checkpoint_value = getattr(best_trial.checkpoint, "dir_or_data", None) or best_trial.checkpoint.value
-checkpoint_path = os.path.join(checkpoint_value, "checkpoint")
-
-model_state, optimizer_state = torch.load(checkpoint_path)
-best_trained_model.load_state_dict(model_state)
-
-test_acc = _test_accuracy(best_trained_model, device)
-print("Best trial test set accuracy: {}".format(test_acc))
-```
-
-### Sample of output
-
-```
-#trials=44
-time=1193.913584947586
-Best trial config: {'l1': 8, 'l2': 8, 'lr': 0.0008818671030627281, 'num_epochs': 55.9513429004283, 'batch_size': 3}
-Best trial final validation loss: 1.0694482081472874
-Best trial final validation accuracy: 0.6389
-Files already downloaded and verified
-Files already downloaded and verified
-Best trial test set accuracy: 0.6294
-```
-
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/tune_pytorch.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/tune_pytorch.ipynb)
--- a/website/docs/Examples/images/AzureML_train_pipeline.png
+++ b/website/docs/Examples/images/AzureML_train_pipeline.png
--- a/website/docs/Examples/images/CO2.png
+++ b/website/docs/Examples/images/CO2.png
--- a/website/docs/Examples/images/lgbm_curve.png
+++ b/website/docs/Examples/images/lgbm_curve.png
--- a/website/docs/Examples/images/pipeline.png
+++ b/website/docs/Examples/images/pipeline.png
--- a/website/docs/Examples/images/xgb_curve.png
+++ b/website/docs/Examples/images/xgb_curve.png
--- a/website/docs/Examples/images/xgb_feature_importance.png
+++ b/website/docs/Examples/images/xgb_feature_importance.png