time series forecasting with panel datasets (#541)

* time series forecasting with panel datasets - integrate Temporal Fusion Transformer as a learner based on pytorchforecasting
* update setup.py
* update test_forecast.py
* update setup.py
* update setup.py
* update model.py and test_forecast.py - remove blank lines
* update model.py to prevent errors
* update automl.py and data.py - change forecast task name - update documentation for fit() method
* update test_forecast.py
* update test_forecast.py - add performance test - use 'fit_kwargs_by_estimator'
* add time index function
* update test_forecast.py performance test
* update data.py
* update automl.py
* update data.py to prevent type error
* update setup.py
* update for pytorch forecasting tft on panel datasets
* update automl.py documentations
* rename estimator - add 'gpu_per_trial' for tft estimator
* update test_forecast.py
* include ts panel forecasting as an example
* update model.py
* update documentations
* update automl_time_series_forecast.ipynb
* update documentations
* "weights_summary" argument deprecated and removed for pl.Trainer()
* update model.py tft estimator prediction method
* update model.py
* update `fit_kwargs` documentation
* update automl.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2026-02-13 12:35:19 -05:00 · 2022-08-12 11:39:22 -04:00
parent b436459e47
commit f718d18b5e
9 changed files with 4841 additions and 2485 deletions
--- a/website/docs/Examples/AutoML-Time
+++ b/website/docs/Examples/AutoML-Time
@@ -28,7 +28,7 @@ print(automl.predict(X_train[84:]))

 #### Sample output

-```python
+```
 [flaml.automl: 01-21 08:01:20] {2018} INFO - task = ts_forecast
 [flaml.automl: 01-21 08:01:20] {2020} INFO - Data split method: time
 [flaml.automl: 01-21 08:01:20] {2024} INFO - Evaluation method: holdout
@@ -502,7 +502,7 @@ print(automl.predict(multi_X_test))

 #### Sample Output

-```python
+```
 [flaml.automl: 02-28 21:32:26] {2458} INFO - iteration 15, current learner xgboost
 [flaml.automl: 02-28 21:32:26] {2620} INFO -  at 6.2s,	estimator xgboost's best error=0.0959,	best estimator prophet's best error=0.0592
 [flaml.automl: 02-28 21:32:26] {2458} INFO - iteration 16, current learner extra_tree
@@ -594,7 +594,8 @@ print("True label", discrete_y_test)
 ```

 #### Sample Output
-```python
+
+```
 [flaml.automl: 02-28 21:53:03] {2060} INFO - task = ts_forecast_classification
 [flaml.automl: 02-28 21:53:03] {2062} INFO - Data split method: time
 [flaml.automl: 02-28 21:53:03] {2066} INFO - Evaluation method: holdout
@@ -679,4 +680,886 @@ print("True label", discrete_y_test)
 [flaml.automl: 02-28 21:53:04] {2235} INFO - Time taken to find the best model: 0.8547139167785645
 ```

+### Forecasting with Panel Datasets
+
+Panel time series datasets involve multiple individual time series. For example, see the Stallion demand dataset from PyTorch Forecasting, originally from Kaggle.
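
To make the "multiple individual time series" idea concrete, here is a minimal sketch on a toy long-format frame. The column names mirror the Stallion example, but the rows and values are made up for illustration. It counts the individual series and derives a monthly integer index the same way the example below does.

```python
import pandas as pd

# Toy long-format panel: two hypothetical (agency, sku) series, three months each.
panel = pd.DataFrame(
    {
        "agency": ["A1"] * 3 + ["A2"] * 3,
        "sku": ["S1"] * 6,
        "date": pd.to_datetime(["2016-11-01", "2016-12-01", "2017-01-01"] * 2),
        "volume": [10.0, 12.0, 9.0, 30.0, 28.0, 35.0],
    }
)

# Each distinct (agency, sku) pair is one individual time series.
n_series = panel.groupby(["agency", "sku"]).ngroups

# Monthly integer index: consecutive months differ by exactly 1,
# even across a year boundary, and the earliest month maps to 0.
panel["time_idx"] = panel["date"].dt.year * 12 + panel["date"].dt.month
panel["time_idx"] -= panel["time_idx"].min()

print(n_series)
print(panel["time_idx"].tolist())
```

Both series share the same `time_idx` values because the index is derived from the calendar date, not from the row position within each group.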
+
+```python
+import numpy as np
+from flaml import AutoML
+
+
+def get_stallion_data_with_features():
+    from pytorch_forecasting.data.examples import get_stallion_data
+
+    data = get_stallion_data()
+    # add time index - For datasets with no missing values, FLAML will automate this process
+    data["time_idx"] = data["date"].dt.year * 12 + data["date"].dt.month
+    data["time_idx"] -= data["time_idx"].min()
+    # add additional features
+    data["month"] = data.date.dt.month.astype(str).astype(
+        "category"
+    )  # categories have to be strings
+    data["log_volume"] = np.log(data.volume + 1e-8)
+    data["avg_volume_by_sku"] = data.groupby(
+        ["time_idx", "sku"], observed=True
+    ).volume.transform("mean")
+    data["avg_volume_by_agency"] = data.groupby(
+        ["time_idx", "agency"], observed=True
+    ).volume.transform("mean")
+    # we want to encode special days as one variable and thus need to first reverse one-hot encoding
+    special_days = [
+        "easter_day",
+        "good_friday",
+        "new_year",
+        "christmas",
+        "labor_day",
+        "independence_day",
+        "revolution_day_memorial",
+        "regional_games",
+        "beer_capital",
+        "music_fest",
+    ]
+    data[special_days] = (
+        data[special_days]
+        .apply(lambda x: x.map({0: "-", 1: x.name}))
+        .astype("category")
+    )
+    return data, special_days
+
+data, special_days = get_stallion_data_with_features()
+time_horizon = 6  # predict six months
+training_cutoff = data["time_idx"].max() - time_horizon
+data["time_idx"] = data["time_idx"].astype("int")
+ts_col = data.pop("date")
+data.insert(0, "date", ts_col)
+# FLAML assumes input is not sorted, but we sort here for comparison purposes with y_test
+data = data.sort_values(["agency", "sku", "date"])
+X_train = data[lambda x: x.time_idx <= training_cutoff]
+X_test = data[lambda x: x.time_idx > training_cutoff]
+y_train = X_train.pop("volume")
+y_test = X_test.pop("volume")
+automl = AutoML()
+# Configure settings for FLAML model
+settings = {
+    "time_budget": budget,  # total running time in seconds; `budget` must be defined beforehand
+    "metric": "mape",  # primary metric
+    "task": "ts_forecast_panel",  # task type
+    "log_file_name": "test/stallion_forecast.log",  # flaml log file
+    "eval_method": "holdout",
+}
+# Specify kwargs for TimeSeriesDataSet used by TemporalFusionTransformerEstimator
+fit_kwargs_by_estimator = {
+    "tft": {
+        "max_encoder_length": 24,
+        "static_categoricals": ["agency", "sku"],
+        "static_reals": ["avg_population_2017", "avg_yearly_household_income_2017"],
+        "time_varying_known_categoricals": ["special_days", "month"],
+        "variable_groups": {
+            "special_days": special_days
+        },  # group of categorical variables can be treated as one variable
+        "time_varying_known_reals": [
+            "time_idx",
+            "price_regular",
+            "discount_in_percent",
+        ],
+        "time_varying_unknown_categoricals": [],
+        "time_varying_unknown_reals": [
+            "y",  # always need a 'y' column for the target column
+            "log_volume",
+            "industry_volume",
+            "soda_volume",
+            "avg_max_temp",
+            "avg_volume_by_agency",
+            "avg_volume_by_sku",
+        ],
+        "batch_size": 256,
+        "max_epochs": 1,
+        "gpu_per_trial": -1,
+    }
+}
+# Train the model
+automl.fit(
+    X_train=X_train,
+    y_train=y_train,
+    **settings,
+    period=time_horizon,
+    group_ids=["agency", "sku"],
+    fit_kwargs_by_estimator=fit_kwargs_by_estimator,
+)
+# Compute predictions of testing dataset
+y_pred = automl.predict(X_test)
+print(y_test)
+print(y_pred)
+# best model
+print(automl.model.estimator)
+```
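
The settings above minimize `mape`. As a rough sketch of what that metric measures, the snippet below computes a plain mean absolute percentage error with NumPy on made-up numbers. The `mape` helper and the arrays are hypothetical and are not FLAML's own implementation, which may differ in details such as how zero actuals are handled.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.1 means 10%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # The actual value is the denominator, so it must be non-zero here.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


# Hypothetical actuals and forecasts for one series over a six-step horizon.
actual = np.array([100.0, 120.0, 80.0, 90.0, 110.0, 100.0])
forecast = np.array([110.0, 108.0, 88.0, 81.0, 121.0, 100.0])

score = mape(actual, forecast)
print(round(score, 4))
```

Because each error is divided by the actual value, series with zero or near-zero volume can make this metric very large, which is worth keeping in mind when reading the reported errors for demand data.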
+
+#### Sample Output
+
+```
+[flaml.automl: 07-28 21:26:03] {2478} INFO - task = ts_forecast_panel
+[flaml.automl: 07-28 21:26:03] {2480} INFO - Data split method: time
+[flaml.automl: 07-28 21:26:03] {2483} INFO - Evaluation method: holdout
+[flaml.automl: 07-28 21:26:03] {2552} INFO - Minimizing error metric: mape
+[flaml.automl: 07-28 21:26:03] {2694} INFO - List of ML learners in AutoML Run: ['tft']
+[flaml.automl: 07-28 21:26:03] {2986} INFO - iteration 0, current learner tft
+GPU available: False, used: False
+TPU available: False, using: 0 TPU cores
+IPU available: False, using: 0 IPUs
+
+   | Name                               | Type                            | Params
+----------------------------------------------------------------------------------------
+0  | loss                               | QuantileLoss                    | 0
+1  | logging_metrics                    | ModuleList                      | 0
+2  | input_embeddings                   | MultiEmbedding                  | 1.3 K
+3  | prescalers                         | ModuleDict                      | 256
+4  | static_variable_selection          | VariableSelectionNetwork        | 3.4 K
+5  | encoder_variable_selection         | VariableSelectionNetwork        | 8.0 K
+6  | decoder_variable_selection         | VariableSelectionNetwork        | 2.7 K
+7  | static_context_variable_selection  | GatedResidualNetwork            | 1.1 K
+8  | static_context_initial_hidden_lstm | GatedResidualNetwork            | 1.1 K
+9  | static_context_initial_cell_lstm   | GatedResidualNetwork            | 1.1 K
+10 | static_context_enrichment          | GatedResidualNetwork            | 1.1 K
+11 | lstm_encoder                       | LSTM                            | 4.4 K
+12 | lstm_decoder                       | LSTM                            | 4.4 K
+13 | post_lstm_gate_encoder             | GatedLinearUnit                 | 544
+14 | post_lstm_add_norm_encoder         | AddNorm                         | 32
+15 | static_enrichment                  | GatedResidualNetwork            | 1.4 K
+16 | multihead_attn                     | InterpretableMultiHeadAttention | 676
+17 | post_attn_gate_norm                | GateAddNorm                     | 576
+18 | pos_wise_ff                        | GatedResidualNetwork            | 1.1 K
+19 | pre_output_gate_norm               | GateAddNorm                     | 576
+20 | output_layer                       | Linear                          | 119
+----------------------------------------------------------------------------------------
+33.6 K    Trainable params
+0         Non-trainable params
+33.6 K    Total params
+0.135     Total estimated model params size (MB)
+
+Epoch 19: 100%|██████████| 129/129 [00:56<00:00,  2.27it/s, loss=45.9, v_num=2, train_loss_step=43.00, val_loss=65.20, train_loss_epoch=46.50]
+
+[flaml.automl: 07-28 21:46:46] {3114} INFO - Estimated sufficient time budget=12424212s. Estimated necessary time budget=12424s.
+[flaml.automl: 07-28 21:46:46] {3161} INFO -  at 1242.6s,	estimator tft's best error=1324290483134574.7500,	best estimator tft's best error=1324290483134574.7500
+GPU available: False, used: False
+TPU available: False, using: 0 TPU cores
+IPU available: False, using: 0 IPUs
+
+   | Name                               | Type                            | Params
+----------------------------------------------------------------------------------------
+0  | loss                               | QuantileLoss                    | 0
+1  | logging_metrics                    | ModuleList                      | 0
+2  | input_embeddings                   | MultiEmbedding                  | 1.3 K
+3  | prescalers                         | ModuleDict                      | 256
+4  | static_variable_selection          | VariableSelectionNetwork        | 3.4 K
+5  | encoder_variable_selection         | VariableSelectionNetwork        | 8.0 K
+6  | decoder_variable_selection         | VariableSelectionNetwork        | 2.7 K
+7  | static_context_variable_selection  | GatedResidualNetwork            | 1.1 K
+8  | static_context_initial_hidden_lstm | GatedResidualNetwork            | 1.1 K
+9  | static_context_initial_cell_lstm   | GatedResidualNetwork            | 1.1 K
+10 | static_context_enrichment          | GatedResidualNetwork            | 1.1 K
+11 | lstm_encoder                       | LSTM                            | 4.4 K
+12 | lstm_decoder                       | LSTM                            | 4.4 K
+13 | post_lstm_gate_encoder             | GatedLinearUnit                 | 544
+14 | post_lstm_add_norm_encoder         | AddNorm                         | 32
+15 | static_enrichment                  | GatedResidualNetwork            | 1.4 K
+16 | multihead_attn                     | InterpretableMultiHeadAttention | 676
+17 | post_attn_gate_norm                | GateAddNorm                     | 576
+18 | pos_wise_ff                        | GatedResidualNetwork            | 1.1 K
+19 | pre_output_gate_norm               | GateAddNorm                     | 576
+20 | output_layer                       | Linear                          | 119
+----------------------------------------------------------------------------------------
+33.6 K    Trainable params
+0         Non-trainable params
+33.6 K    Total params
+0.135     Total estimated model params size (MB)
+Epoch 19: 100%|██████████| 145/145 [01:03<00:00,  2.28it/s, loss=45.2, v_num=3, train_loss_step=46.30, val_loss=67.60, train_loss_epoch=48.10]
+[flaml.automl: 07-28 22:08:05] {3425} INFO - retrain tft for 1279.6s
+[flaml.automl: 07-28 22:08:05] {3432} INFO - retrained model: TemporalFusionTransformer(
+  (loss): QuantileLoss()
+  (logging_metrics): ModuleList(
+    (0): SMAPE()
+    (1): MAE()
+    (2): RMSE()
+    (3): MAPE()
+  )
+  (input_embeddings): MultiEmbedding(
+    (embeddings): ModuleDict(
+      (agency): Embedding(58, 16)
+      (sku): Embedding(25, 10)
+      (special_days): TimeDistributedEmbeddingBag(11, 6, mode=sum)
+      (month): Embedding(12, 6)
+    )
+  )
+  (prescalers): ModuleDict(
+    (avg_population_2017): Linear(in_features=1, out_features=8, bias=True)
+    (avg_yearly_household_income_2017): Linear(in_features=1, out_features=8, bias=True)
+    (encoder_length): Linear(in_features=1, out_features=8, bias=True)
+    (y_center): Linear(in_features=1, out_features=8, bias=True)
+    (y_scale): Linear(in_features=1, out_features=8, bias=True)
+    (time_idx): Linear(in_features=1, out_features=8, bias=True)
+    (price_regular): Linear(in_features=1, out_features=8, bias=True)
+    (discount_in_percent): Linear(in_features=1, out_features=8, bias=True)
+    (relative_time_idx): Linear(in_features=1, out_features=8, bias=True)
+    (y): Linear(in_features=1, out_features=8, bias=True)
+    (log_volume): Linear(in_features=1, out_features=8, bias=True)
+    (industry_volume): Linear(in_features=1, out_features=8, bias=True)
+    (soda_volume): Linear(in_features=1, out_features=8, bias=True)
+    (avg_max_temp): Linear(in_features=1, out_features=8, bias=True)
+    (avg_volume_by_agency): Linear(in_features=1, out_features=8, bias=True)
+    (avg_volume_by_sku): Linear(in_features=1, out_features=8, bias=True)
+  )
+  (static_variable_selection): VariableSelectionNetwork(
+    (flattened_grn): GatedResidualNetwork(
+      (resample_norm): ResampleNorm(
+        (resample): TimeDistributedInterpolation()
+        (gate): Sigmoid()
+        (norm): LayerNorm((7,), eps=1e-05, elementwise_affine=True)
+      )
+      (fc1): Linear(in_features=66, out_features=7, bias=True)
+      (elu): ELU(alpha=1.0)
+      (fc2): Linear(in_features=7, out_features=7, bias=True)
+      (gate_norm): GateAddNorm(
+        (glu): GatedLinearUnit(
+          (dropout): Dropout(p=0.1, inplace=False)
+          (fc): Linear(in_features=7, out_features=14, bias=True)
+        )
+        (add_norm): AddNorm(
+          (norm): LayerNorm((7,), eps=1e-05, elementwise_affine=True)
+        )
+      )
+    )
+    (single_variable_grns): ModuleDict(
+      (agency): ResampleNorm(
+        (gate): Sigmoid()
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+      (sku): ResampleNorm(
+        (resample): TimeDistributedInterpolation()
+        (gate): Sigmoid()
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+      (avg_population_2017): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (avg_yearly_household_income_2017): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (encoder_length): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (y_center): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (y_scale): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+    )
+    (prescalers): ModuleDict(
+      (avg_population_2017): Linear(in_features=1, out_features=8, bias=True)
+      (avg_yearly_household_income_2017): Linear(in_features=1, out_features=8, bias=True)
+      (encoder_length): Linear(in_features=1, out_features=8, bias=True)
+      (y_center): Linear(in_features=1, out_features=8, bias=True)
+      (y_scale): Linear(in_features=1, out_features=8, bias=True)
+    )
+    (softmax): Softmax(dim=-1)
+  )
+  (encoder_variable_selection): VariableSelectionNetwork(
+    (flattened_grn): GatedResidualNetwork(
+      (resample_norm): ResampleNorm(
+        (resample): TimeDistributedInterpolation()
+        (gate): Sigmoid()
+        (norm): LayerNorm((13,), eps=1e-05, elementwise_affine=True)
+      )
+      (fc1): Linear(in_features=100, out_features=13, bias=True)
+      (elu): ELU(alpha=1.0)
+      (context): Linear(in_features=16, out_features=13, bias=False)
+      (fc2): Linear(in_features=13, out_features=13, bias=True)
+      (gate_norm): GateAddNorm(
+        (glu): GatedLinearUnit(
+          (dropout): Dropout(p=0.1, inplace=False)
+          (fc): Linear(in_features=13, out_features=26, bias=True)
+        )
+        (add_norm): AddNorm(
+          (norm): LayerNorm((13,), eps=1e-05, elementwise_affine=True)
+        )
+      )
+    )
+    (single_variable_grns): ModuleDict(
+      (special_days): ResampleNorm(
+        (resample): TimeDistributedInterpolation()
+        (gate): Sigmoid()
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+      (month): ResampleNorm(
+        (resample): TimeDistributedInterpolation()
+        (gate): Sigmoid()
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+      (time_idx): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (price_regular): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (discount_in_percent): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (relative_time_idx): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (y): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (log_volume): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (industry_volume): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (soda_volume): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (avg_max_temp): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (avg_volume_by_agency): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (avg_volume_by_sku): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+    )
+    (prescalers): ModuleDict(
+      (time_idx): Linear(in_features=1, out_features=8, bias=True)
+      (price_regular): Linear(in_features=1, out_features=8, bias=True)
+      (discount_in_percent): Linear(in_features=1, out_features=8, bias=True)
+      (relative_time_idx): Linear(in_features=1, out_features=8, bias=True)
+      (y): Linear(in_features=1, out_features=8, bias=True)
+      (log_volume): Linear(in_features=1, out_features=8, bias=True)
+      (industry_volume): Linear(in_features=1, out_features=8, bias=True)
+      (soda_volume): Linear(in_features=1, out_features=8, bias=True)
+      (avg_max_temp): Linear(in_features=1, out_features=8, bias=True)
+      (avg_volume_by_agency): Linear(in_features=1, out_features=8, bias=True)
+      (avg_volume_by_sku): Linear(in_features=1, out_features=8, bias=True)
+    )
+    (softmax): Softmax(dim=-1)
+  )
+  (decoder_variable_selection): VariableSelectionNetwork(
+    (flattened_grn): GatedResidualNetwork(
+      (resample_norm): ResampleNorm(
+        (resample): TimeDistributedInterpolation()
+        (gate): Sigmoid()
+        (norm): LayerNorm((6,), eps=1e-05, elementwise_affine=True)
+      )
+      (fc1): Linear(in_features=44, out_features=6, bias=True)
+      (elu): ELU(alpha=1.0)
+      (context): Linear(in_features=16, out_features=6, bias=False)
+      (fc2): Linear(in_features=6, out_features=6, bias=True)
+      (gate_norm): GateAddNorm(
+        (glu): GatedLinearUnit(
+          (dropout): Dropout(p=0.1, inplace=False)
+          (fc): Linear(in_features=6, out_features=12, bias=True)
+        )
+        (add_norm): AddNorm(
+          (norm): LayerNorm((6,), eps=1e-05, elementwise_affine=True)
+        )
+      )
+    )
+    (single_variable_grns): ModuleDict(
+      (special_days): ResampleNorm(
+        (resample): TimeDistributedInterpolation()
+        (gate): Sigmoid()
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+      (month): ResampleNorm(
+        (resample): TimeDistributedInterpolation()
+        (gate): Sigmoid()
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+      (time_idx): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (price_regular): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (discount_in_percent): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+      (relative_time_idx): GatedResidualNetwork(
+        (resample_norm): ResampleNorm(
+          (resample): TimeDistributedInterpolation()
+          (gate): Sigmoid()
+          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+        )
+        (fc1): Linear(in_features=8, out_features=8, bias=True)
+        (elu): ELU(alpha=1.0)
+        (fc2): Linear(in_features=8, out_features=8, bias=True)
+        (gate_norm): GateAddNorm(
+          (glu): GatedLinearUnit(
+            (dropout): Dropout(p=0.1, inplace=False)
+            (fc): Linear(in_features=8, out_features=32, bias=True)
+          )
+          (add_norm): AddNorm(
+            (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+          )
+        )
+      )
+    )
+    (prescalers): ModuleDict(
+      (time_idx): Linear(in_features=1, out_features=8, bias=True)
+      (price_regular): Linear(in_features=1, out_features=8, bias=True)
+      (discount_in_percent): Linear(in_features=1, out_features=8, bias=True)
+      (relative_time_idx): Linear(in_features=1, out_features=8, bias=True)
+    )
+    (softmax): Softmax(dim=-1)
+  )
+  (static_context_variable_selection): GatedResidualNetwork(
+    (fc1): Linear(in_features=16, out_features=16, bias=True)
+    (elu): ELU(alpha=1.0)
+    (fc2): Linear(in_features=16, out_features=16, bias=True)
+    (gate_norm): GateAddNorm(
+      (glu): GatedLinearUnit(
+        (dropout): Dropout(p=0.1, inplace=False)
+        (fc): Linear(in_features=16, out_features=32, bias=True)
+      )
+      (add_norm): AddNorm(
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+    )
+  )
+  (static_context_initial_hidden_lstm): GatedResidualNetwork(
+    (fc1): Linear(in_features=16, out_features=16, bias=True)
+    (elu): ELU(alpha=1.0)
+    (fc2): Linear(in_features=16, out_features=16, bias=True)
+    (gate_norm): GateAddNorm(
+      (glu): GatedLinearUnit(
+        (dropout): Dropout(p=0.1, inplace=False)
+        (fc): Linear(in_features=16, out_features=32, bias=True)
+      )
+      (add_norm): AddNorm(
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+    )
+  )
+  (static_context_initial_cell_lstm): GatedResidualNetwork(
+    (fc1): Linear(in_features=16, out_features=16, bias=True)
+    (elu): ELU(alpha=1.0)
+    (fc2): Linear(in_features=16, out_features=16, bias=True)
+    (gate_norm): GateAddNorm(
+      (glu): GatedLinearUnit(
+        (dropout): Dropout(p=0.1, inplace=False)
+        (fc): Linear(in_features=16, out_features=32, bias=True)
+      )
+      (add_norm): AddNorm(
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+    )
+  )
+  (static_context_enrichment): GatedResidualNetwork(
+    (fc1): Linear(in_features=16, out_features=16, bias=True)
+    (elu): ELU(alpha=1.0)
+    (fc2): Linear(in_features=16, out_features=16, bias=True)
+    (gate_norm): GateAddNorm(
+      (glu): GatedLinearUnit(
+        (dropout): Dropout(p=0.1, inplace=False)
+        (fc): Linear(in_features=16, out_features=32, bias=True)
+      )
+      (add_norm): AddNorm(
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+    )
+  )
+  (lstm_encoder): LSTM(16, 16, num_layers=2, batch_first=True, dropout=0.1)
+  (lstm_decoder): LSTM(16, 16, num_layers=2, batch_first=True, dropout=0.1)
+  (post_lstm_gate_encoder): GatedLinearUnit(
+    (dropout): Dropout(p=0.1, inplace=False)
+    (fc): Linear(in_features=16, out_features=32, bias=True)
+  )
+  (post_lstm_gate_decoder): GatedLinearUnit(
+    (dropout): Dropout(p=0.1, inplace=False)
+    (fc): Linear(in_features=16, out_features=32, bias=True)
+  )
+  (post_lstm_add_norm_encoder): AddNorm(
+    (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+  )
+  (post_lstm_add_norm_decoder): AddNorm(
+    (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+  )
+  (static_enrichment): GatedResidualNetwork(
+    (fc1): Linear(in_features=16, out_features=16, bias=True)
+    (elu): ELU(alpha=1.0)
+    (context): Linear(in_features=16, out_features=16, bias=False)
+    (fc2): Linear(in_features=16, out_features=16, bias=True)
+    (gate_norm): GateAddNorm(
+      (glu): GatedLinearUnit(
+        (dropout): Dropout(p=0.1, inplace=False)
+        (fc): Linear(in_features=16, out_features=32, bias=True)
+      )
+      (add_norm): AddNorm(
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+    )
+  )
+  (multihead_attn): InterpretableMultiHeadAttention(
+    (dropout): Dropout(p=0.1, inplace=False)
+    (v_layer): Linear(in_features=16, out_features=4, bias=True)
+    (q_layers): ModuleList(
+      (0): Linear(in_features=16, out_features=4, bias=True)
+      (1): Linear(in_features=16, out_features=4, bias=True)
+      (2): Linear(in_features=16, out_features=4, bias=True)
+      (3): Linear(in_features=16, out_features=4, bias=True)
+    )
+    (k_layers): ModuleList(
+      (0): Linear(in_features=16, out_features=4, bias=True)
+      (1): Linear(in_features=16, out_features=4, bias=True)
+      (2): Linear(in_features=16, out_features=4, bias=True)
+      (3): Linear(in_features=16, out_features=4, bias=True)
+    )
+    (attention): ScaledDotProductAttention(
+      (softmax): Softmax(dim=2)
+    )
+    (w_h): Linear(in_features=4, out_features=16, bias=False)
+  )
+  (post_attn_gate_norm): GateAddNorm(
+    (glu): GatedLinearUnit(
+      (dropout): Dropout(p=0.1, inplace=False)
+      (fc): Linear(in_features=16, out_features=32, bias=True)
+    )
+    (add_norm): AddNorm(
+      (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+    )
+  )
+  (pos_wise_ff): GatedResidualNetwork(
+    (fc1): Linear(in_features=16, out_features=16, bias=True)
+    (elu): ELU(alpha=1.0)
+    (fc2): Linear(in_features=16, out_features=16, bias=True)
+    (gate_norm): GateAddNorm(
+      (glu): GatedLinearUnit(
+        (dropout): Dropout(p=0.1, inplace=False)
+        (fc): Linear(in_features=16, out_features=32, bias=True)
+      )
+      (add_norm): AddNorm(
+        (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+      )
+    )
+  )
+  (pre_output_gate_norm): GateAddNorm(
+    (glu): GatedLinearUnit(
+      (fc): Linear(in_features=16, out_features=32, bias=True)
+    )
+    (add_norm): AddNorm(
+      (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
+    )
+  )
+  (output_layer): Linear(in_features=16, out_features=7, bias=True)
+)
+[flaml.automl: 07-28 22:08:05] {2725} INFO - fit succeeded
+[flaml.automl: 07-28 22:08:05] {2726} INFO - Time taken to find the best model: 1242.6435902118683
+[flaml.automl: 07-28 22:08:05] {2737} WARNING - Time taken to find the best model is 414% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.
+```
+
 [Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_time_series_forecast.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_time_series_forecast.ipynb)
--- a/website/docs/Use-Cases/Task-Oriented-AutoML.md
+++ b/website/docs/Use-Cases/Task-Oriented-AutoML.md
@@ -12,6 +12,7 @@
    - 'regression': regression.
    - 'ts_forecast': time series forecasting.
    - 'ts_forecast_classification': time series forecasting for classification.
+    - 'ts_forecast_panel': time series forecasting for panel datasets (multiple time series).
    - 'rank': learning to rank.
    - 'seq-classification': sequence classification.
    - 'seq-regression': sequence regression.
@@ -119,6 +120,7 @@ The estimator list can contain one or more estimator names, each corresponding t
    - 'arima': ARIMA for task "ts_forecast". Hyperparameters: p, d, q.
    - 'sarimax': SARIMAX for task "ts_forecast". Hyperparameters: p, d, q, P, D, Q, s.
    - 'transformer': Huggingface transformer models for task "seq-classification", "seq-regression", "multichoice-classification", "token-classification" and "summarization". Hyperparameters: learning_rate, num_train_epochs, per_device_train_batch_size, warmup_ratio, weight_decay, adam_epsilon, seed.
+    - 'temporal_fusion_transformer': TemporalFusionTransformerEstimator for task "ts_forecast_panel". Hyperparameters: gradient_clip_val, hidden_size, hidden_continuous_size, attention_head_size, dropout, learning_rate.
 * Custom estimator. Use custom estimator for:
    - tuning an estimator that is not built-in;
    - customizing search space for a built-in estimator.