mirror of
https://github.com/microsoft/autogen.git
synced 2026-04-20 03:02:16 -04:00
Enhance Integration with Spark (#1097)
* add doc for spark
* labelCol equals to label by default
* change title and reformat
* reference about default index type
* fix doc build
* Update website/docs/Examples/Integrate - Spark.md
* update doc
* Added more references
* remove exception case when `y_train.name` is None
* fix broken link

Co-authored-by: Wendong Li <v-wendongli@microsoft.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
@@ -420,7 +420,7 @@ An example of using Spark for parallel tuning is:
```python
automl.fit(X_train, y_train, n_concurrent_trials=4, use_spark=True)
```
For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`. Also, GPU training is not yet supported when `use_spark` is True.
Details about parallel tuning with Spark can be found [here](../Examples/Integrate%20-%20Spark#parallel-spark-jobs). For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`. Also, GPU training is not yet supported when `use_spark` is True.
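The interaction described above can be sketched in plain Python. This is an illustrative sketch only: the variable names `n_concurrent_trials` and `num_executors` mirror the prose, and the arithmetic is not FLAML's internal code.

```python
import os

# Illustrative sketch: FLAML_MAX_CONCURRENT overrides the detected executor
# count, and the effective parallelism is the minimum of the user-requested
# n_concurrent_trials and num_executors.
os.environ["FLAML_MAX_CONCURRENT"] = "8"

n_concurrent_trials = 4                                  # requested by the user
num_executors = int(os.environ["FLAML_MAX_CONCURRENT"])  # overridden detection
effective_parallelism = min(n_concurrent_trials, num_executors)
print(effective_parallelism)  # → 4
```

In local mode the detected executor count can be 1, so setting `FLAML_MAX_CONCURRENT` is how you allow more than one concurrent trial there.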
#### **Guidelines on parallel vs sequential tuning**
@@ -293,6 +293,8 @@ Related arguments:
- `use_spark`: A boolean of whether to use Spark as the backend for parallel tuning.
- `resources_per_trial`: A dictionary of the hardware resources to allocate per trial, e.g., `{'cpu': 1}`. Only valid when using the Ray backend.
Details about parallel tuning with Spark can be found [here](../Examples/Integrate%20-%20Spark#parallel-spark-jobs).
You can perform parallel tuning by specifying `use_ray=True` (requires the `flaml[ray]` option to be installed) or `use_spark=True` (requires the `flaml[spark]` option to be installed). You can also limit the amount of resources allocated per trial by specifying `resources_per_trial`, which is only valid when using the Ray backend.
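As a hedged sketch of how these related arguments fit together (the `automl`, `X_train`, and `y_train` objects are assumed from the earlier example, and the `fit` call itself is left commented out because it requires a cluster and the `flaml[ray]` extra):

```python
# Illustrative settings for Ray-backed parallel tuning; not a definitive recipe.
settings = {
    "n_concurrent_trials": 2,           # number of trials to run in parallel
    "use_ray": True,                    # use Ray as the parallel backend
    "resources_per_trial": {"cpu": 1},  # per-trial resources (Ray backend only)
}
# automl.fit(X_train, y_train, **settings)  # not executed in this sketch
print(settings["resources_per_trial"])  # → {'cpu': 1}
```

Note that `use_ray` and `use_spark` select mutually distinct backends, and `resources_per_trial` applies only to the Ray case.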