FLAML can be used together with AzureML. On top of that, using mlflow and ray is easy too.
### Prerequisites
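A minimal setup sketch (the full walkthrough is in the notebook linked below): assuming FLAML is installed with the `azureml` extra and a training set `X_train`, `y_train` is already loaded, point mlflow at the workspace and wrap an AutoML run in an mlflow run:

```python
# Minimal sketch, assuming FLAML is installed with the azureml extra
# (pip install "flaml[azureml]") and that X_train, y_train are already loaded.
import mlflow
from azureml.core import Workspace
from flaml import AutoML

ws = Workspace.from_config()                            # the AzureML workspace, used as `ws` below
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())   # send mlflow logs to the workspace

automl = AutoML()
mlflow.set_experiment("flaml")                          # experiment name in the AzureML workspace
with mlflow.start_run() as run:                         # create a mlflow run
    automl.fit(X_train=X_train, y_train=y_train, task="regression", time_budget=60)
```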
The metrics in the run will be automatically logged in an experiment named "flaml" in your AzureML workspace.
[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_azureml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_azureml.ipynb)
### Use ray to distribute across a cluster
When you have a compute cluster in AzureML, you can distribute `flaml.AutoML` or `flaml.tune` with ray.
#### Build a ray environment in AzureML
Create a Dockerfile such as [.Docker/Dockerfile-cpu](https://github.com/microsoft/FLAML/blob/main/test/.Docker/Dockerfile-cpu). Make sure `RUN pip install flaml[blendsearch,ray]` is included in the Dockerfile.
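A minimal sketch of such a Dockerfile (the linked file chooses its own base image and pinned versions):

```dockerfile
# Minimal sketch; see the linked Dockerfile-cpu for the exact base image and pins.
FROM rayproject/ray:latest

# FLAML with the blendsearch and ray extras, as required above
RUN pip install flaml[blendsearch,ray]
```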
Then build an AzureML environment in the workspace `ws`.
```python
from azureml.core import Environment
import time

ray_environment_name = "aml-ray-cpu"
ray_environment_dockerfile_path = "./Docker/Dockerfile-cpu"  # path to the Dockerfile created above

# Build CPU image for Ray
ray_cpu_env = Environment.from_dockerfile(name=ray_environment_name, dockerfile=ray_environment_dockerfile_path)
ray_cpu_env.register(workspace=ws)
ray_cpu_build_details = ray_cpu_env.build(workspace=ws)

while ray_cpu_build_details.status not in ["Succeeded", "Failed"]:
    print(f"Awaiting completion of ray CPU environment build. Current status is: {ray_cpu_build_details.status}")
    time.sleep(10)
```
You only need to do this step once per workspace.
#### Create a compute cluster with multiple nodes
```python
from azureml.core.compute import AmlCompute, ComputeTarget

compute_target_name = "cpucluster"
node_count = 2

# This example uses a CPU VM. To use a GPU VM, set the SKU to STANDARD_NC6
compute_target_size = "STANDARD_D2_V2"

if compute_target_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_target_name]
    if compute_target and type(compute_target) is AmlCompute:
        if compute_target.provisioning_state == "Succeeded":
            print("Found compute target; using it:", compute_target_name)
        else:
            raise Exception(
                "Found compute target but it is in state", compute_target.provisioning_state)
else:
    print("creating a new compute target...")
    provisioning_config = AmlCompute.provisioning_configuration(
        vm_size=compute_target_size,
        min_nodes=0,
        max_nodes=node_count)

    # Create the cluster
    compute_target = ComputeTarget.create(ws, compute_target_name, provisioning_config)

    # Can poll for a minimum number of nodes and for a specific timeout.
    # If no min node count is provided, it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of the current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())
```
If the compute target "cpucluster" already exists, it will not be recreated.
#### Run distributed AutoML job
Assume you have an AutoML script like [ray/distribute_automl.py](https://github.com/microsoft/FLAML/blob/main/test/ray/distribute_automl.py). It uses `ray.init(address="auto")` to initialize the cluster, and passes `n_concurrent_trials=k` to `AutoML.fit()` to run k trials in parallel.
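A minimal sketch of such a script (the linked file may differ in dataset and settings; here the OpenML houses dataset, id 537, stands in as an example):

```python
# distribute_automl.py -- minimal sketch; the linked script may differ in details
import ray
from flaml import AutoML
from flaml.data import load_openml_dataset

# Connect to the Ray cluster that AzureML starts on the nodes
ray.init(address="auto")

# Example data: the OpenML houses regression dataset (id 537)
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir="./")

automl = AutoML()
automl.fit(
    X_train=X_train,
    y_train=y_train,
    task="regression",
    time_budget=60,          # total tuning time in seconds
    n_concurrent_trials=2,   # run 2 trials in parallel across the cluster
)
print(automl.best_config)
```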
Submit an AzureML job as follows:
```python
from azureml.core import Workspace, Experiment, ScriptRunConfig, Environment

command = ["python distribute_automl.py"]
ray_environment_name = 'aml-ray-cpu'
env = Environment.get(workspace=ws, name=ray_environment_name)
config = ScriptRunConfig(
    source_directory='ray/',
    command=command,
    compute_target=compute_target,
    environment=env,
)

config.run_config.node_count = 2
config.run_config.environment_variables["_AZUREML_CR_START_RAY"] = "true"
config.run_config.environment_variables["AZUREML_COMPUTE_USE_COMMON_RUNTIME"] = "true"

exp = Experiment(ws, 'distribute-automl')
run = exp.submit(config)

print(run.get_portal_url())  # link to ml.azure.com
run.wait_for_completion(show_output=True)
```
The line `config.run_config.environment_variables["_AZUREML_CR_START_RAY"] = "true"` tells AzureML to start ray on each node of the cluster.
#### Run distributed tune job
Prepare a script like [ray/distribute_tune.py](https://github.com/microsoft/FLAML/blob/main/test/ray/distribute_tune.py). Replace the command in the above example with:
```python
command = ["python distribute_tune.py"]
```
Everything else is the same.
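For reference, a minimal sketch of such a tune script (the linked file may tune a real model; a toy objective stands in here), using flaml.tune's `use_ray=True` option to run trials on the Ray cluster:

```python
# distribute_tune.py -- minimal sketch; the linked script may differ in details
import ray
from flaml import tune

# Connect to the Ray cluster that AzureML starts on the nodes
ray.init(address="auto")

def evaluate_config(config):
    # Toy objective; a real script would train and score a model here
    score = (config["x"] - 3) ** 2 + config["y"]
    return {"score": score}

analysis = tune.run(
    evaluate_config,
    config={
        "x": tune.uniform(0, 10),
        "y": tune.randint(0, 5),
    },
    metric="score",
    mode="min",
    num_samples=100,
    time_budget_s=60,
    use_ray=True,   # distribute trials over the Ray cluster
)
print(analysis.best_config)
```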