fix issues in logging, bug in space.py, constraint sign, and improve code coverage (#388)

* console log handler

* version update

* doc

* skippable steps

* notebook update

* constraint sign

* doc for constraints

* bug fix: define-by-run and unflatten_hierarchical

* const

* handle nested space in indexof()

* test grid search

* test suggestion

* model test

* >1 ckpts

* always increase iter count

* log total # iterations

* security patch

* make iter_per_learner consistent
This commit is contained in:
Chi Wang
2022-01-14 13:39:09 -08:00
committed by GitHub
parent c1b5cb5348
commit 569908fbe6
25 changed files with 885 additions and 492 deletions

View File

@@ -18,7 +18,7 @@
"## 1. Introduction\n",
"\n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n",
"with low computational cost. It is fast and cheap. The simple and lightweight design makes it easy to use and extend, such as adding new learners. FLAML can \n",
"with low computational cost. It is fast and economical. The simple and lightweight design makes it easy to use and extend, such as adding new learners. FLAML can \n",
"- serve as an economical AutoML engine,\n",
"- be used as a fast hyperparameter tuning tool, or \n",
"- be embedded in self-tuning software that requires low latency & resource in repetitive\n",

File diff suppressed because one or more lines are too long

View File

@@ -13,7 +13,7 @@
"source": [
"## 1. Introduction\n",
"\n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models with low computational cost. It is fast and cheap. The simple and lightweight design makes it easy to use and extend, such as adding new learners. FLAML can\n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models with low computational cost. It is fast and economical. The simple and lightweight design makes it easy to use and extend, such as adding new learners. FLAML can\n",
"\n",
" - serve as an economical AutoML engine,\n",
" - be used as a fast hyperparameter tuning tool, or\n",

View File

@@ -18,7 +18,7 @@
"## 1. Introduction\n",
"\n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n",
"with low computational cost. It is fast and cheap. The simple and lightweight design makes it easy \n",
"with low computational cost. It is fast and economical. The simple and lightweight design makes it easy \n",
"to use and extend, such as adding new learners. FLAML can \n",
"- serve as an economical AutoML engine,\n",
"- be used as a fast hyperparameter tuning tool, or \n",

View File

@@ -18,7 +18,7 @@
"## 1. Introduction\n",
"\n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n",
"with low computational cost. It is fast and cheap. The simple and lightweight design makes it easy \n",
"with low computational cost. It is fast and economical. The simple and lightweight design makes it easy \n",
"to use and extend, such as adding new learners. FLAML can \n",
"- serve as an economical AutoML engine,\n",
"- be used as a fast hyperparameter tuning tool, or \n",

View File

@@ -2,33 +2,34 @@
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) 2021. All rights reserved.\n",
"\n",
"Contributed by: @bnriiitb\n",
"\n",
"Licensed under the MIT License."
],
"metadata": {}
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using AutoML in Sklearn Pipeline\n",
"\n",
"This tutorial will help you understand how FLAML's AutoML can be used as a transformer in the Sklearn pipeline."
],
"metadata": {}
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## 1.Introduction\n",
"\n",
"### 1.1 FLAML - Fast and Lightweight AutoML\n",
"\n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models with low computational cost. It is fast and cheap. The simple and lightweight design makes it easy to use and extend, such as adding new learners. \n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models with low computational cost. It is fast and economical. The simple and lightweight design makes it easy to use and extend, such as adding new learners. \n",
"\n",
"FLAML can \n",
"- serve as an economical AutoML engine,\n",
@@ -42,11 +43,11 @@
"```bash\n",
"pip install flaml[notebook]\n",
"```"
],
"metadata": {}
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2 Why are pipelines a silver bullet?\n",
"\n",
@@ -62,47 +63,42 @@
"* Allow hyperparameter tuning across the estimators\n",
"* Easier to share and collaborate with multiple users (bug fixes, enhancements etc)\n",
"* Enforce the implementation and order of steps"
],
"metadata": {}
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### As FLAML's AutoML module can be used a transformer in the Sklearn's pipeline we can get all the benefits of pipeline and thereby write extremley clean, and resuable code."
],
"metadata": {}
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"!pip install flaml[notebook];"
],
"outputs": [],
"metadata": {}
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Classification Example\n",
"### Load data and preprocess\n",
"\n",
"Download [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure."
],
"metadata": {}
]
},
{
"cell_type": "code",
"execution_count": 4,
"source": [
"from flaml.data import load_openml_dataset\n",
"X_train, X_test, y_train, y_test = load_openml_dataset(\n",
" dataset_id=1169, data_dir='./', random_state=1234, dataset_format='array')"
],
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"output_type": "stream",
"text": [
"load dataset from ./openml_ds1169.pkl\n",
"Dataset name: airlines\n",
@@ -111,38 +107,62 @@
]
}
],
"metadata": {}
"source": [
"from flaml.data import load_openml_dataset\n",
"X_train, X_test, y_train, y_test = load_openml_dataset(\n",
" dataset_id=1169, data_dir='./', random_state=1234, dataset_format='array')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"source": [
"X_train[0]"
],
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([ 12., 2648., 4., 15., 4., 450., 67.], dtype=float32)"
]
},
"execution_count": 5,
"metadata": {},
"execution_count": 5
"output_type": "execute_result"
}
],
"metadata": {}
"source": [
"X_train[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Create a Pipeline"
],
"metadata": {}
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>div.sk-top-container {color: black;background-color: white;}div.sk-toggleable {background-color: white;}label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.2em 0.3em;box-sizing: border-box;text-align: center;}div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}div.sk-estimator {font-family: monospace;background-color: #f0f8ff;margin: 0.25em 0.25em;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;}div.sk-estimator:hover {background-color: #d4ebff;}div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;}div.sk-item {z-index: 1;}div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: white;}div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}div.sk-parallel-item:only-child::after {width: 0;}div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0.2em;box-sizing: border-box;padding-bottom: 0.1em;background-color: white;position: relative;}div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}div.sk-label-container {position: relative;z-index: 2;text-align: center;}div.sk-container {display: inline-block;position: relative;}</style><div class=\"sk-top-container\"><div class=\"sk-container\"><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"b91d1bdf-ccb8-4fa5-a2d0-67a3538c0afc\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"b91d1bdf-ccb8-4fa5-a2d0-67a3538c0afc\">Pipeline</label><div class=\"sk-toggleable__content\"><pre>Pipeline(steps=[('imputuer', SimpleImputer()),\n",
" ('standardizer', StandardScaler()),\n",
" ('automl', <flaml.automl.AutoML object at 0x7f046d56fb50>)])</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"a8311733-9e55-4c0c-9c2a-6b9ba6227596\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"a8311733-9e55-4c0c-9c2a-6b9ba6227596\">SimpleImputer</label><div class=\"sk-toggleable__content\"><pre>SimpleImputer()</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"52580e54-89ab-4fb7-83a1-ae13962854bb\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"52580e54-89ab-4fb7-83a1-ae13962854bb\">StandardScaler</label><div class=\"sk-toggleable__content\"><pre>StandardScaler()</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"b9fe5397-bf24-491d-a938-c39a780e1ac0\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"b9fe5397-bf24-491d-a938-c39a780e1ac0\">AutoML</label><div class=\"sk-toggleable__content\"><pre><flaml.automl.AutoML object at 0x7f046d56fb50></pre></div></div></div></div></div></div></div>"
],
"text/plain": [
"Pipeline(steps=[('imputuer', SimpleImputer()),\n",
" ('standardizer', StandardScaler()),\n",
" ('automl', <flaml.automl.AutoML object at 0x7f046d56fb50>)])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import sklearn\n",
"from sklearn import set_config\n",
@@ -163,39 +183,21 @@
" (\"automl\", automl)\n",
"])\n",
"automl_pipeline"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Pipeline(steps=[('imputuer', SimpleImputer()),\n",
" ('standardizer', StandardScaler()),\n",
" ('automl', <flaml.automl.AutoML object at 0x7f046d56fb50>)])"
],
"text/html": [
"<style>div.sk-top-container {color: black;background-color: white;}div.sk-toggleable {background-color: white;}label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.2em 0.3em;box-sizing: border-box;text-align: center;}div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}div.sk-estimator {font-family: monospace;background-color: #f0f8ff;margin: 0.25em 0.25em;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;}div.sk-estimator:hover {background-color: #d4ebff;}div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;}div.sk-item {z-index: 1;}div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: white;}div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}div.sk-parallel-item:only-child::after {width: 0;}div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0.2em;box-sizing: border-box;padding-bottom: 0.1em;background-color: white;position: relative;}div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}div.sk-label-container {position: relative;z-index: 2;text-align: center;}div.sk-container {display: inline-block;position: relative;}</style><div class=\"sk-top-container\"><div class=\"sk-container\"><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"b91d1bdf-ccb8-4fa5-a2d0-67a3538c0afc\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"b91d1bdf-ccb8-4fa5-a2d0-67a3538c0afc\">Pipeline</label><div class=\"sk-toggleable__content\"><pre>Pipeline(steps=[('imputuer', SimpleImputer()),\n",
" ('standardizer', StandardScaler()),\n",
" ('automl', <flaml.automl.AutoML object at 0x7f046d56fb50>)])</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"a8311733-9e55-4c0c-9c2a-6b9ba6227596\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"a8311733-9e55-4c0c-9c2a-6b9ba6227596\">SimpleImputer</label><div class=\"sk-toggleable__content\"><pre>SimpleImputer()</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"52580e54-89ab-4fb7-83a1-ae13962854bb\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"52580e54-89ab-4fb7-83a1-ae13962854bb\">StandardScaler</label><div class=\"sk-toggleable__content\"><pre>StandardScaler()</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"b9fe5397-bf24-491d-a938-c39a780e1ac0\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"b9fe5397-bf24-491d-a938-c39a780e1ac0\">AutoML</label><div class=\"sk-toggleable__content\"><pre><flaml.automl.AutoML object at 0x7f046d56fb50></pre></div></div></div></div></div></div></div>"
]
},
"metadata": {},
"execution_count": 6
}
],
"metadata": {}
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Run FLAML\n",
"In the FLAML automl run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, resampling strategy type, and so on. All these arguments have default values which will be used if users do not provide them. For example, the default ML learners of FLAML are `['lgbm', 'xgboost', 'catboost', 'rf', 'extra_tree', 'lrl1']`. "
],
"metadata": {}
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"settings = {\n",
" \"time_budget\": 60, # total running time in seconds\n",
@@ -204,24 +206,16 @@
" \"estimator_list\":['xgboost','catboost','lgbm'],\n",
" \"log_file_name\": 'airlines_experiment.log', # flaml log file\n",
"}"
],
"outputs": [],
"metadata": {}
]
},
{
"cell_type": "code",
"execution_count": 8,
"source": [
"automl_pipeline.fit(X_train, y_train, \n",
" automl__time_budget=settings['time_budget'],\n",
" automl__metric=settings['metric'],\n",
" automl__estimator_list=settings['estimator_list'],\n",
" automl__log_training_metric=True)"
],
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"output_type": "stream",
"text": [
"[flaml.automl: 08-22 21:32:13] {1130} INFO - Evaluation method: holdout\n",
"[flaml.automl: 08-22 21:32:14] {624} INFO - Using StratifiedKFold\n",
@@ -389,28 +383,47 @@
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Pipeline(steps=[('imputuer', SimpleImputer()),\n",
" ('standardizer', StandardScaler()),\n",
" ('automl', <flaml.automl.AutoML object at 0x7f046d56fb50>)])"
],
"text/html": [
"<style>div.sk-top-container {color: black;background-color: white;}div.sk-toggleable {background-color: white;}label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.2em 0.3em;box-sizing: border-box;text-align: center;}div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}div.sk-estimator {font-family: monospace;background-color: #f0f8ff;margin: 0.25em 0.25em;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;}div.sk-estimator:hover {background-color: #d4ebff;}div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;}div.sk-item {z-index: 1;}div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: white;}div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}div.sk-parallel-item:only-child::after {width: 0;}div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0.2em;box-sizing: border-box;padding-bottom: 0.1em;background-color: white;position: relative;}div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}div.sk-label-container {position: relative;z-index: 2;text-align: center;}div.sk-container {display: inline-block;position: relative;}</style><div class=\"sk-top-container\"><div class=\"sk-container\"><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"b994edf1-5e76-4cd3-b719-4a204af673dc\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"b994edf1-5e76-4cd3-b719-4a204af673dc\">Pipeline</label><div class=\"sk-toggleable__content\"><pre>Pipeline(steps=[('imputuer', SimpleImputer()),\n",
" ('standardizer', StandardScaler()),\n",
" ('automl', <flaml.automl.AutoML object at 0x7f046d56fb50>)])</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"c94ee64a-d8b1-4cbb-aeca-952bf6963c13\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"c94ee64a-d8b1-4cbb-aeca-952bf6963c13\">SimpleImputer</label><div class=\"sk-toggleable__content\"><pre>SimpleImputer()</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"6a28d11a-19e2-4243-8b85-e3ba5f6f2a7e\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"6a28d11a-19e2-4243-8b85-e3ba5f6f2a7e\">StandardScaler</label><div class=\"sk-toggleable__content\"><pre>StandardScaler()</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"03dcbe59-a8be-4f09-a944-115d90939f81\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"03dcbe59-a8be-4f09-a944-115d90939f81\">AutoML</label><div class=\"sk-toggleable__content\"><pre><flaml.automl.AutoML object at 0x7f046d56fb50></pre></div></div></div></div></div></div></div>"
],
"text/plain": [
"Pipeline(steps=[('imputuer', SimpleImputer()),\n",
" ('standardizer', StandardScaler()),\n",
" ('automl', <flaml.automl.AutoML object at 0x7f046d56fb50>)])"
]
},
"execution_count": 8,
"metadata": {},
"execution_count": 8
"output_type": "execute_result"
}
],
"metadata": {}
"source": [
"automl_pipeline.fit(X_train, y_train, \n",
" automl__time_budget=settings['time_budget'],\n",
" automl__metric=settings['metric'],\n",
" automl__estimator_list=settings['estimator_list'],\n",
" automl__log_training_metric=True)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Best ML leaner: xgboost\n",
"Best hyperparmeter config: {'n_estimators': 63, 'max_leaves': 1797, 'min_child_weight': 0.07275175679381725, 'learning_rate': 0.06234183309508761, 'subsample': 0.9814772488195874, 'colsample_bylevel': 0.810466508891351, 'colsample_bytree': 0.8005378817953572, 'reg_alpha': 0.5768305704485758, 'reg_lambda': 6.867180836557797, 'FLAML_sample_size': 364083}\n",
"Best accuracy on validation data: 0.6721\n",
"Training duration of best run: 15.45 s\n"
]
}
],
"source": [
"# Get the automl object from the pipeline\n",
"automl = automl_pipeline.steps[2][1]\n",
@@ -420,75 +433,55 @@
"print('Best hyperparmeter config:', automl.best_config)\n",
"print('Best accuracy on validation data: {0:.4g}'.format(1-automl.best_loss))\n",
"print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Best ML leaner: xgboost\n",
"Best hyperparmeter config: {'n_estimators': 63, 'max_leaves': 1797, 'min_child_weight': 0.07275175679381725, 'learning_rate': 0.06234183309508761, 'subsample': 0.9814772488195874, 'colsample_bylevel': 0.810466508891351, 'colsample_bytree': 0.8005378817953572, 'reg_alpha': 0.5768305704485758, 'reg_lambda': 6.867180836557797, 'FLAML_sample_size': 364083}\n",
"Best accuracy on validation data: 0.6721\n",
"Training duration of best run: 15.45 s\n"
]
}
],
"metadata": {}
]
},
{
"cell_type": "code",
"execution_count": 10,
"source": [
"automl.model"
],
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<flaml.model.XGBoostSklearnEstimator at 0x7f03a5eada00>"
]
},
"execution_count": 10,
"metadata": {},
"execution_count": 10
"output_type": "execute_result"
}
],
"metadata": {}
"source": [
"automl.model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Persist the model binary file"
],
"metadata": {}
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Persist the automl object as pickle file\n",
"import pickle\n",
"with open('automl.pkl', 'wb') as f:\n",
" pickle.dump(automl, f, pickle.HIGHEST_PROTOCOL)"
],
"outputs": [],
"metadata": {}
]
},
{
"cell_type": "code",
"execution_count": 12,
"source": [
"# Performance inference on the testing dataset\n",
"y_pred = automl_pipeline.predict(X_test)\n",
"print('Predicted labels', y_pred)\n",
"print('True labels', y_test)\n",
"y_pred_proba = automl_pipeline.predict_proba(X_test)[:,1]\n",
"print('Predicted probas ',y_pred_proba[:5])"
],
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"output_type": "stream",
"text": [
"Predicted labels [0 1 1 ... 0 1 0]\n",
"True labels [0 0 0 ... 1 0 1]\n",
@@ -496,13 +489,23 @@
]
}
],
"metadata": {}
"source": [
"# Performance inference on the testing dataset\n",
"y_pred = automl_pipeline.predict(X_test)\n",
"print('Predicted labels', y_pred)\n",
"print('True labels', y_test)\n",
"y_pred_proba = automl_pipeline.predict_proba(X_test)[:,1]\n",
"print('Predicted probas ',y_pred_proba[:5])"
]
}
],
"metadata": {
"interpreter": {
"hash": "0cfea3304185a9579d09e0953576b57c8581e46e6ebc6dfeb681bc5a511f7544"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3.8.0 64-bit ('blend': conda)"
"display_name": "Python 3.8.0 64-bit ('blend': conda)",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -515,11 +518,8 @@
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
},
"interpreter": {
"hash": "0cfea3304185a9579d09e0953576b57c8581e46e6ebc6dfeb681bc5a511f7544"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
}