concrete/docs/user/advanced_examples/QuantizedLogisticRegression.ipynb
Jeremy Bradley-Silverio Donato f387eaedba docs: English checking and improvement
2021-12-20 11:43:52 +01:00

{
"cells": [
{
"cell_type": "markdown",
"id": "9b835b74",
"metadata": {},
"source": [
"# Quantized Logistic Regression\n",
"\n",
"Currently, **Concrete** only supports unsigned integers of up to 7 bits. Nevertheless, we want to use it to evaluate a logistic regression model, whose inputs and weights are floating-point values. Luckily, **quantization** lets us represent these values as low-precision integers and overcome this limitation."
]
},
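{
"cell_type": "markdown",
"id": "e1a7c3d0",
"metadata": {},
"source": [
"To build some intuition for what quantization does, here is a minimal NumPy sketch of uniform quantization: each float is mapped to one of the `2**n_bits` evenly spaced levels between the minimum and the maximum of the array, stored as an unsigned integer, and then mapped back. It illustrates the general technique under our own assumptions and is not the implementation of Concrete's `QuantizedArray`, whose exact scheme (scale, zero point, rounding) may differ. The helpers `quantize_uniform` and `dequantize_uniform` are hypothetical. Values inside the array's range are recovered within half a quantization step."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a7c3d1",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def quantize_uniform(values, n_bits):\n",
"    \"\"\"Hypothetical helper: map floats to unsigned n_bits integers on a uniform grid\"\"\"\n",
"    v_min, v_max = float(np.min(values)), float(np.max(values))\n",
"    n_levels = 2**n_bits - 1\n",
"    # Guard against a constant array, which would give a zero step\n",
"    scale = (v_max - v_min) / n_levels if v_max > v_min else 1.0\n",
"    q_values = np.rint((values - v_min) / scale).astype(np.int64)\n",
"    return np.clip(q_values, 0, n_levels), scale, v_min\n",
"\n",
"\n",
"def dequantize_uniform(q_values, scale, offset):\n",
"    \"\"\"Hypothetical helper: map the integers back to approximate floats\"\"\"\n",
"    return q_values * scale + offset\n",
"\n",
"\n",
"demo_values = np.array([-1.0, -0.25, 0.0, 0.5, 1.0])\n",
"demo_q, demo_scale, demo_offset = quantize_uniform(demo_values, n_bits=2)\n",
"print(demo_q)  # integers in [0, 3]\n",
"demo_error = np.abs(dequantize_uniform(demo_q, demo_scale, demo_offset) - demo_values)\n",
"print(demo_error.max() <= demo_scale / 2 + 1e-12)"
]
},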
{
"cell_type": "markdown",
"id": "7d46edc9",
"metadata": {},
"source": [
"### Let's start by importing some libraries to develop our logistic regression model."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "858205d9",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"from copy import deepcopy\n",
"from typing import Any, Dict\n",
"\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "markdown",
"id": "86b77c19",
"metadata": {},
"source": [
"### Now, import the Concrete quantization tools."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "94df1602",
"metadata": {},
"outputs": [],
"source": [
"from concrete.quantization import (\n",
" QuantizedArray,\n",
" QuantizedLinear,\n",
" QuantizedModule,\n",
" QuantizedSigmoid,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ff9c1757",
"metadata": {},
"source": [
"### And some helpers for visualization."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "67330862",
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import matplotlib.pyplot as plt\n",
"from IPython.display import display"
]
},
{
"cell_type": "markdown",
"id": "d4f43095",
"metadata": {},
"source": [
"### And, finally, the FHE compiler."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3b76a5f6",
"metadata": {},
"outputs": [],
"source": [
"import concrete.numpy as hnp"
]
},
{
"cell_type": "markdown",
"id": "34959f0a",
"metadata": {},
"source": [
"### Define our Quantized Logistic Regression model."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a12ce041",
"metadata": {},
"outputs": [],
"source": [
"class QuantizedLogisticRegression(QuantizedModule):\n",
"    \"\"\"\n",
"    Quantized Logistic Regression.\n",
"\n",
"    Building on top of QuantizedModule, this class chains together a linear transformation\n",
"    and an inverse-link function, in this case the logistic (sigmoid) function.\n",
"    \"\"\"\n",
"\n",
"    @staticmethod\n",
"    def from_sklearn(sklearn_model, calibration_data):\n",
"        \"\"\"Create a Quantized Logistic Regression initialized from a trained sklearn model\"\"\"\n",
"        if sklearn_model.coef_.ndim == 1:\n",
"            weights = np.expand_dims(sklearn_model.coef_, 1)\n",
"        else:\n",
"            weights = sklearn_model.coef_.transpose()\n",
"\n",
"        bias = sklearn_model.intercept_\n",
"        # Our data has two dimensions, and the weights need 2 bits of precision because,\n",
"        # for now, their quantized values must be unsigned (non-negative).\n",
"        # Thus, to ensure that the output of the linear transformation does not exceed 7 bits,\n",
"        # we choose 4 bits for the data and the minimum of 1 bit for the bias.\n",
"        return QuantizedLogisticRegression(\n",
"            q_bits=4,\n",
"            w_bits=2,\n",
"            b_bits=1,\n",
"            out_bits=6,\n",
"            weights=weights,\n",
"            bias=bias,\n",
"            calibration_data=calibration_data,\n",
"        )\n",
"\n",
"    def __init__(self, q_bits, w_bits, b_bits, out_bits, weights, bias, calibration_data) -> None:\n",
"        \"\"\"\n",
"        Create the logistic regression with different quantization bit precisions.\n",
"\n",
"        Quantization parameters (number of bits):\n",
"            q_bits (int): bits for the input data, chosen so that the w . x + b operation\n",
"                does not exceed 7 bits on the calibration data\n",
"            w_bits (int): bits for the weights; in the case of a univariate regression,\n",
"                this can be 1\n",
"            b_bits (int): bits for the bias (this is a single value, so a single bit is enough)\n",
"            out_bits (int): bits for the result of the linear transformation (w . x + b).\n",
"                In the case of logistic regression, the result of the linear\n",
"                transformation is the input of a univariate inverse-link function,\n",
"                so this value can be up to 7\n",
"\n",
"        Other parameters:\n",
"            weights: a numpy nd-array of weights (d x 1), where d is the data dimensionality\n",
"            bias: a numpy array holding the intercept (a single value)\n",
"            calibration_data: a numpy nd-array of data (N x d)\n",
"        \"\"\"\n",
"        self.n_bits = out_bits\n",
"\n",
"        # We need to calibrate to a sufficiently low number of bits\n",
"        # so that the output of the linear layer (w . x + b)\n",
"        # does not exceed 7 bits\n",
"        self.q_calibration_data = QuantizedArray(q_bits, calibration_data)\n",
"\n",
"        # Quantize the weights and create the quantized linear layer\n",
"        q_weights = QuantizedArray(w_bits, weights)\n",
"        q_bias = QuantizedArray(b_bits, bias)\n",
"        q_layer = QuantizedLinear(out_bits, q_weights, q_bias)\n",
"\n",
"        # Store quantized layers\n",
"        quant_layers_dict: Dict[str, Any] = {}\n",
"\n",
"        # Calibrate the linear layer and obtain calibration_data for the next layers\n",
"        calibration_data = self._calibrate_and_store_layers_activation(\n",
"            \"linear\", q_layer, calibration_data, quant_layers_dict\n",
"        )\n",
"\n",
"        # Add the inverse-link for inference.\n",
"        # This needs to be quantized since it's computed in FHE,\n",
"        # but we can use 7 bits of output since, in this case,\n",
"        # the result of the inverse-link is not processed by any further layers.\n",
"        # Seven bits is the maximum precision, but this could be lowered to improve speed\n",
"        # at the possible expense of a higher deviance of the regressor\n",
"        q_logit = QuantizedSigmoid(n_bits=7)\n",
"\n",
"        # Now calibrate the inverse-link function with the linear layer's output data\n",
"        calibration_data = self._calibrate_and_store_layers_activation(\n",
"            \"invlink\", q_logit, calibration_data, quant_layers_dict\n",
"        )\n",
"\n",
"        # Finally, construct our module using the quantized layers\n",
"        super().__init__(quant_layers_dict)\n",
"\n",
"    def _calibrate_and_store_layers_activation(\n",
"        self, name, q_function, calibration_data, quant_layers_dict\n",
"    ):\n",
"        \"\"\"\n",
"        This function calibrates a layer of a quantized module (e.g., linear, inverse-link,\n",
"        activation) by looking at the input data, then computes the output of the quantized\n",
"        version of the layer to be used as input to the following layers\n",
"        \"\"\"\n",
"\n",
"        # Calibrate the output of the layer\n",
"        q_function.calibrate(calibration_data)\n",
"        # Store the learned quantized layer\n",
"        quant_layers_dict[name] = q_function\n",
"        # Quantize the calibration data, i.e., the output of the previous layer\n",
"        q_calibration_data = QuantizedArray(self.n_bits, calibration_data)\n",
"        # Apply the layer and dequantize its output, so that it is in the clear\n",
"        # and ready for the next calibration\n",
"        return q_function(q_calibration_data).dequant()\n",
"\n",
"    def quantize_input(self, x):\n",
"        \"\"\"Quantize x, reusing the quantization parameters of the calibration data\"\"\"\n",
"        q_input_arr = deepcopy(self.q_calibration_data)\n",
"        q_input_arr.update_values(x)\n",
"        return q_input_arr\n",
"\n"
]
},
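{
"cell_type": "markdown",
"id": "e1a7c3d2",
"metadata": {},
"source": [
"The bit widths chosen in `from_sklearn` follow from a worst-case budget on the output of the linear layer. The cell below is a rough check, assuming that the encrypted accumulator is simply the sum of products of the unsigned quantized inputs and weights plus the quantized bias; the actual `QuantizedLinear` also handles scales and zero points, so treat it as an approximation. The helper `max_accumulator_bits` is hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a7c3d3",
"metadata": {},
"outputs": [],
"source": [
"def max_accumulator_bits(n_features, q_bits, w_bits, b_bits):\n",
"    \"\"\"Hypothetical helper: worst-case bit width of sum_i(q_x_i * q_w_i) + q_b for unsigned operands\"\"\"\n",
"    max_value = n_features * (2**q_bits - 1) * (2**w_bits - 1) + (2**b_bits - 1)\n",
"    # Number of bits needed to store max_value as an unsigned integer\n",
"    return max_value.bit_length()\n",
"\n",
"\n",
"# Settings used by from_sklearn: 2 features, 4-bit inputs, 2-bit weights, 1-bit bias\n",
"print(max_accumulator_bits(n_features=2, q_bits=4, w_bits=2, b_bits=1))  # 7 (2 * 15 * 3 + 1 = 91)\n",
"# One more bit for the inputs would already exceed the 7-bit limit\n",
"print(max_accumulator_bits(n_features=2, q_bits=5, w_bits=2, b_bits=1))  # 8 (2 * 31 * 3 + 1 = 187)"
]
},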
{
"cell_type": "markdown",
"id": "0df30d0e",
"metadata": {},
"source": [
"### We need a training set; for simplicity, we generate a synthetic one. Let's also define a grid on which to test our classifier."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "caef5aed",
"metadata": {},
"outputs": [],
"source": [
"X, y = make_classification(\n",
" n_features=2,\n",
" n_redundant=0,\n",
" n_informative=2,\n",
" random_state=2,\n",
" n_clusters_per_class=1,\n",
" n_samples=100,\n",
")\n",
"\n",
"rng = np.random.RandomState(2)\n",
"X += 2 * rng.uniform(size=X.shape)\n",
"\n",
"b_min = np.min(X, axis=0)\n",
"b_max = np.max(X, axis=0)\n",
"\n",
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)\n",
"\n",
"x_test_grid, y_test_grid = np.meshgrid(\n",
" np.linspace(b_min[0], b_max[0], 30), np.linspace(b_min[1], b_max[1], 30)\n",
")\n",
"x_grid_test = np.vstack([x_test_grid.ravel(), y_test_grid.ravel()]).transpose()\n"
]
},
{
"cell_type": "markdown",
"id": "0b209247",
"metadata": {},
"source": [
"### Train a logistic regression with sklearn on the training set."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ec57fede",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LogisticRegression()"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"logreg = LogisticRegression()\n",
"logreg.fit(x_train, y_train)"
]
},
{
"cell_type": "markdown",
"id": "5be6c7d5",
"metadata": {},
"source": [
"### Let's visualize our dataset and the initial classifier to get a feel for them."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f7076523",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 864x576 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"y_score_grid = logreg.predict_proba(x_grid_test)[:,1]\n",
"\n",
"plt.ioff()\n",
"plt.clf()\n",
"fig, ax = plt.subplots(1, figsize=(12,8))\n",
"fig.patch.set_facecolor('white')\n",
"ax.contourf(x_test_grid, y_test_grid, y_score_grid.reshape(x_test_grid.shape), cmap='coolwarm')\n",
"CS1 = ax.contour(\n",
" x_test_grid,\n",
" y_test_grid,\n",
" y_score_grid.reshape(x_test_grid.shape),\n",
" levels=[0.5],\n",
" linewidths=2,\n",
")\n",
"CS1.collections[0].set_label(\"Sklearn decision boundary\")\n",
"ax.scatter(x_train[:,0], x_train[:,1],c=y_train, marker=\"D\", cmap=\"jet\")\n",
"ax.scatter(x_test[:,0], x_test[:,1], c=y_test, marker=\"x\", cmap=\"jet\")\n",
"ax.legend(loc=\"upper right\")\n",
"display(fig)"
]
},
{
"cell_type": "markdown",
"id": "996fbe05",
"metadata": {},
"source": [
"### Calibrate the model for quantization using both training and test data\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "06ed91dd",
"metadata": {},
"outputs": [],
"source": [
"calib_data = X\n",
"q_logreg = QuantizedLogisticRegression.from_sklearn(logreg, calib_data)"
]
},
{
"cell_type": "markdown",
"id": "cd74c5e7",
"metadata": {},
"source": [
"### Now, we can compile our model to FHE, using the whole dataset as the input set."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "b8f8f95b",
"metadata": {},
"outputs": [],
"source": [
"X_q = q_logreg.quantize_input(X)\n",
"\n",
"engine = q_logreg.compile(X_q)"
]
},
{
"cell_type": "markdown",
"id": "b608faef",
"metadata": {},
"source": [
"### Time to make some predictions, first in the clear."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "97eaf932",
"metadata": {},
"outputs": [],
"source": [
"# Test the original classifier\n",
"y_pred_test = np.asarray(logreg.predict(x_test))\n",
"\n",
"# Now that the model is quantized, predict on the test set\n",
"x_test_q = q_logreg.quantize_input(x_test)\n",
"q_y_score_test = q_logreg.forward_and_dequant(x_test_q)\n",
"q_y_pred_test = (q_y_score_test > 0.5).astype(np.int32)\n",
"\n",
"# Predict sklearn classifier probabilities of class 1 on the whole domain\n",
"y_score_grid = logreg.predict_proba(x_grid_test)[:, 1]\n",
"\n",
"# Predict quantized classifier probabilities on the whole domain to plot contours\n",
"grid_test_q = q_logreg.quantize_input(x_grid_test)\n",
"q_y_score_grid = q_logreg.forward_and_dequant(grid_test_q)\n"
]
},
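{
"cell_type": "markdown",
"id": "e1a7c3d4",
"metadata": {},
"source": [
"Thresholding the sigmoid output at 0.5, as done above for the quantized scores, is equivalent to checking the sign of the linear score $w \\cdot x + b$, because the sigmoid is increasing and $\\sigma(0) = 0.5$. A quick NumPy check on a few arbitrary scores (the `sigmoid` helper is defined here for illustration only):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a7c3d5",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def sigmoid(z):\n",
"    \"\"\"Logistic function, the inverse-link of logistic regression\"\"\"\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"\n",
"demo_scores = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])\n",
"# Both rules assign class 1 exactly when the linear score is strictly positive\n",
"by_probability = (sigmoid(demo_scores) > 0.5).astype(np.int32)\n",
"by_sign = (demo_scores > 0).astype(np.int32)\n",
"print(by_probability, np.array_equal(by_probability, by_sign))  # [0 0 0 1 1] True"
]
},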
{
"cell_type": "markdown",
"id": "8fb62d52",
"metadata": {},
"source": [
"### Now let's predict using the quantized FHE classifier."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "bc999411",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 40/40 [01:11<00:00, 1.80s/it]\n"
]
}
],
"source": [
"non_homomorphic_correct = 0\n",
"homomorphic_correct = 0\n",
"\n",
"# Track the samples that are wrongly classified due to quantization issues\n",
"q_wrong_predictions = np.zeros((0, 2), dtype=X.dtype)\n",
"\n",
"# Predict the FHE quantized classifier probabilities on the test set.\n",
"# Compute FHE quantized accuracy, clear-quantized accuracy and\n",
"# keep track of samples wrongly classified due to quantization\n",
"for i, x_i in enumerate(tqdm(x_test_q.qvalues)):\n",
"    y_i = y_test[i]\n",
"\n",
"    # Reshape the quantized sample into a (1, n_features) row of unsigned 8-bit integers\n",
"    fhe_in_sample = np.expand_dims(x_i, 1).transpose([1, 0]).astype(np.uint8)\n",
"\n",
"    q_pred_fhe = engine.run(fhe_in_sample)\n",
"    y_score_fhe = q_logreg.dequantize_output(q_pred_fhe)\n",
"    homomorphic_prediction = (y_score_fhe > 0.5).astype(np.int32)\n",
"\n",
"    non_homomorphic_prediction = q_y_pred_test[i]\n",
"    if non_homomorphic_prediction == y_i:\n",
"        non_homomorphic_correct += 1\n",
"    elif y_pred_test[i] == y_i:\n",
"        # Misclassified by the quantized model but not by sklearn: a quantization error\n",
"        q_wrong_predictions = np.vstack((q_wrong_predictions, x_test[i, :]))\n",
"\n",
"    if homomorphic_prediction == y_i:\n",
"        homomorphic_correct += 1"
]
},
{
"cell_type": "markdown",
"id": "f8c1d98a",
"metadata": {},
"source": [
"### Aggregate accuracies for all the versions of the classifier."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "8f3236fb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sklearn accuracy: 90.0000\n",
"Non Homomorphic Accuracy: 85.0000\n",
"Homomorphic Accuracy: 85.0000\n",
"Difference Percentage: 0.00%\n"
]
}
],
"source": [
"sklearn_acc = np.sum(y_pred_test == y_test) / len(y_test) * 100\n",
"non_homomorphic_accuracy = (non_homomorphic_correct / len(y_test)) * 100\n",
"homomorphic_accuracy = (homomorphic_correct / len(y_test)) * 100\n",
"difference = abs(homomorphic_accuracy - non_homomorphic_accuracy)\n",
"\n",
"print(f\"Sklearn accuracy: {sklearn_acc:.4f}\")\n",
"print(f\"Non Homomorphic Accuracy: {non_homomorphic_accuracy:.4f}\")\n",
"print(f\"Homomorphic Accuracy: {homomorphic_accuracy:.4f}\")\n",
"print(f\"Difference Percentage: {difference:.2f}%\")\n"
]
},
{
"cell_type": "markdown",
"id": "4810fdaf",
"metadata": {},
"source": [
"### Plot the results of both the original and the FHE versions of the classifier, marking classification errors induced by quantization with red circles."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "41b274ed",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 864x576 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.clf()\n",
"fig, ax = plt.subplots(1,figsize=(12,8))\n",
"fig.patch.set_facecolor('white')\n",
"ax.contourf(x_test_grid, y_test_grid, q_y_score_grid.reshape(x_test_grid.shape), cmap=\"coolwarm\")\n",
"CS1 = ax.contour(\n",
" x_test_grid,\n",
" y_test_grid,\n",
" q_y_score_grid.reshape(x_test_grid.shape),\n",
" levels=[0.5],\n",
" linewidths=2,\n",
")\n",
"ax.scatter(x_train[:, 0], x_train[:, 1], c=y_train, cmap=\"jet\", marker=\"D\")\n",
"ax.scatter(\n",
" q_wrong_predictions[:, 0], q_wrong_predictions[:, 1], c=\"red\", marker=\"o\", edgecolors=\"k\", s=32\n",
")\n",
"ax.scatter(x_test[:, 0], x_test[:, 1], c=q_y_pred_test, cmap=\"jet\", marker=\"x\")\n",
"CS2 = ax.contour(\n",
" x_test_grid,\n",
" y_test_grid,\n",
" y_score_grid.reshape(x_test_grid.shape),\n",
" levels=[0.5],\n",
" linewidths=2,\n",
" linestyles=\"dashed\",\n",
" cmap=\"hot\",\n",
")\n",
"ax.clabel(CS1, CS1.levels, inline=True, fontsize=10)\n",
"ax.clabel(CS2, CS2.levels, inline=True, fontsize=10)\n",
"CS1.collections[0].set_label(f\"Quantized FHE decision boundary, acc={homomorphic_accuracy:.1f}\")\n",
"CS2.collections[0].set_label(f\"Sklearn decision boundary, acc={sklearn_acc:.1f}\")\n",
"ax.legend(loc=\"upper right\")\n",
"display(fig)"
]
},
{
"cell_type": "markdown",
"id": "52a83d37",
"metadata": {},
"source": [
"### Enjoy!"
]
}
],
"metadata": {
"execution": {
"timeout": 10800
}
},
"nbformat": 4,
"nbformat_minor": 5
}