{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Converters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Spox does not directly offer any _ONNX converters_ (utilities for translating ML models into ONNX), but it can be easily used to implement a _converter protocol_.\n", "We'll go over an example way of achieving this.\n", "In general, it is easiest to convert operations from libraries like `numpy` or deep learning frameworks, since ONNX follows similar principles." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [], "source": [ "from typing import Dict\n", "import onnx\n", "import onnxruntime\n", "import numpy as np\n", "from spox import argument, build, Tensor, Var\n", "import spox.opset.ai.onnx.v17 as op\n", "\n", "\n", "def run(model: onnx.ModelProto, **kwargs) -> list[np.ndarray]:\n", " return onnxruntime.InferenceSession(model.SerializeToString()).run(\n", " None,\n", " {k: np.array(v) for k, v in kwargs.items()}\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll start with simple conversion of Python functions on `numpy.array`s into Spox equivalents on `Var`s (of tensors).\n", "\n", "Let's define functions computing means on a pair of tensors:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [], "source": [ "def arithmetic_mean(a: np.ndarray, b: np.ndarray) -> np.ndarray:\n", " return np.divide(np.add(a, b), 2)\n", "\n", "\n", "def geometric_mean(a: np.ndarray, b: np.ndarray) -> np.ndarray:\n", " return np.sqrt(np.multiply(a, b))\n", "\n", "\n", "def harmonic_mean(a: np.ndarray, b: np.ndarray) -> np.ndarray:\n", " return np.divide(2., np.add(\n", " np.reciprocal(a),\n", " np.reciprocal(b)\n", " ))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now define equivalents in Spox. We'll follow a _contract_ stating that arguments and results of `numpy.ndarray` become `Var`, which is expected to be a tensor." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [], "source": [ "def spox_arithmetic_mean(a: Var, b: Var) -> Var:\n", " return op.div(op.add(a, b), op.constant(value_float=2.))\n", "\n", "\n", "def spox_geometric_mean(a: Var, b: Var) -> Var:\n", " return op.sqrt(op.mul(a, b))\n", "\n", "\n", "def spox_harmonic_mean(a: Var, b: Var) -> Var:\n", " return op.div(op.constant(value_float=2.), op.add(\n", " op.reciprocal(a),\n", " op.reciprocal(b)\n", " ))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimators" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also consider an `sklearn`-like estimator on 'dataframes' (dictionaries of arrays)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [], "source": [ "class PairwiseMeans:\n", " kind: str # 'arithmetic', 'geometric', or 'harmonic'\n", " first: str\n", " second: str # name of first and second 'column' to find the mean of\n", "\n", " def __init__(self, kind: str, first: str, second: str):\n", " self.kind = kind\n", " self.first = first\n", " self.second = second\n", "\n", " def predict(self, data: Dict[str, np.ndarray]) -> np.ndarray:\n", " means = {\n", " 'arithmetic': arithmetic_mean,\n", " 'geometric': geometric_mean,\n", " 'harmonic': harmonic_mean,\n", " }\n", " return means[self.kind](data[self.first], data[self.second])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The equivalent in Spox could be a class 'decorating' a `PairwiseMeans` instance - consuming it and implementing the same interface, but using `Var`s instead of `numpy.ndarray`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [], "source": [ "class SpoxPairwiseMeans:\n", " estimator: PairwiseMeans\n", "\n", " def __init__(self, estimator: PairwiseMeans):\n", " self.estimator = estimator\n", "\n", " def predict(self, data: Dict[str, Var]) -> Var:\n", " means = {\n", " 'arithmetic': spox_arithmetic_mean,\n", " 'geometric': spox_geometric_mean,\n", " 'harmonic': spox_harmonic_mean,\n", " }\n", " return means[self.estimator.kind](data[self.estimator.first], data[self.estimator.second])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Converter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To provide a simple API for conversion, we can define a `convert` function handling the possible conversions. The mapping could be defined with e.g. a dictionary to make it more dynamically extensible." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [], "source": [ "def convert(obj):\n", " if obj is arithmetic_mean:\n", " return spox_arithmetic_mean\n", " elif obj is geometric_mean:\n", " return spox_geometric_mean\n", " elif obj is harmonic_mean:\n", " return spox_harmonic_mean\n", " elif type(obj) is PairwiseMeans:\n", " return SpoxPairwiseMeans(obj)\n", " raise ValueError(f\"No converter for: {obj}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To build a model, we have to construct the arguments and pass them with the result to `spox.build`. This could be abstracted away with a usage of `inspect.signature` and by extracting the input types from example input data, but we'll not consider this here." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [], "source": [ "pairwise_means = PairwiseMeans('harmonic', 'x', 'z')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [], "source": [ "vec = Tensor(np.float32, ('N',))\n", "x, y, z = argument(vec), argument(vec), argument(vec)\n", "\n", "\n", "def simple_convert_build(fun):\n", " return build({'x': x, 'y': y}, {'r': convert(fun)(x, y)})\n", "\n", "\n", "arithmetic_mean_model = simple_convert_build(arithmetic_mean)\n", "geometric_mean_model = simple_convert_build(geometric_mean)\n", "harmonic_mean_model = simple_convert_build(harmonic_mean)\n", "pairwise_means_model = build(\n", " {'x': x, 'y': y, 'z': z},\n", " {'r': convert(pairwise_means).predict({'x': x, 'y': y, 'z': z})}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Checking equivalence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now test equivalence by running the `onnxruntime` with the previously defined `run` utility." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [], "source": [ "x0 = np.array([1, 2, 3], dtype=np.float32)\n", "y0 = np.array([4, 6, 5], dtype=np.float32)\n", "z0 = np.array([-2, -1, -0.5], dtype=np.float32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example run looks like this. Note that this is not going through Spox, as at this point `arithmetic_mean_model` is an `onnx.ModelProto`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(array([2.5, 4. , 4. ], dtype=float32), array([2.5, 4. , 4. ], dtype=float32))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arithmetic_mean(x0, y0), run(arithmetic_mean_model, x=x0, y=y0)[0]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2.5 4. 4. ] [2.5 4. 4. ]\n", "[2. 3.4641016 3.8729835] [2. 3.4641016 3.8729835]\n", "[1.6 3. 3.7499998] [1.6 3. 3.7499998]\n" ] } ], "source": [ "tests = [\n", " (arithmetic_mean, arithmetic_mean_model),\n", " (geometric_mean, geometric_mean_model),\n", " (harmonic_mean, harmonic_mean_model),\n", "]\n", "for py_function, onnx_model in tests:\n", " actual = run(onnx_model, x=x0, y=y0)[0]\n", " desired = py_function(x0, y0)\n", " print(actual, desired)\n", " np.testing.assert_allclose(actual, desired)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 4. -4. -1.2] [ 4. -4. -1.2]\n" ] } ], "source": [ "actual = run(pairwise_means_model, x=x0, y=y0, z=z0)[0]\n", "desired = pairwise_means.predict({'x': x0, 'y': y0, 'z': z0})\n", "print(actual, desired)\n", "np.testing.assert_allclose(actual, desired)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 4 }