{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Base Non-binary node functionality\n", "\n", "\n", "Optional: Create a (conda) environment, activate it, and install the package:\n", "\n", "```bash\n", " conda create -y -n conda_nbnode python=3.8\n", " conda activate conda_nbnode\n", " git clone https://github.com/ggrlab/nbnode\n", " cd nbnode\n", " pip install --upgrade pip\n", " pip install . \n", "```\n", "\n", "\n", "The base functionality of the package is to enable non-binary trees. This tutorial builds\n", "a tree with a root node ``a`` and three children ``a0``, ``a1`` and ``a2``. ``a1`` is the only child with a child of its own, ``a1a``.\n", "\n", "```\n", " a\n", " ├── a0\n", " ├── a1\n", " │   └── a1a\n", " └── a2\n", "```\n", "\n", "\n", "A basic non-binary node (``NBNode``) has four important attributes:\n", "\n", " - ``name``: The name of the node. This is the only mandatory attribute.\n", " - ``parent``: The parent node of this node.\n", " - ``decision_name``: The name of the feature whose value leads to this node. \n", " - ``decision_value``: The value leading to this node.\n", "\n", " \n", "The name of a node only needs to be unique among the children of its parent node.\n", "Together, ``decision_name`` and ``decision_value`` define the feature and the value that lead to this node. 
Note that \n", "``decision_name`` must be a string, but ``decision_value`` can be anything, including strings, integers, floats, etc.\n", "\n", "To build the tree above, we can use the following code:\n" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NBNode('/a/a1/a1a', counter=0, decision_name='m2', decision_value='test')" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from nbnode.nbnode import NBNode\n", "simple_tree = NBNode(\"a\")\n", "NBNode(\"a0\", parent=simple_tree, decision_value=-1, decision_name=\"m1\")\n", "a1 = NBNode(\"a1\", parent=simple_tree, decision_value=1, decision_name=\"m1\")\n", "NBNode(\"a2\", parent=simple_tree, decision_value=\"another\", decision_name=\"m3\")\n", "NBNode(\"a1a\", parent=a1, decision_value=\"test\", decision_name=\"m2\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We can check if the previous tree was built correctly: " ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a (counter:0)\n", "├── a0 (counter:0)\n", "├── a1 (counter:0)\n", "│ └── a1a (counter:0)\n", "└── a2 (counter:0)\n" ] } ], "source": [ "simple_tree.pretty_print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can show additional information about each node of the tree:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a (counter:0, decision_name:None, decision_value:None)\n", "├── a0 (counter:0, decision_name:m1, decision_value:-1)\n", "├── a1 (counter:0, decision_name:m1, decision_value:1)\n", "│ └── a1a (counter:0, decision_name:m2, decision_value:test)\n", "└── a2 (counter:0, decision_name:m3, decision_value:another)\n" ] } ], "source": [ "simple_tree.pretty_print(\"__long__\")" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Alternatively, use the prepared tree that ships with the package:\n", "import nbnode.nbnode_trees as nbtree\n", "simple_tree = nbtree.tree_simple()\n", "simple_tree.pretty_print(\"__long__\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we use the tree to predict the final node of a new data point.\n", "The data point is supplied as two lists, ``values`` and ``names``, where each entry of ``names`` labels the value at the same position." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NBNode('/a/a1/a1a', counter=0, decision_name='m2', decision_value='test')\n" ] } ], "source": [ "single_prediction = simple_tree.predict(\n", "    values=[1, \"test\", 2], names=[\"m1\", \"m2\", \"m3\"]\n", ")\n", "print(single_prediction)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This returns the ``NBNode`` object identified by the values. 
\n", "``NBNode`` can additionally handle the following data types: " ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Dictionary\n", "{'m1': 1, 'm2': 'test', 'm3': 2}\n", "Prediction: \n", "NBNode('/a/a1/a1a', counter=0, decision_name='m2', decision_value='test')\n" ] } ], "source": [ "print(\"\\nDictionary\")\n", "value_dict = {\"m1\": 1, \"m2\": \"test\", \"m3\": 2}\n", "print(value_dict)\n", "pred_dict = simple_tree.predict(values=value_dict)\n", "print(\"Prediction: \")\n", "print(pred_dict)\n" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Pandas DataFrame\n", " m1 m2 m3\n", "0 1 test 2\n", "\n", "Prediction: \n", "0 (((NBNode('/a/a1/a1a', counter=0, decision_nam...\n", "dtype: object\n" ] } ], "source": [ "print(\"\\nPandas DataFrame\")\n", "import pandas as pd\n", "value_df = pd.DataFrame.from_dict([value_dict])\n", "print(value_df)\n", "print(\"\\nPrediction: \")\n", "pred_df = simple_tree.predict(values=value_df)\n", "print(pred_df)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Numpy array: Only for numerical values\n", "[[-1 0 0]]\n", "0 (((NBNode('/a/a0', counter=0, decision_name='m...\n", "dtype: object\n" ] } ], "source": [ "print(\"\\nNumpy array: Only for numerical values\")\n", "import numpy as np\n", "values_np = np.array([[-1, 0, 0]])\n", "print(values_np)\n", "pred_np = simple_tree.predict(values=values_np, names=[\"m1\", \"m2\", \"m3\"])\n", "print(pred_np)\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# NBNode basic methods\n", "\n", "``NBNode`` has a large number of implemented basic methods: " ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"a (counter:0, decision_name:None, decision_value:None)\n", "├── a0 (counter:0, decision_name:m1, decision_value:-1)\n", "├── a1 (counter:0, decision_name:m1, decision_value:1)\n", "│ └── a1a (counter:0, decision_name:m2, decision_value:test)\n", "└── a2 (counter:0, decision_name:m3, decision_value:another)\n", "a (counter:0)\n", "├── a0 (counter:0)\n", "├── a1 (counter:0)\n", "│ └── a1a (counter:0)\n", "└── a2 (counter:0)\n", "a (decision_name:None, decision_value:None)\n", "├── a0 (decision_name:m1, decision_value:-1)\n", "├── a1 (decision_name:m1, decision_value:1)\n", "│ └── a1a (decision_name:m2, decision_value:test)\n", "└── a2 (decision_name:m3, decision_value:another)\n", "NBNode('/a/a1', counter=0, decision_name='m1', decision_value=1)\n", "/a/a1\n", "NBNode('/a/a1', counter=0, decision_name='m1', decision_value=1)\n" ] } ], "source": [ "from nbnode.nbnode import NBNode\n", "import nbnode.nbnode_trees as nbtree\n", "simple_tree = nbtree.tree_simple()\n", "\n", "# Print the tree\n", "simple_tree.pretty_print(\"__long__\")\n", "# Print specific attributes of the tree as list\n", "simple_tree.pretty_print([\"counter\"])\n", "simple_tree.pretty_print([\"decision_name\", \"decision_value\"])\n", "simple_tree.__dict__\n", "\n", "\n", "# Access nodes\n", "# Access a child of any (here root) node\n", "simple_tree.children\n", "a1 = simple_tree.children[1]\n", "print(a1)\n", "\n", "# You can also access nodes by their _full_ name\n", "# full name is the path from root to the node, not the decision name, nor the node name\n", "# You can retrieve the full name of a node by\n", "print(a1.get_name_full())\n", "# Mind the \"/\" (\"root\") at the beginning of the path\n", "a1_by_name = simple_tree[\"/a/a1\"]\n", "print(a1_by_name)\n", "\n", "# We can compare nodes! Here we have the exact same node, so it is identical. 
\n", "assert a1_by_name == a1\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Decision cutoffs \n", "\n", "``NBNode`` can also be used to split and then decide on continuous features. " ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a (counter:0, decision_name:None, decision_value:None)\n", "├── a0 (counter:0, decision_name:m1, decision_value:1)\n", "└── a1 (counter:0, decision_name:m1, decision_value:-1)\n" ] } ], "source": [ "continuous_tree = NBNode(\"a\")\n", "NBNode(\"a0\", parent=continuous_tree, decision_value=1, decision_name=\"m1\", decision_cutoff=0.5)\n", "NBNode(\"a1\", parent=continuous_tree, decision_value=-1, decision_name=\"m1\", decision_cutoff=0.5)\n", "continuous_tree.pretty_print(\"__long__\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The above ``continuous_tree`` contains two nodes, which both decide on the value of ``m1`` with either 1 or -1. Additionally, they have a decision cutoff. \n", "Until now, ``NBNode`` needed an **exact** match of the decision value. 
With ``decision_cutoff``, the value of the feature named by ``decision_name`` is first binarized at the cutoff, and the result is then matched against ``decision_value``. For a cutoff of 0.5, the outputs below are consistent with the following mapping: \n", "\n", "```python\n", "  1 if value >= 0.5\n", " -1 if value < 0.5\n", "```" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NBNode('/a/a0', counter=0, decision_name='m1', decision_value=1)\n", "NBNode('/a/a1', counter=0, decision_name='m1', decision_value=-1)\n", "NBNode('/a/a0', counter=0, decision_name='m1', decision_value=1)\n", "NBNode('/a/a1', counter=0, decision_name='m1', decision_value=-1)\n" ] } ], "source": [ "# Values just above and just below the cutoff of 0.5\n", "print(continuous_tree.predict(values=[0.6], names=[\"m1\"]))\n", "print(continuous_tree.predict(values=[0.4], names=[\"m1\"]))\n", "\n", "# Values equal to the decision values themselves\n", "print(continuous_tree.predict(values=[1], names=[\"m1\"]))\n", "print(continuous_tree.predict(values=[-1], names=[\"m1\"]))\n", "\n", "# Values far away from the cutoff\n", "print(continuous_tree.predict(values=[10], names=[\"m1\"]))\n", "print(continuous_tree.predict(values=[-10], names=[\"m1\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Multiple decision values\n", "\n", "Some nodes need more than one value to decide on the end node. With ``NBNode``, a node can decide on any number of features. " ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a (counter:0, decision_name:None, decision_value:None)\n", "├── a0 (counter:0, decision_name:m1, decision_value:-1)\n", "├── a1 (counter:0, decision_name:m1, decision_value:1)\n", "│ └── a1a (counter:0, decision_name:m2, decision_value:test)\n", "├── a2 (counter:0, decision_name:m3, decision_value:another)\n", "└── a3 (counter:0, decision_name:['m2', 'm4'], decision_value:['test', 1])\n", "Predictions\n", "\n", "\n", "NBNode('/a/a3', counter=0, decision_name=['m2', 'm4'], decision_value=['test', 1])\n", "ValueError: Could not find a fitting endnode for the data you gave. 
You also did not allow for part predictions.\n" ] } ], "source": [ "from nbnode.nbnode import NBNode\n", "\n", "mytree = NBNode(\"a\")\n", "# a0 =\n", "NBNode(\"a0\", parent=mytree, decision_value=-1, decision_name=\"m1\")\n", "a1 = NBNode(\"a1\", parent=mytree, decision_value=1, decision_name=\"m1\")\n", "# a2 =\n", "NBNode(\"a2\", parent=mytree, decision_value=\"another\", decision_name=\"m3\")\n", "# a1a =\n", "NBNode(\"a1a\", parent=a1, decision_value=\"test\", decision_name=\"m2\")\n", "NBNode(\n", " \"a3\",\n", " parent=mytree,\n", " decision_value=[\"test\", 1],\n", " decision_name=[\"m2\", \"m4\"],\n", " decision_cutoff=[None, 0],\n", ")\n", "\n", "mytree.pretty_print(\"__long__\")\n", "\n", "print(\"\\n\\nPredictions\")\n", "print(mytree.predict(values=[None, \"test\", None, 3], names=[\"m1\", \"m2\", \"m3\", \"m4\"]))\n", "try: \n", " print(mytree.predict(\n", " values=[None, \"NOT_test\", None, 3], names=[\"m1\", \"m2\", \"m3\", \"m4\"]\n", " ))\n", "except ValueError:\n", " print(\"ValueError: Could not find a fitting endnode for the data you gave. 
You also did not allow for part predictions.\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.16" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }