Base non-binary node functionality
Optional: create and activate a conda environment, then install the package:
conda create -y -n conda_nbnode python=3.8
conda activate conda_nbnode
git clone https://github.com/ggrlab/nbnode
cd nbnode
pip install --upgrade pip
pip install .
The base functionality of the package is to build non-binary trees, in which a node may have any number of children. The tree below has a root node a with three children a0, a1 and a2. a1 is the only child that has a child of its own, a1a.
a
├── a0
├── a1
│ └── a1a
└── a2
A basic non-binary node (NBNode) consists of four important attributes:
- ``name`` The name of the node. This is the only mandatory attribute.
- ``parent`` The parent node of this node.
- ``decision_name`` The name of the value leading to this node.
- ``decision_value`` The value leading to this node.
The name of a node only needs to be unique among the children of its parent node. decision_name and decision_value together describe the named value that leads to this node. Note that decision_name must be a string, whereas decision_value can be anything: a string, an integer, a float, etc.
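To make the four attributes concrete, here is a minimal sketch in plain Python. `SketchNode` and `full_path` are hypothetical names invented for this illustration; this is not how NBNode itself is implemented, only a picture of the data each node carries and of the rule that names must be unique among siblings.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a node; NOT the real NBNode class."""

    name: str  # the only mandatory attribute
    parent: Optional["SketchNode"] = None
    decision_name: Optional[str] = None
    decision_value: Any = None
    children: List["SketchNode"] = field(default_factory=list, repr=False)

    def __post_init__(self) -> None:
        if self.parent is not None:
            # Names only have to be unique among the children of one parent.
            if any(sib.name == self.name for sib in self.parent.children):
                raise ValueError(f"duplicate child name: {self.name!r}")
            self.parent.children.append(self)

    def full_path(self) -> str:
        """Path from the root down to this node, e.g. '/a/a1/a1a'."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


root = SketchNode("a")
a1 = SketchNode("a1", parent=root, decision_name="m1", decision_value=1)
a1a = SketchNode("a1a", parent=a1, decision_name="m2", decision_value="test")
print(a1a.full_path())  # /a/a1/a1a
```

Because uniqueness is only checked among siblings, the same name could be reused under a different parent in this sketch.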
To build the tree above, we can use the following code:
[45]:
from nbnode.nbnode import NBNode
simple_tree = NBNode("a")
NBNode("a0", parent=simple_tree, decision_value=-1, decision_name="m1")
a1 = NBNode("a1", parent=simple_tree, decision_value=1, decision_name="m1")
NBNode("a2", parent=simple_tree, decision_value="another", decision_name="m3")
NBNode("a1a", parent=a1, decision_value="test", decision_name="m2")
[45]:
NBNode('/a/a1/a1a', counter=0, decision_name='m2', decision_value='test')
We can check if the previous tree was built correctly:
[46]:
simple_tree.pretty_print()
a (counter:0)
├── a0 (counter:0)
├── a1 (counter:0)
│ └── a1a (counter:0)
└── a2 (counter:0)
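For readers curious how such a box-drawing layout can be produced, the following is a small recursive sketch. It uses nested dictionaries as a hypothetical stand-in for the tree and a made-up helper `render_children`; it is not the code behind `pretty_print`.

```python
def render_children(children: dict, prefix: str = "") -> list:
    """Return one text line per descendant, using box-drawing connectors."""
    lines = []
    names = list(children)
    for index, name in enumerate(names):
        is_last = index == len(names) - 1
        lines.append(prefix + ("└── " if is_last else "├── ") + name)
        # Below a last child nothing continues, so pad with spaces;
        # otherwise keep the vertical bar going for the remaining siblings.
        child_prefix = prefix + ("    " if is_last else "│   ")
        lines.extend(render_children(children[name], child_prefix))
    return lines


# Hypothetical layout: each key is a node name, each value holds its children.
tree = {"a": {"a0": {}, "a1": {"a1a": {}}, "a2": {}}}
lines = ["a"] + render_children(tree["a"])
print("\n".join(lines))
```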
And we can show additional information about each node of the tree:
[53]:
simple_tree.pretty_print("__long__")
a (counter:0, decision_name:None, decision_value:None)
├── a0 (counter:0, decision_name:m1, decision_value:-1)
├── a1 (counter:0, decision_name:m1, decision_value:1)
│ └── a1a (counter:0, decision_name:m2, decision_value:test)
└── a2 (counter:0, decision_name:m3, decision_value:another)
[ ]:
# Alternatively, we prepared the tree already for you:
import nbnode.nbnode_trees as nbtree
simple_tree = nbtree.tree_simple()
simple_tree.pretty_print("__long__")
Finally, we use the tree to predict the final node of a new data point. The data point is supplied as two lists of equal length: values holds the values and names holds the corresponding feature names.
[48]:
single_prediction = simple_tree.predict(
    values=[1, "test", 2], names=["m1", "m2", "m3"]
)
print(single_prediction)
NBNode('/a/a1/a1a', counter=0, decision_name='m2', decision_value='test')
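The idea behind this prediction can be sketched in plain Python: start at the root and repeatedly follow the child whose decision_name/decision_value pair matches the data point, until a node without children is reached. `SketchNode` and `sketch_predict` are hypothetical names, and treating a missing or ambiguous match as an error is an assumption of this sketch rather than a statement about NBNode internals.

```python
class SketchNode:
    """Hypothetical stand-in for a tree node; NOT the real NBNode."""

    def __init__(self, name, parent=None, decision_name=None, decision_value=None):
        self.name = name
        self.parent = parent
        self.decision_name = decision_name
        self.decision_value = decision_value
        self.children = []
        if parent is not None:
            parent.children.append(self)


def sketch_predict(root, values, names):
    """Walk down from root, following the single child whose decision matches."""
    data = dict(zip(names, values))
    node = root
    while node.children:
        matches = [
            child
            for child in node.children
            if data.get(child.decision_name) == child.decision_value
        ]
        if len(matches) != 1:
            # Assumption: no match, or more than one match, is an error.
            raise ValueError(f"no unique matching child below {node.name!r}")
        node = matches[0]
    return node


# Same shape as the simple tree above.
root = SketchNode("a")
SketchNode("a0", parent=root, decision_name="m1", decision_value=-1)
a1 = SketchNode("a1", parent=root, decision_name="m1", decision_value=1)
SketchNode("a2", parent=root, decision_name="m3", decision_value="another")
SketchNode("a1a", parent=a1, decision_name="m2", decision_value="test")

prediction = sketch_predict(root, values=[1, "test", 2], names=["m1", "m2", "m3"])
print(prediction.name)  # a1a
```

Here m1=1 selects a1, m2="test" then selects a1a, and m3=2 matches no node and is simply ignored.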
This returns the NBNode object identified by the values. Besides a pair of lists, predict also accepts the following input types:
[49]:
print("\nDictionary")
value_dict = {"m1": 1, "m2": "test", "m3": 2}
print(value_dict)
pred_dict = simple_tree.predict(values=value_dict)
print("Prediction: ")
print(pred_dict)
Dictionary
{'m1': 1, 'm2': 'test', 'm3': 2}
Prediction:
NBNode('/a/a1/a1a', counter=0, decision_name='m2', decision_value='test')
[50]:
print("\nPandas DataFrame")
import pandas as pd
value_df = pd.DataFrame.from_dict([value_dict])
print(value_df)
print("\nPrediction: ")
pred_df = simple_tree.predict(values=value_df)
print(pred_df)
Pandas DataFrame
m1 m2 m3
0 1 test 2
Prediction:
0 (((NBNode('/a/a1/a1a', counter=0, decision_nam...
dtype: object
[51]:
print("\nNumpy array: Only for numerical values")
import numpy as np
values_np = np.array([[-1, 0, 0]])
print(values_np)
pred_np = simple_tree.predict(values=values_np, names=["m1", "m2", "m3"])
print(pred_np)
Numpy array: Only for numerical values
[[-1 0 0]]
0 (((NBNode('/a/a0', counter=0, decision_name='m...
dtype: object
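All of the input forms above carry the same information: one value per named feature, for one or more data points. The plain-Python sketch below shows that correspondence; it only illustrates the data layout and makes no claim about how NBNode converts its inputs internally.

```python
names = ["m1", "m2", "m3"]

# One data point given as two parallel lists is equivalent to a dictionary.
values = [1, "test", 2]
value_dict = dict(zip(names, values))
print(value_dict)  # {'m1': 1, 'm2': 'test', 'm3': 2}

# A table (DataFrame or 2D array) is one such data point per row,
# which is why tabular input yields one prediction per row.
rows = [[1, "test", 2], [-1, 0, 0]]
records = [dict(zip(names, row)) for row in rows]
for record in records:
    print(record)
```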
NBNode basic methods
NBNode implements a number of basic methods for printing the tree and for accessing and comparing nodes:
[66]:
from nbnode.nbnode import NBNode
import nbnode.nbnode_trees as nbtree
simple_tree = nbtree.tree_simple()
# Print the tree
simple_tree.pretty_print("__long__")
# Print specific attributes of the tree as list
simple_tree.pretty_print(["counter"])
simple_tree.pretty_print(["decision_name", "decision_value"])
simple_tree.__dict__
# Access nodes
# Access a child of any (here root) node
simple_tree.children
a1 = simple_tree.children[1]
print(a1)
# You can also access nodes by their full name.
# The full name is the path from the root to the node;
# it is neither the decision name nor the node name.
# You can retrieve the full name of a node with
print(a1.get_name_full())
# Mind the "/" ("root") at the beginning of the path
a1_by_name = simple_tree["/a/a1"]
print(a1_by_name)
# We can compare nodes! Here we have the exact same node, so it is identical.
assert a1_by_name == a1
a (counter:0, decision_name:None, decision_value:None)
├── a0 (counter:0, decision_name:m1, decision_value:-1)
├── a1 (counter:0, decision_name:m1, decision_value:1)
│ └── a1a (counter:0, decision_name:m2, decision_value:test)
└── a2 (counter:0, decision_name:m3, decision_value:another)
a (counter:0)
├── a0 (counter:0)
├── a1 (counter:0)
│ └── a1a (counter:0)
└── a2 (counter:0)
a (decision_name:None, decision_value:None)
├── a0 (decision_name:m1, decision_value:-1)
├── a1 (decision_name:m1, decision_value:1)
│ └── a1a (decision_name:m2, decision_value:test)
└── a2 (decision_name:m3, decision_value:another)
NBNode('/a/a1', counter=0, decision_name='m1', decision_value=1)
/a/a1
NBNode('/a/a1', counter=0, decision_name='m1', decision_value=1)
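Access by full name amounts to walking the path one component at a time. The sketch below shows this with nested dictionaries; `resolve` and the dictionary layout are hypothetical and stand in for the real tree only for illustration.

```python
def resolve(tree: dict, full_name: str) -> dict:
    """Follow a '/'-separated path from the root and return that node's children."""
    if not full_name.startswith("/"):
        # Mirrors the note above: the path starts at the root with "/".
        raise KeyError(f"full name must start with '/': {full_name!r}")
    level = tree
    for part in full_name.strip("/").split("/"):
        level = level[part]  # raises KeyError if a component does not exist
    return level


# Hypothetical layout: each key is a node name, each value holds its children.
tree = {"a": {"a0": {}, "a1": {"a1a": {}}, "a2": {}}}
print(list(resolve(tree, "/a/a1")))  # ['a1a']
print(list(resolve(tree, "/a")))     # ['a0', 'a1', 'a2']
```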
Decision cutoffs
NBNode can also be used to split and then decide on continuous features.
[74]:
continuous_tree = NBNode("a")
NBNode("a0", parent=continuous_tree, decision_value=1, decision_name="m1", decision_cutoff=0.5)
NBNode("a1", parent=continuous_tree, decision_value=-1, decision_name="m1", decision_cutoff=0.5)
continuous_tree.pretty_print("__long__")
a (counter:0, decision_name:None, decision_value:None)
├── a0 (counter:0, decision_name:m1, decision_value:1)
└── a1 (counter:0, decision_name:m1, decision_value:-1)
The continuous_tree above contains two nodes that both decide on m1, with decision values 1 and -1. In addition, each node has a decision cutoff. Until now, NBNode required an exact match of the decision value. With decision_cutoff, the value of the feature named by decision_name is first compared against the cutoff, which yields:
True if >= 0.5
False if < 0.5
[75]:
print(continuous_tree.predict(values=[0.6], names=["m1"]))
print(continuous_tree.predict(values=[0.4], names=["m1"]))
print(continuous_tree.predict(values=[1], names=["m1"]))
print(continuous_tree.predict(values=[-1], names=["m1"]))
print(continuous_tree.predict(values=[10], names=["m1"]))
print(continuous_tree.predict(values=[-10], names=["m1"]))
NBNode('/a/a0', counter=0, decision_name='m1', decision_value=1)
NBNode('/a/a1', counter=0, decision_name='m1', decision_value=-1)
NBNode('/a/a0', counter=0, decision_name='m1', decision_value=1)
NBNode('/a/a1', counter=0, decision_name='m1', decision_value=-1)
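The cutoff behaviour seen in these predictions can be mimicked in a few lines of plain Python. `apply_cutoff` is a hypothetical helper; the mapping of "at or above the cutoff" to 1 and "below the cutoff" to -1 is an assumption inferred from the output above (0.6 and 1 end in the node with decision_value=1, 0.4 and -1 in the node with decision_value=-1), not a description of the library's code.

```python
def apply_cutoff(value, cutoff):
    """Return the value that is then matched against decision_value.

    Assumption (inferred from the example output): values at or above the
    cutoff match decision_value=1, values below it match decision_value=-1.
    """
    if cutoff is None:
        return value  # no cutoff: exact matching on the raw value
    return 1 if value >= cutoff else -1


for raw in [0.6, 0.4, 1, -1, 10, -10]:
    print(raw, "->", apply_cutoff(raw, cutoff=0.5))
```

Under this assumption the raw values 0.6, 1 and 10 all behave like 1, while 0.4, -1 and -10 all behave like -1.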
Multiple decision values
Some nodes need more than one value to decide on the end node. With NBNode, a single node can decide on any number of features by passing lists to decision_name, decision_value and decision_cutoff.
[85]:
from nbnode.nbnode import NBNode
mytree = NBNode("a")
NBNode("a0", parent=mytree, decision_value=-1, decision_name="m1")
a1 = NBNode("a1", parent=mytree, decision_value=1, decision_name="m1")
NBNode("a2", parent=mytree, decision_value="another", decision_name="m3")
NBNode("a1a", parent=a1, decision_value="test", decision_name="m2")
# a3 decides on two features at once: m2 is matched exactly against "test",
# and m4 is first cut at 0 before being matched against 1.
NBNode(
    "a3",
    parent=mytree,
    decision_value=["test", 1],
    decision_name=["m2", "m4"],
    decision_cutoff=[None, 0],
)
mytree.pretty_print("__long__")
print("\n\nPredictions")
print(mytree.predict(values=[None, "test", None, 3], names=["m1", "m2", "m3", "m4"]))
try:
    print(mytree.predict(
        values=[None, "NOT_test", None, 3], names=["m1", "m2", "m3", "m4"]
    ))
except ValueError:
    print("ValueError: Could not find a fitting endnode for the data you gave. You also did not allow for part predictions.")
a (counter:0, decision_name:None, decision_value:None)
├── a0 (counter:0, decision_name:m1, decision_value:-1)
├── a1 (counter:0, decision_name:m1, decision_value:1)
│ └── a1a (counter:0, decision_name:m2, decision_value:test)
├── a2 (counter:0, decision_name:m3, decision_value:another)
└── a3 (counter:0, decision_name:['m2', 'm4'], decision_value:['test', 1])
Predictions
NBNode('/a/a3', counter=0, decision_name=['m2', 'm4'], decision_value=['test', 1])
ValueError: Could not find a fitting endnode for the data you gave. You also did not allow for part predictions.
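The two predictions above suggest the matching rule for a node with several decision names: every listed feature has to match for the node to be selected. The following plain-Python sketch expresses that rule. `node_matches` is a hypothetical helper, and the cutoff mapping (1 at or above the cutoff, -1 below) is an assumption carried over from the single-feature example, not the library's implementation.

```python
def node_matches(data, decision_names, decision_values, decision_cutoffs):
    """Return True only if every listed feature matches its decision value."""
    for name, wanted, cutoff in zip(decision_names, decision_values, decision_cutoffs):
        observed = data[name]
        if cutoff is not None:
            # Assumption: a cutoff turns a continuous value into 1 or -1.
            observed = 1 if observed >= cutoff else -1
        if observed != wanted:
            return False
    return True


# The decision settings of node a3 from the tree above.
a3 = dict(
    decision_names=["m2", "m4"],
    decision_values=["test", 1],
    decision_cutoffs=[None, 0],
)

hit = {"m1": None, "m2": "test", "m3": None, "m4": 3}
miss = {"m1": None, "m2": "NOT_test", "m3": None, "m4": 3}
print(node_matches(hit, **a3))   # True: m2 matches and m4 >= 0
print(node_matches(miss, **a3))  # False: m2 does not match
```

In the second case no node of the tree fits the data point, which corresponds to the ValueError shown above.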