Python - Data Classes

Eliminate boilerplate code with Python data classes.

Data classes, introduced in Python 3.7, are classes used mainly (though not exclusively) to store data. They eliminate much of the boilerplate code that is typically required when writing Python classes.

Decorator

A data class is created by applying the @dataclass decorator to a class, like so:

from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    speed: int
    mtu: int

Special Methods

As mentioned above, data classes reduce the amount of boilerplate code required when writing classes. By default, a data class generates the special methods __init__, __repr__ and __eq__ for you. It does not generate __str__; print() and str() simply fall back to the generated __repr__.

For example, if we create instances of the Interface data class, we can inspect each instance's attributes, compare instances, and see a readable representation of an instance, all out of the box.

>>> interface1 = Interface(name='Ethernet1/1',speed=1000,mtu=1500)
>>> interface2 = Interface(name='Ethernet1/2',speed=1000,mtu=1500)

>>> interface1.name
'Ethernet1/1'

>>> print(interface1)
Interface(name='Ethernet1/1', speed=1000, mtu=1500)

>>> interface1
Interface(name='Ethernet1/1', speed=1000, mtu=1500)

>>> interface1 == interface2
False
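
The comparison above returns False because the two names differ. As a minimal sketch (redefining the Interface class from above so the snippet is self-contained), two separate instances with identical field values compare equal, since the generated __eq__ compares field values rather than object identity:

```python
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    speed: int
    mtu: int

a = Interface(name='Ethernet1/1', speed=1000, mtu=1500)
b = Interface(name='Ethernet1/1', speed=1000, mtu=1500)

# Two distinct objects with identical field values compare equal,
# because the generated __eq__ compares the fields, not identity.
same_values = (a == b)
same_object = (a is b)

# str() (and therefore print()) falls back to the generated __repr__.
as_text = str(a)
```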

Normally we would have had to write the following special methods ourselves to achieve the same functionality.

class Interface:
    def __init__(self, name: str, speed: int, mtu: int) -> None:
        self.name = name
        self.speed = speed
        self.mtu = mtu

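    def __repr__(self) -> str:
        # Also used by print() and str(), since __str__ is not defined.
        return (f"Interface(name={self.name!r}, "
                f"speed={self.speed!r}, mtu={self.mtu!r})")

    def __eq__(self, other):
        # Only instances of the same class are compared, field by field.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return ((self.name, self.speed, self.mtu)
                == (other.name, other.speed, other.mtu))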

Default Values

Much like default values for __init__ parameters, data classes allow default values to be assigned to fields, like so:

from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500

>>> Interface(name='Ethernet1/1')
Interface(name='Ethernet1/1', speed=1000, mtu=1500)
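
One consequence worth knowing: because the fields become __init__ parameters in declaration order, a field without a default cannot follow a field with one. The sketch below uses a hypothetical BadInterface class, defined only for illustration, to show that the decorator raises a TypeError when the class is created:

```python
from dataclasses import dataclass

try:
    @dataclass
    class BadInterface:  # hypothetical class, for illustration only
        name: str = 'Ethernet1/1'
        speed: int  # no default, but it follows a field that has one
except TypeError as exc:
    # Raised at class definition time, before any instance exists.
    error = exc
else:
    error = None
```

Ordering the fields so that those without defaults come first, as in the Interface example above, avoids the error.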

Inheritance

Much like regular classes, data classes support inheritance via subclassing, as the example below shows.

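from dataclasses import dataclass
from typing import Optional

@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500

@dataclass
class SVI(Interface):
    # Optional[int] because the default is None rather than an int.
    vlan: Optional[int] = None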

>>> SVI(name='Vlan100',vlan=100)
SVI(name='Vlan100', speed=1000, mtu=1500, vlan=100)
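
As a small sketch of how the fields are combined, the snippet below lists the field names of the subclass using fields() from the dataclasses module. Base class fields come first, followed by those declared on the subclass, which is why vlan needs a default here: it follows base fields that already have defaults.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500

@dataclass
class SVI(Interface):
    vlan: Optional[int] = None

# Base class fields first, then the fields added by the subclass.
field_names = [f.name for f in fields(SVI)]

svi = SVI(name='Vlan100', vlan=100)
# A subclass instance is still an instance of the base data class.
is_interface = isinstance(svi, Interface)
```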

Dictionary Conversion

The dataclasses module also provides a function, asdict(), that converts a data class instance to a dict. Below shows an example:

from dataclasses import asdict
svi = SVI(name='Vlan100',vlan=100)

>>> asdict(svi)
{'name': 'Vlan100', 'speed': 1000, 'mtu': 1500, 'vlan': 100}
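
asdict() also recurses into nested data classes and lists, which makes it convenient for serialization. The following is a minimal sketch; the Device container class is an assumption made for this example only:

```python
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500

@dataclass
class Device:  # hypothetical container, for illustration only
    name: str
    interfaces: List[Interface]

device = Device(name='rtr001', interfaces=[Interface(name='Ethernet1/1')])

# Nested data class instances inside the list are converted to dicts too.
as_dict = asdict(device)

# The result contains only plain Python types, so it can be dumped to JSON.
as_json = json.dumps(as_dict)
```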

Type Hints

Python is a dynamically typed language, meaning that you do not have to declare a variable's type when assigning a value to it. However, this can present issues. For example, when your program receives data of a type that wasn't accounted for, the results can be unexpected.

Python provides a feature called type hinting, which allows you to provide (yep, you guessed it) a hint of what the type should be. Python itself does not enforce these hints at runtime; to check for type errors, a type checker such as Mypy is required.

Let's look at a quick example. First we create a data class and instantiate an instance with a type error.

$ cat interface_dataclass.py                                                
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500

Interface(name='Ethernet1/1', speed="1000GB", mtu=1500)

We can then run the Mypy type checker against our file, which alerts us to the type error.

$ mypy interface_dataclass.py                                                     
interface_dataclass.py:9: error: Argument "speed" to "Interface" has incompatible type "str"; expected "int"
Found 1 error in 1 file (checked 1 source file)
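
It is worth stressing that the error above comes from Mypy, not from Python. As a quick sketch using the same class, the mistyped value is accepted at runtime without complaint:

```python
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500

# No exception is raised here: type hints are not checked at runtime.
interface = Interface(name='Ethernet1/1', speed="1000GB", mtu=1500)

# The field simply holds whatever was passed in, in this case a str.
speed_type = type(interface.speed).__name__
```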

Customizing Fields

The core type in dataclasses is the Field type. By default, declaring an annotated class attribute creates a Field on your class, as shown in the previous examples.[1]

To customize the behaviour of a data class field, the field() function accepts a number of parameters. Within the scope of this article we will cover two of them: metadata and default_factory. Full details on all of the available parameters can be found at https://docs.python.org/3/library/dataclasses.html#dataclasses.field.

Metadata

To attach additional information to a field we can use the metadata parameter, like so:

from dataclasses import dataclass, field

@dataclass
class Interface:
    name: str
    speed: int = field(default=1000, metadata={'unit': 'megabits'})
    mtu: int = field(default=1500, metadata={'unit': 'bytes'})

To retrieve our metadata, along with other field information, we use the fields() function.

>>> from dataclasses import fields
>>> interface = Interface(name='Ethernet1/1', speed=1000, mtu=1500)
>>> fields(interface)[2].metadata['unit']
'bytes'
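
Rather than indexing into fields() by position, the fields can be looped over by name. The sketch below builds a mapping of field name to unit; fields without metadata have an empty mapping, so .get() returns None for them:

```python
from dataclasses import dataclass, field, fields

@dataclass
class Interface:
    name: str
    speed: int = field(default=1000, metadata={'unit': 'megabits'})
    mtu: int = field(default=1500, metadata={'unit': 'bytes'})

interface = Interface(name='Ethernet1/1')

# Each Field object exposes its name and a read-only metadata mapping.
units = {f.name: f.metadata.get('unit') for f in fields(interface)}
```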

Default Factory

The default_factory parameter allows you to provide a zero-argument callable that will be called when a default value is needed for this field.

Let's look at an example. Here we will use a default_factory to build a list of Interface objects whenever we instantiate the Device data class.

First we define our Interface data class, much like we have done in previous examples.

from dataclasses import dataclass, field

@dataclass
class Interface:
    name: str
    speed: int = field(default=1000)
    mtu: int = field(default=1500)

Next we will create our zero-argument callable (aka default_factory) which will build our Interface objects via a list comprehension.

def build_interfaces():
    return [Interface(f'Ethernet1/{port}') for port in range(0, 23)]

This default factory is then referenced within our interfaces field.

from typing import List

@dataclass
class Device:
    name: str
    vendor: str
    model: str
    interfaces: List[Interface] = field(default_factory=build_interfaces)

Now when we create an instance of Device, our default_factory is called and the Interface objects are created accordingly.

>>> Device(name='rtr001',vendor='Cisco',model='Nexus9372')
Device(name='rtr001', vendor='Cisco', model='Nexus9372', interfaces=[Interface(name='Ethernet1/0', speed=1000, mtu=1500), Interface(name='Ethernet1/1', speed=1000, mtu=1500), Interface(name='Ethernet1/2', speed=1000, mtu=1500), Interface(name='Ethernet1/3', speed=1000, mtu=1500), Interface(name='Ethernet1/4', speed=1000, mtu=1500),...
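
A default_factory is also the way to give a field a mutable default such as a list. The sketch below, using simplified hypothetical classes, shows that a plain list default is rejected with a ValueError, while default_factory=list gives every instance its own fresh list:

```python
from dataclasses import dataclass, field
from typing import List

try:
    @dataclass
    class BadDevice:  # hypothetical class, for illustration only
        name: str
        interfaces: List[str] = []  # mutable default is not allowed
except ValueError as exc:
    error = exc
else:
    error = None

@dataclass
class Device:  # simplified version of the Device class above
    name: str
    interfaces: List[str] = field(default_factory=list)

rtr1 = Device(name='rtr001')
rtr2 = Device(name='rtr002')

# Appending to one instance's list does not affect the other,
# because the factory is called once per instance.
rtr1.interfaces.append('Ethernet1/1')
```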

Post-Init

Finally, we can add functionality to a data class that runs after the auto-generated __init__, via the __post_init__ special method, like so:

@dataclass
class Device:
    name: str
    vendor: str
    model: str

    def __post_init__(self):
        print("Device added to CMDB")
 
>>> device = Device(name='sw001', vendor='Cisco',model='Nexus9372')
Device added to CMDB
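
A common use of __post_init__ is validating field values once they have been assigned. The following is a minimal sketch; the accepted MTU range of 68 to 9216 is an assumption chosen for this example, not a rule taken from the article:

```python
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    speed: int = 1000
    mtu: int = 1500

    def __post_init__(self):
        # Runs after the generated __init__ has set every field.
        # The 68-9216 range is an assumed policy for this sketch.
        if not 68 <= self.mtu <= 9216:
            raise ValueError(f"mtu {self.mtu} outside 68-9216")

try:
    Interface(name='Ethernet1/1', mtu=20000)
except ValueError as exc:
    error = str(exc)
else:
    error = None
```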

References


  1. "A brief tour of Python 3.7 data classes | Hacker Noon." 21 Jan. 2018, https://hackernoon.com/a-brief-tour-of-python-3-7-data-classes-22ee5e046517. Accessed 4 Oct. 2020.
