The other day I was made aware (thanks to Dinesh Dutt) of a small tip for iterating over Pandas DataFrames.

If you've used Python and tools like Batfish (course here) or Suzieq (course here) to automate your network, then you may already be familiar with DataFrames. For those of you who are new to them, a Pandas DataFrame is a 2D Python data structure that lets you work with your data via rows and columns, which eases the pain of handling large amounts of data in Python. You can think of it much like a spreadsheet, but 100% Python-based.
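As a minimal sketch, here is a tiny DataFrame built from made-up interface data. The hostnames, interface names and columns are hypothetical, loosely modelled on the kind of table Batfish returns:

```python
import pandas as pd

# Hypothetical interface data: each dict is a row, each key becomes a column.
df = pd.DataFrame(
    [
        {"Hostname": "leaf1", "Interface": "Ethernet1", "MTU": 1500, "Active": True},
        {"Hostname": "leaf1", "Interface": "Ethernet2", "MTU": 9214, "Active": False},
        {"Hostname": "spine1", "Interface": "Ethernet1", "MTU": 9214, "Active": True},
    ]
)

# Rows and columns, much like a spreadsheet.
print(df.shape)          # (3, 4): 3 rows, 4 columns
print(df["MTU"].max())   # 9214
print(df[df["Active"]])  # only the rows where Active is True
```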

Great! So what was the tip?

To speed up your DataFrame iterations, use itertuples instead of iterrows.
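To see the difference in what each method hands back, here is a small sketch on a hypothetical two-row DataFrame. iterrows() yields (index, Series) pairs, while itertuples() yields one namedtuple per row, with the index as its first field:

```python
import pandas as pd

df = pd.DataFrame({"Interface": ["Ethernet1", "Ethernet2"], "MTU": [1500, 9214]})

# iterrows(): yields (index, row) pairs, where each row is a new pandas Series.
for index, row in df.iterrows():
    print(index, type(row).__name__, row["Interface"], row["MTU"])

# itertuples(): yields one namedtuple per row (class name "Pandas" by default),
# with the index available as the first field, `Index`.
for row in df.itertuples():
    print(row.Index, type(row).__name__, row.Interface, row.MTU)
```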

Is it really that much faster? This called for a quick benchmark test.

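# Some imports
from timeit import timeit
from rich import print

# Collect all the interfaces from the network with Batfish.
# This assumes a Batfish session with a snapshot already loaded and the
# pybatfish `bfq` question object in scope.
df = bfq.interfaceProperties().answer().frame()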

# Define the 2 iteration methods
def loop_via_itertuples():
    for _ in df.itertuples():
        continue

def loop_via_iterrows():
    for _, _ in df.iterrows():
        continue

With this in place, we can quickly benchmark the 2 iteration types by using timeit to run each function 10 times and record the total time each one takes. Like so:

time_iterrows = timeit(loop_via_iterrows, number=10)
time_itertuples = timeit(loop_via_itertuples, number=10)

Finally, we print the results...

# Hypothetical helper (not shown in the original): how many times faster
# `fast` is than `slow`, rounded to a whole number.
def time_increase(slow, fast):
    return round(slow / fast)

print(
    "===\n"
    f"Execution Time for df.iterrows = {time_iterrows}\n"
    f"Execution Time for df.itertuples = {time_itertuples}\n"
    f"Result: itertuples is {time_increase(time_iterrows, time_itertuples)} times faster than iterrows!\n"
    "===\n"
)

===
Execution Time for df.iterrows = 0.9970452000006844
Execution Time for df.itertuples = 0.07641729999977542
Result: itertuples is 13 times faster than iterrows!
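If you don't have a Batfish lab to hand, the same benchmark can be reproduced against a synthetic DataFrame. This is a sketch that assumes made-up columns and a row count of 2,000 in place of the real Batfish interface table, and reuses a hypothetical time_increase helper. Exact timings and the speed-up ratio will vary with your hardware, pandas version and the shape of the data:

```python
from timeit import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the Batfish interface table (hypothetical columns).
ROWS = 2_000
df = pd.DataFrame(
    {
        "Hostname": [f"leaf{i % 50}" for i in range(ROWS)],
        "Interface": [f"Ethernet{i % 48}" for i in range(ROWS)],
        "MTU": np.where(np.arange(ROWS) % 2 == 0, 1500, 9214),
        "Active": np.arange(ROWS) % 3 != 0,
    }
)

def loop_via_itertuples():
    for _ in df.itertuples():
        continue

def loop_via_iterrows():
    for _, _ in df.iterrows():
        continue

def time_increase(slow, fast):
    # Hypothetical helper: ratio of the two timings, rounded to a whole number.
    return round(slow / fast)

# timeit returns the total time (in seconds) for all `number` runs.
time_iterrows = timeit(loop_via_iterrows, number=10)
time_itertuples = timeit(loop_via_itertuples, number=10)

print(f"iterrows:   {time_iterrows:.4f}s")
print(f"itertuples: {time_itertuples:.4f}s")
print(f"itertuples is ~{time_increase(time_iterrows, time_itertuples)}x faster")
```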

13 times faster! So yes, it's certainly faster. As for why, the TL;DR is (full details here):

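iterrows() is slower than itertuples() because iterrows() performs a lot of type checks over the lifetime of its call. Concretely, iterrows() builds a brand-new pandas Series for every row, whereas itertuples() returns lightweight namedtuples.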
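The pandas documentation also notes a side effect of that per-row Series: because a Series holds a single dtype, iterrows() does not preserve dtypes across a row. A minimal sketch with two hypothetical numeric columns shows the int being upcast to float, while itertuples() leaves it alone:

```python
import pandas as pd

# Numeric columns of different types (hypothetical interface values).
df = pd.DataFrame({"MTU": [1500], "Utilisation": [0.25]})

# iterrows(): every row is materialised as a new Series. A Series has a single
# dtype, so pandas upcasts the int column to float to fit both values.
_, row = next(df.iterrows())
print(type(row).__name__, row.dtype)  # Series float64
print(row["MTU"])                     # 1500.0

# itertuples(): every row is a namedtuple built from the columns,
# so no Series is created and the MTU stays an integer.
tup = next(df.itertuples())
print(type(tup).__name__, tup.MTU)    # Pandas 1500
```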

So there we have it: when iterating over Pandas DataFrames, reach for itertuples rather than iterrows.
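One thing to watch when switching: itertuples() builds its namedtuple fields from the column names, and names that aren't valid Python identifiers (for example, ones containing spaces) are renamed to positional names. A short sketch with hypothetical column names, plus the index and name parameters for when you want plain tuples instead:

```python
import pandas as pd

# Hypothetical columns: "Speed" is a valid identifier, "Admin Up" is not.
df = pd.DataFrame({"Speed": [1000], "Admin Up": [True]})

# Invalid identifiers are renamed to positional names (_1, _2, ...),
# where position 0 is the Index field.
row = next(df.itertuples())
print(row._fields)          # ('Index', 'Speed', '_2')
print(row.Speed, row._2)    # 1000 True

# index=False drops the Index field; name=None yields plain tuples instead.
print(next(df.itertuples(index=False, name=None)))  # (1000, True)
```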
