Optimizing MicroPython Code: Strategies for Enhanced Performance
Practical MicroPython optimization techniques: profiling with timed decorators, const(), local variable caching, .mpy bytecode, frozen modules, avoiding allocations, and using native/viper code emitters.
MicroPython runs on microcontrollers with as little as 256KB of flash and 16KB of RAM, often with slow flash storage. Desktop Python performance habits don't always apply, and small changes can make a significant difference. This guide covers practical optimizations with examples.
Step 1: Profile Before Optimizing
Don't guess where the bottleneck is. Use a timing decorator to measure execution time in microseconds:
import time

def timed(func):
    def wrapper(*args, **kwargs):
        start = time.ticks_us()
        result = func(*args, **kwargs)
        # ticks_diff handles wrap-around of the tick counter
        elapsed = time.ticks_diff(time.ticks_us(), start)
        print(f"{func.__name__}: {elapsed / 1000:.3f}ms")
        return result
    return wrapper

@timed
def slow_function():
    return sum(range(1000))

slow_function()
# Example output (timing varies by board): slow_function: 12.450ms
Optimize the slowest function first.
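A single timed call can be noisy, so it often helps to repeat the measurement. The following is a minimal sketch, not part of the original decorator: `bench` and `work` are hypothetical names, and the fallback branch is an assumption added only so the same file also runs under CPython on a host PC.

```python
import time

# Assumption: fall back to CPython equivalents when ticks_us is missing,
# so the sketch can also be tried on a host PC.
try:
    ticks_us = time.ticks_us
    ticks_diff = time.ticks_diff
except AttributeError:
    def ticks_us():
        return time.perf_counter_ns() // 1000

    def ticks_diff(end, start):
        return end - start


def bench(func, runs=10):
    """Call func() `runs` times; return (best, average) time in microseconds."""
    best = None
    total = 0
    for _ in range(runs):
        start = ticks_us()
        func()
        elapsed = ticks_diff(ticks_us(), start)
        total += elapsed
        if best is None or elapsed < best:
            best = elapsed
    return best, total // runs


def work():
    # Hypothetical workload, same as slow_function above
    return sum(range(1000))


best_us, avg_us = bench(work)
print("work: best", best_us, "us, average", avg_us, "us")
```

The best time is usually the most stable figure to compare before and after a change, since the average can be skewed by an occasional garbage collection pause.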
Use const() for Integer Constants
const() lets the compiler replace an identifier with a literal integer wherever it is used in the same module, eliminating a dictionary lookup at runtime. It only works with integer values and constant integer expressions.
from micropython import const
# Without const: each access looks up 'MAX_RETRIES' in a dict
MAX_RETRIES = 5
# With const: replaced with literal 5 at compile time — faster
MAX_RETRIES = const(5)
LED_PIN = const(2)
BUFFER_SIZE = const(1 << 10) # 1024
Use const() for any integer constant that's used in a tight loop.
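A slightly larger sketch of how const() tends to be used in a driver-style module. The register and flag names are hypothetical, chosen only for illustration, and the `except ImportError` fallback is an assumption so the file also imports under CPython. A leading underscore tells MicroPython not to keep the name as a module global at all, which saves a little RAM as well.

```python
try:
    from micropython import const
except ImportError:
    # Assumption: host-side fallback so the file also imports under CPython.
    def const(value):
        return value

# Hypothetical register layout, for illustration only.
_REG_CTRL = const(0x20)        # leading underscore: not kept as a module global
_FLAG_ENABLE = const(1 << 0)
_FLAG_IRQ = const(1 << 3)
_CTRL_DEFAULT = const(_FLAG_ENABLE | _FLAG_IRQ)  # constant expression of other consts


def build_ctrl_frame():
    # On MicroPython the names below compile to literal integers.
    return bytes((_REG_CTRL, _CTRL_DEFAULT))


frame = build_ctrl_frame()
print(frame)
```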
Cache Global References in Local Variables
Attribute lookups on objects and global variable lookups are slower than local variable access. In tight loops, bind frequently used objects to local names:
from machine import Pin
import time

# Slow: 'led.value' is resolved on each iteration
led = Pin(2, Pin.OUT)
for _ in range(1000):
    led.value(not led.value())

# Faster: bind the method and sleep function to locals
def blink_fast(count):
    led = Pin(2, Pin.OUT)
    toggle = led.value
    sleep = time.sleep_ms
    for _ in range(count):
        toggle(not toggle())
        sleep(10)
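The same idea applies to instance attributes: reading `self.x` inside a loop repeats the attribute lookup on every pass. Here is a hardware-free sketch of the pattern; the `Accumulator` class is hypothetical and exists only to show the two variants side by side.

```python
class Accumulator:
    def __init__(self, gain):
        self.gain = gain
        self.total = 0

    def add_all_slow(self, samples):
        for s in samples:
            # self.gain and self.total are looked up on every iteration
            self.total += s * self.gain

    def add_all_fast(self, samples):
        gain = self.gain      # read the attributes once
        total = self.total
        for s in samples:
            total += s * gain
        self.total = total    # write the result back once


samples = [1, 2, 3, 4]
slow = Accumulator(3)
slow.add_all_slow(samples)
fast = Accumulator(3)
fast.add_all_fast(samples)
print(slow.total, fast.total)
```

Both methods compute the same result; only the number of attribute lookups inside the loop differs. Whether the difference matters for a given loop is something to confirm by timing it.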
Avoid Creating Objects in Loops
Every allocation consumes heap, and when the heap fills up the garbage collector runs and pauses execution. Avoid creating objects in tight loops:
# send() is a placeholder for your own output function

# Slow: allocates a new list every iteration
def process():
    for i in range(100):
        data = [i * 2, i * 3, i * 4]  # allocates list 100 times
        send(data)

# Faster: allocate once, reuse
def process():
    data = [0, 0, 0]
    for i in range(100):
        data[0] = i * 2
        data[1] = i * 3
        data[2] = i * 4
        send(data)
Use bytearray instead of bytes for mutable buffers.
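For binary data, the same reuse pattern can be sketched with a preallocated bytearray and `struct.pack_into`, which writes into an existing buffer rather than returning a new bytes object. The frame layout and the `fill_frame` helper are assumptions made up for this illustration.

```python
import struct

# Hypothetical frame layout: three unsigned 16-bit values, little-endian.
_FRAME_FMT = "<HHH"
frame = bytearray(struct.calcsize(_FRAME_FMT))  # allocated once, outside the loop


def fill_frame(i):
    # Overwrites the existing bytearray in place instead of building a new object.
    struct.pack_into(_FRAME_FMT, frame, 0, i * 2, i * 3, i * 4)
    return frame


for i in range(3):
    buf = fill_frame(i)
    # A real program would pass buf to its transmit function here.

print(bytes(frame))
```

Because every iteration returns the same buffer, the consumer has to finish with it before the next iteration overwrites the contents.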
Use memoryview for Buffer Slicing
Slicing a bytes or bytearray object (data[start:end]) creates a copy. A memoryview slice refers to the same memory instead:
data = bytearray(1024)
# Slow — creates a copy
chunk = data[10:20]
# Fast — no copy
chunk = memoryview(data)[10:20]
Especially important when passing buffer slices to I2C/SPI functions in tight loops.
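A short sketch that makes the difference observable: a plain slice stays unchanged when the source buffer changes, while a memoryview slice sees the change and can also be written through. The `fill` helper is hypothetical and stands in for any function that writes into a buffer it is given.

```python
data = bytearray(range(32))
view = memoryview(data)

copy_chunk = data[10:20]   # new bytearray: independent copy
view_chunk = view[10:20]   # memoryview: same underlying memory

data[10] = 0xFF
print(copy_chunk[0], view_chunk[0])  # the copy keeps the old value, the view sees the new one


def fill(buf, value):
    # Writes into whatever buffer it receives, in place.
    for i in range(len(buf)):
        buf[i] = value


fill(view[0:4], 0)  # writes through to data[0:4]
print(data[:4])
```

Slicing a memoryview still creates a small view object, so in the tightest loops it can be worth creating the view once outside the loop.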
Compile to .mpy Bytecode
.mpy files are pre-compiled bytecode. They load faster (no compilation step at import time) and use less RAM:
# Install mpy-cross on your host machine
pip install mpy-cross
# Compile
mpy-cross mymodule.py # produces mymodule.mpy
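Upload mymodule.mpy to the device instead of mymodule.py. The import syntax is identical. The .mpy format version produced by mpy-cross must be one the device firmware supports, so use an mpy-cross release that matches your firmware version.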
Freeze Modules into Firmware
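Frozen modules are compiled into the firmware image, and their bytecode executes directly from flash, so the code itself takes almost no RAM and nothing is compiled at import time. This makes freezing the best option for stable code that rarely changes.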
To freeze a module, add it to the modules/ directory in the MicroPython port you're building and recompile. Pre-built firmware with common libraries (like uasyncio) often has them frozen already.
Check if a module is frozen:
import uasyncio
import sys
print(sys.modules['uasyncio'].__file__)
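# A relative path with no leading '/' (not a file on the device filesystem) usually indicates a frozen module; the exact output varies by MicroPython version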
Use Native and Viper Code Emitters
For the most performance-critical functions, use MicroPython's code emitters:
import micropython

# @micropython.native: compiles to native machine code (roughly 2x faster in typical cases, sometimes more)
@micropython.native
def fast_sum(data):
    total = 0
    for x in data:
        total += x
    return total

# @micropython.viper: even faster, uses machine-word types
@micropython.viper
def ultra_fast_sum(data: ptr8, n: int) -> int:
    total = 0
    for i in range(n):
        total += data[i]
    return total
viper has strict type constraints (type hints are limited to int, uint, bool, object, ptr, ptr8, ptr16 and ptr32, and integer arithmetic wraps at the machine word size) but can approach C-level speed for numeric loops.
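Code that uses these decorators cannot normally be imported on a desktop Python for testing. One possible workaround, sketched below, is a small stand-in for the micropython module. The stand-in class is an assumption for host-side testing only and is not part of MicroPython; `xor_checksum` is a hypothetical example function. On a device, this also assumes a port built with the native emitter enabled.

```python
try:
    import micropython
except ImportError:
    # Assumption: host-side stand-in so the file also runs under CPython,
    # where the decorator simply returns the function unchanged.
    class micropython:
        @staticmethod
        def native(func):
            return func


@micropython.native
def xor_checksum(buf):
    # Plain integer loop: a good candidate for the native emitter.
    result = 0
    for b in buf:
        result ^= b
    return result


payload = bytearray(b"\x01\x02\x04\x08")
print(xor_checksum(payload))  # prints 15
```

Keeping the function body free of viper-only type hints means the same source runs unchanged on the host and on the device, which makes it easier to unit test before measuring on hardware.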
Avoid *args and **kwargs in Hot Code
Variable argument functions have higher call overhead. Use explicit parameters in functions called frequently:
# Slower
def log(*args, **kwargs): ...
# Faster in tight loops
def log(level, message): ...
Summary of Techniques
| Technique | Benefit | When to use |
|---|---|---|
| Profile with timed | Find actual bottleneck | Always first |
| const() | Eliminate dict lookups | Integer constants in hot code |
| Cache locals | Faster attribute access | Tight loops |
| Avoid allocations | Reduce GC pauses | Loops, ISRs |
| memoryview | Zero-copy slicing | Buffer-heavy code |
| .mpy files | Faster import, less RAM | Library modules |
| Frozen modules | Runs from flash, near-zero RAM for code | Stable libraries |
| @native / @viper | Faster execution, near-C speed with viper | Numeric inner loops |
Conclusion
Start with profiling to identify the real bottleneck, apply const() and local variable caching broadly (they're low-effort and widely applicable), and reach for @native/@viper only when profiling shows that a numeric function is the bottleneck. Premature optimization at the machine-code level wastes time, so measure first.