Troubleshooting Common Issues with the OLSR Daemon

The Optimized Link State Routing (OLSR) protocol is widely used in wireless mesh networks to provide proactive routing. The most common daemons are olsrd, which implements classic OLSR (RFC 3626), and OLSRd2, which implements OLSRv2 (RFC 7181). While OLSR is reliable and lightweight, operators still encounter configuration, interoperability, and performance issues. This article walks through common problems, diagnostic techniques, and practical fixes to get an OLSR-based mesh healthy again.
1. Verify basic prerequisites
Before deep debugging, confirm these fundamentals:
- OLSR daemon is running: check process list (e.g., systemctl status olsrd or ps aux | grep olsrd).
- Network interfaces are up and configured with correct IP addresses and netmasks.
- Firewall rules allow OLSR traffic. OLSR runs over UDP. Classic OLSR (RFC 3626, olsrd) uses UDP port 698. OLSRv2 (OLSRd2) uses the MANET port, UDP 269 (RFC 5498). Some implementations make the port configurable.
- Node clocks are reasonably synchronized. OLSR timers are relative and do not depend on wall-clock time, but large skew between nodes makes logs hard to correlate.
If any of the above fail, fix them first; many apparent OLSR problems are just basic network or service failures.
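The interface checks above can be scripted. The sketch below is a minimal example and assumes a Linux host with iproute2's JSON output (`ip -j addr show`). The interface names you pass in, such as wlan0, are placeholders for your mesh interfaces. It reports only three things: missing interfaces, links that are not up, and interfaces without an address of the expected family. It does not validate netmasks against your addressing plan.

```python
import json
import subprocess

def read_ip_addr_json():
    """Return the raw JSON text of `ip -j addr show` (Linux, iproute2)."""
    return subprocess.run(["ip", "-j", "addr", "show"],
                          capture_output=True, text=True, check=True).stdout

def check_interfaces(ip_json, wanted, family="inet"):
    """Report problems for each wanted interface.

    family is "inet" for IPv4 or "inet6" for IPv6, as printed by iproute2.
    """
    by_name = {iface.get("ifname"): iface for iface in json.loads(ip_json)}
    problems = {}
    for name in wanted:
        iface = by_name.get(name)
        if iface is None:
            problems[name] = ["interface not found"]
            continue
        issues = []
        state = iface.get("operstate")
        # Some wireless and virtual drivers report UNKNOWN even when usable.
        if state not in ("UP", "UNKNOWN"):
            issues.append(f"operstate {state}")
        addrs = [a for a in iface.get("addr_info", []) if a.get("family") == family]
        if not addrs:
            issues.append(f"no {family} address")
        if issues:
            problems[name] = issues
    return problems
```

On a node, `check_interfaces(read_ip_addr_json(), ["wlan0"])` returns an empty dictionary when the interface looks usable.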
2. Check logs and run in foreground for verbose output
Logs are the most direct source of clues.
- Start the daemon in the foreground with verbose output to see runtime messages:
  - olsrd (any debug level above 0 keeps it in the foreground; higher levels print more detail):
    olsrd -d 5
    olsrd -f /etc/olsrd/olsrd.conf -d 6
  - OLSRd2 (option names vary between builds, so check the built-in help):
    olsrd2 -d 6 -c /etc/olsrd2/olsrd2.conf
- Inspect the system logs with journalctl -u olsrd, or in /var/log/syslog, depending on your distribution.
- Look for repeated warnings or errors such as “interface down”, “no neighbors”, “invalid message”, or “plugin load failed”.
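You can automate the scan for those phrases. This is a rough sketch that assumes the log text contains the phrases quoted above. Real olsrd and OLSRd2 messages are worded differently across versions, so adjust the patterns to what your build actually prints. The sketch counts matching lines rather than interpreting them.

```python
import re
import subprocess
from collections import Counter

# Example phrases from this article; adjust to your daemon's real wording.
PATTERNS = {
    "interface down": re.compile(r"interface down", re.IGNORECASE),
    "no neighbors": re.compile(r"no neighbou?rs", re.IGNORECASE),
    "invalid message": re.compile(r"invalid message", re.IGNORECASE),
    "plugin load failed": re.compile(r"plugin load failed", re.IGNORECASE),
}

def scan_log(lines):
    """Count how many lines match each pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts

def scan_journal(unit="olsrd"):
    """Run journalctl for a systemd unit and scan its output."""
    out = subprocess.run(["journalctl", "-u", unit, "--no-pager"],
                         capture_output=True, text=True).stdout
    return scan_log(out.splitlines())
```

A label whose count keeps growing between runs is usually the one worth chasing first.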
Common actionable log messages:
- “No interfaces found to run on” — check interface configuration in olsrd.conf and ensure interface names match system (ip link show).
- “Failed to bind socket”: the port is already in use, or the daemon lacks privileges. Confirm that no other process (such as a second daemon instance) is using UDP port 698 (269 for OLSRd2). Then run the daemon as root or grant it the required capabilities.
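To find out whether something already holds the OLSR port, you can read the kernel's socket table directly. This minimal sketch assumes Linux's /proc/net/udp and /proc/net/udp6, where local ports appear in hexadecimal (698 is 0x02BA, 269 is 0x010D). It returns socket inode numbers rather than process names. If ss is available, `ss -ulpn` shows the owning process directly.

```python
from pathlib import Path

# UDP ports: 698 for classic OLSR (olsrd), 269 (RFC 5498 MANET port) for OLSRd2.
OLSR_PORTS = {698: "olsrd", 269: "OLSRd2"}

def bound_udp_ports(proc_text):
    """Parse /proc/net/udp(6) text into {local_port: [socket inode, ...]}."""
    ports = {}
    for line in proc_text.splitlines()[1:]:  # skip the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        local = fields[1]                             # e.g. "00000000:02BA"
        port = int(local.rsplit(":", 1)[1], 16)       # port is hexadecimal
        ports.setdefault(port, []).append(fields[9])  # 10th column is the inode
    return ports

def olsr_port_holders(paths=("/proc/net/udp", "/proc/net/udp6")):
    """Return {port: [inode, ...]} for sockets bound to an OLSR port."""
    found = {}
    for name in paths:
        path = Path(name)
        if not path.exists():
            continue
        for port, inodes in bound_udp_ports(path.read_text()).items():
            if port in OLSR_PORTS:
                found.setdefault(port, []).extend(inodes)
    return found
```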
3. No neighbors discovered / neighbors disappearing
Symptoms: routing tables empty, ping to other nodes fails, neighbor list shows zero or fluctuating entries.
Troubleshooting steps:
- Interface mismatch: make sure OLSR runs on the wireless interface (e.g., wlan0). In olsrd.conf, check the Interface block (e.g., Interface "wlan0" { ... }). OLSRd2 uses per-interface sections such as [interface=wlan0].
- IP addressing: nodes on the same link should use the same address family and, in practice, addresses in a common subnet. Above all, each node must receive its neighbors' broadcast or multicast control packets. Verify addressing with ip addr.
- Wireless mode and driver issues: some wireless drivers drop multicast or do not support ad-hoc or mesh mode. Confirm that the interface supports ad-hoc or mesh mode and is set correctly (e.g., with iw or the older iwconfig).
- Broadcast/multicast delivery: by default, olsrd sends IPv4 control traffic to a broadcast address. OLSRd2 and IPv6 setups use link-local multicast groups. If the link or driver filters these frames, neighbors never see each other's HELLOs. Test broadcast and multicast reachability between two nodes.
- Signal and physical problems: poor link quality or interference causes packet loss. Check signal strength and retry counters with iw dev wlan0 station dump. Survey nearby networks with iw dev wlan0 scan (or the older iwlist). If retry rates are high, move nodes closer or change channels.
- Mismatched OLSR versions or incompatible plugins can prevent neighbor formation. Use compatible OLSR versions across nodes where possible.
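For the IP addressing check, a short sketch with Python's ipaddress module can flag peers that fall outside the local interface's subnet or use another address family. The local address and the peer list are hypothetical inputs, for example taken from your inventory. The code cannot tell whether broadcast or multicast packets actually reach those peers; a packet capture is the test for that.

```python
import ipaddress

def check_peers(local_cidr, peers):
    """Flag peers outside the local interface's subnet or in another family."""
    iface = ipaddress.ip_interface(local_cidr)
    report = {}
    for peer in peers:
        addr = ipaddress.ip_address(peer)
        if addr.version != iface.version:
            report[peer] = "different address family"
        elif addr not in iface.network:
            report[peer] = f"outside {iface.network}"
    return report
```

For example, `check_peers("10.0.0.1/24", ["10.0.0.7", "10.0.1.7"])` flags only 10.0.1.7.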
Quick checks:
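- Capture on the interface with sudo tcpdump -i wlan0 -n udp port 698 (port 269 for OLSRd2). Do you see HELLO/TC packets from neighbors?
- Show neighbors and routes with olsrctl (or olsrd2-ctrl), if your build provides these tools. With stock olsrd, the txtinfo or jsoninfo plugin exposes the same tables on a local TCP port (txtinfo defaults to 2006).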
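When a capture runs for a while, it helps to count which neighbors are actually sending. This sketch parses text output from tcpdump -n. It assumes the usual IPv4 "IP <src>.<port> > <dst>.<port>:" prefix; IPv6 captures would need a different pattern. The sample lines in the test are illustrative, not real captures.

```python
import re
from collections import Counter

# tcpdump -n prints IPv4 UDP packets as "... IP <src>.<sport> > <dst>.<dport>: ..."
SRC_RE = re.compile(r"\bIP (\d{1,3}(?:\.\d{1,3}){3})\.(\d+) > ")

def olsr_senders(lines, port=698):
    """Count packets per source address whose source port is the OLSR port."""
    senders = Counter()
    for line in lines:
        match = SRC_RE.search(line)
        if match and int(match.group(2)) == port:
            senders[match.group(1)] += 1
    return senders
```

Save a capture's text output to a file and pass its lines to olsr_senders. A neighbor that never appears in the result is not reaching this node.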
4. Routing loops, stale routes, or slow convergence
Symptoms: packets taking suboptimal paths, intermittent routing loops, old routes persisting after topology change.
Causes and fixes:
- Long HELLO/TC intervals: OLSR is proactive, so topology changes propagate only as fast as HELLO and TC messages are sent. Shorten the intervals (HelloInterval and TcInterval in olsrd.conf) to speed up convergence, at the cost of more control overhead.
- Link-quality metrics: ETX or other link-quality extensions with poorly tuned settings can prefer bad links. Re-evaluate the link-quality calculation settings and thresholds.
- MPR issues: errors in MPR selection lead to inefficient flooding of topology information. Make sure each node's Willingness (0 = never act as an MPR, 7 = always) is set sensibly. Restarting the daemon to reset neighbor tables can help while diagnosing.
- Multiple interfaces or asymmetric links: traffic may follow unexpected paths. Pin routing to the intended interface with interface-specific rules or policy routing. Use netfilter logging to see which path packets actually take.
- Stale topology entries: keep TC validity times (TcValidityTime in olsrd) reasonable. If they are too long, stale entries linger; if they are too short, transient packet loss causes route flapping.
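The relationship between intervals and how long stale state survives is simple arithmetic. RFC 3626 sets the neighbor hold time to three times the refresh interval and the topology hold time to three times the TC interval. Its default intervals are 2 s for HELLO/refresh and 5 s for TC. The sketch below assumes the refresh interval equals the HELLO interval, as in those defaults. olsrd also lets you set HelloValidityTime and TcValidityTime explicitly; if you do, use those values instead.

```python
from dataclasses import dataclass

# RFC 3626: NEIGHB_HOLD_TIME = 3 x REFRESH_INTERVAL, TOP_HOLD_TIME = 3 x TC_INTERVAL.
HOLD_MULTIPLIER = 3

@dataclass
class OlsrTimers:
    hello_interval: float  # seconds; assumed equal to REFRESH_INTERVAL
    tc_interval: float     # seconds

    @property
    def neighbor_hold(self) -> float:
        """How long a silent neighbor stays in the tables before expiring."""
        return HOLD_MULTIPLIER * self.hello_interval

    @property
    def topology_hold(self) -> float:
        """How long a TC-derived topology entry lives without a refresh."""
        return HOLD_MULTIPLIER * self.tc_interval

defaults = OlsrTimers(hello_interval=2.0, tc_interval=5.0)
fast = OlsrTimers(hello_interval=1.0, tc_interval=2.5)
```

Halving both intervals halves the hold times: 6 s and 15 s become 3 s and 7.5 s. That faster convergence costs roughly twice the control traffic.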
Diagnostic commands:
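- olsrctl topology (or olsrd2-ctrl show) to inspect topology and flooding state.
- ip route to view the kernel routing table and compare it with the OLSR output. If the daemon installs routes into a separate table (RtTable in olsrd.conf), use ip route show table <id>.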
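Comparing OLSR's view with the kernel table by hand gets tedious on larger meshes. This minimal sketch assumes iproute2's `ip -j route show` output. It also assumes an olsr_routes dictionary that you fill in yourself from the daemon's route listing; that step is left out because the format depends on the tool or plugin you use. Destinations are normalized so that a host route printed as 10.0.0.5 matches 10.0.0.5/32. When the kernel lists several routes to one destination, the last one wins.

```python
import ipaddress
import json

def _normalize(dst):
    """Turn '10.0.0.5' into '10.0.0.5/32' so host routes compare equal."""
    if dst == "default":
        return dst
    return str(ipaddress.ip_network(dst, strict=False))

def kernel_routes(ip_route_json):
    """Map destination -> (gateway, device) from `ip -j route show` output."""
    routes = {}
    for route in json.loads(ip_route_json):
        routes[_normalize(route["dst"])] = (route.get("gateway"), route.get("dev"))
    return routes

def diff_routes(olsr_routes, kernel):
    """Compare OLSR's {dst: (gateway, dev)} with the kernel's view."""
    olsr = {_normalize(d): v for d, v in olsr_routes.items()}
    missing = {d: v for d, v in olsr.items() if d not in kernel}
    mismatched = {d: (v, kernel[d]) for d, v in olsr.items()
                  if d in kernel and kernel[d] != v}
    return missing, mismatched
```

Routes that are missing point to route installation problems, such as permissions or a different table. Mismatched next hops suggest the kernel and the daemon disagree after a topology change.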
5. High CPU or memory usage
OLSR is lightweight but misconfiguration or bugs can cause spikes.
Common causes:
- HELLO/TC intervals that are too short create heavy control traffic.
- Too many nodes or dense networks: OLSR scales poorly in extremely dense networks unless tuned.
- Misbehaving plugins or telemetry modules: disable plugins one by one to identify the culprit.
- Memory leaks in older versions: upgrade to the latest stable release.
Mitigations:
- Lengthen intervals slightly; enable hysteresis or link-quality extensions only with caution.
- Limit plugin features and logging verbosity.
- Consider moving to OLSRd2 if your current version lacks performance fixes. OLSRd2 implements OLSRv2, which does not interoperate with classic olsrd, so plan to migrate the whole mesh at once.
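A back-of-the-envelope estimate shows why short intervals get expensive. The sketch counts only the HELLO and TC messages originated per second across the mesh. By default it assumes every node sends TCs, which is the worst case. It ignores MPR retransmissions, so real airtime use is higher. The node counts and intervals are hypothetical.

```python
def originated_per_second(nodes, hello_interval, tc_interval, tc_originators=None):
    """Rough HELLO + TC originations per second for the whole mesh.

    Ignores TC retransmissions by MPRs and any MID/HNA messages.
    """
    if tc_originators is None:
        tc_originators = nodes  # worst case: every node originates TCs
    return nodes / hello_interval + tc_originators / tc_interval

baseline = originated_per_second(50, hello_interval=2.0, tc_interval=5.0)
aggressive = originated_per_second(50, hello_interval=1.0, tc_interval=2.5)
```

Halving both intervals doubles the originated message count (35 to 70 per second for 50 nodes). This is the overhead side of the convergence trade-off discussed in section 4.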
6. Plugin and feature-related errors
Many distributions ship plugins alongside the daemon (e.g., web status/admin pages, JSON or text info output, name service). Plugin failures usually show up as startup errors.
Steps:
- Check that plugin files exist and have correct permissions.
- Verify plugin dependencies (libraries) are installed.
- Temporarily disable plugins in config to see if the core daemon runs correctly.
- For webadmin authentication errors, reset credentials or inspect the configuration file for typos.
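For olsrd, you can automate the check that every configured plugin file exists; its configuration loads plugins with LoadPlugin "<file>" lines. This sketch rests on two assumptions. First, the search directories are hypothetical; the dynamic loader's real search path depends on your system. Second, comments are stripped naively at the first #, so a # inside a quoted value would confuse it.

```python
import re
from pathlib import Path

LOADPLUGIN_RE = re.compile(r'^\s*LoadPlugin\s+"([^"]+)"', re.MULTILINE)

def configured_plugins(conf_text):
    """Return plugin file names from LoadPlugin lines, ignoring # comments."""
    text = "\n".join(line.split("#", 1)[0] for line in conf_text.splitlines())
    return LOADPLUGIN_RE.findall(text)

def missing_plugins(conf_text, search_dirs):
    """Plugins whose file cannot be found in any of the given directories."""
    missing = []
    for name in configured_plugins(conf_text):
        if "/" in name:  # explicit path in the config
            candidates = [Path(name)]
        else:
            candidates = [Path(d) / name for d in search_dirs]
        if not any(c.is_file() for c in candidates):
            missing.append(name)
    return missing
```

A plugin reported as missing here will also fail to load at startup. Fix the file or comment out its block before looking further.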
7. Interoperability problems (different OLSR implementations)
When mixing olsrd, OLSRd2, or other implementations, subtle incompatibilities may appear.
Tips:
- Prefer compatible or the same major implementation across the network when possible.
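- Run the same protocol version on all nodes. Classic OLSR (RFC 3626, olsrd) and OLSRv2 (RFC 7181, OLSRd2) do not interoperate. Also match extensions such as link quality (LinkQualityLevel in olsrd) and HNA formats.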
- Disable non-essential extensions when testing to isolate the protocol core.
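Mismatched settings are easier to spot when you compare node configurations side by side. This minimal sketch assumes olsrd-style configuration files with global options written as Key value on their own line. The option names checked by default (IpVersion, LinkQualityLevel, LinkQualityAlgorithm, UseHysteresis) are olsrd settings. The node names and file contents are hypothetical. Because this is a line-based scan rather than a full parser, it cannot tell options inside Interface blocks apart from global ones.

```python
import re

OPTION_RE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9]*)\s+(\S+)\s*$")
DEFAULT_KEYS = ("IpVersion", "LinkQualityLevel", "LinkQualityAlgorithm", "UseHysteresis")

def read_options(conf_text, keys):
    """Return {key: value} for the requested keys (last occurrence wins)."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0]  # drop comments
        match = OPTION_RE.match(line)
        if match and match.group(1) in keys:
            found[match.group(1)] = match.group(2)
    return found

def config_drift(configs, keys=DEFAULT_KEYS):
    """configs maps node name -> config text; report keys whose values differ."""
    per_node = {node: read_options(text, keys) for node, text in configs.items()}
    report = {}
    for key in keys:
        values = {node: opts.get(key, "<unset>") for node, opts in per_node.items()}
        if len(set(values.values())) > 1:
            report[key] = values
    return report
```

An empty report means the checked options match on every node. Any entry shows exactly which nodes disagree.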
8. IP version mismatches (IPv4 vs IPv6)
If some nodes are IPv6-only or using different address families, OLSR adjacency and route distribution can fail.
Checklist:
- Confirm olsrd/olsrd2 is configured for the address family used (IPv4/IPv6).
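- Check that HELLOs and TCs go out on the correct address family and destination. By default, olsrd sends IPv4 control traffic to a broadcast address (configurable per interface) and uses a multicast group for IPv6. OLSRv2 uses the link-local LL-MANET-Routers groups 224.0.0.109 (IPv4) and ff02::6d (IPv6), defined in RFC 5498.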
- Use dual-stack configuration if you need both.
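When reading captures, it helps to classify where control packets are being sent. The sketch below uses Python's ipaddress module. It treats 224.0.0.0/24 as the IPv4 link-local multicast block and reads the IPv6 multicast scope from the low four bits of the second byte, where 2 means link-local. So 224.0.0.109 and ff02::6d, the RFC 5498 LL-MANET-Routers groups, both come out as link-local multicast. Which destination your daemon should use depends on its configuration.

```python
import ipaddress

# 224.0.0.0/24 is the IPv4 Local Network Control Block (link-local multicast).
IPV4_LINK_LOCAL_MCAST = ipaddress.ip_network("224.0.0.0/24")
IPV4_LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")

def classify_destination(addr):
    """Describe the kind of destination an OLSR control packet was sent to."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        if ip == IPV4_LIMITED_BROADCAST:
            return "ipv4 limited broadcast"
        if ip in IPV4_LINK_LOCAL_MCAST:
            return "ipv4 link-local multicast"
        if ip.is_multicast:
            return "ipv4 multicast, wider scope"
        return "ipv4 unicast or subnet broadcast"
    if ip.is_multicast:
        scope = ip.packed[1] & 0x0F  # IPv6 multicast scope field
        if scope == 2:
            return "ipv6 link-local multicast"
        return f"ipv6 multicast, scope {scope}"
    return "ipv6 unicast"
```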
9. Firewall and SELinux/AppArmor interference
Firewalls and mandatory access control (MAC) frameworks such as SELinux and AppArmor can silently drop OLSR traffic or block plugins.
Actions:
- Temporarily disable firewall rules to test adjacency (ufw, firewalld, iptables/nftables).
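- Allow UDP port 698 for olsrd (269 for OLSRd2, or whatever port you configured) in the INPUT and OUTPUT chains. OLSR control messages stay on the local link, so the FORWARD chain only matters for the data traffic the node relays.
- Check SELinux/AppArmor logs if plugin modules are denied file or network access. Create appropriate policies, or switch to permissive mode (SELinux) or complain mode (AppArmor) while testing.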
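The firewall rule itself is short. This sketch assumes iptables; use ip6tables for IPv6, or translate the rules to nftables. It builds the commands that accept OLSR control traffic on one interface. The interface name is a placeholder. The ports default to 698 for olsrd and 269 (the RFC 5498 MANET port) for OLSRd2. The function only returns command strings, so you can review them before running them as root.

```python
import shlex

# UDP ports: 698 for classic OLSR (olsrd), 269 (RFC 5498 MANET port) for OLSRd2.
PORT_BY_DAEMON = {"olsrd": 698, "olsrd2": 269}

def allow_rules(iface, daemon="olsrd", port=None, tool="iptables"):
    """Return shell commands that insert ACCEPT rules for OLSR control traffic."""
    port = port if port is not None else PORT_BY_DAEMON[daemon]
    rules = [
        [tool, "-I", "INPUT", "-i", iface, "-p", "udp", "--dport", str(port), "-j", "ACCEPT"],
        [tool, "-I", "OUTPUT", "-o", iface, "-p", "udp", "--dport", str(port), "-j", "ACCEPT"],
    ]
    return [shlex.join(rule) for rule in rules]
```

Using -I places each rule at the top of its chain, so an earlier DROP rule cannot shadow it. Rules added this way do not persist across reboots.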
10. Common configuration mistakes
- Wrong interface names after a system upgrade or a switch to predictable network interface names: update olsrd.conf.
- Typos in config keys or wrong path to pid/socket files.
- Duplicate IP addresses on the mesh.
- Not enabling IP forwarding on nodes that relay traffic or act as gateways (sysctl -w net.ipv4.ip_forward=1, plus net.ipv6.conf.all.forwarding=1 for IPv6).
- Misconfigured HNA entries causing incorrect external network advertisements.
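Two of these mistakes, stale interface names and duplicate addresses, are easy to check mechanically. This sketch assumes olsrd.conf's Interface "name" ... syntax and uses Linux's /sys/class/net for the list of real interfaces. The node-to-address mapping for the duplicate check is a hypothetical input, for example exported from your inventory.

```python
import os
import re
from collections import Counter

# Matches olsrd.conf lines such as: Interface "wlan0" "eth1"
INTERFACE_RE = re.compile(r'^\s*Interface\s+((?:"[^"]+"\s*)+)', re.MULTILINE)

def configured_interfaces(conf_text):
    """Return interface names listed in Interface lines of an olsrd.conf."""
    names = []
    for group in INTERFACE_RE.findall(conf_text):
        names.extend(re.findall(r'"([^"]+)"', group))
    return names

def unknown_interfaces(conf_text, system_ifaces=None):
    """Configured names that do not exist on this system."""
    if system_ifaces is None:
        system_ifaces = set(os.listdir("/sys/class/net"))  # Linux only
    return [n for n in configured_interfaces(conf_text) if n not in system_ifaces]

def duplicate_addresses(node_addresses):
    """node_addresses maps node -> IP; return {ip: [nodes]} for reused IPs."""
    counts = Counter(node_addresses.values())
    return {ip: sorted(n for n, a in node_addresses.items() if a == ip)
            for ip, c in counts.items() if c > 1}
```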
11. Step-by-step troubleshooting checklist
- Confirm daemon process and version.
- Verify interface up and IP addressing.
- Check firewall and allow OLSR UDP port.
- Capture packets on the interface to confirm HELLO/TC presence.
- Start daemon with high debug level and inspect logs.
- Verify neighbors with olsrctl/olsrd2-ctrl.
- Inspect route table and compare with topology output.
- Disable plugins and nonessential extensions.
- Adjust intervals and link-quality settings if convergence is slow.
- Upgrade to latest stable OLSR implementation if suspecting bugs.
12. Example tcpdump and control-tool commands
Run these locally to gather data (the olsrctl and olsrd2-ctrl tool names and flags vary by build):
sudo tcpdump -i wlan0 -n udp port 698
olsrctl n          # show neighbors (olsrd)
olsrd2-ctrl -s     # show status (OLSRd2)
olsrctl t          # show topology (olsrd)
ip route show
journalctl -u olsrd -f
13. When to escalate / seek community help
Collect these before asking for help:
- olsrd/olsrd2 version and full config file (sanitized of secrets).
- Output of neighbor/topology tables and ip route.
- tcpdump captures showing HELLO/TC packets (or their absence).
- Relevant log excerpts with debug level output.
Provide concise environment info: kernel version, wireless driver, interface modes, and whether nodes are single- or multi-interface.
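Gathering that information consistently is easier with a small collector script. The sketch below assumes a Linux node with systemd, and its command list and output file names are hypothetical choices. Tools that are not installed are recorded as skipped rather than treated as errors. The script deliberately leaves out the daemon configuration, so you can sanitize secrets before attaching it yourself.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical selection of commands: output file name -> argv.
DEFAULT_COMMANDS = {
    "kernel.txt": ["uname", "-r"],
    "ip_addr.txt": ["ip", "addr", "show"],
    "ip_route.txt": ["ip", "route", "show"],
    "wireless.txt": ["iw", "dev"],
    "olsrd_log.txt": ["journalctl", "-u", "olsrd", "--no-pager", "-n", "500"],
}

def collect(outdir, commands=None, timeout=15):
    """Run each command, save stdout+stderr to outdir, and return a status map."""
    commands = DEFAULT_COMMANDS if commands is None else commands
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    status = {}
    for filename, argv in commands.items():
        if shutil.which(argv[0]) is None:
            status[filename] = "skipped: not installed"
            continue
        try:
            result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            status[filename] = "timed out"
            continue
        (out / filename).write_text(result.stdout + result.stderr)
        status[filename] = f"exit {result.returncode}"
    return status
```

Run `collect("olsr-diag")` on each affected node, then attach the directory together with the status map it returns.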
14. Preventive best practices
- Use consistent software versions across nodes.
- Keep HELLO/TC intervals balanced for your topology size.
- Monitor link quality metrics and set realistic thresholds.
- Use configuration management (Ansible/Chef) to keep settings uniform.
- Keep backups of working configs and document network topology.
Troubleshooting OLSR typically follows a methodical path: confirm basic networking and service status, inspect control packets, analyze logs, and progressively narrow the problem by disabling features or tuning timers. Collecting the right debug output before changing many parameters saves time and helps the community or vendor give precise advice.