Handshake Failures
How to diagnose and fix peer handshake errors
Troubleshooting Handshake Failures
This guide helps you diagnose and resolve peer handshake failures when adding agents to your cluster.
Understanding Handshakes
When you add a peer, Arctic performs a handshake:
- The local agent contacts the remote agent
- Both agents exchange Ed25519 public keys
- Both verify signatures against the shared license
- On success, both store each other's peer information
Common Error Messages
Connection Refused
Error: handshake failed: connection refused
Cause: Cannot establish a TCP connection to the remote agent.
Resolution:
- Verify the remote agent is running:
  curl http://REMOTE_IP:8080/livez
- Check network connectivity:
  ping REMOTE_IP
  telnet REMOTE_IP 8080
- Verify the firewall allows port 8080
Connection Timeout
Error: handshake failed: connection timeout
Cause: A network path exists, but the connection cannot complete.
Resolution:
- Check for firewall rules blocking the connection
- Verify there are no NAT issues
- Check the remote agent is listening on the expected interface
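A refused connection and a timed-out connection can also be told apart from the command line, because curl reports them with different exit codes (7 for a failed connect, 28 for a timeout). The following is a minimal sketch, not part of the Arctic CLI: the function names and the 5-second connect timeout are assumptions, and REMOTE_IP and port 8080 follow the placeholders used in this guide.

```shell
# Sketch: classify a /livez probe by curl's exit code.
# Function names and the 5-second limit are illustrative assumptions.

# Map a curl exit code to a rough diagnosis.
classify_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect (often connection refused)" ;;
    28) echo "connection timeout" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Probe the liveness endpoint and print the diagnosis.
# Without --fail, curl exits 0 for any HTTP response, so "reachable"
# only means that the agent answered over HTTP.
probe_livez() {
  local host="$1" port="${2:-8080}" rc=0
  curl --silent --output /dev/null --connect-timeout 5 \
    "http://${host}:${port}/livez" || rc=$?
  classify_curl_exit "$rc"
}
```

Calling `probe_livez REMOTE_IP` prints one of the labels above, which suggests whether to start with the Connection Refused or the Connection Timeout checklist.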
License Mismatch
Error: handshake failed: license mismatch
Cause: The agents were bootstrapped with different licenses.
Resolution:
- Check license IDs on both agents:
  # On local agent
  arctic license show
  # On remote agent
  arctic license show --url http://REMOTE_IP:8080
- If different, re-bootstrap one agent with the correct license
Invalid Signature
Error: handshake failed: invalid signature
Cause: The peer's signature does not verify against the license public keys.
Resolution:
- This may indicate a tampered or corrupted peer key
- Re-bootstrap the affected agent
- If persistent, contact support
Peer Already Exists
Error: peer already exists in cluster
Cause: This peer was previously added to the cluster.
Resolution:
- List existing peers:
  arctic peers list
- The peer may already be connected
- If you need to re-add, delete first:
  arctic peers delete PEER_ID --yes
Node Limit Exceeded
Error: handshake failed: node limit exceeded
Cause: Your license caps the number of nodes, and the cluster has reached that cap.
Resolution:
- Check your license limits:
  arctic license show
- Remove unused peers to make room
- Contact your administrator to upgrade the license
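The error messages above map onto a small set of first checks, which can be sketched as a lookup. This helper is hypothetical and not part of the Arctic CLI; it only matches the error strings listed in this guide and assumes the CLI prints them verbatim.

```shell
# Sketch: print the first check this guide suggests for a handshake error.
# suggest_first_check is an illustrative name, not an Arctic command.
suggest_first_check() {
  case "$1" in
    *"connection refused"*)  echo "verify the remote agent is running and port 8080 is open" ;;
    *"connection timeout"*)  echo "check firewall rules, NAT, and the listening interface" ;;
    *"license mismatch"*)    echo "compare 'arctic license show' output on both agents" ;;
    *"invalid signature"*)   echo "re-bootstrap the affected agent" ;;
    *"peer already exists"*) echo "run 'arctic peers list'" ;;
    *"node limit exceeded"*) echo "check license limits with 'arctic license show'" ;;
    *)                       echo "enable debug logging and check agent logs" ;;
  esac
}
```

One way to use it is `suggest_first_check "$(arctic peers add REMOTE_IP:8080 2>&1)"`, which captures the CLI output regardless of whether the error goes to stdout or stderr.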
Debugging Steps
1. Enable Debug Logging
Run the CLI with debug output:
arctic peers add REMOTE_IP:8080 --debug
Or trace HTTP requests:
arctic peers add REMOTE_IP:8080 --trace
2. Check Agent Logs
View logs on both agents:
# Local agent
journalctl -u arctic-agent -f
# Remote agent (via SSH)
ssh user@REMOTE_IP journalctl -u arctic-agent -f
3. Verify Cluster Identity
Check the remote agent's cluster identity (no auth required):
curl http://REMOTE_IP:8080/v1/cluster/identity
Response shows:
{
  "peer_id": "01HXYZ...",
  "public_key": "base64...",
  "license_id": "lic_...",
  "cluster_id": "01HABC..."
}
Verify license_id matches your cluster.
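The comparison can be scripted rather than done by eye. The sketch below assumes that the local agent serves the same identity endpoint at http://localhost:8080, which this guide does not state explicitly; the function names are illustrative. It uses sed to stay dependency-free, though a JSON tool such as jq would be more robust.

```shell
# Sketch: compare the license_id of two identity responses.
# extract_license_id and same_license are hypothetical helper names.

# Read an identity JSON document on stdin and print its license_id value.
extract_license_id() {
  sed -n 's/.*"license_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if both documents carry the same, non-empty license_id.
same_license() {
  local a b
  a=$(printf '%s\n' "$1" | extract_license_id)
  b=$(printf '%s\n' "$2" | extract_license_id)
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, store the output of `curl -s http://localhost:8080/v1/cluster/identity` and `curl -s http://REMOTE_IP:8080/v1/cluster/identity` in two variables and pass them to `same_license`; a non-zero exit status points to the License Mismatch case.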
4. Test Network Both Directions
Handshakes require bidirectional communication. Test from both sides:
# From local to remote
curl http://REMOTE_IP:8080/livez
# From remote to local (via SSH)
ssh user@REMOTE_IP curl http://LOCAL_IP:8080/livez
Firewall Requirements
Ensure these ports are open:
| Port | Protocol | Direction | Purpose |
|---|---|---|---|
| 8080 | TCP | Bidirectional | API and handshake |
| 51840 | UDP | Bidirectional | IP tunnel (Tempest) |
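The TCP row of this table can be checked in both directions with one helper that wraps the two curl commands from step 4. This is a sketch under assumptions: the function name and the 5-second connect timeout are illustrative, SSH access to the remote host is assumed, and the UDP tunnel port is not covered because a plain HTTP probe cannot test it.

```shell
# Sketch: probe port 8080 in both directions and report which side fails.
# check_both_directions is a hypothetical helper, not an Arctic command.
check_both_directions() {
  local local_ip="$1" remote_ip="$2" ssh_user="$3" port="${4:-8080}" failed=0

  # Local agent to remote agent.
  if curl --silent --output /dev/null --connect-timeout 5 \
      "http://${remote_ip}:${port}/livez"; then
    echo "local -> remote: ok"
  else
    echo "local -> remote: FAILED"
    failed=1
  fi

  # Remote agent back to local agent, run over SSH.
  if ssh "${ssh_user}@${remote_ip}" curl --silent --output /dev/null \
      --connect-timeout 5 "http://${local_ip}:${port}/livez"; then
    echo "remote -> local: ok"
  else
    echo "remote -> local: FAILED"
    failed=1
  fi

  return "$failed"
}
```

Run it as `check_both_directions LOCAL_IP REMOTE_IP user`. A failure in only one direction usually points at a one-sided firewall rule or NAT rather than at the agent itself.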
NAT Considerations
If agents are behind NAT:
- Use port forwarding to expose port 8080
- Specify the public address when adding peers
- Consider a VPN for consistent addressing
Recovery Steps
If handshakes consistently fail:
- Restart agents on both sides:
  systemctl restart arctic-agent
- Re-bootstrap if needed (loses local state):
  # Stop agent
  systemctl stop arctic-agent
  # Remove database
  rm /opt/tillered/arctic.db
  # Start and re-bootstrap
  systemctl start arctic-agent
  arctic bootstrap --url http://localhost:8080 --license-file license.json
- Contact support if the issue persists after trying all steps