Troubleshooting Handshake Failures

This guide helps you diagnose and resolve peer handshake failures when adding agents to your cluster.

Understanding Handshakes

When you add a peer, Arctic performs a handshake:

The local agent contacts the remote agent
Both agents exchange Ed25519 public keys
Both verify signatures against the shared license
On success, both store each other's peer information

Common Error Messages

Connection Refused

Error: handshake failed: connection refused

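Cause: The local agent cannot open a TCP connection to the remote agent. The remote host actively rejected the attempt, which usually means nothing is listening on the port or a firewall rule is rejecting the connection.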

Resolution:

Verify the remote agent is running:
```
curl http://REMOTE_IP:8080/livez
```
Check network connectivity:
```
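ping -c 4 REMOTE_IP     # ICMP may be blocked even when TCP 8080 is open
telnet REMOTE_IP 8080   # if telnet is not installed: nc -vz REMOTE_IP 8080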
```
Verify that the remote host's firewall allows TCP port 8080
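A single probe is enough to tell the refused and timeout cases apart. The following sketch wraps the livez check in a function and maps curl's documented exit codes (7 for a failed connect, 28 for a timeout, 22 for an HTTP error status when -f is used) to the error types in this guide. The function name, the 5-second connect timeout, and the default port 8080 are assumptions for illustration.

```shell
#!/usr/bin/env bash
# probe_livez HOST [PORT]: probe an agent's /livez endpoint and classify the
# result from curl's exit code. PORT defaults to 8080.
probe_livez() {
  host="$1"
  port="${2:-8080}"
  rc=0
  # --connect-timeout bounds the TCP connect; --max-time bounds the whole call.
  # -s hides progress output; -f turns HTTP error statuses into exit code 22.
  curl -sf --connect-timeout 5 --max-time 10 -o /dev/null \
    "http://${host}:${port}/livez" || rc=$?
  case "$rc" in
    0)  echo "ok" ;;          # the agent answered /livez
    7)  echo "refused" ;;     # connect failed: nothing listening, REJECT rule, or no route
    28) echo "timeout" ;;     # no reply in time: often a DROP rule or a NAT/routing problem
    22) echo "http-error" ;;  # TCP worked, but /livez returned an HTTP error status
    6)  echo "dns-error" ;;   # HOST could not be resolved
    *)  echo "curl-exit-$rc" ;;
  esac
}
```

For example, if probe_livez REMOTE_IP prints refused, look at the causes above. If it prints timeout, look at the Connection Timeout causes that follow.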

Connection Timeout

Error: handshake failed: connection timeout

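Cause: The connection attempt received no response before the timeout expired, typically because packets are being silently dropped (for example, by a firewall) or misrouted.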

Resolution:

Check for firewall rules that silently drop traffic to port 8080
Verify that NAT is not hiding the remote agent's address (see NAT Considerations)
Check that the remote agent is listening on the expected interface, not only on 127.0.0.1
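An agent can be running and still be unreachable when it listens only on 127.0.0.1. On the remote host, ss -ltn (from iproute2) lists listening TCP sockets. The hypothetical helpers below read that output and report whether every listener on a port is bound to a loopback address. They assume the agent uses port 8080 and that ss prints addresses in its usual ADDRESS:PORT form.

```shell
#!/usr/bin/env bash
# listen_addrs PORT: read `ss -ltn` output on stdin and print each local
# ADDRESS:PORT entry that listens on PORT.
listen_addrs() {
  awk -v suffix=":$1" 'NR > 1 {
    # Scan all fields so this works whether or not ss prints a State column.
    for (i = 1; i <= NF; i++) {
      f = $i
      if (length(f) > length(suffix) &&
          substr(f, length(f) - length(suffix) + 1) == suffix) {
        print f
        break
      }
    }
  }'
}

# loopback_only PORT: succeed if something listens on PORT and every such
# listener is bound to a loopback address (unreachable from other hosts).
loopback_only() {
  addrs=$(listen_addrs "$1")
  [ -n "$addrs" ] || return 1   # nothing is listening on PORT at all
  # grep -v succeeds if any address is NOT loopback; negate that.
  ! printf '%s\n' "$addrs" | grep -vqE '^(127\.|\[::1\]|\[::ffff:127\.)'
}

# Usage on the remote host:
#   ss -ltn | loopback_only 8080 && echo "arctic-agent listens on loopback only"
```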

License Mismatch

Error: handshake failed: license mismatch

Cause: The agents were bootstrapped with different licenses.

Resolution:

Check license IDs on both agents:

```
# On local agent
arctic license show

# On remote agent
arctic license show --url http://REMOTE_IP:8080
```

If the license IDs differ, re-bootstrap one agent with the correct license

Invalid Signature

Error: handshake failed: invalid signature

Cause: The peer's signature does not verify against the license public keys.

Resolution:

This may indicate a tampered or corrupted peer key
Re-bootstrap the affected agent
If persistent, contact support

Peer Already Exists

Error: peer already exists in cluster

Cause: This peer was previously added to the cluster.

Resolution:

List existing peers:
```
arctic peers list
```
If the peer is listed, it may already be connected and no action is needed
If you need to re-add, delete first:
```
arctic peers delete PEER_ID --yes
```
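If you re-add peers often, you can script the delete-then-add sequence. This sketch reads the remote agent's peer_id from the unauthenticated /v1/cluster/identity endpoint (described under Debugging Steps). It deletes the peer only if arctic peers list mentions that ID, then runs arctic peers add. It assumes that the peer_id from the identity endpoint is the PEER_ID that arctic peers delete expects, and that the list output shows IDs as plain text. readd_peer is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# readd_peer HOST[:PORT]: delete the remote agent's peer entry if present,
# then add it again. PORT defaults to 8080.
# Assumption: `arctic peers list` prints peer IDs as plain text.
readd_peer() {
  addr="$1"
  case "$addr" in *:*) ;; *) addr="$addr:8080" ;; esac

  # Pull "peer_id" out of the identity JSON without requiring jq.
  peer_id=$(curl -sf "http://$addr/v1/cluster/identity" |
    sed -n 's/.*"peer_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  if [ -z "$peer_id" ]; then
    echo "could not read peer_id from $addr" >&2
    return 1
  fi

  # Delete only when the peer is actually known to this agent.
  if arctic peers list | grep -qF "$peer_id"; then
    arctic peers delete "$peer_id" --yes || return 1
  fi
  arctic peers add "$addr"
}
```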

Node Limit Exceeded

Error: handshake failed: node limit exceeded

Cause: The cluster has reached the maximum number of nodes allowed by your license.

Resolution:

Check your license limits:
```
arctic license show
```
Remove unused peers to make room
Contact your administrator to upgrade the license

Debugging Steps

1. Enable Debug Logging

Run the CLI with debug output:

```
arctic peers add REMOTE_IP:8080 --debug
```

Or trace HTTP requests:

```
arctic peers add REMOTE_IP:8080 --trace
```

2. Check Agent Logs

View logs on both agents:

```
# Local agent
journalctl -u arctic-agent -f

# Remote agent (via SSH)
ssh user@REMOTE_IP journalctl -u arctic-agent -f
```
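When a handshake fails intermittently, it helps to capture the CLI debug output and both agents' logs for the same attempt. The sketch below combines the debug and log commands into a hypothetical capture_handshake helper that writes everything into a new timestamped directory. It assumes the unit is named arctic-agent on both hosts, that the SSH user can read the remote journal, and that the last five minutes of logs cover the attempt.

```shell
#!/usr/bin/env bash
# capture_handshake REMOTE_IP [SSH_USER]: retry the add with --debug and save
# the CLI output plus both agents' recent journal entries in one directory.
# Assumptions: unit name arctic-agent on both hosts, SSH_USER (default: user)
# can read the remote journal, and the last 5 minutes cover the attempt.
capture_handshake() {
  remote="$1"
  ssh_user="${2:-user}"
  outdir="handshake-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$outdir" || return 1

  # Capture stdout and stderr, since debug output may go to either.
  status=0
  arctic peers add "$remote:8080" --debug > "$outdir/cli-debug.log" 2>&1 || status=$?
  echo "arctic exit status: $status" >> "$outdir/cli-debug.log"

  # -5min is a relative systemd time spec evaluated on each host, which
  # avoids comparing wall-clock times across machines.
  journalctl -u arctic-agent --since=-5min --no-pager \
    > "$outdir/local-agent.log" 2>&1
  ssh "$ssh_user@$remote" journalctl -u arctic-agent --since=-5min --no-pager \
    > "$outdir/remote-agent.log" 2>&1

  # Print the directory name so callers can archive or inspect it.
  echo "$outdir"
}
```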

3. Verify Cluster Identity

Check the remote agent's cluster identity (no auth required):

```
curl http://REMOTE_IP:8080/v1/cluster/identity
```

The response looks like this:

```
{
  "peer_id": "01HXYZ...",
  "public_key": "base64...",
  "license_id": "lic_...",
  "cluster_id": "01HABC..."
}
```

Verify license_id matches your cluster.
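To compare the two agents directly, fetch the identity document from each one and compare license_id. The sketch below assumes the local agent answers on http://localhost:8080, as in the bootstrap example under Recovery Steps. It uses a small sed-based extractor so it runs without jq. The extractor handles the flat response shown above but is not a general JSON parser, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# json_field NAME: print the string value of "NAME" from flat JSON on stdin.
# Minimal sed-based extractor; use jq instead if it is installed.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# same_license LOCAL_URL REMOTE_URL: print license_id and cluster_id from both
# agents' identity endpoints; succeed only if the license_id values match.
same_license() {
  local_json=$(curl -sf "$1/v1/cluster/identity") || { echo "cannot reach $1" >&2; return 2; }
  remote_json=$(curl -sf "$2/v1/cluster/identity") || { echo "cannot reach $2" >&2; return 2; }

  local_lic=$(printf '%s\n' "$local_json" | json_field license_id)
  remote_lic=$(printf '%s\n' "$remote_json" | json_field license_id)
  echo "local  license_id=$local_lic cluster_id=$(printf '%s\n' "$local_json" | json_field cluster_id)"
  echo "remote license_id=$remote_lic cluster_id=$(printf '%s\n' "$remote_json" | json_field cluster_id)"

  # An empty license_id means the field was not found, which is not a match.
  [ -n "$local_lic" ] && [ "$local_lic" = "$remote_lic" ]
}
```

For example, same_license http://localhost:8080 http://REMOTE_IP:8080 exits with status 0 only when both agents report the same non-empty license_id.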

4. Test Network Both Directions

Handshakes require bidirectional communication. Test from both sides:

```
# From local to remote
curl http://REMOTE_IP:8080/livez

# From remote to local (via SSH)
ssh user@REMOTE_IP curl http://LOCAL_IP:8080/livez
```
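You can run both direction checks as one command that reports each direction separately. This sketch assumes both agents use port 8080, that curl is installed on the remote host, and that the SSH user (user by default, matching the examples above) can log in to the remote host. check_both_directions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# check_both_directions REMOTE_IP LOCAL_IP [SSH_USER]: probe /livez from this
# host to the remote agent and, over SSH, from the remote host back to this one.
check_both_directions() {
  remote="$1"; local_ip="$2"; ssh_user="${3:-user}"
  failures=0

  # Local host -> remote agent.
  if curl -sf --connect-timeout 5 -o /dev/null "http://$remote:8080/livez"; then
    echo "local -> remote: ok"
  else
    echo "local -> remote: FAILED"; failures=$((failures + 1))
  fi

  # Remote host -> local agent, run on the remote side via SSH.
  if ssh "$ssh_user@$remote" curl -sf --connect-timeout 5 -o /dev/null "http://$local_ip:8080/livez"; then
    echo "remote -> local: ok"
  else
    echo "remote -> local: FAILED"; failures=$((failures + 1))
  fi

  # Succeed only if both directions work.
  [ "$failures" -eq 0 ]
}
```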

Firewall Requirements

Ensure these ports are open:

| Port | Protocol | Direction | Purpose |
|------|----------|-----------|---------|
| 8080 | TCP | Bidirectional | API and handshake |
| 51840 | UDP | Bidirectional | IP tunnel (Tempest) |
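The exact commands depend on the host firewall. The following sketch assumes the agents use either ufw or firewalld; cloud security groups and other firewalls need equivalent rules. It prints commands that allow inbound traffic on both ports from the table. Because traffic flows in both directions, apply them on every agent host. arctic_firewall_rules is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# arctic_firewall_rules TOOL: print commands that allow inbound TCP 8080 and
# UDP 51840. TOOL is "ufw" or "firewalld" (an assumption about your hosts).
arctic_firewall_rules() {
  case "$1" in
    ufw)
      echo "ufw allow 8080/tcp"
      echo "ufw allow 51840/udp"
      ;;
    firewalld)
      # --permanent persists the rule; --reload applies it to the running config.
      echo "firewall-cmd --permanent --add-port=8080/tcp"
      echo "firewall-cmd --permanent --add-port=51840/udp"
      echo "firewall-cmd --reload"
      ;;
    *)
      echo "unsupported firewall tool: $1" >&2
      return 1
      ;;
  esac
}
```

The function prints the commands instead of running them, so you can review them first. Then apply them, for example with arctic_firewall_rules ufw | sudo sh.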

NAT Considerations

If agents are behind NAT:

Use port forwarding to expose port 8080
Specify the public address when adding peers
Consider a VPN for consistent addressing
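After you set up port forwarding, confirm that the public address reaches the intended agent and not some other listener on port 8080. The sketch below compares the peer_id returned through the public address with the agent's own peer_id. It assumes the identity endpoint is reachable through the forward. Run the check from a host outside the NAT (such as the remote peer), because many NAT devices do not support hairpin connections from inside. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# peer_id_of HOST:PORT: print the peer_id reported by the agent at HOST:PORT,
# or nothing if it does not answer. Uses sed so jq is not required.
peer_id_of() {
  curl -sf --connect-timeout 5 "http://$1/v1/cluster/identity" |
    sed -n 's/.*"peer_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# check_public_addr PUBLIC_HOST:PORT EXPECTED_PEER_ID: succeed only if the
# forwarded public address answers as the expected agent.
check_public_addr() {
  got=$(peer_id_of "$1")
  if [ -z "$got" ]; then
    echo "no identity response via $1: check the port forward and firewall" >&2
    return 1
  fi
  if [ "$got" != "$2" ]; then
    echo "$1 answered as $got, expected $2: the forward targets another agent" >&2
    return 1
  fi
  echo "$1 reaches agent $got"
}

# Usage:
#   on the agent behind NAT:   peer_id_of localhost:8080
#   on a host outside the NAT: check_public_addr PUBLIC_IP:8080 PEER_ID
```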

Recovery Steps

If handshakes consistently fail:

Restart agents on both sides:
```
systemctl restart arctic-agent
```

Re-bootstrap if needed. This deletes the agent's local database, so all local state is lost:

```
# Stop agent
systemctl stop arctic-agent

# Remove database
rm /opt/tillered/arctic.db

# Start and re-bootstrap
systemctl start arctic-agent
arctic bootstrap --url http://localhost:8080 --license-file license.json
```
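Two refinements can make the re-bootstrap less risky. First, keep a copy of the database instead of deleting it. Second, wait for the agent to pass /livez before running arctic bootstrap, because the agent may take a moment to start listening. The sketch below adds both. The backup, the ARCTIC_DB override, the 30-attempt wait, and the function names are assumptions, not part of the documented procedure. Run it as root on the affected agent.

```shell
#!/usr/bin/env bash
# Re-bootstrap with a database backup and a readiness wait (assumed additions).
ARCTIC_DB="${ARCTIC_DB:-/opt/tillered/arctic.db}"

# wait_for_livez BASE_URL [TRIES]: poll BASE_URL/livez once per second.
wait_for_livez() {
  url="$1"; tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    curl -sf -o /dev/null "$url/livez" && return 0
    i=$((i + 1))
    sleep 1
  done
  echo "agent at $url not live after $tries attempts" >&2
  return 1
}

# rebootstrap_agent LICENSE_FILE: stop, back up the DB, start, wait, bootstrap.
rebootstrap_agent() {
  license_file="$1"
  [ -f "$license_file" ] || { echo "license file not found: $license_file" >&2; return 1; }

  systemctl stop arctic-agent || return 1

  # Move the database aside with a timestamp instead of deleting it.
  if [ -f "$ARCTIC_DB" ]; then
    mv "$ARCTIC_DB" "$ARCTIC_DB.bak-$(date +%Y%m%d-%H%M%S)" || return 1
  fi

  systemctl start arctic-agent || return 1
  wait_for_livez http://localhost:8080 || return 1
  arctic bootstrap --url http://localhost:8080 --license-file "$license_file"
}
```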

Contact support if the issue persists after you have tried all of these steps.
