Tillered Arctic

Handshake Failures

How to diagnose and fix peer handshake errors

Troubleshooting Handshake Failures

This guide helps you diagnose and resolve peer handshake failures when adding agents to your cluster.

Understanding Handshakes

When you add a peer, Arctic performs a handshake:

  1. The local agent contacts the remote agent
  2. Both agents exchange Ed25519 public keys
  3. Both verify signatures against the shared license
  4. On success, both store each other's peer information

Common Error Messages

Connection Refused

Error: handshake failed: connection refused

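Cause: The local agent cannot open a TCP connection to the remote agent. The remote host actively rejected the attempt, which usually means nothing is listening on the target port or a firewall is rejecting the traffic.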

Resolution:

  1. Verify the remote agent is running:

    curl http://REMOTE_IP:8080/livez
  2. Check network connectivity:

    ping REMOTE_IP
    telnet REMOTE_IP 8080
  3. Verify that the firewall on the remote host allows inbound TCP traffic on port 8080
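
If telnet is not installed, the TCP check can be approximated with bash alone. The following is a minimal sketch, assuming bash and the coreutils timeout command are available; probe_port is a hypothetical helper name, not part of the Arctic CLI. It separates a fast rejection from a silent drop, which roughly corresponds to the "connection refused" and "connection timeout" errors in this guide.

```shell
#!/usr/bin/env bash
# probe_port HOST PORT (hypothetical helper, not an Arctic command)
# Prints "open", "timeout", or "refused-or-unreachable" and returns the
# exit status of the connection attempt.
probe_port() {
  local host="$1" port="$2" rc
  # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
  # timeout stops the attempt after 3 seconds and exits with status 124.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
  rc=$?
  case "$rc" in
    0)   echo "open" ;;
    124) echo "timeout" ;;
    *)   echo "refused-or-unreachable" ;;
  esac
  return "$rc"
}

# Usage:
#   probe_port REMOTE_IP 8080
```

A "timeout" result points toward the checks under Connection Timeout, while "refused-or-unreachable" suggests the agent is not running or not listening on that port.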

Connection Timeout

Error: handshake failed: connection timeout

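Cause: The connection attempt received no response before the timeout expired. This usually means packets are being silently dropped along the path (for example by a firewall or NAT device) rather than actively rejected.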

Resolution:

  1. Check for firewall rules blocking the connection
  2. If either agent is behind NAT, verify that port 8080 is forwarded (see NAT Considerations)
  3. Check the remote agent is listening on the expected interface
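
To check the listening interface in step 3, the output of ss can be filtered on the remote host. This is a minimal sketch, assuming a Linux host with the iproute2 ss command; listening_addrs is a hypothetical helper name. It reads `ss -ltn` output on stdin so the parsing can be tried on captured output as well.

```shell
#!/usr/bin/env bash
# listening_addrs PORT (hypothetical helper)
# Reads `ss -ltn` output on stdin and prints every local address:port
# that is in LISTEN state on PORT.
listening_addrs() {
  # Column 1 is the socket state, column 4 the local address:port.
  awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") { print $4 }'
}

# Usage on the remote host:
#   ss -ltn | listening_addrs 8080
```

An address of 127.0.0.1:8080 would mean the agent accepts only local connections, so peers on other hosts cannot reach it. An address such as 0.0.0.0:8080 means it listens on all IPv4 interfaces.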

License Mismatch

Error: handshake failed: license mismatch

Cause: The agents were bootstrapped with different licenses.

Resolution:

  1. Check license IDs on both agents:

    # On local agent
    arctic license show
    
    # On remote agent
    arctic license show --url http://REMOTE_IP:8080
  2. If different, re-bootstrap one agent with the correct license

Invalid Signature

Error: handshake failed: invalid signature

Cause: The peer's signature does not verify against the license public keys.

Resolution:

  1. Treat the peer key as possibly tampered with or corrupted
  2. Re-bootstrap the affected agent
  3. If the error persists, contact support

Peer Already Exists

Error: peer already exists in cluster

Cause: This peer was previously added to the cluster.

Resolution:

  1. List existing peers:

    arctic peers list
  2. If the peer appears in the list, it may already be connected and no further action is needed

  3. If you need to re-add, delete first:

    arctic peers delete PEER_ID --yes

Node Limit Exceeded

Error: handshake failed: node limit exceeded

Cause: Your license sets a maximum number of nodes, and adding this peer would exceed it.

Resolution:

  1. Check your license limits:

    arctic license show
  2. Remove unused peers to make room

  3. Contact your administrator to upgrade the license
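
The error messages in this section can be mapped to a first resolution step with a small lookup. The following is only a sketch: handshake_hint is a hypothetical helper, the matched strings are the error texts quoted in this guide, and the hints summarize the resolutions above.

```shell
#!/usr/bin/env bash
# handshake_hint MESSAGE (hypothetical helper, not an Arctic command)
# Prints a one-line first step for the handshake errors listed in this guide.
handshake_hint() {
  case "$1" in
    *"connection refused"*)  echo "Check that the remote agent is running and port 8080 is reachable" ;;
    *"connection timeout"*)  echo "Check firewall rules, NAT, and the remote listening interface" ;;
    *"license mismatch"*)    echo "Compare license IDs on both agents" ;;
    *"invalid signature"*)   echo "Re-bootstrap the affected agent" ;;
    *"peer already exists"*) echo "List peers; delete the peer first if it must be re-added" ;;
    *"node limit exceeded"*) echo "Check license limits and remove unused peers" ;;
    *)                       echo "Unrecognized error; rerun with --debug and check agent logs" ;;
  esac
}

# Usage:
#   handshake_hint "Error: handshake failed: license mismatch"
```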

Debugging Steps

1. Enable Debug Logging

Run the CLI with debug output:

arctic peers add REMOTE_IP:8080 --debug

Or trace HTTP requests:

arctic peers add REMOTE_IP:8080 --trace

2. Check Agent Logs

View logs on both agents:

# Local agent
journalctl -u arctic-agent -f

# Remote agent (via SSH)
ssh user@REMOTE_IP journalctl -u arctic-agent -f
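
Following the full log can be noisy. A minimal sketch for narrowing it down is shown here, assuming the relevant log lines contain the word "handshake" or "peer" as the error messages in this guide do; the actual agent log format is not documented here, and handshake_lines is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# handshake_lines (hypothetical helper)
# Reads log text on stdin and keeps only lines mentioning "handshake"
# or "peer", ignoring case.
handshake_lines() {
  grep -iE 'handshake|peer'
}

# Usage:
#   journalctl -u arctic-agent --since "15 minutes ago" --no-pager | handshake_lines
```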

3. Verify Cluster Identity

Check the remote agent's cluster identity (no auth required):

curl http://REMOTE_IP:8080/v1/cluster/identity

Response shows:

{
  "peer_id": "01HXYZ...",
  "public_key": "base64...",
  "license_id": "lic_...",
  "cluster_id": "01HABC..."
}

Verify that license_id matches the license ID reported by arctic license show on your local agent.
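
The comparison can be scripted. The following is a minimal sketch that pulls license_id out of the identity response with sed, so it does not depend on a JSON tool being installed. extract_license_id is a hypothetical helper name, and the usage lines assume the local agent serves the same identity endpoint at http://localhost:8080.

```shell
#!/usr/bin/env bash
# extract_license_id (hypothetical helper)
# Reads the identity JSON on stdin and prints the value of "license_id".
# A simple text match, adequate for the flat response shown above.
extract_license_id() {
  sed -n 's/.*"license_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (the local endpoint is an assumption):
#   local_id=$(curl -s http://localhost:8080/v1/cluster/identity | extract_license_id)
#   remote_id=$(curl -s http://REMOTE_IP:8080/v1/cluster/identity | extract_license_id)
#   [ "$local_id" = "$remote_id" ] && echo "licenses match" || echo "license mismatch"
```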

4. Test Network Both Directions

Handshakes require bidirectional communication. Test from both sides:

# From local to remote
curl http://REMOTE_IP:8080/livez

# From remote to local (via SSH)
ssh user@REMOTE_IP curl http://LOCAL_IP:8080/livez

Firewall Requirements

Ensure these ports are open:

Port    Protocol   Direction       Purpose
8080    TCP        Bidirectional   API and handshake
51840   UDP        Bidirectional   IP tunnel (Tempest)

NAT Considerations

If agents are behind NAT:

  1. Use port forwarding to expose port 8080
  2. Specify the public address when adding peers
  3. Consider a VPN for consistent addressing

Recovery Steps

If handshakes consistently fail:

  1. Restart agents on both sides:

    systemctl restart arctic-agent
  2. If restarting does not help, re-bootstrap the agent. This removes the local database, so all local state is lost:

    # Stop agent
    systemctl stop arctic-agent
    
    # Remove database
    rm /opt/tillered/arctic.db
    
    # Start and re-bootstrap
    systemctl start arctic-agent
    arctic bootstrap --url http://localhost:8080 --license-file license.json
  3. Contact support if the issue persists after trying all steps
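
Because re-bootstrapping removes the database, it may be worth keeping a copy before the rm step. This is a minimal sketch, assuming the database is the single file at /opt/tillered/arctic.db; if the agent keeps additional files alongside it, they are not covered. backup_db is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# backup_db PATH (hypothetical helper, not an Arctic command)
# Copies PATH to PATH.bak.<UTC timestamp> and prints the backup path.
backup_db() {
  local src="$1" dest
  dest="${src}.bak.$(date -u +%Y%m%dT%H%M%SZ)"
  # -p preserves mode and timestamps; -- guards against odd file names.
  cp -p -- "$src" "$dest" && echo "$dest"
}

# Usage (stop the agent first so the file is not being written):
#   systemctl stop arctic-agent
#   backup_db /opt/tillered/arctic.db
```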