Troubleshooting

This page is organized by symptom. Start with the behavior you see, run the smallest checks first, and only then move into invasive actions like rebuilding snapshots or changing flags.

Validator fails immediately on first boot

Checks

bash

journalctl -u zink-validator --since "10 minutes ago"
agave-validator --version
cat /proc/sys/vm/max_map_count
ulimit -n   # limit of the current shell; the service limit comes from systemd LimitNOFILE

Common causes

missing or unreadable identity keypair
incorrect file ownership or permissions
missing Linux tuning (vm.max_map_count, file limits, memlock)
wrong validator version for the cluster
bad flag syntax in the launch script

What to do

confirm the sol user can read the identity keypair and write logs
re-check the startup script line by line against the output of agave-validator --help
verify systemd LimitNOFILE and LimitMEMLOCK are set
compare the local validator version to the operator-provided cluster version
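The tuning checks above can be scripted so they give a clear pass or fail. The following is a minimal sketch, not an official Zink tool: the function name `check_min` and the two threshold values are assumptions for illustration, so substitute the minimums from your operator guide.

```bash
#!/usr/bin/env bash
# Assumed minimums (placeholders): replace with the values your cluster documents.
REQUIRED_MAX_MAP_COUNT=1000000
REQUIRED_NOFILE=1000000

# check_min NAME ACTUAL REQUIRED
# Prints "ok" or "LOW" for a numeric setting and returns non-zero when it is too low.
# systemd and ulimit may report "infinity" or "unlimited", which always passes.
check_min() {
  local name="$1" actual="$2" required="$3"
  case "$actual" in
    infinity|unlimited) echo "ok $name=$actual"; return 0 ;;
  esac
  if [ "$actual" -ge "$required" ]; then
    echo "ok $name=$actual (need >= $required)"
  else
    echo "LOW $name=$actual (need >= $required)"
    return 1
  fi
}

# Usage on the validator host:
#   check_min vm.max_map_count "$(cat /proc/sys/vm/max_map_count)" "$REQUIRED_MAX_MAP_COUNT"
#   check_min LimitNOFILE "$(systemctl show zink-validator -p LimitNOFILE --value)" "$REQUIRED_NOFILE"
```

Reading `LimitNOFILE` from `systemctl show` reports what the service actually receives, which can differ from what `ulimit -n` prints in an interactive shell.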

Node does not appear in gossip

Checks

bash

solana gossip | grep <IDENTITY_PUBKEY>
ss -tulpn | grep agave-validator
journalctl -u zink-validator --since "10 minutes ago"

Common causes

wrong --entrypoint
blocked UDP/TCP traffic on ports inside the --dynamic-port-range
wrong identity key configured
host networking, NAT, or public-IP issue
process never started successfully

What to do

verify firewall rules for the configured dynamic port range
confirm the identity public key matches the intended node
confirm the cluster entrypoints are current
inspect logs for bind failures or early startup exit
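When comparing firewall rules against the ports the validator is bound to, a small range check helps. This is a sketch under assumptions: `port_in_range` is a hypothetical helper, and `8000-8020` is only an example value, not necessarily the range your node uses.

```bash
#!/usr/bin/env bash
# port_in_range PORT RANGE
# RANGE uses the START-END form passed to --dynamic-port-range, for example 8000-8020.
# Succeeds when PORT lies inside the range (inclusive).
port_in_range() {
  local port="$1" start="${2%-*}" end="${2#*-}"
  [ "$port" -ge "$start" ] && [ "$port" -le "$end" ]
}

# Usage: print listening ports that fall outside the configured range.
# Ports such as the RPC port are expected to be outside it; the point is to
# confirm that the gossip and TPU ports are inside the range your firewall opens.
#   RANGE=8000-8020   # example value
#   ss -Htuln | awk '{ n = split($5, a, ":"); print a[n] }' | sort -un |
#     while read -r p; do port_in_range "$p" "$RANGE" || echo "outside range: $p"; done
```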

Node will not catch up

Checks

bash

solana catchup <IDENTITY_PUBKEY>
free -h
iostat -xz 1
journalctl -u zink-validator --since "30 minutes ago"

Common causes

disks too slow
insufficient RAM
old or bad snapshot
bandwidth bottleneck
wrong cluster or genesis-hash mismatch

What to do

verify the local hardware actually meets the published minimum requirements
confirm --expected-genesis-hash is correct
check for repeated snapshot replay or repair loops in logs
check whether ledger/accounts disks are saturating
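A genesis-hash mismatch is quick to rule out before looking at hardware. The sketch below assumes you have the expected hash from your cluster's bootstrap material; `genesis_matches` is a hypothetical helper name, and `solana genesis-hash` queries whichever RPC URL the CLI is currently configured for.

```bash
#!/usr/bin/env bash
# genesis_matches EXPECTED ACTUAL
# Prints "match" when both hashes are present and identical.
# Returns 1 on a mismatch and 2 when either value is empty.
genesis_matches() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "missing hash"
    return 2
  fi
  if [ "$1" = "$2" ]; then
    echo "match"
  else
    echo "MISMATCH expected=$1 actual=$2"
    return 1
  fi
}

# Usage:
#   EXPECTED="<value passed to --expected-genesis-hash>"
#   genesis_matches "$EXPECTED" "$(solana genesis-hash)"
```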

Validator is up but the vote account is not progressing

Checks

bash

solana vote-account <VOTE_ACCOUNT_PUBKEY>
solana validators | grep <VOTE_ACCOUNT_OR_IDENTITY_PUBKEY>
solana stakes <VOTE_ACCOUNT_PUBKEY>

Common causes

wrong vote account configured
authority mismatch (for example, the authorized voter is not the key the validator is running with)
node too far behind to vote reliably
onboarding / delegation issue on a permissioned cluster

What to do

confirm the configured vote account matches onboarding records
confirm the node is near cluster head
verify the vote account has the expected authorities
coordinate with Zink operators if validator-set admission is still pending

RPC node responds slowly or returns stale data

Checks

bash

curl -s http://127.0.0.1:8899/health
solana --url http://127.0.0.1:8899 slot
solana slot

Common causes

slot lag versus cluster head
overloaded account indexes
insufficient RAM or disk throughput
too much client traffic for a single node

What to do

compare the local slot to a trusted reference RPC
reduce unnecessary indexes
move heavy workloads to separate RPC nodes
place a proxy or load balancer in front of multiple nodes if traffic volume justifies it
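Comparing the local slot to a reference can be reduced to a single number. This is a minimal sketch: `slot_lag` is a hypothetical helper, and `REFERENCE_RPC_URL` is a placeholder for whichever trusted RPC endpoint you use.

```bash
#!/usr/bin/env bash
# slot_lag LOCAL_SLOT REFERENCE_SLOT
# Prints how many slots the local node trails the reference (0 if it is level or ahead).
slot_lag() {
  local lag=$(( $2 - $1 ))
  if [ "$lag" -lt 0 ]; then
    lag=0
  fi
  echo "$lag"
}

# Usage:
#   LOCAL=$(solana --url http://127.0.0.1:8899 slot)
#   REFERENCE=$(solana --url "$REFERENCE_RPC_URL" slot)   # placeholder URL
#   echo "local node trails by $(slot_lag "$LOCAL" "$REFERENCE") slots"
```

A lag that stays small and stable usually points at client load or indexes rather than sync problems; a lag that keeps growing suggests the node itself is falling behind.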

Snapshot sync or restart loops

Checks

bash

journalctl -u zink-validator --since "1 hour ago" | tail -200
lsblk
df -h /mnt/ledger /mnt/accounts

Common causes

corrupt snapshot or ledger state
not enough free disk space
WAL / replay failures
version mismatch with cluster

What to do

confirm enough free space exists on ledger and accounts volumes
confirm local agave-validator version matches cluster expectations
review whether --wal-recovery-mode skip_any_corrupted_record is appropriate for your situation
only rebuild local state after less destructive diagnostics fail
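The free-space check can be made explicit with a threshold. In this sketch the helper names `free_gib` and `has_space` are hypothetical, and the 500 GiB figure in the usage note is a placeholder rather than a Zink requirement.

```bash
#!/usr/bin/env bash
# free_gib PATH
# Prints the free space on the filesystem holding PATH, in whole GiB.
# df -Pk gives POSIX-format output in 1024-byte blocks; column 4 is "Available".
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# has_space PATH MIN_GIB
# Succeeds when at least MIN_GIB whole GiB are free.
has_space() {
  [ "$(free_gib "$1")" -ge "$2" ]
}

# Usage (500 is a placeholder threshold):
#   for vol in /mnt/ledger /mnt/accounts; do
#     has_space "$vol" 500 || echo "low space on $vol: $(free_gib "$vol") GiB free"
#   done
```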

Wrong cluster / wrong endpoint mistakes

This one is embarrassingly common.

Checks

bash

solana config get
solana cluster-version

What to do

verify the CLI is pointed at the intended Zink cluster
verify your application config uses the same RPC URL you think it does
verify browser wallets are not silently connected to a different network
verify the validator startup script is using the current Zink bootstrap bundle instead of copied old values
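Checking the CLI target can be automated by reading the RPC URL out of `solana config get`. The sketch below assumes that output contains a line beginning with `RPC URL:`, as current CLI versions print; `rpc_url_from_config` is a hypothetical helper and `EXPECTED_RPC_URL` is a placeholder.

```bash
#!/usr/bin/env bash
# rpc_url_from_config
# Reads `solana config get` output on stdin and prints the configured RPC URL,
# with trailing whitespace removed.
rpc_url_from_config() {
  awk -F': ' '/^RPC URL:/ { sub(/[ \t]+$/, "", $2); print $2 }'
}

# Usage:
#   EXPECTED_RPC_URL="<your Zink RPC URL>"   # placeholder
#   ACTUAL=$(solana config get | rpc_url_from_config)
#   [ "$ACTUAL" = "$EXPECTED_RPC_URL" ] || echo "CLI points at $ACTUAL, expected $EXPECTED_RPC_URL"
```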

Zink recommendation

When debugging production incidents, check cluster targeting first. A surprising amount of “bad data” is just a tool or wallet pointed at the wrong chain.

Clock drift or time-sync weirdness

Checks

bash

timedatectl status

Common causes

NTP disabled
host clock drift after suspend / resume or VM host issues

What to do

enable a reliable time-sync service such as systemd-timesyncd or chrony
correct time drift before chasing consensus symptoms that may only be side effects
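For monitoring or pre-flight scripts, the synchronization state can be read from `timedatectl status` instead of checked by eye. This sketch assumes the output contains either a `System clock synchronized:` line (newer systemd) or an `NTP synchronized:` line (older releases); `clock_synced` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# clock_synced
# Reads `timedatectl status` output on stdin and succeeds only when the
# clock is reported as synchronized.
clock_synced() {
  grep -Eq '^[[:space:]]*(System clock|NTP) synchronized:[[:space:]]*yes'
}

# Usage:
#   timedatectl status | clock_synced || echo "clock is not synchronized; fix time sync first"
```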

Before escalating

Gather:

node identity pubkey
vote account pubkey, if applicable
exact cluster / RPC URL
current validator version
recent log excerpt
output from solana catchup, solana gossip, and solana vote-account where relevant

That saves a lot of back-and-forth if you need help from the Zink team.
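The items above can be collected into one file before you open a request. This is a sketch only: `run_section` and `collect_report` are hypothetical helpers, the report file name is arbitrary, and the command list should be adjusted to your node. `solana catchup` and `solana vote-account` are left out here because they need arguments specific to your setup; add them where relevant.

```bash
#!/usr/bin/env bash
# run_section TITLE CMD [ARGS...]
# Prints a header followed by the command's output. A failing or missing
# command is noted in the report instead of aborting the collection.
run_section() {
  local title="$1"
  shift
  echo "== $title =="
  "$@" 2>&1 || echo "(command failed or is unavailable: $1)"
}

# collect_report
# Gathers the version, CLI target, gossip view, and recent logs.
collect_report() {
  run_section "validator version" agave-validator --version
  run_section "cli config" solana config get
  run_section "gossip" solana gossip
  run_section "recent logs" journalctl -u zink-validator --since "30 minutes ago" --no-pager
}

# Usage:
#   collect_report > zink-report.txt
```

Review the file before sharing it, since logs can contain hostnames and IP addresses.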
