Difficult debugging problem

erlang
phoenix
deployment

#1

I’m having a very frustrating issue debugging something, and I’m now resorting to the forum for help.

I’m sure people will recognize this question as I’ve exhausted a lot of options trying to solve it.

I have an API that currently gets ~100,000 incoming requests/minute (1,600 RPS). For each incoming request, I create 2 to 4 outbound requests which each have a 500ms timeout. If any requests don’t complete within 500ms, I take the ones which completed successfully, analyze the results, and then return the response back to the original incoming API request. So basically 100K incoming equates to 200K-400K outgoing.
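The fan-out with a deadline described above can be sketched roughly as follows. This is only an illustration of the pattern using Python's asyncio, not the actual Elixir app; `fake_backend`, `fan_out`, and the delays are hypothetical stand-ins for the real outbound HTTP calls.

```python
import asyncio


async def fake_backend(name: str, delay_s: float) -> str:
    """Hypothetical stand-in for one outbound HTTP request."""
    await asyncio.sleep(delay_s)
    return f"{name}: ok"


async def fan_out(calls, timeout_s: float = 0.5):
    """Run all calls concurrently and keep whatever finished within timeout_s."""
    tasks = [asyncio.ensure_future(c) for c in calls]
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    # Requests that missed the deadline are cancelled, not awaited further.
    for task in pending:
        task.cancel()
    # Let the cancellations settle so no task is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    # Keep only results from tasks that completed without raising.
    results = sorted(t.result() for t in done if t.exception() is None)
    return results, len(pending)


def demo():
    calls = [
        fake_backend("a", 0.01),
        fake_backend("b", 0.02),
        fake_backend("c", 2.0),  # slower than the 500 ms budget
    ]
    return asyncio.run(fan_out(calls, timeout_s=0.5))


if __name__ == "__main__":
    print(demo())
```

In the demo, the two fast calls are returned and the slow one is counted as timed out, which mirrors the "take the ones which completed" behaviour.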

I need to scale incoming requests to 25K RPS (to start) and beyond… upwards of 200K incoming RPS (400K-800K outgoing)

I log all outgoing request error codes, timeouts, etc.

I’m encountering an error where outgoing requests start to report timeouts at ~20K incoming req/min (a very low number… that’s only about 1,000 outgoing RPS). If it goes much higher than that, I see outgoing request timeout rates as high as 80% to 95%.
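For reference, the per-minute and per-second figures in this post can be cross-checked with a few lines of arithmetic. A minimal sketch, assuming a fan-out of 2 to 4 outgoing requests per incoming request as stated above; the function name `outgoing_rps` is made up for illustration.

```python
def outgoing_rps(incoming_per_min: float, fanout: int) -> float:
    """Convert an incoming per-minute rate into outgoing requests per second."""
    return incoming_per_min / 60 * fanout


if __name__ == "__main__":
    for label, per_min in [("stated load", 100_000), ("timeout threshold", 20_000)]:
        low = outgoing_rps(per_min, 2)
        high = outgoing_rps(per_min, 4)
        print(f"{label}: {per_min / 60:.0f} incoming RPS, "
              f"{low:.0f}-{high:.0f} outgoing RPS")
```

At 20K incoming req/min this gives roughly 333 incoming RPS and 667 to 1,333 outgoing RPS, consistent with the "about 1,000 outgoing RPS" estimate.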

I’ve hired a few people to help me figure this out, and everyone is stumped. Here are my current findings:

  • Increased ulimit
  • Added +K and +Q 134217727 (total ports) in vm.args
  • Migrating from AWS to Google Cloud helped by about a factor of 2-3x but didn’t fix the issue. Currently I’m on Google Cloud and the issue still happens at ~20K/min (incoming)
  • I know for a fact it’s not the APIs themselves (I’ll spare you all those details)
  • I’ve tried multiple HTTP libraries (Tesla, Buoy, Machine Gun, HTTPoison), all have basically the same results with different levels of timeouts and errors
  • I’ve tried not using a pool, and adjusting pool settings. I don’t remember exactly how it affected things; I just know it didn’t do much

Most importantly… observer seems to report no issues. Every single person that looks is like “hmm… you’re right, observer shows no MsgQ backups, and Reds is a normal number. There’s no IO load.”

So far, everyone has thought this is some issue lying outside of Erlang (all things seem to point to it, especially considering migrating from AWS to GKE helped).

However, last night I did a test which 100% confirms to me the issue is not ‘outside’ but within Erlang/Elixir/the HTTP library.

First, I created a node pool in Kubernetes with a single node: a highcpu-64 (64 CPUs and 240GB memory). There are zero other nodes running.

For those not familiar with kubernetes, a “node” is a server instance, and a “pod” is an instance of the app running.

I then made a single pod running on that node. I observed the app would start having timeouts at ~20K/min. Obviously the CPU and memory usage on the machine were each at about 1% because it’s a huge server. Again, when timeouts occur, observer has no backed up MsgQ and Reds is fairly normal.

I then increased pods to 12, all running on the same node (no changes were made to the node, networking settings, etc. All variables are the same).

Each instance of the app logs stats, so now instead of seeing 1 set of stats @ 20K/min I see 12 sets of stats @ 1.6K/min each. Lo and behold, the problem magically goes away. In fact, I can now increase the traffic to 10K/min per app instance (10,000 * 12 = 120K/min), with < 1% timeouts.

To phrase it another way: 1 app on 1 64-CPU server = 20K/min max throughput before errors. 12 apps on the SAME 64-CPU server with no other changes = 10K/min throughput per app without errors. That’s 120K/min total, which is 6x the 20K/min limit, so the max throughput without errors absolutely increased by changing only Erlang/Elixir and no outside server settings, firewall settings, etc. I’m sure it could even go up to ~20K/min (each) again before it starts getting errors, but it’s difficult to control the traffic level.

This to me proves the problem lies in the Elixir/Erlang app space and not some external issue (UNLESS each instance of Erlang is somehow allocating outside resources based on a limit, and creating new app instances is “fixing it” because it’s getting more of those outside resources).

I guess I’m out of ideas of what to think about / try. If anyone has any complicated Linux commands I could run, like using strace or something to help debug, I’d love to hear them. I could sit here all day playing with pool settings and HTTP libraries, but my gut says it’s not that, because in every library I try there’s always some low limit, which makes me think the problem is somewhere else.

Sorry this isn’t more specific. There are so many things I’ve tried and SO many knobs and dials to turn that it’s extremely difficult to establish a 100% cause / effect, especially considering the traffic level always fluctuates. So one minute I’ll be testing something at 8K/min, but then I’ll change something and the traffic level goes to 13K/min, changing variables.


#2

What CPU/memory limits have you configured for the pods?

How much memory is the BEAM VM using when timeouts start to occur?
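One way to check the effective CPU limit from inside the container is to read the CFS quota files. The following is a rough sketch, assuming a cgroup v1 layout mounted at `/sys/fs/cgroup/cpu`; that path and the helper names are assumptions and may differ on your nodes.

```python
from pathlib import Path
from typing import Optional

# Assumed cgroup v1 location; the path differs on cgroup v2 hosts.
CPU_CGROUP = Path("/sys/fs/cgroup/cpu")


def cpu_limit(quota_us: int, period_us: int) -> Optional[float]:
    """Return the CPU limit in cores, or None when no quota is set."""
    if quota_us <= 0 or period_us <= 0:
        return None  # a quota of -1 means "unlimited"
    return quota_us / period_us


def read_cpu_limit(base: Path = CPU_CGROUP) -> Optional[float]:
    """Read the CFS quota and period files; None if unreadable or unlimited."""
    try:
        quota = int((base / "cpu.cfs_quota_us").read_text())
        period = int((base / "cpu.cfs_period_us").read_text())
    except (OSError, ValueError):
        return None
    return cpu_limit(quota, period)


if __name__ == "__main__":
    print("CPU limit (cores):", read_cpu_limit())
```

A result of `None` here means either that no quota is set or that the files could not be read, so it is worth confirming against the pod spec as well.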


#3

Based just on reading through this, nothing immediately stands out. My main purpose in replying is just to say that my experience with kubernetes heavily suggests

is very likely an issue.

You mention gaining some benefit going from AWS to GCP, what instance types were used on each? Did you get any benefit from using images with higher resources?


#4

As hinted at by the first reply, when scaling Elixir you’ll always want to first increase the number of CPUs available per pod before you scale the number of pods. There’s no point in running twelve 1-CPU pods when you could just run 1 pod with 12 CPUs. The latter will perform much better.

EDIT: When you spawn it with just 1 pod, how many schedulers does the log report as available?


#5

This is pretty far outside what I know so the following is just a theory… What do you see when you run ss -s in the container to get a summary of the socket states when the problem is happening?

My (probably wrong!) theory is that you’re running out of space in a NAT table somewhere. You’re connecting the same container IP to the same backend service IPs over and over again, so if those connections aren’t being closed properly (because you time out after 500ms and maybe don’t let them shut down properly) then they could be stuck in CLOSE_WAIT. Those connections won’t be removed until they’ve timed out (kube-proxy docs say 1 hour, though I don’t know if this is the actual relevant timeout).

I’ll stress again I don’t really know much about k8s networking but I have read that outgoing connections are SNAT’d.
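If `ss` isn’t available in the container, a similar per-state summary can be pulled from `/proc/net/tcp`. This is a minimal sketch, assuming a Linux container with IPv4 sockets only (`/proc/net/tcp6` would need the same treatment); the `SAMPLE` text is made-up data used only when the file can’t be read.

```python
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp: str) -> Counter:
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        counts[TCP_STATES.get(fields[3].upper(), fields[3])] += 1
    return counts


# Made-up sample in the /proc/net/tcp layout, for illustration only.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1
   1: 0A000002:8C3A 0A000003:0050 08 00000000:00000000 00:00000000 00000000     0        0 2
   2: 0A000002:8C3B 0A000003:0050 08 00000000:00000000 00:00000000 00000000     0        0 3
   3: 0A000002:8C3C 0A000003:0050 06 00000000:00000000 00:00000000 00000000     0        0 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the made-up sample off Linux
    for state, n in count_states(text).most_common():
        print(f"{state:12} {n}")
```

A large and growing CLOSE_WAIT or TIME_WAIT count while the timeouts are happening would support the theory above; a flat count would argue against it.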


#6

I don’t believe I have any “limits” imposed. Is that in my YAML file? I will double check if you tell me that’s where they are.

Here are my results from inside the Docker container:

bash-4.4# ulimit -a

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 967296
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

bash-4.4# sysctl -a

abi.vsyscall32 = 1
debug.exception-trace = 1
debug.kprobes-optimization = 1
dev.hpet.max-user-freq = 64
dev.scsi.logging_level = 0
fs.aio-max-nr = 65536
fs.aio-nr = 0
fs.dentry-state = 282753	257773	45	0	0	0
fs.dir-notify-enable = 1
fs.epoll.max_user_watches = 50714173
fs.file-max = 24732833
fs.file-nr = 3712	0	24732833
fs.inode-nr = 194474	521
fs.inode-state = 194474	521	0	0	0	0	0
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 8192
fs.lease-break-time = 45
fs.leases-enable = 1
fs.mount-max = 100000
fs.mqueue.msg_default = 10
fs.mqueue.msg_max = 10
fs.mqueue.msgsize_default = 8192
fs.mqueue.msgsize_max = 8192
fs.mqueue.queues_max = 256
fs.nr_open = 1048576
fs.overflowgid = 65534
fs.overflowuid = 65534
fs.pipe-max-size = 1048576
fs.pipe-user-pages-hard = 0
fs.pipe-user-pages-soft = 16384
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
fs.quota.allocated_dquots = 0
fs.quota.cache_hits = 0
fs.quota.drops = 0
fs.quota.free_dquots = 0
fs.quota.lookups = 0
fs.quota.reads = 0
fs.quota.syncs = 240
fs.quota.writes = 0
fs.suid_dumpable = 2
kernel.acct = 4	2	30
kernel.acpi_video_flags = 0
kernel.auto_msgmni = 0
kernel.bootloader_type = 114
kernel.bootloader_version = 2
kernel.cad_pid = 0
kernel.cap_last_cap = 37
kernel.core_pattern = core.%e.%p.%t
kernel.core_pipe_limit = 4
kernel.core_uses_pid = 0
kernel.ctrl-alt-del = 0
kernel.dmesg_restrict = 0
kernel.domainname = (none)
kernel.ftrace_dump_on_oops = 0
kernel.ftrace_enabled = 1
kernel.hardlockup_all_cpu_backtrace = 0
kernel.hardlockup_panic = 1
kernel.hostname = app-web-7977665d78-pgsxv
kernel.hotplug =
kernel.hung_task_check_count = 4194304
kernel.hung_task_panic = 1
kernel.hung_task_timeout_secs = 300
kernel.hung_task_warnings = 10
kernel.io_delay_type = 1
kernel.keys.gc_delay = 300
kernel.keys.maxbytes = 20000
kernel.keys.maxkeys = 200
kernel.keys.root_maxbytes = 25000000
kernel.keys.root_maxkeys = 1000000
kernel.kptr_restrict = 1
kernel.max_lock_depth = 1024
kernel.modprobe = /sbin/modprobe
kernel.modules_disabled = 0
kernel.msg_next_id = -1
kernel.msgmax = 8192
kernel.msgmnb = 16384
kernel.msgmni = 32000
kernel.ngroups_max = 65536
kernel.nmi_watchdog = 0
kernel.ns_last_pid = 1484
kernel.osrelease = 4.14.65+
kernel.ostype = Linux
kernel.overflowgid = 65534
kernel.overflowuid = 65534
kernel.panic = 10
kernel.panic_on_io_nmi = 0
kernel.panic_on_oops = 1
kernel.panic_on_rcu_stall = 0
kernel.panic_on_stackoverflow = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.panic_on_warn = 0
kernel.perf_cpu_time_max_percent = 25
kernel.perf_event_max_contexts_per_stack = 8
kernel.perf_event_max_sample_rate = 100000
kernel.perf_event_max_stack = 127
kernel.perf_event_mlock_kb = 516
kernel.perf_event_paranoid = 2
kernel.pid_max = 4194304
kernel.poweroff_cmd = /sbin/poweroff
kernel.print-fatal-signals = 0
kernel.printk = 7	4	1	7
kernel.printk_delay = 0
kernel.printk_devkmsg = ratelimit
kernel.printk_ratelimit = 5
kernel.printk_ratelimit_burst = 10
kernel.pty.max = 4096
kernel.pty.nr = 3
kernel.pty.reserve = 1024
kernel.random.boot_id = 6a238ad6-593d-4c88-9355-b4754334ee39
kernel.random.entropy_avail = 3396
kernel.random.poolsize = 4096
kernel.random.read_wakeup_threshold = 64
kernel.random.urandom_min_reseed_secs = 60
kernel.random.uuid = 5b0c810b-bfdd-46fa-9ce6-9d69c79d78bf
kernel.random.write_wakeup_threshold = 896
kernel.randomize_va_space = 2
kernel.real-root-dev = 0
kernel.sched_cfs_bandwidth_slice_us = 5000
kernel.sched_child_runs_first = 0
kernel.sched_rr_timeslice_ms = 100
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.seccomp.actions_avail = kill_process kill_thread trap errno trace log allow
kernel.seccomp.actions_logged = kill_process kill_thread trap errno trace log
kernel.selinux_enforcing = 0
kernel.sem = 32000	1024000000	500	32000
kernel.sem_next_id = -1
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096
kernel.soft_watchdog = 1
kernel.softlockup_all_cpu_backtrace = 0
kernel.softlockup_panic = 1
kernel.sysctl_writes_strict = 1
kernel.sysrq = 1
kernel.tainted = 0
kernel.threads-max = 1934592
kernel.timer_migration = 1
kernel.traceoff_on_warning = 0
kernel.tracepoint_printk = 0
kernel.unknown_nmi_panic = 0
kernel.unprivileged_bpf_disabled = 1
sysctl: error reading key 'kernel.unprivileged_userns_apparmor_policy': Operation not permitted
kernel.usermodehelper.bset = 4294967295	63
kernel.usermodehelper.inheritable = 4294967295	63
kernel.version = #1 SMP Thu Oct 25 10:42:50 PDT 2018
kernel.watchdog = 1
kernel.watchdog_cpumask = 0-63
kernel.watchdog_thresh = 10
kernel.yama.ptrace_scope = 1
net.core.android_paranoid = 0
net.core.somaxconn = 128
net.core.xfrm_acq_expires = 30
net.core.xfrm_aevent_etime = 10
net.core.xfrm_aevent_rseqth = 2
net.core.xfrm_larval_drop = 1
net.ipv4.conf.all.accept_local = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.arp_accept = 0
net.ipv4.conf.all.arp_announce = 0
net.ipv4.conf.all.arp_filter = 0
net.ipv4.conf.all.arp_ignore = 0
net.ipv4.conf.all.arp_notify = 0
net.ipv4.conf.all.bootp_relay = 0
net.ipv4.conf.all.disable_policy = 0
net.ipv4.conf.all.disable_xfrm = 0
net.ipv4.conf.all.drop_gratuitous_arp = 0
net.ipv4.conf.all.drop_unicast_in_l2_multicast = 0
net.ipv4.conf.all.force_igmp_version = 0
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.all.igmpv2_unsolicited_report_interval = 10000
net.ipv4.conf.all.igmpv3_unsolicited_report_interval = 1000
net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.conf.all.medium_id = 0
net.ipv4.conf.all.promote_secondaries = 0
net.ipv4.conf.all.proxy_arp = 0
net.ipv4.conf.all.proxy_arp_pvlan = 0
net.ipv4.conf.all.route_localnet = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.secure_redirects = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.shared_media = 1
net.ipv4.conf.all.src_valid_mark = 0
net.ipv4.conf.all.tag = 0
net.ipv4.conf.default.accept_local = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.default.arp_accept = 0
net.ipv4.conf.default.arp_announce = 0
net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.conf.default.arp_notify = 0
net.ipv4.conf.default.bootp_relay = 0
net.ipv4.conf.default.disable_policy = 0
net.ipv4.conf.default.disable_xfrm = 0
net.ipv4.conf.default.drop_gratuitous_arp = 0
net.ipv4.conf.default.drop_unicast_in_l2_multicast = 0
net.ipv4.conf.default.force_igmp_version = 0
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.default.igmpv2_unsolicited_report_interval = 10000
net.ipv4.conf.default.igmpv3_unsolicited_report_interval = 1000
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.log_martians = 0
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.default.medium_id = 0
net.ipv4.conf.default.promote_secondaries = 0
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.default.proxy_arp_pvlan = 0
net.ipv4.conf.default.route_localnet = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.secure_redirects = 1
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.default.shared_media = 1
net.ipv4.conf.default.src_valid_mark = 0
net.ipv4.conf.default.tag = 0
net.ipv4.conf.eth0.accept_local = 0
net.ipv4.conf.eth0.accept_redirects = 0
net.ipv4.conf.eth0.accept_source_route = 0
net.ipv4.conf.eth0.arp_accept = 0
net.ipv4.conf.eth0.arp_announce = 0
net.ipv4.conf.eth0.arp_filter = 0
net.ipv4.conf.eth0.arp_ignore = 0
net.ipv4.conf.eth0.arp_notify = 0
net.ipv4.conf.eth0.bootp_relay = 0
net.ipv4.conf.eth0.disable_policy = 0
net.ipv4.conf.eth0.disable_xfrm = 0
net.ipv4.conf.eth0.drop_gratuitous_arp = 0
net.ipv4.conf.eth0.drop_unicast_in_l2_multicast = 0
net.ipv4.conf.eth0.force_igmp_version = 0
net.ipv4.conf.eth0.forwarding = 1
net.ipv4.conf.eth0.igmpv2_unsolicited_report_interval = 10000
net.ipv4.conf.eth0.igmpv3_unsolicited_report_interval = 1000
net.ipv4.conf.eth0.ignore_routes_with_linkdown = 0
net.ipv4.conf.eth0.log_martians = 0
net.ipv4.conf.eth0.mc_forwarding = 0
net.ipv4.conf.eth0.medium_id = 0
net.ipv4.conf.eth0.promote_secondaries = 0
net.ipv4.conf.eth0.proxy_arp = 0
net.ipv4.conf.eth0.proxy_arp_pvlan = 0
net.ipv4.conf.eth0.route_localnet = 0
net.ipv4.conf.eth0.rp_filter = 1
net.ipv4.conf.eth0.secure_redirects = 1
net.ipv4.conf.eth0.send_redirects = 0
net.ipv4.conf.eth0.shared_media = 1
net.ipv4.conf.eth0.src_valid_mark = 0
net.ipv4.conf.eth0.tag = 0
net.ipv4.conf.lo.accept_local = 0
net.ipv4.conf.lo.accept_redirects = 0
net.ipv4.conf.lo.accept_source_route = 0
net.ipv4.conf.lo.arp_accept = 0
net.ipv4.conf.lo.arp_announce = 0
net.ipv4.conf.lo.arp_filter = 0
net.ipv4.conf.lo.arp_ignore = 0
net.ipv4.conf.lo.arp_notify = 0
net.ipv4.conf.lo.bootp_relay = 0
net.ipv4.conf.lo.disable_policy = 1
net.ipv4.conf.lo.disable_xfrm = 1
net.ipv4.conf.lo.drop_gratuitous_arp = 0
net.ipv4.conf.lo.drop_unicast_in_l2_multicast = 0
net.ipv4.conf.lo.force_igmp_version = 0
net.ipv4.conf.lo.forwarding = 1
net.ipv4.conf.lo.igmpv2_unsolicited_report_interval = 10000
net.ipv4.conf.lo.igmpv3_unsolicited_report_interval = 1000
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.log_martians = 0
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.lo.medium_id = 0
net.ipv4.conf.lo.promote_secondaries = 0
net.ipv4.conf.lo.proxy_arp = 0
net.ipv4.conf.lo.proxy_arp_pvlan = 0
net.ipv4.conf.lo.route_localnet = 0
net.ipv4.conf.lo.rp_filter = 1
net.ipv4.conf.lo.secure_redirects = 1
net.ipv4.conf.lo.send_redirects = 0
net.ipv4.conf.lo.shared_media = 1
net.ipv4.conf.lo.src_valid_mark = 0
net.ipv4.conf.lo.tag = 0
net.ipv4.fib_multipath_hash_policy = 0
net.ipv4.fib_multipath_use_neigh = 0
net.ipv4.fwmark_reflect = 0
net.ipv4.icmp_echo_ignore_all = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_errors_use_inbound_ifaddr = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.icmp_ratelimit = 1000
net.ipv4.icmp_ratemask = 6168
net.ipv4.igmp_link_local_mcast_reports = 1
net.ipv4.igmp_max_memberships = 20
net.ipv4.igmp_max_msf = 10
net.ipv4.igmp_qrv = 2
net.ipv4.ip_default_ttl = 64
net.ipv4.ip_dynaddr = 0
net.ipv4.ip_early_demux = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_use_pmtu = 0
net.ipv4.ip_local_port_range = 32768	60999
net.ipv4.ip_local_reserved_ports =
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.ip_nonlocal_bind = 0
net.ipv4.ip_unprivileged_port_start = 1024
net.ipv4.ipfrag_high_thresh = 4194304
net.ipv4.ipfrag_low_thresh = 3145728
net.ipv4.ipfrag_max_dist = 64
net.ipv4.ipfrag_time = 30
net.ipv4.neigh.eth0.anycast_delay = 100
net.ipv4.neigh.eth0.app_solicit = 0
net.ipv4.neigh.eth0.base_reachable_time = 30
net.ipv4.neigh.eth0.base_reachable_time_ms = 30000
net.ipv4.neigh.eth0.delay_first_probe_time = 5
net.ipv4.neigh.eth0.gc_stale_time = 60
net.ipv4.neigh.eth0.locktime = 100
net.ipv4.neigh.eth0.mcast_resolicit = 0
net.ipv4.neigh.eth0.mcast_solicit = 3
net.ipv4.neigh.eth0.proxy_delay = 80
net.ipv4.neigh.eth0.proxy_qlen = 64
net.ipv4.neigh.eth0.retrans_time = 100
net.ipv4.neigh.eth0.retrans_time_ms = 1000
net.ipv4.neigh.eth0.ucast_solicit = 3
net.ipv4.neigh.eth0.unres_qlen = 101
net.ipv4.neigh.eth0.unres_qlen_bytes = 212992
net.ipv4.neigh.lo.anycast_delay = 100
net.ipv4.neigh.lo.app_solicit = 0
net.ipv4.neigh.lo.base_reachable_time = 30
net.ipv4.neigh.lo.base_reachable_time_ms = 30000
net.ipv4.neigh.lo.delay_first_probe_time = 5
net.ipv4.neigh.lo.gc_stale_time = 60
net.ipv4.neigh.lo.locktime = 100
net.ipv4.neigh.lo.mcast_resolicit = 0
net.ipv4.neigh.lo.mcast_solicit = 3
net.ipv4.neigh.lo.proxy_delay = 80
net.ipv4.neigh.lo.proxy_qlen = 64
net.ipv4.neigh.lo.retrans_time = 100
net.ipv4.neigh.lo.retrans_time_ms = 1000
net.ipv4.neigh.lo.ucast_solicit = 3
net.ipv4.neigh.lo.unres_qlen = 101
net.ipv4.neigh.lo.unres_qlen_bytes = 212992
net.ipv4.ping_group_range = 1	0
net.ipv4.tcp_base_mss = 1024
net.ipv4.tcp_default_init_rwnd = 20
net.ipv4.tcp_early_demux = 1
net.ipv4.tcp_ecn = 2
net.ipv4.tcp_ecn_fallback = 1
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_fwmark_accept = 0
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_l3mdev_accept = 0
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_mtu_probing = 0
net.ipv4.tcp_notsent_lowat = 4294967295
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_probe_interval = 600
net.ipv4.tcp_probe_threshold = 8
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_sack = 1
net.ipv4.tcp_syn_retries = 6
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.udp_early_demux = 1
net.ipv4.udp_l3mdev_accept = 0
net.ipv4.vs.am_droprate = 10
net.ipv4.vs.amemthresh = 1024
net.ipv4.vs.backup_only = 0
net.ipv4.vs.cache_bypass = 0
net.ipv4.vs.conn_reuse_mode = 1
net.ipv4.vs.conntrack = 0
net.ipv4.vs.drop_entry = 0
net.ipv4.vs.drop_packet = 0
net.ipv4.vs.expire_nodest_conn = 0
net.ipv4.vs.expire_quiescent_template = 0
net.ipv4.vs.ignore_tunneled = 0
net.ipv4.vs.nat_icmp_send = 0
net.ipv4.vs.pmtu_disc = 1
net.ipv4.vs.schedule_icmp = 0
net.ipv4.vs.secure_tcp = 0
net.ipv4.vs.sloppy_sctp = 0
net.ipv4.vs.sloppy_tcp = 0
net.ipv4.vs.snat_reroute = 1
net.ipv4.vs.sync_persist_mode = 0
net.ipv4.vs.sync_ports = 1
net.ipv4.vs.sync_qlen_max = 1930222
net.ipv4.vs.sync_refresh_period = 0
net.ipv4.vs.sync_retries = 0
net.ipv4.vs.sync_sock_size = 0
net.ipv4.vs.sync_threshold = 3	50
net.ipv4.vs.sync_version = 1
net.ipv4.xfrm4_gc_thresh = 32768
net.ipv6.anycast_src_echo_reply = 0
net.ipv6.auto_flowlabels = 1
net.ipv6.bindv6only = 0
net.ipv6.conf.all.accept_dad = 0
net.ipv6.conf.all.accept_ra = 1
net.ipv6.conf.all.accept_ra_defrtr = 1
net.ipv6.conf.all.accept_ra_from_local = 0
net.ipv6.conf.all.accept_ra_min_hop_limit = 1
net.ipv6.conf.all.accept_ra_mtu = 1
net.ipv6.conf.all.accept_ra_pinfo = 1
net.ipv6.conf.all.accept_redirects = 1
net.ipv6.conf.all.accept_source_route = 0
net.ipv6.conf.all.addr_gen_mode = 0
net.ipv6.conf.all.autoconf = 1
net.ipv6.conf.all.dad_transmits = 1
net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.all.disable_policy = 0
net.ipv6.conf.all.drop_unicast_in_l2_multicast = 0
net.ipv6.conf.all.drop_unsolicited_na = 0
net.ipv6.conf.all.enhanced_dad = 1
net.ipv6.conf.all.force_mld_version = 0
net.ipv6.conf.all.force_tllao = 0
net.ipv6.conf.all.forwarding = 0
net.ipv6.conf.all.hop_limit = 64
net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.all.keep_addr_on_down = 0
net.ipv6.conf.all.max_addresses = 16
net.ipv6.conf.all.max_desync_factor = 600
net.ipv6.conf.all.mldv1_unsolicited_report_interval = 10000
net.ipv6.conf.all.mldv2_unsolicited_report_interval = 1000
net.ipv6.conf.all.mtu = 1280
net.ipv6.conf.all.ndisc_notify = 0
net.ipv6.conf.all.proxy_ndp = 0
net.ipv6.conf.all.regen_max_retry = 3
net.ipv6.conf.all.router_solicitation_delay = 1
net.ipv6.conf.all.router_solicitation_interval = 4
net.ipv6.conf.all.router_solicitation_max_interval = 3600
net.ipv6.conf.all.router_solicitations = -1
net.ipv6.conf.all.seg6_enabled = 0
sysctl: error reading key 'net.ipv6.conf.all.stable_secret': I/O error
net.ipv6.conf.all.suppress_frag_ndisc = 1
net.ipv6.conf.all.temp_prefered_lft = 86400
net.ipv6.conf.all.temp_valid_lft = 604800
net.ipv6.conf.all.use_oif_addrs_only = 0
net.ipv6.conf.all.use_tempaddr = 0
net.ipv6.conf.default.accept_dad = 1
net.ipv6.conf.default.accept_ra = 1
net.ipv6.conf.default.accept_ra_defrtr = 1
net.ipv6.conf.default.accept_ra_from_local = 0
net.ipv6.conf.default.accept_ra_min_hop_limit = 1
net.ipv6.conf.default.accept_ra_mtu = 1
net.ipv6.conf.default.accept_ra_pinfo = 1
net.ipv6.conf.default.accept_redirects = 1
net.ipv6.conf.default.accept_source_route = 0
net.ipv6.conf.default.addr_gen_mode = 0
net.ipv6.conf.default.autoconf = 1
net.ipv6.conf.default.dad_transmits = 0
net.ipv6.conf.default.disable_ipv6 = 0
net.ipv6.conf.default.disable_policy = 0
net.ipv6.conf.default.drop_unicast_in_l2_multicast = 0
net.ipv6.conf.default.drop_unsolicited_na = 0
net.ipv6.conf.default.enhanced_dad = 1
net.ipv6.conf.default.force_mld_version = 0
net.ipv6.conf.default.force_tllao = 0
net.ipv6.conf.default.forwarding = 0
net.ipv6.conf.default.hop_limit = 64
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.keep_addr_on_down = 0
net.ipv6.conf.default.max_addresses = 16
net.ipv6.conf.default.max_desync_factor = 600
net.ipv6.conf.default.mldv1_unsolicited_report_interval = 10000
net.ipv6.conf.default.mldv2_unsolicited_report_interval = 1000
net.ipv6.conf.default.mtu = 1280
net.ipv6.conf.default.ndisc_notify = 0
net.ipv6.conf.default.proxy_ndp = 0
net.ipv6.conf.default.regen_max_retry = 3
net.ipv6.conf.default.router_solicitation_delay = 1
net.ipv6.conf.default.router_solicitation_interval = 4
net.ipv6.conf.default.router_solicitation_max_interval = 3600
net.ipv6.conf.default.router_solicitations = -1
net.ipv6.conf.default.seg6_enabled = 0
sysctl: error reading key 'net.ipv6.conf.default.stable_secret': I/O error
net.ipv6.conf.default.suppress_frag_ndisc = 1
net.ipv6.conf.default.temp_prefered_lft = 86400
net.ipv6.conf.default.temp_valid_lft = 604800
net.ipv6.conf.default.use_oif_addrs_only = 0
net.ipv6.conf.default.use_tempaddr = 0
net.ipv6.conf.eth0.accept_dad = 0
net.ipv6.conf.eth0.accept_ra = 1
net.ipv6.conf.eth0.accept_ra_defrtr = 1
net.ipv6.conf.eth0.accept_ra_from_local = 0
net.ipv6.conf.eth0.accept_ra_min_hop_limit = 1
net.ipv6.conf.eth0.accept_ra_mtu = 1
net.ipv6.conf.eth0.accept_ra_pinfo = 1
net.ipv6.conf.eth0.accept_redirects = 1
net.ipv6.conf.eth0.accept_source_route = 0
net.ipv6.conf.eth0.addr_gen_mode = 0
net.ipv6.conf.eth0.autoconf = 1
net.ipv6.conf.eth0.dad_transmits = 0
net.ipv6.conf.eth0.disable_ipv6 = 0
net.ipv6.conf.eth0.disable_policy = 0
net.ipv6.conf.eth0.drop_unicast_in_l2_multicast = 0
net.ipv6.conf.eth0.drop_unsolicited_na = 0
net.ipv6.conf.eth0.enhanced_dad = 1
net.ipv6.conf.eth0.force_mld_version = 0
net.ipv6.conf.eth0.force_tllao = 0
net.ipv6.conf.eth0.forwarding = 0
net.ipv6.conf.eth0.hop_limit = 64
net.ipv6.conf.eth0.ignore_routes_with_linkdown = 0
net.ipv6.conf.eth0.keep_addr_on_down = 0
net.ipv6.conf.eth0.max_addresses = 16
net.ipv6.conf.eth0.max_desync_factor = 600
net.ipv6.conf.eth0.mldv1_unsolicited_report_interval = 10000
net.ipv6.conf.eth0.mldv2_unsolicited_report_interval = 1000
net.ipv6.conf.eth0.mtu = 1460
net.ipv6.conf.eth0.ndisc_notify = 0
net.ipv6.conf.eth0.proxy_ndp = 0
net.ipv6.conf.eth0.regen_max_retry = 3
net.ipv6.conf.eth0.router_solicitation_delay = 1
net.ipv6.conf.eth0.router_solicitation_interval = 4
net.ipv6.conf.eth0.router_solicitation_max_interval = 3600
net.ipv6.conf.eth0.router_solicitations = -1
net.ipv6.conf.eth0.seg6_enabled = 0
sysctl: error reading key 'net.ipv6.conf.eth0.stable_secret': I/O error
net.ipv6.conf.eth0.suppress_frag_ndisc = 1
net.ipv6.conf.eth0.temp_prefered_lft = 86400
net.ipv6.conf.eth0.temp_valid_lft = 604800
net.ipv6.conf.eth0.use_oif_addrs_only = 0
net.ipv6.conf.eth0.use_tempaddr = 0
net.ipv6.conf.lo.accept_dad = -1
net.ipv6.conf.lo.accept_ra = 1
net.ipv6.conf.lo.accept_ra_defrtr = 1
net.ipv6.conf.lo.accept_ra_from_local = 0
net.ipv6.conf.lo.accept_ra_min_hop_limit = 1
net.ipv6.conf.lo.accept_ra_mtu = 1
net.ipv6.conf.lo.accept_ra_pinfo = 1
net.ipv6.conf.lo.accept_redirects = 1
net.ipv6.conf.lo.accept_source_route = 0
net.ipv6.conf.lo.addr_gen_mode = 0
net.ipv6.conf.lo.autoconf = 1
net.ipv6.conf.lo.dad_transmits = 1
net.ipv6.conf.lo.disable_ipv6 = 0
net.ipv6.conf.lo.disable_policy = 0
net.ipv6.conf.lo.drop_unicast_in_l2_multicast = 0
net.ipv6.conf.lo.drop_unsolicited_na = 0
net.ipv6.conf.lo.enhanced_dad = 1
net.ipv6.conf.lo.force_mld_version = 0
net.ipv6.conf.lo.force_tllao = 0
net.ipv6.conf.lo.forwarding = 0
net.ipv6.conf.lo.hop_limit = 64
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.keep_addr_on_down = 0
net.ipv6.conf.lo.max_addresses = 16
net.ipv6.conf.lo.max_desync_factor = 600
net.ipv6.conf.lo.mldv1_unsolicited_report_interval = 10000
net.ipv6.conf.lo.mldv2_unsolicited_report_interval = 1000
net.ipv6.conf.lo.mtu = 65536
net.ipv6.conf.lo.ndisc_notify = 0
net.ipv6.conf.lo.proxy_ndp = 0
net.ipv6.conf.lo.regen_max_retry = 3
net.ipv6.conf.lo.router_solicitation_delay = 1
net.ipv6.conf.lo.router_solicitation_interval = 4
net.ipv6.conf.lo.router_solicitation_max_interval = 3600
net.ipv6.conf.lo.router_solicitations = -1
net.ipv6.conf.lo.seg6_enabled = 0
sysctl: error reading key 'net.ipv6.conf.lo.stable_secret': I/O error
net.ipv6.conf.lo.suppress_frag_ndisc = 1
net.ipv6.conf.lo.temp_prefered_lft = 86400
net.ipv6.conf.lo.temp_valid_lft = 604800
net.ipv6.conf.lo.use_oif_addrs_only = 0
net.ipv6.conf.lo.use_tempaddr = -1
net.ipv6.flowlabel_consistency = 1
net.ipv6.flowlabel_reflect = 0
net.ipv6.flowlabel_state_ranges = 0
net.ipv6.fwmark_reflect = 0
net.ipv6.icmp.ratelimit = 1000
net.ipv6.idgen_delay = 1
net.ipv6.idgen_retries = 3
net.ipv6.ip6frag_high_thresh = 4194304
net.ipv6.ip6frag_low_thresh = 3145728
net.ipv6.ip6frag_time = 60
net.ipv6.ip_nonlocal_bind = 0
net.ipv6.neigh.eth0.anycast_delay = 100
net.ipv6.neigh.eth0.app_solicit = 0
net.ipv6.neigh.eth0.base_reachable_time = 30
net.ipv6.neigh.eth0.base_reachable_time_ms = 30000
net.ipv6.neigh.eth0.delay_first_probe_time = 5
net.ipv6.neigh.eth0.gc_stale_time = 60
net.ipv6.neigh.eth0.locktime = 0
net.ipv6.neigh.eth0.mcast_resolicit = 0
net.ipv6.neigh.eth0.mcast_solicit = 3
net.ipv6.neigh.eth0.proxy_delay = 80
net.ipv6.neigh.eth0.proxy_qlen = 64
net.ipv6.neigh.eth0.retrans_time = 1000
net.ipv6.neigh.eth0.retrans_time_ms = 1000
net.ipv6.neigh.eth0.ucast_solicit = 3
net.ipv6.neigh.eth0.unres_qlen = 101
net.ipv6.neigh.eth0.unres_qlen_bytes = 212992
net.ipv6.neigh.lo.anycast_delay = 100
net.ipv6.neigh.lo.app_solicit = 0
net.ipv6.neigh.lo.base_reachable_time = 30
net.ipv6.neigh.lo.base_reachable_time_ms = 30000
net.ipv6.neigh.lo.delay_first_probe_time = 5
net.ipv6.neigh.lo.gc_stale_time = 60
net.ipv6.neigh.lo.locktime = 0
net.ipv6.neigh.lo.mcast_resolicit = 0
net.ipv6.neigh.lo.mcast_solicit = 3
net.ipv6.neigh.lo.proxy_delay = 80
net.ipv6.neigh.lo.proxy_qlen = 64
net.ipv6.neigh.lo.retrans_time = 1000
net.ipv6.neigh.lo.retrans_time_ms = 1000
net.ipv6.neigh.lo.ucast_solicit = 3
net.ipv6.neigh.lo.unres_qlen = 101
net.ipv6.neigh.lo.unres_qlen_bytes = 212992
net.ipv6.route.gc_elasticity = 9
net.ipv6.route.gc_interval = 30
net.ipv6.route.gc_min_interval = 0
net.ipv6.route.gc_min_interval_ms = 500
net.ipv6.route.gc_thresh = 1024
net.ipv6.route.gc_timeout = 60
net.ipv6.route.max_size = 4096
net.ipv6.route.min_adv_mss = 1220
net.ipv6.route.mtu_expires = 600
net.ipv6.xfrm6_gc_thresh = 32768
net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_buckets = 65536
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 0
net.netfilter.nf_conntrack_dccp_loose = 1
net.netfilter.nf_conntrack_dccp_timeout_closereq = 64
net.netfilter.nf_conntrack_dccp_timeout_closing = 64
net.netfilter.nf_conntrack_dccp_timeout_open = 43200
net.netfilter.nf_conntrack_dccp_timeout_partopen = 480
net.netfilter.nf_conntrack_dccp_timeout_request = 240
net.netfilter.nf_conntrack_dccp_timeout_respond = 480
net.netfilter.nf_conntrack_dccp_timeout_timewait = 240
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_expect_max = 1024
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_helper = 0
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_sctp_timeout_closed = 10
net.netfilter.nf_conntrack_sctp_timeout_cookie_echoed = 3
net.netfilter.nf_conntrack_sctp_timeout_cookie_wait = 3
net.netfilter.nf_conntrack_sctp_timeout_established = 432000
net.netfilter.nf_conntrack_sctp_timeout_heartbeat_acked = 210
net.netfilter.nf_conntrack_sctp_timeout_heartbeat_sent = 30
net.netfilter.nf_conntrack_sctp_timeout_shutdown_ack_sent = 3
net.netfilter.nf_conntrack_sctp_timeout_shutdown_recd = 0
net.netfilter.nf_conntrack_sctp_timeout_shutdown_sent = 0
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_log.0 = NONE
net.netfilter.nf_log.1 = NONE
net.netfilter.nf_log.10 = NONE
net.netfilter.nf_log.11 = NONE
net.netfilter.nf_log.12 = NONE
net.netfilter.nf_log.2 = NONE
net.netfilter.nf_log.3 = NONE
net.netfilter.nf_log.4 = NONE
net.netfilter.nf_log.5 = NONE
net.netfilter.nf_log.6 = NONE
net.netfilter.nf_log.7 = NONE
net.netfilter.nf_log.8 = NONE
net.netfilter.nf_log.9 = NONE
net.unix.max_dgram_qlen = 60
user.max_cgroup_namespaces = 967296
user.max_inotify_instances = 128
user.max_inotify_watches = 8192
user.max_ipc_namespaces = 967296
user.max_mnt_namespaces = 967296
user.max_net_namespaces = 967296
user.max_pid_namespaces = 967296
user.max_user_namespaces = 967296
user.max_uts_namespaces = 967296
vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.compact_unevictable_allowed = 1
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200
vm.disk_based_swap = 0
vm.drop_caches = 0
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256	256	32
vm.max_map_count = 65530
vm.min_filelist_kbytes = 0
vm.min_free_kbytes = 67584
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 65536
vm.mmap_noexec_taint = 1
vm.mmap_rnd_bits = 28
vm.mmap_rnd_compat_bits = 8
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = Node
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 1
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.stat_interval = 1
vm.swappiness = 60
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 100
vm.watermark_scale_factor = 10
vm.zone_reclaim_mode = 0

#7

Interesting. I’m using Google’s “Container-Optimized OS” and that command isn’t available. Is that something I can install with apt-get, or does it have another name? I don’t have it on my Mac either, so I can’t read about it.


#8

maybe netstat -s is available?


#9

Definitely do a test where you explicitly set 12 CPUs and 24 schedulers in your vm.args.

In your pod spec do:

  limits:
    cpu: 12000m
  requests:
    cpu: 12000m

and in your vm.args +S 24.

One aside to check here if you’re on K8s is DNS lookups. Each outbound request from within K8s will do like a half dozen DNS attempts to see if there are any local services that use that domain name before actually heading to the outside world. Google may optimize better for this. You can get around this by ending the domains you hit with a trailing . to indicate that they are fully qualified.

ALSO be sure to either configure the KubeDNS pods with some CPU requests / limits or ensure they’re on a different node, otherwise you can use so much CPU that KubeDNS can get throttled and stop responding properly.

In fact I’d consider trying to run a test outside of K8s just to eliminate this specific issue.


#10

Awesome suggestions! Let me check some of this stuff. So you’re saying on the domain thing that the domain I give to the HTTP lib is like this?

http://example.com.?


#11

Oddly enough, that works on my Mac (huge output), but on the server netstat exists with a different set of options and doesn’t accept -s:

netstat: unrecognized option: s
BusyBox v1.28.4 (2018-12-06 15:13:21 UTC) multi-call binary.

Usage: netstat [-ral] [-tuwx] [-enWp]

Display networking information

	-r	Routing table
	-a	All sockets
	-l	Listening sockets
		Else: connected sockets
	-t	TCP sockets
	-u	UDP sockets
	-w	Raw sockets
	-x	Unix sockets
		Else: all socket types
	-e	Other/more information
	-n	Don't resolve names
	-W	Wide display
	-p	Show PID/program name for sockets
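
When the container only ships BusyBox netstat, the counters that netstat -s prints can usually still be read straight from the kernel. A minimal sketch, assuming a Linux /proc filesystem: /proc/net/snmp holds the Ip, Icmp, Tcp and Udp groups and /proc/net/netstat holds TcpExt and IpExt, each as a header line of names followed by a line of values. The function name snmp_counters is made up for illustration.

```shell
# Print "name value" pairs for one counter group from a /proc/net/snmp-style file.
# $1 = group name such as Tcp or TcpExt
# $2 = file to read (defaults to /proc/net/snmp)
snmp_counters() {
  awk -v group="$1:" '
    $1 == group && !have_names {        # first line of the pair: counter names
      for (i = 2; i <= NF; i++) name[i] = $i
      have_names = 1
      next
    }
    $1 == group && have_names {         # second line of the pair: counter values
      for (i = 2; i <= NF; i++) print name[i], $i
      have_names = 0
    }
  ' "${2:-/proc/net/snmp}"
}
```

For example, snmp_counters Tcp | grep -E 'ActiveOpens|RetransSegs' should correspond to the "active connection openings" and "segments retransmitted" lines of netstat -s, and snmp_counters TcpExt /proc/net/netstat lists the extended counters. Run inside the container, this should reflect the pod's own network namespace rather than the node's.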

#12

Yes that’s correct. It’ll still hit KubeDNS but it lowers the number of different domains that the KubeDNS service will try to resolve. All of my other comments about KubeDNS still apply though, this is just a handy optimization.
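
To make the extra lookups concrete, here is a rough sketch of the order in which a glibc-style resolver tries names, given the ndots option and search list from /etc/resolv.conf. The search domains and ndots value of 5 used in the usage note are assumed values typical of a pod, not taken from this cluster, and dns_candidates is a hypothetical helper. Resolution stops at the first candidate that resolves.

```shell
# List the names a glibc-style resolver would try, in order.
# $1 = hostname, $2 = ndots value, remaining arguments = search domains
dns_candidates() {
  name=$1
  ndots=$2
  shift 2
  case "$name" in
    *.)
      # A trailing dot marks the name as absolute: the search list is skipped.
      echo "$name"
      return 0
      ;;
  esac
  dots=$(printf '%s\n' "$name" | awk -F. '{ print NF - 1 }')
  # Enough dots: the name is tried as-is first, then the search list.
  if [ "$dots" -ge "$ndots" ]; then echo "$name."; fi
  for domain in "$@"; do
    echo "$name.$domain."
  done
  # Too few dots: the name as-is is only tried after the search list.
  if [ "$dots" -lt "$ndots" ]; then echo "$name."; fi
  return 0
}
```

With dns_candidates api.example.com 5 default.svc.cluster.local svc.cluster.local cluster.local the real name comes last, after three cluster-local attempts, while dns_candidates api.example.com. 5 ... prints only api.example.com. itself. Whether a given HTTP client and the remote TLS certificate check accept a host with a trailing dot is worth testing before relying on it.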


#13

Hey everyone I’m going through these replies slowly, sorry if I don’t respond to you right away.

I am 1000% grateful for everyone’s input and feedback. I’m processing all of it currently, and it’s taking me a while to research each reply.

In my pod YAML this is what was there:

resources:
  requests:
    cpu: 100m

But that’s the only limit.

This is interesting though. I’m reading here: https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits

I originally wanted the LOWEST number of apps (because they all log independently and it’s going to create a database bottleneck if I have 100 things logging at once). If it HAS to come to that, so be it, but I’d like to avoid it.

What if I put it at 64000m? It seems to be a bad idea according to this document because Erlang may not know how to deal with that many CPUs. Does that sound accurate?


#14

Looks like you found the reason! That is 0.1 CPU per pod, which explains your test results.

read
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/
and
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/


#15

Phoenix has been tested on boxes up to 128 cores and it scaled linearly, so 64 should be fine. I’d consider manually specifying the number of schedulers because I don’t know off the top of my head if Erlang can tell what you’ve allocated for the pod or not. Usually you want twice as many schedulers as cores. So if you do 64000m CPUs then do +S 128 in your vm.args file.

The final main thing to note for the 1 erlang pod route is to figure out if any of the libraries you’re using have a pool of fixed size. You’ll want to make sure to increase its size.

@outlog requests and limits are different things. Doing 0.1 CPU requests doesn’t limit it to 0.1 CPU, but it does mean that it may be throttled down to that if other pods want CPU. The way to achieve the highest Quality Of Service guarantees in K8s is to specify BOTH an identical CPU limit and request value.
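
To make the difference between a request and a limit concrete, here is a rough sketch of how the two CPU values end up in the container's cgroup. It assumes cgroup v1 and the default 100 ms CFS period, and the conversion formulas are my understanding of what the kubelet does rather than something stated in this thread. The function names are invented.

```shell
# CPU request -> cpu.shares: a relative weight that only matters under contention.
k8s_cpu_shares() {
  echo $(( $1 * 1024 / 1000 ))          # $1 = request in millicores
}

# CPU limit -> cpu.cfs_quota_us: a hard cap on CPU time per scheduling period.
k8s_cfs_quota_us() {
  period_us=${2:-100000}                # assumed default CFS period (100 ms)
  echo $(( $1 * period_us / 1000 ))     # $1 = limit in millicores
}
```

So a 100m request becomes a small weight (k8s_cpu_shares 100 gives 102) with no cap at all, while a 12000m limit becomes a quota of 1200000 microseconds of CPU time per 100000 microsecond period. Inside the container the real values can be compared with cat /sys/fs/cgroup/cpu/cpu.shares and cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us (-1 means no limit), and nr_throttled in /sys/fs/cgroup/cpu/cpu.stat counts how often the quota was hit. The paths differ on cgroup v2.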


#16

Ok I actually tried running it on the instance itself (via SSH) rather than using bash IN the container:

Ip:
    Forwarding: 1
    416024035 total packets received
    2 with invalid addresses
    415262118 forwarded
    0 incoming packets discarded
    761912 incoming packets delivered
    416003450 requests sent out
    390 outgoing packets dropped
    1 dropped because of missing route
Icmp:
    22 ICMP messages received
    3 input ICMP message failed
    ICMP input histogram:
        destination unreachable: 10
        echo requests: 11
        echo replies: 1
    11 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        echo replies: 11
IcmpMsg:
        InType0: 1
        InType3: 10
        InType8: 11
        OutType0: 11
Tcp:
    48909 active connection openings
    16003 passive connection openings
    41 failed connection attempts
    8270 connection resets received
    15 connections established
    761425 segments received
    840810 segments sent out
    225 segments retransmitted
    0 bad segments received
    2994 resets sent
Udp:
    467 packets received
    0 packets to unknown port received
    0 packet receive errors
    470 packets sent
    0 receive buffer errors
    0 send buffer errors
UdpLite:
TcpExt:
    33 resets received for embryonic SYN_RECV sockets
    2 packets pruned from receive queue because of socket buffer overrun
    9925 TCP sockets finished time wait in fast timer
    31097 delayed acks sent
    2 delayed acks further delayed because of locked socket
    Quick ack mode was activated 3 times
    272044 packet headers predicted
    160323 acknowledgments not containing data payload received
    101516 predicted acknowledgments
    TCPSackRecovery: 7
    Detected reordering 321 times using SACK
    1 congestion windows recovered without slow start after partial ack
    22 fast retransmits
    TCPTimeouts: 66
    TCPLossProbes: 147
    TCPDSACKOldSent: 3
    TCPDSACKRecv: 152
    9 connections reset due to unexpected data
    5 connections reset due to early user close
    TCPDSACKIgnoredNoUndo: 139
    TCPSackMerged: 7
    TCPSackShiftFallback: 302
    TCPRcvCoalesce: 70011
    TCPOFOQueue: 566
    TCPAutoCorking: 9
    TCPFromZeroWindowAdv: 1713
    TCPToZeroWindowAdv: 1713
    TCPWantZeroWindowAdv: 1078
    TCPSynRetrans: 56
    TCPOrigDataSent: 395019
    TCPHystartTrainDetect: 2
    TCPHystartTrainCwnd: 654
    TCPKeepAlive: 6156
IpExt:
    InMcastPkts: 6
    OutMcastPkts: 10
    InOctets: 359183513039
    OutOctets: 717144072828
    InMcastOctets: 570
    OutMcastOctets: 730
    InNoECTPkts: 479288616
    InECT0Pkts: 1

#17

Awesome thanks!! And btw I still haven’t read your other reply. I know it was a really good reply (about the schedulers and vm.args), so I’m saving it for a bit to do some of these easier replies first.

This is such good info, thank you I am eternally grateful :pray:t3:


#18

Awesome! I will read those – but just to be clear would it be OK to put it at 64000m? or even “no limit” (can I delete the entire resource line)?

I was testing different instance sizes and I know it won’t even schedule a pod onto a node without enough resources. It would be great to remove the limit entirely so it’s just 1 app per node regardless of size.


#19

Well the goal right now is debugging, so let’s focus on that right now. Debugging is about making sure we create as few factors as possible. How many nodes are you running? To get the highest QOS within K8s you want to do equal requests and limits. I would probably do 32000m for both so that you can also allocate a CPU or two for KubeDNS. Then manually set +S 64.
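
The pairing of CPU values and +S flags in this thread follows the "twice as many schedulers as cores" rule of thumb mentioned earlier. As a quick sanity check of the arithmetic (schedulers_for is a hypothetical helper, and the 2x factor is the poster's rule of thumb, not an Erlang requirement):

```shell
# Suggested +S value for a CPU request/limit given in millicores,
# using the thread's rule of thumb of 2 schedulers per whole CPU.
schedulers_for() {
  echo $(( $1 / 1000 * 2 ))
}
```

schedulers_for 12000 prints 24, schedulers_for 32000 prints 64 and schedulers_for 64000 prints 128. To confirm what the VM actually started with, :erlang.system_info(:schedulers_online) in a remote IEx shell should report the number in use.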


#20

I really don’t know enough to read that output I’m afraid. In the container you could try netstat -atn | grep TIME_WAIT | wc -l. This assumes that the output of netstat -atn is similar to my local linux box and grep and wc are there too. If they’re not you could copy the output and grep it locally. The idea is just to see how many TCP connections are around and see if that corresponds in some way to the number where you start seeing problems.
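
Counting only TIME_WAIT can hide other pile-ups (SYN_SENT or CLOSE_WAIT, for instance), so a per-state tally may be more telling. A sketch under the same assumption as the command above, namely that netstat -atn prints the state in the last column; tcp_state_counts is an invented name.

```shell
# Tally TCP sockets by state from `netstat -atn` output on stdin.
tcp_state_counts() {
  awk '$1 ~ /^tcp/ { count[$NF]++ }
       END { for (state in count) print count[state], state }' | sort -rn
}
```

Used as netstat -atn | tcp_state_counts. It may also help to compare the totals with the ephemeral port range from cat /proc/sys/net/ipv4/ip_local_port_range: if the sockets toward a single destination get close to the size of that range, new outbound connections can stall.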