Another round of “hey, your server is down!” drama from the “we need moar kubernetes!” crowd.
“I can’t reach your server, it must be down.”
I connect. Everything’s fine.
A few emails later, I ask to access the container. The dev says he can’t - doesn’t know how. He’s a nice guy, though, so he gives me the credentials.
I log in and find the issue: someone pushed a workload to production (cue Kubernetes! Moooaaarrr powaaaarrr! We have the cloud! Who needs sysadmins anymore?!) with its DNS resolver set to 192.168.1.1, a private LAN address.
Of course, it fell to me to investigate, because the dev couldn’t even get a shell inside his own container. And that's OK: he's a dev, and he just wants to be a dev.
Once I pointed it out, they rebuilt the container with the correct config and - TADA! - everything worked again.
Then he went to check other workloads (for other clients, not managed by me) that had been having issues for weeks... Same problem.
It was DNS.
But it wasn't DNS.