Issue binding with internal network on UDP
Hi, we have a server pod and a Datadog agent pod. The server pod is trying to send metrics to the Datadog agent pod through its DogStatsD endpoint. We know the networking itself probably works because our tracer is alive and healthy. Internal networking is enabled.
Dockerfile.datadog
Notice two things:
1. DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true
2. 8125/udp is exposed.
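For context on what the exposed 8125/udp port receives: DogStatsD metrics are plain-text UDP datagrams, for example name:value|g for a gauge, optionally followed by |#tag1,tag2. Below is a minimal sketch using Node's built-in dgram module. formatGauge and sendGauge are hypothetical helpers for illustration and are not part of hot-shots or the Datadog agent.

```typescript
import * as dgram from 'node:dgram';

// Build a DogStatsD gauge datagram: "<name>:<value>|g", plus "|#tag1,tag2" when tags are given.
function formatGauge(name: string, value: number, tags: string[] = []): string {
  const base = `${name}:${value}|g`;
  return tags.length > 0 ? `${base}|#${tags.join(',')}` : base;
}

// Send one gauge to an agent listening on UDP (8125 by default).
// Hypothetical helper: socketType must match the address family that `host` resolves to.
function sendGauge(
  host: string,
  name: string,
  value: number,
  socketType: 'udp4' | 'udp6' = 'udp4',
  port = 8125,
): Promise<void> {
  const socket = dgram.createSocket(socketType);
  const payload = Buffer.from(formatGauge(name, value));
  return new Promise((resolve, reject) => {
    socket.send(payload, port, host, (err) => {
      socket.close(); // one datagram per socket keeps the sketch simple
      if (err) reject(err);
      else resolve();
    });
  });
}
```

A real client such as hot-shots keeps one socket open and may buffer several metrics per datagram; this sketch only shows the wire format.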
Dockerfile.server
Project ID: 900023fc-eb3b-4c5e-b8bf-83d6fcb5a72f
Note: We are correctly using the server Dockerfile.
The following code should technically work.
However, we get the following error:
Could there be some problem with the internal network resolution somewhere?
For the first ~2 seconds after your app starts, private DNS resolution isn't available. Any DNS request for an internal domain made within that window won't resolve.
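Given that ~2 second window, one way to keep startup code from failing permanently is to retry the lookup briefly. A minimal sketch using Node's promise-based dns API; lookupWithRetry, the retryable error codes, and the retry counts are illustrative choices, not something Railway or hot-shots provides.

```typescript
import { promises as dnsPromises } from 'node:dns';
import { setTimeout as sleep } from 'node:timers/promises';

// Error codes that can plausibly appear while private DNS is still warming up (assumption).
const RETRYABLE = new Set(['ENOTFOUND', 'EAI_AGAIN']);

// Resolve `host`, retrying transient DNS failures up to `attempts` times.
async function lookupWithRetry(
  host: string,
  attempts = 5,
  delayMs = 1000,
): Promise<{ address: string; family: number }> {
  let lastErr: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await dnsPromises.lookup(host);
    } catch (err) {
      lastErr = err;
      const code = (err as NodeJS.ErrnoException).code;
      // Anything other than a "not resolvable yet" error is rethrown immediately.
      if (!code || !RETRYABLE.has(code)) throw err;
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastErr;
}
```

Usage, assuming DD_AGENT_HOST holds the internal hostname: const { address } = await lookupWithRetry(process.env.DD_AGENT_HOST!);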
Whoa, every call takes 2 seconds?
wow that was a terribly worded sentence, let me fix that
But also, I think that would be irrelevant to the error that we're seeing
@theodor_m fyi
there we go, now that's what I meant to say, and now it's relevant
I'm doing it several minutes after deployment though
you don't connect to the dd agent upon starting the app?
I do, but we've set it up so that a failure doesn't kill the server. So I can check emission afterwards, and it still fails.
can you do a dns.lookup for datadog.railway.internal
after 2 seconds lol
Uhh do you just mean like a curl?
dns.lookup from node
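A check along those lines might look like the following sketch. The checkAgentHost helper and its log format are illustrative, chosen to mirror the log output posted in the thread; it assumes DD_AGENT_HOST is set to the agent's internal name.

```typescript
import * as dns from 'node:dns';

// Ask the system resolver (getaddrinfo) what the agent host resolves to and log it.
function checkAgentHost(host: string): Promise<string> {
  return new Promise((resolve, reject) => {
    dns.lookup(host, (err, address, family) => {
      if (err) return reject(err); // e.g. ENOTFOUND if the name cannot be resolved
      const line = `address: "${address}" family: IPv${family}`;
      console.info(line);
      resolve(line);
    });
  });
}

// Usage: void checkAgentHost(process.env.DD_AGENT_HOST ?? 'datadog.railway.internal');
```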
ok one sec
2023-08-10 02:35:27 [server] info: EXECUTING DNS LOOKUP ON DD_AGENT_HOST
2023-08-10 02:35:27 [server] info: address: "fd12:4fde:c93f::3b:bd51:5175" family: IPv6
Code:
lgtm
FYI, here's the code in hot-shots that initializes the UDP connection:
hot-shots/lib/transport.js
Called by
Called by
Called by
Where this is initialized as the following
interesting
sendUsingDnsCache seems to be doing basically the same thing, so I'm pretty confused.
What was the result of this?
2023-08-10 02:35:27 [server] info: EXECUTING DNS LOOKUP ON DD_AGENT_HOST
2023-08-10 02:35:27 [server] info: address: "fd12:4fde:c93f::3b:bd51:5175" family: IPv6
hmm
that's what we wanna see
I wonder if the metrics instance is just being initialized too early
because it's used as a global variable
so then the configuration somehow doesn't rerun on failure
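If the global client is created at import time, a failed first attempt can stick around for the life of the process. One common pattern, sketched here as an assumption rather than anything hot-shots does, is to create the client lazily and cache only a successful result. lazyRetry is a hypothetical helper.

```typescript
// Wrap an async factory so it runs on first use instead of at import time.
// A successful result is cached; a failure is not, so the next call tries again.
function lazyRetry<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = factory().catch((err) => {
        pending = undefined; // forget the failure so a later call can retry
        throw err;
      });
    }
    return pending;
  };
}

// Hypothetical usage: build the metrics client only once the agent host resolves.
// const getMetrics = lazyRetry(async () => createMetricsClient(process.env.DD_AGENT_HOST!));
```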
is it even set to make an AAAA lookup?
the internal network is ipv6 only
not sure, how might I check that?
i am also not sure, maybe look into its code?
"Jul 29, 2021 — Datadog supports the commonly used A, AAAA, CNAME, MX, and TXT record types and allows you to check records against either an external DNS ..." from Google search?
Wait wrong source one sec
but does the new StatsD constructor support looking up AAAA?
hmm, how might I find that in the code?
also why would it matter?
if it's making a DNS request for an A record, that will fail; there are no A records for the internal addresses
ipv6 only
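To see the difference directly, you can request each address family separately. A minimal sketch using the family option of Node's dns.promises.lookup; lookupBothFamilies is a hypothetical helper. On an IPv6-only private network you would expect v4 to come back null for an internal name.

```typescript
import { promises as dnsPromises } from 'node:dns';

// Ask the system resolver for an IPv4 (A) and an IPv6 (AAAA) answer concurrently.
// A family that cannot be resolved is reported as null instead of throwing.
async function lookupBothFamilies(host: string): Promise<{ v4: string | null; v6: string | null }> {
  const [v4, v6] = await Promise.allSettled([
    dnsPromises.lookup(host, { family: 4 }),
    dnsPromises.lookup(host, { family: 6 }),
  ]);
  return {
    v4: v4.status === 'fulfilled' ? v4.value.address : null,
    v6: v6.status === 'fulfilled' ? v6.value.address : null,
  };
}

// Usage: console.log(await lookupBothFamilies('datadog.railway.internal'));
```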
Hmm, I'm not sure there's any code in createUdpTransport that does an A-record lookup
doesn't it have to do a lookup to connect to the internal domain?
Hmm, isn't sendUsingDnsCache where that's happening, though?
i'll be honest with you, i have no clue what any of this code does, i'm just working through some simple debugging steps lol
what's going on at statsd.js:358:32?
It's just this
not a very useful stack trace
nope
well, we know DNS resolution does work. You just need to dig into the StatsD code and find out what's going on; perhaps it's querying the wrong DNS server (8.8.8.8 instead of fd12::10)
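One detail worth knowing when chasing a wrong-DNS-server theory: in Node, dns.lookup() (which dgram uses by default) goes through the operating system resolver, while dns.resolve*() and Resolver instances query the servers reported by dns.getServers(). A sketch of pinning a resolver to a specific server, assuming fd12::10 is the private DNS address as stated in this thread; internalResolver and currentResolveServers are hypothetical helpers.

```typescript
import { getServers } from 'node:dns';
import { Resolver } from 'node:dns/promises';

// Servers that dns.resolve*() will query (dns.lookup() does not use this list).
function currentResolveServers(): string[] {
  return getServers();
}

// A resolver pinned to explicit servers; 'fd12::10' comes from this thread, confirm it for your setup.
function internalResolver(servers: string[] = ['fd12::10']): Resolver {
  const resolver = new Resolver();
  resolver.setServers(servers);
  return resolver;
}

// Usage (AAAA query against the pinned server):
// const addresses = await internalResolver().resolve6('datadog.railway.internal');
```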
I think this code means it's at least trying to send to the host
2023-08-09 18:17:15 [server] error: Error: Error sending hot-shots message: Error: getaddrinfo ENOTFOUND datadog.railway.internal
Error: Error sending hot-shots message: Error: getaddrinfo ENOTFOUND datadog.railway.internal
QQ: Potential discovery of the bug: what happens if I just do socket.send(buf, 0, buf.length, args.port, "datadog.railway.internal", callback);
ENOTFOUND is a dns lookup error
It seems like socket.send needs to have the resolved address right?
i assume so: either there is code somewhere that resolves it before socket.send, or socket.send resolves domains itself
socket.send will resolve domains itself
I guess I want to double-check if this is true
you do, since i don't actually know what any of this code does
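Per Node's dgram documentation, socket.send() does resolve a hostname itself, and it asks for the address family of the socket: a 'udp4' socket looks up an IPv4 address, which does not exist for an IPv6-only internal name. A minimal sketch of resolving explicitly and picking the socket type to match; sendResolved is a hypothetical helper, not part of hot-shots.

```typescript
import * as dgram from 'node:dgram';
import { promises as dnsPromises } from 'node:dns';

// Resolve the host once, then send to the literal address on a socket whose
// family matches it (udp4 for A records, udp6 for AAAA records).
async function sendResolved(host: string, port: number, payload: string): Promise<string> {
  const { address, family } = await dnsPromises.lookup(host);
  const socket = dgram.createSocket(family === 6 ? 'udp6' : 'udp4');
  await new Promise<void>((resolve, reject) => {
    socket.send(Buffer.from(payload), port, address, (err) => {
      socket.close();
      if (err) reject(err);
      else resolve();
    });
  });
  return address;
}

// Usage: await sendResolved('datadog.railway.internal', 8125, 'test_metric_cami:124|g');
```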
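import * as dns from 'node:dns';
import { setTimeout as delay } from 'node:timers/promises'; // promise-based sleep
import StatsD from 'hot-shots';

// Wait 5 s so the ~2 s private-DNS warm-up window has passed
void delay(5000).then(() => {
  // Resolve the agent's internal hostname ourselves, then hand hot-shots the raw address
  dns.lookup(process.env.DD_AGENT_HOST!, (err, address) => {
    if (err) {
      console.error('DNS lookup failed:', err);
      return;
    }
    const statsdClient = new StatsD({
      host: address,
      port: 8125,
      protocol: 'udp',
    });
    statsdClient.gauge('test_metric_cami', 124);
  });
});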
Trying this
all i know is that ENOTFOUND is a DNS resolution error
haha, that's a good idea, as long as host accepts a pre-resolved address
yes
I think
this would be Go's equivalent of ENOTFOUND
https://utilities.up.railway.app/dns-lookup?value=hello-world.railway.internal&type=ip&dns=8.8.8.8
(type=ip looks up both A and AAAA concurrently)
Now getting
with that code
I guess it doesn't work?
doesn't look like it supports being given a host IP
can you pass it a custom resolver though?
internal domains are ipv6 only
ok sick
and i really do mean only
🙏 hope this works
railnet0 only has IPv6 addresses
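On the custom-resolver question: Node's dgram.createSocket() accepts a lookup option (a custom lookup function, dns.lookup() by default). A minimal sketch of a lookup that always requests IPv6 results; ipv6OnlyLookup and createIpv6Socket are hypothetical helpers. For dgram specifically, a 'udp6' socket already asks for IPv6 addresses, so the custom lookup mainly matters for clients that let you inject a lookup function but not a socket type; whether hot-shots exposes either is something to check in its README.

```typescript
import * as dgram from 'node:dgram';
import * as dns from 'node:dns';

// Custom lookup that always requests IPv6 (AAAA) results, since private
// hostnames on this network have no IPv4 (A) records.
function ipv6OnlyLookup(
  hostname: string,
  options: dns.LookupOneOptions,
  callback: (err: NodeJS.ErrnoException | null, address: string, family: number) => void,
): void {
  dns.lookup(hostname, { ...options, family: 6 }, callback);
}

// A UDP socket that resolves hostnames through the IPv6-only lookup above.
function createIpv6Socket(): dgram.Socket {
  return dgram.createSocket({ type: 'udp6', lookup: ipv6OnlyLookup });
}
```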
omg it works
sweet
we both learned something
Unfortunately the metric isn't actually being registered but at least that's a different issue
will take a look
okay let me know if you think i could help further 🙂
Just took a while. It works fine 🙏
awesome, glad to hear it