Railway15mo ago
johns

Issue binding with internal network on UDP

Hi, we have a server pod and a Datadog agent pod. The server pod is trying to connect to the Datadog agent pod through its StatsD endpoint. We know that the networking probably works because our tracer is alive and healthy. Internal networking is enabled.
Dockerfile.datadog
# Start from the official Datadog agent image
FROM datadog/agent:latest

# Copy your Datadog configuration to the correct location
COPY datadog/datadog.yaml /etc/datadog-agent/datadog.yaml


# Set the hostname and port
ENV DD_HOSTNAME="datadog.railway.internal" \
    DD_LOGS_ENABLED=true \
    DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true \
    DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true \
    DD_BIND_HOST=::

# Print all variables
RUN echo "DD_HOSTNAME=$DD_HOSTNAME"
RUN echo "DD_LOGS_ENABLED=$DD_LOGS_ENABLED"
RUN echo "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=$DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL"
RUN echo "DD_BIND_HOST=$DD_BIND_HOST"
RUN echo "DD_DOGSTATSD_NON_LOCAL_TRAFFIC=$DD_DOGSTATSD_NON_LOCAL_TRAFFIC"

# expose the port for dogstatsd explicitly
EXPOSE 8125/udp

# Start the Datadog agent
CMD ["/init"]
Notice two things:
1. DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true
2. 8125/udp is exposed.
Dockerfile.server
# Start from the official Node.js 18 image
FROM node:18

# Define environment variables
ENV DD_ENV="prod" \
    DD_LOGS_INJECTION=true \
    DD_SERVICE="cami-server" \
    DD_AGENT_HOST="datadog.railway.internal"

RUN echo $DD_ENV
RUN echo $DD_LOGS_INJECTION
RUN echo $DD_SERVICE
RUN echo $DD_AGENT_HOST

# Create app directory
WORKDIR /usr/src/app

# Copy package.json and package-lock.json
COPY package*.json ./

# Install app dependencies
RUN yarn install --frozen-lockfile

# Copy app source code
COPY . .

# Build the app
RUN yarn run build

# Start the app
CMD [ "yarn", "run", "start" ]
Project ID: 900023fc-eb3b-4c5e-b8bf-83d6fcb5a72f
66 Replies
Percy
Percy15mo ago
Project ID: N/A
johns
johns15mo ago
Note: We are correctly using the server Dockerfile. The following code should technically work.
import StatsD from 'hot-shots';

const statsdClient = new StatsD({
  host: process.env.DD_AGENT_HOST,
  port: 8125,
  protocol: 'udp',
});

export default statsdClient;
However, we get the following error:
2023-08-09 18:17:15 [server] error: Error: Error sending hot-shots message: Error: getaddrinfo ENOTFOUND datadog.railway.internal
Error: Error sending hot-shots message: Error: getaddrinfo ENOTFOUND datadog.railway.internal
Could there be some problem with the internal network resolution somewhere? 900023fc-eb3b-4c5e-b8bf-83d6fcb5a72f
Brody
Brody15mo ago
For approximately the first two seconds after your app starts, private DNS resolution isn't available. If you make a DNS request for an internal domain within those ~2 seconds, the request won't resolve.
johns
johns15mo ago
Whoa, for every call it takes 2 seconds?
Brody
Brody15mo ago
wow that was a terribly worded sentence, let me fix that
johns
johns15mo ago
But also, I think that would be irrelevant to the error that we're seeing @theodor_m fyi
Brody
Brody15mo ago
there we go, now thats what I meant to say, and now its relevant
johns
johns15mo ago
I'm doing it several minutes after deployment though
Brody
Brody15mo ago
you don't connect to the dd agent upon starting the app?
johns
johns15mo ago
I do, but we have it set up so that a failure doesn't kill the server. So I can try emitting a metric later on, which still fails.
Brody
Brody15mo ago
can you do a dns.lookup for datadog.railway.internal after 2 seconds lol
johns
johns15mo ago
Uhh do you just mean like a curl?
Brody
Brody15mo ago
dns.lookup from node
johns
johns15mo ago
ok one sec
2023-08-10 02:35:27 [server] info: EXECUTING DNS LOOKUP ON DD_AGENT_HOST
2023-08-10 02:35:27 [server] info: address: "fd12:4fde:c93f::3b:bd51:5175" family: IPv6
Code:
void delay(5000).then(() => {
  logger.info('EXECUTING DNS LOOKUP ON DD_AGENT_HOST');
  dns.lookup(process.env.DD_AGENT_HOST!, (err, address, family) => {
    logger.info('address: %j family: IPv%s', address, family);
  });
});
Brody
Brody15mo ago
lgtm
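The fixed 5-second delay above could also be expressed as a retry loop, so a lookup made during the startup window is retried instead of failing once. This is only a sketch: `lookupWithRetry`, `attempts`, `delayMs` and the injectable `lookup` parameter are hypothetical names, not part of hot-shots or Railway.

```javascript
const dns = require('node:dns');

// Small promise-based sleep helper.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Resolve `hostname`, retrying when the lookup fails (for example with
 * ENOTFOUND shortly after startup).
 * `attempts`, `delayMs` and `lookup` are hypothetical knobs for this sketch;
 * `lookup` defaults to Node's dns.promises.lookup.
 */
async function lookupWithRetry(
  hostname,
  { attempts = 5, delayMs = 1000, lookup = dns.promises.lookup } = {}
) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      // Resolves to { address, family } when the default lookup is used.
      return await lookup(hostname);
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastError;
}
```

It would be called as `lookupWithRetry(process.env.DD_AGENT_HOST)` before the first metric is emitted. Note that it only helps with the startup window; it does not change which address family is requested.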
johns
johns15mo ago
FYI, this is the code in hot-shots that initializes the UDP connection (hot-shots/lib/transport.js):
const createUdpTransport = args => {
  const socket = dgram.createSocket(args.udpSocketOptions);
  // do not block node from shutting down
  socket.unref();

  const dnsResolutionData = {
    timestamp: new Date(0),
    resolvedAddress: undefined
  };

  const sendUsingDnsCache = (callback, buf) => {
    const now = Date.now();
    if (dnsResolutionData.resolvedAddress === undefined || (now - dnsResolutionData.timestamp > args.cacheDnsTtl)) {
      dns.lookup(args.host, (error, address) => {
        if (error) {
          callback(error);
          return;
        }
        dnsResolutionData.resolvedAddress = address;
        dnsResolutionData.timestamp = now;
        socket.send(buf, 0, buf.length, args.port, dnsResolutionData.resolvedAddress, callback);
      });
    } else {
      socket.send(buf, 0, buf.length, args.port, dnsResolutionData.resolvedAddress, callback);
    }
  };

  return {
    emit: socket.emit.bind(socket),
    on: socket.on.bind(socket),
    removeListener: socket.removeListener.bind(socket),
    send: function (buf, callback) {
      if (args.cacheDns) {
        sendUsingDnsCache(callback, buf);
      } else {
        socket.send(buf, 0, buf.length, args.port, args.host, callback);
      }
    },
    close: socket.close.bind(socket),
    unref: socket.unref.bind(socket)
  };
};
Called by
module.exports = (instance, args) => {
  let transport = null;
  const protocol = args.protocol || PROTOCOL.UDP;

  try {
    if (protocol === PROTOCOL.TCP) {
      transport = createTcpTransport(args);
    } else if (protocol === PROTOCOL.UDS) {
      transport = createUdsTransport(args);
    } else if (protocol === PROTOCOL.UDP) {
      transport = createUdpTransport(args);
    } else if (protocol === PROTOCOL.STREAM) {
      transport = createStreamTransport(args);
    } else {
      throw new Error(`Unsupported protocol '${protocol}'`);
    }
    transport.type = protocol;
    transport.createdAt = Date.now();
  } catch (e) {
    if (instance.errorHandler) {
      instance.errorHandler(e);
    } else {
      console.error(e);
    }
  }

  return transport;
};
Called by
function trySetNewSocket(client) {
  client.socket = createTransport(client, {
    host: client.host,
    cacheDns: client.cacheDns,
    cacheDnsTtl: client.cacheDnsTtl,
    path: client.path,
    port: client.port,
    protocol: client.protocol,
    stream: client.stream,
    udpSocketOptions: client.udpSocketOptions,
  });
}
Called by
if (!this.socket) {
  trySetNewSocket(this);
}
Where this is initialized as follows:
this.protocol = (options.protocol && options.protocol.toLowerCase());
if (! this.protocol) {
  this.protocol = PROTOCOL.UDP;
}
this.cacheDns = options.cacheDns === true;
this.cacheDnsTtl = options.cacheDnsTtl || CACHE_DNS_TTL_DEFAULT;
this.host = options.host || process.env.DD_AGENT_HOST;
this.port = options.port || parseInt(process.env.DD_DOGSTATSD_PORT, 10) || 8125;
Brody
Brody15mo ago
interesting
johns
johns15mo ago
sendUsingDnsCache seems to be basically doing the same thing, so I'm pretty confused
Brody
Brody15mo ago
what was the result of this?
johns
johns15mo ago
2023-08-10 02:35:27 [server] info: EXECUTING DNS LOOKUP ON DD_AGENT_HOST
2023-08-10 02:35:27 [server] info: address: "fd12:4fde:c93f::3b:bd51:5175" family: IPv6
hmm
Brody
Brody15mo ago
thats what we wanna see
johns
johns15mo ago
I wonder if the metrics instance is just being initialized too early because it's used as a global variable so then the configuration somehow doesn't rerun on failure
Brody
Brody15mo ago
is it even set to make an AAAA lookup? the internal network is ipv6 only
johns
johns15mo ago
not sure - how might I check that
Brody
Brody15mo ago
i am also not sure, look into its code?
johns
johns15mo ago
"Jul 29, 2021 — Datadog supports the commonly used A, AAAA, CNAME, MX, and TXT record types and allows you to check records against either an external DNS ..." from Google search? Wait wrong source one sec
Brody
Brody15mo ago
but does the new StatsD constructor support looking up AAAA?
johns
johns15mo ago
hmm how might I find that in the code? also why would it matter?
Brody
Brody15mo ago
If it's making a DNS request for an A record, that will fail. There are no A records for the internal addresses; they're IPv6 only.
johns
johns15mo ago
Hmm, I'm not sure there's any code in createUdpTransport that's trying to look up an A record.
Brody
Brody15mo ago
doesnt it have to do a lookup to connect to the internal domain?
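One way to check the A versus AAAA question from Node itself is to ask for each address family separately. The sketch below uses `dns.promises.lookup` with its `family` option; `resolvableFamilies` and its injectable `lookup` parameter are hypothetical names introduced for illustration.

```javascript
const dns = require('node:dns');

/**
 * Report whether `hostname` resolves over IPv4, IPv6, or both.
 * `lookup` is injectable for testing (an assumption of this sketch) and
 * defaults to Node's dns.promises.lookup.
 */
async function resolvableFamilies(hostname, lookup = dns.promises.lookup) {
  const result = { ipv4: false, ipv6: false };
  for (const [key, family] of [['ipv4', 4], ['ipv6', 6]]) {
    try {
      // Restrict the lookup to a single address family.
      await lookup(hostname, { family });
      result[key] = true;
    } catch (err) {
      // ENOTFOUND (or a similar error) means no address of this family.
      result[key] = false;
    }
  }
  return result;
}
```

Running `resolvableFamilies(process.env.DD_AGENT_HOST).then(console.log)` inside the server container would show which families are available. For an IPv6-only internal name, one would expect the `ipv4` entry to be `false`.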
johns
johns15mo ago
Hmmm isn't sendUsingDnsCache where that's happening though?
Brody
Brody15mo ago
ill be honest with you, i have no clue what any of this code does, im just working through some simple debug stuff lol
johns
johns15mo ago
2023-08-10 02:49:20 [server] error: Error: Error sending hot-shots message: Error: getaddrinfo ENOTFOUND datadog.railway.internal
    at handleCallback (/usr/src/app/node_modules/hot-shots/lib/statsd.js:358:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:81:21)
I realize this is exactly where the issue is happening. One sec.
Nvm, not very helpful because it's just the callback function.
Brody
Brody15mo ago
statsd.js:358:32 wat going on there
johns
johns15mo ago
const handleCallback = (err, bytes) => {
  this.messagesInFlight--;
  const errFormatted = err ? new Error(`Error sending hot-shots message: ${err}`) : null;
  if (errFormatted) {
    errFormatted.code = err.code;
    // handle TCP/UDS error that requires socket replacement when we are not
    // emitting the `error` event on `this.socket`
    if ((this.protocol === PROTOCOL.TCP || this.protocol === PROTOCOL.UDS) && (callback || this.errorHandler)) {
      protocolErrorHandler(this, this.protocol, err);
    }
  }
  if (callback) {
    callback(errFormatted, bytes);
  } else if (errFormatted) {
    if (this.errorHandler) {
      this.errorHandler(errFormatted);
    } else {
      console.error(String(errFormatted));
      // emit error ourselves on the socket for backwards compatibility
      this.socket.emit('error', errFormatted);
    }
  }
};
It's just this
Brody
Brody15mo ago
not a very useful stack trace
johns
johns15mo ago
nope
Brody
Brody15mo ago
Well, we know DNS resolution does work. You just need to dig into the StatsD code and find out what's going on; perhaps it's querying the wrong DNS server (8.8.8.8 instead of fd12::10).
johns
johns15mo ago
I think this error means that it is at least trying to send to the host:
2023-08-09 18:17:15 [server] error: Error: Error sending hot-shots message: Error: getaddrinfo ENOTFOUND datadog.railway.internal
Error: Error sending hot-shots message: Error: getaddrinfo ENOTFOUND datadog.railway.internal
QQ, potential discovery of the bug: what happens if I just do
socket.send(buf, 0, buf.length, args.port, "datadog.railway.internal", callback);
Brody
Brody15mo ago
ENOTFOUND is a dns lookup error
johns
johns15mo ago
It seems like socket.send needs to have the resolved address right?
Brody
Brody15mo ago
i assume so, but there is code somewhere that resolves it before socket.send, or socket.send will resolve domains itself
johns
johns15mo ago
"socket.send will resolve domains itself"
I guess I want to double-check if this is true.
Brody
Brody15mo ago
you do, since i dont actually know what any of this code does
johns
johns15mo ago
void delay(5000).then(() => {
  dns.lookup(process.env.DD_AGENT_HOST!, (err, address) => {
    const statsdClient = new StatsD({
      host: address,
      port: 8125,
      protocol: 'udp',
    });
    statsdClient.gauge('test_metric_cami', 124);
  });
});
Trying this
Brody
Brody15mo ago
All I know is that ENOTFOUND is a DNS resolution error haha
That's a good idea, as long as host can accept a pre-resolved address.
johns
johns15mo ago
yes I think
Brody
Brody15mo ago
this would be Go's equivalent of ENOTFOUND https://utilities.up.railway.app/dns-lookup?value=hello-world.railway.internal&type=ip&dns=8.8.8.8 (type=ip looks up both A and AAAA concurrently)
johns
johns15mo ago
Now getting
2023-08-10 03:06:20 [server] error: Error: Error sending hot-shots message: Error: send EINVAL fd12:4fde:c93f::be:71fc:8d8c:8125
    at handleCallback (/usr/src/app/node_modules/hot-shots/lib/statsd.js:358:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:81:21)
with that code I guess it doesn't work?
Brody
Brody15mo ago
Doesn't look like it supports providing it a host IP.
Can you pass it a custom resolver though?
johns
johns15mo ago
Unfortunately I don't think so.
I can pass a lookup function for the UDP socket, hm.
Is there a diff between udp4 and udp6? In resolution, maybe this is it, because Railway is IPv6 only?
Brody
Brody15mo ago
internal domains are ipv6 only
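On the udp4 versus udp6 question: a Node `dgram` socket is created for one address family, so a `'udp4'` socket resolves hostnames as IPv4 only and cannot send to an IPv6 literal. If the client's socket is `'udp4'` (an assumption here; the thread does not show which default hot-shots uses), that would be consistent with both the ENOTFOUND and the EINVAL errors above. A minimal sketch of choosing the socket type from an already resolved address; `socketTypeFor` is a hypothetical helper, not part of hot-shots:

```javascript
const net = require('node:net');

/**
 * Pick the dgram socket type that matches an IP address literal.
 * Hypothetical helper: returns 'udp4' or 'udp6', throws for non-IP input.
 */
function socketTypeFor(address) {
  // net.isIP returns 4, 6, or 0 when the input is not an IP literal.
  switch (net.isIP(address)) {
    case 4:
      return 'udp4';
    case 6:
      return 'udp6';
    default:
      throw new TypeError(`Not an IP address: ${address}`);
  }
}
```

With the address from the earlier lookup, `socketTypeFor('fd12:4fde:c93f::3b:bd51:5175')` returns `'udp6'`, which is the value that would go into `udpSocketOptions.type`.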
johns
johns15mo ago
ok sick
Brody
Brody15mo ago
and i really do mean only
johns
johns15mo ago
🙏 hope this works
Brody
Brody15mo ago
railnet0 only has ipv6 ips
johns
johns15mo ago
omg it works
Brody
Brody15mo ago
sweet
johns
johns15mo ago
const statsdClient = new StatsD({
  host: process.env.DD_AGENT_HOST,
  port: 8125,
  protocol: 'udp',
  cacheDns: true,
  udpSocketOptions: {
    type: 'udp6',
    reuseAddr: true,
    ipv6Only: true,
  },
});
Brody
Brody15mo ago
we both learned something
johns
johns15mo ago
Unfortunately the metric isn't actually being registered, but at least that's a different issue. Will take a look.
Brody
Brody15mo ago
okay let me know if you think i could help further 🙂
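When sends succeed but a metric does not show up, one way to narrow things down is to bypass the client library and send a single raw DogStatsD datagram over an IPv6 UDP socket. The sketch below assumes the plain `name:value|g` gauge format with optional `|#tag` suffix; `formatGauge` and `sendRawGauge` are hypothetical helpers, and the host and port defaults simply mirror the values used in this thread.

```javascript
const dgram = require('node:dgram');

/** Format a gauge as a DogStatsD datagram: name:value|g, plus optional |#tag1,tag2 */
function formatGauge(name, value, tags = []) {
  const base = `${name}:${value}|g`;
  return tags.length > 0 ? `${base}|#${tags.join(',')}` : base;
}

/**
 * Send one gauge datagram over an IPv6 UDP socket and resolve with the
 * number of bytes sent. Hypothetical helper; defaults mirror the thread
 * (DD_AGENT_HOST, port 8125).
 */
function sendRawGauge(name, value, { host = process.env.DD_AGENT_HOST, port = 8125 } = {}) {
  return new Promise((resolve, reject) => {
    const socket = dgram.createSocket('udp6');
    const payload = Buffer.from(formatGauge(name, value));
    socket.send(payload, port, host, (err, bytes) => {
      socket.close();
      if (err) reject(err);
      else resolve(bytes);
    });
  });
}
```

For example, `sendRawGauge('test_metric_cami', 124)` sends the payload `test_metric_cami:124|g`. Since UDP is fire-and-forget, a successful send only shows that the datagram left the server; whether the agent accepted it still has to be checked on the agent side.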
johns
johns15mo ago
Just took a while. It works fine 🙏
Brody
Brody15mo ago
awesome, glad to hear it