SCTP problem, how to debug?
Bernd Walter
ticso at cicely7.cicely.de
Fri Jul 17 17:51:18 UTC 2020
On Fri, Jul 17, 2020 at 07:27:00PM +0200, Michael Tuexen wrote:
>
>
> > On 17. Jul 2020, at 18:07, Bernd Walter <ticso at cicely7.cicely.de> wrote:
> >
> > I'm running an LED matrix with SCTP.
> > The matrix consists of 24 Raspberry Pis running NFS-root FreeBSD
> > 12.0-RELEASE (they have an SD card for u-boot and loader).
> > A client system is running FreeBSD 12.1-RELEASE.
> I fixed iterator related bugs and this was MFCed to stable/12 recently.
> The problem was that the iterator stopped sending.
The client side should be easy to update.
The modules can be a bit more tricky, but I will try.
> >
> > The matrix modules have a one to many service socket.
> > The daemon regularly sends status information (temperature, etc.) to
> > each association and has a second thread to receive.
> >
> > The client system has two daemons running.
> >
> > One daemon is to control power output related to temperature states.
> > It has one thread reestablishing associations via sctp_connectx() to
> > each of the 24 modules using a single one to many socket.
> > Another thread collects all regular received data and updates stored
> > values.
> > Yet another thread sends power control data via SCTP_SENDALL, so that all modules know
> > the maximum allowed brightness rating.
> >
> > The other daemon uses the same threads to reconnect and receive.
> > It connects to the very same sockets on the modules.
> > Another thread updates picture data and wanted power rating.
> > It sends picture data to the given matrix module and then uses
> > SCTP_SENDALL to send an update trigger to all modules.
> > The wanted power rating is, for example, reduced brightness at night.
> >
> > All SCTP_SENDALL messages are just triggers with 0 length and different ppid values.
> Are you really sending messages of length 0? That shouldn't work...
No - I was wrong.
Just checked and I do send a dummy byte:
void
apa_push_leds()
{
    // send a dummy byte, because nothing would be sent otherwise
    char message = 0;
    send_message(&message, 1, 0, 0x00000002, SCTP_SENDALL);
}

void
send_message(const void* data, size_t len, uint32_t stream, uint32_t ppid, uint32_t flags, const String& dest = "")
{
    Mutex::Guard mtx(sctp_mtx);
    ssize_t remain = len;
    if (!dest.empty()) {
        // send to one specific module, addressed by name
        struct addrinfo ai;
        struct addrinfo *lips;
        bzero(&ai, sizeof(ai));
        ai.ai_flags = AI_ADDRCONFIG | AI_NUMERICSERV;
        ai.ai_family = AF_INET6;
        ai.ai_protocol = IPPROTO_SCTP;
        ai.ai_socktype = SOCK_SEQPACKET;
        int res = 0;
        String sport = port;
        String addr = dest;
        res = getaddrinfo(addr.c_str(), sport.c_str(), &ai, &lips);
        if (res != 0) {
            throw Error("failed to resolve local ips");
        }
        struct addrinfo *lip;
        for (lip = lips; lip && remain > 0; lip = lip->ai_next) {
            while (remain > 0) {
                ssize_t res;
                res = sctp_sendmsgx(sctp_socket, ((uint8_t*)data) + (len - remain), remain, lip->ai_addr, 1, ppid, flags | SCTP_EOR, stream, 0, 0);
                if (res > 0) {
                    remain -= res;
                } else {
                    if (errno != EAGAIN && errno != ENOBUFS) {
                        // release the address list before giving up
                        freeaddrinfo(lips);
                        return;
                    } else {
                        // wait until the socket is writable again
                        int res;
                        do {
                            struct pollfd pfd;
                            pfd.fd = sctp_socket;
                            pfd.events = POLLOUT;
                            pfd.revents = 0;
                            res = poll(&pfd, 1, 5000);
                        } while (res == 0);
                    }
                }
            }
        }
        freeaddrinfo(lips);
    } else {
        // no destination given, e.g. for SCTP_SENDALL
        while (remain > 0) {
            ssize_t res;
            res = sctp_sendmsg(sctp_socket, ((uint8_t*)data) + (len - remain), remain, NULL, 0, ppid, flags | SCTP_EOR, stream, 0, 0);
            if (res > 0) {
                remain -= res;
            } else {
                if (errno != EAGAIN && errno != ENOBUFS) {
                    return;
                } else {
                    // wait until the socket is writable again
                    int res;
                    do {
                        struct pollfd pfd;
                        pfd.fd = sctp_socket;
                        pfd.events = POLLOUT;
                        pfd.revents = 0;
                        res = poll(&pfd, 1, 5000);
                    } while (res == 0);
                }
            }
        }
    }
}
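The wait loop in send_message() calls poll() again for as long as it
times out and never looks at revents, so a socket that stays unwritable
keeps the sending thread (and sctp_mtx) busy without any trace. A bounded
variant that reports why it gave up might help to narrow this down. This
is only a sketch: wait_writable, WaitResult and max_timeouts are made-up
names, and it says nothing about how POLLOUT on a one-to-many SCTP socket
relates to a single association.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>

// Hypothetical result type, not part of the daemon code shown above.
enum class WaitResult { Ready, TimedOut, Failed };

// Wait until fd is writable. Gives up after max_timeouts poll() timeouts
// of timeout_ms each instead of retrying forever.
WaitResult wait_writable(int fd, int timeout_ms, int max_timeouts)
{
    assert(max_timeouts > 0);
    int timeouts = 0;
    while (timeouts < max_timeouts) {
        struct pollfd pfd;
        pfd.fd = fd;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        int res = poll(&pfd, 1, timeout_ms);
        if (res > 0) {
            // An error condition also ends poll(); do not treat it as writable.
            if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
                return WaitResult::Failed;
            return WaitResult::Ready;
        }
        if (res == 0) {
            ++timeouts;          // timed out, count it and try again
            continue;
        }
        if (errno == EINTR)
            continue;            // interrupted by a signal, just retry
        return WaitResult::Failed;
    }
    return WaitResult::TimedOut;
}
```

A caller could log TimedOut and Failed together with the destination,
which would at least show whether the sender is stuck waiting in poll()
when a module stops getting updates.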
> >
> > From time to time (1-5 days) I notice that a module won't get updates
> > anymore.
> > I see that the association got a SCTP_SENDER_DRY_EVENT event.
> > Therefore my expectation is that there is nothing to send.
> > I still see the association in the socket list and also receive the regular
> > temperature data.
> > However, obviously sending won't happen.
> > The other modules still get data.
> >
> > When I restart the client daemon, things start to work again.
> >
> > Currently I'm clueless on how to debug this problem any further.
> Can you try stable/12?
> > Doing a full network log would be too big and I'm not experienced
> > enough to understand the SCTP packets.
> > I have no idea how to see what data is in the send buffer.
> > netstat with TCP would show send and receive queue, not so with SCTP.
> > Data is sent by a single thread, which uses sctp_sendmsgx() to write
> > the data for all modules sequentially into the same socket.
> I'm not sure I understood what you are actually doing on which socket
> and how many associations are involved.
Each of the 24 modules has a single socket with two associations from the
client host.
The client host has two daemons, each with its own socket, and both sockets
have an association to each of the 24 modules.
This is the client host:
Proto Type Local Address Foreign Address (state)
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe66:62de.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fecf:7cb7.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe4c:b9c9.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fee6:41f6.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec4:6a45.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe93:5ab4.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec6:aaea.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:feef:ba3.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe87:b229.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe82:9ece.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe96:bf2a.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe10:195b.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe46:cb7.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:feb5:65dc.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec5:30dd.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe52:54bc.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe8a:2fcb.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fec6:5d6e.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe03:c920.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fecb:66a3.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe9c:9e54.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe7c:5702.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fef2:186c.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.10405 2003:df:b017:115:ba27:ebff:fe4d:3de5.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fecb:66a3.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe66:62de.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fecf:7cb7.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe4c:b9c9.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fee6:41f6.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec4:6a45.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe93:5ab4.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec6:aaea.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:feef:ba3.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe87:b229.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe82:9ece.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe96:bf2a.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe10:195b.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe46:cb7.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:feb5:65dc.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec5:30dd.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe52:54bc.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe8a:2fcb.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fec6:5d6e.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe03:c920.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe9c:9e54.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe7c:5702.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fef2:186c.1000 ESTABLISHED
sctp46 1toN 2003:df:b017:115::100.38953 2003:df:b017:115:ba27:ebff:fe4d:3de5.1000 ESTABLISHED
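With 48 rows it is easy to overlook a missing association, so a listing
like the one above can also be checked mechanically. The following is a
rough sketch that groups ESTABLISHED rows by local endpoint. It assumes
the five-column row layout shown here (proto, type, local, foreign,
state); established_by_local and missing_in are made-up helper names.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Maps each local endpoint (address.port) to the set of foreign endpoints
// it has an ESTABLISHED association with.
using AssocMap = std::map<std::string, std::set<std::string>>;

AssocMap established_by_local(const std::string& listing)
{
    AssocMap result;
    std::istringstream lines(listing);
    std::string line;
    while (std::getline(lines, line)) {
        // Split the row into whitespace-separated fields.
        std::istringstream fields(line);
        std::vector<std::string> f;
        for (std::string tok; fields >> tok; )
            f.push_back(tok);
        // Assumed row layout: proto, type, local, foreign, state.
        // Header, LISTEN and continuation rows do not match and are skipped.
        if (f.size() == 5 && f[0].compare(0, 4, "sctp") == 0 &&
            f[4] == "ESTABLISHED")
            result[f[2]].insert(f[3]);
    }
    return result;
}

// Foreign endpoints that appear in a but not in b.
std::set<std::string> missing_in(const std::set<std::string>& a,
                                 const std::set<std::string>& b)
{
    std::set<std::string> out;
    for (const auto& peer : a)
        if (b.count(peer) == 0)
            out.insert(peer);
    return out;
}
```

For the listing above, both local ports (10405 and 38953) should come out
with 24 foreign endpoints each, and missing_in() between the two sets
should be empty in both directions.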
This is one of the 24 modules:
Proto Type Local Address Foreign Address (state)
sctp46 1toN 127.0.0.1.1000 LISTEN
fe80::1%lo0.1000
::1.1000
2003:df:b017:115:ba27:ebff:fe87:b229.1000
fe80::ba27:ebff:fe87:b229%ue0.1000
10.215.74.118.1000
sctp46 1toN 2003:df:b017:115:ba27:ebff:fe87:b229.1000 2003:df:b017:115::100.10405 ESTABLISHED
sctp46 1toN 2003:df:b017:115:ba27:ebff:fe87:b229.1000 2003:df:b017:115::100.38953 ESTABLISHED
> > I haven't checked yet whether I get an error when writing to the
> > specific module IP.
--
B.Walter <bernd at bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, and much more.