Re: Socket pair used for timer addition
Frank Quinn <f.quinn@...>
Just realized we never informed the list but this was merged into next last week as well:
Wrt the EAGAIN change, that sounds like a separate issue which just happens to be in the same area to me so I think a separate patch / pull request / github ticket would be appropriate here.
Also for future reference (we’ll do it this time via issue #25), could you also include avis fixes for code which is currently (and regrettably) duplicated such as the middleware bridge timer code?
From: openmama-dev-bounces@... [mailto:openmama-dev-bounces@...] On Behalf Of Duane Pauls
Sent: 04 September 2015 15:17
Cc: David McKay <David.McKay@...>
Subject: Re: [Openmama-dev] Socket pair used for timer addition
One other note related to this: While debugging, I read 4 bytes from the socket (so that the number of bytes available dropped from 423 to 419), then continued the program. This cleared the busy loop/deadlock condition and the unit test passed. So it supports the theory that the socket was full and couldn’t be written.
This occurred on CentOS 6.5.
From: Duane Pauls
I’ve been running the timer unit tests repeatedly for over a week with the patch submitted in bug 220:
MamaTimerTestCPP.ForTimer recently entered a busy loop + deadlock. After debugging the problem, I don’t think the problem I’m seeing is specific to the patched OpenMAMA. I’d like to open a quick discussion to see if my proposed fix sounds reasonable.
I saw that the socket write in _addTimer (a new function in the patch, but the same logic as the wwrite in the original OpenMAMA) was failing due to ‘EAGAIN’, so _addTimer entered a busy loop continually trying to write to the socket that was full. The reader is unable to read from the socket since the lock is being held for the duration of the write, and the reader tries to take the lock before reading. By calling ioctl(heapImpl->mSockPair, FIONREAD, buf), I determined there were 423 bytes available for reading. It seems this is the maximum number of records that can be written to a socket pair in Linux. See:
I’m thinking that to fix this, we can simply ignore EAGAIN (and for maximum portability EWOULDBLOCK as well). If the socket is full, there are plenty of records there to wake up the reader, so it isn’t necessary to add another ‘wake-up’ record. Since the lock was taken before the new timer was added and continues to be held through the write, we can just release the lock knowing the reader has something to wake it up.
If there are no objections, I’ll submit another patch along with bug 220. Or should I open a separate bug?
I’m also interested to know if anyone can provide a timeline for applying the patch in bug 220 to a release.
The information contained in this message may be privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to this message and deleting it from your computer. Thank you. SR Labs LLC