That's what the change should do - its a reference count that defers cleanup until the last reference has been released (since both the jvm GC and the bridge thread could both hit this code). It also moves the lock release until a little later in the cycle.
Mike, feel free to reach out to me directly or on Gitter to see if we can get lined up here.
I think the other deadlock reported is a different issue to this one (though a similar cause with asynchronous destroy callbacks from the middleware causing a stir somewhere else).
Cheers,
Frank
toggle quoted message
Show quoted text
Hi Frank
Does this actually address the issue with prematurely releasing the lock? Wouldn’t it be safer, as Mike suggested, to allow the thread that holds the lock to mark the object for destruction, but to defer the memory clean up until
the unlock occurs?
Do you want to have a side conversation with Mike about his concerns – I think it might help?
Nigel
Nigel Phelan | Corporate & Investment Bank | Market Data
Services | J.P. Morgan
Hi Mike – did you have any joy with testing this one?
From: Frank Quinn
Sent: 11 May 2020 20:19
To: Slade, Michael J <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Mike,
I have put together something that looks like it address this (I modified qpid and MamaListen locally with various sleeps to verify behaviour). Turned out to be a little trickier than I had anticipated with the different paths through the state machine.
It works by assigning an atomic reference counter to increment for the bridge, and the Java wrapper, such that only when receiving a statement of disinterest from both will it release the memory. Since calling deallocate is an admission that you will no
longer be using that memory any more, operating on a first past the post system should be safe.
There are many paths through the MAMA subscription state machine though so I'd appreciate if you could give it a test in your target environment before I merge it in. You can find my development branch for this change on:
https://github.com/fquinner/OpenMAMA/tree/feature/subscription-reference-count
Let me know if this works for you.
Cheers,
Frank
Thanks Frank.
If you could take a look at implementing this, that would be much appreciated.
Mike
From: Frank Quinn [mailto:fquinn@...]
Sent: 28 April 2020 19:52
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
OK thanks Mike I thought user interaction was the primary problem but sounds like it's the finalizers.
This is where the subscription stuff gets hairy because there are a few different but similar areas at play here.
You should ideally unlock before the mamaSubscriptionImpl_deallocate to avoid undefined behaviour in the destroy yes. However...
In Java, when the GC kicks in, it fires off the JNI Java_com_wombat_mama_MamaSubscription_deallocate method which in turn calls mamaSubscription_deallocate which does acquire the subscription's mCreateDestroyLock
which could be (already) held onto while mamaSubscriptionImpl_deallocate is then called (!). So we actually already have undefined behaviour on that path when you look at it so that's effectively the same bug.
The path with Java destroy goes JNI Java_com_wombat_mama_MamaSubscription_destroy which calls mamaSubscription_destroy which will usually deactivate the subscription, then go on to invoke the destroyed callback. I was suggesting at this point, we *do* hang
onto the lock until that callback is completed to protect the subscription object.
The path mamaSubscriptionImpl_onSubscriptionDestroyed is a different beast when the middleware is letting MAMA know that a subscription has been destroyed which may be via mamaSubscription_destroy -> mamaSubscription -> deactivate. In this case, we currently
unlock before the callback. I was suggesting this should be moved until after the callback, but before the deallocate. If we hung onto the lock until after the deallocate, then we'd just be emulating the buggy behaviour already present in mamaSubscription_deallocate.
But yes when I look this through there is a more subtle issue afoot - because we have onSubscriptionDestroyed which will be called here (already inside the lock)
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate()
unlock
lock
Now, since wlocks are recursive and these are from the same thread, this won't deadlock and should already be protected from the finalizer, but it also may not happen in this order depending on how the ondestroy callback is implemented in the bridge, so
you could get something more like this:
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
... bridge takes this under consideration ...
unlock mCreateDestroyLock
lock
... time passes...
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate() <-- this is where it looks like the GC has an opportunity to cause trouble?
If we are going to have multiple threads coming in like this, then yes I think an atomic reference counter in the subscription object to track when each resource depending on the subscription object has gained and lost interest would be preferable to holding
onto the lock. Would fix the existing bug already in the finalizer too.
Is this something you're looking clarification on before implementing or you want me to have a look at implementing this?
Cheers,
Frank
Hi Frank,
Why do we need to unlock the mutex before the deallocate call? I understand that this call destroys the lock, but with the proposed wrapper around the lock the destroy call can defer the actual release of memory to the unlock
call.
The early release could be exploited because the MamaSubscription Java wrapper calls deallocate in its finalize method. Due to this, the GC thread could acquire the lock and release memory in mamaSubscriptionImpl_deallocate while
the lock-releasing thread is trying to use this memory to invoke the user callback.
Thanks,
Mike
From: Frank Quinn [mailto:fquinn@...]
Sent: 27 April 2020 22:04
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Mike,
Could you shed some more light on what exactly the problem is here? I have had a refresher in that area, and from what I can see, I can’t think of any reason to unlock that particular mutex before the user callback so I’m not opposed to moving until after
that callback if that will resolve whatever issue you’re seeing? It will need to be before the deallocate though.
The only reason I could think of for why this was done in the first place is to avoid deadlock if the user calls something like mamaSubscription_deallocate or something else that uses that mutex from the callback, but then I guess they’d learn pretty quickly
if they had fallen foul of that. Then I thought this might be somehow exploited in one of the Java or .NET wrappers but couldn’t find any evidence of that either. Plus that callback is more informative than anything else for the application, so if they are
attempting to deallocate, and that’s causing funny business (or something like that?), we just need to make it clear in the callback documentation that they shouldn’t be doing that.
Cheers,
Frank
At first glance, that sounds like just “kicking the can down the road” — i.e., you’re still left with the problem of what to do with these wrappers such that you can tear them down in a thread-safe manner.
Having said that, if you have an implementation that works, I’m sure that others would be interested, so maybe put it up on GitHub or something?
My personal preference would be to try to find something on the intertubes that has been tested up the wazoo — concurrency is hard, and the hardest part IMO is tear-down of event sources/sinks.
Thank you for your reply Bill.
To get around the problem with attempting to destroy the mutex while it is locked, could we instead define our own wrapper around a recursive mutex which handles the following:
1. Keep a count of how many locks have been performed by current thread.
2. Defer destroy to unlock call if mutex currently locked.
This will keep all destroy/deallocate logic thread safe and stop the need for early unlocks.
Attempting to destroy a locked mutex results in undefined behavior.
So you either destroy the locked mutex, which is UB, or you unlock it first. In the second case, another thread could grab it, which would likely cause a crash at some point.
You can make the second problem go away by doing something like queueing the actual destroy in such a way that you know for certain that no other threads are waiting on the mutex, but that’s
a fair bit of work, and tricky in its own right. We took that approach in our application, and it generally works well, and avoids a whole host of lifetime-related problems in OpenMAMA.
In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function?
Thanks in advance,
Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription)
mama_status ret = MAMA_STATUS_NULL_ARG;
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
mamaSubscription_deactivate(subscription);
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
/* Otherwise the subscription has been correctly removed from the throttle. */
/* Fall through to perform the remaining clean-up. */
/* For the following states the subscription is not active, simply perform clean-up. */
case MAMA_SUBSCRIPTION_SETUP:
case MAMA_SUBSCRIPTION_DEACTIVATED:
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(subscription);
mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303
void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure)
{
/* Obtain the impl from the subscription object. */
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
if(NULL != impl)
{
if(NULL != impl->mQueue)
mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue);
/* Lock the mutex. */
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
{
/* The subscription is being deactivated. */
case MAMA_SUBSCRIPTION_DEACTIVATING:
/* Change the state. */
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED);
break;
/* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called
* before the destroy callback has come in from the bridge.
*/
case MAMA_SUBSCRIPTION_DEALLOCATING :
mamaSubscription_cleanup(subscription);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
/* Delete the subscription. */
mamaSubscriptionImpl_deallocate(impl);
return;
break;
/* The subscription is being destroyed. */
case MAMA_SUBSCRIPTION_DESTROYING :
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
return;
break;
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately.
Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately.
Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please
delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
|
|
Hi Frank
Does this actually address the issue with prematurely releasing the lock? Wouldn’t it be safer, as Mike suggested, to allow the thread that holds the lock to mark the object for destruction,
but to defer the memory clean up until the unlock occurs?
Do you want to have a side conversation with Mike about his concerns – I think it might help?
Nigel
Nigel Phelan
| Corporate & Investment Bank | Market Data Services | J.P. Morgan
toggle quoted message
Show quoted text
From: Openmama-dev@... [mailto:Openmama-dev@...]
On Behalf Of Frank Quinn
Sent: 01 June 2020 22:16
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: Re: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Mike – did you have any joy with testing this one?
From: Frank Quinn
Sent: 11 May 2020 20:19
To: Slade, Michael J <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Mike,
I have put together something that looks like it address this (I modified qpid and MamaListen locally with various sleeps to verify behaviour). Turned out to be a little trickier than I had anticipated with the different paths through the
state machine.
It works by assigning an atomic reference counter to increment for the bridge, and the Java wrapper, such that only when receiving a statement of disinterest from both will it release the memory. Since calling deallocate is an admission
that you will no longer be using that memory any more, operating on a first past the post system should be safe.
There are many paths through the MAMA subscription state machine though so I'd appreciate if you could give it a test in your target environment before I merge it in. You can find my development branch for this change on:
https://github.com/fquinner/OpenMAMA/tree/feature/subscription-reference-count
Let me know if this works for you.
Cheers,
Frank
Thanks Frank.
If you could take a look at implementing this, that would be much appreciated.
Mike
From: Frank Quinn [mailto:fquinn@...]
Sent: 28 April 2020 19:52
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
OK thanks Mike I thought user interaction was the primary problem but sounds like it's the finalizers.
This is where the subscription stuff gets hairy because there are a few different but similar areas at play here.
You should ideally unlock before the mamaSubscriptionImpl_deallocate to avoid undefined behaviour in the destroy yes. However...
In Java, when the GC kicks in, it fires off the JNI Java_com_wombat_mama_MamaSubscription_deallocate method which in turn calls mamaSubscription_deallocate which does acquire the subscription's mCreateDestroyLock
which could be (already) held onto while mamaSubscriptionImpl_deallocate is then called (!). So we actually already have undefined behaviour on that path when you look at it so that's effectively the same bug.
The path with Java destroy goes JNI Java_com_wombat_mama_MamaSubscription_destroy which calls mamaSubscription_destroy which will usually deactivate the subscription, then go on to invoke the destroyed callback. I was suggesting at this
point, we *do* hang onto the lock until that callback is completed to protect the subscription object.
The path mamaSubscriptionImpl_onSubscriptionDestroyed is a different beast when the middleware is letting MAMA know that a subscription has been destroyed which may be via mamaSubscription_destroy -> mamaSubscription -> deactivate. In this
case, we currently unlock before the callback. I was suggesting this should be moved until after the callback, but before the deallocate. If we hung onto the lock until after the deallocate, then we'd just be emulating the buggy behaviour already present in
mamaSubscription_deallocate.
But yes when I look this through there is a more subtle issue afoot - because we have onSubscriptionDestroyed which will be called here (already inside the lock)
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate()
unlock
lock
Now, since wlocks are recursive and these are from the same thread, this won't deadlock and should already be protected from the finalizer, but it also may not happen in this order depending on how the ondestroy callback is implemented
in the bridge, so you could get something more like this:
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
... bridge takes this under consideration ...
unlock mCreateDestroyLock
lock
... time passes...
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate() <-- this is where it looks like the GC has an opportunity to cause trouble?
If we are going to have multiple threads coming in like this, then yes I think an atomic reference counter in the subscription object to track when each resource depending on the subscription object has gained and lost interest would be
preferable to holding onto the lock. Would fix the existing bug already in the finalizer too.
Is this something you're looking clarification on before implementing or you want me to have a look at implementing this?
Cheers,
Frank
Hi Frank,
Why do we need to unlock the mutex before the deallocate call? I understand that this call destroys the lock, but with the proposed wrapper around the lock the destroy call can defer the actual release
of memory to the unlock call.
The early release could be exploited because the MamaSubscription Java wrapper calls deallocate in its finalize method. Due to this, the GC thread could acquire the lock and release memory in mamaSubscriptionImpl_deallocate
while the lock-releasing thread is trying to use this memory to invoke the user callback.
Thanks,
Mike
From: Frank Quinn [mailto:fquinn@...]
Sent: 27 April 2020 22:04
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Mike,
Could you shed some more light on what exactly the problem is here? I have had a refresher in that area, and from what I can see, I can’t think of any reason to unlock that particular mutex before the user callback so I’m not opposed to
moving until after that callback if that will resolve whatever issue you’re seeing? It will need to be before the deallocate though.
The only reason I could think of for why this was done in the first place is to avoid deadlock if the user calls something like mamaSubscription_deallocate or something else that uses that mutex from the callback, but then I guess they’d
learn pretty quickly if they had fallen foul of that. Then I thought this might be somehow exploited in one of the Java or .NET wrappers but couldn’t find any evidence of that either. Plus that callback is more informative than anything else for the application,
so if they are attempting to deallocate, and that’s causing funny business (or something like that?), we just need to make it clear in the callback documentation that they shouldn’t be doing that.
Cheers,
Frank
At first glance, that sounds like just “kicking the can down the road” — i.e., you’re still left with the problem of what to do with these wrappers such that you can tear them down in a thread-safe manner.
Having said that, if you have an implementation that works, I’m sure that others would be interested, so maybe put it up on GitHub or something?
My personal preference would be to try to find something on the intertubes that has been tested up the wazoo — concurrency is hard, and the hardest part IMO is tear-down of event sources/sinks.
Thank you for your reply Bill.
To get around the problem with attempting to destroy the mutex while it is locked, could we instead define our own wrapper around a recursive mutex which handles the following:
1. Keep a
count of how many locks have been performed by current thread.
2. Defer destroy
to unlock call if mutex currently locked.
This will keep all destroy/deallocate logic thread safe and stop the need for early unlocks.
Attempting to destroy a locked mutex results in undefined behavior.
So you either destroy the locked mutex, which is UB, or you unlock it first. In the second case, another thread could grab it, which would likely cause a crash at some
point.
You can make the second problem go away by doing something like queueing the actual destroy in such a way that you know for certain that no other threads are waiting on
the mutex, but that’s a fair bit of work, and tricky in its own right. We took that approach in our application, and it generally works well, and avoids a whole host of lifetime-related problems in OpenMAMA.
In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function?
Thanks in advance,
Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription)
mama_status ret = MAMA_STATUS_NULL_ARG;
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
mamaSubscription_deactivate(subscription);
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
/* Otherwise the subscription has been correctly removed from the throttle. */
/* Fall through to perform the remaining clean-up. */
/* For the following states the subscription is not active, simply perform clean-up. */
case MAMA_SUBSCRIPTION_SETUP:
case MAMA_SUBSCRIPTION_DEACTIVATED:
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(subscription);
mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303
void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure)
{
/* Obtain the impl from the subscription object. */
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
if(NULL != impl)
{
if(NULL != impl->mQueue)
mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue);
/* Lock the mutex. */
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
{
/* The subscription is being deactivated. */
case MAMA_SUBSCRIPTION_DEACTIVATING:
/* Change the state. */
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED);
break;
/* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called
* before the destroy callback has come in from the bridge.
*/
case MAMA_SUBSCRIPTION_DEALLOCATING :
mamaSubscription_cleanup(subscription);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
/* Delete the subscription. */
mamaSubscriptionImpl_deallocate(impl);
return;
break;
/* The subscription is being destroyed. */
case MAMA_SUBSCRIPTION_DESTROYING :
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
return;
break;
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately.
Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately.
Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
|
|
Hi Mike – did you have any joy with testing this one?
toggle quoted message
Show quoted text
From: Frank Quinn
Sent: 11 May 2020 20:19
To: Slade, Michael J <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Mike,
I have put together something that looks like it address this (I modified qpid and MamaListen locally with various sleeps to verify behaviour). Turned out to be a little trickier than I had anticipated with the different paths through the
state machine.
It works by assigning an atomic reference counter to increment for the bridge, and the Java wrapper, such that only when receiving a statement of disinterest from both will it release the memory. Since calling deallocate is an admission
that you will no longer be using that memory any more, operating on a first past the post system should be safe.
There are many paths through the MAMA subscription state machine though so I'd appreciate if you could give it a test in your target environment before I merge it in. You can find my development branch for this change on:
https://github.com/fquinner/OpenMAMA/tree/feature/subscription-reference-count
Let me know if this works for you.
Cheers,
Frank
Thanks Frank.
If you could take a look at implementing this, that would be much appreciated.
Mike
From: Frank Quinn [mailto:fquinn@...]
Sent: 28 April 2020 19:52
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
OK thanks Mike I thought user interaction was the primary problem but sounds like it's the finalizers.
This is where the subscription stuff gets hairy because there are a few different but similar areas at play here.
You should ideally unlock before the mamaSubscriptionImpl_deallocate to avoid undefined behaviour in the destroy yes. However...
In Java, when the GC kicks in, it fires off the JNI Java_com_wombat_mama_MamaSubscription_deallocate method which in turn calls mamaSubscription_deallocate which does acquire the subscription's mCreateDestroyLock
which could be (already) held onto while mamaSubscriptionImpl_deallocate is then called (!). So we actually already have undefined behaviour on that path when you look at it so that's effectively the same bug.
The path with Java destroy goes JNI Java_com_wombat_mama_MamaSubscription_destroy which calls mamaSubscription_destroy which will usually deactivate the subscription, then go on to invoke the destroyed callback. I was suggesting at this
point, we *do* hang onto the lock until that callback is completed to protect the subscription object.
The path mamaSubscriptionImpl_onSubscriptionDestroyed is a different beast when the middleware is letting MAMA know that a subscription has been destroyed which may be via mamaSubscription_destroy -> mamaSubscription -> deactivate. In this
case, we currently unlock before the callback. I was suggesting this should be moved until after the callback, but before the deallocate. If we hung onto the lock until after the deallocate, then we'd just be emulating the buggy behaviour already present in
mamaSubscription_deallocate.
But yes when I look this through there is a more subtle issue afoot - because we have onSubscriptionDestroyed which will be called here (already inside the lock)
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate()
unlock
lock
Now, since wlocks are recursive and these are from the same thread, this won't deadlock and should already be protected from the finalizer, but it also may not happen in this order depending on how the ondestroy callback is implemented
in the bridge, so you could get something more like this:
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
... bridge takes this under consideration ...
unlock mCreateDestroyLock
lock
... time passes...
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate() <-- this is where it looks like the GC has an opportunity to cause trouble?
If we are going to have multiple threads coming in like this, then yes I think an atomic reference counter in the subscription object to track when each resource depending on the subscription object has gained and lost interest would be
preferable to holding onto the lock. Would fix the existing bug already in the finalizer too.
Is this something you're looking clarification on before implementing or you want me to have a look at implementing this?
Cheers,
Frank
Hi Frank,
Why do we need to unlock the mutex before the deallocate call? I understand that this call destroys the lock, but with the proposed wrapper around the lock the destroy call can defer the actual release
of memory to the unlock call.
The early release could be exploited because the MamaSubscription Java wrapper calls deallocate in its finalize method. Due to this, the GC thread could acquire the lock and release memory in mamaSubscriptionImpl_deallocate
while the lock-releasing thread is trying to use this memory to invoke the user callback.
Thanks,
Mike
From: Frank Quinn [mailto:fquinn@...]
Sent: 27 April 2020 22:04
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Mike,
Could you shed some more light on what exactly the problem is here? I have had a refresher in that area, and from what I can see, I can’t think of any reason to unlock that particular mutex before the user callback so I’m not opposed to
moving until after that callback if that will resolve whatever issue you’re seeing? It will need to be before the deallocate though.
The only reason I could think of for why this was done in the first place is to avoid deadlock if the user calls something like mamaSubscription_deallocate or something else that uses that mutex from the callback, but then I guess they’d
learn pretty quickly if they had fallen foul of that. Then I thought this might be somehow exploited in one of the Java or .NET wrappers but couldn’t find any evidence of that either. Plus that callback is more informative than anything else for the application,
so if they are attempting to deallocate, and that’s causing funny business (or something like that?), we just need to make it clear in the callback documentation that they shouldn’t be doing that.
Cheers,
Frank
At first glance, that sounds like just “kicking the can down the road” — i.e., you’re still left with the problem of what to do with these wrappers such that you can tear them down in a thread-safe manner.
Having said that, if you have an implementation that works, I’m sure that others would be interested, so maybe put it up on GitHub or something?
My personal preference would be to try to find something on the intertubes that has been tested up the wazoo — concurrency is hard, and the hardest part IMO is tear-down of event sources/sinks.
Thank you for your reply Bill.
To get around the problem with attempting to destroy the mutex while it is locked, could we instead define our own wrapper around a recursive mutex which handles the following:
1. Keep a
count of how many locks have been performed by current thread.
2. Defer destroy
to unlock call if mutex currently locked.
This will keep all destroy/deallocate logic thread safe and stop the need for early unlocks.
Attempting to destroy a locked mutex results in undefined behavior.
So you either destroy the locked mutex, which is UB, or you unlock it first. In the second case, another thread could grab it, which would likely cause a crash at some
point.
You can make the second problem go away by doing something like queueing the actual destroy in such a way that you know for certain that no other threads are waiting on
the mutex, but that’s a fair bit of work, and tricky in its own right. We took that approach in our application, and it generally works well, and avoids a whole host of lifetime-related problems in OpenMAMA.
In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function?
Thanks in advance,
Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription)
mama_status ret = MAMA_STATUS_NULL_ARG;
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
mamaSubscription_deactivate(subscription);
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
/* Otherwise the subscription has been correctly removed from the throttle. */
/* Fall through to perform the remaining clean-up. */
/* For the following states the subscription is not active, simply perform clean-up. */
case MAMA_SUBSCRIPTION_SETUP:
case MAMA_SUBSCRIPTION_DEACTIVATED:
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(subscription);
mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303
void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure)
{
/* Obtain the impl from the subscription object. */
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
if(NULL != impl)
{
if(NULL != impl->mQueue)
mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue);
/* Lock the mutex. */
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
{
/* The subscription is being deactivated. */
case MAMA_SUBSCRIPTION_DEACTIVATING:
/* Change the state. */
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED);
break;
/* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called
* before the destroy callback has come in from the bridge.
*/
case MAMA_SUBSCRIPTION_DEALLOCATING :
mamaSubscription_cleanup(subscription);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
/* Delete the subscription. */
mamaSubscriptionImpl_deallocate(impl);
return;
break;
/* The subscription is being destroyed. */
case MAMA_SUBSCRIPTION_DESTROYING :
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
return;
break;
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately.
Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately.
Any unauthorized use is strictly prohibited.
|
|
Hi Mike,
I have put together something that looks like it address this (I modified qpid and MamaListen locally with various sleeps to verify behaviour). Turned out to be a little trickier than I had anticipated with the different paths through the
state machine.
It works by assigning an atomic reference counter to increment for the bridge, and the Java wrapper, such that only when receiving a statement of disinterest from both will it release the memory. Since calling deallocate is an admission
that you will no longer be using that memory any more, operating on a first past the post system should be safe.
There are many paths through the MAMA subscription state machine though so I'd appreciate if you could give it a test in your target environment before I merge it in. You can find my development branch for this change on:
https://github.com/fquinner/OpenMAMA/tree/feature/subscription-reference-count
Let me know if this works for you.
Cheers,
Frank
toggle quoted message
Show quoted text
From: Slade, Michael J <michael.j.slade@...>
Sent: 29 April 2020 10:01
To: Frank Quinn <fquinn@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Thanks Frank.
If you could take a look at implementing this, that would be much appreciated.
Mike
From: Frank Quinn [mailto:fquinn@...]
Sent: 28 April 2020 19:52
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
OK thanks Mike I thought user interaction was the primary problem but sounds like it's the finalizers.
This is where the subscription stuff gets hairy because there are a few different but similar areas at play here.
You should ideally unlock before the mamaSubscriptionImpl_deallocate to avoid undefined behaviour in the destroy yes. However...
In Java, when the GC kicks in, it fires off the JNI Java_com_wombat_mama_MamaSubscription_deallocate method which in turn calls mamaSubscription_deallocate which does acquire the subscription's mCreateDestroyLock
which could be (already) held onto while mamaSubscriptionImpl_deallocate is then called (!). So we actually already have undefined behaviour on that path when you look at it so that's effectively the same bug.
The path with Java destroy goes JNI Java_com_wombat_mama_MamaSubscription_destroy which calls mamaSubscription_destroy which will usually deactivate the subscription, then go on to invoke the destroyed callback. I was suggesting at this
point, we *do* hang onto the lock until that callback is completed to protect the subscription object.
The path mamaSubscriptionImpl_onSubscriptionDestroyed is a different beast when the middleware is letting MAMA know that a subscription has been destroyed which may be via mamaSubscription_destroy -> mamaSubscription -> deactivate. In this
case, we currently unlock before the callback. I was suggesting this should be moved until after the callback, but before the deallocate. If we hung onto the lock until after the deallocate, then we'd just be emulating the buggy behaviour already present in
mamaSubscription_deallocate.
But yes when I look this through there is a more subtle issue afoot - because we have onSubscriptionDestroyed which will be called here (already inside the lock)
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate()
unlock
lock
Now, since wlocks are recursive and these are from the same thread, this won't deadlock and should already be protected from the finalizer, but it also may not happen in this order depending on how the ondestroy callback is implemented
in the bridge, so you could get something more like this:
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
... bridge takes this under consideration ...
unlock mCreateDestroyLock
lock
... time passes...
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate() <-- this is where it looks like the GC has an opportunity to cause trouble?
If we are going to have multiple threads coming in like this, then yes I think an atomic reference counter in the subscription object to track when each resource depending on the subscription object has gained and lost interest would be
preferable to holding onto the lock. Would fix the existing bug already in the finalizer too.
Is this something you're looking clarification on before implementing or you want me to have a look at implementing this?
Cheers,
Frank
Hi Frank,
Why do we need to unlock the mutex before the deallocate call? I understand that this call destroys the lock, but with the proposed wrapper around the lock the destroy call can defer the actual release
of memory to the unlock call.
The early release could be exploited because the MamaSubscription Java wrapper calls deallocate in its finalize method. Due to this, the GC thread could acquire the lock and release memory in mamaSubscriptionImpl_deallocate
while the lock-releasing thread is trying to use this memory to invoke the user callback.
Thanks,
Mike
From: Frank Quinn [mailto:fquinn@...]
Sent: 27 April 2020 22:04
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Mike,
Could you shed some more light on what exactly the problem is here? I have had a refresher in that area, and from what I can see, I can’t think of any reason to unlock that particular mutex before the user callback so I’m not opposed to
moving until after that callback if that will resolve whatever issue you’re seeing? It will need to be before the deallocate though.
The only reason I could think of for why this was done in the first place is to avoid deadlock if the user calls something like mamaSubscription_deallocate or something else that uses that mutex from the callback, but then I guess they’d
learn pretty quickly if they had fallen foul of that. Then I thought this might be somehow exploited in one of the Java or .NET wrappers but couldn’t find any evidence of that either. Plus that callback is more informative than anything else for the application,
so if they are attempting to deallocate, and that’s causing funny business (or something like that?), we just need to make it clear in the callback documentation that they shouldn’t be doing that.
Cheers,
Frank
At first glance, that sounds like just “kicking the can down the road” — i.e., you’re still left with the problem of what to do with these wrappers such that you can tear them down in a thread-safe manner.
Having said that, if you have an implementation that works, I’m sure that others would be interested, so maybe put it up on GitHub or something?
My personal preference would be to try to find something on the intertubes that has been tested up the wazoo — concurrency is hard, and the hardest part IMO is tear-down of event sources/sinks.
Thank you for your reply Bill.
To get around the problem with attempting to destroy the mutex while it is locked, could we instead define our own wrapper around a recursive mutex which handles the following:
1. Keep a
count of how many locks have been performed by current thread.
2. Defer destroy
to unlock call if mutex currently locked.
This will keep all destroy/deallocate logic thread safe and stop the need for early unlocks.
Attempting to destroy a locked mutex results in undefined behavior.
So you either destroy the locked mutex, which is UB, or you unlock it first. In the second case, another thread could grab it, which would likely cause a crash at some
point.
You can make the second problem go away by doing something like queueing the actual destroy in such a way that you know for certain that no other threads are waiting on
the mutex, but that’s a fair bit of work, and tricky in its own right. We took that approach in our application, and it generally works well, and avoids a whole host of lifetime-related problems in OpenMAMA.
In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function?
Thanks in advance,
Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription)
mama_status ret = MAMA_STATUS_NULL_ARG;
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
mamaSubscription_deactivate(subscription);
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
/* Otherwise the subscription has been correctly removed from the throttle. */
/* Fall through to perform the remaining clean-up. */
/* For the following states the subscription is not active, simply perform clean-up. */
case MAMA_SUBSCRIPTION_SETUP:
case MAMA_SUBSCRIPTION_DEACTIVATED:
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(subscription);
mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303
void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure)
{
/* Obtain the impl from the subscription object. */
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
if(NULL != impl)
{
if(NULL != impl->mQueue)
mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue);
/* Lock the mutex. */
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
{
/* The subscription is being deactivated. */
case MAMA_SUBSCRIPTION_DEACTIVATING:
/* Change the state. */
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED);
break;
/* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called
* before the destroy callback has come in from the bridge.
*/
case MAMA_SUBSCRIPTION_DEALLOCATING :
mamaSubscription_cleanup(subscription);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
/* Delete the subscription. */
mamaSubscriptionImpl_deallocate(impl);
return;
break;
/* The subscription is being destroyed. */
case MAMA_SUBSCRIPTION_DESTROYING :
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
return;
break;
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately.
Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately.
Any unauthorized use is strictly prohibited.
|
|
Thanks Frank.
If you could take a look at implementing this, that would be much appreciated.
Mike
toggle quoted message
Show quoted text
From: Frank Quinn [mailto: fquinn@...]
Sent: 28 April 2020 19:52
To: Slade, Michael J (CIB Tech, GBR) < michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
OK thanks Mike I thought user interaction was the primary problem but sounds like it's the finalizers.
This is where the subscription stuff gets hairy because there are a few different but similar areas at play here.
You should ideally unlock before the mamaSubscriptionImpl_deallocate to avoid undefined behaviour in the destroy yes. However...
In Java, when the GC kicks in, it fires off the JNI Java_com_wombat_mama_MamaSubscription_deallocate method which in turn calls mamaSubscription_deallocate which does acquire the subscription's mCreateDestroyLock
which could be (already) held onto while mamaSubscriptionImpl_deallocate is then called (!). So we actually already have undefined behaviour on that path when you look at it so that's effectively the same bug.
The path with Java destroy goes JNI Java_com_wombat_mama_MamaSubscription_destroy which calls mamaSubscription_destroy which will usually deactivate the subscription, then go on to invoke the destroyed callback. I was
suggesting at this point, we *do* hang onto the lock until that callback is completed to protect the subscription object.
The path mamaSubscriptionImpl_onSubscriptionDestroyed is a different beast when the middleware is letting MAMA know that a subscription has been destroyed which may be via mamaSubscription_destroy -> mamaSubscription
-> deactivate. In this case, we currently unlock before the callback. I was suggesting this should be moved until after the callback, but before the deallocate. If we hung onto the lock until after the deallocate, then we'd just be emulating the buggy behaviour
already present in mamaSubscription_deallocate.
But yes when I look this through there is a more subtle issue afoot - because we have onSubscriptionDestroyed which will be called here (already inside the lock)
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate()
unlock
lock
Now, since wlocks are recursive and these are from the same thread, this won't deadlock and should already be protected from the finalizer, but it also may not happen in this order depending on how the ondestroy callback
is implemented in the bridge, so you could get something more like this:
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
... bridge takes this under consideration ...
unlock mCreateDestroyLock
lock
... time passes...
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate() <-- this is where it looks like the GC has an opportunity to cause trouble?
If we are going to have multiple threads coming in like this, then yes I think an atomic reference counter in the subscription object to track when each resource depending on the subscription object has gained and lost
interest would be preferable to holding onto the lock. Would fix the existing bug already in the finalizer too.
Is this something you're looking clarification on before implementing or you want me to have a look at implementing this?
Cheers,
Frank
Hi Frank,
Why do we need to unlock the mutex before the deallocate call? I understand that this call destroys the lock, but with the proposed wrapper around the lock the destroy call can defer the actual release of memory
to the unlock call.
The early release could be exploited because the MamaSubscription Java wrapper calls deallocate in its finalize method. Due to this, the GC thread could acquire the lock and release memory in mamaSubscriptionImpl_deallocate
while the lock-releasing thread is trying to use this memory to invoke the user callback.
Thanks,
Mike
From: Frank Quinn [mailto:fquinn@...]
Sent: 27 April 2020 22:04
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Mike,
Could you shed some more light on what exactly the problem is here? I have had a refresher in that area, and from what I can see, I can’t think of any reason to unlock that particular mutex before the user callback so
I’m not opposed to moving until after that callback if that will resolve whatever issue you’re seeing? It will need to be before the deallocate though.
The only reason I could think of for why this was done in the first place is to avoid deadlock if the user calls something like mamaSubscription_deallocate or something else that uses that mutex from the callback, but
then I guess they’d learn pretty quickly if they had fallen foul of that. Then I thought this might be somehow exploited in one of the Java or .NET wrappers but couldn’t find any evidence of that either. Plus that callback is more informative than anything
else for the application, so if they are attempting to deallocate, and that’s causing funny business (or something like that?), we just need to make it clear in the callback documentation that they shouldn’t be doing that.
Cheers,
Frank
At first glance, that sounds like just “kicking the can down the road” — i.e., you’re still left with the problem of what to do with these wrappers such that you can tear them down in a thread-safe manner.
Having said that, if you have an implementation that works, I’m sure that others would be interested, so maybe put it up on GitHub or something?
My personal preference would be to try to find something on the intertubes that has been tested up the wazoo — concurrency is hard, and the hardest part IMO is tear-down of event sources/sinks.
Thank you for your reply Bill.
To get around the problem with attempting to destroy the mutex while it is locked, could we instead define our own wrapper around a recursive mutex which handles the following:
1. Keep
a count of how many locks have been performed by current thread.
2. Defer
destroy to unlock call if mutex currently locked.
This will keep all destroy/deallocate logic thread safe and stop the need for early unlocks.
Attempting to destroy a locked mutex results in undefined behavior.
So you either destroy the locked mutex, which is UB, or you unlock it first. In the second case, another thread could grab it, which would likely cause a crash
at some point.
You can make the second problem go away by doing something like queueing the actual destroy in such a way that you know for certain that no other threads are
waiting on the mutex, but that’s a fair bit of work, and tricky in its own right. We took that approach in our application, and it generally works well, and avoids a whole host of lifetime-related problems in OpenMAMA.
In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function?
Thanks in advance,
Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription)
mama_status ret = MAMA_STATUS_NULL_ARG;
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
mamaSubscription_deactivate(subscription);
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
/* Otherwise the subscription has been correctly removed from the throttle. */
/* Fall through to perform the remaining clean-up. */
/* For the following states the subscription is not active, simply perform clean-up. */
case MAMA_SUBSCRIPTION_SETUP:
case MAMA_SUBSCRIPTION_DEACTIVATED:
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(subscription);
mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303
void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure)
{
/* Obtain the impl from the subscription object. */
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
if(NULL != impl)
{
if(NULL != impl->mQueue)
mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue);
/* Lock the mutex. */
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
{
/* The subscription is being deactivated. */
case MAMA_SUBSCRIPTION_DEACTIVATING:
/* Change the state. */
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED);
break;
/* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called
* before the destroy callback has come in from the bridge.
*/
case MAMA_SUBSCRIPTION_DEALLOCATING :
mamaSubscription_cleanup(subscription);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
/* Delete the subscription. */
mamaSubscriptionImpl_deallocate(impl);
return;
break;
/* The subscription is being destroyed. */
case MAMA_SUBSCRIPTION_DESTROYING :
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
return;
break;
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately.
Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
|
|
OK thanks Mike I thought user interaction was the primary problem but sounds like it's the finalizers.
This is where the subscription stuff gets hairy because there are a few different but similar areas at play here.
You should ideally unlock before the mamaSubscriptionImpl_deallocate to avoid undefined behaviour in the destroy yes. However...
In Java, when the GC kicks in, it fires off the JNI Java_com_wombat_mama_MamaSubscription_deallocate method which in turn calls mamaSubscription_deallocate which does acquire the subscription's mCreateDestroyLock
which could be (already) held onto while mamaSubscriptionImpl_deallocate is then called (!). So we actually already have undefined behaviour on that path when you look at it so that's effectively
the same bug.
The path with Java destroy goes JNI Java_com_wombat_mama_MamaSubscription_destroy which calls mamaSubscription_destroy which will usually deactivate the subscription, then go on to invoke the destroyed
callback. I was suggesting at this point, we *do* hang onto the lock until that callback is completed to protect the subscription object.
The path mamaSubscriptionImpl_onSubscriptionDestroyed is a different beast when the middleware is letting MAMA know that a subscription has been destroyed which may be via mamaSubscription_destroy
-> mamaSubscription -> deactivate. In this case, we currently unlock before the callback. I was suggesting this should be moved until after the callback, but before the deallocate. If we hung onto the lock until after the deallocate, then we'd just be emulating
the buggy behaviour already present in mamaSubscription_deallocate.
But yes when I look this through there is a more subtle issue afoot - because we have onSubscriptionDestroyed which will be called here (already inside the lock)
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate()
unlock
lock
Now, since wlocks are recursive and these are from the same thread, this won't deadlock and should already be protected from the finalizer, but it also may not happen in this order depending on how
the ondestroy callback is implemented in the bridge, so you could get something more like this:
destroy
deactivate
lock mCreateDestroyLock
mamaSubscription_deactivate_internal
bridge_destroy
... bridge takes this under consideration ...
unlock mCreateDestroyLock
lock
... time passes...
mamaSubscriptionImpl_onSubscriptionDestroyed
lock mCreateDestroyLock
unlock mCreateDestroyLock
impldeallocate() <-- this is where it looks like the GC has an opportunity to cause trouble?
If we are going to have multiple threads coming in like this, then yes I think an atomic reference counter in the subscription object to track when each resource depending on the subscription object
has gained and lost interest would be preferable to holding onto the lock. Would fix the existing bug already in the finalizer too.
Is this something you're looking clarification on before implementing or you want me to have a look at implementing this?
Cheers,
Frank
toggle quoted message
Show quoted text
From: Slade, Michael J <michael.j.slade@...>
Sent: 28 April 2020 12:37
To: Frank Quinn <fquinn@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Frank,
Why do we need to unlock the mutex before the deallocate call? I understand that this call destroys the lock, but with the proposed wrapper around the lock the destroy call can defer the actual release
of memory to the unlock call.
The early release could be exploited because the MamaSubscription Java wrapper calls deallocate in its finalize method. Due to this, the GC thread could acquire the lock and release memory in mamaSubscriptionImpl_deallocate
while the lock-releasing thread is trying to use this memory to invoke the user callback.
Thanks,
Mike
From: Frank Quinn [mailto:fquinn@...]
Sent: 27 April 2020 22:04
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Mike,
Could you shed some more light on what exactly the problem is here? I have had a refresher in that area, and from what I can see, I can’t think of any reason to unlock that particular mutex before the user callback so I’m not opposed to
moving until after that callback if that will resolve whatever issue you’re seeing? It will need to be before the deallocate though.
The only reason I could think of for why this was done in the first place is to avoid deadlock if the user calls something like mamaSubscription_deallocate or something else that uses that mutex from the callback, but then I guess they’d
learn pretty quickly if they had fallen foul of that. Then I thought this might be somehow exploited in one of the Java or .NET wrappers but couldn’t find any evidence of that either. Plus that callback is more informative than anything else for the application,
so if they are attempting to deallocate, and that’s causing funny business (or something like that?), we just need to make it clear in the callback documentation that they shouldn’t be doing that.
Cheers,
Frank
At first glance, that sounds like just “kicking the can down the road” — i.e., you’re still left with the problem of what to do with these wrappers such that you can tear them down in a thread-safe manner.
Having said that, if you have an implementation that works, I’m sure that others would be interested, so maybe put it up on GitHub or something?
My personal preference would be to try to find something on the intertubes that has been tested up the wazoo — concurrency is hard, and the hardest part IMO is tear-down of event sources/sinks.
Thank you for your reply Bill.
To get around the problem with attempting to destroy the mutex while it is locked, could we instead define our own wrapper around a recursive mutex which handles the following:
1. Keep a
count of how many locks have been performed by current thread.
2. Defer destroy
to unlock call if mutex currently locked.
This will keep all destroy/deallocate logic thread safe and stop the need for early unlocks.
Attempting to destroy a locked mutex results in undefined behavior.
So you either destroy the locked mutex, which is UB, or you unlock it first. In the second case, another thread could grab it, which would likely cause a crash at some
point.
You can make the second problem go away by doing something like queueing the actual destroy in such a way that you know for certain that no other threads are waiting on
the mutex, but that’s a fair bit of work, and tricky in its own right. We took that approach in our application, and it generally works well, and avoids a whole host of lifetime-related problems in OpenMAMA.
In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function?
Thanks in advance,
Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription)
mama_status ret = MAMA_STATUS_NULL_ARG;
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
mamaSubscription_deactivate(subscription);
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
/* Otherwise the subscription has been correctly removed from the throttle. */
/* Fall through to perform the remaining clean-up. */
/* For the following states the subscription is not active, simply perform clean-up. */
case MAMA_SUBSCRIPTION_SETUP:
case MAMA_SUBSCRIPTION_DEACTIVATED:
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(subscription);
mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303
void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure)
{
/* Obtain the impl from the subscription object. */
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
if(NULL != impl)
{
if(NULL != impl->mQueue)
mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue);
/* Lock the mutex. */
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
{
/* The subscription is being deactivated. */
case MAMA_SUBSCRIPTION_DEACTIVATING:
/* Change the state. */
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED);
break;
/* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called
* before the destroy callback has come in from the bridge.
*/
case MAMA_SUBSCRIPTION_DEALLOCATING :
mamaSubscription_cleanup(subscription);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
/* Delete the subscription. */
mamaSubscriptionImpl_deallocate(impl);
return;
break;
/* The subscription is being destroyed. */
case MAMA_SUBSCRIPTION_DESTROYING :
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
return;
break;
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately.
Any unauthorized use is strictly prohibited.
|
|
Hi Frank,
Why do we need to unlock the mutex before the deallocate call? I understand that this call destroys the lock, but with the proposed wrapper around the lock the destroy call can defer the actual release of memory
to the unlock call.
The early release could be exploited because the MamaSubscription Java wrapper calls deallocate in its finalize method. Due to this, the GC thread could acquire the lock and release memory in mamaSubscriptionImpl_deallocate
while the lock-releasing thread is trying to use this memory to invoke the user callback.
Thanks,
Mike
toggle quoted message
Show quoted text
From: Frank Quinn [mailto: fquinn@...]
Sent: 27 April 2020 22:04
To: Slade, Michael J (CIB Tech, GBR) < michael.j.slade@...>
Cc: openmama-dev@...
Subject: RE: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
Hi Mike,
Could you shed some more light on what exactly the problem is here? I have had a refresher in that area, and from what I can see, I can’t think of any reason to unlock that particular mutex before the user callback so
I’m not opposed to moving until after that callback if that will resolve whatever issue you’re seeing? It will need to be before the deallocate though.
The only reason I could think of for why this was done in the first place is to avoid deadlock if the user calls something like mamaSubscription_deallocate or something else that uses that mutex from the callback, but
then I guess they’d learn pretty quickly if they had fallen foul of that. Then I thought this might be somehow exploited in one of the Java or .NET wrappers but couldn’t find any evidence of that either. Plus that callback is more informative than anything
else for the application, so if they are attempting to deallocate, and that’s causing funny business (or something like that?), we just need to make it clear in the callback documentation that they shouldn’t be doing that.
Cheers,
Frank
At first glance, that sounds like just “kicking the can down the road” — i.e., you’re still left with the problem of what to do with these wrappers such that you can tear them down in a thread-safe manner.
Having said that, if you have an implementation that works, I’m sure that others would be interested, so maybe put it up on GitHub or something?
My personal preference would be to try to find something on the intertubes that has been tested up the wazoo — concurrency is hard, and the hardest part IMO is tear-down of event sources/sinks.
Thank you for your reply Bill.
To get around the problem with attempting to destroy the mutex while it is locked, could we instead define our own wrapper around a recursive mutex which handles the following:
1. Keep
a count of how many locks have been performed by current thread.
2. Defer
destroy to unlock call if mutex currently locked.
This will keep all destroy/deallocate logic thread safe and stop the need for early unlocks.
Attempting to destroy a locked mutex results in undefined behavior.
So you either destroy the locked mutex, which is UB, or you unlock it first. In the second case, another thread could grab it, which would likely cause a crash
at some point.
You can make the second problem go away by doing something like queueing the actual destroy in such a way that you know for certain that no other threads are
waiting on the mutex, but that’s a fair bit of work, and tricky in its own right. We took that approach in our application, and it generally works well, and avoids a whole host of lifetime-related problems in OpenMAMA.
In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function?
Thanks in advance,
Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription)
mama_status ret = MAMA_STATUS_NULL_ARG;
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
mamaSubscription_deactivate(subscription);
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
/* Otherwise the subscription has been correctly removed from the throttle. */
/* Fall through to perform the remaining clean-up. */
/* For the following states the subscription is not active, simply perform clean-up. */
case MAMA_SUBSCRIPTION_SETUP:
case MAMA_SUBSCRIPTION_DEACTIVATED:
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(subscription);
mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303
void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure)
{
/* Obtain the impl from the subscription object. */
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
if(NULL != impl)
{
if(NULL != impl->mQueue)
mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue);
/* Lock the mutex. */
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
{
/* The subscription is being deactivated. */
case MAMA_SUBSCRIPTION_DEACTIVATING:
/* Change the state. */
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED);
break;
/* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called
* before the destroy callback has come in from the bridge.
*/
case MAMA_SUBSCRIPTION_DEALLOCATING :
mamaSubscription_cleanup(subscription);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
/* Delete the subscription. */
mamaSubscriptionImpl_deallocate(impl);
return;
break;
/* The subscription is being destroyed. */
case MAMA_SUBSCRIPTION_DESTROYING :
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
return;
break;
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
|
|
Hi Mike,
Could you shed some more light on what exactly the problem is here? I have had a refresher in that area, and from what I can see, I can’t think of any reason to unlock that particular mutex before
the user callback so I’m not opposed to moving until after that callback if that will resolve whatever issue you’re seeing? It will need to be before the deallocate though.
The only reason I could think of for why this was done in the first place is to avoid deadlock if the user calls something like mamaSubscription_deallocate or something else that uses that mutex
from the callback, but then I guess they’d learn pretty quickly if they had fallen foul of that. Then I thought this might be somehow exploited in one of the Java or .NET wrappers but couldn’t find any evidence of that either. Plus that callback is more informative
than anything else for the application, so if they are attempting to deallocate, and that’s causing funny business (or something like that?), we just need to make it clear in the callback documentation that they shouldn’t be doing that.
Cheers,
Frank
toggle quoted message
Show quoted text
From: Openmama-dev@... <Openmama-dev@...>
On Behalf Of Bill Torpey via lists.openmama.org
Sent: 05 April 2020 15:02
To: Slade, Michael J <michael.j.slade@...>
Cc: openmama-dev@...
Subject: Re: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
At first glance, that sounds like just “kicking the can down the road” — i.e., you’re still left with the problem of what to do with these wrappers such that you can tear them down in a thread-safe manner.
Having said that, if you have an implementation that works, I’m sure that others would be interested, so maybe put it up on GitHub or something?
My personal preference would be to try to find something on the intertubes that has been tested up the wazoo — concurrency is hard, and the hardest part IMO is tear-down of event sources/sinks.
Thank you for your reply Bill.
To get around the problem with attempting to destroy the mutex while it is locked, could we instead define our own wrapper around a recursive mutex which handles the following:
1. Keep a
count of how many locks have been performed by current thread.
2. Defer destroy
to unlock call if mutex currently locked.
This will keep all destroy/deallocate logic thread safe and stop the need for early unlocks.
Attempting to destroy a locked mutex results in undefined behavior.
So you either destroy the locked mutex, which is UB, or you unlock it first. In the second case, another thread could grab it, which would likely cause a crash at some
point.
You can make the second problem go away by doing something like queueing the actual destroy in such a way that you know for certain that no other threads are waiting on
the mutex, but that’s a fair bit of work, and tricky in its own right. We took that approach in our application, and it generally works well, and avoids a whole host of lifetime-related problems in OpenMAMA.
In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function?
Thanks in advance,
Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription)
mama_status ret = MAMA_STATUS_NULL_ARG;
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
mamaSubscription_deactivate(subscription);
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
/* Otherwise the subscription has been correctly removed from the throttle. */
/* Fall through to perform the remaining clean-up. */
/* For the following states the subscription is not active, simply perform clean-up. */
case MAMA_SUBSCRIPTION_SETUP:
case MAMA_SUBSCRIPTION_DEACTIVATED:
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(subscription);
mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303
void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure)
{
/* Obtain the impl from the subscription object. */
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
if(NULL != impl)
{
if(NULL != impl->mQueue)
mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue);
/* Lock the mutex. */
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
{
/* The subscription is being deactivated. */
case MAMA_SUBSCRIPTION_DEACTIVATING:
/* Change the state. */
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED);
break;
/* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called
* before the destroy callback has come in from the bridge.
*/
case MAMA_SUBSCRIPTION_DEALLOCATING :
mamaSubscription_cleanup(subscription);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
/* Delete the subscription. */
mamaSubscriptionImpl_deallocate(impl);
return;
break;
/* The subscription is being destroyed. */
case MAMA_SUBSCRIPTION_DESTROYING :
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
return;
break;
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
|
|
At first glance, that sounds like just “kicking the can down the road” — i.e., you’re still left with the problem of what to do with these wrappers such that you can tear them down in a thread-safe manner.
Having said that, if you have an implementation that works, I’m sure that others would be interested, so maybe put it up on GitHub or something?
My personal preference would be to try to find something on the intertubes that has been tested up the wazoo — concurrency is hard, and the hardest part IMO is tear-down of event sources/sinks.
toggle quoted message
Show quoted text
Thank you for your reply Bill. To get around the problem with attempting to destroy the mutex while it is locked, could we instead define our own wrapper around a recursive mutex which handles the following: 1. Keep a count of how many locks have been performed by current thread. 2. Defer destroy to unlock call if mutex currently locked. This will keep all destroy/deallocate logic thread safe and stop the need for early unlocks. Mike Attempting to destroy a locked mutex results in undefined behavior.
So you either destroy the locked mutex, which is UB, or you unlock it first. In the second case, another thread could grab it, which would likely cause a crash at some point. You can make the second problem go away by doing something like queueing the actual destroy in such a way that you know for certain that no other threads are waiting on the mutex, but that’s a fair bit of work, and tricky in its own right. We took that approach in our application, and it generally works well, and avoids a whole host of lifetime-related problems in OpenMAMA.
In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function? Thanks in advance, Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription) mama_status ret = MAMA_STATUS_NULL_ARG; mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription; mamaSubscription_deactivate(subscription); wlock_lock(impl->mCreateDestroyLock); /* The next action will depend on the current state of the subscription. */ switch(wInterlocked_read(&impl->mState)) /* Otherwise the subscription has been correctly removed from the throttle. */ /* Fall through to perform the remaining clean-up. */ /* For the following states the subscription is not active, simply perform clean-up. */ case MAMA_SUBSCRIPTION_SETUP: case MAMA_SUBSCRIPTION_DEACTIVATED: mamaSubscription_cleanup(subscription); mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED); wlock_unlock(impl->mCreateDestroyLock); mamaSubscriptionImpl_invokeDestroyedCallback(subscription); mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303 void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure) { /* Obtain the impl from the subscription object. */ mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription; if(NULL != impl) { if(NULL != impl->mQueue) mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue); /* Lock the mutex. */ wlock_lock(impl->mCreateDestroyLock); /* The next action will depend on the current state of the subscription. */ switch(wInterlocked_read(&impl->mState)) { /* The subscription is being deactivated. */ case MAMA_SUBSCRIPTION_DEACTIVATING: /* Change the state. */ mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED); break; /* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called * before the destroy callback has come in from the bridge. */ case MAMA_SUBSCRIPTION_DEALLOCATING : mamaSubscription_cleanup(subscription); wlock_unlock(impl->mCreateDestroyLock); mamaSubscriptionImpl_invokeDestroyedCallback(impl); /* Delete the subscription. */ mamaSubscriptionImpl_deallocate(impl); return; break; /* The subscription is being destroyed. */ case MAMA_SUBSCRIPTION_DESTROYING : mamaSubscription_cleanup(subscription); mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED); wlock_unlock(impl->mCreateDestroyLock); mamaSubscriptionImpl_invokeDestroyedCallback(impl); return; break; This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
|
|
Thank you for your reply Bill.
To get around the problem with attempting to destroy the mutex while it is locked, could we instead define our own wrapper around a recursive mutex which handles
the following:
1.
Keep a count of how many locks have been performed by current thread.
2.
Defer destroy to unlock call if mutex currently locked.
This will keep all destroy/deallocate logic thread safe and stop the need for early unlocks.
Mike
toggle quoted message
Show quoted text
From: Bill Torpey [mailto:wallstprog@...]
Sent: 01 April 2020 18:10
To: Slade, Michael J (CIB Tech, GBR) <michael.j.slade@...>
Cc: openmama-dev@...
Subject: Re: [Openmama-dev] Early releases of lock in mamaSubscription destroy/deallocate logic
As far as I can tell, there’s no perfect solution here. From https://linux.die.net/man/3/pthread_mutex_destroy
Attempting to destroy a locked mutex results in undefined behavior.
So you either destroy the locked mutex, which is UB, or you unlock it first. In the second case, another thread could grab it, which would likely cause a crash at some point.
You can make the second problem go away by doing something like queueing the actual destroy in such a way that you know for certain that no other threads are waiting on the mutex, but that’s a fair bit of work, and tricky in its own right.
We took that approach in our application, and it generally works well, and avoids a whole host of lifetime-related problems in OpenMAMA.
In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function?
Thanks in advance,
Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription)
mama_status ret = MAMA_STATUS_NULL_ARG;
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
mamaSubscription_deactivate(subscription);
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
/* Otherwise the subscription has been correctly removed from the throttle. */
/* Fall through to perform the remaining clean-up. */
/* For the following states the subscription is not active, simply perform clean-up. */
case MAMA_SUBSCRIPTION_SETUP:
case MAMA_SUBSCRIPTION_DEACTIVATED:
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(subscription);
mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303
void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure)
{
/* Obtain the impl from the subscription object. */
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
if(NULL != impl)
{
if(NULL != impl->mQueue)
mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue);
/* Lock the mutex. */
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
{
/* The subscription is being deactivated. */
case MAMA_SUBSCRIPTION_DEACTIVATING:
/* Change the state. */
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED);
break;
/* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called
* before the destroy callback has come in from the bridge.
*/
case MAMA_SUBSCRIPTION_DEALLOCATING :
mamaSubscription_cleanup(subscription);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
/* Delete the subscription. */
mamaSubscriptionImpl_deallocate(impl);
return;
break;
/* The subscription is being destroyed. */
case MAMA_SUBSCRIPTION_DESTROYING :
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
return;
break;
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including
on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
|
|
As far as I can tell, there’s no perfect solution here. From https://linux.die.net/man/3/pthread_mutex_destroy
Attempting to destroy a locked mutex results in undefined
behavior.
So you either destroy the locked mutex, which is UB, or you unlock it first. In the second case, another thread could grab it, which would likely cause a crash at some point.
You can make the second problem go away by doing something like queueing the actual destroy in such a way that you know for certain that no other threads are waiting on the mutex, but that’s a fair bit of work, and tricky in its own right. We took that approach in our application, and it generally works well, and avoids a whole host of lifetime-related problems in OpenMAMA.
Hi OpenMAMA Devs, In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function? Thanks in advance, Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription) { /* Returns. */ mama_status ret = MAMA_STATUS_NULL_ARG; if(NULL != subscription) { /* Get the impl. */ mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription; mamaSubscription_deactivate(subscription); wlock_lock(impl->mCreateDestroyLock); /* The next action will depend on the current state of the subscription. */ switch(wInterlocked_read(&impl->mState)) { /* Otherwise the subscription has been correctly removed from the throttle. */ /* Fall through to perform the remaining clean-up. */ /* For the following states the subscription is not active, simply perform clean-up. */ case MAMA_SUBSCRIPTION_SETUP: case MAMA_SUBSCRIPTION_DEACTIVATED: mamaSubscription_cleanup(subscription); mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED); wlock_unlock(impl->mCreateDestroyLock); mamaSubscriptionImpl_invokeDestroyedCallback(subscription); return MAMA_STATUS_OK; break; mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303 void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure) { /* Obtain the impl from the subscription object. */ mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription; if(NULL != impl) { if(NULL != impl->mQueue) mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue); /* Lock the mutex. */ wlock_lock(impl->mCreateDestroyLock); /* The next action will depend on the current state of the subscription. */ switch(wInterlocked_read(&impl->mState)) { /* The subscription is being deactivated. */ case MAMA_SUBSCRIPTION_DEACTIVATING: /* Change the state. */ mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED); break; /* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called * before the destroy callback has come in from the bridge. */ case MAMA_SUBSCRIPTION_DEALLOCATING : mamaSubscription_cleanup(subscription); wlock_unlock(impl->mCreateDestroyLock); mamaSubscriptionImpl_invokeDestroyedCallback(impl); /* Delete the subscription. */ mamaSubscriptionImpl_deallocate(impl); return; break; /* The subscription is being destroyed. */ case MAMA_SUBSCRIPTION_DESTROYING : mamaSubscription_cleanup(subscription); mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED); wlock_unlock(impl->mCreateDestroyLock); mamaSubscriptionImpl_invokeDestroyedCallback(impl); return; break; This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
|
|
Hi OpenMAMA Devs,
In the subscription destroy/deallocate logic, there are a couple of functions which seem to release the lock used to protect access to the subscription structure early – i.e. before the user callback ‘mamaSubscriptionImpl_invokeDestroyedCallback’ or ‘mamaSubscriptionImpl_deallocate’ function is invoked. Is anyone able to explain why the lock is released in this way, and not held until the end of the function?
Thanks in advance,
Mike
mamaSubscription_destroy:2848
mama_status mamaSubscription_destroy(mamaSubscription subscription)
{
/* Returns. */
mama_status ret = MAMA_STATUS_NULL_ARG;
if(NULL != subscription)
{
/* Get the impl. */
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
mamaSubscription_deactivate(subscription);
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
{
/* Otherwise the subscription has been correctly removed from the throttle. */
/* Fall through to perform the remaining clean-up. */
/* For the following states the subscription is not active, simply perform clean-up. */
case MAMA_SUBSCRIPTION_SETUP:
case MAMA_SUBSCRIPTION_DEACTIVATED:
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(subscription);
return MAMA_STATUS_OK;
break;
mamaSubscriptionImpl_onSubscriptionDestroyed:3292+3303
void MAMACALLTYPE mamaSubscriptionImpl_onSubscriptionDestroyed(mamaSubscription subscription, void *closure)
{
/* Obtain the impl from the subscription object. */
mamaSubscriptionImpl *impl = (mamaSubscriptionImpl *)subscription;
if(NULL != impl)
{
if(NULL != impl->mQueue)
mamaQueue_decrementObjectCount(&impl->mLockHandle, impl->mQueue);
/* Lock the mutex. */
wlock_lock(impl->mCreateDestroyLock);
/* The next action will depend on the current state of the subscription. */
switch(wInterlocked_read(&impl->mState))
{
/* The subscription is being deactivated. */
case MAMA_SUBSCRIPTION_DEACTIVATING:
/* Change the state. */
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DEACTIVATED);
break;
/* The subscription is being deallocated, i.e. mamaSubscription_deallocate has been called
* before the destroy callback has come in from the bridge.
*/
case MAMA_SUBSCRIPTION_DEALLOCATING :
mamaSubscription_cleanup(subscription);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
/* Delete the subscription. */
mamaSubscriptionImpl_deallocate(impl);
return;
break;
/* The subscription is being destroyed. */
case MAMA_SUBSCRIPTION_DESTROYING :
mamaSubscription_cleanup(subscription);
mamaSubscriptionImpl_setState(impl, MAMA_SUBSCRIPTION_DESTROYED);
wlock_unlock(impl->mCreateDestroyLock);
mamaSubscriptionImpl_invokeDestroyedCallback(impl);
return;
break;
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.
|
|