ruby: Fix protocol deadlock with RMW operations

Information
Submitter:	Lena Olson
Repository:	gem5
Branch:	default
Bugs:
Depends On:
Reviewers
Groups:	Default
People:

Description

Changeset 10926:97dc170a3064
---------------------------
ruby: Fix protocol deadlock with RMW operations

Previously, if there was a memory operation between the read part and the write
part of a RMW transaction, a deadlock could occur. There was logic to allow the
write part of the RMW transaction to bypass the blocked controller, but it also
allowed other operations to bypass as well. Thus, only the write parts of a RMW
operation can bypass. This patch adds a new function to the message class which
controls whether a block can bypass the blocked controller.

It is possible that an IFETCH is issued between the read and write parts of the
RMW transaction.

Testing Done

Tested with the ruby random tester as well as running some multiprogrammed workloads, including ones where this deadlock manifested. The multicore tests I have are for x86, so if those with more knowledge of other ISAs could comment that would be helpful.

Many thanks to Jason Power for help in tracking this down and fixing it.

Issue Summary

Description	From	Last Updated	Status
This is a very subtle change to already complex code, so I'm not sure I understand how it works. The ...	Joel Hestness	Aug. 14, 2015, 1:03 p.m.	Open

src/mem/slicc/ast/PeekStatementAST.py (Diff revision 1)

This is a very subtle change to already complex code, so I'm not sure I understand how it works. The patch description indicates that bad things could happen if controller activity occurred between the read and write of a RMW, but it doesn't really say what the bad things are. This code appears to only block intermediate activity when the write part of the RMW gets to the head of a controller queue.

Can you clarify what was causing the deadlock, and why this works to fix it?

Show all issues

Nilay Vaish Oct. 7, 2015, 8:28 p.m. (Oct. 7, 2015, 8:28 p.m.)

Joel, consider the following operations happen on some address A:

RMW Read Address A
Some other mem op on Address A
RMW Write Address A

Currently, on seeing a RMW Read, the sequencer asks the controller to block
any operation on address A other than those coming from the mandatory queue.
This is to prevent losing coherence permissions on address A. My guess is that
RMW operations in x86 have to always succeed, unlink LL/SC which I believe can
fail. I think the expectation here is that no operation other than RMW Write
on the same address would be issued from the manadatory queue. This implies
the sequencer expects that the processing core would not issue any operation
other than RMW Write. But Lena must have observed an instruction fetch request
to the same address between RMW Read and RMW write.

Now, I think the proposed fix is a character away from correctness. The condition should be:

 if (m_is_blocking &&
    (m_block_map.count(in_msg_ptr->m_$address_field) == 1) &&
    ((m_block_map[in_msg_ptr->m_$address_field] != &$qcode) ||
    (!in_msg_ptr->bypassBlocked()))) {

I am actually not in favor of fixing the problem the way this patch does it.
As I said above, the controller expects that the mandatory queue will not
issue any operation on the address other than RMW Write. I think we should
change the sequencer and put in a check.

Joel Hestness Oct. 7, 2015, 8:52 p.m. (Oct. 7, 2015, 8:52 p.m.)

Hi Nilay. We've discussed this patch pretty thoroughly on the gem5-gpu list (here: https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g ). Based on some testing, the patch in this review request probably doesn't work correctly, even if you change the condition as you suggest. I've posted a preliminary fix on the thread linked here, but I'm not sure that is a good way to do it either. I'd be happy to hear your take on our gem5-gpu discussion.

Nilay Vaish Oct. 7, 2015, 9:20 p.m. (Oct. 7, 2015, 9:20 p.m.)

I saw your patch. It is essentially what I had in mind when I said that the check should
go in the sequencer. But I would prefer the check is made at line 577 just before insertRequest()
is called, rather than at 177 as is being done currently in your patch. And mark isBlocked as const.

Joel Hestness Oct. 9, 2015, 8:16 a.m. (Oct. 9, 2015, 8:16 a.m.)
```
Posted here: http://reviews.gem5.org/r/3149/
```

You have a pending review.

Review Board 2.0.15

Screenshots

Files

Issue Summary