mem: replacing bytesPerCacheLine with DRAM burstLength in SimpleDRAM
Review Request #1927 - Created June 16, 2013 and submitted
| Information | |
|---|---|
| Amin Farmahini | |
| gem5 | |
| Reviewers | |
| Default | |
This patch gets rid of bytesPerCacheLine parameter and makes the DRAM configuration separate from cache line size. Instead of bytesPerCacheLine, I define a parameter for DRAM called burst_length. The burst_length parameter shows the length of a DRAM device burst in bits. Also, I replace lines_per_rowbuffer with device_rowbuffer_size to improve code portablity. Updates: - a burst length in beats for each memory type. - an interface width for each memory type. - the memory controller model is extended to reason about "system" packets vs "dram" packets and assemble the responses properly. It means that system packets larger than a full burst are split into multiple dram packets.
Some short and fairly long tests on DDR3 and LPDRR3 with cache lines of 32, 64, and 128 bytes.
Issue Summary
41
26
8
7
-
src/mem/SimpleDRAM.py (Diff revision 1) -
Is 'burst length' a standard term for those who dabble with drams? May be the comment can be expanded to explain what is meant by a burst.
-
src/mem/SimpleDRAM.py (Diff revision 1) -
Ok, I am confused here. What is the difference between a dram page and a row buffer? It seems to me that you are assuming something more than what has been stated here as 128 * 64 != 1024.
-
src/mem/SimpleDRAM.py (Diff revision 1) -
Assuming BL stands for burst length, can you explain why do we need this to be fixed to 5ns and not make it a function of the burst length? Secondly, why is this not a function of the clock frequency?
-
src/mem/SimpleDRAM.py (Diff revision 1) -
Now I am confident that some thing is wrong either with the comment here or the one in DDR3's definition.
-
src/mem/simple_dram.cc (Diff revision 1) -
Typo in length. Actually, I was not aware that you can write comments after the if() statement without using braces.
-
src/mem/simple_dram.cc (Diff revision 1) -
Looking at how it is incremented, this was really shady.
Posted (June 17, 2013, 7:58 a.m.)
-
src/mem/SimpleDRAM.py (Diff revision 1) -
I would rather see an interface width in bits (e.g. 64) and a burst length in "beats", so 8 in this case.
-
src/mem/SimpleDRAM.py (Diff revision 1) -
Would it not make more sense to express the row buffer as a MemorySize param and thus in kb/bytes?
-
src/mem/SimpleDRAM.py (Diff revision 1) -
I think if we do indeed do this change we should have a clock and do as Nilay suggests. This also opens up the complication that the WideIO config is SDR and the others are DDR so that has to be captured somehow.
-
src/mem/simple_dram.cc (Diff revision 1) -
I think this patch should include the support for splitting cache lines into multiple DRAM bursts. At least the two patches should be committed as one.
Posted (June 19, 2013, 7:49 a.m.)
-
src/mem/SimpleDRAM.py (Diff revision 1) -
Hi Nilay, You cannot compute this from params as the cache line size is determined in the C++ code.
Review request changed
Updated (June 23, 2013, 10:53 p.m.)
Change Summary:
Updates: - a burst length in beats for each memory type. - an interface width for each memory type. - the memory controller model is extended to reason about "system" packets vs "dram" packets and assemble the responses properly. It means that system packets larger than a full burst are split into multiple dram packets.
Diff: |
Revision 2 (+355 -182) |
|---|
Review request changed
Updated (June 23, 2013, 11:01 p.m.)
Description: |
|
|---|
Posted (June 24, 2013, 11:17 a.m.)
-
src/mem/SimpleDRAM.py (Diff revision 2) -
I think we should compute tBURST here, instead of simply assigning a value.
Posted (June 25, 2013, 3:12 a.m.)
Overall this looks like a great step in the right direction. One thing that struck me is the complexity of the iteration to split the packets. Why make it so complicated? Would we not simply change decode packet to take a container as an input parameter and push back as many packets as needed? This should also help remove any assumptions on alignment. What do you think?
-
src/mem/SimpleDRAM.py (Diff revision 2) -
Do we need it per device? Why not combined?
-
src/mem/SimpleDRAM.py (Diff revision 2) -
Same as above, why per device and not per channel? Also, do you not think it is wise to use the Param.MemorySize here and specify it in bytes of kb?
-
src/mem/simple_dram.hh (Diff revision 2) -
I'd suggest to make all these const
Posted (June 27, 2013, 10:41 a.m.)
-
src/mem/SimpleDRAM.py (Diff revision 2) -
I was not referring to the memory size of the system, rather I was thinking to use the parameter type Param.MemorySize. What do you think?
Review request changed
Updated (June 29, 2013, 3:44 p.m.)
Posted (July 4, 2013, 12:55 a.m.)
-
src/mem/SimpleDRAM.py (Diff revision 3) -
I don't see how the bus width and burst size (in bits/bytes) is of any relevance here any longer. Now it is all about the burst length (in beats), is it not?
-
src/mem/SimpleDRAM.py (Diff revision 3) -
I would change the comment to say, 8 beats across a ...
-
src/mem/SimpleDRAM.py (Diff revision 3) -
Once again, the size in bytes is not as relevant here, it is the BL that matters.
-
src/mem/simple_dram.hh (Diff revision 3) -
...such that we know when the whole packet is served.
-
src/mem/simple_dram.hh (Diff revision 3) -
Perhaps name the variable pktCount here as well?
-
src/mem/simple_dram.hh (Diff revision 3) -
Same as above
-
src/mem/simple_dram.hh (Diff revision 3) -
Could you add a comment that this is called multiple times on the same packet if a system packet is larger than the burst of the memory and that the addr is used for the offset within that packet.
-
src/mem/simple_dram.hh (Diff revision 3) -
unsigned int
-
src/mem/simple_dram.hh (Diff revision 3) -
I still don't understand why this is not const
-
src/mem/simple_dram.hh (Diff revision 3) -
And this one :-)
-
src/mem/simple_dram.hh (Diff revision 3) -
readBursts?
-
src/mem/simple_dram.cc (Diff revision 3) -
I would move this to the initialisation list. Space around '/'
-
src/mem/simple_dram.cc (Diff revision 3) -
\n at the end of the DPRINTF string.
-
src/mem/simple_dram.cc (Diff revision 3) -
Move to initialisation list for these two.
-
src/mem/simple_dram.cc (Diff revision 3) -
\n at the end
-
src/mem/simple_dram.cc (Diff revision 3) -
Is the packet ptr still needed here?
-
src/mem/simple_dram.cc (Diff revision 3) -
Would this not be easy enough to fix here?
-
src/mem/simple_dram.cc (Diff revision 3) -
pktCount > 1, then new BurstHelper here?
-
src/mem/simple_dram.cc (Diff revision 3) -
Why involve the absolute address here, and not just divide getSize and burstSize?
-
src/mem/simple_dram.cc (Diff revision 3) -
wrtie
-
src/mem/simple_dram.cc (Diff revision 3) -
Once again I'm not sure of the while loop here. Why not a for look with pktCount?
-
src/mem/simple_dram.cc (Diff revision 3) -
Alignment here? Worth dealing with that as well I would imagine.
-
src/mem/simple_dram.cc (Diff revision 3) -
dram_pkt->burstHelper still points to the burstHelper.
Posted (July 4, 2013, 9:27 a.m.)
Once the last little bits are fixed I'll add this to the regressions and make sure everything passes before pushing it out.
Review request changed
Updated (July 23, 2013, 10:39 p.m.)
Some very minor comments, then it's good to go. Have you run a complete regression with this patch? I'll kick one off just in case.
-
src/mem/SimpleDRAM.py (Diff revision 4) -
Can skip the "in bytes".
-
src/mem/SimpleDRAM.py (Diff revision 4) -
DDR2 -> LPDDR2_S4
-
src/mem/simple_dram.cc (Diff revision 4) -
Perhaps a stupid question, but is this entirely safe? I guess it is...just thinking in terms of types, assumptions on powers of two etc.
-
src/mem/simple_dram.cc (Diff revision 4) -
Do you perhaps want to add the device organisation here as well, i.e. 1x32, 8x8 etc.
-
src/mem/simple_dram.cc (Diff revision 4) -
Not sure we want to count the DRAM bursts here as the stat was primarily for the arrivals. Is totGap not for the "system" packets any longer?
Review request changed
Updated (July 26, 2013, 12:18 a.m.)
Testing Done: |
|
||||||
|---|---|---|---|---|---|---|---|
Diff: |
Revision 5 (+377 -190) |
Posted (July 29, 2013, 6:56 a.m.)
I'm still seeing quite some stats differences for e.g.: build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-timing I am not sure why and I'd like to understand before pushing this out. The regressions all use the DDR3 config with a 64 byte burst size, so I do not see why the behaviuor should change. Amin, any ideas?
Review request changed
Updated (Aug. 6, 2013, 11:39 p.m.)
Change Summary:
Fixes a bug in addToReadQueue(). The bug showed up when the read packet is larger than DRAM burst size AND the first read burst hits in the write queue. Subsequent read bursts did not get the correct starting addr. Sorry, My bad
Diff: |
Revision 6 (+377 -190) |
|---|
Posted (Aug. 19, 2013, 12:52 a.m.)
Could you mark this submitted Amin?
Posted (Aug. 19, 2013, 12:53 a.m.)
Could you mark this submitted Amin?
