Review Board 2.0.15


ruby: slicc: Dynamically find+declare all MachineTypes

Review Request #2773 - Created May 9, 2015 and updated

Information
Joel Hestness
gem5
default
Reviewers
Default
Changeset 10832:ba670c48f3c8
---------------------------
ruby: slicc: Dynamically find+declare all MachineTypes

To avoid statically declaring MachineTypes that can be used by Ruby protocols,
modify the SLICC parser to dynamically collect the different machine types
from all the available protocols to declare them in the MachineTypes generated
enum. This change declares the full set of types, but only generates the
controller files for the specified protocol.

Built MI_example and MOESI_hammer in gem5. Also applied a personal patch
that adds MachineType-dependent code to RubyMemoryController, and
successfully built a protocol that does not define machines required by
that that code. Benchmarks run in each build, and MachineType-specific
functionality appears correct.

Posted (June 18, 2015, 2:38 p.m.)

Hi Joel,
I have some minor concerns and some major concerns about this patch..:). Let me start with the minor concerns first:

1) If this patch is applied on top of Nilay's http://reviews.gem5.org/r/2550/ patch, gem5 will not compile with this patch. Nilay's patch removes the line "self.machine_types = idents" from MachineAST.py; so we will get "AttributeError: 'MachineAST' object has no attribute 'machine_types':" error. The fix is to add machine_types back into MachineAST.
2) Nilay's http://reviews.gem5.org/r/2551/ patch seems to do something which is kind of opposite to what this patch is doing. So, we will have to discard either this patch or Nilay's patch.
3) What happens if the machine is declared like "machine({L1, L2},".. I could be wrong but I think this patch cannot take care of this situation.

Now, coming back to major concerns (in increasing order of importance):
1) Probably, going through all .sm files in the protocol_source path is not an optimal solution. The optimal solution may be going through the .sm files defined in protocol.slicc. Now, if I am working on a new protocol and has multiple incomplete .sm files under development, this patch goes through those incomplete files and may create difficulties for the developer. I think the user will have to move those files out of protocol_source path or will have to change the extension (isnt it so?) to satisfy this patch.
2) Most importantly, the developer completely loses the flexibility to define custom machine types with this patch. In multiple occasions, the same machine is instantiated multiple times in a ruby system and sometimes they are differentiated (say for profiling purposes) by writing custom functions in .sm files which returns different machine types for different instantiations. To elaborate a bit more, assume a situation in which I can use the same cache.sm file for both L1 and L2 cache. Also assume that the declaration in cache.sm is "machine(Cache,". Now, if I have to distinguish these controllers, I can have a function inside the .sm file which returns "MachineType:L2" or "MachineType:L1" depending on some variable or event or message. Such a scheme will not work with this patch because the MachineType enum will only have "Cache" and not "L1" or "L2". With Nilay's http://reviews.gem5.org/r/2551/ patch the developer can actually instantiate any number of pseudo-machine types.

Having said all this, I understand the utility of this patch but given the trade-offs, I think it is better to check in Nilay's patch and discard this patch. Please let me know your thoughts.

  1. None of these are actual problems.

    Your minor "concerns":
    1) Merging will be necessary with other patches, sure. However, http://reviews.gem5.org/r/2550/ in particular has been discarded, so this is a non-issue.
    2) This patch was submitted in lieu of http://reviews.gem5.org/r/2551/, and this is clear from the email trail (here: http://permalink.gmane.org/gmane.comp.emulators.m5.devel/26215)
    3) This patch works. I've tested it with all protocols in mainline (including MOESI_hammer).

    Your major "concerns":
    1) This patch only parses the AST of .sm files in the path and then extracts machine names. If you have a partially complete .sm file, it only needs to be parseable. Why would someone keep uncompilable .sm files in protocol directories?
    2) This situation doesn't exist in the current codebase and wasn't presented as a use case. Further, doing what you describe would likely result in buggy behavior in the .sm files, because MachineType counts could not reflect the number of L1 and L2 types, but only the number of Cache types. It seems like you could only use the L1 and L2 types outside of protocol files (though MachineTypes are currently defined to be protocol-specific). What is the purpose of the L1 and L2 types?

    I understand that you're trying to get AMD internal code working with the changes available in the public repos. However, please let's be more concrete than just "concerns", because they're either not problems or if they are, your description is too vague to be addressed. There may be ways for AMD to address problems internally or ways for this patch to fix them, but we need to know specifically what they are.

  2. Thanks for the link to the email. I was not aware of that email, so I did not know this patch was created to replace Nilay's patch (r/2551). Regarding my comment on r/2550, I was just highlighting the possibility of potential merge conflict.

    Coming back to major concerns, probably I did not do a good job in giving you some concrete example. Let me try this again..:).
    1) As I explicitly mentioned in the original comment, I was advocating collecting machinetypes only from the specified protocol and not collecting all available machine types going through all protocol files. The example I mentioned there was just one scenario which I hoped will help justify my comment. I think keeping uncompilable .sm files in the protocol path while new protocol is under development (or even some uncompilable temporarily deprecated protocol files in the same directory) is a possible scenario. No better example come to my mind right now. Fundamentally, what I want to point out here is the compilation should not be affected by issues in the protocol files which the user is not concerned at all. The compilation should only be affected by the protocol files of the specific protocol being used.
    2)Probably, I was vague in the original comment. When we combine multiple cache controllers to a single .sm file (for example, L1 and L2 controllers combined to one .sm file called machine(Cache:"")); although we have only one .sm file, we actually have more than one Machine in the system and each machine should have an enum declaration in the MachineType:Enum list. Since this patch associates number of machines and MachineType with the machine declaraitons in .sm file, the developer cannot model combined controllers with this patch. BTW, I am not talking about MOESI_Hammer protocol kind of situation which actually has two separate controllers although both of them are defined in one .sm file. I am talking about integrated controller which is actually the state machine of two (or more) machines integrated into one .sm file. With Nilay's patch (r/2551), the developer can add any number of machine types to the enum list and this enables the developer to have integrated controllers. The underlying differnce is Nilay's patch doesn't restrict MachineTypes to the declarations in the .sm files.

    We rely on these features for our internal protocols although this is not a part of current codebase. So, if you could modify this patch to collect machinetypes from only the specified protocol (my first point) and if you could give the flexibility to user to add MachineTypes that are not declared as machines in the .sm files (my second point), that will add more utility to this patch and make this patch a true replacement of Nilay's r/2551 patch.

  3. 1) I understand your concern about uncompilable code, but it seems pretty exaggerated. In order for a controller file to cause a compilation error with this patch, it needs (1) the PROTOCOL_NAME to be declared in the scons all_protocols array, (2) to have a PROTOCOL_NAME.slicc file that points to an uncompilable controller file, and (3) the controller file is not even parseable. Your deprecated protocol file example suggests that a user not only has an unparseable file, but everything else is set up to allow you to try to compile the protocol. Given that it is very simple to either remove the PROTOCOL_NAME from all_protocols or remove a patch containing the protocol altogether, it seems like this is a very minor issue.

    2) To me, the behavior you're describing indicates that the machine should be declared with the following header: machine({L1, L2}, ...), because you are communicating outside of those controllers using those types. With the current state of SLICC and the scenario you describe, the L1 and L2 machine types cannot be used by any of the SLICC controller context-sensitive features like machineCount() or machineIDToMachineType(), because there would be no machines declared with those types. Thus, you must ONLY be using the L1 and L2 types to communicate outside of Ruby (e.g. for stats collection/profiling based on machine ID). By declaring the machine as and L1/L2 controller, this patch will work just fine with your protocol. You should probably be declaring the machine as an L1/L2 controller anyway, since otherwise, you're not actually using them as MachineTypes.

    To be clear, I'm not strongly advocating that we use my proposed patch here. This was posted as a smarter way to resolve the MachineTypes headaches of other patches, while allowing AMD to use MachineTypes outside of SLICC controllers (e.g. for stats). My opinion is that gem5 should not allow MachineTypes to be exposed outside of SLICC controllers, because there are currently no known situations where it is absolutely required. For stats purposes, SLICC should be augmented to allow stats annotations within SLICC controllers, which should remove the need to communicate MachineTypes outside of controllers.

    If AMD wants to continue using MachineTypes outside of controllers (a feature which is basically deprecated) AND gem5 mainline should allow that feature (questionable), I think the solution should be to reintroduce the GenericMachineType as an interface between SLICC and non-SLICC code, rather than stretching the function of MachineType to fit that bill. (I also don't understand why it's burdensome to just maintain one's own internal patch that adds the GenericMachineType back).

  4. Hi Joel,
    Thanks for the suggestions. I think we can remove the deprecated files from the scons PROTOCOL_NAME list and avoid the prtoblem of deprecated files.

    Regarding the otehr issue of adding machine types to the enum list by the developer, we need that feature for many of our protocols. Would it be possibble to add a "supplementary enum decalaration list" to which the developer can add machinetypes whose definitions may not be there in any .sm files? I think having such a feature will solve the issue which we are having with a strictly "Dynamically find+declare all MachineTypes" approach of this patch.

  5. Ok, cool. Thanks for the clarification. I think I understand what you're requesting: There should be a way to declare MachineTypes even if they aren't used as standard MachineTypes.

    If that's correct, I feel that this patch supports MachineType declaration in a pretty elegant way. If you'd like to leave the machine declaration as you have it ("machine(Cache, ...)"), but also have L1 and L2 MachineTypes that can be passed around, you can simply declare those types as empty machines in the same .sm file. It would look something like this:

    // Declare the MachineTypes used by separate instances of the Cache machine below
    machine(L1, "Declaration of L1 MachineType") : { /* Intentionally left empty */ }
    machine(L2, "Declaration of L2 MachineType") : { /* Intentionally left empty */ }

    // Now declare the Cache machine itself
    machine(Cache, ...) :
    // Params
    {
    ... [implementation] ...
    }

    I tested that this works using a fake protocol. The L1 and L2 MachineTypes will now end up in the generated MachineTypes declaration so you can use them as you've described in this discussion. This also keeps the MachineType declarations modular; Each protocol just declares the MachineTypes that it uses, rather than trying to maintain a giant global set of MachineTypes that is a superset of machines from all protocols. One thing that may be worth noting is that unless you instantiate some of these empty machine types, machineCount(L1) and machineCount(L2) will return 0 (though it seems that your code probably already behaves like that?).

    Also worth noting is that your use case suggests that it would be useful to introduce SLICC machine inheritance, so the L1 and L2 machines could just inherit from Cache. That way, you could just instantiate them directly instead of instantiating Cache with some parameter that makes them L1 vs. L2. This patch appears to be a step in the right direction for that type of inheritance, since the machines that should inherit from Cache will now have declarations.

  6. Hi Joel,
    Thanks for the suggestions. I tried instantiating empty machines (hereafter referred as dummy machines) in the same .sm file like you suggested. But I ran into some issues. The first issue was "cannot find L1_Event.hh" and "cannot find L1_State.hh". To resolve this issue, I added some dummy states and events to the dummy machines, but then the compiler was complaining about the lack of getState() function in the dummy machine. I think the issue is these dummy machines are treated as a standard (properly implemented) machine by SLICC and the C++ compiler expects proper states, events and SLICC reserved fucntions in these dummy machines. Based on my experiemnts, I think declaring MachineTypes (that are not used as standard MachineTypes) with empty machines may not be possible. Please let me know if I missed something.

    If you also agree that "declaring empty machine" is not the way to go, could you please provide functionality in this patch to have a "supplementary enum decalaration list" for machineTypes that can be used by developers to add machineTypes without any machine definition. The other option is to use Nilay's patch r/2551 which has MachineType enumeration defined statically in RubySlicc_Exports.sm. We need to have this functionality for our protocols. Please let me know your thoughts.

  7. Hi All,

    As Sooraj indicated above, we tried to incorporate this patch but we encountered many issues. As a result, we will check in patch 2551 on Friday so that we can make forward progress. Hopefully we can find a solution to this issue that satisfies all parties in our next round of changes.

    Thank you for understanding.