stats: Add a separate display class for vector formulas

Review Request #1678 - Created Jan. 28, 2013 and updated

Information
Andreas Hansson
gem5
default
Reviewers
Default
Changeset 9513:6e1f6c9cd4ac
---------------------------
stats: Add a separate display class for vector formulas

This patch adds a separate display class for vector formulas, whose
output is tailored to more closely resemble that of the original
C++-based stats. Unlike the vector class, the total for a formula needs
to be evaluated by calculating the total of the numerator and the total
of the denominator, and then dividing the former by the latter.
Therefore, the vector formula display class calls the total function of
the formula, rather than evaluating the total locally.

   
Posted (Feb. 5, 2013, 8:21 a.m.)
This really confuses me. The idea of the Formula class was that it was a Vector, and we'd simply pretend that Formulas of length 1 were scalars. It seems wrong to me to have two classes here; I think we need to fix the main class itself instead.
  1. I second Nate's opinion here. A formula can involve both scalars and vectors. What would you do then?
  2. The basic functionality of the interface to the classes is the same, so that does not represent an issue. The issue lies with how the total is calculated for the different types of statistic. For example, to calculate the total of a vector, we sum each of the elements. The same principle applies to a 2D vector, where we sum each of the "rows" and then also calculate an overall total. A formula is more complex: simply summing the value of each element of a vector formula can give an incorrect result. For example, if a formula calculates a rate, summing the per-element rates does not give the overall rate. Instead, we want to calculate the total of the numerators and the total of the denominators and then divide the former by the latter to get the overall total for the formula. This is why I split the display class into two classes.
    
    The total for a vector is calculated "externally", i.e. we take the list representing the vector and sum all of the values to get the total. This is done as part of the display class. The total for a formula, on the other hand, is calculated "internally", i.e. by the formula class itself (using abstract syntax trees), which calculates the totals for the numerator and the denominator separately. Aside from how the total is calculated, the two display classes are identical, but that one difference makes them incompatible.
    
    An alternative solution to having separate display classes would be to add an additional layer of indirection which then provides the same interface for getting the total of the vectors and formulas, but hides the mechanics internally. Do you have any better suggestions?
    
    Nilay, regarding your question: the formulas can indeed involve both scalars and vectors. However, the calculation of the formula is handled using Python's eval function. Any formula calculation involving both vectors and scalars will invariably result in a vector. Depending on the length of that vector, it is either processed directly as a scalar (in which case the issue with the total calculation does not arise) or as a vector formula. Does this clarify things?
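
The mixed scalar/vector case described here can be pictured with a small sketch. This is only an illustration under stated assumptions, not the gem5 stats code: the helper names `as_vector`, `elementwise` and `is_scalar_result` are hypothetical, and a plain callable stands in for the eval-based evaluation mentioned above.

```python
def as_vector(value):
    """Wrap a scalar in a one-element list; copy sequences as lists."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def elementwise(func, *operands):
    """Apply func element by element, repeating length-1 operands.

    Hypothetical helper: the result is always a list, even when every
    operand is a scalar.
    """
    vectors = [as_vector(op) for op in operands]
    length = max(len(v) for v in vectors)
    expanded = []
    for v in vectors:
        if len(v) == 1:
            expanded.append(v * length)   # repeat the scalar
        elif len(v) == length:
            expanded.append(v)
        else:
            raise ValueError("operand lengths do not match")
    return [func(*elems) for elems in zip(*expanded)]


def is_scalar_result(result):
    """A result of length 1 is treated as a scalar (assumption)."""
    return len(result) == 1


# A vector divided by a scalar gives a vector.
rates = elementwise(lambda count, ticks: count / ticks, [1, 3], 4)

# Two scalars still give a (length-1) vector, handled as a scalar.
single = elementwise(lambda count, ticks: count / ticks, 2, 4)
```

In this sketch only `rates` would need the special total calculation; `single` has one element, so its value is also its total.
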
  3. Can you tell me how you are expressing the so-called vector formula in your example?
    The formula, in my opinion, would look like rate = sum(v) / sum(t) where
    v and t can be vectors or scalars (does not matter at all). If you are writing rate = v / t
    where v and t can be vectors, I would say that this expression is ambiguous.
    How do you know I do not intend rate to be a vector, where rate_i = v_i / t_i?
  4. In fact, in my opinion rate = v/t should mean rate_i = v_i / t_i.
  5. You are correct that rate = v/t should mean rate_i = v_i / t_i, and that is exactly what the stats system does. For example:
    
    system.l2.ReadReq_hits::cpu.dtb.walker              9                       # number of ReadReq hits
    system.l2.ReadReq_hits::cpu.itb.walker              2                       # number of ReadReq hits
    system.l2.ReadReq_hits::cpu.inst                   49                       # number of ReadReq hits
    system.l2.ReadReq_hits::cpu.data                 1116                       # number of ReadReq hits
    system.l2.ReadReq_hits::total                    1176                       # number of ReadReq hits
    system.l2.ReadReq_misses::cpu.dtb.walker            5                       # number of ReadReq misses
    system.l2.ReadReq_misses::cpu.itb.walker            1                       # number of ReadReq misses
    system.l2.ReadReq_misses::cpu.inst                411                       # number of ReadReq misses
    system.l2.ReadReq_misses::cpu.data                111                       # number of ReadReq misses
    system.l2.ReadReq_misses::total                   528                       # number of ReadReq misses
    system.l2.ReadReq_miss_rate::cpu.dtb.walker     0.357143                       # miss rate for ReadReq accesses
    system.l2.ReadReq_miss_rate::cpu.itb.walker     0.333333                       # miss rate for ReadReq accesses
    system.l2.ReadReq_miss_rate::cpu.inst        0.893478                       # miss rate for ReadReq accesses
    system.l2.ReadReq_miss_rate::cpu.data        0.090465                       # miss rate for ReadReq accesses
    system.l2.ReadReq_miss_rate::total           0.309859                       # miss rate for ReadReq accesses
    
    In this case, hits and misses are vectors, where each element in the vector counts the hits or misses for a particular master. The total for hits or misses is simply the sum of the elements.
    
    The miss_rate is a formula which is defined as: misses / (hits + misses). This is calculated for each element of the input vectors, e.g. miss_rate::cpu.inst = misses::cpu.inst / (hits::cpu.inst + misses::cpu.inst). 
    
    For the total calculation, we do: total_miss_rate = total(misses) / total(hits + misses)
    
    Therefore, the rate is evaluated on a per-element basis and the total is not simply the sum of the individual miss rates. This is why we need to treat formulas slightly differently from vectors when calculating the total.
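
The difference can be checked against the numbers in the dump above. The following is a minimal Python sketch, not the actual stats code; the dictionaries and the `miss_rate` helper are made up for illustration, and only the counter values come from the output shown.

```python
# Per-master counters copied from the stats dump above.
hits = {"cpu.dtb.walker": 9, "cpu.itb.walker": 2,
        "cpu.inst": 49, "cpu.data": 1116}
misses = {"cpu.dtb.walker": 5, "cpu.itb.walker": 1,
          "cpu.inst": 411, "cpu.data": 111}


def miss_rate(m, h):
    """The formula misses / (hits + misses) for one pair of values."""
    return m / (h + m)


# Element-wise evaluation: one rate per master.
per_master = {name: miss_rate(misses[name], hits[name]) for name in hits}

# Total of a formula: evaluate the formula on the totals of its operands.
total_rate = miss_rate(sum(misses.values()), sum(hits.values()))

# Vector-style total: summing the per-element rates.
naive_total = sum(per_master.values())

print(f"formula total: {total_rate:.6f}")
print(f"sum of rates:  {naive_total:.6f}")
```

The formula total reproduces the 0.309859 reported for `miss_rate::total`, while the sum of the per-master rates exceeds 1 and so cannot be a miss rate at all.
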
    
  6. Sascha, the original stats package did do the totals exactly the way you suggested, by doing element-wise totals at the bottom and then calculating the formula on those totals. I'm reasonably certain that my original Python stats package patches did this as well. When I put my original stats changes on Review Board, the output of stats was identical between old and new for all regressions. In addition, I used the Python stuff extensively when I was working on my thesis.
    
    I'd *STRONGLY* prefer that the total() calculation be done on the formula class itself, and that we not push this to the display class. The reason is simple: I may not use a display class at all. I may in fact want to just call total() on the stat to feed it into some other calculation (or into a graphing package). That was the primary reason for moving the stats to Python in the first place, and it'd be a shame to lose that functionality.
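
One way to picture this suggestion is sketched below. It is a hypothetical illustration, not the gem5 stats API: the `Vector` and `Formula` classes, their constructors, and the use of a plain callable in place of the abstract syntax tree mentioned earlier are all assumptions.

```python
class Vector:
    """Hypothetical vector stat: a list of per-element values."""

    def __init__(self, values):
        self.values = list(values)

    def total(self):
        # A vector's total is just the sum of its elements.
        return sum(self.values)


class Formula:
    """Hypothetical formula stat built from operand vectors."""

    def __init__(self, func, *operands):
        self.func = func          # e.g. lambda h, m: m / (h + m)
        self.operands = operands  # Vector instances of equal length

    @property
    def values(self):
        # Element-wise evaluation: rate_i = f(v_i, t_i).
        columns = zip(*(op.values for op in self.operands))
        return [self.func(*elems) for elems in columns]

    def total(self):
        # Evaluate the formula on the operands' totals rather than
        # summing the per-element results.
        return self.func(*(op.total() for op in self.operands))


hits = Vector([9, 2, 49, 1116])
misses = Vector([5, 1, 411, 111])
miss_rate = Formula(lambda h, m: m / (h + m), hits, misses)

# No display class is involved: total() can feed any other calculation.
overall = miss_rate.total()
```

Because both classes expose the same `total()` method, a single display class (or a graphing script) could call it without knowing which kind of stat it holds.
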