sim: Add support for dynamic frequency scaling

Information
Submitter:	Christopher Torng
Repository:	gem5
Branch:	default
Bugs:
Depends On:
Reviewers
Groups:	Default
People:

Description

sim: Add support for dynamic frequency scaling

This patch provides support for DFS by having ClockedObjects register
themselves with their clock domain at construction time in a member list.
Using this list, a clock domain can update each member's tick to the
curTick() before modifying the clock period.
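
As a rough illustration of the mechanism described above, here is a minimal, self-contained sketch. It is not the patch itself: `SketchClockDomain`, `SketchClockedObject`, `registerMember`, `setClockPeriod`, and the global `g_curTick` are simplified stand-ins invented for this example, and the real gem5 classes differ. Only the idea is taken from the description: members register at construction, and the domain brings every member up to the current tick before it changes the period.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Tick = std::uint64_t;

// Hypothetical stand-in for the simulator's global time.
static Tick g_curTick = 0;

class SketchClockedObject;

// Simplified clock domain that keeps a list of its members.
class SketchClockDomain
{
  public:
    explicit SketchClockDomain(Tick period) : period_(period) {}

    Tick clockPeriod() const { return period_; }

    // Called once by each member at construction time.
    void registerMember(SketchClockedObject *c)
    {
        assert(c != nullptr);
        assert(std::find(members_.begin(), members_.end(), c) ==
               members_.end());
        members_.push_back(c);
    }

    // Bring every member up to date under the OLD period, then switch.
    void setClockPeriod(Tick newPeriod);

  private:
    Tick period_;
    std::vector<SketchClockedObject *> members_;
};

class SketchClockedObject
{
  public:
    explicit SketchClockedObject(SketchClockDomain &d) : domain_(d)
    {
        domain_.registerMember(this);
    }

    // Public hook the domain calls before its period changes.
    void updateClockPeriod() { update(); }

    Tick tick() const { return tick_; }

  private:
    // Advance tick_ to the first clock edge at or after g_curTick.
    void update()
    {
        if (tick_ >= g_curTick)
            return;
        const Tick p = domain_.clockPeriod();
        const Tick elapsed = (g_curTick - tick_ + p - 1) / p; // ceiling
        tick_ += elapsed * p;
    }

    SketchClockDomain &domain_;
    Tick tick_ = 0;
};

inline void SketchClockDomain::setClockPeriod(Tick newPeriod)
{
    assert(newPeriod > 0);
    for (auto m = members_.begin(); m != members_.end(); ++m)
        (*m)->updateClockPeriod();
    period_ = newPeriod;
}
```

In this sketch, an object that has been idle since tick 0 in a 2000 ps domain is moved to tick 10000 when the period is changed at `g_curTick = 10000`, so its later edges are counted from the moment of the change rather than from a stale tick.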

Testing Done

Previously, ClockedObjects would incorrectly update their internal tick after a
SrcClockDomain changed the clock period. This would lead to incorrect latency
calculations throughout the system, slowing down simulated runtime by 5-20%.

Purely to investigate this slowdown, I've added the following code, using the
inorder cpu to drive the test.

In src/sim/stat_control.cc (changed precision from 6 to 12 for more detail)::

  simSeconds
      .name("sim_seconds")
      .desc("Number of seconds simulated")
      .precision(12)
      ;

In src/sim/clocked_object.hh (added function for a clocked object to get its clock domain's pointer)::

  inline ClockDomain* getClockDomain() const
  {
      ClockDomain *c = &clockDomain;
      return c;
  }

In src/cpu/inorder/InOrderCPU.py (change stageWidth to 1 to make dumps easier to read)::

  stageWidth = Param.Unsigned(1, "Stage width")

In src/cpu/inorder/cpu.hh (create a pointer to a SrcClockDomain to control the CPU's clock period from within the CPU)::

  SrcClockDomain *clockDomain;

In src/cpu/inorder/cpu.cc (let CPU constructor set the clock domain pointer)::

  InOrderCPU::InOrderCPU(Params *params)
      : (...)
  {
      (...)
      clockDomain = (SrcClockDomain*) getClockDomain();
  }

In src/cpu/inorder/cpu.cc (after 5000 cycles, change the clock period to 1999 ps)::

  void
  InOrderCPU::tick()
  {
      (...)

      if (numCycles.value() > 5000)
      {
          clockDomain->clockPeriod(1999);
      }
  }

After compiling these testing changes, I run MIPS hello world at 500 MHz
(2000 ps)::

  $ ./build/MIPS/gem5.opt configs/example/se.py -c tests/test-progs/hello/bin/mips/linux/hello --cpu-type=inorder --caches --cpu-clock=500MHz --l1d_size=16kB --l1i_size=16kB --l1d_assoc=1 --l1i_assoc=1 --cacheline_size=32 --num-cpus=1

Results:

- Without patch, simSeconds is 0.000040804594.
- With    patch, simSeconds is 0.000038657668.

The patch speeds up simulated runtime by 5.6% here. You can check the Exec
debug-flag dump to see that memory instructions complete with different
latencies with and without the patch.

For two instructions in the middle of the simulation, without the patch, the
time difference is 5997 ps (3 cycles)::

  10073967: system.cpu T0 : @_int_malloc+992    : srl        r9, r21         : IntAlu :  D=0x0000000000000000
  10079964: system.cpu T0 : @_int_malloc+996    : lw         r25, -32740(r28) : MemRead :  D=0x0000000010000000 A=0x10004fac

For the same instructions with the patch, the time difference is 3998 ps (2 cycles)::

  10071968: system.cpu T0 : @_int_malloc+992    : srl        r9, r21         : IntAlu :  D=0x0000000000000000
  10075966: system.cpu T0 : @_int_malloc+996    : lw         r25, -32740(r28) : MemRead :  D=0x0000000010000000 A=0x10004fac

The correct L1 hit latency is 2 cycles.

For more information, you can look at src/sim/clocked_object.hh update() to
see how the tick is incorrectly updated after a clocked object wakes up.

Here's the relevant section of code in src/sim/clocked_object.hh update()::

  void update() const
  {
      (...)

      // if not, we have to recalculate the cycle and tick, we
      // perform the calculations in terms of relative cycles to
      // allow changes to the clock period in the future
      Cycles elapsedCycles(divCeil(curTick() - tick, clockPeriod()));
      cycle += elapsedCycles;
      tick += elapsedCycles * clockPeriod();
  }

This part of update() is only executed when a clocked object has been
inactive and has not updated its tick for a while. The code updates the tick
using clockPeriod() -- basically it's assuming that the clock period has not
changed since it went to sleep.
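
To make the drift concrete, here is a small sketch of the same arithmetic outside gem5. The helper names (`divCeilSketch`, `wakeUp`, `edgeWithoutSync`, `edgeWithSync`) and the tick values in the note below are made up for illustration; only the ceiling-division update rule is taken from the snippet above.

```cpp
#include <cassert>
#include <cstdint>

using Tick = std::uint64_t;

// Ceiling division, standing in for divCeil() in the quoted update().
static Tick divCeilSketch(Tick a, Tick b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

// Same arithmetic as the quoted update(): advance `tick` to the first edge
// at or after `now`, stepping in units of the period current at wake-up.
static Tick wakeUp(Tick tick, Tick now, Tick period)
{
    if (tick >= now)
        return tick;
    return tick + divCeilSketch(now - tick, period) * period;
}

// The sleeper was last updated at `lastTick` and wakes at `now`, after the
// period has already changed. Nothing updates it at the change itself.
static Tick edgeWithoutSync(Tick lastTick, Tick now, Tick newPeriod)
{
    return wakeUp(lastTick, now, newPeriod);
}

// Same scenario, but the sleeper is first brought up to `changeTick` under
// the old period, which is what the patch description proposes.
static Tick edgeWithSync(Tick lastTick, Tick changeTick, Tick now,
                         Tick oldPeriod, Tick newPeriod)
{
    const Tick synced = wakeUp(lastTick, changeTick, oldPeriod);
    return wakeUp(synced, now, newPeriod);
}
```

With the hypothetical values lastTick = 4000, changeTick = 10000, now = 13998, an old period of 2000 ps and a new period of 1999 ps, `edgeWithoutSync` yields 15994 while `edgeWithSync` yields 13998. The unsynchronized object lands 1996 ps away from the edge that an always-active object would see, which is not a whole number of 1999 ps cycles.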

In the hello world example, the dcache goes out of sync with the system. You
can see this by adding this printout.

In src/sim/clocked_object.hh::

  inline Tick clockEdge(Cycles cycles = Cycles(0)) const
  {
      (update ...)

      std::cout << name() << " tick is ";
      std::cout << std::dec << tick << std::endl;

      (return ...)
  }

If you compile and re-simulate and go to those same two instructions we
looked at earlier...

Without patch::

  10073967: system.cpu T0 : @_int_malloc+992    : srl        r9, r21         : IntAlu :  D=0x0000000000000000
  system.cpu.dcache tick is 10075945
  system.cpu.dcache tick is 10075945
  system.cpu.dcache tick is 10075945
  system.cpu tick is 10073967
  system.cpu tick is 10075966
  system.cpu tick is 10077965
  10079964: system.cpu T0 : @_int_malloc+996    : lw         r25, -32740(r28) : MemRead :  D=0x0000000010000000 A=0x10004fac

The CPU's lw graduates at tick 10079964. Meanwhile dcache has tick 10075945.
This is a difference of 4019, which is not a multiple of the 1999 ps clock
period. This mismatch causes the L1 hit latency to increase from 2 cycles to
3 cycles.

With patch::

  10071968: system.cpu T0 : @_int_malloc+992    : srl        r9, r21         : IntAlu :  D=0x0000000000000000
  system.cpu.dcache tick is 10071968
  system.cpu.dcache tick is 10071968
  system.cpu.dcache tick is 10071968
  system.cpu tick is 10071968
  system.cpu tick is 10073967
  10075966: system.cpu T0 : @_int_malloc+996    : lw         r25, -32740(r28) : MemRead :  D=0x0000000010000000 A=0x10004fac

The CPU's lw graduates at tick 10075966. Meanwhile dcache has tick 10071968.
This is a difference of 3998, which is exactly 2 cycles. Since the cache and
CPU stay in sync, the L1 hit latency stays at 2 cycles even after frequency
scaling.
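
The two offsets quoted above can be checked with a few lines of arithmetic. The sketch below is only a checking aid; `Offset` and `offsetInCycles` are names invented for this example and do not exist in gem5.

```cpp
#include <cassert>
#include <cstdint>

using Tick = std::uint64_t;

// Whole cycles and leftover picoseconds between two ticks.
struct Offset
{
    Tick cycles;
    Tick remainder;
};

// Split (later - earlier) into whole clock periods plus a remainder.
static Offset offsetInCycles(Tick later, Tick earlier, Tick period)
{
    assert(later >= earlier && period > 0);
    const Tick diff = later - earlier;
    return Offset{diff / period, diff % period};
}
```

For the trace without the patch, `offsetInCycles(10079964, 10075945, 1999)` gives 2 cycles with a remainder of 21 ps, so the dcache edge is 21 ps off the CPU's clock grid. For the trace with the patch, `offsetInCycles(10075966, 10071968, 1999)` gives 2 cycles with a remainder of 0.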

Issue Summary

Description	From	Last Updated	Status
Spacing around the ( and ), ...Domain(ClockedObject *c)	Andreas Hansson	Dec. 16, 2013, 9:44 a.m.	Resolved
It would be good to add asserts to check that c != NULL, and c is not in member already.	Andreas Hansson	Dec. 16, 2013, 9:44 a.m.	Resolved
I'd suggest to stick with auto m = members.begin()...	Andreas Hansson	Dec. 16, 2013, 9:44 a.m.	Resolved
I'd prefer to add a public method to the clockedObject, call this updateClockPeriod, just like with the domains The updateclockPeriod ...	Andreas Hansson	Dec. 16, 2013, 10:22 a.m.	Resolved
Spacing around (this)	Andreas Hansson	Dec. 16, 2013, 9:44 a.m.	Resolved
Very minor, but as this only happens once (when creating the object), inline seems a bit of a waste.	Andreas Hansson	Dec. 16, 2013, 10:21 a.m.	Resolved
I'm still tempted to add a "updateClockPeriod" to the ClockedObject, make it public, and simply do a call to "update();" ...	Andreas Hansson	Dec. 16, 2013, 10:21 a.m.	Resolved

Change Summary:

Additional testing.

Testing Done:

	+	In src/cpu/inorder/InOrderCPU.py (change stageWidth to 1 to make dumps easier to read)::
	+
	+	  stageWidth = Param.Unsigned(1, "Stage width")
	+

		Results:

-		Without patch, simSeconds is 0.000027967012.
-		With patch, simSeconds is 0.000026771610.
	+	Without patch, simSeconds is 0.000040804594.
	+	With patch, simSeconds is 0.000038657668.

-		The patch speeds up simulated runtime by 4% here. You can check the Exec
	+	The patch speeds up simulated runtime by 5.6% here. You can check the Exec
		debug-flag dump to see that memory instructions complete with different
		latencies with and without the patch.

-		Instructions in middle of simulation, without patch, the time difference is 7996 ps (4
-		cycles)::
-
-		  11995002: system.cpu T0 : @__libc_malloc+244 : addu r17, r2, r0 : IntAlu : D=0x000000001000c060
-		  12002998: system.cpu T0 : @__libc_malloc+248 : lw r25, -31084(r28) : MemRead : D=0x0000000000000000 A=0x10005624
-
-		Same instructions with patch, the time difference is 5997 ps (3 cycles)::
-
-		  11787106: system.cpu T0 : @__libc_malloc+244 : addu r17, r2, r0 : IntAlu : D=0x000000001000c060
-		  11793103: system.cpu T0 : @__libc_malloc+248 : lw r25, -31084(r28) : MemRead : D=0x0000000000000000 A=0x10005624
	+	Instructions in middle of simulation, without patch, the time difference is 5997 ps (3
	+	cycles)::
	+
	+	  10073967: system.cpu T0 : @_int_malloc+992 : srl r9, r21 : IntAlu : D=0x0000000000000000
	+	  10079964: system.cpu T0 : @_int_malloc+996 : lw r25, -32740(r28) : MemRead : D=0x0000000010000000 A=0x10004fac
	+
	+	Same instructions with patch, the time difference is 3998 ps (2 cycles)::
	+
	+	  10071968: system.cpu T0 : @_int_malloc+992 : srl r9, r21 : IntAlu : D=0x0000000000000000
	+	  10075966: system.cpu T0 : @_int_malloc+996 : lw r25, -32740(r28) : MemRead : D=0x0000000010000000 A=0x10004fac
	+
	+	The correct L1 hit latency is 2 cycles.

-		For more information, you can look at src/sim/clocked_object.hh update() and
-		insert debug messages to see that the tick is incorrectly updated after a
-		clocked object wakes up.
	+	For more information, you can look at src/sim/clocked_object.hh update() to
	+	see how the tick is incorrectly updated after a clocked object wakes up.

		Here's the relevant section of code in src/sim/clocked_object.hh update()::

		void update() const
		{
		(...)

		// if not, we have to recalculate the cycle and tick, we
		// perform the calculations in terms of relative cycles to
		// allow changes to the clock period in the future
		Cycles elapsedCycles(divCeil(curTick() - tick, clockPeriod()));
		cycle += elapsedCycles;
		tick += elapsedCycles * clockPeriod();
		}

	+
	+	This part of update() is only executed when a clocked object has been
	+	inactive and has not updated its tick for a while. The code updates the tick
	+	using clockPeriod() -- basically it's assuming that the clock period has not
	+	changed since it went to sleep.
	+
	+	In the hello world example, the dcache goes out of sync with the system. You
	+	can see this by adding this printout.
	+
	+	In src/sim/clocked_object.hh::
	+
	+	inline Tick clockEdge(Cycles cycles = Cycles(0)) const
	+	{
	+	(update ...)
	+
	+	std::cout << name() << " tick is ";
	+	std::cout << std::dec << tick << std::endl;
	+
	+	(return ...)
	+	}
	+
	+
	+	If you compile and re-simulate and go to those same two instructions we
	+	looked at earlier...
	+
	+	Without patch::
	+
	+	10073967: system.cpu T0 : @_int_malloc+992 : srl r9, r21 : IntAlu : D=0x0000000000000000
	+	system.cpu.dcache tick is 10075945
	+	system.cpu.dcache tick is 10075945
	+	system.cpu.dcache tick is 10075945
	+	system.cpu tick is 10073967
	+	system.cpu tick is 10075966
	+	system.cpu tick is 10077965
	+	10079964: system.cpu T0 : @_int_malloc+996 : lw r25, -32740(r28) : MemRead : D=0x0000000010000000 A=0x10004fac
	+
	+
	+	The CPU's lw graduates at tick 10079964. Meanwhile dcache has tick 10075945.
	+	This is a difference of 4019, which is not a multiple of the 1999 ps clock
	+	period. This mismatch causes the L1 hit latency to increase from 2 cycles to
	+	3 cycles.
	+
	+	With patch::
	+
	+	10071968: system.cpu T0 : @_int_malloc+992 : srl r9, r21 : IntAlu : D=0x0000000000000000
	+	system.cpu.dcache tick is 10071968
	+	system.cpu.dcache tick is 10071968
	+	system.cpu.dcache tick is 10071968
	+	system.cpu tick is 10071968
	+	system.cpu tick is 10073967
	+	10075966: system.cpu T0 : @_int_malloc+996 : lw r25, -32740(r28) : MemRead : D=0x0000000010000000 A=0x10004fac
	+
	+
	+	The CPU's lw graduates at tick 10075966. Meanwhile dcache has tick 10071968.
	+	This is a difference of 3998, which is exactly 2 cycles. Since the cache and
	+	CPU are together, the L1 hit latency stays at 2 cycles even after frequency
	+	scaling.

src/sim/clock_domain.hh (Diff revision 1)

Spacing around the ( and ),

...Domain(ClockedObject *c)

src/sim/clock_domain.hh (Diff revision 1)

It would be good to add asserts to check that c != NULL, and c is not in member already.

src/sim/clock_domain.cc (Diff revision 1)

I'd suggest to stick with auto m = members.begin()...

src/sim/clock_domain.cc (Diff revision 1)

I'd prefer to add a public method to the clockedObject, call this updateClockPeriod, just like with the domains

The updateclockPeriod could just call the private update.

src/sim/clocked_object.hh (Diff revision 1)

Spacing around (this)

You beat us to it :-)

Thanks for providing this functionality. I would perhaps shorten the description a bit and highlight that the patch provides support for DFS (it was simply not intended to do this before...so it was not really a bug).

Change Summary:

Shortening description.

Summary:

-	sim: Fix a bug when scaling a clock domain's frequency
+	sim: Add support for dynamic frequency scaling

Description:

-		sim: Fix a bug when scaling a clock domain's frequency
	+	sim: Add support for dynamic frequency scaling

-		A SrcClockDomain (src/sim/clock_domain.hh) can change its clock period with
-		the clockPeriod(Tick) function. However, when a SrcClockDomain changes the
-		clock period, the ClockedObjects in the clock domain may be inactive. When
-		these ClockedObjects wake up, they update their internal tick incorrectly,
-		leading to incorrect latency calculations throughout the system. For
-		example, a 2 cycle L1 cache hit latency can turn into 3 cycles, causing
-		significant slowdowns in simulated runtime after the frequency change (4% to
-		20% slowdowns in my experiments).
-
-		This patch fixes the bug by adding a member list to the ClockDomain --
-		ClockedObjects register themselves with their clock domain at construction
-		time and are added to the member list. Using this list, before a clock
-		domain modifies its clock period, it can update each member's tick to the
-		curTick().
-
-		Diffed from Changeset 9993
	+	This patch provides support for DFS by having ClockedObjects register
	+	themselves with their clock domain at construction time in a member list.
	+	Using this list, a clock domain can update each member's tick to the
	+	curTick() before modifying the clock period.

Testing Done:

-		Purely to demonstrate this bug, I've added the following code, using the
-		inorder cpu.
	+	Previously, ClockedObjects would incorrectly update their internal tick after a
	+	SrcClockDomain changed the clock period. This would lead to incorrect latency
	+	calculations throughout the system, slowing down simulated runtime by 5-20%.
	+
	+	Purely to investigate this slowdown, I've added the following code, using the
	+	inorder cpu to drive the test.

Change Summary:

Adding asserts and fixing coding style.

Diff:

Revision 2 (+39)

	src/sim/clock_domain.hh
	src/sim/clock_domain.cc
	src/sim/clocked_object.hh

src/sim/clock_domain.hh (Diff revision 2)

Very minor, but as this only happens once (when creating the object), inline seems a bit of a waste.

src/sim/clock_domain.cc (Diff revision 2)

I'm still tempted to add a "updateClockPeriod" to the ClockedObject, make it public, and simply do a call to "update();" in the body. Right now it's a bit too implicit for my taste.

Change Summary:

Minor fixes and adding clocked object updateClockPeriod().

Diff:

Revision 3 (+46)

	src/sim/clock_domain.hh
	src/sim/clock_domain.cc
	src/sim/clocked_object.hh

Thanks for the effort (and for the quick turn around)!

Ship It!

Status: Closed (submitted)

Status: Re-opened

Change Summary:

Copyright header update.

Diff:

Revision 4 (+52)

	src/sim/clock_domain.hh
	src/sim/clock_domain.cc
	src/sim/clocked_object.hh

If someone has a second to commit this, that would be great.

Nilay Vaish (Dec. 28, 2013, 8:08 a.m.)
```
Would do it over the weekend.
```

This change has been marked as submitted.