
Re: [openrisc] Address pipeline during exception...



> Or to be more precise, CPU takes DIR instruction
> instead of fetching it from memory (fetched insn is
> ignored). CPU doesn't need to be stalled.

Well, that's one I hadn't thought about. So, to
restate the obvious, you're telling me that whenever
I execute an instruction through the DIR I actually
change the PC and the instruction that was scheduled
is ignored?

That's a pretty important detail to leave out of the
manual. That should definitely be added immediately.
I understand why you would do this from the
hardware standpoint; it just wasn't what I expected.

> You should not try to simulate OR1200 internal
> pipeline. You should simulate OR1K architecture
> instead with insn safely assumed to execute in one
> clock cycle etc. If OR1K architecture manual does
> not precisely define how something should be
> handled, don't rely on OR1200 implementation.

This is great for you, but it doesn't solve my problem.
I want a cycle-accurate simulator of the OR1200
so that I can port operating systems. I'm not looking
to develop a generic simulator that future OR1K
designers can build from.

I'm specifically looking for something that mirrors
your implementation in every minor detail, so that
I can write low level assembly code. For example,
I don't understand what happens in the following
instance:

    l.ld    r7,(r6)
    l.sys   200

Now, assume that (r6) is not in the cache, and a
cache miss occurs. At what point is the system call
actually performed? Before or after r7 has been
resolved?  There could be a 12-14 bus cycle delay
before r7 is actually valid. This could be pushing
100 CPU cycles in a fast core.

I can answer this myself  *IF* I understand how
you've implemented the pipelines.  I really need this
document. In the case above, I would assume that
r7 must be invalid, and that an attempt to use it
would stall.
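
To make the two possible outcomes concrete, here is a toy
model in Python. It is only a sketch, not a description of
the OR1200: the two policies ("wait_for_load" and
"take_immediately"), the function names, and the 14-cycle
miss latency are all assumptions made up for illustration.
Which policy the real core follows is exactly the open
question.

```python
# Toy model of a load miss followed by l.sys.
# NOT the OR1200 pipeline: both policies and every latency
# below are assumptions chosen only to illustrate the question.

MISS_LATENCY = 14  # hypothetical CPU cycles until the missed load returns


def syscall_timing(policy, miss_latency=MISS_LATENCY):
    """Return (cycle the handler starts, cycle r7 becomes valid)."""
    load_issue = 0                           # the load issues at cycle 0
    r7_ready = load_issue + 1 + miss_latency
    if policy == "wait_for_load":
        # The exception is taken only after the load has written r7.
        handler_start = r7_ready
    elif policy == "take_immediately":
        # The exception is taken on the next cycle; r7 is still pending.
        handler_start = load_issue + 1
    else:
        raise ValueError("unknown policy: " + policy)
    return handler_start, r7_ready


def stall_on_first_use(handler_start, r7_ready, insns_before_use):
    """Stall cycles if the handler reads r7 after `insns_before_use`
    single-cycle instructions that do not touch r7."""
    use_cycle = handler_start + insns_before_use
    return max(0, r7_ready - use_cycle)
```

Under the first policy the handler never stalls on r7 but
starts late. Under the second it starts at once, and the
cost depends on how many independent instructions the
handler runs before it first reads r7.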

This is an important question for the following reason:
As a good programmer porting an operating system,
I need to understand if I can use the r7 register in
the system call without incurring a pipeline stall. That
will affect how I write my exception handler. It will
affect how I tell gcc to allocate registers. It will have
serious consequences in many aspects of design.

Every processor manual I've worked with includes
a section on the pipeline, latencies, and execution
units used by every single instruction.

My question now becomes, OK, what is the exact
timing of instructions under other circumstances? How
many clock cycles before the execution stage does the
fetch actually occur? If I want to make sure an
instruction is prefetched in the cache, how many
instructions ahead do I have to plan? If I've got
a critical piece of code, and an instruction coming
up that I know may cause an ITLB or DTLB miss
in the MMU, how many cycles before execution
is the fetch actually performed?
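
The planning arithmetic behind these questions can be
sketched in Python. This is a rough model under stated
assumptions, not OR1200 data: the function name, the
fetch-to-execute distance, and the miss latencies are
hypothetical placeholders for the numbers a pipeline
document would supply.

```python
import math

# Rough planning model. Every number fed into it is a placeholder;
# the real values are what a pipeline document would have to provide.


def prefetch_lead(miss_latency_cycles, fetch_to_execute_cycles,
                  cycles_per_insn=1.0):
    """Instructions ahead of the critical one at which the miss must
    be triggered so the data arrives before the fetch needs it.

    If the triggering instruction runs at cycle t, the data arrives at
    t + miss_latency_cycles. The critical instruction is fetched
    fetch_to_execute_cycles before it executes, so the gap between the
    two executions must cover both terms.
    """
    gap_cycles = miss_latency_cycles + fetch_to_execute_cycles
    return math.ceil(gap_cycles / cycles_per_insn)
```

For example, with an assumed 14-cycle miss and a fetch that
runs 2 cycles ahead of execution, prefetch_lead(14, 2) gives
a lead of 16 single-cycle instructions.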

Remember, my goal is different from yours. I'm porting
an operating system. I want a tool that will help me
to do that. These other concerns about a simulator
for other possible implementations can be addressed
later.

I want to understand the pipeline of the OR1200. I need
that for my next step.

> For more information how exceptions in OpenRISC
> 1000 are defined, please have a look at any modern
> RISC such as MIPS, PowerPC etc. All modern
> RISCs have virtually same exception model.

I'm looking at my Hitachi manual right now, and I can
see the exception model and pipeline model spelled
out for me in excruciating detail, including all the
penalties involved, which instructions have flow
dependencies and anti-flow dependencies, what
things are done in parallel, and what is delayed.
Give me any set of conditions and I can tell you down
to the cycle exactly how much time any code segment
will take.

I'm looking for something like that here as well. That's
what it will take to port an OS, and a simulator that
can help out with this task really needs this same
information.

Thanks,

Chris
chris@asics.ws