[Libre-soc-dev] Some high-level information about Apple's new M1 SoCs

Fri Dec 11 23:27:33 GMT 2020

On Fri, Dec 11, 2020, 14:55 <whygee at f-cpu.org> wrote:

> On 2020-12-11 22:21, Luke Kenneth Casson Leighton wrote:
> > On 12/11/20, Jacob Lifshay <programmerjake at gmail.com> wrote:
> <...>
> > i was both fascinated and horrified to see that x86 multi issue decode
> > has to start decoding instructions from *random byte locations* in the
> > desperate hope that, half way through that, some of them will go "ah!
> > yippeee! i know the length of this instruction! all youse f*****rs can
> > stop wasting power now"
> >
> > if they don't do that, instead waiting until the 1st op is fully
> > decoded, the length of time it takes is so high that the chances of
> > doing multi-issue are absolute zero.
> <...>
>
> Just a quick note :
> early in the 90s when the Pentium arrived, Intel devised the trick of
> storing one bit per instruction byte (in the instr cache) where the
> boundary of instructions is encoded.
>
> The first execution (i.e. on an instr cache miss) decodes 1 instr per
> cycle and writes back the instruction stream to cache; further
> executions are then faster, in loops for example.
>
> I suppose this scheme has been adapted since.
>

yup, now there's a whole separate cache for holding decoded instructions,
the uop-cache. IIRC x86 cpus can run at well over 4 IPC if the
instructions are already decoded into the uop-cache; it's only filling the
uop-cache on misses that's limited to 4 IPC.
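
to make the boundary-bit idea concrete, here's a minimal sketch in python.
everything in it is an assumption for illustration: the encoding is a made-up
toy (low two bits of the first byte give the length minus one), NOT x86, and
the names insn_length, sequential_boundaries, speculative_boundaries and
boundary_cache are hypothetical. it only shows the shape of the trick: guess
a length at every byte offset, keep the guesses that sit on the real chain,
and remember the one-bit-per-byte result so the slow walk happens only once.

```python
# Toy variable-length encoding (hypothetical, NOT x86): the low two bits of
# an instruction's first byte give its length minus one, so 1 to 4 bytes.
def insn_length(first_byte):
    return (first_byte & 0b11) + 1


def sequential_boundaries(code):
    """Serial walk: one mark per byte, 1 where an instruction starts."""
    marks = [0] * len(code)
    pc = 0
    while pc < len(code):
        marks[pc] = 1
        pc += insn_length(code[pc])
    return marks


def speculative_boundaries(code):
    """Guess a length at EVERY byte offset, then keep only the real chain.

    In hardware the guesses would be computed in parallel and most of them
    thrown away; in software this is just a two-pass version of the walk.
    """
    guesses = [insn_length(b) for b in code]  # includes bogus mid-insn guesses
    marks = [0] * len(code)
    pc = 0
    while pc < len(code):
        marks[pc] = 1
        pc += guesses[pc]  # only guesses at true starts are ever used
    return marks


# Stand-in for boundary bits stored alongside the instruction cache line.
boundary_cache = {}


def cached_boundaries(code):
    """First call pays for the walk; later calls reuse the stored marks."""
    key = bytes(code)
    if key not in boundary_cache:
        boundary_cache[key] = sequential_boundaries(key)
    return boundary_cache[key]


if __name__ == "__main__":
    # Instructions of length 2, 1, 3 and 4 bytes, back to back.
    code = bytes([0x01, 0xAA, 0x00, 0x02, 0xBB, 0xCC, 0x03, 0x01, 0x02, 0x03])
    print(sequential_boundaries(code))
    print(speculative_boundaries(code))
```

both functions give the same marks; the point is that the second one never
has to wait for instruction N to know where to start looking at N+1.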

>
> But yes, CISC *sigh* ahem.
>

if we're going to have CISC, can we please have VAX (without their annoying
call instruction :P) where everything's orthogonal? :) Plus, VAX is old
enough that SIMD has not yet ruined it all.

Jacob