From lkcl at lkcl.net Sun Aug 1 01:45:04 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Sun, 1 Aug 2021 01:45:04 +0100
Subject: [Libre-soc-dev] Inverse DCT
In-Reply-To:
References:
Message-ID:

On 7/30/21, Luke Kenneth Casson Leighton wrote:
> next step, putting in a yield-based inverse DCT, was successful.
>
> next step is to link it into instructions and write a simulator unit test

done, successfully. the problem comes with the LD instruction.

FFT: bitreverse with shift
DCT: recursive halfswap then bitreverse
iDCT: bitreverse then inverse recursive halfswap

this is just too much to fit into the SVP64 24-bit prefix, and the
LD-byterev is actually interfering with applying REMAP.

what i am thinking of doing is removing bytereversing from the augmented
LD and just having LD-with-shift, to which the 3 REMAP modes above can be
applied. this makes FFT about 13 instructions rather than 11 but pffh.

l.

From lkcl at lkcl.net Sun Aug 1 10:59:37 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Sun, 1 Aug 2021 10:59:37 +0100
Subject: [Libre-soc-dev] [RFC] merging parallel reduction into REMAP
Message-ID:

https://libre-soc.org/openpower/sv/svp64/appendix/?updated#index14h1

i'm looking at the parallel reduction algorithm and note that it is
remarkably similar to the REMAP schedule for DCT COS table generation.

    8 4 2 1

which is exactly the kind of thing i was looking for, to make general
abstractions.

the first issue is, however, that it is not ok to have two separate and
distinct operations. the parallel reduction pseudocode has two
operations:

1) the operation requested
2) a MV operation

the MV has to go. a trick i have been using in the simulator "yield"
iterators is to create redirection lookup indices.
i am reasonably confident that these can be blatted down to O(1) at gate
level, however they give an idea: instead of MVing the data, use the
predicate bits to sequentially "step over" the data:

    # lookup[j] holds the index of the j-th element whose predicate bit is set
    j = 0
    for i, pbit in enumerate(predicate_bits):
        if pbit == 1:
            lookup[j] = i
            j += 1

then use lookup[index] in all register accessing.

i will update the pseudocode with this idea, to see what it looks like.

l.

From lkcl at lkcl.net Sun Aug 1 14:24:16 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Sun, 1 Aug 2021 14:24:16 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
Message-ID:

https://libre-soc.org/openpower/isa/branch/

it occurs to me only just now that we completely forgot to evaluate
SVP64 interaction on branches, particularly when bc involves CRs.

context: i started looking at this because svstep for Vertical-First
Mode requires explicit incrementing of src/dst step, then a loop end
test, followed by a bc on CR0. this is near identical to what CTR is
for.

consequently, there is a case for adding a special SVP64 bc mode to
check the svstep conditions instead of CTR.

the other thing is, what does Vectorised bc mean? and what does
predicated Vectorised bc mean?

should modes be added which check *all* CR fields being tested, or just
one, or add a bit to select either?

l.

From programmerjake at gmail.com Sun Aug 1 18:15:41 2021
From: programmerjake at gmail.com (Jacob Lifshay)
Date: Sun, 1 Aug 2021 10:15:41 -0700
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

On Sun, Aug 1, 2021, 06:32 Luke Kenneth Casson Leighton wrote:

> https://libre-soc.org/openpower/isa/branch/
>
> it occurs to me only just now that we completely forgot to evaluate
> SVP64 interaction on branches, particularly when bc involves CRs.
>
> ...
>
> should modes be added which check *all* CR fields being tested, or
> just one, or add a bit to select either?
>

GPU code will very often need to branch if all/any predicate bits are
set/clear. having a branch op that covers all 4 combinations would save a
bunch of instructions (>5-10% in some common cases).

Jacob

From lkcl at lkcl.net Sun Aug 1 18:49:31 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Sun, 1 Aug 2021 18:49:31 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

On Sun, Aug 1, 2021 at 6:15 PM Jacob Lifshay wrote:
>
> On Sun, Aug 1, 2021, 06:32 Luke Kenneth Casson Leighton wrote:
> > should modes be added which check *all* CR fields being tested, or
> > just one, or add a bit to select either?
> >
>
> GPU code will need to very often branch if all/any predicate bits are
> set/clear, having a branch op that covers all 4 combinations would save a
> bunch of instructions (>5-10% in some common cases).

yowser, definitely worth it.

i had in mind for the CRM mode to do something like this (merge all CR
bit-tests) but it turned out not to have enough space to do so.

if it's actually part of the *branch* instruction, that's fantastic
(and, logically, the right place for it)

l.

From lkcl at lkcl.net Mon Aug 2 00:36:57 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 00:36:57 +0100
Subject: [Libre-soc-dev] libre-soc server cgroups
Message-ID:

about 10 days ago the server loadavg hit 1.7 due to soclayout being 100
megabytes in size, from multiple git commits of massive autogenerated
(compiled) verilog output.

a few days before that we had fastcgid crash and take the entire web
backend offline (that turned out to be morons trying to access wordpress
php scripts: anything involving php is now an instant fail2ban)

this is precisely why i set the rule that autogenerated output should
not be added to git repositories, because soclayout is now so massive it
affected everyone's useability.

i had since set up cgroups and allocated only 20% CPU to fastcgid.
this turns out to make bugzilla dreadfully slow, so i have increased it
to 40% to see how that goes. i may instead set up a separate cgroup just
for the git command, such that it does not impact bugzilla.

mythic-beasts hosting is extremely good, however the next level up is
double the cost, and i don't want to increase that unless absolutely
necessary.

l.

From lkcl at lkcl.net Mon Aug 2 01:41:14 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 01:41:14 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:

Message-ID:

https://libre-soc.org/openpower/sv/branches/

i created this page with various modes. i believe only "ALL/SOME" is
needed because by inverting the BO test itself ~ALL and ~SOME are
achieved.

strictly speaking, illegal instructions should be raised for mode
combinations that make no sense. however, given how gate-critical this
is, and how doing so would create Hazard dependencies on SVSTATE,
compromising multi-issue execution in the process, i am very reluctant
to do that.

detection of Branch SVP64 RM Mode is quite straightforward: there is
major op 18 and two minor op 19s. this is not a lot for the early decode
phase.

l.

From programmerjake at gmail.com Mon Aug 2 08:42:27 2021
From: programmerjake at gmail.com (Jacob Lifshay)
Date: Mon, 2 Aug 2021 00:42:27 -0700
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:

Message-ID:

On Sun, Aug 1, 2021, 17:41 Luke Kenneth Casson Leighton wrote:

> https://libre-soc.org/openpower/sv/branches/
>
> i created this page with various modes, i believe only "ALL/SOME" is
> needed because by inverting the BO test itself ~ALL and ~SOME are
> achieved.
>

Instead of "SOME", I'd call the mode "ANY" -- it's more specific.
Alternatively we could call the modes reduce-and/reduce-or, since that's
what they actually are.

GPU code could benefit from having the semantics be where the SVP64
predicate (either Int or CR) tells the branch instruction which CR fields
it should use, where zero bits in the SVP64 predicate cause the
corresponding CR fields to be ignored. Since the ignored bits cause ~ALL
and ~ANY to no longer be redundant afaict, we will want to add them back
in. This will allow saving instructions in nested SIMT code like the
following:

    i32 a, b; // globals
    // ...
    while(a > 2) {
        if(b < 5)
            f();
        else
            g();
        h();
    }

which compiles to something like:

    vec a, b;
    // ...
    pred loop_pred = a > 2;
    while(loop_pred.any()) {
        pred if_pred = loop_pred & (b < 5);
        if(if_pred.any()) {
            f(if_pred);
        }
    label1:
        pred else_pred = loop_pred & ~if_pred;
        if(else_pred.any()) {
            g(else_pred);
        }
        h(loop_pred);
    }

in the else_pred part (after label1 above), we could write it like so
(wrong asm syntax, but you get the point):

    // loop_pred could be stored in r30 or something -- out-of-the-way of
    // f(), g(), and h()
    //
    // skip extra instructions if not(any non-ignored bit in else_pred is set),
    // the un-prefixed branch instruction is just: `bc ~if_pred, skip`
    bc reduce_mode=~ANY, svp64_pred=loop_pred, ~if_pred, skip
    // compute else_pred without loop_pred being forced to be in a CR,
    // this only works if else_pred is the same CR registers as if_pred
    // and it relies on all zero bits in loop_pred also being zeros in if_pred
    crnot else_pred, if_pred, svp64_pred=loop_pred
    // g(else_pred) inlined here
    skip:
    // h(loop_pred) inlined here
    // code for while loop...

The above would take additional instructions if the semantics of br were
defined as currently in the wiki, instead of as in my proposal.

Jacob

From staf at fibraservi.eu Mon Aug 2 08:57:36 2021
From: staf at fibraservi.eu (Staf Verhaegen (FibraServi))
Date: Mon, 2 Aug 2021 09:57:36 +0200
Subject: [Libre-soc-dev] libre-soc server cgroups
In-Reply-To:
References:
Message-ID:

On 2/08/2021 at 01:36, Luke Kenneth Casson Leighton wrote:
> mythic-beasts hosting is extremely good, however the next level up is
> double the cost, i don't want to increase that unless absolutely
> necessary.

Did you have a look at Contabo (contabo.de)?
They are pretty cheap and I am satisfied with their hosting.

greets,
Staf.

--
Chips want to be free.

From lkcl at lkcl.net Mon Aug 2 09:12:21 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 09:12:21 +0100
Subject: [Libre-soc-dev] libre-soc server cgroups
In-Reply-To:
References:

Message-ID:

---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Mon, Aug 2, 2021 at 8:57 AM Staf Verhaegen (FibraServi) wrote:

> Did you have a look at Contabo (contabo.de)?
> They are pretty cheap and I am satisfied with their hosting.

https://contabo.com/en/vps/vps-s-ssd/?image=ubuntu.267&qty=1&contract=1

4 cores, 8 GB RAM, 200 GB SSD for EUR 6, that's pretty damn good.

moving to a different VM however is quite a bit of hassle. i wonder if
i can get mythic-beasts to negotiate alternative pricing.

l.

From lkcl at lkcl.net Mon Aug 2 09:54:21 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 09:54:21 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:

Message-ID:

On Mon, Aug 2, 2021 at 8:42 AM Jacob Lifshay wrote:
>
> On Sun, Aug 1, 2021, 17:41 Luke Kenneth Casson Leighton wrote:
>
> > https://libre-soc.org/openpower/sv/branches/
> >
> > i created this page with various modes, i believe only "ALL/SOME" is
> > needed because by inverting the BO test itself ~ALL and ~SOME are
> > achieved.
> >
>
> Instead of "SOME", I'd call the mode "ANY" -- it's more specific.
> Alternatively we could call the modes reduce-and/reduce-or, since that's
> what they actually are.

the bit is named "ALL" to indicate "All tests must pass".

> GPU code could benefit from having the semantics be where the SVP64
> predicate (either Int or CR) tells the branch instruction which CR fields
> it should use,

yes, that's a given.

> where zero bits in the SVP64 predicate cause the
> corresponding CR fields to be ignored.

that's part of SVP64 default behaviour: those tests would simply be
skipped.

i have however just realised that zeroing mode is completely
meaningless, including the SNZ bit... ah no it isn't, because it can be
set to deliberately fail at the first zero point.

that can be used to very deliberately truncate VL to the exact point
where the first zero point occurs in the predicate mask.... argh can't
do that, we've run out of bits. nuts.

oh wait... VLI is to truncate to VL rather than VL-1. so it's not so
bad.

> Since the ignored bits cause ~ALL
> and ~ANY to no longer be redundant afaict,

can you re-read, about sz and SNZ, to take those into consideration?

> The above would take additional instructions if the semantics of br were
> instead defined as currently in the wiki, instead of my proposal.

i'm not totally following, i'm still absorbing the concept of what
you're describing, however a couple of things:

1) changing the behaviour and semantics of SVP64 predicate masks just
   for SVP64 branches isn't ok.
   fitting with how SVP64 predicate masks work for all other options
   is how it has to go at this point

2) you may not have understood about sz and SNZ, or, if i am reading
   correctly what you wrote, you may have misunderstood predicate masks
   and how they're applied (or, not).

can you please re-evaluate / re-word, taking into account sz and SNZ,
which can be used to insert (effectively) *either* an immediate of zeros
or an immediate of 1s in place of masked-out CR bits being tested?

l.

From lkcl at lkcl.net Mon Aug 2 10:14:07 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 10:14:07 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:

Message-ID:

On 8/2/21, Jacob Lifshay wrote:

> GPU code could benefit from having the semantics be where the SVP64
> predicate (either Int or CR) tells the branch instruction which CR fields
> it should use, where zero bits in the SVP64 predicate cause the
> corresponding CR fields to be ignored. Since the ignored bits cause ~ALL
> and ~ANY to no longer be redundant afaict, we will want to add them back
> in.

remember that there is both ~R30 (and ~R10) as well as ~CRbit predicate
testing, as well as being able to invert the BO bit test.

i would be very surprised if, in combination with sz+SNZ, all possible
options were not covered.

l.

From lkcl at lkcl.net Mon Aug 2 21:42:22 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 21:42:22 +0100
Subject: [Libre-soc-dev] Inverse DCT
In-Reply-To:
References:
Message-ID:

LD all sorted, scaled it back to LD-with-shift, and if pushed we could
do without LDsh entirely. the DCT/iDCT REMAP schedule now does
bitrev-with-halfswap itself, applying that to the offset.

i cannot say i am happy about losing LD-bitrev because FFT was very
short, with it.

iDCT unit test with LD, inner and outer butterfly works great.

next is to write a program that creates SVG files to put into docs and
slides.

l.

From lkcl at lkcl.net Tue Aug 3 03:36:38 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Tue, 3 Aug 2021 03:36:38 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:

Message-ID:

i wrote out the pseudocode, and there are some fascinating side-effects
/ possible uses, including interaction with CTR, using the predicate
mask with unconditional tests (BO1 set), loads more.

l.

From lkcl at lkcl.net Tue Aug 3 13:08:31 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Tue, 3 Aug 2021 13:08:31 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:

Message-ID:

On 8/2/21, Jacob Lifshay wrote:

From lkcl at lkcl.net Tue Aug 3 15:51:55 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Tue, 3 Aug 2021 15:51:55 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:

Message-ID:

drat. fricking gmail HTML Basic mode is barely useable. hit send
instead of save. grrr. ok.

https://libre-soc.org/openpower/sv/branches/?updated

i've added the example and created SVP64 hypothetical assembler.

i very deliberately placed the calculation of ANDing the predicate with
the CR just before each call to f() and g(). the CR Vector *BEFORE*
being transferred to r30 is used, there, because it is a pain to
cross-interact integers with Vector CRs.

one of the tests (the else.any) is deliberately inverted: mask=~r30

this is to illustrate how and what the SNZ immediate field is for.
ANDing of all tests is still done, but instead of sz (source zero in
masked out bits) a **ONE** is put in the place of the CR Field element,
**NOT** a zero.

this causes the ANDing to effectively IGNORE masked-out bits but still
keep decrementing CTR (if the relevant CTR bit is set).

thus, CTR branch conditional mode *still operates correctly*, counting
down the total number of elements in an array, even when some elements
of that array should be masked out.

where you do not want that behaviour, instead wanting CTR to count down
ONLY mask-selected elements, you would not use sz. this would skip both
the element test *and* skip CTR decrementing.

what _would_ be nice is if bc were to update the CR field based on the
mask. however to be absolutely honest i think this is too much, and it
needs to be optional, and unfortunately we are out of bits.

l.

From lkcl at lkcl.net Tue Aug 3 18:32:29 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Tue, 3 Aug 2021 18:32:29 +0100
Subject: [Libre-soc-dev] [llvm-dev] [RFC] Vector/SIMD ISA Context Abstraction
In-Reply-To:
References:

Message-ID:

(renato thank you for cc'ing, due to digest subscription at the moment)

On Tue, Aug 3, 2021 at 3:25 PM Renato Golin wrote:
>
> On Sat, 31 Jul 2021 at 00:33, Luke Kenneth Casson Leighton via llvm-dev wrote:
>>
>> if however instead of an NxM problem this was turned into N+M,
>> separating out "scalar base" from "augmentation" throughout the IR,
>> the problem disappears entirely.
>
>
> Hi Luke,
>
> It's not entirely clear to me what you are suggesting here.

it's a nebulous but fundamentally low-level concept, that may take some
time to sink in, and for its significance to be appreciated.

some background: over the past 3+ years i have made a comprehensive
comparative study of ISAs (both SIMD and Vector), the latter only being
revived recently thanks to RVV bringing Cray-style Vectors back into the
forefront of computing research.

here is a quick summary of that:

* Packed SIMD. the worst kind of ISA. the following article puts it
  diplomatically:
  https://www.sigarch.org/simd-instructions-considered-harmful/
  i have no such compunction or affiliation, and as an engineer can
  speak freely and plainly. gloves off, statement of fact: Packed SIMD
  is the worst thing to become ubiquitous in computer science, bar none.
  frankly, the sooner that Packed SIMD is shot and buried as an
  embarrassing and incredibly expensive historical footnote in the
  history of computing, the better [there's always exceptions: e.g. for
  embedded DSP Audio, Packed SIMD is perfect].

* Predicated SIMD: by total contrast, this is actually not half bad.
  SVE2, AVX-512, GPU ISAs, anything that takes masks per element: the
  masks completely eliminate Packed SIMD Hell (except for ISAs that
  still have SIMD alignment for memory LD/STs even for Predicated
  LD/STs, but hey, nothing's perfect)

* Horizontal-First Vectors.
  known best from the Cray-1, also in modern ISAs such as NEC's
  SX-Aurora and now RVV, horizontal vectors are of the form
  "for i in range(VL): operation(vec_src_reg[i], vec_dest_regs[i])"

* Vertical-First Vectors. these are *NOT* very well-known, and the only
  two ISAs that i know of (to date) are SVP64 and Mitch Alsup's
  My 66000 ISA.

btw, here's the [terse] page on SVP64's format:
https://libre-soc.org/openpower/sv/svp64/
the overview is much more useful for understanding:
https://libre-soc.org/openpower/sv/overview/

additional relevant ISAs:

* Broadcom VideoCore IV, which has a Vector-form of "REP", where the
  operation may be repeated 2, 4, 8, 16, 32 or "the count taken from
  scalar r0".

* the Mill Architecture, which is a "tagged" ISA. here, there is no
  "ADD32", "ADD16", "ADD64", "ADD8", there is *ONLY* "ADD". the width of
  the operation is taken *FROM THE LOAD OPERATION* which is the ONLY
  place where the register operand width is specified. operations are
  DEDUCED, statically, at compile-time, based on the "tags" associated
  with source registers as they cascade through. two strategically
  important instructions are included: WIDEN and NARROW.

the significance of mentioning the Mill is that the ISA has a closer
match to the simple (basic) LLVM intrinsics than most other ISAs.
however even there (unless they've done drastic and extensive changes)
they will be limited to trying to fit a flexible (tagged) ISA into an
inflexible IR that was never designed with "context" in mind.

> For context:
> * Historically, we have tried to keep as many instructions as native IR as possible to avoid the explosion of intrinsics, as you describe.

a crucially important goal that gets a big thumbs-up from me.

> * However, traditionally, intrinsics reduce the number of instructions in a basic block instead of increasing them, so there's always the balance.

where the opposite of that is that the CISC-ness of a given new
intrinsic itself could impact ISAs that don't support that feature
natively, making it necessary for them to emit rather more assembly
instructions than it first appears.

> * For example, some reduction intrinsics were added to address bloat, but no target is forced to use them.

excellent. iteration and reduction (including fixed schedule
paralleliseable reduction) is one of the intrinsics being added to
SVP64. it's good to hear that that, as a concept, has been added. if i
may, i will use that as an example, later.

> * If you can represent the operation as a series of native IR instructions, by all means, you should do so.

this assumes (perfectly reasonably, mind you) that the (hypothetical)
ISA itself is not capable of expressing a given operation *in* IR, and
consequently it has to be done as a series of passes, substituting for a
lack of native (direct) support of a given operation with some
(faster?) operations that *do* (ultimately) exist as actual assembler.

in some architectures a particular native IR instruction might actually
exist, but the native assembler variant is so horribly slow at the
hardware level that alternatives are actually *demanded* by users.
AVX's native Horizontal Reduction instructions would be a good example.

> I get it that a lot of intrinsics are repeated patterns over all variations and that most targets don't have that many, so it's "ok".
>
> I also get it that most SIMD vector operations aren't intrinsically vector,

[indeed. i have spent considerable time recently on the wikipedia
Vector_processor page, and associated nearby related pages (SIMD, GPUs,
etc), correcting that unfortunate meme that "SIMD equals vectors". this
is unfortunately where Corporate Marketing has badly interfered with
actual Computer Science. sigh.]

> but expansions of scalar operations for the benefit of vectorisation
> (plus predication, to avoid undefined behaviour and to allow "funny" patterns, etc).

yes. and this perspective is where Mitch Alsup's My 66000 ISA, and
SVP64's "Vertical-First" Mode come into play: the instructions in both
are effectively executed *in scalar form ONLY* (as far as the Program
Order is concerned), and, at the end of a loop/branch, you *EXPLICITLY*
increment the element index, such that all *SCALAR* operations in the
loop now execute on *scalar* element one. end of loop, explicit
increment element index to 2, loop back and execute on element *two* of
the Vector Register. repeat until loop-termination condition.

the challenge ahead for Libre-SOC (and for My 66000) will be to
introduce this entirely new concept to compilers. however given that
it's effectively scalar rather than Vector, the task should actually be
a *lot* easier than it is for Horizontal-First ISAs such as SVE and RVV.

> But it's not clear to me what the "augmentation" part would be in other targets.

the proposal is - at its heart - to replace all IR of the form:

    llvm.masked.load.v16f32.predicatespec(arguments)
    llvm.masked.load.v2f64.predicatespec(arguments)

and so on with just:

    llvm.load(mask=x, arguments)

i.e. there *is* no llvm.masked.load: there is *only* an optional
argument, which is handled when llvm runs (compile-time, more like,
rather than runtime). rather than expanding out explicitly / statically
into dozens of special IR intrinsics, *there is only one*: llvm.load.

additional optional arguments also then specify whether this operation
is twin-predicated by having a *second* predicate mask (yes, SVP64 can
apply one predicate mask to the source, and another to the destination.
conceptually this is equivalent to back-to-back VGATHER-VSCATTER).

additional optional arguments also then specify whether there is
SWIZZLE applied, or Sub-Vectors, or any other types of "augmentation".
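
A minimal Python sketch of the twin-predication idea just described, under stated assumptions: it models "one mask on the source, another on the destination" as a gather of active source elements followed by a scatter into active destination slots. The function name `twin_predicated`, the list-based registers and the `sext8` helper are all hypothetical, chosen for illustration; this is a conceptual model only, not the SVP64 specification or simulator code.

```python
def twin_predicated(op, src, src_mask, dst, dst_mask):
    """Apply a scalar `op` under two independent predicate masks.

    The i-th *active* source element is written to the i-th *active*
    destination slot: masked-out sources are stepped over (gather) and
    masked-out destinations are left untouched (scatter).
    """
    out = list(dst)              # do not modify the caller's register list
    i = j = 0
    while True:
        # step over masked-out source elements
        while i < len(src) and not src_mask[i]:
            i += 1
        # step over masked-out destination elements
        while j < len(out) and not dst_mask[j]:
            j += 1
        if i >= len(src) or j >= len(out):
            break                # ran out of active elements on either side
        out[j] = op(src[i])
        i += 1
        j += 1
    return out


def sext8(x):
    """Hypothetical scalar base operation: sign-extend an 8-bit value."""
    return x - 256 if x & 0x80 else x


result = twin_predicated(sext8,
                         src=[0x7F, 0xFF, 0x80, 0x01], src_mask=[1, 0, 1, 1],
                         dst=[0, 0, 0, 0], dst_mask=[0, 1, 1, 0])
print(result)
```

Here the scalar operation knows nothing about masks: the two predicates are applied entirely by the surrounding loop, which is the sense in which the base operation and its "augmentation" stay separate.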

now, here's the kicker: what we need to support SVP64 is for *all llvm
basic intrinsics to support all possible optional augmentations of all
possible types*.

yes, really, that's not a typo or a mis-statement. we *genuinely* need
a sign-extended twin-predicated intrinsic:

    llvm.sext(source_mask=source_pred, dest_mask=dest_pred, source_argument)

>> even permute / shuffle Vector/SIMD operations are separateable into
>> "base" and "abstract Vector Concept": the "base" operation in that
>> case being "MV.X" (scalar register copy, indexable - reg[RT] =
>> reg[reg[RA]] and immediate variant reg[RT] = reg[RA+imm])
>
>
> Shuffles are already represented as IR instructions (insert/extract vector), so I'm not sure this clarifies much.

ok, so is it possible to do shuffle-sign-extend, shuffle-fptrunc,
shuffle-fabs, shuffle-sqrt, shuffle-log, and any other single-src
single-dest operation?

this is where the "augmentation" - the separation of PREFIX-SUFFIX -
comes into play. SVP64 has the ability to set up "SWIZZLE" contexts as
well as certain kinds of "REMAP" Schedules (triple-loop butterfly
schedules) - PREFIXes - that can be *applied* to base operations
(SUFFIXes), which, if we were to expand all those possibilities out,
would literally create several MILLION intrinsics.

> Have you looked at the current scalable vector implementation?

briefly, yes. i also helped with some review insights when RVV was
being added, although that was a brief glimpse into a massive world
where i was (and still am) constrained, unfortunately, by time and
resources, much as i would love that to be otherwise.

> It allows a set of operations on open-ended vectors that are controlled by a predicate, which is possibly the "augmentation" that you're looking for?

no. what is happening there is that it is a reflection of the
limitations of the current ISAs. i can say with 100% certainty that the
SVE implementation will not have been designed to take SVP64 into
consideration.
the reason is actually very simple and straightforward: at the time
LLVM SVE was added, SVP64 did not even exist.

so for example, let us take the new feature added in LLVM SVE:
reduction. most Vector ISAs add *explicit* reduction operations. NEC
SX-Aurora for example has reduce-add, reduce-multiply, reduce-OR,
reduce-AND, reduce-XOR, and that's about it. SVP64 has:

* reduction as a fixed (paralleliseable) schedule
* base operation.

you can LITERALLY apply reduction to.... to... the "llvm.maximum" scalar
operation, or to... divide or subtract (or other non-commutative
operation) if you really really want to, and the ISA will go, "ok,
done. next?".

    sv.fmax/MR FRT.v, FRA.v, FRB.v # MR means "map-reduce mode"

you can apply parallel reduction to Power ISA v3.0 Condition Register
operations, "crand" or "cror" or "crnor".

    sv.crand/MR BT.v, BA.v, BB.v

you can even apply parallel reduction to single-argument instructions
if you really, really want to: we're not going to stop that from
happening, because somebody might find it useful given the fact that the
parallel-reduction is on a fixed Power-2-halving Schedule that could
have practical uses, and the hardware is *required* to write out all
intermediate values into a *vector* result.

you can even apply parallel reduction Schedules to triple-argument
instructions (FMA), however there it gets tricky and complicated (and i
haven't fully thought through what it actually means, i.e. whether it's
useful). certainly if the MUL register argument is considered scalar
and the others Vector, that is actually useful (performs repeated
cumulative multiply as part of the Schedule).

does this help illustrate what i mean by "augmentation"? there is a
"base" (scalar) operation, you "augment" it, and it *becomes* SIMD-like,
*becomes* Vector-like, *becomes* predicated, *becomes* Swizzled,
*becomes* reduced.
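
The separation of "schedule" from "base operation" can be sketched in a few lines of Python. This is a generic pairwise tree reduction on a power-of-two stride schedule, written only to illustrate the point; the name `tree_reduce` is hypothetical and the exact SVP64 reduction schedule is an assumption not taken from the specification. Intermediate values are left in the working vector, loosely mirroring the remark above about intermediates being written out.

```python
import operator


def tree_reduce(op, vec):
    """Reduce `vec` with any two-argument scalar `op` on a fixed schedule.

    The schedule (which pairs are combined, and in what order) depends
    only on len(vec), never on `op` or on the data. The final result
    ends up in element 0; other elements hold intermediate values.
    """
    v = list(vec)
    step = 1
    while step < len(v):
        # combine elements `step` apart, in strides of 2*step
        for i in range(0, len(v) - step, 2 * step):
            v[i] = op(v[i], v[i + step])
        step *= 2
    return v


print(tree_reduce(operator.add, [1, 2, 3, 4, 5]))   # commutative base op
print(tree_reduce(max, [3, 9, 2, 7]))               # max as the base op
print(tree_reduce(operator.sub, [1, 2, 3, 4]))      # non-commutative base op
```

Because the schedule is fixed, even a non-commutative operation such as subtraction gives a well-defined answer: the third call computes (1 - 2) - (3 - 4). The schedule never inspects the operation, which is what lets one reduction mechanism serve every base operation.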

the development of LLVM SVE would not have taken this possibility into
account, because, put simply, it is plain common sense in something as
complex as LLVM not to waste time writing code for something that does
not have a real-world use-case.

>> the issue is that this is a massive intrusive change, effectively a
>> low-level redesign of LLVM IR internals for every single back-end.
>
>
> Not necessarily.

this would be fantastic (and a huge relief) if it can be so arranged.
one of my biggest concerns is that what i am advocating is at such a
fundamental level that it could, if done incorrectly, be extremely
disruptive.

however even now as i think about it on-the-fly, if the proposal is as
simple as adding c++-like "optional named arguments" to (all) base
scalar LLVM intrinsics, then, i think that would work extremely well.
it would have zero impact on other ISAs, which is a huge plus.

> For example, scalable vectors are being introduced in a way that non-scalable back-ends (mostly) won't notice.
> And it's not just adding a few intrinsics, the very concept of vectors was changed.
> There could be a (set of) construct(s) for your particular back-end that is invisible to others.

the problem is that all other Vector ISAs have constrained themselves
to 32 bit (or, for GPU ISAs, often 48 or 64). they *explicitly* add
*explicit* O(N) opcodes. RVV adds 192 *explicit* opcodes, embedded into
a MAJOR 32-bit opcode specifically dedicated for use by RVV, and that
was part of its original design. ARM, likewise, will have done
something similar, with SVE and SVE2.

the problem with that approach is that it is extremely limiting in the
possible permutations / combinations of *potential* instructions that
*could* exist, if there was not such a limit of trying to cram into a
32-bit space.
[it does have to be said, however, that there are some serious practical benefits to limiting the possibilities of an ISA: validation and verification of silicon before spending USD 16 million on 7nm masks is a Damn Good Reason :) and it is one that we are going to have to give some serious thought to: how to verify the hardware for an ISA with literally several MILLION instructions]

we have left the entirety of the Scalar Power v3.0B ISA alone (which is 32-bit), called that "base", and added a full 32-bit Prefix (called SVP64) which contains the Vectorisation Context. SVP64 is - fundamentally - an O(NxM) ISA, where N ~= 250 and M is ~= 1,000 to 8,000.

actually, it's O(NxMxOxPxQ) where:

* N~=250 is the base scalar Power v3.0B ISA
* M~=1,000-8,000 is the Vectorisation Context
* O~=2^20 (guessing here) is REMAP Schedules and
* P~=2^(3*12) is SWIZZLE Contexts for GPUs (XXYZ, WZZX)
* Q=64 (Vector Length, VL)

thus for example with Twin-Predication applied to e.g. llvm.sext or llvm.cos we have implicit back-to-back VGATHER-VSCATTER behaviour *WITHOUT* needing a pair of LD/ST operations or register MV operations before and after the Vectorised operation.

we anticipate some extremely powerful compact representations, and to be honest it may literally take several years for the full implications of SVP64's power and flexibility to sink in.

in-register paralleliseable DCT can be done in 9 instructions, and paralleliseable in-register-file (small) insertion-sort likely in around 11 instructions thanks to the Data-Dependent Fail-on-First-Condition Mode.

we can even implement a (small) Vectorised quick-sort, in-register, fully paralleliseable, in probably about... 20 instructions. it's on my TODO list to investigate.

> Of course, the more invisible things, the harder it is to validate and change intersections of code, so the change must really be worth the extra hassle.

appreciated.
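the Twin-Predication point above (implicit back-to-back VGATHER-VSCATTER around a single-argument operation) can be sketched in a few lines of Python. this is a hedged illustration only: `twin_predicated` and its arguments are names invented for this example, and the real SVP64 pseudocode handles many more cases (zeroing, scalar sources, element widths). the idea shown is simply that the source index skips masked-out source elements, the destination index skips masked-out destination elements, and the two advance independently.

```python
def twin_predicated(op, src, src_mask, dest, dest_mask):
    """Apply a one-argument `op` with separate source and dest predicates.

    Illustrative sketch: masked-out source elements are skipped (gather),
    masked-out destination elements are left untouched (scatter).
    """
    vl = len(src)
    assert len(dest) == vl and len(src_mask) == vl and len(dest_mask) == vl
    out = list(dest)
    s = d = 0
    while True:
        # step over source elements whose predicate bit is clear
        while s < vl and not src_mask[s]:
            s += 1
        # step over destination elements whose predicate bit is clear
        while d < vl and not dest_mask[d]:
            d += 1
        if s >= vl or d >= vl:
            break
        out[d] = op(src[s])
        s += 1
        d += 1
    return out


if __name__ == "__main__":
    result = twin_predicated(lambda x: -x,
                             [10, 20, 30, 40], [1, 0, 1, 0],
                             [0, 0, 0, 0],     [0, 1, 0, 1])
    print(result)
```

with an all-ones source mask this degenerates to a plain scatter, and with an all-ones destination mask to a plain gather, without any separate MV or LD/ST step.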
> With both Arm and RISCV implementing scalable extensions, that change was deemed worthy and work is progressing. > So, if you could leverage the existing code to your advantage, you'd avoid having to convince a huge community to implement a large breaking change. the possibility that occurred to me, above, as writing this, of adding optional arguments (containing the Vector Augmentation Context) to base scalar llvm intrinsics, would i believe achieve that. if any other ISA vendors wanted to use that, they could, as a first pass, map e.g. llvm.load(optional_predicate=xxx) *onto* llvm.masked.load(....) and thus avoid huge disruption, and carry that out in an incremental fashion. or not. at their choice. there are several variants on this theme of optional arguments of some description to the base: llvm.add(normal_arguments) where optional_vector_context is an object of some type that itself contains optional "augmentation" features. i would advocate something like this: llvm.add(normal_arguments, source_override_width=<8/16/32/64>, dest_override_width=<8/16/32/64>, saturation_mode=, source_pred=xxx, dest_pred=yyyy, fail_first_mode, swizzle_src=, REMAP_schedules=, scalar_or_vector_regs=) can you imagine expanding all of those out into a declared (flat) list of intrinsics? what that would do to LLVM SVE if we tried? the "augmentation" list is absolutely massive and starts to give some idea of why LLVM SVE, as designed, simply won't cope, and why we have to think about this differently. the thing is: from the study i've made of other ISAs, i can say that with near 100% certainty that there *will* be a direct map to all of the existing LLVM SVE intrinsics recently added *and to those of all SIMD ISAs as well* and, what i expect to happen is that instead of a massive list of thousands of SIMD intrinsics for e.g. x86, it will reduce down to a fraction of what is in LLVM x86 backend right now. 
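as a rough picture of what "optional arguments with zero impact on other ISAs" could mean, here is a toy Python model. to be clear about what is assumed: this is not LLVM code and uses no LLVM API; `VectorContext` and `lower` are hypothetical names, the strings are just labels, and only three of the augmentation fields from the list above are modelled. it shows two things: with no context supplied, the scalar operation passes through untouched, and a back-end that wanted to could map a predicated load onto the existing masked-load form as a first pass.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VectorContext:
    """Hypothetical bundle of optional augmentation (all default to 'off')."""
    source_pred: Optional[Tuple[bool, ...]] = None
    dest_pred: Optional[Tuple[bool, ...]] = None
    saturation_mode: Optional[str] = None  # "signed", "unsigned" or None


NO_CONTEXT = VectorContext()


def lower(op: str, ctx: VectorContext = NO_CONTEXT) -> str:
    """Return a label for what a back-end might select (toy model only)."""
    if ctx == NO_CONTEXT:
        # nothing requested: existing scalar path, existing back-ends unaffected
        return op
    only_source_pred = (ctx.source_pred is not None
                        and ctx.dest_pred is None
                        and ctx.saturation_mode is None)
    if op == "llvm.load" and only_source_pred:
        # first-pass mapping onto the form that already exists today
        return "llvm.masked.load"
    # anything else would need the ISA's own augmentation support
    return op + "+context"


if __name__ == "__main__":
    print(lower("llvm.add"))
    print(lower("llvm.load", VectorContext(source_pred=(True, False, True))))
    print(lower("llvm.add", VectorContext(saturation_mode="signed")))
```

the design choice being illustrated is that the context is a single optional object with defaults, so adding a new kind of augmentation later means adding a field, not a new family of intrinsics.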
in fact, i expect the exact same reduction to occur for *all* Packed and Predicated SIMD ISAs supported by LLVM. that will both reduce the maintenance burden and should, in theeoorry, reduce compile times as well. in theory. in practical terms it depends what the impact is of the "optional" arguments. hmmm, that will need some thought.

even the Mill i believe could benefit, from being able to map much more closely to the actual underlying ISA, which *only* has "ADD" (not ADD8/16/32/64), because they could potentially add an "auto" or "implicit" option to the source width / dest width arguments, which would be much more in line with how the actual ISA itself works (implicit tagged - polymorphic - registers)

https://millcomputing.com/docs/compiler/

> And you'd also give us one more reason for the scalable extension to exist. :)

:)

as i mentioned at the start, with that list of ISAs, there _do_ exist other Vector ISAs and actual hardware implementations, out there: NEC SX-Aurora has been shipping for decades, now - first implementations were April 1983!

https://en.wikipedia.org/wiki/NEC_SX

here's some background:

https://sx-aurora.github.io/

and yes, they do have a Vector Extension variant of llvm:

https://sx-aurora.github.io/posts/llvm-ve-rv/

i do hope at some point that they come out of the woodwork and participate in LLVM SVE. and that the product continues to ship, it's pretty incredible and i am delighted that NEC has had a strong enough customer base to keep on selling it and maintaining SX-Aurora.

Mitch Alsup's MyISA 66000 will need gcc and llvm at some point, and it is another ISA with a form of Scalable Vectors - one that has been specially designed to "thunk" down to simple scalar hardware.

thank you, Renato, for responding, it's given me the opportunity to explain a bit more in-depth.

feel free to cc libre-soc-dev in future, we don't mind [relevant!] cross-posts.

warmest,

l.
From programmerjake at gmail.com Tue Aug 3 19:21:33 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Tue, 3 Aug 2021 11:21:33 -0700 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: References:

Message-ID:

On Tue, Aug 3, 2021, 07:52 Luke Kenneth Casson Leighton wrote:
> drat. fricking gmail HTML Basic mode is barely useable. hit send
> instead of save. grrr.
>
> ok.
>
> https://libre-soc.org/openpower/sv/branches/?updated
>
> i've added the example and created SVP64 hypothetical assembler.

ok, 3 issues:

1. CR fields set before a call and used after a call will not work, unless you pick callee-saved fields. icr what ABI we picked, but I expect around half of them to be callee-saved and half to be caller-saved. In all ABIs I've seen, argument registers aren't preserved (you used them to pass the predicates to the functions, then tried to read the same register immediately after the function call, where it is potentially overwritten by the function).

2. some branch instructions are missing commas to separate arguments.

3. the branch at the end that branches back to the top of the while loop needs to either be an unconditional branch to the while loop's test (though basically all compilers don't do that), or replicate the code for testing the condition at the bottom of the loop (what basically all compilers do, iirc).

Jacob

From lkcl at lkcl.net Tue Aug 3 19:55:21 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Tue, 3 Aug 2021 19:55:21 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:

Message-ID: On 8/3/21, Jacob Lifshay wrote: > ok, 3 issues: > 1. CR fields set before a call and used after a call will not work, ... you get the general idea, though: that with sz and SNZ there's a way for predicate masks to interact with the CR Vector, to create Vec-AND and VEC-OR behaviour that, at the same time, still allows CTR the option of counting masked-in elements or all elements. also, early-exit has the "truncate VL" option, so that, hmm, i just realised, that could help with strncpy and strlen. feel free to help edit and correct the syntax errors. l. From luke.leighton at gmail.com Wed Aug 4 11:23:54 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 04 Aug 2021 10:23:54 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: References:

Message-ID:

On August 3, 2021 6:55:21 PM UTC, Luke Kenneth Casson Leighton wrote:

>... you get the general idea, though: that with sz and SNZ there's a
>way for predicate masks to interact with the CR Vector, to create
>Vec-AND and VEC-OR behaviour that, at the same time, still allows CTR
>the option of counting masked-in elements or all elements.

sigh.

we've run out of bits, and i have a feeling that it is more useful to have the option of updating the CR field being tested, taking predicate masks into account, than it is, say, to keep the Absolute Address functionality of branch.

AA is something that is only used in Hypervisor mode, for interrupt tables or OS source, and is otherwise very much wasted in userspace.

normally it is a Hard Rule that under no circumstances should SVP64 alter the base operation. this is so that when talking about it, and advocating it, we may state, plainly "base. loop. simple". the moment the word "except" has to go into that sentence, it will make people nervous when it comes to adoption.

in this particular case however the entire Branch has to be *replaced*.

thoughts?

l.

From luke.leighton at gmail.com Wed Aug 4 12:44:08 2021
From: luke.leighton at gmail.com (lkcl)
Date: Wed, 04 Aug 2021 11:44:08 +0000
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:

Message-ID: <0CBB2A4D-E459-4461-BB0B-4AF9000E1CC7@gmail.com>

On August 4, 2021 10:23:54 AM UTC, lkcl wrote:
>On August 3, 2021 6:55:21 PM UTC, Luke Kenneth Casson Leighton
> wrote:
>
>>... you get the general idea, though: that with sz and SNZ there's a
>>way for predicate masks to interact with the CR Vector, to create
>>Vec-AND and VEC-OR behaviour that, at the same time, still allows CTR
>>the option of counting masked-in elements or all elements.
>
>sigh.
>
>we've run out of bits, and i have a feeling that it is more useful to have the option of updating the CR field being tested, taking predicate masks into account, than it is say to keep the Absolute Address functionality of branch.

i just noticed, AA is (bit 30) only in bc:

    PO   BO   BI   BD   AA   LK
    0    6    11   16   30   31

whereas for bclr there are bits spare:

    PO   BO   BI   ///  BH   XO   LK
    0    6    11   16   19   21   31

thus only bc need have altered behaviour from v3.0B as far as bit definitions are concerned: bclr may set a new bitfield 16-18. excellent.

the reason the branch pseudocode has to change is because the loop on the CR Field Vector must not run to letting LR or other alterations occur. i.e. we cannot just use the existing bc pseudocode and run it in a VL loop.

l.

From luke.leighton at gmail.com Wed Aug 4 14:14:40 2021
From: luke.leighton at gmail.com (lkcl)
Date: Wed, 04 Aug 2021 13:14:40 +0000
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:

Message-ID: <121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> On August 3, 2021 6:21:33 PM UTC, Jacob Lifshay wrote: >On Tue, Aug 3, 2021, 07:52 Luke Kenneth Casson Leighton >wrote: > >> drat. fricking gmail HTML Basic mode is barely useable. hit send >> instead of save. grrr. >> >> ok. >> >> https://libre-soc.org/openpower/sv/branches/?updated >> >> i've added the example and created SVP64 hypothetical assembler. > > >ok, 3 issues: >1. CR fields set before a call and used after a call will not work, ah, i just noticed: you may have missed the significance of this: sv.crand CR80.v.SO, CR60.v.GT, CR80.v.LT # if = loop & pred_b f(CR80.v.SO) that's taking the *LT* field from the CRv for b, and ANDing it with the *GT* field for a, and storing it in *a completely separate* CR field (SO). thus whatever f() does there will be no impact. technically, EABI definitions are out of scope at the moment, i would like to get the ISA design right and focus on that, first. Vectors of CRs is not a concept that exists in EABI v2.0 so there is no existing EABI. at some point we have to define one... i would prefer that not to be right now (it is a massive task of its own) regarding overwrite and use of AA for alternative purposes, i realised after some thought that actually, combining the predicate mask with the Vector-Branch-CR test is not appropriate to do inside Branch itself. the example above illustrates why: CR80.v.SO has bits *cleared* where the predicate mask is cleared, and the behaviour of predicate masks in operations is to act on elements where the bits are *set*. altering Branch to cope with these inverted semantics, given its early-out capability, is completely inappropriate. l. From luke.leighton at gmail.com Wed Aug 4 18:45:19 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 04 Aug 2021 17:45:19 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: <121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> References:

<121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com>
Message-ID: <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com>

On August 4, 2021 1:14:40 PM UTC, lkcl wrote:

>regarding overwrite and use of AA for alternative purposes, i realised
>after some thought that actually, combining the predicate mask with the
>Vector-Branch-CR test is not appropriate to do inside Branch itself.

... but _is_ appropriate for svstep mode, to allow some situations where you want to know what a REMAP schedule might look like (and to obtain all the endpoints of all loops in one hit), yet in others you don't care, you just want to branch/loop.

i've therefore put re-purposing of AA as Rc back in, sigh.

implementations of this are going to be... tricky. although, just thinking about it: hypothetically, and just like LD/ST, it may still be possible to use the existing scalar v3.0B instruction, but "fake" what data it receives.

(for Vector LDST in ISACaller i actually changed the immediate D to contain D*srcstep and other modes. something similar might be possible with branches. have to see)

l.

From programmerjake at gmail.com Wed Aug 4 19:41:11 2021
From: programmerjake at gmail.com (Jacob Lifshay)
Date: Wed, 4 Aug 2021 11:41:11 -0700
Subject: [Libre-soc-dev] XDC2021
Message-ID:

Phoronix had an article about the XDC talks, I didn't see any Libre-SOC talks, were we going to submit any?

https://www.phoronix.com/scan.php?page=news_item&px=XDC-2021-Scheduler

Jacob

From luke.leighton at gmail.com Wed Aug 4 19:52:22 2021
From: luke.leighton at gmail.com (lkcl)
Date: Wed, 04 Aug 2021 18:52:22 +0000
Subject: [Libre-soc-dev] XDC2021
In-Reply-To:
References:
Message-ID: <14EC9F6E-988B-451A-B1E1-1D9CA683BAC2@gmail.com>

On August 4, 2021 6:41:11 PM UTC, Jacob Lifshay wrote:
>Phoronix had an article about the XDC talks, I didn't see any Libre-SOC
>talks, were we going to submit any?
i did, however the website said "submissions open" for several weeks, and only in small letters much further down did it contain the deadline for talk submissions.

they've updated their procedures and also included libre-soc-dev in template notifications.

l.

From luke.leighton at gmail.com Wed Aug 4 23:05:51 2021
From: luke.leighton at gmail.com (lkcl)
Date: Wed, 04 Aug 2021 22:05:51 +0000
Subject: [Libre-soc-dev] [llvm-dev] [RFC] Vector/SIMD ISA Context Abstraction
In-Reply-To:
References:

Message-ID:

On August 3, 2021 5:32:29 PM UTC, Luke Kenneth Casson Leighton wrote:
>(renato thank you for cc'ing, due to digest subscription at the moment)
>
>On Tue, Aug 3, 2021 at 3:25 PM Renato Golin wrote:
>> * For example, some reduction intrinsics were added to address
>bloat, but no target is forced to use them.
>
>excellent. iteration and reduction (including fixed schedule
>paralleliseable reduction) is one of the intrinsics being added to
>SVP64.

apologies to all for the follow-up, i realised i joined iteration and reduction together as if they were the same concept: they are not. Iterative Sum when carried out on add of a Vector containing all 1s results in a Pascal Triangle Vector output.

example of existing hardware that has actual Iteration instructions: Section 8.15 of SX-Aurora ISA guide, p8-297, the pseudocode for Iterative Add:

    for (i = 0 to VL-1) {
        Vx(i) ← Vy(i) + Vx(i-1), where Vx(-1)=Sy
    }

where if Vx and Vy are the same register you get the Pascal Triangle effect.

https://www.hpc.nec/documents/guide/pdfs/Aurora_ISA_guide.pdf

SVP64 does not have this *specifically* added: it is achieved incidentally by issuing an add where the src and dest registers differ by one (SVP64 sits on top of a rather large scalar regfile, 128 64 bit entries)

    sv.add r1, r1, r0

we did however need to add a "reverse gear" (for (i = VL-1 downto 0)) which was needed for ffmpeg's MP3 CODEC ironically to *avoid* the Pascal Triangle effect (and not need to copy a large batch of registers instead)

can anyone say if LLVM SVE happened to add Iteration?

l.

From luke.leighton at gmail.com Fri Aug 6 10:01:17 2021
From: luke.leighton at gmail.com (lkcl)
Date: Fri, 06 Aug 2021 09:01:17 +0000
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To: <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com>
References:

<121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com> Message-ID: <13162CF9-9E95-4ACC-80FD-B101602FC7F3@gmail.com> On August 4, 2021 5:45:19 PM UTC, lkcl wrote: >i've therefore put re-purposing of AA as Rc back.in, sigh. i just realised / remembered that there are some spare bits in the RM EXTRA2/3 area that can be used rather than make life hell (and create critical dependencies) in RM MODE. for other instructions including LD/ST the EXTRA2/3 area is often entirely taken up, so trying to reuse it for Mode bits is inappropriate. however Branches are so specific (only 2) that we *know*, from examining the register profile of Branches, that they will not use the high area of EXTRA2/3, or in fact the ELWIDTH area either (which may be a better choice, EXTRA2/3 is quite complex decoding, and adding extra MUXes into it is not something done lightly) i will rework things today. l. From luke.leighton at gmail.com Fri Aug 6 12:12:07 2021 From: luke.leighton at gmail.com (lkcl) Date: Fri, 06 Aug 2021 11:12:07 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: <13162CF9-9E95-4ACC-80FD-B101602FC7F3@gmail.com> References:

<121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com> <13162CF9-9E95-4ACC-80FD-B101602FC7F3@gmail.com>
Message-ID: <0E44672E-F2F2-4C03-8C43-60086A2C783D@gmail.com>

| 4 | 5 | 6 | 7 | 19 | 20 | 21  | 22 23   | description         |
| - | - | - | - | -- | -- | --- |---------|-------------------- |
|ALL|LRu| / | / | 0  | 0  | /   | SNZ sz  | normal mode         |
|ALL|LRu| / | / | 0  | 1  | VLI | SNZ sz  | VLSET mode          |
|ALL|LRu|BRc| / | 1  | 0  | /   | SNZ sz  | svstep mode         |
|ALL|LRu|BRc| / | 1  | 1  | VLI | SNZ sz  | svstep+VLSET mode   |

that's more like it. are there any other modes worth considering?

From luke.leighton at gmail.com Fri Aug 6 21:15:28 2021
From: luke.leighton at gmail.com (lkcl)
Date: Fri, 06 Aug 2021 20:15:28 +0000
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To: <0E44672E-F2F2-4C03-8C43-60086A2C783D@gmail.com>
References:

<121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com> <13162CF9-9E95-4ACC-80FD-B101602FC7F3@gmail.com> <0E44672E-F2F2-4C03-8C43-60086A2C783D@gmail.com> <27695B12-7CAA-4AB0-8FEB-EF0E0667AD22@gmail.com> <20210807212511.t52ojkjyrxkvwp52@topoi.pooq.com>
Message-ID:

i've started on the 24 bit RM decoder for BC, combining bits into 2 bit enums with only 3 entries in most cases, quite annoying that, but it is what it is.

svstep for example:

* disabled
* non Rc mode
* Rc mode

and VLSET:

* disabled
* set to VL
* set to VL-1

also with using both elwidth fields there now has to be a MUX on element widths, where the selector of that MUX is dependent on whether the operation is a Branch or not. hmmm. fortunately it is local i.e. not dependent on SVSTATE.

i nearly made the mistake of making Branch Conditional dependent on SVSTATE.VerticalFirst Mode, which would have serious adverse consequences for multi-issue decoding.

this is exactly the same reason why i said "Hard No" to the idea of making the decoder critically dependent on when SVSTATE.VL==0.

if we were designing something that was specifically intended for non-supercomputer non-multi-issue uses, adding critical dependencies between SVSTATE and the decoder would be perfectly fine.

l.

From richard.wilbur at gmail.com Sun Aug 8 18:20:08 2021
From: richard.wilbur at gmail.com (Richard Wilbur)
Date: Sun, 8 Aug 2021 10:20:08 -0700
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

> On Aug 8, 2021, at 06:32, lkcl wrote:
>
> i've started on the 24 bit RM decoder for BC, combining bits into 2 bit enums with only 3 entries in most cases, quite annoying that, but it is what it is.

Indeed, realizing that it is not as densely packed as if all the possibilities were used can be vexing, but it leaves room to accommodate one more option if we realize later that something could be an immense improvement with an additional mode.
[…] > i nearly made the mistake of making Branch Conditional dependent on SVSTATE.VerticalFirst Mode, which would have serious adverse consequences for multi-issue decoding. > > this is exactly the same reason why i said "Hard No" to the idea of making the decoder critically dependent on when SVSTATE.VL==0 > > if we were designing something that was specifically intended for non-supercomputer non-multi-issue uses, adding critical dependencies between SVSTATE and the decoder would be perfectly fine. So I’m envisioning the supercomputer multi-issue decoder loading something like a cache line at a time from memory/cache, starting the decode by determining instruction boundaries (left-to-right cascade, but pretty quick/simple to determine 32-bit or 64-bit), then parallel decode can start on each instruction up to dispatch when hazards from interactions with resources modified by previous instructions need to be taken into account. It is a very cool picture—even cooler because, to the extent they are used, the horizontal and vertical loop/vector modes will relieve a large amount of instruction cache and decoder activity! I suppose dispatch will need to depend on/have a hazard on-SVSTATE (at least VL?) in order to possibly parallelize some vector operations in an implementation-dependent fashion? It seems likely that if VL <= the number of ALU’s that the initial multiplications of a vector dot product could be dispatched in parallel. From lkcl at lkcl.net Sun Aug 8 20:11:29 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sun, 8 Aug 2021 20:11:29 +0100 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: References:

Message-ID: --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 On Sun, Aug 8, 2021 at 6:20 PM Richard Wilbur wrote: > > > > On Aug 8, 2021, at 06:32, lkcl wrote: > > > > i've started on the 24 bit RM decoder for BC, combining bits into 2 bit enums with only 3 entries in most cases, quite annoying that, but it is what it is. > > Indeed, realizing that it is not as densely packed as if all the possibilities were used can be vexing, but it leaves room to accommodate one more option if we realize later that something could be an immense improvement with an additional mode. as a last resort, yes. the complexity involved of first spotting those brownfield encodings then chaining on them, it gets... yeah. > > if we were designing something that was specifically intended for non-supercomputer non-multi-issue uses, adding critical dependencies between SVSTATE and the decoder would be perfectly fine. > > So I’m envisioning the supercomputer multi-issue decoder loading something > like a cache line at a time from memory/cache, yes, and pushing that into a queue. (often this is not a shift register, just an SRAM but where the address is what moves. exactly like how static-sized queues get implemented in software, with a pointer to head and pointer to tail, you move them on) > starting the decode by determining instruction boundaries (left-to-right cascade, > but pretty quick/simple to determine 32-bit or 64-bit), yes. jacob had a great idea there to use a standard carry-save-propagation algorithm. > then parallel decode can start on each instruction up to dispatch correct. > when hazards from interactions with resources modified by previous instructions need to be taken into account. this (dispatch) is where, if you have dependencies on SVSTATE (such as the VerticalFirst bit, or the idea of having VL==0 mean something completely different as far as what those 64-bits *actually* mean, it all goes to hell. 
one of the prior instructions in the current "batch" might *change* VL, or *change* to VerticalFirst Mode. now you want every one of those parallel decoders to be critically dependent on something that was in a previous slot?? oink. that's no longer a paralleliseable decoder, is it?

> It is a very cool picture - even cooler because, to the extent they are used,
> the horizontal and vertical loop/vector modes will relieve a large amount of
> instruction cache and decoder activity!

Vertical-First in "batch" mode - i.e. when the hardware has set the VF "Hint" to a value other than 1, yes. or, if, like in MyISA 66000 by Mitch Alsup, the hardware can determine through lookahead that it can parallelise a whole batch (automatically determine the number of elements in a loop that can be done entirely in parallel)

> I suppose dispatch will need to depend on/have a hazard on-SVSTATE
> (at least VL?)

yes. in Horizontal-First mode, definitely. the actual relationship between parallelly-decoded instructions and the issued elements-which-may-be-batched is *not* a linear one.

    decoder1  decoder2  decoder3  decoder4  decoder5  decoder6
    sv.add    sv.mul    setvli 5  sv.sub    ...
    VL=4      VL=4      VL=5      VL=5

the instructions that get issued will be:

    decoder1: QTY 4x ADDs
    decoder2: QTY 4x MULs
    decoder3: QTY 1x change of SVSTATE
    decoder4: QTY 5x SUBs

**ONLY** in the circumstance where all 4 ADDs may be passed straight through to **ONE** ALU in *ONE* clock cycle will it be possible to also consider some of the MUL operations.

in the case where that is not possible, let us assume e.g. that there are 8 potential issue slots, we may issue QTY4 ADDs to the first 4 slots and QTY4 MULs to the next 4.

... errr.... but we have 8-way multi-issue and 8-way parallel decode? errr what happened to the other 8 decoded instructions?

answer: the issue slots are all full, just from the first two instructions. the rest have to wait. this is not a bad thing per se, because execution has just been spammed and is 100% occupied.
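the slot-filling argument above can be played through with a very small Python model. this is a sketch under loud assumptions: `fill_issue_slots` is a name made up for this example, issue is strictly in order, each element operation takes exactly one slot, and all hazards (including the dependency of later instructions on the setvli) are ignored. it only shows why a handful of decoded Vector instructions can saturate the issue slots while the remaining decoders wait.

```python
def fill_issue_slots(decoded, slots_per_cycle):
    """Pack element operations into issue cycles, in program order.

    decoded: list of (name, element_count) pairs, one per decoder.
    Returns a list of cycles; each cycle is a list of (name, issued_count).
    Illustrative model only: no hazards, one slot per element operation.
    """
    cycles = []
    current, free = [], slots_per_cycle
    for name, count in decoded:
        remaining = count
        while remaining:
            if free == 0:
                # this cycle is full: everything else has to wait
                cycles.append(current)
                current, free = [], slots_per_cycle
            issued = min(remaining, free)
            current.append((name, issued))
            free -= issued
            remaining -= issued
    if current:
        cycles.append(current)
    return cycles


if __name__ == "__main__":
    batch = [("sv.add", 4), ("sv.mul", 4), ("setvli", 1), ("sv.sub", 5)]
    for n, cycle in enumerate(fill_issue_slots(batch, 8)):
        print(n, cycle)
```

with 8 slots the first cycle is entirely taken up by the 4 ADDs and 4 MULs from the first two decoders, and the setvli and SUBs only appear in the following cycle, which matches the description above.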
> in order to possibly parallelize some vector operations in an implementation-dependent fashion? It seems likely that if VL <= the number of ALU’s that the initial multiplications of a vector dot product could be dispatched in parallel. even if VL >= the number of ALUs, the multiplications can still be issued in parallel. it's just that the decoders sit there "zzzzz" and yet we're perfectly happy with that situation because back-end execution is 100% occupied. l. From vklr at vkten.in Mon Aug 9 01:29:04 2021 From: vklr at vkten.in (Veera) Date: Mon, 9 Aug 2021 05:59:04 +0530 Subject: [Libre-soc-dev] Status of Power ASIC Chip sent to TSMC Fab Message-ID: <20210809002902.GA1671@lily.local> Hi, What is the status of OPENPOWER ASIC Chip sent to TSMC 180nm Fab? Has it arrived to Libre-SOC team. Regards, Veera From whygee at f-cpu.org Mon Aug 9 01:31:12 2021 From: whygee at f-cpu.org (whygee at f-cpu.org) Date: Mon, 09 Aug 2021 02:31:12 +0200 Subject: [Libre-soc-dev] Status of Power ASIC Chip sent to TSMC Fab In-Reply-To: <20210809002902.GA1671@lily.local> References: <20210809002902.GA1671@lily.local> Message-ID: <2563e68ff2735262e36666a108670cf5@f-cpu.org> On 2021-08-09 02:29, Veera wrote: > Hi, > > What is the status of OPENPOWER ASIC Chip sent to TSMC 180nm Fab? > > Has it arrived to Libre-SOC team. I doubt : it's going to take many months... and we'd be the first to know ! Still, I'm looking forward to more news like everybody :-) > Regards, > Veera yg From luke.leighton at gmail.com Mon Aug 9 16:46:43 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 09 Aug 2021 15:46:43 +0000 Subject: [Libre-soc-dev] [llvm-dev] [RFC] Vector/SIMD ISA Context Abstraction In-Reply-To: References:

Message-ID: <402E58C0-81FF-4ED3-9E8A-14741C1665F4@gmail.com> again, apologies, a follow-up: i'd like to keep the conversation going (with everyone). a reminder / summary of the proposal: all basic *scalar* LLVM intrinsics extend with *optional* arguments that provide Vector / SIMD Augmentation Context. the benefit being that the number of intrinsics needed now and in the future in LLVM is dramatically reduced first, a clarification: Renato, you asked if the shuffle capability of LLVM SVE was sufficient: i replied slightly flippantly asking if shuffle-{any-arith-op} existed as a concept (apologies for that). SVP64 does not have shuffle-{any-arith-op} however being targetted at 3D and Video it does have Swizzle and a new concept: REMAP. Swizzle can be applied through prefixing to all source registers. it is well-known in the GPU world, especially how important it is, and does not need describing. REMAP is a completely new concept. an algorithmic "remapping" is applied to the normally sequentially-incrementing Vector Element indices. useful limited easy-to-implement "remappings" are being developed, such as Matrix Schedules (0 3 6 1 4 7 2 5 8) and RADIX-2 FFT/DCT Butterfly Schedules. normally Shuffle is limited to either memory operations or to register MV operations, and both are inherently supported by SVP64 through Vectorisation of base scalar operations: Indexed LD/ST for example. my point is that whilst SVP64 supports the "normal" expected type of Shuffle Operations expected of Vector ISAs (Vector-Indexed-LD, Indexed-Reg-MV) it also has GPU style Swizzle (a limited type of shuffle for short vectors up to length 4) and REMAP. thus, there is a case even for adding shuffle-augmentation to base LLVM intrinsics as optional arguments. the one that *is* much more general purpose but was not mentioned except in passing was VGATHER-VSCATTER. in all other Vector ISAs these are usually either memory-only or Reg-MV operations (or both). 
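the Matrix Schedule quoted above (0 3 6 1 4 7 2 5 8) can be reproduced with a short generator. this is a hedged sketch: `matrix_remap` and `remapped_copy` are names invented for this example, and the real REMAP hardware has further parameters (dimensions, offsets, inversion) that are not modelled here. the only claim is that a simple algorithmic index stream, substituted for the normal 0, 1, 2, ... element order, is enough to get a transposed walk over a row-major matrix without moving any data.

```python
def matrix_remap(rows, cols):
    """Yield element indices column by column for a row-major matrix."""
    for c in range(cols):
        for r in range(rows):
            yield r * cols + c


def remapped_copy(src, rows, cols):
    """Read `src` through the remapped index stream (illustrative only).

    The 'operation' here is a plain copy: the index stream does the work.
    """
    return [src[idx] for idx in matrix_remap(rows, cols)]


if __name__ == "__main__":
    print(list(matrix_remap(3, 3)))
    print(remapped_copy(list("abcdef"), 2, 3))
```

any element-wise operation could be substituted for the copy, which is what distinguishes this from a dedicated shuffle or MV instruction.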
it's usually done with Predicate Masks.

In SVP64, surprise: both VGATHER and VSCATTER are abstracted-out concepts that can apply to almost every operation. this is not possible to do all the time, but when *both* are applied (VGATHER to the source regs or memory, VSCATTER to the dest), we call that "Twin Predication".

thus, again, we would propose adding *both* a source predicate mask *and* destination predicate mask to base llvm intrinsics, as optional arguments.

the other concept is slightly odd: element-width overrides even on operations where the source registers are specified at a fixed width already. this one i am slightly uncertain about.

we have a Mode in SVP64 called "Saturate" which has sub-options Signed and Unsigned. the rules for this took us some time to derive: eventually we realised that the rule has to be that the arithmetic operation appears to take place at *infinite* precision, followed up by truncation to the min/max of the output bitwidth. all other definitions turned out to be problematic in some way (particularly for multiply or power).

what i am not certain about is whether it is perfectly sufficient to use standard base LLVM intrinsics, and count on source register type and return type as the SVP64 src width and dest width, and simply add optional arguments for signed/unsigned saturation.

however what is clear to me is that there is very little conceptual limit as to what can be added as optional arguments to base intrinsics. it would be up to ISA Maintainers to define what they can provide in hardware.

i would very much love to hear from other ISA Maintainers as to whether the ISA they are responsible for could benefit from this approach, both in the 3D GPU World as well as standard non-GPU: ARM SVE2, x86, AMDGPU, MIPS, ppc64, SX-Aurora, everyone.

SIMD ISAs would have an optional argument specifying the (fixed) length. Cray-style Scalar Vector ISAs would have an optional argument specifying that the length was variable.
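the Saturate rule described above (compute as if at infinite precision, then limit to the min/max of the output bitwidth) maps very directly onto Python, whose integers are arbitrary precision. this is a minimal sketch: `saturate` and `sat_op` are names chosen for this example and element-width overrides are reduced to a single `width` argument. it is meant to show why multiply is unproblematic under this definition: the full product exists before any clamping happens.

```python
import operator


def saturate(value, width, signed):
    """Clamp an exact integer result to the range of a `width`-bit output."""
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    return max(lo, min(hi, value))


def sat_op(op, a, b, width, signed):
    """Apply `op` exactly (Python ints do not overflow), then saturate."""
    return saturate(op(a, b), width, signed)


if __name__ == "__main__":
    print(sat_op(operator.mul, 100, 100, 8, signed=True))    # clamps high
    print(sat_op(operator.mul, -100, 100, 8, signed=True))   # clamps low
    print(sat_op(operator.sub, 5, 10, 8, signed=False))      # clamps at zero
    print(sat_op(operator.add, 200, 100, 8, signed=False))   # clamps high
```

a definition that saturated the inputs first, or wrapped the intermediate result, would give different answers for the multiply cases, which is one way to see why the "infinite precision first" rule was the one that survived.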
the invitation is therefore to see if this idea, of adding optional Vectorisation Context to base llvm intrinsics, has merit across the entire LLVM community, and, if it does, what would it look like? key question: what impact would a large number of optional arguments to LLVM base intrinsics have, on performance and memory consumption? would it be beneficial or adverse? i honestly have no idea. another question: if a given ISA does not provide a particular hardware feature (saturation let us say) then should this be declared in some fashion such that LLVM avoids emitting llvm.add(args, sat=signed) OR should the functionality be provided anyway by way of soft-passes behind the scenes? i.e. the lack of hardware saturation would result in IR being emitted that ultimately performed the saturation using multiple assembly operations. given that this latter approach would effectively imply that *all* LLVM IR backends "supported" SIMD and Vectorisation (emulated through IR passes for non-Vector non-SIMD hardware) it would need some serious thought. l. From lkcl at lkcl.net Mon Aug 9 19:09:39 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 9 Aug 2021 19:09:39 +0100 Subject: [Libre-soc-dev] libre-soc server cgroups In-Reply-To: References:

Message-ID: On Mon, Aug 2, 2021 at 9:12 AM Luke Kenneth Casson Leighton wrote: > https://contabo.com/en/vps/vps-s-ssd/?image=ubuntu.267&qty=1&contract=1 > > 4 cores, 8 GB RAM, 200 GB SSD for EUR 6, that's pretty damn good. > moving to a different VM however is quite a bit of hassle. i wonder if > i can get mythic-beasts to negotiate alternative pricing. Staf, i had a word with mythic-beasts, they explained the difference: contabo are likely to be using "ballooning" (meaning, they allocate more customers than there are actual resources). mythic-beasts *very deliberately* do not do that. also, their peering arrangement with other ISPs provides a guaranteed gigabit-ethernet-level service, and if you look at the bandwidth allocations the "level up" for contabo is a whopping EUR 300+ a month. bottom line is, if the libre-soc server gets hammered we can scale it up at not unreasonable prices, whereas contabo will hit limits much earlier and we'd be hit with disproportionately high bills to fix that. the team at mythic-beasts loved what we're doing, so they offered to upgrade us to 2-core 4GB RAM for 12 GBP/m inc VAT (actually 10 GBP but i don't think they were aware of the IPv4 address). i've now listed them as a hosting sponsor on the front page. l. From lkcl at lkcl.net Tue Aug 10 12:48:43 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Tue, 10 Aug 2021 12:48:43 +0100 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: References:

Message-ID: hm, i'm starting to implement the SVP64 Branch in ISACaller and ran immediately into an issue. unlike CTR decrementing, combining SVSTEP capability into Branch is actually very complicated: a lot of gates, a lot of state. question: when preparing the *next* SVSTATE (the next src/dest step), do you use the CR bit to test from the *CURRENT* src/dest step, or the NEXT src/dest step? what happens when REMAP is involved? how many gates in the chain are there before you can even determine if the branch should go ahead? CTR you can just subtract one. much as i would like an svstep mode in sv.bc, it's too CISC. annoying. l. From luke.leighton at gmail.com Tue Aug 10 13:47:24 2021 From: luke.leighton at gmail.com (lkcl) Date: Tue, 10 Aug 2021 12:47:24 +0000 Subject: [Libre-soc-dev] NLnet cryotoprimitives grant approved In-Reply-To: References: Message-ID: with many thanks to NLnet, the EUR 50,000 grant to research and develop Draft cryptographic primitives and instructions to the newly-open Power ISA has been approved. unlike RISC-V where full transparency and trust is problematic and there are many participants whose interests may not necessarily align, the OpenPOWER initiative, which has been in careful planning for nearly 10 years, is a much less crowded space and, crucially, does not require non-transparent membership of OPF in order to submit ISA RFCs (Requests for Change) [non-OPF members cannot participate in actual ISA WG meetings and certainly cannot vote on RFCs, but they can at least submit them. whereas whilst the RISC-V Foundation's Commercial Confidence Requirements are perfectly reasonable, the blanket secrecy even for submitting RFCs is not] we at Libre-SOC aim to use this process, based on taking apart key strategic cryptographic algorithms back to their mathematical roots, then applying Vector ISA design analysis and seeing what can be created. 
examples include going back to the fundamental basis of Rijndael, and instead of creating hardcoded custom silicon for MixColumns as is the "normal" practice, adding a generic Galois Field ALU and a generic Matrix Multiply system. another is to design instructions suitable for "big integer math" this in turn means that the resultant ISA would be ideally suited to the experimental development of future cryptographic algorithms for use in securing wallets and other purposes related to blockchain management. [as bitcoin stands we cannot possibly hope to compete with custom silicon dedicated to SHA hash production, however we would very much like to see a future version of bitcoin that uses far less power yet retains its high strategic value, and, at the same time, like e.g. monero RandomX, is better suited to a general-purpose Vector Supercomputer ISA, which is what we are developing] OpenPOWER's commitment to a transparent RFC process allows us to do that without compromising trust: no discussions that we participate in will ever be behind closed doors. if anyone would be interested to participate or collaborate on this, we have funding available, and welcome involvement in designing and testing an ISA suitable for securing bitcoin for end-users in a fully transparent fashion. l. From lkcl at lkcl.net Tue Aug 10 13:54:55 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Tue, 10 Aug 2021 13:54:55 +0100 Subject: [Libre-soc-dev] Fwd: NLnet cryotoprimitives grant approved In-Reply-To: References:

Message-ID: with many thanks to NLnet, the EUR 50,000 grant to research and develop Draft cryptographic primitives and instructions to the newly-open Power ISA has been approved. unlike RISC-V where full transparency and trust is problematic and there are many participants whose interests may not necessarily align, the OpenPOWER initiative, which has been in careful planning for nearly 10 years, is a much less crowded space and, crucially, does not require non-transparent membership of OPF in order to submit ISA RFCs (Requests for Change) [non-OPF members cannot participate in actual ISA WG meetings and certainly cannot vote on RFCs, but they can at least submit them. whereas whilst the RISC-V Foundation's Commercial Confidence Requirements are perfectly reasonable, the blanket secrecy even for submitting RFCs is not] we at Libre-SOC aim to use this process, based on taking apart key strategic cryptographic algorithms back to their mathematical roots, then applying Vector ISA design analysis and seeing what can be created. examples include going back to the fundamental basis of Rijndael, and instead of creating hardcoded custom silicon for MixColumns as is the "normal" practice, adding a generic Galois Field ALU and a generic Matrix Multiply system. another is to design instructions suitable for "big integer math" this in turn means that the resultant ISA would be ideally suited to the experimental development of future cryptographic algorithms for use in securing wallets and other purposes related to blockchain management. [as bitcoin stands we cannot possibly hope to compete with custom silicon dedicated to SHA hash production, however we would very much like to see a future version of bitcoin that uses far less power yet retains its high strategic value, and, at the same time, like e.g. 
monero RandomX, is better suited to a general-purpose Vector Supercomputer ISA, which is what we are developing] OpenPOWER's commitment to a transparent RFC process allows us to do that without compromising trust: no discussions that we participate in will ever be behind closed doors. if anyone would be interested to participate or collaborate on this, we have funding available, and welcome involvement in designing and testing an ISA suitable for securing bitcoin for end-users in a fully transparent fashion. l. --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 From programmerjake at gmail.com Tue Aug 10 18:10:25 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Tue, 10 Aug 2021 10:10:25 -0700 Subject: [Libre-soc-dev] Fwd: NLnet cryotoprimitives grant approved In-Reply-To: References:

Message-ID: On Tue, Aug 10, 2021, 05:55 Luke Kenneth Casson Leighton wrote: > with many thanks to NLnet, the EUR 50,000 grant to research and > develop Draft cryptographic primitives and instructions to the > newly-open Power ISA has been approved. Yay! I told Phoronix since I think they would deem this sufficiently newsworthy. Jacob From lkcl at lkcl.net Tue Aug 10 18:25:10 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Tue, 10 Aug 2021 18:25:10 +0100 Subject: [Libre-soc-dev] Fwd: NLnet cryotoprimitives grant approved In-Reply-To: References:

Message-ID: On Tue, Aug 10, 2021 at 6:10 PM Jacob Lifshay wrote: > Yay! > I told Phoronix since I think they would deem this sufficiently newsworthy. ah yeah that's a good idea. primarily i wanted to see if there's anyone in the bitcoin community interested in this. there was a company i'd been speaking to who wanted to do something based on RISC-V. rather sheepishly i had to explain to them the conflict between "making things transparent and public" and "the way ISA mods are done in RISC-V". they believed that they could do the modifications as a custom extension, which they probably can... except they end up being the permanent maintainers of a hard fork of gcc, llvm, binutils, u-boot, linux kernel, libc6 and so on. oops. l. From programmerjake at gmail.com Wed Aug 11 13:14:47 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Wed, 11 Aug 2021 05:14:47 -0700 Subject: [Libre-soc-dev] Fwd: NLnet cryotoprimitives grant approved In-Reply-To: References:

Message-ID: On Tue, Aug 10, 2021, 10:32 Luke Kenneth Casson Leighton wrote: > On Tue, Aug 10, 2021 at 6:10 PM Jacob Lifshay > wrote: > > > Yay! > > I told Phoronix since I think they would deem this sufficiently > newsworthy. > > ah yeah that's a good idea. > https://www.phoronix.com/scan.php?page=news_item&px=Libre-SoC-Crypto-Project Jacob Lifshay > From luke.leighton at gmail.com Thu Aug 12 13:21:44 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 12 Aug 2021 12:21:44 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing Message-ID: since adding Vertical-First Mode, which is very cool, a lot simpler to add into compilers, and closer to Mitch Alsup's MyISA 66000 Virtual Vectors, the implications have taken some time to sink in. VF Mode does *not* increment srcstep/dststep automatically on running an instruction: srcstep/dststep *remain where they are*. an explicit instruction, svstep, is called to increment src/dststep, then a branch-conditional test of whether VL has been reached, loop back on a BATCH of instructions to do the next element(s). the next logical evolution on that is: do you allow just the one element per instruction to be executed? or do you allow up to a certain explicit set limit? in Mitch Alsup's MyISA 66000 it is entirely up to the hardware to determine and decide that "batch size". the idea being: for very simple hardware, the batch size (number of elements executed per instruction) is definitely one. this means that the VVM Loop is basically very similar to Power ISA Branch CTR automatic decrementing. this is also the "fallback" position for complex hardware if it cannot determine it can do multiple elements safely. more complex hardware in MyISA 66000 can use OoO in-flight buffers. 
the caveat: the VVM loop has to be short enough that the engine can analyse the entire loop (a couple of cache lines), and determine that even memory accesses inside the loop are "safe", and thus determine the element batch size, which, obviously, has to be fixed for the ENTIRE loop. (it's no good executing 3 elements of the vector for the first instruction then doing 5 for the next, you are guaranteed data corruption that way) the limitations: you can't do branches inside the loop, you can't call functions, and the only way to get Vectors per se is to use memory LD/STs. for most situations this is perfectly fine, for us it's not. also, critically relying on an OoO engine to determine the batch size, i am not happy with that. so the initial idea is, to have a "Batch Hint" size, very similar to VL. the compiler informs the hardware "you can safely do up to this many elements per instruction, please tell me exactly how many you CAN do". ironically you should recognise that as the EXACT same rules for Cray Vectors setvl! here's where it gets complicated, given how far along we are. i initially thought, "we need a new hint SPR, like VL and MAXVL, called VFHintLen". this hint would be completely separate from VL and MVL, still within the limits of VL and MVL. VFHintLen <= VL <= MVL and you execute batches of length VFHintLen until hitting VL however what i have just come to realise is: actually, VFHintLen is redundant.... *if VL is made to do its job*. 
in Horizontal-First Mode we have:

* MVL set to max reservation (statically determined by compiler)
* VL set dynamically at runtime to explicit value
* loops go from 0 to VL-1

in VF Mode currently it is:

* MVL set to max reservation (statically determined by compiler)
* VL set dynamically at runtime to explicit value
* VFHint *requested* but is set to hw limit
* VFHint elements are run in batches limited by VL

example, MVL=12, VL=10, VFH=3

* first time round a loop elements 0 1 2 are executed in parallel
* svstep called, src/dststep incremented by VFHint (3)
* second loop elements 3 4 5 executed in parallel
* svstep called, src/dst incremented to 6
* third loop elements 6 7 8 executed in parallel
* svstep called, src/dst incremented to 9
* fourth loop ONLY element 9 executed because VL=10
* svstep sets CR0 to 1 to indicate "src/dst exceeds VL"
* Branch-Conditional fails, loop is exited

notice how MVL was wasted, there?

what i *believe* we may be able to do is: do without VFHint and use *VL and MVL instead*. example, in Vertical-First mode:

* MVL would be set to 10 (as an immediate)
* VL would be *requested* to be set to a given dynamic value, but would be set to a value that HARDWARE determines it can cope with
* proceed same as above but src/dst step test against **VL** not VFHint, and
* svstep tests a limit against **MVL** not VL.

basically all testing of the limit of src/dststep right now is:

    if srcstep < VL
        srcstep increments

i propose this change to:

    if HorizontalFirst
        if srcstep < VL
            srcstep increments
    else if VerticalFirst
        if srcstep < *MAXVL*
            srcstep increments

questions, comments?

l.
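to make the batch walk-through above (MVL=12, VL=10, VFH=3) easier to check by hand, here is a small Python model of it. it is only a sketch: `vertical_first_batches`, `limit` and `batch` are illustrative names and not anything in ISACaller, and the model simply clamps the last pass rather than setting CR0.

```python
def vertical_first_batches(limit, batch):
    """Return the element indices executed on each pass of the loop.

    limit -- where the loop stops (VL today, MAXVL under the proposal)
    batch -- elements per instruction (VFHint today, VL under the proposal)
    """
    passes = []
    srcstep = 0
    while srcstep < limit:
        end = min(srcstep + batch, limit)        # last pass may be short
        passes.append(list(range(srcstep, end)))
        srcstep = end                            # models what svstep does
    return passes

# current scheme: VL=10 is the limit, VFHint=3 is the batch size
print(vertical_first_batches(limit=10, batch=3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

under the proposed scheme the same call describes MAXVL=10 as the limit with a hardware-chosen VL=3 as the batch, which is the argument for VFHint being redundant: the two schemes need the same two numbers, just held in different places.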
From richard.wilbur at gmail.com Thu Aug 12 22:37:16 2021 From: richard.wilbur at gmail.com (Richard Wilbur) Date: Thu, 12 Aug 2021 15:37:16 -0600 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing Message-ID: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> > On Aug 12, 2021, at 06:23, lkcl wrote: > > since adding Vertical-First Mode, which is very cool, a lot simpler to add into compilers, and closer to Mitch Alsup's MyISA 66000 Virtual Vectors, the implications have taken some time to sink in. Very cool indeed. Sounds like Mitch Alsup’s MyISA 66000 design would be very interesting reading. Is there public documentation? It is interesting to me how reminiscent this is of my proposal back in 1988-1990 of a massively serial machine that would decode a section of code and configure connections between functional units and data dependencies. Then it would go run the code limited only by the timing of data availability. > VF Mode does *not* increment srcstep/dststep automatically on running an instruction: srcstep/dststep *remain where they are*. an explicit instruction, svstep, is called to increment src/dststep, then a branch-conditional test of whether VL has been reached, loop back on a BATCH of instructions to do the next element(s). What if svstep was a state associated with the branch instruction in the Finite State Machine implementing Vertical-First Mode instead of requiring a separate op code, cache space, and a decode slot? Is svstep used outside of the Vertical-First Mode context? […] > i propose this change to: > > if HorizontalFirst > if srcstep < VL > srstsep increments > else if VerticalFirst > if srcstep < *MAXVL* > srcstep increments > > questions, comments? Sounds like a good thing. 
From luke.leighton at gmail.com Thu Aug 12 23:14:48 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 12 Aug 2021 22:14:48 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing In-Reply-To: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> References: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> Message-ID:

On August 12, 2021 9:37:16 PM UTC, Richard Wilbur wrote:

>Very cool indeed. Sounds like Mitch Alsup’s MyISA 66000 design would
>be very interesting reading. Is there public documentation?

ah no. you can however email Mitch (check comp.arch newsgroup) and request it.

>
>It is interesting to me how reminiscent this is of my proposal back in
>1988-1990 of a massively serial machine that would decode a section of
>code and configure connections between functional units and data
>dependencies. Then it would go run the code limited only by the timing
>of data availability.

intriguing

>What if svstep was a state associated with the branch instruction in
>the Finite State Machine implementing Vertical-First Mode instead of
>requiring a separate op code, cache space, and a decode slot? Is
>svstep used outside of the Vertical-First Mode context?

yes it is. sv.step (a Horizontal version of single-step) can be used to obtain a Vector of Condition Registers, where each CR Field contains whether a given src step is part of a "loop end condition".

let us say that VL=4, you call sv.step. (Rc=1) the result will be that

    CR0=0 CR1=0 CR2=0 CR3=1

because VL=4, and the end condition of the loop 0..VL-1 terminates at CR3, CR3 gets a "1".

it gets more complex when REMAP is involved: there you can extract the end-points of the inner, middle *and* outer REMAP loop end-conditions. e.g. if you use MATRIX remap, a 2x2 matrix:

    CR0=0b00 CR1=0b01 CR2=0b10 CR3=0b11

>[…]
>> i propose this change to:
>>
>> if HorizontalFirst
>>     if srcstep < VL
>>         srcstep increments
>> else if VerticalFirst
>>     if srcstep < *MAXVL*
>>         srcstep increments
>>
>> questions, comments?
>
>Sounds like a good thing.

my only concern is, should MVL be restricted to an immediate (for VFirst mode) or should it be allowed to be set via a register (RA).

whilst the logic behind making MVL compile-time static for Horizontal Mode is obvious, i haven't got my head round Vertical Mode yet.

l.

From luke.leighton at gmail.com Fri Aug 13 00:22:56 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 12 Aug 2021 23:22:56 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing In-Reply-To: References: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> Message-ID: <056BE990-F94D-4E50-B9AF-D5769A9B25E8@gmail.com>

On August 12, 2021 10:14:48 PM UTC, lkcl wrote:
>On August 12, 2021 9:37:16 PM UTC, Richard Wilbur
> wrote:
>>What if svstep was a state associated with the branch instruction in
>>the Finite State Machine implementing Vertical-First Mode instead of
>>requiring a separate op code, cache space, and a decode slot?

forgot to say, the svstep instruction has a lot more options than sv.bc, and there are not enough bits available spare in 24 bit RM.

also the number of registers that go in and out of bc is already really high:

in:

* SVSTATE
* CIA
* LR
* CTR
* CR

out:

* SVSTATE
* NIA
* CTR
* LR

that's a hell of a lot of registers. an svstep variant of bc would also need to write to CR. that's *ten* registers, 5 read, 5 write, i don't think any other instruction in the whole of Power ISA has anywhere near that many.

l.
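as an aside on the sv.step example a little further up this thread (VL=4 giving CR0..CR3 = 0 0 0 1, and the 2x2 MATRIX case giving 0b00 0b01 0b10 0b11): a short Python sketch that reproduces those two results. the function names are invented, and the bit assignment in the matrix case (bit 0 = inner loop at its last index, bit 1 = outer loop at its last index) is an assumption chosen only because it matches the 2x2 numbers quoted; the actual encoding is whatever the spec says.

```python
def loop_end_flags(vl):
    """One flag per element step: 1 only on the final step of 0..VL-1."""
    return [1 if step == vl - 1 else 0 for step in range(vl)]

def matrix_end_flags(xdim, ydim):
    """Per-step flags for a 2D schedule (assumed bit layout, see lead-in).

    bit 0 -- inner (x) loop is at its last index
    bit 1 -- outer (y) loop is at its last index
    """
    flags = []
    for y in range(ydim):
        for x in range(xdim):
            inner_end = int(x == xdim - 1)
            outer_end = int(y == ydim - 1)
            flags.append((outer_end << 1) | inner_end)
    return flags

print(loop_end_flags(4))                          # [0, 0, 0, 1]
print([bin(f) for f in matrix_end_flags(2, 2)])   # ['0b0', '0b1', '0b10', '0b11']
```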
From luke.leighton at gmail.com Fri Aug 13 16:14:43 2021 From: luke.leighton at gmail.com (lkcl) Date: Fri, 13 Aug 2021 15:14:43 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing In-Reply-To: References: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> Message-ID: <52A9B46C-43E4-4625-AC6C-C58B3E62FA21@gmail.com>

On August 12, 2021 10:14:48 PM UTC, lkcl wrote:
>
>
>On August 12, 2021 9:37:16 PM UTC, Richard Wilbur
> wrote:
>>> i propose this change to:
>>>
>>> if HorizontalFirst
>>>     if srcstep < VL
>>>         srcstep increments
>>> else if VerticalFirst
>>>     if srcstep < *MAXVL*
>>>         srcstep increments
>>>
>>> questions, comments?
>>
>>Sounds like a good thing.
>
>my only concern is, should MVL be restricted to an immediate (for
>VFirst mode) or should it be allowed to be set via a register (RA).
>
>whilst the logic behind making MVL compile-time static for Horizontal
>Mode is obvious, i haven't got my head round Vertical Mode yet.

Horizontal-First, you perform these types of loops:

        setmaxvli 8
loop:
        setvl r5, r3               # VL=r5=MIN(MVL, r3)
        sv.ld r20.v, r4(0)         # load VL elements (max 8)
        sv.addi r20.v, r20.v, 55   # add 55 to all vector elements
        sv.st r20.v, r4(0)         # store VL elements
        add r4, r4, r5             # move r4 pointer forward
        sub. r3, r3, r5            # decrement total count by VL
        bnz loop

this will always do 8 elements at a time until r3 drops below 8.

VerticalFirst you insert a *second inner loop* with an svstep instruction just before the bnz but also, at the moment, rather than just setmaxvli 8 it is:

        setmaxvvlandvfhint 8, 2    # MVL=8, VFHint=2

if the hardware *chooses* to set VFHint=2, then we will always have 2 elements at a time in the inner loop, until srcstep reaches VL

        setmaxvvlandvfhint 8, 2    # MVL=8, VFHint=2
loop:
        setvl r5, r3               # VL=r5=MIN(MVL, r3)
loopinner:
        sv.ld r20.v, r4(0)         # load VFHint elements (max 2)
        sv.addi r20.v, r20.v, 55   # add 55 to 2 elements
        sv.st r20.v, r4(0)         # store VFHint elements
        svstep.                    # srcstep += VFHint
        bnz loopinner              # repeat until srcstep=VL
        # now done VL elements, move to next batch
        add r4, r4, r5             # move r4 pointer forward
        sub. r3, r3, r5            # decrement total count by VL
        bnz loop

the question is, then: can we get rid of the inner loop? and if we do can anything useful be done?

i have a feeling, looking at this assembler, that VFHint genuinely serves a different purpose *in addition* to VL and MAXVL.

(btw aside: svstep+bnz was why i wanted a step-and-test branch conditional instruction but it's too CISC)

l.

From madan.kartheessan at gmail.com Fri Aug 13 18:17:12 2021 From: madan.kartheessan at gmail.com (Madan Kartheessan) Date: Fri, 13 Aug 2021 22:47:12 +0530 Subject: [Libre-soc-dev] General Introduction Message-ID:

Hello all: Good evening, I am from Chennai (formerly, Madras) , India. I work for Object Automation Software Solutions Private Limited. My title is Techno Project Manager. I also teach Python, its libraries like Pandas, Numpy, Scikit-learn, etc. and Machine Learning algorithms. I am happy to join the libre-soc-dev list. Happy weekend. Regards Madan K.

From lkcl at lkcl.net Fri Aug 13 18:21:34 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Fri, 13 Aug 2021 18:21:34 +0100 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID:

(ccing you, Madan, as you are not yet subscribed)

On Fri, Aug 13, 2021 at 6:17 PM Madan Kartheessan < madan.kartheessan at gmail.com> wrote:

> Hello all:
> Good evening, I am from Chennai (formerly, Madras) , India. I work for
> Object Automation Software Solutions Private Limited. My title is Techno
> Project Manager. I also teach Python, its libraries like Pandas, Numpy,
> Scikit-learn, etc. and Machine Learning algorithms.
>

fantastic, great to hear from you Madan. that's very interesting to hear that you have extensive knowledge of numpy.

I am happy to join the libre-soc-dev list.
>

ok so here, at this page, fill in your email address, name (if you want) and you can leave the password field blank, one will be created and emailed to you:

http://lists.libre-soc.org/mailman/listinfo/libre-soc-dev

i approved this message you sent to the list, it is better if you subscribe yourself, then you can receive all the messages sent to the list.

best,

l.

From umbertocerrato at outlook.it Fri Aug 13 18:46:13 2021 From: umbertocerrato at outlook.it (Umberto Cerrato) Date: Fri, 13 Aug 2021 17:46:13 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID:

Hello, Welcome

From luke.leighton at gmail.com Fri Aug 13 21:48:23 2021 From: luke.leighton at gmail.com (lkcl) Date: Fri, 13 Aug 2021 20:48:23 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing In-Reply-To: <52A9B46C-43E4-4625-AC6C-C58B3E62FA21@gmail.com> References: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> <52A9B46C-43E4-4625-AC6C-C58B3E62FA21@gmail.com> Message-ID: <8F813409-2548-446F-B4E6-E44429A1142B@gmail.com>

On August 13, 2021 3:14:43 PM UTC, lkcl wrote:

>Horizontal-First, you perform these types of loops:
>
>        setmaxvli 8
>loop:
>        setvl r5, r3               # VL=r5=MIN(MVL, r3)
>        sv.ld r20.v, r4(0)         # load VL elements (max 8)
>        sv.addi r20.v, r20.v, 55   # add 55 to all vector elements
>        sv.st r20.v, r4(0)         # store VL elements
>        add r4, r4, r5             # move r4 pointer forward
>        sub. r3, r3, r5            # decrement total count by VL
>        bnz loop

oo, oo, i just had an idea.

        setvlc r5                  # VL=r5=MIN(MVL, CTR)
        ...
        ...
        add r4, r4, r5
        sv.bnz/VLCTR               # subtracts VL from CTR

SVSTATE is *already* going into sv.bc so it is not a hardship to subtract VL from CTR. this reduces critical inner loops by one instruction and frees up a GPR. using CTR for loops is normal in Power ISA anyway.

doesn't help with VFHint though.
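for anyone following along without the assembler in their head, the Horizontal-First loop quoted above is ordinary Cray-style strip-mining, and can be modelled in a few lines of Python. this is a sketch under assumptions: `strip_mine` and its arguments are illustrative names, setvl is taken to mean VL = min(MVL, remaining), and unlike the assembler (which is a do-while) this version simply does nothing when the count starts at zero.

```python
def strip_mine(data, mvl=8, addend=55):
    """Model of the setvl loop: process `data` in chunks of at most MVL.

    Returns the updated list plus the VL chosen on each trip round the loop.
    """
    out = list(data)
    ptr = 0                  # plays the role of r4 (element pointer)
    remaining = len(out)     # plays the role of r3 (or CTR in the setvlc idea)
    vls = []
    while remaining > 0:
        vl = min(mvl, remaining)       # setvl: never more than MVL
        for i in range(vl):            # sv.ld / sv.addi / sv.st over VL elements
            out[ptr + i] += addend
        ptr += vl                      # add r4, r4, r5
        remaining -= vl                # sub. r3, r3, r5
        vls.append(vl)
    return out, vls

result, vls = strip_mine(range(20))
print(vls)          # [8, 8, 4]
print(result[:3])   # [55, 56, 57]
```

the setvlc / sv.bnz/VLCTR idea corresponds to `remaining` living in CTR, with the final subtraction folded into the branch.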
l From lkcl at lkcl.net Sat Aug 14 22:57:55 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sat, 14 Aug 2021 22:57:55 +0100 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Fri, Aug 13, 2021 at 6:21 PM Luke Kenneth Casson Leighton wrote: > (ccing you, Madan, as you are not yet subscribed) > Madan, i checked the list mailing list membership, and you are subscribed... .... but you sent the intro from a *completely different* email address (one that you have *not* subscribed with - the gmail account) this was why i received a moderation request for a post from a non-member. there are a couple of solutions here: 1) subscribe the *gmail* account as well and set "nomail" (otherwise you receive 2 copies of the list posts) 2) set up "send as a 2nd email address" on gmail, *and remember to use it*. honestly, i just do (1) because sometimes i forget. what you wrote as an introduction is perfect to put on the team page http://libre-soc.org/about_us best, l. From programmerjake at gmail.com Sun Aug 15 06:46:01 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Sat, 14 Aug 2021 22:46:01 -0700 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Fri, Aug 13, 2021, 10:18 Madan Kartheessan wrote: > Hello all: Welcome! Always glad to have more people interested in Libre-SOC! Jacob Lifshay From lkcl at lkcl.net Sun Aug 15 17:24:31 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sun, 15 Aug 2021 17:24:31 +0100 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: References:

Message-ID:

ok so there are now two initial unit tests, "sv.bc" and "sv.bc/all".

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_bc.py;h=5378040085995813070ccaa9cbe28a1add9a5e81;hb=c3b9973df8edcb1f6c1583c2da693336af7d1921#l80

sv.bc/all makes a bit of a mess of the pseudocode, it's a Finite State Machine where ISACaller is calling the sv.bc operation once per element. in the case of sv.bc/all it is necessary to branch *ONLY* when *ALL* tests are successful... but the tests actually need to be done.

normal bc pseudocode:

    if cond_ok then branch

sv.bc pseudocode:

    if cond_ok and last element in VL loop:
        branch

it's more complex than that, though.

    if NOT last element and cond NOT ok
        terminate entire VL loop with early-out

non-ALL mode (ANY mode) is more straightforward, but again, on branch, you must not continue to do further tests! so branch definitely terminates the VL loop...

... but in Vertical-First Mode it's a completely different story, much more like a standard scalar branch.

there's an awful lot going on, quite fascinating.

l.

From lkcl at lkcl.net Sun Aug 15 19:10:59 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sun, 15 Aug 2021 19:10:59 +0100 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID:

On Sat, Aug 14, 2021 at 10:57 PM Luke Kenneth Casson Leighton wrote:

> Madan,

https://libre-soc.org/irclog/%23libre-soc.2021-08-15.log.html#t2021-08-15T17:18:09

great! that looks like a successful join of the #libre-soc IRC channel. normally, one would leave the IRC client running in order for people to notice, and respond, and say hello (hold a conversation). this is why i said that you should leave the IRC client running 24x7, or use "bnc4you" or other IRC proxy. if you log on, ask a question, then leave immediately, how can you receive the answer?
it is like going into a room where people are having conversations, making an announcement, then walking out without waiting for anyone to turn around :) IRC conversations can often take 36 hours round-trip because people are in different timezones. you need to adjust expectations accordingly. l. From lkcl at lkcl.net Mon Aug 16 14:09:42 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 16 Aug 2021 14:09:42 +0100 Subject: [Libre-soc-dev] OA minutes 2021 aug 10 In-Reply-To: References: Message-ID: Madan, hi, i saw the update to the minutes, which is great, moving the contents that had been added to a page dedicated to another set of minutes (with a different date) https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=1fa1211c74a3b3dc4878bfdf7c479a79ac7b6a47 normal practice would be to then do a follow-up email informing everyone that that action has been taken. now that everyone is subscribed to the mailing list, you can use the mailing list for that purpose: "This is Madan: I have updated the minutes of the meeting that took place on Aug 10th, please can everyone read and review them" so, to illustrate by example, here is a summary of what i have done: 1) made some whitespace and formatting corrections to both minutes pages 2) created a new (template) page for the meeting tomorrow, https://libre-soc.org/oa/minutes/2021aug17/ given that you (and you alone) have sent an email to the list, and also joined (briefly) the IRC channel, we will go ahead with tomorrow's meeting, and we will allocate some time for you to explain to everyone else what you did and how you did it. i do expect everyone in the team to complete these tasks promptly, they are extremely basic, fundamental, and absolutely critical. projects are not about the code, they're about the communication. l. 
From niranjan at object-automation.com Mon Aug 16 16:57:47 2021 From: niranjan at object-automation.com (niranjan at object-automation.com) Date: Mon, 16 Aug 2021 11:57:47 -0400 Subject: [Libre-soc-dev] General Introduction Message-ID: <60c1dcfef7436bb25aaa577d5741d081.squirrel@email.powweb.com> Hello all, I am Niranjan, from Kerala, India. I am a third year B.Tech student at Indian Institute of Technology Madras (IITM), doing a project at Object Automation Software Solutions Pvt Ltd. I'm happy to join the libre-soc-dev mailing list. Thanks and regards, Niranjan J Nair From niranjan at object-automation.com Mon Aug 16 16:57:48 2021 From: niranjan at object-automation.com (niranjan at object-automation.com) Date: Mon, 16 Aug 2021 11:57:48 -0400 Subject: [Libre-soc-dev] General Introduction Message-ID: Hello all, I am Niranjan, from Kerala, India. I am a third year B.Tech student at Indian Institute of Technology Madras (IITM), doing a project at Object Automation Software Solutions Pvt Ltd. I'm happy to join the libre-soc-dev mailing list. Thanks and regards, Niranjan J Nair From luke.leighton at gmail.com Mon Aug 16 17:19:52 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 16:19:52 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> On August 16, 2021 3:57:48 PM UTC, niranjan at object-automation.com wrote: >Hello all, > >I am Niranjan, from Kerala, India. I am a third year B.Tech student at >Indian Institute of Technology Madras (IITM), doing a project at Object >Automation Software Solutions Pvt Ltd. >I'm happy to join the libre-soc-dev mailing list. fantastic, great to hear from you, welcome. have you reviewed the Charter and are you happy to abide by it? http://libre-soc.org/charter any questions about it feel free to ask. also do edit the wiki page and add yourself to http://libre-soc.org/about_us. best, l. 
From madan.kartheessan at gmail.com Mon Aug 16 17:17:04 2021 From: madan.kartheessan at gmail.com (Madan Kartheessan) Date: Mon, 16 Aug 2021 21:47:04 +0530 Subject: [Libre-soc-dev] Issues with madan.kartheessan@gmail.com Message-ID: Hi: I have subscribed using both of my email IDs to the libre-soc-dev list. 1) madan at object-automation.com 2) madan.kartheessan at gmail.com I am sending this mail using madan at object-automation.com. But, I am not able to see *"madan.kartheessan at gmail.com "* listed under the subscribers list of libre-soc-dev. I am able to see "madan at object-automation.com" When I try to login using madan.kartheessan at gmail.com and give the password , I get the message "*Libre-soc-dev roster authentication failed."* Regards Madan K. From luke.leighton at gmail.com Mon Aug 16 17:21:27 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 16:21:27 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <60c1dcfef7436bb25aaa577d5741d081.squirrel@email.powweb.com> References: <60c1dcfef7436bb25aaa577d5741d081.squirrel@email.powweb.com> Message-ID: <986A9949-B7FE-4BD8-AD70-5696F21564FE@gmail.com> On August 16, 2021 3:57:47 PM UTC, niranjan at object-automation.com wrote: >Hello all, > >I am Niranjan, from Kerala, India. I am a third year B.Tech student at >Indian Institute of Technology Madras (IITM), doing a project at Object >Automation Software Solutions Pvt Ltd. >I'm happy to join the libre-soc-dev mailing list fantastic, great to hear from you as well, Niranjan. next steps are to review the Charter and reply if you are happy to abide by it or if you have any questions. best, l. From gautham at object-automation.com Mon Aug 16 17:31:34 2021 From: gautham at object-automation.com (gautham at object-automation.com) Date: Mon, 16 Aug 2021 12:31:34 -0400 Subject: [Libre-soc-dev] General Introduction Message-ID: Hello Everyone! I am Gautham, from Kerala, India. 
I am a pre-final year student at the Department of Electrical Engineering, Indian Institute of Technology Madras. I have some experience with Gate Level Design and Verilog. I am also familiar with C, C++ and Python. I also fool around with popular Machine Learning algorithms and packages. I also went through the charter and found it very interesting! Of course, I agree to abide by it. Very happy to be part of the libre-soc community! Regards Gautham From libre-soc at platen-software.de Mon Aug 16 17:32:56 2021 From: libre-soc at platen-software.de (Tobias Platen) Date: Mon, 16 Aug 2021 18:32:56 +0200 Subject: [Libre-soc-dev] daily kan-ban update 16aug2021 Message-ID: <51b21ccea7ac4e1af1bf93261f6ab3a3dd8f24f6.camel@platen-software.de> today: continuing where I left two weeks ago From lkcl at lkcl.net Mon Aug 16 17:34:58 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 16 Aug 2021 17:34:58 +0100 Subject: [Libre-soc-dev] Issues with madan.kartheessan@gmail.com In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021 at 5:20 PM Madan Kartheessan < madan.kartheessan at gmail.com> wrote: > Hi: > I have subscribed using both of my email IDs to the libre-soc-dev list. > > 1) madan at object-automation.com > 2) madan.kartheessan at gmail.com i checked the subscriber list, it's not a member. there are however two messages in the exim4 logs to that email address, one in and one out, one at 16:58 (40 mins ago) and one at 17:20 (10 mins ago). try the subscription again. l. From lkcl at lkcl.net Mon Aug 16 17:48:31 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 16 Aug 2021 17:48:31 +0100 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021 at 5:32 PM wrote: > Hello Everyone! > > I am Gautham, from Kerala, India. I am a pre-final year student at the > Department of Electrical Engineering, Indian Institute of Technology > Madras.
I have some experience with Gate Level Design and Verilog. I am > also familiar with C, C++ and Python. nice! I also fool around with popular > Machine Learning algorithms and packages. > very cool. do put all of that on this page https://libre-soc.org/about_us/ i created a section "Object Automation". > I also went through the charter and found it very interesting! Of course, > I agree to abide by it. Very happy to be part of the libre-soc community! > :) great to hear. if you can generate an ssh key i can add you to the gitolite3 git access. instructions for generating the ssh key are in https://libre-soc.org/HDL_workflow/ search for "ssh-keygen". l. From programmerjake at gmail.com Mon Aug 16 18:05:01 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Mon, 16 Aug 2021 10:05:01 -0700 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021, 08:58 wrote: > Hello all, > > I am Niranjan, from Kerala, India. I am a third year B.Tech student at > Indian Institute of Technology Madras (IITM), doing a project at Object > Automation Software Solutions Pvt Ltd. > Welcome! Jacob Lifshay From programmerjake at gmail.com Mon Aug 16 18:06:50 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Mon, 16 Aug 2021 10:06:50 -0700 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021, 09:32 wrote: > Hello Everyone! > > I am Gautham, from Kerala, India. I am a pre-final year student at the > Department of Electrical Engineering, Indian Institute of Technology > Madras. Welcome! I have some experience with Gate Level Design and Verilog. I am > also familiar with C, C++ and Python. I also fool around with popular > Machine Learning algorithms and packages. > Neat!
Jacob Lifshay From madan.kartheessan at gmail.com Mon Aug 16 18:51:38 2021 From: madan.kartheessan at gmail.com (Madan Kartheessan) Date: Mon, 16 Aug 2021 23:21:38 +0530 Subject: [Libre-soc-dev] =?utf-8?q?Minutes_of_the_Meeting=E2=80=94August_6?= =?utf-8?q?_and_10?= Message-ID: Luke, David and the Object Automation team Using the link below, you will be able to read the August 6 and August 10 MoMs of Libre-SoC and the Object Automation teams.. https://libre-soc.org/oa/minutes/ Regards Madan K. From gautham at object-automation.com Mon Aug 16 18:52:17 2021 From: gautham at object-automation.com (gautham at object-automation.com) Date: Mon, 16 Aug 2021 13:52:17 -0400 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: <56454d111c41b22063c827c0a0705d42.squirrel@email.powweb.com> Hi, I have updated the "About Us" page. I am also sending my public ssh key. Gautham From libre-soc at platen-software.de Mon Aug 16 18:58:18 2021 From: libre-soc at platen-software.de (Tobias Platen) Date: Mon, 16 Aug 2021 19:58:18 +0200 Subject: [Libre-soc-dev] daily kan-ban update 16aug2021 In-Reply-To: <51b21ccea7ac4e1af1bf93261f6ab3a3dd8f24f6.camel@platen-software.de> References: <51b21ccea7ac4e1af1bf93261f6ab3a3dd8f24f6.camel@platen-software.de> Message-ID: <20210816195818.55667402c25ad5ec0f2387da@platen-software.de> On Mon, 16 Aug 2021 18:32:56 +0200 Tobias Platen wrote: > today: continuing where I left two weeks ago this includes fixing the renamed symbols. 
I get an AttributeError in the store function:

def store(dut, src1, src2, src3, imm, imm_ok=True, update=False,
          byterev=True):
    print("ST", src1, src2, src3, imm, imm_ok, update)
    yield dut.oper_i.insn_type.eq(MicrOp.OP_STORE)
    yield dut.oper_i.data_len.eq(2)  # half-word
    yield dut.oper_i.byte_reverse.eq(byterev)
    yield dut.src1_i.eq(src1)
    yield dut.src2_i.eq(src2)
    yield dut.src3_i.eq(src3)
    #FIXME -- symbols have been renamed --
    #orig yield dut.oper_i.imm_data.imm.eq(imm)
    #orig yield dut.oper_i.imm_data.ok.eq(imm_ok)
    #orig yield dut.oper_i.update.eq(update)
    yield dut.oper_i.imm_data.data.eq(imm)
    yield dut.oper_i.imm_data.ok.eq(imm_ok)
    #error here: yield dut.oper_i.update.eq(update)
    #AttributeError: Record 'oper_i_None' does not have a field 'update'.
    #Did you mean one of: insn_type, fn_unit, imm_data, zero_a, rc, oe,
    #msr, is_32bit, is_signed, data_len, byte_reverse, sign_extend,
    #ldst_mode, insn, sv_pred_sz, sv_pred_dz, sv_saturate, sv_ldstmode, SV_Ptype
    yield dut.issue_i.eq(1)
    yield
    yield dut.issue_i.eq(0)

-- Tobias Platen
From madan.kartheessan at gmail.com Mon Aug 16 19:03:52 2021 From: madan.kartheessan at gmail.com (Madan Kartheessan) Date: Mon, 16 Aug 2021 23:33:52 +0530 Subject: [Libre-soc-dev] MoM of Libre-SoC and the Object Automation teams Message-ID: Luke Thanks for putting the links in appropriate MoM and also formatting the content. It looks amazing now. Regards Madan K. From adigopzz3 at gmail.com Mon Aug 16 19:18:56 2021 From: adigopzz3 at gmail.com (Adithya Gopan) Date: Mon, 16 Aug 2021 23:48:56 +0530 Subject: [Libre-soc-dev] General Introduction Message-ID: Hello everyone, I am Adithya Gopan from Kerala, India. I am a third year BTech student majoring in Electrical Engineering, studying in Indian Institute of Technology Madras.
I am currently doing a project at Object Automation. I am very excited to join the libre-soc-dev mailing list. I hope to have a really cool and interesting ride. From luke.leighton at gmail.com Mon Aug 16 19:33:38 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 18:33:38 +0000 Subject: [Libre-soc-dev] MoM of Libre-SoC and the Object Automation teams In-Reply-To: References: Message-ID: <7DE09405-ABF2-4F94-A888-FB8B9C1BAF25@gmail.com> On August 16, 2021 6:03:52 PM UTC, Madan Kartheessan wrote: >Luke > >Thanks for putting the links in appropriate MoM and also formatting the >content. It looks amazing now. i did the simplest thing for now, which is to use ``` to make it fixed-width font. if you read up on markdown format (google it) you can make future ones look even better. l. From luke.leighton at gmail.com Mon Aug 16 19:35:59 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 18:35:59 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <56454d111c41b22063c827c0a0705d42.squirrel@email.powweb.com> References: <56454d111c41b22063c827c0a0705d42.squirrel@email.powweb.com> Message-ID: <43FA1787-DCD9-4F2F-A08C-D9654E6AECDD@gmail.com> On August 16, 2021 5:52:17 PM UTC, gautham at object-automation.com wrote: >Hi, > >I have updated the "About Us" page. saw that, looks great https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=67a8e09b081506318780d4b6b0b849e4380d5a61 >I am also sending my public ssh key. great, send that to me (luke.leighton at gmail.com) it does not need to go to the list. l. From adithya at object-automation.com Mon Aug 16 19:23:00 2021 From: adithya at object-automation.com (adithya at object-automation.com) Date: Mon, 16 Aug 2021 14:23:00 -0400 Subject: [Libre-soc-dev] General Introduction Message-ID: <4bb393863ed37e27698498aa92857793.squirrel@email.powweb.com> Hello everyone, I am Adithya Gopan from Kerala, India.
I am a third year BTech student majoring in Electrical Engineering, studying in Indian Institute of Technology Madras. I am currently doing a project at Object Automation. I am very excited to join the libre-soc-dev mailing list. I hope to have a really cool and interesting ride. From luke.leighton at gmail.com Mon Aug 16 19:41:49 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 18:41:49 +0000 Subject: [Libre-soc-dev] daily kan-ban update 16aug2021 In-Reply-To: <20210816195818.55667402c25ad5ec0f2387da@platen-software.de> References: <51b21ccea7ac4e1af1bf93261f6ab3a3dd8f24f6.camel@platen-software.de> <20210816195818.55667402c25ad5ec0f2387da@platen-software.de> Message-ID: <3CDEBA34-5753-4608-942F-154B14F8BC78@gmail.com> On August 16, 2021 5:58:18 PM UTC, Tobias Platen wrote: >On Mon, 16 Aug 2021 18:32:56 +0200 >Tobias Platen wrote: > >> today: continuing where I left two weeks ago >this includes fixing the renamed symbols. I get an AttributeError in >the store function: > store function of which file? >def store(dut, src1, src2, src3, imm, imm_ok=True, update=False, > byterev=True): > #orig yield dut.oper_i.update.eq(update) > yield dut.oper_i.imm_data.data.eq(imm) > yield dut.oper_i.imm_data.ok.eq(imm_ok) > #error here: yield dut.oper_i.update.eq(update) > #AttributeError: Record 'oper_i_None' does not have a field 'update'. this *might* now be ldst_mode, i think. but it is not a True/False it is an enum. you'll need to check the Record. recursive grep ldst_mode. l. From programmerjake at gmail.com Mon Aug 16 19:41:51 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Mon, 16 Aug 2021 11:41:51 -0700 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021, 11:19 Adithya Gopan wrote: > Hello everyone, > > I am Adithya Gopan from Kerala, India.
I am a third year BTech student > majoring in Electrical Engineering, studying in Indian Institute of > Technology Madras. > Welcome! I am currently doing a project at Object Automation. I am very excited to > join the libre-soc-dev mailing list. I hope to have a really cool and interesting ride. > :) Jacob Lifshay From luke.leighton at gmail.com Mon Aug 16 19:46:37 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 18:46:37 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: <29B147EC-B882-4F96-8939-2E7CF62239EB@gmail.com> On August 16, 2021 6:18:56 PM UTC, Adithya Gopan wrote: >Hello everyone, > >I am Adithya Gopan from Kerala, India. I am a third year BTech student >majoring in Electrical Engineering, studying in Indian Institute of >Technology Madras. >I am currently doing a project at Object Automation. I am very excited >to join the libre-soc-dev mailing list. >I hope to have a really cool and interesting ride. great to hear from you, Adithya. do add yourself to the about us page, i just added a TODO for you https://libre-soc.org/about_us/ just copy what you wrote above, anything else you'd like to add, as well, feel free. also, just as with everyone, review the charter, any questions ask straight away. best, l. From arjun at object-automation.com Mon Aug 16 20:16:43 2021 From: arjun at object-automation.com (arjun at object-automation.com) Date: Mon, 16 Aug 2021 15:16:43 -0400 Subject: [Libre-soc-dev] Hello from Arjun Nag Message-ID: Hello Everyone, Glad to get connected with the members of Libre soc team.
Thanks & regards Arjun From lkcl at lkcl.net Mon Aug 16 20:47:22 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 16 Aug 2021 20:47:22 +0100 Subject: [Libre-soc-dev] Hello from Arjun Nag In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021 at 8:17 PM wrote: > Hello Everyone, > > Glad to get connected with the members of Libre soc team. > :) looks like you got onto #libre-soc IRC channel, but left just after saying hello: https://libre-soc.org/irclog/%23libre-soc.2021-08-16.log.html#t2021-08-16T20:07:25 you can see from the logs that i said hello back, but you'd already left by that point. this is why i recommended using bnc4you (a persistent IRC proxy) or just leave the irc client active 24x7. l. From programmerjake at gmail.com Mon Aug 16 20:51:57 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Mon, 16 Aug 2021 12:51:57 -0700 Subject: [Libre-soc-dev] Hello from Arjun Nag In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021, 12:17 wrote: > Hello Everyone, > Glad to get connected with the members of Libre soc team. > Welcome! Jacob From arjunpartha99 at gmail.com Mon Aug 16 19:36:42 2021 From: arjunpartha99 at gmail.com (Arjun Nag) Date: Tue, 17 Aug 2021 00:06:42 +0530 Subject: [Libre-soc-dev] Arjun Nag Message-ID: Hello everyone, I am here with a quick intro...
• Design Verification Engineer in the field of VLSI
• Hands on Experience in System Verilog, Verilog & VHDL.
• Good Knowledge of UVM (Universal Verification Methodology).
• Good understanding of SoC level Test bench Architecture
• Good Hands-on at RTL - C Co simulation on HLS tool
• Well versed in creating the OVC’s and UVC’s
• Good understanding of Processor boot flows
• Good understanding of Verification flow at SoC level
• Good Knowledge of FPGA, ASIC & SoC Design & Verification Life Cycles.
• Excellent understanding of protocols like PCIe (Gen 2 & 3), UART, SPI and Compression/decompression Engine, QSPI, AMBA AXI, SMB & HTP
• Good Knowledge in both functional and gate level simulations.
• Possess hands on experience in Emulating complex SoC designs, Interface Build-up and Debugging.
• Good understanding of Design Partitioning and Trimming.
• Hands-on Working experience of Mentor Veloce Quattro 2 & MAXIMUS Emulators.
• Mentor Veloce Quattro 2 based Emulation setup and chip compile/run hands on experience
• Experience in Emulating Complex memory controllers and functional IP blocks in In Circuit Emulation (ICE) and TBX Modes.
• Good understanding of differences between RTL and X-RTL.
• Well versed in setting up software top and hardware top in a Top Level Emulation test bench.
• Have good knowledge on protocols like PCIe Gen 2, UART, SPI and Compression/decompression Engine.
• Excellent debugging skills
• Excellent Documentation skills
• Knowledge of Configuration Management tools like Tele logic DOORS and CM Synergy tools.
• Fair Knowledge of IBM rational Tools like Clear Case and Clear Quest
• Familiar with concepts of C Language and Object Oriented Methodology.
• Willing to learn new skills and ability to learn fast.
• Technical support to team members
From lkcl at lkcl.net Mon Aug 16 21:03:46 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 16 Aug 2021 21:03:46 +0100 Subject: [Libre-soc-dev] Arjun Nag In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021 at 9:01 PM Arjun Nag wrote: > Hello everyone, > I am here with a quick intro... > cool! you sent this from the gmail account, it went to moderation, so i approved it and have added you to be able to send in future without moderation. if you prefer, subscribe the gmail account as well. • Design Verification Engineer in the field of VLSI > • Hands on Experience in System Verilog, Verilog & VHDL. > these are fantastic skills to have, delighted to have you on board. best, l.
From libre-soc at platen-software.de Tue Aug 17 18:54:23 2021 From: libre-soc at platen-software.de (Tobias Platen) Date: Tue, 17 Aug 2021 19:54:23 +0200 Subject: [Libre-soc-dev] daily kan-ban update 17aug2021 Message-ID: today: more work on dcbz testcase From niranjan at object-automation.com Tue Aug 17 19:10:55 2021 From: niranjan at object-automation.com (niranjan at object-automation.com) Date: Tue, 17 Aug 2021 14:10:55 -0400 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> Message-ID: > > > On August 16, 2021 3:57:48 PM UTC, niranjan at object-automation.com wrote: >>Hello all, >> >>I am Niranjan, from Kerala, India. I am a third year B.Tech student at >>Indian Institute of Technology Madras (IITM), doing a project at Object >>Automation Software Solutions Pvt Ltd. >>I'm happy to join the libre-soc-dev mailing list. > > fantastic, great to hear from you, welcome. > > have you reviewed the Charter and are you happy to abide by it? > http://libre-soc.org/charter any questions about it feel free to ask. > > also do edit the wiki page and add yourself to > http://libre-soc.org/about_us. Thank you, I have gone through the charter and would be happy to abide by it. I have also added myself to the wiki page and sent you an email with my ssh key. Thanks and regards. From luke.leighton at gmail.com Tue Aug 17 19:22:53 2021 From: luke.leighton at gmail.com (lkcl) Date: Tue, 17 Aug 2021 18:22:53 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> Message-ID: <467FCC64-2611-4791-9159-FEB283015AFB@gmail.com> On August 17, 2021 6:10:55 PM UTC, niranjan at object-automation.com wrote: >Thank you, I have gone through the charter and would be happy to abide >by >it. I have also added myself to the wiki page and sent you an email >with >my ssh key. 
brilliant, will add it shortly once i receive it btw, you remember what i said about "trim context"? you notice how i cut everything above? this you have to do manually. gmail is *not* your friend, here. they have this stupid thing with the 3 dots, which hides reply context. you *must* expand out the *full* message then edit out anything irrelevant, ok? it's explained in HDL_workflow. again, this is just standard netiquette of 30 years standing, for technical mailing lists. l. From libre-soc at platen-software.de Tue Aug 17 19:40:49 2021 From: libre-soc at platen-software.de (Tobias Platen) Date: Tue, 17 Aug 2021 20:40:49 +0200 Subject: [Libre-soc-dev] daily kan-ban update 17aug2021 In-Reply-To: References: Message-ID: <20210817204049.aae88eeee5acc419dd6a4e84@platen-software.de> On Tue, 17 Aug 2021 19:54:23 +0200 Tobias Platen wrote: > today: more work on dcbz testcase found two bugs in src/soc/experiment/compldst_multi.py, the first one is fixed, the second one is more complex -- Tobias Platen From niranjan at object-automation.com Tue Aug 17 20:15:45 2021 From: niranjan at object-automation.com (niranjan at object-automation.com) Date: Tue, 17 Aug 2021 15:15:45 -0400 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <467FCC64-2611-4791-9159-FEB283015AFB@gmail.com> References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> <467FCC64-2611-4791-9159-FEB283015AFB@gmail.com> Message-ID: <5e0bb24a441b79bed6f164d89661a393.squirrel@email.powweb.com> > btw, you remember what i said about "trim context"? you notice how i cut > everything above? this you have to do manually. > you *must* expand out the *full* message then edit out anything > irrelevant, ok? Sorry, I will keep that in mind and do so from now. Thank you.
From niranjan at object-automation.com Tue Aug 17 20:15:44 2021 From: niranjan at object-automation.com (niranjan at object-automation.com) Date: Tue, 17 Aug 2021 15:15:44 -0400 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <467FCC64-2611-4791-9159-FEB283015AFB@gmail.com> References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> <467FCC64-2611-4791-9159-FEB283015AFB@gmail.com> Message-ID: > btw, you remember what i said about "trim context"? you notice how i cut > everything above? this you have to do manually. > you *must* expand out the *full* message then edit out anything > irrelevant, ok? Sorry, I will keep that in mind and do so from now. Thank you. From lkcl at lkcl.net Tue Aug 17 21:11:54 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Tue, 17 Aug 2021 21:11:54 +0100 Subject: [Libre-soc-dev] daily kan-ban update 17aug2021 In-Reply-To: <20210817204049.aae88eeee5acc419dd6a4e84@platen-software.de> References: <20210817204049.aae88eeee5acc419dd6a4e84@platen-software.de> Message-ID: On Tue, Aug 17, 2021 at 7:40 PM Tobias Platen wrote: > On Tue, 17 Aug 2021 19:54:23 +0200 > Tobias Platen wrote: > > > today: more work on dcbz testcase > found two bugs in src/soc/experiment/compldst_multi.py, > the first one is fixed, nice. the second one is more complex > Cesar is looking at LD/ST exceptions at the moment, be aware of that, you are both working on the same code. i will run the full test_issuer.py to make sure all's good. l. From luke.leighton at gmail.com Wed Aug 18 01:20:21 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 18 Aug 2021 00:20:21 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: