From lkcl at lkcl.net Sun Aug 1 01:45:04 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Sun, 1 Aug 2021 01:45:04 +0100
Subject: [Libre-soc-dev] Inverse DCT
In-Reply-To:
References:
Message-ID:

On 7/30/21, Luke Kenneth Casson Leighton wrote:
> next step, putting in a yield-based inverse DCT, was successful.
>
> next step is to link it into instructions and write a simulator unit test

done, successfully. the problem comes with the LD instruction.

FFT: bitreverse with shift
DCT: recursive halfswap then bitreverse
iDCT: bitreverse then inverse recursive halfswap

this is just too much to fit into SVP64 24 bit prefix, and the LD-byterev is actually interfering with applying REMAP.

what i am thinking of doing is removing bytereversing from the augmented LD and just having LD-with-shift, to which the 3 REMAP modes above can be applied. this makes FFT about 13 instructions rather than 11 but pffh.

l.

From lkcl at lkcl.net Sun Aug 1 10:59:37 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Sun, 1 Aug 2021 10:59:37 +0100
Subject: [Libre-soc-dev] [RFC] merging parallel reduction into REMAP
Message-ID:

https://libre-soc.org/openpower/sv/svp64/appendix/?updated#index14h1

i'm looking at the parallel reduction algorithm and note that it is remarkably similar to the REMAP schedule for DCT COS table generation.

8 4 2 1

which is exactly the kind of thing i was looking for, to make general abstractions.

the first issue is, however, that it is not ok to have two separate and distinct operations. the parallel reduction pseudocode has two operations:

1) the operation requested
2) a MV operation

the MV has to go.

a trick i have been using in the simulator "yield" iterators is to create redirection lookup indices.
i am reasonably confident that these can be blatted down to O(1) at gate level, however they give an idea: instead of MVing the data, use the predicate bits to sequentially "step over" the data:

    j = 0
    for i, pbit in enumerate(predicate_bits):
        if pbit == 1:
            lookup[j] = i
            j += 1

then use lookup[index] in all register accessing.

i will update the pseudocode with this idea, to see what it looks like.

l.

From lkcl at lkcl.net Sun Aug 1 14:24:16 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Sun, 1 Aug 2021 14:24:16 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
Message-ID:

https://libre-soc.org/openpower/isa/branch/

it occurs to me only just now that we completely forgot to evaluate SVP64 interaction on branches, particularly when bc involves CRs.

context: i started looking at this because svstep for Vertical-First Mode requires explicit incrementing of src/dst step, then a loop end test, followed by a bc on CR0. this is near identical to what CTR is for. consequently, there is a case for adding a special SVP64 bc mode to check the svstep conditions instead of CTR.

the other thing is, what does Vectorised bc mean? and what does predicated Vectorised bc mean?

should modes be added which check *all* CR fields being tested, or just one, or add a bit to select either?

l.

From programmerjake at gmail.com Sun Aug 1 18:15:41 2021
From: programmerjake at gmail.com (Jacob Lifshay)
Date: Sun, 1 Aug 2021 10:15:41 -0700
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

On Sun, Aug 1, 2021, 06:32 Luke Kenneth Casson Leighton wrote:

> https://libre-soc.org/openpower/isa/branch/
>
> it occurs to me only just now that we completely forgot to evaluate
> SVP64 interaction on branches, particularly when bc involves CRs.
>
> ...
>
> should modes be added which check *all* CR fields being tested, or
> just one, or add a bit to select either?
>

GPU code will need to very often branch if all/any predicate bits are set/clear, having a branch op that covers all 4 combinations would save a bunch of instructions (>5-10% in some common cases).

Jacob

From lkcl at lkcl.net Sun Aug 1 18:49:31 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Sun, 1 Aug 2021 18:49:31 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

On Sun, Aug 1, 2021 at 6:15 PM Jacob Lifshay wrote:
>
> On Sun, Aug 1, 2021, 06:32 Luke Kenneth Casson Leighton
> wrote:
> > should modes be added which check *all* CR fields being tested, or
> > just one, or add a bit to select either?
> >
>
> GPU code will need to very often branch if all/any predicate bits are
> set/clear, having a branch op that covers all 4 combinations would save a
> bunch of instructions (>5-10% in some common cases).

yowser, definitely worth it. i had in mind for the CRM mode to do something like this (merge all CR bit-tests) but it turned out not to have enough space to do so.

if it's actually part of the *branch* instruction, that's fantastic (and, logically, the right place for it)

l.

From lkcl at lkcl.net Mon Aug 2 00:36:57 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 00:36:57 +0100
Subject: [Libre-soc-dev] libre-soc server cgroups
Message-ID:

about 10 days ago the server loadavg hit 1.7 due to soclayout being 100 megabyte in size, from multiple git commits of massive verilog autogenerated (compiled) output.

a few days before that we had fastcgid crash and take the entire web backend offline (that turned out to be morons trying to access wordpress php scripts: anything involving php is now an instant fail2ban)

this is precisely why i set the rule that autogenerated output should not be added to git repositories, because soclayout is now so massive it affected everyone's useability.

i have since set up cgroups and allocated only 20% CPU to fastcgid.
this turns out to make bugzilla dreadfully slow, so i have increased it to 40% to see how that goes. i may instead set up a separate cgroup just for the git command, such that it does not impact bugzilla.

mythic-beasts hosting is extremely good, however the next level up is double the cost, i don't want to increase that unless absolutely necessary.

l.

From lkcl at lkcl.net Mon Aug 2 01:41:14 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 01:41:14 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

https://libre-soc.org/openpower/sv/branches/

i created this page with various modes, i believe only "ALL/SOME" is needed because by inverting the BO test itself ~ALL and ~SOME are achieved.

strictly speaking illegal instructions should be raised for mode combinations that make no sense however given how gate critical this is, and how doing so would create Hazard dependencies on SVSTATE, compromising multi issue execution in the process, i am very reluctant to do that.

detection of Branch SVP64 RM Mode is quite straightforward, there is major op 18 and two minor op 19s, this is not a lot for the early decode phase.

l.

From programmerjake at gmail.com Mon Aug 2 08:42:27 2021
From: programmerjake at gmail.com (Jacob Lifshay)
Date: Mon, 2 Aug 2021 00:42:27 -0700
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

On Sun, Aug 1, 2021, 17:41 Luke Kenneth Casson Leighton wrote:

> https://libre-soc.org/openpower/sv/branches/
>
> i created this page with various modes, i believe only "ALL/SOME" is
> needed because by inverting the BO test itself ~ALL and ~SOME are
> achieved.
>

Instead of "SOME", I'd call the mode "ANY" -- it's more specific. Alternatively we could call the modes reduce-and/reduce-or, since that's what they actually are.
GPU code could benefit from having the semantics be where the SVP64 predicate (either Int or CR) tells the branch instruction which CR fields it should use, where zero bits in the SVP64 predicate cause the corresponding CR fields to be ignored. Since the ignored bits cause ~ALL and ~ANY to no longer be redundant afaict, we will want to add them back in. This will allow saving instructions in nested SIMT code like the following:

    i32 a, b; // globals
    // ...
    while(a > 2) {
        if(b < 5)
            f();
        else
            g();
        h();
    }

which compiles to something like:

    vec a, b;
    // ...
    pred loop_pred = a > 2;
    while(loop_pred.any()) {
        pred if_pred = loop_pred & (b < 5);
        if(if_pred.any()) {
            f(if_pred);
        }
    label1:
        pred else_pred = loop_pred & ~if_pred;
        if(else_pred.any()) {
            g(else_pred);
        }
        h(loop_pred);
    }

in the else_pred part (after label1 above), we could write it like so (wrong asm syntax, but you get the point):

    // loop_pred could be stored in r30 or something -- out-of-the-way of f(), g(), and h()
    //
    // skip extra instructions if not(any non-ignored bit in else_pred is set),
    // the un-prefixed branch instruction is just: `bc ~if_pred, skip`
    bc reduce_mode=~ANY, svp64_pred=loop_pred, ~if_pred, skip
    // compute else_pred without loop_pred being forced to be in a CR,
    // this only works if else_pred is the same CR registers as if_pred
    // and it relies on all zero bits in loop_pred also being zeros in if_pred
    crnot else_pred, if_pred, svp64_pred=loop_pred
    // g(else_pred) inlined here
    skip:
    // h(loop_pred) inlined here
    // code for while loop...

The above would take additional instructions if the semantics of br were instead defined as currently in the wiki, instead of my proposal.
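
The masked lowering described in this message can be checked against the scalar loop with a small Python model. This is only a sketch under assumptions that are not in the mail: f() and g() are reduced to per-lane call counters, h() is assumed to decrement a, and loop_pred is assumed to be narrowed at the bottom of each iteration (the part the mail leaves as "code for while loop..."). The names scalar_reference and simt_masked are hypothetical.

```python
def scalar_reference(a, b):
    """Run the original scalar loop once per lane; return per-lane f/g call counts."""
    f_calls, g_calls = [], []
    for a_i, b_i in zip(a, b):
        f_n = g_n = 0
        while a_i > 2:
            if b_i < 5:
                f_n += 1          # stands in for f()
            else:
                g_n += 1          # stands in for g()
            a_i -= 1              # stands in for h(): assumed to decrement a
        f_calls.append(f_n)
        g_calls.append(g_n)
    return f_calls, g_calls


def simt_masked(a, b):
    """The same loop in masked (SIMT) form, one predicate bit per lane."""
    a = list(a)                   # work on a copy, h() modifies it
    lanes = range(len(a))
    f_calls = [0] * len(a)
    g_calls = [0] * len(a)
    loop_pred = [a[i] > 2 for i in lanes]
    while any(loop_pred):                                   # loop_pred.any()
        if_pred = [loop_pred[i] and b[i] < 5 for i in lanes]
        if any(if_pred):                                    # if_pred.any()
            for i in lanes:                                 # f(if_pred)
                f_calls[i] += int(if_pred[i])
        else_pred = [loop_pred[i] and not if_pred[i] for i in lanes]
        if any(else_pred):                                  # else_pred.any()
            for i in lanes:                                 # g(else_pred)
                g_calls[i] += int(else_pred[i])
        for i in lanes:                                     # h(loop_pred)
            if loop_pred[i]:
                a[i] -= 1
        # assumption: the elided loop-back code narrows loop_pred like this
        loop_pred = [loop_pred[i] and a[i] > 2 for i in lanes]
    return f_calls, g_calls


if __name__ == "__main__":
    a, b = [5, 1, 3, 4], [7, 0, 2, 9]
    print(scalar_reference(a, b))
    print(simt_masked(a, b))
```

Each .any() test in the model is one "branch if any non-ignored predicate bit is set" decision, which is the case the proposed reduce mode on bc is meant to cover in a single instruction.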
Jacob

>

From staf at fibraservi.eu Mon Aug 2 08:57:36 2021
From: staf at fibraservi.eu (Staf Verhaegen (FibraServi))
Date: Mon, 2 Aug 2021 09:57:36 +0200
Subject: [Libre-soc-dev] libre-soc server cgroups
In-Reply-To:
References:
Message-ID:

On 2/08/2021 at 01:36, Luke Kenneth Casson Leighton wrote:
> mythic-beasts hosting is extremely good, however the next level up is
> double the cost, i don't want to increase that unless absolutely
> necessary.

Did you have a look at Contabo (contabo.de)?
They are pretty cheap and I am satisfied with their hosting.

greets,
Staf.

--
Chips want to be free.

From lkcl at lkcl.net Mon Aug 2 09:12:21 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 09:12:21 +0100
Subject: [Libre-soc-dev] libre-soc server cgroups
In-Reply-To:
References:
Message-ID:

---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Mon, Aug 2, 2021 at 8:57 AM Staf Verhaegen (FibraServi) wrote:

> Did you have a look at Contabo (contabo.de)?
> They are pretty cheap and I am satisfied with their hosting.

https://contabo.com/en/vps/vps-s-ssd/?image=ubuntu.267&qty=1&contract=1

4 cores, 8 GB RAM, 200 GB SSD for EUR 6, that's pretty damn good. moving to a different VM however is quite a bit of hassle. i wonder if i can get mythic-beasts to negotiate alternative pricing.

l.

From lkcl at lkcl.net Mon Aug 2 09:54:21 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 09:54:21 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Mon, Aug 2, 2021 at 8:42 AM Jacob Lifshay wrote:
>
> On Sun, Aug 1, 2021, 17:41 Luke Kenneth Casson Leighton
> wrote:
>
> > https://libre-soc.org/openpower/sv/branches/
> >
> > i created this page with various modes, i believe only "ALL/SOME" is
> > needed because by inverting the BO test itself ~ALL and ~SOME are
> > achieved.
> >
>
> Instead of "SOME", I'd call the mode "ANY" -- it's more specific.
> Alternatively we could call the modes reduce-and/reduce-or, since that's
> what they actually are.

the bit is named "ALL" to indicate "All tests must pass".

> GPU code could benefit from having the semantics be where the SVP64
> predicate (either Int or CR) tells the branch instruction which CR fields
> it should use,

yes, that's a given.

> where zero bits in the SVP64 predicate cause the
> corresponding CR fields to be ignored.

that's part of SVP64 default behaviour: those tests would simply be skipped.

i have however just realised that zeroing mode is completely meaningless, including the SNZ bit... ah no it isn't, because it can be set to deliberately fail at the first zero point. that can be used to very deliberately truncate VL to the exact point where the first zero point occurs in the predicate mask.... argh can't do that, we've run out of bits. nuts.

oh wait... VLI is to truncate to VL rather than VL-1. so it's not so bad.

> Since the ignored bits cause ~ALL
> and ~ANY to no longer be redundant afaict,

can you re-read, about sz and SNZ, to take those into consideration?

> The above would take additional instructions if the semantics of br were
> instead defined as currently in the wiki, instead of my proposal.

i'm not totally following, i'm still absorbing the concept of what you're describing, however a couple of things:

1) changing the behaviour and semantics of SVP64 predicate masks just for branches isn't ok. fitting with how SVP64 predicate masks work for all other options is how it has to go at this point

2) you may not have understood about sz and SNZ, or, if i am reading correctly what you wrote, you may have misunderstood predicate masks and how they're applied (or, not).
can you please re-evaluate / re-word, taking into account sz and SNZ, which can be used to insert (effectively) *either* an immediate of zeros or an immediate of 1s in place of masked-out CR bits being tested?

l.

From lkcl at lkcl.net Mon Aug 2 10:14:07 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 10:14:07 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

On 8/2/21, Jacob Lifshay wrote:

> GPU code could benefit from having the semantics be where the SVP64
> predicate (either Int or CR) tells the branch instruction which CR fields
> it should use, where zero bits in the SVP64 predicate cause the
> corresponding CR fields to be ignored. Since the ignored bits cause ~ALL
> and ~ANY to no longer be redundant afaict, we will want to add them back
> in.

remember that there is both ~R30 (and ~R10) and ~CRbit predicate testing, as well as being able to invert the BO bit test. i would be very surprised if, in combination with sz+SNZ, all possible options were not covered.

l.

From lkcl at lkcl.net Mon Aug 2 21:42:22 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Mon, 2 Aug 2021 21:42:22 +0100
Subject: [Libre-soc-dev] Inverse DCT
In-Reply-To:
References:
Message-ID:

LD all sorted, scaled it back to LD-with-shift, and if pushed we could do without LDsh entirely. the DCT/iDCT REMAP schedule now does bitrev-with-halfswap itself, applying that to the offset.

i cannot say i am happy about losing LD-bitrev because FFT was very short, with it.

iDCT unit test with LD, inner and outer butterfly works great. next is to write a program that creates SVG files to put into docs and slides.

l.
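
For readers unfamiliar with the "bitrev" part of the schedules named in this thread, here is a minimal Python sketch of plain bit-reversed indexing, the ordering commonly used by in-place radix-2 FFT. It is an illustration only: the function names are hypothetical, the halfswap step of the DCT/iDCT schedules is not modelled, and nothing here is taken from the Libre-SOC simulator.

```python
def bit_reverse(index, width):
    """Reverse the lowest `width` bits of `index`."""
    reversed_index = 0
    for _ in range(width):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def bit_reversed_order(n):
    """Return the bit-reversed permutation of range(n); n must be a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    width = n.bit_length() - 1
    return [bit_reverse(i, width) for i in range(n)]


if __name__ == "__main__":
    print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying such a permutation to the element offset, rather than inside the load instruction, is the kind of separation the mail describes when it moves bitrev out of LD and into the REMAP schedule.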
From lkcl at lkcl.net Tue Aug 3 03:36:38 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Tue, 3 Aug 2021 03:36:38 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

i wrote out the pseudocode, and there are some fascinating side-effects / possible uses, including interaction with CTR, using the predicate mask with unconditional tests (BO1 set), loads more.

l.

From lkcl at lkcl.net Tue Aug 3 13:08:31 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Tue, 3 Aug 2021 13:08:31 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

On 8/2/21, Jacob Lifshay wrote:
> On Sun, Aug 1, 2021, 17:41 Luke Kenneth Casson Leighton
> wrote:
>
>> https://libre-soc.org/openpower/sv/branches/
>>
>> i created this page with various modes, i believe only "ALL/SOME" is
>> needed because by inverting the BO test itself ~ALL and ~SOME are
>> achieved.
>>
>
> Instead of "SOME", I'd call the mode "ANY" -- it's more specific.
> Alternatively we could call the modes reduce-and/reduce-or, since that's
> what they actually are.
>
> GPU code could benefit from having the semantics be where the SVP64
> predicate (either Int or CR) tells the branch instruction which CR fields
> it should use, where zero bits in the SVP64 predicate cause the
> corresponding CR fields to be ignored. Since the ignored bits cause ~ALL
> and ~ANY to no longer be redundant afaict, we will want to add them back
> in. This will allow saving instructions in nested SIMT code like the
> following:
> i32 a, b; // globals
> // ...
> while(a > 2) {
>     if(b < 5)
>         f();
>     else
>         g();
>     h();
> }
> which compiles to something like:
> vec a, b;
> // ...
> pred loop_pred = a > 2;
> while(loop_pred.any()) {
>     pred if_pred = loop_pred & (b < 5);
>     if(if_pred.any()) {
>         f(if_pred);
>     }
> label1:
>     pred else_pred = loop_pred & ~if_pred;
>     if(else_pred.any()) {
>         g(else_pred);
>     }
>     h(loop_pred);
> }
>
> in the else_pred part (after label1 above), we could write it like so
> (wrong asm syntax, but you get the point):
> // loop_pred could be stored in r30 or something -- out-of-the-way of f(), g(), and h()
> //
> // skip extra instructions if not(any non-ignored bit in else_pred is set),
> // the un-prefixed branch instruction is just: `bc ~if_pred, skip`
> bc reduce_mode=~ANY, svp64_pred=loop_pred, ~if_pred, skip
> // compute else_pred without loop_pred being forced to be in a CR,
> // this only works if else_pred is the same CR registers as if_pred
> // and it relies on all zero bits in loop_pred also being zeros in if_pred
> crnot else_pred, if_pred, svp64_pred=loop_pred
> // g(else_pred) inlined here
> skip:
> // h(loop_pred) inlined here
> // code for while loop...
>
> The above would take additional instructions if the semantics of br were
> instead defined as currently in the wiki, instead of my proposal.
>
> Jacob
>
>>

From lkcl at lkcl.net Tue Aug 3 15:51:55 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Tue, 3 Aug 2021 15:51:55 +0100
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

drat. fricking gmail HTML Basic mode is barely useable. hit send instead of save. grrr. ok.

https://libre-soc.org/openpower/sv/branches/?updated

i've added the example and created SVP64 hypothetical assembler.

i very deliberately placed the calculation of ANDing the predicate with the CR just before each call to f() and g().
the CR Vector *BEFORE* being transferred to r30 is used, there, because it is a pain to cross-interact integers with Vector CRs.

one of the tests (the else.any) is deliberately inverted: mask=~r30

this is to illustrate how and what the SNZ immediate field is for. ANDing of all tests is still done, but instead of sz (source zero in masked out bits) a **ONE** is put in the place of the CR Field element, **NOT** a zero.

this causes the ANDing to effectively IGNORE masked-out bits but still keep decrementing CTR (if the relevant CTR bit is set).

thus, CTR branch conditional mode *still operates correctly* counting down the total number of elements in an array, even when some elements of that array should be masked out.

where you do not want that behaviour, instead wanting CTR to count down ONLY mask-selected elements, you would not use sz. this would skip both the element test *and* skip CTR decrementing.

what _would_ be nice is if bc were to update the CR field based on the mask. however to be absolutely honest i think this is too much, and it needs to be optional, and unfortunately we are out of bits.

l.

From lkcl at lkcl.net Tue Aug 3 18:32:29 2021
From: lkcl at lkcl.net (Luke Kenneth Casson Leighton)
Date: Tue, 3 Aug 2021 18:32:29 +0100
Subject: [Libre-soc-dev] [llvm-dev] [RFC] Vector/SIMD ISA Context Abstraction
In-Reply-To:
References:
Message-ID:

(renato thank you for cc'ing, due to digest subscription at the moment)

On Tue, Aug 3, 2021 at 3:25 PM Renato Golin wrote:
>
> On Sat, 31 Jul 2021 at 00:33, Luke Kenneth Casson Leighton via llvm-dev wrote:
>>
>> if however instead of an NxM problem this was turned into N+M,
>> separating out "scalar base" from "augmentation" throughout the IR,
>> the problem disappears entirely.
>
>
> Hi Luke,
>
> It's not entirely clear to me what you are suggesting here.

it's a nebulous but fundamentally low-level concept, that may take some time to sink in, and for its significance to be appreciated.
some background: over the past 3+ years i have made a comprehensive comparative study of ISAs (both SIMD and Vector), the latter only being revived recently thanks to RVV bringing Cray-style Vectors back into the forefront of computing research. here is a quick summary of that:

* Packed SIMD. the worst kind of ISA. the following article puts it diplomatically:
  https://www.sigarch.org/simd-instructions-considered-harmful/
  i have no such compunction or affiliation, and as an engineer can speak freely and plainly. gloves off, statement of fact: Packed SIMD is the worst thing to become ubiquitous in computer science, bar none. frankly, the sooner that Packed SIMD is shot and buried as an embarrassing and incredibly expensive historical footnote in the history of computing, the better [there's always exceptions: for e.g. embedded DSP Audio, Packed SIMD is perfect].

* Predicated SIMD: by total contrast, this is actually not half bad. SVE2, AVX-512, GPU ISAs, anything that takes masks per element, the masks completely eliminate Packed SIMD Hell (except for ISAs that still have SIMD alignment for memory LD/STs even for Predicated LD/STs, but hey, nothing's perfect)

* Horizontal-First Vectors. known best from the Cray-1, also in modern ISAs such as NEC's SX-Aurora and now RVV, horizontal vectors are of the form "for i in range(VL): operation(vec_src_reg[i], vec_dest_reg[i])"

* Vertical-First Vectors. these are *NOT* very well-known, and the only two ISAs that i know of (to date) are SVP64 and Mitch Alsup's MyISA 66000.

btw, here's the [terse] page on SVP64's format:
https://libre-soc.org/openpower/sv/svp64/
the overview is much more useful for understanding:
https://libre-soc.org/openpower/sv/overview/

additional relevant ISAs:

* Broadcom VideoCore IV which has a Vector-form of "REP", where the operation may be repeated 2, 4, 8, 16, 32 or "the count taken from scalar r0".

* the Mill Architecture, which is a "tagged" ISA.
  here, there is no "ADD32", "ADD16", "ADD64", "ADD8", there is *ONLY* "ADD". the width of the operation is taken *FROM THE LOAD OPERATION* which is the ONLY place where the register operand width is specified. operations are DEDUCED, statically, at compile-time, based on the "tags" associated with source registers as they cascade through. two strategically important instructions are included: WIDEN and NARROW.

the significance of mentioning the Mill is that the ISA has a closer match to the simple (basic) LLVM intrinsics than most other ISAs. however even there (unless they've done drastic and extensive changes) they will be limited to trying to fit a flexible (tagged) ISA into an inflexible IR that was never designed with "context" in mind.

> For context:
> * Historically, we have tried to keep as many instructions as native IR as possible to avoid the explosion of intrinsics, as you describe.

a crucially important goal that gets a big thumbs-up from me.

> * However, traditionally, intrinsics reduce the number of instructions in a basic block instead of increasing them, so there's always the balance.

where the opposite of that is that the CISC-ness of a given new intrinsic itself could impact ISAs that don't support that feature natively, making it necessary for them to emit rather more assembly instructions than it first appears.

> * For example, some reduction intrinsics were added to address bloat, but no target is forced to use them.

excellent. iteration and reduction (including fixed schedule paralleliseable reduction) is one of the intrinsics being added to SVP64. it's good to hear that that, as a concept, has been added. if i may, i will use that as an example, later.

> * If you can represent the operation as a series of native IR instructions, by all means, you should do so.
this assumes (perfectly reasonably, mind you) that the (hypothetical) ISA itself is not capable of expressing a given operation *in* IR, and that the operation consequently has to be done as a series of passes, substituting for a lack of native (direct) support of a given operation with some (faster?) operations that *do* (ultimately) exist as actual assembler.

in some architectures a particular native IR instruction might actually exist, but the native assembler variant is so horribly slow at the hardware level that alternatives are actually *demanded* by users. AVX's native Horizontal Reduction instructions would be a good example.

> I get it that a lot of intrinsics are repeated patterns over all variations and that most targets don't have that many, so it's "ok".
>
> I also get it that most SIMD vector operations aren't intrinsically vector,

[indeed. i have spent considerable time recently on the wikipedia Vector_processor page, and associated nearby related pages (SIMD, GPUs, etc), correcting that unfortunate meme that "SIMD equals vectors". this is unfortunately where Corporate Marketing has badly interfered with actual Computer Science. sigh.]

> but expansions of scalar operations for the benefit of vectorisation
> (plus predication, to avoid undefined behaviour and to allow "funny" patterns, etc).

yes. and this perspective is where Mitch Alsup's MyISA 66000, and SVP64's "Vertical-First" Mode come into play: the instructions in both are effectively executed *in scalar form ONLY* (as far as the Program Order is concerned), and, at the end of a loop/branch, you *EXPLICITLY* increment the element index, such that all *SCALAR* operations in the loop now execute on *scalar* element one. end of loop, explicit increment element index to 2, loop back and execute on element *two* of the Vector Register. repeat until loop-termination condition.

the challenge ahead for Libre-SOC (and for MyISA 66000) will be to introduce this entirely new concept to compilers.
however given that it's effectively scalar rather than Vector, the task should actually be a *lot* easier than it is for Horizontal-First ISAs such as SVE and RVV.

> But it's not clear to me what the "augmentation" part would be in other targets.

the proposal is - at its heart - to replace all IR of the form:

    llvm.masked.load.v16f32.predicatespec(arguments)
    llvm.masked.load.v2f64.predicatespec(arguments)

and so on with just:

    llvm.load(mask=x, arguments).

there *is* no llvm.masked.load: there is *only* an optional argument, evaluated *at runtime* (or, compile-time more like, i.e. when running llvm), so that rather than expanding out explicitly / statically to dozens of special IR intrinsics, *there is only one*: llvm.load.

additional optional arguments also then specify whether this operation is twin-predicated by having a *second* predicate mask (yes, SVP64 can apply one predicate mask to the source, and another to the destination. conceptually this is equivalent to back-to-back VGATHER-VSCATTER).

additional optional arguments also then specify whether there is SWIZZLE applied, or Sub-Vectors, or any other types of "augmentation".

now, here's the kicker: what we need to support SVP64 is for *all llvm basic intrinsics to support all possible optional augmentations of all possible types*.

yes, really, that's not a typo or a mis-statement. we *genuinely* need a sign-extended twin-predicated intrinsic:

    llvm.sext(source_mask=source_pred, dest_mask=dest_pred, source_argument)

>> even permute / shuffle Vector/SIMD operations are separable into
>> "base" and "abstract Vector Concept": the "base" operation in that
>> case being "MV.X" (scalar register copy, indexable - reg[RT] =
>> reg[reg[RA]] and immediate variant reg[RT] = reg[RA+imm])
>
>
> Shuffles are already represented as IR instructions (insert/extract vector), so I'm not sure this clarifies much.
ok, so is it possible to do shuffle-sign-extend, shuffle-fptrunc, shuffle-fabs, shuffle-sqrt, shuffle-log, and any other single-src single-dest operation?

this is where the "augmentation" - the separation of PREFIX-SUFFIX comes into play. SVP64 has the ability to set up "SWIZZLE" contexts as well as certain kinds of "REMAP" Schedules (triple-loop butterfly schedules) - PREFIXes - that can be *applied* to base operations (SUFFIXes), which, if we were to expand all those possibilities out would literally create several MILLION intrinsics.

> Have you looked at the current scalable vector implementation?

briefly, yes. i also helped with some review insights when RVV was being added, although that was a brief glimpse into a massive world where i was (and still am) constrained, unfortunately, by time and resources, much as i would love that to be otherwise.

> It allows a set of operations on open-ended vectors that are controlled by a predicate, which is possibly the "augmentation" that you're looking for?

no. what is happening there is that it is a reflection of the limitations of the current ISAs. i can say with 100% certainty that the SVE implementation will not have been designed to take SVP64 into consideration.

the reason is actually very simple and straightforward: at the time LLVM SVE was added, SVP64 did not even exist.

so for example, let us take the new feature added in LLVM SVE: reduction. most Vector ISAs add *explicit* reduction operations. NEC SX-Aurora for example has reduce-add, reduce-multiply, reduce-OR, reduce-AND, reduce-XOR, and that's about it. SVP64 has:

* reduction as a fixed (paralleliseable) schedule
* base operation.

you can LITERALLY apply reduction to.... to... "llvm.maximum" scalar operation, or to... divide or subtract (or other non-commutative operation) if you really really want to, and the ISA will go, "ok, done. next?".
    sv.fmax/MR FRT.v, FRA.v, FRB.v # MR means "map-reduce mode"

you can apply parallel reduction to Power ISA v3.0 Condition Register operations, "crand" or "cror" or "crnor".

    sv.crand/MR BT.v, BA.v, BB.v

you can even apply parallel reduction to single-argument instructions if you really, really want to: we're not going to stop that from happening, because somebody might find it useful given the fact that the parallel-reduction is on a fixed Power-2-halving Schedule that could have practical uses, and the hardware is *required* to write out all intermediate values into a *vector* result.

you can even apply parallel reduction Schedules to triple-argument instructions (FMA), however there it gets tricky and complicated (and i haven't thought it through, fully, what it actually means, i.e. whether it's useful). certainly if the MUL register argument is considered scalar and the others Vector, that is actually useful (performs repeated cumulative multiply as part of the Schedule).

does this help illustrate what i mean by "augmentation"? there is a "base" (scalar) operation, you "augment" it, and it *becomes* SIMD-like, *becomes* Vector-like, *becomes* predicated, *becomes* Swizzled, *becomes* reduced.

the development of LLVM SVE would not have taken this possibility into account, because, put simply, it is plain common sense in something as complex as LLVM not to waste time writing code for something that does not have a real-world use-case.

>> the issue is that this is a massive intrusive change, effectively a
>> low-level redesign of LLVM IR internals for every single back-end.
>
>
> Not necessarily.

this would be fantastic (and a huge relief) if it can be so arranged. one of my biggest concerns is that what i am advocating is at such a fundamental level that it could, if done incorrectly, be extremely disruptive.
however even now as i think about it on-the-fly, if the proposal is as simple as adding c++-like "optional named arguments" to (all) base scalar LLVM intrinsics, then, i think that would work extremely well. it would have zero impact on other ISAs, which is a huge plus.

> For example, scalable vectors are being introduced in a way that non-scalable back-ends (mostly) won't notice.
> And it's not just adding a few intrinsics, the very concept of vectors was changed.
> There could be a (set of) construct(s) for your particular back-end that is invisible to others.

the problem is that all other Vector ISAs have constrained themselves to 32 bit (or, for GPU ISAs, often 48 or 64). they *explicitly* add *explicit* O(N) opcodes. RVV adds 192 *explicit* opcodes, embedded into a MAJOR 32-bit opcode specifically dedicated for use by RVV, and that was part of its original design. ARM, likewise, will have done something similar, with SVE and SVE2.

the problem with that approach is that it is extremely limiting in the possible permutations / combinations of *potential* instructions that *could* exist, if there was not such a limit of trying to cram into a 32-bit space.

[it does have to be said, however, that there are some serious practical benefits to limiting the possibilities of an ISA: validation and verification of silicon before spending USD 16 million on 7nm masks is a Damn Good Reason :) and it is one that we are going to have to give some serious thought to: how to verify the hardware for an ISA with literally several MILLION instructions]

we have left the entirety of the Scalar Power v3.0B ISA alone (which is 32-bit), called that "base", and added a full 32-bit Prefix (called SVP64) which contains the Vectorisation Context. SVP64 is - fundamentally - an O(NxM) ISA, where N ~= 250 and M is ~= 1,000 to 8,000.
actually, it's O(NxMxOxPxQ) where:

* N~=250 is the base scalar Power v3.0B ISA
* M~=1,000-8,000 is the Vectorisation Context
* O~=2^20 (guessing here) is REMAP Schedules and
* P~=2^(3*12) is SWIZZLE Contexts for GPUs (XXYZ, WZZX)
* Q=64 (Vector Length, VL)

thus for example with Twin-Predication applied to e.g. llvm.sext or
llvm.cos we have implicit back-to-back VGATHER-VSCATTER behaviour
*WITHOUT* needing a pair of LD/ST operations or register MV operations
before and after the Vectorised operation.

we anticipate some extremely powerful compact representations, and to
be honest it may literally take several years for the full
implications of SVP64's power and flexibility to sink in.

in-register paralleliseable DCT can be done in 9 instructions, and
paralleliseable in-register-file (small) insertion-sort likely in
around 11 instructions thanks to the Data-Dependent
Fail-on-First-Condition Mode.

we can even implement a (small) Vectorised quick-sort, in-register,
fully paralleliseable, in probably about... 20 instructions. it's on
my TODO list to investigate.

> Of course, the more invisible things, the harder it is to validate and change intersections of code, so the change must really be worth the extra hassle.

appreciated.

> With both Arm and RISCV implementing scalable extensions, that change was deemed worthy and work is progressing.
> So, if you could leverage the existing code to your advantage, you'd avoid having to convince a huge community to implement a large breaking change.

the possibility that occurred to me, above, as writing this, of adding
optional arguments (containing the Vector Augmentation Context) to
base scalar llvm intrinsics, would i believe achieve that.

if any other ISA vendors wanted to use that, they could, as a first
pass, map e.g. llvm.load(optional_predicate=xxx) *onto*
llvm.masked.load(....) and thus avoid huge disruption, and carry that
out in an incremental fashion.

or not. at their choice.
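As a rough illustration of the Twin-Predication idea mentioned above, here is a minimal sketch. It is not the SVP64 specification: the function name `twin_predicated_op` and the list-of-bits mask representation are assumptions made purely for this example. It shows how separate source and destination masks let one single-source, single-destination operation gather on the way in and scatter on the way out, with no extra LD/ST or MV steps.

```python
def twin_predicated_op(op, src, dest, src_mask, dest_mask):
    """Apply op element by element, skipping masked-out source and
    destination elements independently (hypothetical model)."""
    vl = len(src)
    i = j = 0  # i: source step, j: destination step
    while True:
        # skip over masked-out source elements (the VGATHER half)
        while i < vl and not src_mask[i]:
            i += 1
        # skip over masked-out destination elements (the VSCATTER half)
        while j < vl and not dest_mask[j]:
            j += 1
        if i >= vl or j >= vl:
            break
        dest[j] = op(src[i])
        i += 1
        j += 1
    return dest


src = [10, 20, 30, 40]
dest = [0, 0, 0, 0]
# negate the elements selected by the source mask, placing them at the
# positions selected by the destination mask
result = twin_predicated_op(lambda x: -x, src, dest,
                            [1, 0, 1, 0], [0, 1, 1, 0])
```

Here source elements 0 and 2 are picked up and land in destination elements 1 and 2; everything else in `dest` is left alone.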
there are several variants on this theme of optional arguments of some
description to the base:

    llvm.add(normal_arguments)

where optional_vector_context is an object of some type that itself
contains optional "augmentation" features. i would advocate something
like this:

    llvm.add(normal_arguments,
             source_override_width=<8/16/32/64>,
             dest_override_width=<8/16/32/64>,
             saturation_mode=,
             source_pred=xxx,
             dest_pred=yyyy,
             fail_first_mode,
             swizzle_src=,
             REMAP_schedules=,
             scalar_or_vector_regs=)

can you imagine expanding all of those out into a declared (flat) list
of intrinsics? what that would do to LLVM SVE if we tried?

the "augmentation" list is absolutely massive and starts to give some
idea of why LLVM SVE, as designed, simply won't cope, and why we have
to think about this differently.

the thing is: from the study i've made of other ISAs, i can say with
near 100% certainty that there *will* be a direct map to all of the
existing LLVM SVE intrinsics recently added *and to those of all SIMD
ISAs as well* and, what i expect to happen is that instead of a
massive list of thousands of SIMD intrinsics for e.g. x86, it will
reduce down to a fraction of what is in LLVM x86 backend right now.

in fact, i expect the exact same reduction to occur for *all* Packed
and Predicated SIMD ISAs supported by LLVM. that will have both
reduction in maintenance burden, and it should, in theeoorry, reduce
compile times as well. in theory. in practical terms it depends what
the impact is of the "optional" arguments. hmmm, that will need some
thought.
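The "optional arguments" idea could be sketched roughly as follows. This is purely illustrative Python, not LLVM IR and not any real LLVM API: `VectorContext`, `augmented_add` and every field name are hypothetical, loosely mirroring a few entries of the argument list above. The point it tries to show is that when no context is supplied, the base scalar behaviour is completely untouched.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VectorContext:
    """Hypothetical bundle of optional 'augmentation' features."""
    vl: int = 1
    dest_pred: Optional[List[int]] = None  # None means all elements enabled
    saturate_bits: Optional[int] = None    # None means no saturation


def augmented_add(a, b, ctx: Optional[VectorContext] = None):
    # no context supplied: plain scalar add, exactly as before
    if ctx is None:
        return a + b
    out = []
    for i in range(ctx.vl):
        if ctx.dest_pred is not None and not ctx.dest_pred[i]:
            out.append(None)  # masked-out element: not written in this model
            continue
        r = a[i] + b[i]
        if ctx.saturate_bits is not None:
            # unsigned clamp only, to keep the sketch short
            r = max(0, min(r, (1 << ctx.saturate_bits) - 1))
        out.append(r)
    return out


scalar = augmented_add(2, 3)
vector = augmented_add([250, 1, 2], [10, 1, 2],
                       VectorContext(vl=3, dest_pred=[1, 0, 1],
                                     saturate_bits=8))
```

A back-end that does not understand a given field could, in this model, simply reject or ignore it, which is the "zero impact on other ISAs" property argued for above.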
even the Mill i believe could benefit, from being able to map much
more closely to the actual underlying ISA, which *only* has "ADD" (not
ADD8/16/32/64), because they could potentially add an "auto" or
"implicit" option to the source width / dest width arguments, which
would be much more in line with how the actual ISA itself works
(implicit tagged - polymorphic - registers)

https://millcomputing.com/docs/compiler/

> And you'd also give us one more reason for the scalable extension to exist. :)

:)

as i mentioned at the start, with that list of ISAs, there _do_ exist
other Vector ISAs and actual hardware implementations, out there: NEC
SX-Aurora has been shipping for decades, now - first implementations
were April 1983!

https://en.wikipedia.org/wiki/NEC_SX

here's some background:

https://sx-aurora.github.io/

and yes, they do have a Vector Extension variant of llvm:

https://sx-aurora.github.io/posts/llvm-ve-rv/

i do hope at some point that they come out of the woodwork and
participate in LLVM SVE. and that the product continues to ship, it's
pretty incredible and i am delighted that NEC has had a strong enough
customer base to keep on selling it and maintaining SX-Aurora.

Mitch Alsup's MyISA 66000 will need gcc and llvm at some point, and it
is another ISA with a form of Scalable Vectors - one that has been
specially designed to "thunk" down to simple scalar hardware.

thank you, Renato, for responding, it's given me the opportunity to
explain a bit more in-depth. feel free to cc libre-soc-dev in future,
we don't mind [relevant!] cross-posts.

warmest,

l.

From programmerjake at gmail.com Tue Aug 3 19:21:33 2021
From: programmerjake at gmail.com (Jacob Lifshay)
Date: Tue, 3 Aug 2021 11:21:33 -0700
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

On Tue, Aug 3, 2021, 07:52 Luke Kenneth Casson Leighton wrote:

> drat. fricking gmail HTML Basic mode is barely useable. hit send
> instead of save. grrr.
>
> ok.
> > https://libre-soc.org/openpower/sv/branches/?updated > > i've added the example and created SVP64 hypothetical assembler. ok, 3 issues: 1. CR fields set before a call and used after a call will not work, unless you pick callee-saved fields. icr what ABI we picked, but I expect around half of them to be callee-saved and half to be caller-saved. In all ABIs I've seen, argument registers aren't preserved (you used them to pass the predicates to the functions, then tried to read the same register immediately after the function call, where it is potentially overwritten by the function). 2. some branch instructions are missing commas to separate arguments. 3. the branch at the end that branches back to the top of the while loop needs to either be an unconditional branch to the while loop's test (though basically all compilers don't do that), or replicate the code for testing the condition at the bottom of the loop (what basically all compilers do, iirc). Jacob From lkcl at lkcl.net Tue Aug 3 19:55:21 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Tue, 3 Aug 2021 19:55:21 +0100 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: References: Message-ID: On 8/3/21, Jacob Lifshay wrote: > ok, 3 issues: > 1. CR fields set before a call and used after a call will not work, ... you get the general idea, though: that with sz and SNZ there's a way for predicate masks to interact with the CR Vector, to create Vec-AND and VEC-OR behaviour that, at the same time, still allows CTR the option of counting masked-in elements or all elements. also, early-exit has the "truncate VL" option, so that, hmm, i just realised, that could help with strncpy and strlen. feel free to help edit and correct the syntax errors. l. 
From luke.leighton at gmail.com Wed Aug 4 11:23:54 2021
From: luke.leighton at gmail.com (lkcl)
Date: Wed, 04 Aug 2021 10:23:54 +0000
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

On August 3, 2021 6:55:21 PM UTC, Luke Kenneth Casson Leighton wrote:

>... you get the general idea, though: that with sz and SNZ there's a
>way for predicate masks to interact with the CR Vector, to create
>Vec-AND and VEC-OR behaviour that, at the same time, still allows CTR
>the option of counting masked-in elements or all elements.

sigh.

we've run out of bits, and i have a feeling that it is more useful to
have the option of updating the CR field being tested, taking
predicate masks into account, than it is say to keep the Absolute
Address functionality of branch. AA is something that is only used in
Hypervisor mode, for interrupt tables or OS source, and is otherwise
very much wasted in userspace.

normally it is a Hard Rule that under no circumstances should SVP64
alter the base operation. this is so that when talking about it, and
advocating it, we may state, plainly "base. loop. simple". the moment
the word "except" has to go into that sentence, it will make people
nervous when it comes to adoption.

in this particular case however the entire Branch has to be *replaced*.

thoughts?

l.

From luke.leighton at gmail.com Wed Aug 4 12:44:08 2021
From: luke.leighton at gmail.com (lkcl)
Date: Wed, 04 Aug 2021 11:44:08 +0000
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID: <0CBB2A4D-E459-4461-BB0B-4AF9000E1CC7@gmail.com>

On August 4, 2021 10:23:54 AM UTC, lkcl wrote:

>On August 3, 2021 6:55:21 PM UTC, Luke Kenneth Casson Leighton
> wrote:
>
>>...
you get the general idea, though: that with sz and SNZ there's a
>>way for predicate masks to interact with the CR Vector, to create
>>Vec-AND and VEC-OR behaviour that, at the same time, still allows CTR
>>the option of counting masked-in elements or all elements.
>
>sigh.
>
>we've run out of bits, and i have a feeling that it is more useful to have the option of updating the CR field being tested, taking predicate masks into account, than it is say to keep the Absolute Address functionality of branch.

i just noticed, AA is (bit 30) only in bc:

    PO  BO  BI  BD  AA  LK
    0   6   11  16  30  31

whereas for bclr there are bits spare:

    PO  BO  BI  ///  BH  XO  LK
    0   6   11  16   19  21  31

thus only bc need have altered behaviour from v3.0B as far as bit
definitions are concerned: bclr may set a new bitfield 16-18.
excellent.

the reason the branch pseudocode has to change is because the loop on
the CR Field Vector must not run to letting LR or other alterations
occur. i.e. we cannot just use the existing bc pseudocode and run it
in a VL loop.

l.

From luke.leighton at gmail.com Wed Aug 4 14:14:40 2021
From: luke.leighton at gmail.com (lkcl)
Date: Wed, 04 Aug 2021 13:14:40 +0000
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID: <121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com>

On August 3, 2021 6:21:33 PM UTC, Jacob Lifshay wrote:

>On Tue, Aug 3, 2021, 07:52 Luke Kenneth Casson Leighton
>wrote:
>
>> drat. fricking gmail HTML Basic mode is barely useable. hit send
>> instead of save. grrr.
>>
>> ok.
>>
>> https://libre-soc.org/openpower/sv/branches/?updated
>>
>> i've added the example and created SVP64 hypothetical assembler.
>
>
>ok, 3 issues:
>1.
CR fields set before a call and used after a call will not work, ah, i just noticed: you may have missed the significance of this: sv.crand CR80.v.SO, CR60.v.GT, CR80.v.LT # if = loop & pred_b f(CR80.v.SO) that's taking the *LT* field from the CRv for b, and ANDing it with the *GT* field for a, and storing it in *a completely separate* CR field (SO). thus whatever f() does there will be no impact. technically, EABI definitions are out of scope at the moment, i would like to get the ISA design right and focus on that, first. Vectors of CRs is not a concept that exists in EABI v2.0 so there is no existing EABI. at some point we have to define one... i would prefer that not to be right now (it is a massive task of its own) regarding overwrite and use of AA for alternative purposes, i realised after some thought that actually, combining the predicate mask with the Vector-Branch-CR test is not appropriate to do inside Branch itself. the example above illustrates why: CR80.v.SO has bits *cleared* where the predicate mask is cleared, and the behaviour of predicate masks in operations is to act on elements where the bits are *set*. altering Branch to cope with these inverted semantics, given its early-out capability, is completely inappropriate. l. From luke.leighton at gmail.com Wed Aug 4 18:45:19 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 04 Aug 2021 17:45:19 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: <121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> References: <121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> Message-ID: <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com> On August 4, 2021 1:14:40 PM UTC, lkcl wrote: >regarding overwrite and use of AA for alternative purposes, i realised >after some thought that actually, combining the predicate mask with the >Vector-Branch-CR test is not appropriate to do inside Branch itself. ... 
but _is_ appropriate for svstep mode, to allow some situations where
you want to know what a REMAP schedule might look like (and to obtain
all the endpoints of all loops in one hit), yet in others you don't
care, you just want to branch/loop.

i've therefore put re-purposing of AA as Rc back in, sigh.
implementations of this are going to be... tricky.

although, just thinking about it: hypothetically, and just like LD/ST,
it may still be possible to use the existing scalar v3.0B instruction,
but "fake" what data it receives. (for Vector LDST in ISACaller i
actually changed the immediate D to contain D*srcstep and other modes.
something similar might be possible with branches. have to see)

l.

From programmerjake at gmail.com Wed Aug 4 19:41:11 2021
From: programmerjake at gmail.com (Jacob Lifshay)
Date: Wed, 4 Aug 2021 11:41:11 -0700
Subject: [Libre-soc-dev] XDC2021
Message-ID:

Phoronix had an article about the XDC talks, I didn't see any Libre-SOC
talks, were we going to submit any?

https://www.phoronix.com/scan.php?page=news_item&px=XDC-2021-Scheduler

Jacob

From luke.leighton at gmail.com Wed Aug 4 19:52:22 2021
From: luke.leighton at gmail.com (lkcl)
Date: Wed, 04 Aug 2021 18:52:22 +0000
Subject: [Libre-soc-dev] XDC2021
In-Reply-To:
References:
Message-ID: <14EC9F6E-988B-451A-B1E1-1D9CA683BAC2@gmail.com>

On August 4, 2021 6:41:11 PM UTC, Jacob Lifshay wrote:
>Phoronix had an article about the XDC talks, I didn't see any Libre-SOC
>talks, were we going to submit any?

i did, however the website said "submissions open" for several weeks,
and only in small letters much further down contained the deadline for
talk submissions.

they've updated their procedures and also included libre-soc-dev in
template notifications.

l.
From luke.leighton at gmail.com Wed Aug 4 23:05:51 2021
From: luke.leighton at gmail.com (lkcl)
Date: Wed, 04 Aug 2021 22:05:51 +0000
Subject: [Libre-soc-dev] [llvm-dev] [RFC] Vector/SIMD ISA Context Abstraction
In-Reply-To:
References:
Message-ID:

On August 3, 2021 5:32:29 PM UTC, Luke Kenneth Casson Leighton wrote:
>(renato thank you for cc'ing, due to digest subscription at the moment)
>
>On Tue, Aug 3, 2021 at 3:25 PM Renato Golin wrote:
>> * For example, some reduction intrinsics were added to address
>bloat, but no target is forced to use them.
>
>excellent. iteration and reduction (including fixed schedule
>paralleliseable reduction) is one of the intrinsics being added to
>SVP64.

apologies to all for the follow-up, i realised i joined iteration and
reduction together as if they were the same concept: they are not.
Iterative Sum when carried out on add of a Vector containing all 1s
results in a Pascal Triangle Vector output.

example of existing hardware that has actual Iteration instructions:
Section 8.15 of SX-Aurora ISA guide, p8-297, the pseudocode for
Iterative Add:

    for (i = 0 to VL-1) {
        Vx(i) ← Vy(i) + Vx(i-1), where Vx(-1)=Sy
    }

where if Vx and Vy are the same register you get the Pascal Triangle
effect.

https://www.hpc.nec/documents/guide/pdfs/Aurora_ISA_guide.pdf

SVP64 does not have this *specifically* added: it is achieved
incidentally by issuing an add where the src and dest registers differ
by one (SVP64 sits on top of a rather large scalar regfile, 128 64 bit
entries)

    sv.add r1, r1, r0

we did however need to add a "reverse gear" (for (i = VL-1 downto 0))
which was needed for ffmpeg's MP3 CODEC ironically to *avoid* the
Pascal Triangle effect (and not need to copy a large batch of
registers instead)

can anyone say if LLVM SVE happened to add Iteration?

l.
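The iteration effect described above can be modelled in a few lines. This is a minimal sketch, assuming a vectorised add simply loops over VL elements in strict element order on one flat register file; the helper name `sv_add` and its `reverse` flag are hypothetical stand-ins for `sv.add r1, r1, r0` and the "reverse gear", not simulator code.

```python
def sv_add(regs, rt, ra, rb, vl, reverse=False):
    """Model of a vectorised add over VL elements:
    regs[rt+i] = regs[ra+i] + regs[rb+i], executed one element at a time."""
    order = range(vl - 1, -1, -1) if reverse else range(vl)
    for i in order:
        regs[rt + i] = regs[ra + i] + regs[rb + i]
    return regs


# forward: each element sees the result just written by the previous one
forward = sv_add([1] * 5, rt=1, ra=1, rb=0, vl=4)

# repeating the forward pass accumulates again on top of the first result
twice = sv_add(list(forward), rt=1, ra=1, rb=0, vl=4)

# reverse: each element reads its lower neighbour before it is overwritten
backward = sv_add([1] * 5, rt=1, ra=1, rb=0, vl=4, reverse=True)
```

Starting from all 1s, the forward pass gives 1, 2, 3, 4, 5 and a second forward pass gives 1, 3, 6, 10, 15, whereas the reverse pass gives 1, 2, 2, 2, 2 because no element ever reads a freshly-written neighbour.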
From luke.leighton at gmail.com Fri Aug 6 10:01:17 2021 From: luke.leighton at gmail.com (lkcl) Date: Fri, 06 Aug 2021 09:01:17 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com> References: <121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com> Message-ID: <13162CF9-9E95-4ACC-80FD-B101602FC7F3@gmail.com> On August 4, 2021 5:45:19 PM UTC, lkcl wrote: >i've therefore put re-purposing of AA as Rc back.in, sigh. i just realised / remembered that there are some spare bits in the RM EXTRA2/3 area that can be used rather than make life hell (and create critical dependencies) in RM MODE. for other instructions including LD/ST the EXTRA2/3 area is often entirely taken up, so trying to reuse it for Mode bits is inappropriate. however Branches are so specific (only 2) that we *know*, from examining the register profile of Branches, that they will not use the high area of EXTRA2/3, or in fact the ELWIDTH area either (which may be a better choice, EXTRA2/3 is quite complex decoding, and adding extra MUXes into it is not something done lightly) i will rework things today. l. 
From luke.leighton at gmail.com Fri Aug 6 12:12:07 2021
From: luke.leighton at gmail.com (lkcl)
Date: Fri, 06 Aug 2021 11:12:07 +0000
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To: <13162CF9-9E95-4ACC-80FD-B101602FC7F3@gmail.com>
References: <121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com> <13162CF9-9E95-4ACC-80FD-B101602FC7F3@gmail.com>
Message-ID: <0E44672E-F2F2-4C03-8C43-60086A2C783D@gmail.com>

| 4 | 5 | 6 | 7 | 19 | 20 | 21  | 22 23  | description       |
| - | - | - | - | -- | -- | --- | ------ | ----------------- |
|ALL|LRu| / | / | 0  | 0  | /   | SNZ sz | normal mode       |
|ALL|LRu| / | / | 0  | 1  | VLI | SNZ sz | VLSET mode        |
|ALL|LRu|BRc| / | 1  | 0  | /   | SNZ sz | svstep mode       |
|ALL|LRu|BRc| / | 1  | 1  | VLI | SNZ sz | svstep+VLSET mode |

that's more like it.

are there any other modes worth considering?

From luke.leighton at gmail.com Fri Aug 6 21:15:28 2021
From: luke.leighton at gmail.com (lkcl)
Date: Fri, 06 Aug 2021 20:15:28 +0000
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To: <0E44672E-F2F2-4C03-8C43-60086A2C783D@gmail.com>
References: <121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com> <13162CF9-9E95-4ACC-80FD-B101602FC7F3@gmail.com> <0E44672E-F2F2-4C03-8C43-60086A2C783D@gmail.com>
Message-ID: <27695B12-7CAA-4AB0-8FEB-EF0E0667AD22@gmail.com>

On August 6, 2021 11:12:07 AM UTC, lkcl wrote:

>are there any other modes worth considering?

i came up with another one: VLSET mode, the truncation occurs EITHER
if the branch succeeded OR if the branch failed, depending on a new
bit VSb (VL is set if branch success)

this allows Vertical-First looping to truncate on failure (exit from a
loop) without needing an extra (non-conditional) branch to do so,

and

it allows Horizontal-First to truncate VL on *success* (branch) or
fail, as appropriate.

other modes:

mapreduce doesn't compute.
it would be better to apply mapreduce separately and do a scalar branch-conditional. saturation is inapplicable to CRs. i am running out of ideas which is a good thing because holy cow is this a powerful instruction. it's not actually complex, it's just that the combination of the options and modes is hugely flexible. l. From hendrik at topoi.pooq.com Sat Aug 7 22:25:11 2021 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Sat, 7 Aug 2021 17:25:11 -0400 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: <27695B12-7CAA-4AB0-8FEB-EF0E0667AD22@gmail.com> References: <121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com> <13162CF9-9E95-4ACC-80FD-B101602FC7F3@gmail.com> <0E44672E-F2F2-4C03-8C43-60086A2C783D@gmail.com> <27695B12-7CAA-4AB0-8FEB-EF0E0667AD22@gmail.com> Message-ID: <20210807212511.t52ojkjyrxkvwp52@topoi.pooq.com> On Fri, Aug 06, 2021 at 08:15:28PM +0000, lkcl wrote: > > > On August 6, 2021 11:12:07 AM UTC, lkcl wrote: > > >are there any other modes worth considering? > > i came up with another one: VLSET mode, the truncation occurs EITHER if the branch succeeded OR if the branch failed, depending on a new bit VSb (VL is set if branch success) > > this allows Vertical-First looping to truncate on failure (exit from a loop) without needing an extra (non-conditional) branch to do so, > > and > > it allows Horizontal-First to truncate VL on *success* (branch) or fail, as appropriate. > > other modes: > > mapreduce doesn't compute. it would be better to apply mapreduce separately and do a scalar branch-conditional. > > saturation is inapplicable to CRs. > > i am running out of ideas which is a good thing because holy cow is this a powerful instruction. it's not actually complex, it's just that the combination of the options and modes is hugely flexible. 
Flexible because it's a small piece of truly functional programming,
single instructions being the functions being passed to the
instruction setting up all these loop structures.

-- hendrik

>
> l.

From luke.leighton at gmail.com Sun Aug 8 00:13:48 2021
From: luke.leighton at gmail.com (lkcl)
Date: Sat, 07 Aug 2021 23:13:48 +0000
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To: <20210807212511.t52ojkjyrxkvwp52@topoi.pooq.com>
References: <121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com> <13162CF9-9E95-4ACC-80FD-B101602FC7F3@gmail.com> <0E44672E-F2F2-4C03-8C43-60086A2C783D@gmail.com> <27695B12-7CAA-4AB0-8FEB-EF0E0667AD22@gmail.com> <20210807212511.t52ojkjyrxkvwp52@topoi.pooq.com>
Message-ID:

On August 7, 2021 9:25:11 PM UTC, Hendrik Boom wrote:

>Flexible because it's a small piece of truly functional programming,
>single instructions
>being the functions being passed to the instruction setting up all
>these loop structures.

pretty much, yes. a vector of tests you want to know:

* AND, OR, NAND or NOR of the tests
* end early and truncate VL to the fail point
  (however in some cases you want *at* the fail point,
  others *before* it)
* for REMAP you might want to know before doing a loop
  what all the inner, middle and outer loop-end points are,
  and use those as predicate masks

etc etc etc.

so what should be a "simple" branch has something like *40* possible
variations!

l.
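To make the "vector of tests" idea concrete, here is a deliberately simplified model. It is a sketch under stated assumptions, not the SVP64 branch pseudocode: the function `vector_branch_test` and its `all_mode`, `vlset` and `inclusive` parameters are hypothetical names, NAND/NOR (inverted tests), predicate masks and CTR are left out, and truncation is always taken at the first failing element.

```python
def vector_branch_test(tests, all_mode, vlset=False, inclusive=False):
    """Combine a vector of per-element test results.

    Returns (taken, new_vl):
      taken  - AND of the tests if all_mode, otherwise OR of the tests
      new_vl - VL, optionally truncated to the first failing element
    """
    vl = len(tests)
    taken = all(tests) if all_mode else any(tests)
    new_vl = vl
    if vlset:
        for i, ok in enumerate(tests):
            if not ok:
                # truncate *at* the fail point (inclusive) or *before* it
                new_vl = i + 1 if inclusive else i
                break
    return taken, new_vl


tests = [True, True, False, True]
and_before = vector_branch_test(tests, all_mode=True, vlset=True)
and_at = vector_branch_test(tests, all_mode=True, vlset=True, inclusive=True)
or_plain = vector_branch_test(tests, all_mode=False)
```

Even this cut-down model has three independent knobs; adding inversion, predication, CTR handling and svstep quickly multiplies the number of combinations, which is the point being made above.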
From luke.leighton at gmail.com Sun Aug 8 14:30:53 2021
From: luke.leighton at gmail.com (lkcl)
Date: Sun, 08 Aug 2021 13:30:53 +0000
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References: <121DFA96-9CAA-45C7-B24D-258D2F62D0DB@gmail.com> <41FC12EC-19C5-4635-94C2-224F264A62AF@gmail.com> <13162CF9-9E95-4ACC-80FD-B101602FC7F3@gmail.com> <0E44672E-F2F2-4C03-8C43-60086A2C783D@gmail.com> <27695B12-7CAA-4AB0-8FEB-EF0E0667AD22@gmail.com> <20210807212511.t52ojkjyrxkvwp52@topoi.pooq.com>
Message-ID:

i've started on the 24 bit RM decoder for BC, combining bits into 2
bit enums with only 3 entries in most cases, quite annoying that, but
it is what it is.

svstep for example:

* disabled
* non Rc mode
* Rc mode

and VLSET:

* disabled
* set to VL
* set to VL-1

also with using both elwidth fields there now has to be a MUX on
element widths, where the selector of that MUX is dependent on whether
the operation is a Branch or not. hmmm. fortunately it is local i.e.
not dependent on SVSTATE.

i nearly made the mistake of making Branch Conditional dependent on
SVSTATE.VerticalFirst Mode, which would have serious adverse
consequences for multi-issue decoding.

this is exactly the same reason why i said "Hard No" to the idea of
making the decoder critically dependent on when SVSTATE.VL==0

if we were designing something that was specifically intended for
non-supercomputer non-multi-issue uses, adding critical dependencies
between SVSTATE and the decoder would be perfectly fine.

l.

From richard.wilbur at gmail.com Sun Aug 8 18:20:08 2021
From: richard.wilbur at gmail.com (Richard Wilbur)
Date: Sun, 8 Aug 2021 10:20:08 -0700
Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions
In-Reply-To:
References:
Message-ID:

> On Aug 8, 2021, at 06:32, lkcl wrote:
>
> i've started on the 24 bit RM decoder for BC, combining bits into 2 bit enums with only 3 entries in most cases, quite annoying that, but it is what it is.
Indeed, realizing that it is not as densely packed as if all the possibilities were used can be vexing, but it leaves room to accommodate one more option if we realize later that something could be an immense improvement with an additional mode. […] > i nearly made the mistake of making Branch Conditional dependent on SVSTATE.VerticalFirst Mode, which would have serious adverse consequences for multi-issue decoding. > > this is exactly the same reason why i said "Hard No" to the idea of making the decoder critically dependent on when SVSTATE.VL==0 > > if we were designing something that was specifically intended for non-supercomputer non-multi-issue uses, adding critical dependencies between SVSTATE and the decoder would be perfectly fine. So I’m envisioning the supercomputer multi-issue decoder loading something like a cache line at a time from memory/cache, starting the decode by determining instruction boundaries (left-to-right cascade, but pretty quick/simple to determine 32-bit or 64-bit), then parallel decode can start on each instruction up to dispatch when hazards from interactions with resources modified by previous instructions need to be taken into account. It is a very cool picture—even cooler because, to the extent they are used, the horizontal and vertical loop/vector modes will relieve a large amount of instruction cache and decoder activity! I suppose dispatch will need to depend on/have a hazard on-SVSTATE (at least VL?) in order to possibly parallelize some vector operations in an implementation-dependent fashion? It seems likely that if VL <= the number of ALU’s that the initial multiplications of a vector dot product could be dispatched in parallel. 
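The "left-to-right cascade" for finding instruction boundaries can be sketched as below. This is a minimal model under a stated assumption: that a 64-bit (prefixed) instruction is recognised by the primary opcode, the top 6 bits of its first 32-bit word, being 1. The constant `PREFIX_PO` and the function names are hypothetical and chosen only for this example.

```python
PREFIX_PO = 1  # assumption: primary opcode 1 marks a 64-bit prefixed instruction


def primary_opcode(word):
    # bits are numbered MSB-first, so the primary opcode is the top 6 bits
    return (word >> 26) & 0x3F


def find_boundaries(words):
    """Left-to-right scan returning (start_index, length_in_words) pairs."""
    out = []
    i = 0
    while i < len(words):
        length = 2 if primary_opcode(words[i]) == PREFIX_PO else 1
        out.append((i, length))
        i += length  # the suffix word is skipped, never tested as a prefix
    return out


# 0x7C...... has primary opcode 31 (32-bit), 0x04...... has primary opcode 1
stream = [0x7C221A14, 0x04000000, 0x7C221A14, 0x7C221A14]
boundaries = find_boundaries(stream)
```

The scan is inherently sequential because whether word N is a prefix depends on whether word N-1 was one, but the per-word test is a single 6-bit compare, and once the boundaries are known every instruction can go to its own decoder in parallel.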
From lkcl at lkcl.net Sun Aug 8 20:11:29 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sun, 8 Aug 2021 20:11:29 +0100 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: References: Message-ID: --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 On Sun, Aug 8, 2021 at 6:20 PM Richard Wilbur wrote: > > > > On Aug 8, 2021, at 06:32, lkcl wrote: > > > > i've started on the 24 bit RM decoder for BC, combining bits into 2 bit enums with only 3 entries in most cases, quite annoying that, but it is what it is. > > Indeed, realizing that it is not as densely packed as if all the possibilities were used can be vexing, but it leaves room to accommodate one more option if we realize later that something could be an immense improvement with an additional mode. as a last resort, yes. the complexity involved of first spotting those brownfield encodings then chaining on them, it gets... yeah. > > if we were designing something that was specifically intended for non-supercomputer non-multi-issue uses, adding critical dependencies between SVSTATE and the decoder would be perfectly fine. > > So I’m envisioning the supercomputer multi-issue decoder loading something > like a cache line at a time from memory/cache, yes, and pushing that into a queue. (often this is not a shift register, just an SRAM but where the address is what moves. exactly like how static-sized queues get implemented in software, with a pointer to head and pointer to tail, you move them on) > starting the decode by determining instruction boundaries (left-to-right cascade, > but pretty quick/simple to determine 32-bit or 64-bit), yes. jacob had a great idea there to use a standard carry-save-propagation algorithm. > then parallel decode can start on each instruction up to dispatch correct. > when hazards from interactions with resources modified by previous instructions need to be taken into account. 
this (dispatch) is where, if you have dependencies on SVSTATE (such as
the VerticalFirst bit, or the idea of having VL==0 mean something
completely different as far as what those 64-bits *actually* mean), it
all goes to hell.

one of the prior instructions in the current "batch" might *change*
VL, or *change* to VerticalFirst Mode. now you want every one of those
parallel decoders to be critically dependent on something that was in
a previous slot??

oink. that's no longer a paralleliseable decoder, is it?

> It is a very cool picture—even cooler because, to the extent they are used,
> the horizontal and vertical loop/vector modes will relieve a large amount of
> instruction cache and decoder activity!

Vertical-First in "batch" mode - i.e. when the hardware has set the VF
"Hint" to a value other than 1, yes. or, if, like in MyISA 66000 by
Mitch Alsup, the hardware can determine through lookahead that it can
parallelise a whole batch (automatically determine the number of
elements in a loop that can be done entirely in parallel)

> I suppose dispatch will need to depend on/have a hazard on-SVSTATE
> (at least VL?)

yes. in Horizontal-First mode, definitely. the actual relationship
between parallelly-decoded instructions and the issued
elements-which-may-be-batched is *not* a linear one.

    decoder1  decoder2  decoder3  decoder4  decoder5  decoder6
    sv.add    sv.mul    setvli 5  sv.sub    ...
    VL=4      VL=4      VL=5      VL=5

the instructions that get issued will be:

    decoder1: QTY 4x ADDs
    decoder2: QTY 4x MULs
    decoder3: QTY 1x change of SVSTATE
    decoder4: QTY 5x SUBs

**ONLY** in the circumstance where all 4 ADDs may be passed straight
through to **ONE** ALU in *ONE* clock cycle will it be possible to
also consider some of the MUL operations.

in the case where that is not possible, let us assume e.g. that there
are 8 potential issue slots, we may issue QTY4 ADDs to the first 4
slots and QTY4 MULs to the next 4.

... errr.... but we have 8-way multi-issue and 8-way parallel decode?
errr what happened to the other 8 decoded instructions? answer: the issue slots are all full, just from the first two instructions. the rest have to wait. this is not a bad thing per se, because execution has just been spammed and is 100% occupied. > in order to possibly parallelize some vector operations in an implementation-dependent fashion? It seems likely that if VL <= the number of ALU’s that the initial multiplications of a vector dot product could be dispatched in parallel. even if VL >= the number of ALUs, the multiplications can still be issued in parallel. it's just that the decoders sit there "zzzzz" and yet we're perfectly happy with that situation because back-end execution is 100% occupied. l. From vklr at vkten.in Mon Aug 9 01:29:04 2021 From: vklr at vkten.in (Veera) Date: Mon, 9 Aug 2021 05:59:04 +0530 Subject: [Libre-soc-dev] Status of Power ASIC Chip sent to TSMC Fab Message-ID: <20210809002902.GA1671@lily.local> Hi, What is the status of OPENPOWER ASIC Chip sent to TSMC 180nm Fab? Has it arrived to Libre-SOC team. Regards, Veera From whygee at f-cpu.org Mon Aug 9 01:31:12 2021 From: whygee at f-cpu.org (whygee at f-cpu.org) Date: Mon, 09 Aug 2021 02:31:12 +0200 Subject: [Libre-soc-dev] Status of Power ASIC Chip sent to TSMC Fab In-Reply-To: <20210809002902.GA1671@lily.local> References: <20210809002902.GA1671@lily.local> Message-ID: <2563e68ff2735262e36666a108670cf5@f-cpu.org> On 2021-08-09 02:29, Veera wrote: > Hi, > > What is the status of OPENPOWER ASIC Chip sent to TSMC 180nm Fab? > > Has it arrived to Libre-SOC team. I doubt : it's going to take many months... and we'd be the first to know ! 
Still, I'm looking forward to more news like everybody :-) > Regards, > Veera yg From luke.leighton at gmail.com Mon Aug 9 16:46:43 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 09 Aug 2021 15:46:43 +0000 Subject: [Libre-soc-dev] [llvm-dev] [RFC] Vector/SIMD ISA Context Abstraction In-Reply-To: References: Message-ID: <402E58C0-81FF-4ED3-9E8A-14741C1665F4@gmail.com> again, apologies, a follow-up: i'd like to keep the conversation going (with everyone). a reminder / summary of the proposal: all basic *scalar* LLVM intrinsics extend with *optional* arguments that provide Vector / SIMD Augmentation Context. the benefit being that the number of intrinsics needed now and in the future in LLVM is dramatically reduced first, a clarification: Renato, you asked if the shuffle capability of LLVM SVE was sufficient: i replied slightly flippantly asking if shuffle-{any-arith-op} existed as a concept (apologies for that). SVP64 does not have shuffle-{any-arith-op} however being targetted at 3D and Video it does have Swizzle and a new concept: REMAP. Swizzle can be applied through prefixing to all source registers. it is well-known in the GPU world, especially how important it is, and does not need describing. REMAP is a completely new concept. an algorithmic "remapping" is applied to the normally sequentially-incrementing Vector Element indices. useful limited easy-to-implement "remappings" are being developed, such as Matrix Schedules (0 3 6 1 4 7 2 5 8) and RADIX-2 FFT/DCT Butterfly Schedules. normally Shuffle is limited to either memory operations or to register MV operations, and both are inherently supported by SVP64 through Vectorisation of base scalar operations: Indexed LD/ST for example. my point is that whilst SVP64 supports the "normal" expected type of Shuffle Operations expected of Vector ISAs (Vector-Indexed-LD, Indexed-Reg-MV) it also has GPU style Swizzle (a limited type of shuffle for short vectors up to length 4) and REMAP. 
thus, there is a case even for adding shuffle-augmentation to base LLVM intrinsics as optional arguments. the one that *is* much more general purpose but was not mentioned except in passing was VGATHER-VSCATTER. in all other Vector ISAs these are usually either memory-only or Reg-MV operations (or both). it's usually done with Predicate Masks. In SVP64, surprise: both VGATHER and VSCATTER are abstracted-out concepts that can apply to almost every operation. this is not possible to do all the time, but when *both* are applied (VGATHER to the source regs or memory, VSCATTER to the dest), we call that "Twin Predication". thus, again, we would propose adding *both* a source predicate mask *and* destination predicate mask to base llvm intrinsics, as optional arguments. the other concept is slightly odd: element-width overrides even on operations where the source registers are specified at a fixed width already. this one i am slightly uncertain about. we have a Mode in SVP64 called "Saturate" which has sub-options Signed and Unsigned. the rules for this took us some time to derive: eventually we realised that the rule has to be that the arithmetic operation appears to take place at *infinite* precision, followed up by truncation to the min/max of the output bitwidth. all other definitions turned out to be problematic in some way (particularly for multiply or power). what i am not certain about is whether it is perfectly sufficient to use standard base LLVM intrinsics, and count on source register type and return type as the SVP64 src width and dest width, and simply add optional arguments for signed/unsigned saturation. however what is clear to me is that there is very little conceptual limit as to what can be added as optional arguments to base intrinsics. it would be up to ISA Maintainers to define what they can provide in hardware.
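to make the Saturate rule above concrete, here is a minimal Python sketch, under the assumption that "truncation to the min/max of the output bitwidth" means clamping the exact result to the representable range. the names `saturate` and `sat_op` are hypothetical and are not taken from the SVP64 specification or from LLVM. Python integers are arbitrary-precision, which stands in for "infinite precision" here.

```python
import operator


def saturate(value, width, signed):
    """Clamp an exact (unbounded) result into the range of a width-bit integer."""
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    return max(lo, min(hi, value))


def sat_op(op, a, b, width, signed):
    """Apply op at unbounded precision, then saturate to the output width."""
    # op(a, b) is exact because Python ints never overflow
    return saturate(op(a, b), width, signed)


if __name__ == "__main__":
    print(sat_op(operator.add, 100, 100, 8, signed=True))   # clamps at the signed 8-bit maximum
    print(sat_op(operator.mul, 16, 16, 8, signed=False))    # clamps at the unsigned 8-bit maximum
```

the point of doing the clamp only once, after the exact result is known, is that multiply (and power) can overshoot the output range by many bits, so any definition that wraps at an intermediate width first would give a different answer.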
i would very much love to hear from other ISA Maintainers as to whether the ISA they are responsible for could benefit from this approach, both in the 3D GPU World as well as standard non-GPU: ARM SVE2, x86, AMDGPU, MIPS, ppc64, SX-Aurora, everyone. SIMD ISAs would have an optional argument specifying the (fixed) length. Cray-style Scalar Vector ISAs would have an optional argument specifying that the length was variable. the invitation is therefore to see if this idea, of adding optional Vectorisation Context to base llvm intrinsics, has merit across the entire LLVM community, and, if it does, what would it look like? key question: what impact would a large number of optional arguments to LLVM base intrinsics have, on performance and memory consumption? would it be beneficial or adverse? i honestly have no idea. another question: if a given ISA does not provide a particular hardware feature (saturation let us say) then should this be declared in some fashion such that LLVM avoids emitting llvm.add(args, sat=signed) OR should the functionality be provided anyway by way of soft-passes behind the scenes? i.e. the lack of hardware saturation would result in IR being emitted that ultimately performed the saturation using multiple assembly operations. given that this latter approach would effectively imply that *all* LLVM IR backends "supported" SIMD and Vectorisation (emulated through IR passes for non-Vector non-SIMD hardware) it would need some serious thought. l. From lkcl at lkcl.net Mon Aug 9 19:09:39 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 9 Aug 2021 19:09:39 +0100 Subject: [Libre-soc-dev] libre-soc server cgroups In-Reply-To: References: Message-ID: On Mon, Aug 2, 2021 at 9:12 AM Luke Kenneth Casson Leighton wrote: > https://contabo.com/en/vps/vps-s-ssd/?image=ubuntu.267&qty=1&contract=1 > > 4 cores, 8 GB RAM, 200 GB SSD for EUR 6, that's pretty damn good. > moving to a different VM however is quite a bit of hassle. 
i wonder if > i can get mythic-beasts to negotiate alternative pricing. Staf, i had a word with mythic-beasts, they explained the difference: contabo are likely to be using "ballooning" (meaning, they allocate more customers than there are actual resources). mythic-beasts *very deliberately* do not do that. also, their peering arrangement with other ISPs provides a guaranteed gigabit-ethernet-level service, and if you look at the bandwidth allocations the "level up" for contabo is a whopping EUR 300+ euros a month. bottom line is, if the libre-soc server gets hammered we can scale it up at not unreasonable prices, whereas contabo will hit limits much earlier and we'd be hit with disproportionately high bills to fix that. the team at mythic-beasts loved what we're doing, so they offered to upgrade us to 2-core 4GB RAM for 12 GBP/m inc VAT (actually 10 GBP but i don't think they were aware of the IPv4 address) i've now listed them as a hosting sponsor on the front page. l. From lkcl at lkcl.net Tue Aug 10 12:48:43 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Tue, 10 Aug 2021 12:48:43 +0100 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: References: Message-ID: hm, i'm starting to implement the SVP64 Branch in ISACaller and ran immediately into an issue. unlike CTR decrementing, combining SVSTEP capability into Branch is actually very complicated: a lot of gates, a lot of state. question: when preparing the *next* SVSTATE (the next src/dest step), do you use the CR bit to test from the *CURRENT* src/dest step, or the NEXT src/dest step? what happens when REMAP is involved? how many gates in the chain are there before you can even determine if the branch should go ahead? CTR you can just subtract one. much as i would like an svstep mode in sv.bc, it's too CISC. annoying. l. 
From luke.leighton at gmail.com Tue Aug 10 13:47:24 2021 From: luke.leighton at gmail.com (lkcl) Date: Tue, 10 Aug 2021 12:47:24 +0000 Subject: [Libre-soc-dev] NLnet cryotoprimitives grant approved In-Reply-To: References: Message-ID: with many thanks to NLnet, the EUR 50,000 grant to research and develop Draft cryptographic primitives and instructions to the newly-open Power ISA has been approved. unlike RISC-V where full transparency and trust is problematic and there are many participants whose interests may not necessarily align, the OpenPOWER initiative, which has been in careful planning for nearly 10 years, is a much less crowded space and, crucially, does not require non-transparent membership of OPF in order to submit ISA RFCs (Requests for Change) [non-OPF members cannot participate in actual ISA WG meetings and certainly cannot vote on RFCs, but they can at least submit them. whereas whilst the RISC-V Foundation's Commercial Confidence Requirements are perfectly reasonable, the blanket secrecy even for submitting RFCs is not] we at Libre-SOC aim to use this process, based on taking apart key strategic cryptographic algorithms back to their mathematical roots, then applying Vector ISA design analysis and seeing what can be created. examples include going back to the fundamental basis of Rijndael, and instead of creating hardcoded custom silicon for MixColumns as is the "normal" practice, adding a generic Galois Field ALU and a generic Matrix Multiply system. another is to design instructions suitable for "big integer math" this in turn means that the resultant ISA would be ideally suited to the experimental development of future cryptographic algorithms for use in securing wallets and other purposes related to blockchain management. 
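as a rough illustration of the "generic Galois Field ALU plus generic Matrix Multiply" idea, here is a minimal Python sketch, not the Libre-SOC design and not a proposed instruction. it assumes the standard Rijndael field GF(2^8) with reduction polynomial 0x11B and the standard MixColumns matrix; the function names `gf256_mul` and `gf_matvec` are hypothetical.

```python
def gf256_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8), reducing by poly (Rijndael's by default)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^n) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly            # reduce back into 8 bits
        b >>= 1
    return result


def gf_matvec(matrix, vec):
    """Generic matrix-vector product where multiply and add are GF(2^8) operations."""
    out = []
    for row in matrix:
        acc = 0
        for m, v in zip(row, vec):
            acc ^= gf256_mul(m, v)
        out.append(acc)
    return out


# the Rijndael MixColumns matrix, expressed as ordinary data rather than hardwired logic
MIX_COLUMNS = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]

if __name__ == "__main__":
    column = [0xDB, 0x13, 0x53, 0x45]
    print([hex(x) for x in gf_matvec(MIX_COLUMNS, column)])
```

with the matrix and the polynomial passed in as data, the same two building blocks could in principle serve other Galois-Field-based algorithms, which is the motivation for preferring them over a MixColumns-only block.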
[as bitcoin stands we cannot possibly hope to compete with custom silicon dedicated to SHA hash production, however we would very much like to see a future version of bitcoin that uses far less power yet retains its high strategic value, and, at the same time, like e.g. monero RandomX, is better suited to a general-purpose Vector Supercomputer ISA, which is what we are developing] OpenPOWER's commitment to a transparent RFC process allows us to do that without compromising trust: no discussions that we participate in will ever be behind closed doors. if anyone would be interested to participate or collaborate on this, we have funding available, and welcome involvement in designing and testing an ISA suitable for securing bitcoin for end-users in a fully transparent fashion. l. From lkcl at lkcl.net Tue Aug 10 13:54:55 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Tue, 10 Aug 2021 13:54:55 +0100 Subject: [Libre-soc-dev] Fwd: NLnet cryotoprimitives grant approved In-Reply-To: References: Message-ID: with many thanks to NLnet, the EUR 50,000 grant to research and develop Draft cryptographic primitives and instructions to the newly-open Power ISA has been approved. unlike RISC-V where full transparency and trust is problematic and there are many participants whose interests may not necessarily align, the OpenPOWER initiative, which has been in careful planning for nearly 10 years, is a much less crowded space and, crucially, does not require non-transparent membership of OPF in order to submit ISA RFCs (Requests for Change) [non-OPF members cannot participate in actual ISA WG meetings and certainly cannot vote on RFCs, but they can at least submit them. 
whereas whilst the RISC-V Foundation's Commercial Confidence Requirements are perfectly reasonable, the blanket secrecy even for submitting RFCs is not] we at Libre-SOC aim to use this process, based on taking apart key strategic cryptographic algorithms back to their mathematical roots, then applying Vector ISA design analysis and seeing what can be created. examples include going back to the fundamental basis of Rijndael, and instead of creating hardcoded custom silicon for MixColumns as is the "normal" practice, adding a generic Galois Field ALU and a generic Matrix Multiply system. another is to design instructions suitable for "big integer math" this in turn means that the resultant ISA would be ideally suited to the experimental development of future cryptographic algorithms for use in securing wallets and other purposes related to blockchain management. [as bitcoin stands we cannot possibly hope to compete with custom silicon dedicated to SHA hash production, however we would very much like to see a future version of bitcoin that uses far less power yet retains its high strategic value, and, at the same time, like e.g. monero RandomX, is better suited to a general-purpose Vector Supercomputer ISA, which is what we are developing] OpenPOWER's commitment to a transparent RFC process allows us to do that without compromising trust: no discussions that we participate in will ever be behind closed doors. if anyone would be interested to participate or collaborate on this, we have funding available, and welcome involvement in designing and testing an ISA suitable for securing bitcoin for end-users in a fully transparent fashion. l. 
--- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 From programmerjake at gmail.com Tue Aug 10 18:10:25 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Tue, 10 Aug 2021 10:10:25 -0700 Subject: [Libre-soc-dev] Fwd: NLnet cryotoprimitives grant approved In-Reply-To: References: Message-ID: On Tue, Aug 10, 2021, 05:55 Luke Kenneth Casson Leighton wrote: > with many thanks to NLnet, the EUR 50,000 grant to research and > develop Draft cryptographic primitives and instructions to the > newly-open Power ISA has been approved. Yay! I told Phoronix since I think they would deem this sufficiently newsworthy. Jacob From lkcl at lkcl.net Tue Aug 10 18:25:10 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Tue, 10 Aug 2021 18:25:10 +0100 Subject: [Libre-soc-dev] Fwd: NLnet cryotoprimitives grant approved In-Reply-To: References: Message-ID: On Tue, Aug 10, 2021 at 6:10 PM Jacob Lifshay wrote: > Yay! > I told Phoronix since I think they would deem this sufficiently newsworthy. ah yeah that's a good idea. primarily i wanted to see if there's anyone in the bitcoin community interested in this. there was a company i'd been speaking to who wanted to do something based on RISC-V. rather sheepishly i had to explain to them the conflict between "making things transparent and public" and "the way ISA mods are done in RISC-V". they believed that they could do the modifications as a custom extension, which they probably can... except they end up being the permanent maintainers of a hard fork of gcc, llvm, binutils, u-boot, linux kernel, libc6 and so on. oops. l. 
From programmerjake at gmail.com Wed Aug 11 13:14:47 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Wed, 11 Aug 2021 05:14:47 -0700 Subject: [Libre-soc-dev] Fwd: NLnet cryotoprimitives grant approved In-Reply-To: References: Message-ID: On Tue, Aug 10, 2021, 10:32 Luke Kenneth Casson Leighton wrote: > On Tue, Aug 10, 2021 at 6:10 PM Jacob Lifshay > wrote: > > > Yay! > > I told Phoronix since I think they would deem this sufficiently > newsworthy. > > ah yeah that's a good idea. > https://www.phoronix.com/scan.php?page=news_item&px=Libre-SoC-Crypto-Project Jacob Lifshay > From luke.leighton at gmail.com Thu Aug 12 13:21:44 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 12 Aug 2021 12:21:44 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing Message-ID: since adding Vertical-First Mode, which is very cool, a lot simpler to add into compilers, and closer to Mitch Alsup's MyISA 66000 Virtual Vectors, the implications have taken some time to sink in. VF Mode does *not* increment srcstep/dststep automatically on running an instruction: srcstep/dststep *remain where they are*. an explicit instruction, svstep, is called to increment src/dststep, then a branch-conditional test of whether VL has been reached, loop back on a BATCH of instructions to do the next element(s). the next logical evolution on that is: do you allow just the one element per instruction to be executed? or do you allow up to a certain explicit set limit? in Mitch Alsup's MyISA 66000 it is entirely up to the hardware to determine and decide that "batch size". the idea being: for very simple hardware, the batch size (number of elements executed per instruction) is definitely one. this means that the VVM Loop is basically very similar to Power ISA Branch CTR automatic decrementing. this is also the "fallback" position for complex hardware if it cannot determine it can do multiple elements safely. 
more complex hardware in MyISA 66000 can use OoO in-flight buffers. the caveat: the VVM loop has to be short enough that the engine can analyse the entire loop (a couple of cache lines), and determine that even memory accesses inside the loop are "safe", and thus determine the element batch size, which, obviously, has to be fixed for the ENTIRE loop. (it's no good executing 3 elements of the vector for the first instruction then doing 5 for the next, you are guaranteed data corruption that way) the limitations: you can't do branches inside the loop, you can't call functions, and the only way to get Vectors per se is to use memory LD/STs. for most situations this is perfectly fine, for us it's not. also, critically relying on an OoO engine to determine the batch size, i am not happy with that. so the initial idea is, to have a "Batch Hint" size, very similar to VL. the compiler informs the hardware "you can safely do up to this many elements per instruction, please tell me exactly how many you CAN do". ironically you should recognise that as the EXACT same rules for Cray Vectors setvl! here's where it gets complicated, given how far along we are. i initially thought, "we need a new hint SPR, like VL and MAXVL, called VFHintLen". this hint would be completely separate from VL and MVL, still within the limits of VL and MVL. VFHintLen <= VL <= MVL and you execute batches of length VFHintLen until hitting VL however what i have just come to realise is: actually, VFHintLen is redundant.... *if VL is made to do its job*. 
in Horizontal-First Mode we have:

* MVL set to max reservation (statically determined by compiler)
* VL set dynamically at runtime to explicit value
* loops go from 0 to VL-1

in VF Mode currently it is:

* MVL set to max reservation (statically determined by compiler)
* VL set dynamically at runtime to explicit value
* VFHint *requested* but is set to hw limit
* VFHint elements are run in batches limited by VL

example, MVL=12, VL=10, VFH=3

* first time round the loop, elements 0 1 2 are executed in parallel
* svstep called, src/dststep incremented by VFHint (3)
* second loop, elements 3 4 5 executed in parallel
* svstep called, src/dst incremented to 6
* third loop, elements 6 7 8 executed in parallel
* svstep called, src/dst incremented to 9
* fourth loop, ONLY element 9 executed because VL=10
* svstep sets CR0 to 1 to indicate "src/dst exceeds VL"
* Branch-Conditional fails, loop is exited

notice how MVL was wasted, there?

what i *believe* we may be able to do is: do without VFHint and use *VL and MVL instead*. example, in Vertical-First mode:

* MVL would be set to 10 (as an immediate)
* VL would be *requested* to be set to a given dynamic value, but would be set to a value that HARDWARE determines it can cope with
* proceed same as above but src/dst step test against **VL** not VFHint and
* svstep tests a limit against **MVL** not VL.

basically all testing of the limit of src/dststep right now is:

    if srcstep < VL
        srcstep increments

i propose this change to:

    if HorizontalFirst
        if srcstep < VL
            srcstep increments
    else if VerticalFirst
        if srcstep < *MAXVL*
            srcstep increments

questions, comments?

l.
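the MVL=12, VL=10, VFH=3 walk-through above can be sketched in a few lines of Python. this is only a model of the stepping behaviour described in the email, not simulator code: the function name `vf_batches` and its parameters `limit` and `batch` are hypothetical. under the current scheme `limit` plays the role of VL and `batch` the role of VFHint; under the proposed scheme `limit` would be MVL and `batch` the hardware-chosen VL.

```python
def vf_batches(limit, batch):
    """List the element indices executed on each pass of a Vertical-First loop.

    limit: the value srcstep is tested against (VL today, MVL in the proposal)
    batch: elements executed per instruction per pass (VFHint today, VL in the proposal)
    """
    if batch <= 0:
        raise ValueError("batch must be at least 1")
    passes = []
    step = 0
    while step < limit:
        # the final pass may be short: it is limited by `limit`
        passes.append(list(range(step, min(step + batch, limit))))
        step += batch  # models svstep advancing src/dststep by the batch size
    return passes


if __name__ == "__main__":
    for n, elements in enumerate(vf_batches(limit=10, batch=3), start=1):
        print("pass", n, "elements", elements)
```

calling `vf_batches(10, 3)` gives four passes, the last containing only element 9, which matches the "fourth loop ONLY element 9" step in the example. a `batch` of 1 degenerates to one element per pass, the simple-hardware fallback.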
From richard.wilbur at gmail.com Thu Aug 12 22:37:16 2021 From: richard.wilbur at gmail.com (Richard Wilbur) Date: Thu, 12 Aug 2021 15:37:16 -0600 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing Message-ID: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> > On Aug 12, 2021, at 06:23, lkcl wrote: > > since adding Vertical-First Mode, which is very cool, a lot simpler to add into compilers, and closer to Mitch Alsup's MyISA 66000 Virtual Vectors, the implications have taken some time to sink in. Very cool indeed. Sounds like Mitch Alsup’s MyISA 66000 design would be very interesting reading. Is there public documentation? It is interesting to me how reminiscent this is of my proposal back in 1988-1990 of a massively serial machine that would decode a section of code and configure connections between functional units and data dependencies. Then it would go run the code limited only by the timing of data availability. > VF Mode does *not* increment srcstep/dststep automatically on running an instruction: srcstep/dststep *remain where they are*. an explicit instruction, svstep, is called to increment src/dststep, then a branch-conditional test of whether VL has been reached, loop back on a BATCH of instructions to do the next element(s). What if svstep was a state associated with the branch instruction in the Finite State Machine implementing Vertical-First Mode instead of requiring a separate op code, cache space, and a decode slot? Is svstep used outside of the Vertical-First Mode context? […] > i propose this change to: > > if HorizontalFirst > if srcstep < VL > srstsep increments > else if VerticalFirst > if srcstep < *MAXVL* > srcstep increments > > questions, comments? Sounds like a good thing. 
From luke.leighton at gmail.com Thu Aug 12 23:14:48 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 12 Aug 2021 22:14:48 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing In-Reply-To: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> References: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> Message-ID: On August 12, 2021 9:37:16 PM UTC, Richard Wilbur wrote: >Very cool indeed. Sounds like Mitch Alsup’s MyISA 66000 design would >be very interesting reading. Is there public documentation? ah no. you can however email Mitch (check comp.arch newsgroup) and request it. > >It is interesting to me how reminiscent this is of my proposal back in >1988-1990 of a massively serial machine that would decode a section of >code and configure connections between functional units and data >dependencies. Then it would go run the code limited only by the timing >of data availability. intriguing >What if svstep was a state associated with the branch instruction in >the Finite State Machine implementing Vertical-First Mode instead of >requiring a separate op code, cache space, and a decode slot? Is >svstep used outside of the Vertical-First Mode context? yes it is. sv.step (a Horizontal version of single-step) can be used to obtain a Vector of Condition Registers, where each CR Field contains whether a given src step is part of a "loop end condition". let us say that VL=4 and you call sv.step. (Rc=1): the result will be that CR0=0 CR1=0 CR2=0 CR3=1 because VL=4, and the end condition of the loop 0..VL-1 terminates at CR3, so CR3 gets a "1". it gets more complex when REMAP is involved: there you can extract the end-points of the inner, middle *and* outer REMAP loop end-conditions. e.g. if you use MATRIX remap, a 2x2 matrix: CR0=0b00 CR1=0b01 CR2=0b10 CR3=0b11 >[…] >> i propose this change to: >> >> if HorizontalFirst >> if srcstep < VL >> srcstep increments >> else if VerticalFirst >> if srcstep < *MAXVL* >> srcstep increments >> >> questions, comments?
> >Sounds like a good thing. my only concern is, should MVL be restricted to an immediate (for VFirst mode) or should it be allowed to be set via a register (RA). whilst the logic behind making MVL compile-time static for Horizontal Mode is obvious, i haven't got my head round Vertical Mode yet. l. From luke.leighton at gmail.com Fri Aug 13 00:22:56 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 12 Aug 2021 23:22:56 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing In-Reply-To: References: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> Message-ID: <056BE990-F94D-4E50-B9AF-D5769A9B25E8@gmail.com> On August 12, 2021 10:14:48 PM UTC, lkcl wrote: >On August 12, 2021 9:37:16 PM UTC, Richard Wilbur > wrote: >>What if svstep was a state associated with the branch instruction in >>the Finite State Machine implementing Vertical-First Mode instead of >>requiring a separate op code, cache space, and a decode slot? forgot to say, the svstep instruction has a lot more options than sv.bc, and there are not enough bits available spare in 24 bit RM. also the number of registers that go in and out of bc is already really high: in: * SVSTATE * CIA * LR * CTR * CR out: * SVSTATE * NIA * CTR * LR that's a hell of a lot of registers. an svstep variant of bc would also need to write to CR. that's *ten* registers, 5 read, 5 write, i don't think any other instruction in the whole of Power ISA has anywhere near that many. l. 
From luke.leighton at gmail.com Fri Aug 13 16:14:43 2021 From: luke.leighton at gmail.com (lkcl) Date: Fri, 13 Aug 2021 15:14:43 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing In-Reply-To: References: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> Message-ID: <52A9B46C-43E4-4625-AC6C-C58B3E62FA21@gmail.com> On August 12, 2021 10:14:48 PM UTC, lkcl wrote: > > >On August 12, 2021 9:37:16 PM UTC, Richard Wilbur > wrote: >>> i propose this change to: >>> >>> if HorizontalFirst >>> if srcstep < VL >>> srcstep increments >>> else if VerticalFirst >>> if srcstep < *MAXVL* >>> srcstep increments >>> >>> questions, comments? >> >>Sounds like a good thing. > >my only concern is, should MVL be restricted to an immediate (for >VFirst mode) or should it be allowed to be set via a register (RA). > >whilst the logic behind making MVL compile-time static for Horizontal >Mode is obvious, i haven't got my head round Vertical Mode yet.

Horizontal-First, you perform these types of loops:

    setmaxvli 8
loop:
    setvl r5, r3              # VL=r5=MIN(MVL, r3)
    sv.ld r20.v, r4(0)        # load VL elements (max 8)
    sv.addi r20.v, r20.v, 55  # add 55 to all vector
    sv.st r20.v, r4(0)        # store VL elements
    add r4, r4, r5            # move r4 pointer forward
    sub. r3, r3, r5           # decrement total count by VL
    bnz loop

this will always do 8 elements at a time until r3 drops below 8.

VerticalFirst you insert a *second inner loop* with an svstep instruction just before the bnz but also, at the moment, rather than just setmaxvli 8 it is:

    setmaxvvlandvfhint 8, 2   # MVL=8, VFHint=2

if the hardware *chooses* to set VFHint=2, then we will always have 2 elements at a time in the inner loop, until srcstep reaches VL

    setmaxvvlandvfhint 8, 2   # MVL=8, VFHint=2
loop:
    setvl r5, r3              # VL=r5=MIN(MVL, r3)
loopinner:
    sv.ld r20.v, r4(0)        # load VLhint elements (max 2)
    sv.addi r20.v, r20.v, 55  # add 55 to 2 elements
    sv.st r20.v, r4(0)        # store VLhint elements
    svstep.                   # srcstep += VLhint
    bnz loopinner             # repeat until srcstep=VL
    # now done VL elements, move to next batch
    add r4, r4, r5            # move r4 pointer forward
    sub. r3, r3, r5           # decrement total count by VL
    bnz loop

the question is, then: can we get rid of the inner loop? and if we do can anything useful be done? i have a feeling, looking at this assembler, that VLhint genuinely serves a different purpose *in addition* to VL and MAXVL. (btw aside: svstep+bnz was why i wanted a step-and-test branch conditional instruction but it's too CISC)

l.

From madan.kartheessan at gmail.com Fri Aug 13 18:17:12 2021 From: madan.kartheessan at gmail.com (Madan Kartheessan) Date: Fri, 13 Aug 2021 22:47:12 +0530 Subject: [Libre-soc-dev] General Introduction Message-ID: Hello all: Good evening, I am from Chennai (formerly, Madras) , India. I work for Object Automation Software Solutions Private Limited. My title is Techno Project Manager. I also teach Python, its libraries like Pandas, Numpy, Scikit-learn, etc. and Machine Learning algorithms. I am happy to join the libre-soc-dev list. Happy weekend. Regards Madan K. From lkcl at lkcl.net Fri Aug 13 18:21:34 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Fri, 13 Aug 2021 18:21:34 +0100 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: (ccing you, Madan, as you are not yet subscribed) On Fri, Aug 13, 2021 at 6:17 PM Madan Kartheessan < madan.kartheessan at gmail.com> wrote: > Hello all: > Good evening, I am from Chennai (formerly, Madras) , India. I work for > Object Automation Software Solutions Private Limited. My title is Techno > Project Manager. I also teach Python, its libraries like Pandas, Numpy, > Scikit-learn, etc. and Machine Learning algorithms. > fantastic, great to hear from you Madan. that's very interesting to hear that you have extensive knowledge of numpy. I am happy to join the libre-soc-dev list.
> ok so here, at this page, fill in your email address, name (if you want) and you can leave the password field blank, one will be created and emailed to you: http://lists.libre-soc.org/mailman/listinfo/libre-soc-dev i approved this message you sent to the list, it is better if you subscribe yourself, then you can receive all the messages sent to the list. best, l. From umbertocerrato at outlook.it Fri Aug 13 18:46:13 2021 From: umbertocerrato at outlook.it (Umberto Cerrato) Date: Fri, 13 Aug 2021 17:46:13 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: Hello, Welcome From luke.leighton at gmail.com Fri Aug 13 21:48:23 2021 From: luke.leighton at gmail.com (lkcl) Date: Fri, 13 Aug 2021 20:48:23 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode, batch processing In-Reply-To: <52A9B46C-43E4-4625-AC6C-C58B3E62FA21@gmail.com> References: <1FAAD960-18E2-406B-88D4-96613BF7949E@gmail.com> <52A9B46C-43E4-4625-AC6C-C58B3E62FA21@gmail.com> Message-ID: <8F813409-2548-446F-B4E6-E44429A1142B@gmail.com> On August 13, 2021 3:14:43 PM UTC, lkcl wrote:

>Horizontal-First, you perform these types of loops:
>
>    setmaxvli 8
>loop:
>    setvl r5, r3              # VL=r5=MIN(MVL, r3)
>    sv.ld r20.v, r4(0)        # load VL elements (max 8)
>    sv.addi r20.v, r20.v, 55  # add 55 to all vector
>    sv.st r20.v, r4(0)        # store VL elements
>    add r4, r4, r5            # move r4 pointer forward
>    sub. r3, r3, r5           # decrement total count by VL
>    bnz loop

oo, oo, i just had an idea.

    setvlc r5                 # VL=r5=MIN(MVL, CTR)
    ...
    ...
    add r4, r4, r5
    sv.bnz/VLCTR              # subtracts VL from CTR

SVSTATE is *already* going into sv.bc so it is not a hardship to subtract VL from CTR. this reduces critical inner loops by one instruction and frees up a GPR. using CTR for loops is normal in Power ISA anyway. doesn't help with VFHint though.
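the CTR idea above can be modelled in a few lines of Python. this is a sketch under two stated assumptions: that `setvlc` caps VL at MVL (i.e. takes the smaller of MVL and CTR, which is what "8 elements at a time until r3 drops below 8" implies), and that `sv.bnz/VLCTR` subtracts VL from CTR and loops while CTR is non-zero. `setvlc` and `sv.bnz/VLCTR` are only ideas floated in this email, and the function name `strip_mine` is hypothetical.

```python
def strip_mine(total, mvl):
    """Return the VL chosen on each pass of a CTR-driven strip-mined loop."""
    if mvl <= 0:
        raise ValueError("mvl must be at least 1")
    ctr = total
    vls = []
    while ctr > 0:
        vl = min(mvl, ctr)  # models setvlc: VL is capped at MVL (assumption)
        vls.append(vl)
        ctr -= vl           # models sv.bnz/VLCTR: CTR -= VL, loop again if non-zero
    return vls


if __name__ == "__main__":
    print(strip_mine(total=20, mvl=8))
```

the model shows why no separate count register is needed: CTR alone carries the remaining element count, and the final pass simply runs with a shorter VL.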
l From lkcl at lkcl.net Sat Aug 14 22:57:55 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sat, 14 Aug 2021 22:57:55 +0100 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Fri, Aug 13, 2021 at 6:21 PM Luke Kenneth Casson Leighton wrote: > (ccing you, Madan, as you are not yet subscribed) > Madan, i checked the list mailing list membership, and you are subscribed... .... but you sent the intro from a *completely different* email address (one that you have *not* subscribed with - the gmail account) this was why i received a moderation request for a post from a non-member. there are a couple of solutions here: 1) subscribe the *gmail* account as well and set "nomail" (otherwise you receive 2 copies of the list posts) 2) set up "send as a 2nd email address" on gmail, *and remember to use it*. honestly, i just do (1) because sometimes i forget. what you wrote as an introduction is perfect to put on the team page http://libre-soc.org/about_us best, l. From programmerjake at gmail.com Sun Aug 15 06:46:01 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Sat, 14 Aug 2021 22:46:01 -0700 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Fri, Aug 13, 2021, 10:18 Madan Kartheessan wrote: > Hello all: Welcome! Always glad to have more people interested in Libre-SOC! Jacob Lifshay From lkcl at lkcl.net Sun Aug 15 17:24:31 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sun, 15 Aug 2021 17:24:31 +0100 Subject: [Libre-soc-dev] [RFC] SVP64 on branch instructions In-Reply-To: References: Message-ID: ok so there are now two first unit tests, "sv.bc" and "sv.bc/all". 
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_bc.py;h=5378040085995813070ccaa9cbe28a1add9a5e81;hb=c3b9973df8edcb1f6c1583c2da693336af7d1921#l80 sv.bc/all makes a bit of a mess of the pseudocode, it's a Finite State Machine where ISACaller is calling the sv.bc operation once per element. in the case of sv.bc/all it is necessary to branch *ONLY* when *ALL* tests are successful... but the tests actually need to be done. normal bc pseudocode: if cond_ok then branch sv.bc pseudocode: if cond_ok and last element in VL loop: branch it's more complex than that, though. if NOT last element and cond NOT ok terminate entire VL loop with early-out non-ALL mode (ANY mode) is more straightforward, but again, on branch, you must not continue to do further tests! so branch definitely terminates the VL loop... ... but in Vertical-First Mode it's a completely different story, much more like a standard scalar branch. there's an awful lot going on, quite fascinating. l. From lkcl at lkcl.net Sun Aug 15 19:10:59 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sun, 15 Aug 2021 19:10:59 +0100 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Sat, Aug 14, 2021 at 10:57 PM Luke Kenneth Casson Leighton wrote: > Madan, https://libre-soc.org/irclog/%23libre-soc.2021-08-15.log.html#t2021-08-15T17:18:09 great! that looks like a successful join of the #libre-soc IRC channel. normally, one would leave the IRC client running in order for people to notice, and respond, and say hello (hold a conversation). this is why i said that you should leave the IRC client running 24x7, or use "bnc4you" or other IRC proxy. if you log on, ask a question, then leave immediately, how can you receive the answer? 
it is like going into a room where people are having conversations, making an announcement, then walking out without waiting for anyone to turn around :) IRC conversations can often take 36 hours round-trip because people are in different timezones. you need to adjust expectations accordingly. l. From lkcl at lkcl.net Mon Aug 16 14:09:42 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 16 Aug 2021 14:09:42 +0100 Subject: [Libre-soc-dev] OA minutes 2021 aug 10 In-Reply-To: References: Message-ID: Madan, hi, i saw the update to the minutes, which is great, moving the contents that had been added to a page dedicated to another set of minutes (with a different date) https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=1fa1211c74a3b3dc4878bfdf7c479a79ac7b6a47 normal practice would be to then do a follow-up email informing everyone that that action has been taken. now that everyone is subscribed to the mailing list, you can use the mailing list for that purpose: "This is Madan: I have updated the minutes of the meeting that took place on Aug 10th, please can everyone read and review them" so, to illustrate by example, here is a summary of what i have done: 1) made some whitespace and formatting corrections to both minutes pages 2) created a new (template) page for the meeting tomorrow, https://libre-soc.org/oa/minutes/2021aug17/ given that you (and you alone) have sent an email to the list, and also joined (briefly) the IRC channel, we will go ahead with tomorrow's meeting, and we will allocate some time for you to explain to everyone else what you did and how you did it. i do expect everyone in the team to complete these tasks promptly, they are extremely basic, fundamental, and absolutely critical. projects are not about the code, they're about the communication. l. 
From niranjan at object-automation.com Mon Aug 16 16:57:47 2021 From: niranjan at object-automation.com (niranjan at object-automation.com) Date: Mon, 16 Aug 2021 11:57:47 -0400 Subject: [Libre-soc-dev] General Introduction Message-ID: <60c1dcfef7436bb25aaa577d5741d081.squirrel@email.powweb.com> Hello all, I am Niranjan, from Kerala, India. I am a third year B.Tech student at Indian Institute of Technology Madras (IITM), doing a project at Object Automation Software Solutions Pvt Ltd. I'm happy to join the libre-soc-dev mailing list. Thanks and regards, Niranjan J Nair From niranjan at object-automation.com Mon Aug 16 16:57:48 2021 From: niranjan at object-automation.com (niranjan at object-automation.com) Date: Mon, 16 Aug 2021 11:57:48 -0400 Subject: [Libre-soc-dev] General Introduction Message-ID: Hello all, I am Niranjan, from Kerala, India. I am a third year B.Tech student at Indian Institute of Technology Madras (IITM), doing a project at Object Automation Software Solutions Pvt Ltd. I'm happy to join the libre-soc-dev mailing list. Thanks and regards, Niranjan J Nair From luke.leighton at gmail.com Mon Aug 16 17:19:52 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 16:19:52 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> On August 16, 2021 3:57:48 PM UTC, niranjan at object-automation.com wrote: >Hello all, > >I am Niranjan, from Kerala, India. I am a third year B.Tech student at >Indian Institute of Technology Madras (IITM), doing a project at Object >Automation Software Solutions Pvt Ltd. >I'm happy to join the libre-soc-dev mailing list. fantastic, great to hear from you, welcome. have you reviewed the Charter and are you happy to abide by it? http://libre-soc.org/charter any questions about it feel free to ask. also do edit the wiki page and add yourself to http://libre-soc.org/about_us. best, l. 
From madan.kartheessan at gmail.com Mon Aug 16 17:17:04 2021 From: madan.kartheessan at gmail.com (Madan Kartheessan) Date: Mon, 16 Aug 2021 21:47:04 +0530 Subject: [Libre-soc-dev] Issues with madan.kartheessan@gmail.com Message-ID: Hi: I have subscribed using both of my email IDs to the libre-soc-dev list. 1) madan at object-automation.com 2) madan.kartheessan at gmail.com I am sending this mail using madan at object-automation.com. But, I am not able to see *"madan.kartheessan at gmail.com "* listed under the subscribers list of libre-soc-dev. I am able to see "madan at object-automation.com" When I try to login using madan.kartheessan at gmail.com and give the password , I get the message "*Libre-soc-dev roster authentication failed."* Regards Madan K. From luke.leighton at gmail.com Mon Aug 16 17:21:27 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 16:21:27 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <60c1dcfef7436bb25aaa577d5741d081.squirrel@email.powweb.com> References: <60c1dcfef7436bb25aaa577d5741d081.squirrel@email.powweb.com> Message-ID: <986A9949-B7FE-4BD8-AD70-5696F21564FE@gmail.com> On August 16, 2021 3:57:47 PM UTC, niranjan at object-automation.com wrote: >Hello all, > >I am Niranjan, from Kerala, India. I am a third year B.Tech student at >Indian Institute of Technology Madras (IITM), doing a project at Object >Automation Software Solutions Pvt Ltd. >I'm happy to join the libre-soc-dev mailing list fantastic, great to hear from you as well, Niranjan. next steps are to review the Charter and reply if you are happy to abide by it or if you have any questions. best, l. From gautham at object-automation.com Mon Aug 16 17:31:34 2021 From: gautham at object-automation.com (gautham at object-automation.com) Date: Mon, 16 Aug 2021 12:31:34 -0400 Subject: [Libre-soc-dev] General Introduction Message-ID: Hello Everyone! I am Gautham, from Kerala, India. 
I am a pre-final year student at the Department of Electrical Engineering, Indian Institute of Technology Madras. I have some experience with Gate Level Design and Verilog. I am also familiar with C, C++ and Python. I also fool around with popular Machine Learning algorithms and packages. I also went through the charter and found it very interesting! Of course, I agree to abide by it. Very happy to be part of the libre-soc community! Regards Gautham From libre-soc at platen-software.de Mon Aug 16 17:32:56 2021 From: libre-soc at platen-software.de (Tobias Platen) Date: Mon, 16 Aug 2021 18:32:56 +0200 Subject: [Libre-soc-dev] daily kan-ban update 16aug2021 Message-ID: <51b21ccea7ac4e1af1bf93261f6ab3a3dd8f24f6.camel@platen-software.de> today: continuing where I left two weeks ago From lkcl at lkcl.net Mon Aug 16 17:34:58 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 16 Aug 2021 17:34:58 +0100 Subject: [Libre-soc-dev] Issues with madan.kartheessan@gmail.com In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021 at 5:20 PM Madan Kartheessan < madan.kartheessan at gmail.com> wrote: > Hi: > I have subscribed using both of my email IDs to the libre-soc-dev list. > > 1) madan at object-automation.com > 2) madan.kartheessan at gmail.com i checked the subscriber list, it's not a member. there are however two messages in the exim4 logs to that email address, one in and one out, one at 16:58 (40 mins ago) and one at 17:20 (10 mins ago). try the subscription again. l. From lkcl at lkcl.net Mon Aug 16 17:48:31 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 16 Aug 2021 17:48:31 +0100 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021 at 5:32 PM wrote: > Hello Everyone! > > I am Gautham, from Kerala, India. I am a pre-final year student at the > Department of Electrical Engineering, Indian Institute of Technology > Madras.
I have some experience with Gate Level Design and Verilog. I am > also familiar with C, C++ and Python. nice! I also fool around with popular > Machine Learning algorithms and packages. > very cool. do put all of that on this page https://libre-soc.org/about_us/ i created a section "Object Automation". > I also went through the charter and found it very interesting! Of course, > I agree to abide by it. Very happy to be part of the libre-soc community! > :) great to hear. if you can generate an ssh key i can add you to the gitolite3 git access. instructions for generating the ssh key are in https://libre-soc.org/HDL_workflow/ search for "ssh-keygen". l. From programmerjake at gmail.com Mon Aug 16 18:05:01 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Mon, 16 Aug 2021 10:05:01 -0700 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021, 08:58 wrote: > Hello all, > > I am Niranjan, from Kerala, India. I am a third year B.Tech student at > Indian Institute of Technology Madras (IITM), doing a project at Object > Automation Software Solutions Pvt Ltd. > Welcome! Jacob Lifshay From programmerjake at gmail.com Mon Aug 16 18:06:50 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Mon, 16 Aug 2021 10:06:50 -0700 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021, 09:32 wrote: > Hello Everyone! > > I am Gautham, from Kerala, India. I am a pre-final year student at the > Department of Electrical Engineering, Indian Institute of Technology > Madras. Welcome! I have some experience with Gate Level Design and Verilog. I am > also familiar with C, C++ and Python. I also fool around with popular > Machine Learning algorithms and packages. > Neat!
Jacob Lifshay From madan.kartheessan at gmail.com Mon Aug 16 18:51:38 2021 From: madan.kartheessan at gmail.com (Madan Kartheessan) Date: Mon, 16 Aug 2021 23:21:38 +0530 Subject: [Libre-soc-dev] =?utf-8?q?Minutes_of_the_Meeting=E2=80=94August_6?= =?utf-8?q?_and_10?= Message-ID: Luke, David and the Object Automation team Using the link below, you will be able to read the August 6 and August 10 MoMs of Libre-SoC and the Object Automation teams.. https://libre-soc.org/oa/minutes/ Regards Madan K. From gautham at object-automation.com Mon Aug 16 18:52:17 2021 From: gautham at object-automation.com (gautham at object-automation.com) Date: Mon, 16 Aug 2021 13:52:17 -0400 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: <56454d111c41b22063c827c0a0705d42.squirrel@email.powweb.com> Hi, I have updated the "About Us" page. I am also sending my public ssh key. Gautham From libre-soc at platen-software.de Mon Aug 16 18:58:18 2021 From: libre-soc at platen-software.de (Tobias Platen) Date: Mon, 16 Aug 2021 19:58:18 +0200 Subject: [Libre-soc-dev] daily kan-ban update 16aug2021 In-Reply-To: <51b21ccea7ac4e1af1bf93261f6ab3a3dd8f24f6.camel@platen-software.de> References: <51b21ccea7ac4e1af1bf93261f6ab3a3dd8f24f6.camel@platen-software.de> Message-ID: <20210816195818.55667402c25ad5ec0f2387da@platen-software.de> On Mon, 16 Aug 2021 18:32:56 +0200 Tobias Platen wrote: > today: continuing where I left two weeks ago this includes fixing the renamed symbols. 
I get an AttributeError in the store function:

def store(dut, src1, src2, src3, imm, imm_ok=True, update=False,
          byterev=True):
    print("ST", src1, src2, src3, imm, imm_ok, update)
    yield dut.oper_i.insn_type.eq(MicrOp.OP_STORE)
    yield dut.oper_i.data_len.eq(2)  # half-word
    yield dut.oper_i.byte_reverse.eq(byterev)
    yield dut.src1_i.eq(src1)
    yield dut.src2_i.eq(src2)
    yield dut.src3_i.eq(src3)
    #FIXME -- symbols have been renamed --
    #orig yield dut.oper_i.imm_data.imm.eq(imm)
    #orig yield dut.oper_i.imm_data.ok.eq(imm_ok)
    #orig yield dut.oper_i.update.eq(update)
    yield dut.oper_i.imm_data.data.eq(imm)
    yield dut.oper_i.imm_data.ok.eq(imm_ok)
    #error here: yield dut.oper_i.update.eq(update)
    #AttributeError: Record 'oper_i_None' does not have a field 'update'.
    #Did you mean one of: insn_type, fn_unit, imm_data, zero_a, rc, oe,
    #msr, is_32bit, is_signed, data_len, byte_reverse, sign_extend,
    #ldst_mode, insn, sv_pred_sz, sv_pred_dz, sv_saturate, sv_ldstmode, SV_Ptype
    yield dut.issue_i.eq(1)
    yield
    yield dut.issue_i.eq(0)

>
>
>
>
> _______________________________________________
> Libre-soc-dev mailing list
> Libre-soc-dev at lists.libre-soc.org
> http://lists.libre-soc.org/mailman/listinfo/libre-soc-dev

--
Tobias Platen

From madan.kartheessan at gmail.com Mon Aug 16 19:03:52 2021 From: madan.kartheessan at gmail.com (Madan Kartheessan) Date: Mon, 16 Aug 2021 23:33:52 +0530 Subject: [Libre-soc-dev] MoM of Libre-SoC and the Object Automation teams Message-ID:

Luke

Thanks for putting the links in appropriate MoM and also formatting the content. It looks amazing now.

Regards
Madan K.

From adigopzz3 at gmail.com Mon Aug 16 19:18:56 2021 From: adigopzz3 at gmail.com (Adithya Gopan) Date: Mon, 16 Aug 2021 23:48:56 +0530 Subject: [Libre-soc-dev] General Introduction Message-ID:

Hello everyone,

I am Adithya Gopan from Kerala, India. I am a third year BTech student majoring in Electrical Engineering, studying in Indian Institute of Technology Madras.
I am currently doing a project at Object Automation. I am very excited to join this libre-soc-dev mailing list and I am very excited to join the libre-soc-dev mailing list. I hope to have a really cool and interesting ride. From luke.leighton at gmail.com Mon Aug 16 19:33:38 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 18:33:38 +0000 Subject: [Libre-soc-dev] MoM of Libre-SoC and the Object Automation teams In-Reply-To: References: Message-ID: <7DE09405-ABF2-4F94-A888-FB8B9C1BAF25@gmail.com> On August 16, 2021 6:03:52 PM UTC, Madan Kartheessan wrote: >Luke > >Thanks for putting the links in appropriate MoM and also formatting the >content. It looks amazing now. i did the simplest thing for now, which is to use ``` to make it fixed-width font. if you read up on markdown format (google it) you can make future ones look even better. l. From luke.leighton at gmail.com Mon Aug 16 19:35:59 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 18:35:59 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <56454d111c41b22063c827c0a0705d42.squirrel@email.powweb.com> References: <56454d111c41b22063c827c0a0705d42.squirrel@email.powweb.com> Message-ID: <43FA1787-DCD9-4F2F-A08C-D9654E6AECDD@gmail.com> On August 16, 2021 5:52:17 PM UTC, gautham at object-automation.com wrote: >Hi, > >I have updated the "About Us" page. saw that, looks great https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=67a8e09b081506318780d4b6b0b849e4380d5a61 >I am also sending my public ssh key. great, send that to me (luke.leighton at gmail.com) it does not need to go to the list. l. From adithya at object-automation.com Mon Aug 16 19:23:00 2021 From: adithya at object-automation.com (adithya at object-automation.com) Date: Mon, 16 Aug 2021 14:23:00 -0400 Subject: [Libre-soc-dev] General Introduction Message-ID: <4bb393863ed37e27698498aa92857793.squirrel@email.powweb.com> Hello everyone, I am Adithya Gopan from Kerala, India. 
I am a third year BTech student majoring in Electrical Engineering, studying in Indian Institute of Technology Madras. I am currently doing a project at Object Automation. I am very excited to join this libre-soc-dev mailing list and I am very excited to join the libre-soc-dev mailing list. I hope to have a really cool and interesting ride. From luke.leighton at gmail.com Mon Aug 16 19:41:49 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 18:41:49 +0000 Subject: [Libre-soc-dev] daily kan-ban update 16aug2021 In-Reply-To: <20210816195818.55667402c25ad5ec0f2387da@platen-software.de> References: <51b21ccea7ac4e1af1bf93261f6ab3a3dd8f24f6.camel@platen-software.de> <20210816195818.55667402c25ad5ec0f2387da@platen-software.de> Message-ID: <3CDEBA34-5753-4608-942F-154B14F8BC78@gmail.com> On August 16, 2021 5:58:18 PM UTC, Tobias Platen wrote: >On Mon, 16 Aug 2021 18:32:56 +0200 >Tobias Platen wrote: > >> today: continuing where I left two weeks ago >this includes fixing the renamed symbols. I get an AttributeError in >the store function: > store function of which file? >def store(dut, src1, src2, src3, imm, imm_ok=True, update=False, > byterev=True): > #orig yield dut.oper_i.update.eq(update) > yield dut.oper_i.imm_data.data.eq(imm) > yield dut.oper_i.imm_data.ok.eq(imm_ok) > #error here: yield dut.oper_i.update.eq(update) > #AttributeError: Record 'oper_i_None' does not have a field 'update'. this *might* now be ldst_mode, i think. but it is not a True/False it is an enum. you'll need to check the Record. recursive grep ldst_mode. l. From programmerjake at gmail.com Mon Aug 16 19:41:51 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Mon, 16 Aug 2021 11:41:51 -0700 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021, 11:19 Adithya Gopan wrote: > Hello everyone, > > I am Adithya Gopan from Kerala, India. 
I am a third year BTech student > majoring in Electrical Engineering, studying in Indian Institute of > Technology Madras. > Welcome! I am currently doing a project at Object Automation. I am very excited to > join this libre-soc-dev mailing list and I am very excited to join the > libre-soc-dev mailing list. I hope to have a really cool and interesting ride. > :) Jacob Lifshay From luke.leighton at gmail.com Mon Aug 16 19:46:37 2021 From: luke.leighton at gmail.com (lkcl) Date: Mon, 16 Aug 2021 18:46:37 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: Message-ID: <29B147EC-B882-4F96-8939-2E7CF62239EB@gmail.com> On August 16, 2021 6:18:56 PM UTC, Adithya Gopan wrote: >Hello everyone, > >I am Adithya Gopan from Kerala, India. I am a third year BTech student >majoring in Electrical Engineering, studying in Indian Institute of >Technology Madras. >I am currently doing a project at Object Automation. I am very excited >to >join this libre-soc-dev mailing list and I am very excited to join the >libre-soc-dev mailing list. >I hope to have a really cool and interesting ride. great to hear from you, Adithya. do add yourself to the about us page, i just added a TODO for you https://libre-soc.org/about_us/ just copy what you wrote above, anything else you'd like to add, as well, feel free. also, just as with everyone, review the charter, any questions ask straight away. best, l. From arjun at object-automation.com Mon Aug 16 20:16:43 2021 From: arjun at object-automation.com (arjun at object-automation.com) Date: Mon, 16 Aug 2021 15:16:43 -0400 Subject: [Libre-soc-dev] Hello from Arjun Nag Message-ID: Hello Everyone, Glad to get connected with the members of Libre soc team.
Thanks & regards
Arjun

From lkcl at lkcl.net Mon Aug 16 20:47:22 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 16 Aug 2021 20:47:22 +0100 Subject: [Libre-soc-dev] Hello from Arjun Nag In-Reply-To: References: Message-ID:

On Mon, Aug 16, 2021 at 8:17 PM wrote:

> Hello Everyone,
>
> Glad to get connected with the members of Libre soc team.
>

:)

looks like you got onto #libre-soc IRC channel, but left just after saying hello:

https://libre-soc.org/irclog/%23libre-soc.2021-08-16.log.html#t2021-08-16T20:07:25

you can see from the logs that i said hello back, but you'd already left by that point. this is why i recommended using bnc4you (a persistent IRC proxy) or just leave the irc client active 24x7.

l.

From programmerjake at gmail.com Mon Aug 16 20:51:57 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Mon, 16 Aug 2021 12:51:57 -0700 Subject: [Libre-soc-dev] Hello from Arjun Nag In-Reply-To: References: Message-ID:

On Mon, Aug 16, 2021, 12:17 wrote:

> Hello Everyone,
> Glad to get connected with the members of Libre soc team.
>

Welcome!

Jacob

From arjunpartha99 at gmail.com Mon Aug 16 19:36:42 2021 From: arjunpartha99 at gmail.com (Arjun Nag) Date: Tue, 17 Aug 2021 00:06:42 +0530 Subject: [Libre-soc-dev] Arjun Nag Message-ID:

Hello everyone,
I am here with a quick intro...

• Design Verification Engineer in the field of VLSI
• Hands on Experience in System Verilog, Verilog & VHDL.
• Good Knowledge of UVM (Universal Verification Methodology).
• Good understanding of SoC level Test bench Architecture
• Good Hands-on at RTL - C Co simulation on HLS tool
• Well versed in creating the OVC’s and UVC’s
• Good understanding of Processor boot flows
• Good understanding of Verification flow at SoC level
• Good Knowledge of FPGA, ASIC & SoC Design & Verification Life Cycles.
• Excellent understanding of protocols like PCIe (Gen 2 & 3), UART, SPI and Compression/decompression Engine, QSPI, AMBA AXI, SMB & HTP
• Good Knowledge in both functional and gate level simulations.
• Possess hands on experience in Emulating complex SoC designs, Interface Build-up and Debugging.
• Good understanding of Design Partitioning and Trimming.
• Hands-on Working experience of Mentor Veloce Quattro 2 & MAXIMUS Emulators.
• Mentor Veloce Quattro 2 based Emulation setup and chip compile/run hands on experience
• Experience in Emulating Complex memory controllers and functional IP blocks in In Circuit Emulation (ICE) and TBX Modes.
• Good understanding of differences between RTL and X-RTL.
• Well versed in setting up software top and hardware top in a Top Level Emulation test bench.
• Have good knowledge on protocols like PCIe Gen 2, UART, SPI and Compression/decompression Engine.
• Excellent debugging skills
• Excellent Documentation skills
• Knowledge of Configuration Management tools like Tele logic DOORS and CM Synergy tools.
• Fair Knowledge of IBM rational Tools like Clear Case and Clear Quest
• Familiar with concepts of C Language and Object Oriented Methodology.
• Willing to learn new skills and ability to learn fast.
• Technical support to team members

From lkcl at lkcl.net Mon Aug 16 21:03:46 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 16 Aug 2021 21:03:46 +0100 Subject: [Libre-soc-dev] Arjun Nag In-Reply-To: References: Message-ID:

On Mon, Aug 16, 2021 at 9:01 PM Arjun Nag wrote:

> Hello everyone,
> I am here with a quick intro...
>

cool! you sent this from the gmail account, it went to moderation, so i approved it and have added you to be able to send in future without moderation. if you prefer, subscribe the gmail account as well.

> • Design Verification Engineer in the field of VLSI
> • Hands on Experience in System Verilog, Verilog & VHDL.
>

these are fantastic skills to have, delighted to have you on board.

best,

l.
From libre-soc at platen-software.de Tue Aug 17 18:54:23 2021 From: libre-soc at platen-software.de (Tobias Platen) Date: Tue, 17 Aug 2021 19:54:23 +0200 Subject: [Libre-soc-dev] daily kan-ban update 17aug2021 Message-ID: today: more work on dcbz testcase From niranjan at object-automation.com Tue Aug 17 19:10:55 2021 From: niranjan at object-automation.com (niranjan at object-automation.com) Date: Tue, 17 Aug 2021 14:10:55 -0400 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> Message-ID: > > > On August 16, 2021 3:57:48 PM UTC, niranjan at object-automation.com wrote: >>Hello all, >> >>I am Niranjan, from Kerala, India. I am a third year B.Tech student at >>Indian Institute of Technology Madras (IITM), doing a project at Object >>Automation Software Solutions Pvt Ltd. >>I'm happy to join the libre-soc-dev mailing list. > > fantastic, great to hear from you, welcome. > > have you reviewed the Charter and are you happy to abide by it? > http://libre-soc.org/charter any questions about it feel free to ask. > > also do edit the wiki page and add yourself to > http://libre-soc.org/about_us. Thank you, I have gone through the charter and would be happy to abide by it. I have also added myself to the wiki page and sent you an email with my ssh key. Thanks and regards. From luke.leighton at gmail.com Tue Aug 17 19:22:53 2021 From: luke.leighton at gmail.com (lkcl) Date: Tue, 17 Aug 2021 18:22:53 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> Message-ID: <467FCC64-2611-4791-9159-FEB283015AFB@gmail.com> On August 17, 2021 6:10:55 PM UTC, niranjan at object-automation.com wrote: >Thank you, I have gone through the charter and would be happy to abide >by >it. I have also added myself to the wiki page and sent you an email >with >my ssh key. 
brilliant, will add it shortly once i receive it btw, you remember what i said about "trim context"? you notice how i cut everything above? this you have to do manually. gmail is *not* your friend, here. they have this stupid thing with the 3 dots, which hides reply context. you *must* expand out the *full* message then edit out anything irrelevant, ok? it's explained in HDL_workflow. again, this is just standard netiquette of 30 years standing, for technical mailing lists. l. From libre-soc at platen-software.de Tue Aug 17 19:40:49 2021 From: libre-soc at platen-software.de (Tobias Platen) Date: Tue, 17 Aug 2021 20:40:49 +0200 Subject: [Libre-soc-dev] daily kan-ban update 17aug2021 In-Reply-To: References: Message-ID: <20210817204049.aae88eeee5acc419dd6a4e84@platen-software.de> On Tue, 17 Aug 2021 19:54:23 +0200 Tobias Platen wrote: > today: more work on dcbz testcase found two bugs in src/soc/experiment/compldst_multi.py, the first one is fixed, the second one is more complex > > _______________________________________________ > Libre-soc-dev mailing list > Libre-soc-dev at lists.libre-soc.org > http://lists.libre-soc.org/mailman/listinfo/libre-soc-dev -- Tobias Platen From niranjan at object-automation.com Tue Aug 17 20:15:45 2021 From: niranjan at object-automation.com (niranjan at object-automation.com) Date: Tue, 17 Aug 2021 15:15:45 -0400 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <467FCC64-2611-4791-9159-FEB283015AFB@gmail.com> References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> <467FCC64-2611-4791-9159-FEB283015AFB@gmail.com> Message-ID: <5e0bb24a441b79bed6f164d89661a393.squirrel@email.powweb.com> > btw, you remember what i said about "trim context"? you notice how i cut > everything above? this you have to do manually. > you *must* expand out the *full* message then edit out anything > irrelevant, ok? Sorry, I will keep that in mind and do so from now. Thank you. 
From niranjan at object-automation.com Tue Aug 17 20:15:44 2021 From: niranjan at object-automation.com (niranjan at object-automation.com) Date: Tue, 17 Aug 2021 15:15:44 -0400 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <467FCC64-2611-4791-9159-FEB283015AFB@gmail.com> References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> <467FCC64-2611-4791-9159-FEB283015AFB@gmail.com> Message-ID: > btw, you remember what i said about "trim context"? you notice how i cut > everything above? this you have to do manually. > you *must* expand out the *full* message then edit out anything > irrelevant, ok? Sorry, I will keep that in mind and do so from now. Thank you. From lkcl at lkcl.net Tue Aug 17 21:11:54 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Tue, 17 Aug 2021 21:11:54 +0100 Subject: [Libre-soc-dev] daily kan-ban update 17aug2021 In-Reply-To: <20210817204049.aae88eeee5acc419dd6a4e84@platen-software.de> References: <20210817204049.aae88eeee5acc419dd6a4e84@platen-software.de> Message-ID: On Tue, Aug 17, 2021 at 7:40 PM Tobias Platen wrote: > On Tue, 17 Aug 2021 19:54:23 +0200 > Tobias Platen wrote: > > > today: more work on dcbz testcase > found two bugs in src/soc/experiment/compldst_multi.py, > the first one is fixed, nice. the second one is more complex > Cesar is looking at LD/ST exceptions at the moment, be aware of that, you are both working on the same code. i will run the full test_issuer.py to make sure all's good. l. 
From luke.leighton at gmail.com Wed Aug 18 01:20:21 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 18 Aug 2021 00:20:21 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> <467FCC64-2611-4791-9159-FEB283015AFB@gmail.com> Message-ID: <2FA8C2ED-49E3-478D-8AC6-34AE63804C4A@gmail.com> On August 17, 2021 7:15:44 PM UTC, niranjan at object-automation.com wrote: >> you *must* expand out the *full* message then edit out anything >> irrelevant, ok? > >Sorry, I will keep that in mind and do so from now. hooray, you did it this time :) however (and, again, this is in standard mailing list netiquette of 30 years), generally the "attribution" is left in. the "on date x, y wrote". this is so it can be seen who wrote what, and therefore nobody tries to claim ownership of your work. l. From luke.leighton at gmail.com Wed Aug 18 11:58:46 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 18 Aug 2021 11:58:46 +0100 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> Message-ID: On Tue, Aug 17, 2021 at 7:11 PM wrote: > Thank you, I have gone through the charter and would be happy to abide by > it. I have also added myself to the wiki page and sent you an email with > my ssh key. i took a look, i haven't received it (not in spam). do reply instead with it as an attachment to the list. public ssh keys (id_rsa.pub) are perfectly fine to distribute: the *private* key you absolutely must keep secret. l. From lkcl at lkcl.net Wed Aug 18 12:03:11 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Wed, 18 Aug 2021 12:03:11 +0100 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <56454d111c41b22063c827c0a0705d42.squirrel@email.powweb.com> References: <56454d111c41b22063c827c0a0705d42.squirrel@email.powweb.com> Message-ID: On Mon, Aug 16, 2021 at 6:53 PM wrote: > > Hi, > > I have updated the "About Us" page. 
> I am also sending my public ssh key. gautham, i've added your ssh key, you should now be able to move on to the next phase: https://libre-soc.org/HDL_workflow/#gitolite3_access l. From niranjan at object-automation.com Wed Aug 18 15:38:17 2021 From: niranjan at object-automation.com (niranjan at object-automation.com) Date: Wed, 18 Aug 2021 10:38:17 -0400 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> Message-ID: <34c6a677c472baaaefcd22969a1ee413.squirrel@email.powweb.com> > i took a look, i haven't received it (not in spam). do reply instead with > it as an attachment to the list. public ssh keys (id_rsa.pub) are > perfectly > fine to distribute: the *private* key you absolutely must keep secret. I am attaching the public key herewith. Thanks and regards. From luke.leighton at gmail.com Wed Aug 18 16:30:33 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 18 Aug 2021 16:30:33 +0100 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <34c6a677c472baaaefcd22969a1ee413.squirrel@email.powweb.com> References: <387E632F-AD18-4A95-93CA-DDD510E19216@gmail.com> <34c6a677c472baaaefcd22969a1ee413.squirrel@email.powweb.com> Message-ID: On Wed, Aug 18, 2021 at 3:39 PM wrote: > > > i took a look, i haven't received it (not in spam). do reply instead with > > it as an attachment to the list. public ssh keys (id_rsa.pub) are > > perfectly > > fine to distribute: the *private* key you absolutely must keep secret. > > I am attaching the public key herewith. excellent, i've added it, you should now (like gautham) be able to move to the test access phase: https://libre-soc.org/HDL_workflow/#gitolite3_access as i said to gautham, don't type a password! it is guaranteed to fail. *read the whole section carefully*, multiple times. i've given you both write-access to the ikiwiki, you should be able to clone it with: git clone ssh://gitolite3 at git.libre-soc.org:922/libreriscv.git l. 
From luke.leighton at gmail.com Wed Aug 18 17:14:50 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 18 Aug 2021 16:14:50 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops Message-ID: https://youtu.be/fn2KJvWyBKg whilst doing the video above i encountered a design flaw in Vertical-First batching which needs fixing. Vertical-First is important for scenarios where even with 128 registers there is still not enough space to Vectorise all input, output and temporary regs in a given loop, if done Horizontally. a solution is to have *most* of the input, temp regs and output as Vectorised but some of it be scalars, of course the priority being first on temp regs to be scalars. one crucial example here is the DCT cosine values, which are quite a big table (O(N log N)) and therefore take up considerably more registers if done as Horizontal Mode. instead of pre-calculating the entire table, which itself results in considerably more LDs, and in strip-mining of the L1 Cache, Vertical-First Mode allows each cosine value to be calculated *on demand* as a scalar element, for a SPECIFIC src/dststep at the EXACT moment it is needed. what i wanted to also allow is *batches* of such scalar values to be calculated, but i realise as i am writing this, the concepts "batch" and "scalar" are mutually incompatible by definition. any ideas? l. From adigopzz3 at gmail.com Wed Aug 18 17:34:14 2021 From: adigopzz3 at gmail.com (Adithya Gopan) Date: Wed, 18 Aug 2021 22:04:14 +0530 Subject: [Libre-soc-dev] General Introduction In-Reply-To: <29B147EC-B882-4F96-8939-2E7CF62239EB@gmail.com> References: <29B147EC-B882-4F96-8939-2E7CF62239EB@gmail.com> Message-ID: On Tue, Aug 17, 2021 at 12:16 AM lkcl wrote: > > > On August 16, 2021 6:18:56 PM UTC, Adithya Gopan > wrote: > >Hello everyone, > > > >I am Adithya Gopan from Kerala, India. > > great to hear from you, Adithya.
> do add yourself to the about us page, i just added a TODO for you > https://libre-soc.org/about_us/ just copy what you wrote above, anything > else you'd like to add, as well, feel free. > > also, just as with everyone, review the charter, sny questions ask > straight away. > > > Yah, I am Adithya. I kinda forgot to mention that I did read the charter and do agree to abide by it. Also, I have sent my public ssh key to Luke. I have also updated the wiki page. From luke.leighton at gmail.com Wed Aug 18 17:49:42 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 18 Aug 2021 16:49:42 +0000 Subject: [Libre-soc-dev] General Introduction In-Reply-To: References: <29B147EC-B882-4F96-8939-2E7CF62239EB@gmail.com> Message-ID: <6F6D57C4-2CB6-429A-ACD3-C413F3788A4A@gmail.com> On August 18, 2021 4:34:14 PM UTC, Adithya Gopan wrote: >> Yah, I am Adithya. I kinda forgot to mention that I did read the >charter >and do agree to abide by it. >Also, I have sent my public ssh key to Luke. I have also updated the >wiki >page. fantastic give it 5 mins then you can also check ssh access, i will add you to the wiki ssh access. l. From programmerjake at gmail.com Wed Aug 18 17:53:26 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Wed, 18 Aug 2021 09:53:26 -0700 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: Message-ID: On Wed, Aug 18, 2021, 09:15 lkcl wrote: > instead of pre-calculating the entire table, which itself results in > considerably more LDs, and in strip-mining of the L1 Cache, Vertical-First > Mode allows each cosine value to be calculated *on demand* as a scalar > element, for a SPECIFIC src/dststep at the EXACT moment it is needed. > Even if we get a HW cos pipeline, it will almost always be much faster to load the constant from memory...additionally some codecs may specify using specific rounded values of cos (for repeatability across implementations) and we have to use those exact values, not recalculate our own. 
Jacob From lkcl at lkcl.net Wed Aug 18 18:08:09 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Wed, 18 Aug 2021 18:08:09 +0100 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: Message-ID: On Wed, Aug 18, 2021 at 5:53 PM Jacob Lifshay wrote: > Even if we get a HW cos pipeline, it will almost always be much faster to > load the constant from memory... turns out from an analysis by Mitch Alsup that this is a mistaken assumption. i also believed it to be true until he explained it on comp.arch. for particularly large DCTs (used in ffmpeg) the regularity of the LDs results in regular power-of-two hammering of L1 cache lines so badly that it results in the *L2* cache getting hammered as well. extreme large DCTs and FFTs, you end up strip-mining the L2 cache *as well*. under these circumstances it is imperative to reduce the amount of LDs and computing the cos values on-demand *significantly* speeds up performance by reducing the total number of LDs, compared to assuming that pre-computed tables are "always the best option under all circumstances". l. From libre-soc at platen-software.de Wed Aug 18 19:27:38 2021 From: libre-soc at platen-software.de (Tobias Platen) Date: Wed, 18 Aug 2021 20:27:38 +0200 Subject: [Libre-soc-dev] daily kan-ban update 18aug2021 Message-ID: <725ab3e2836eb96975bd679397a49d88d670c051.camel@platen-software.de> today: debugging src/soc/experiment/compldst_multi.py From luke.leighton at gmail.com Wed Aug 18 20:06:29 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 18 Aug 2021 19:06:29 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: Message-ID: <6B3C8E17-A666-40F2-8F48-37042C839EA3@gmail.com> On August 18, 2021 5:08:09 PM UTC, Luke Kenneth Casson Leighton wrote: >extreme large DCTs and FFTs, you end up strip-mining the L2 cache *as >well*. 
basically, to do large DCT / FFT recursively, you split into two halves, do each half at half the DCT/FFT size, then recombine the results. the further down the recursion depth you first get offsets of 2 for every element, then 4, then 8 etc etc. by the time you get to an offset of 64 you've hit the L1 cache row size, and thereafter EVERY SINGLE LD/ST for the ENTIRE sub-FFT/DCT hits the EXACT same cache line. whoops. Mitch pointed out very plainly and simply that using a reasonably efficient pipelined cos implementation is therefore way faster than hammering L1 and L2 even more than they already are. so for this one example alone it justifies VF Mode's existence. having thought it through i don't think it's going to be possible to do batches. only one element (one srcstep, one dststep) at a time. which in turn makes the VFHint field also kinda unnecessary. l. From richard.wilbur at gmail.com Wed Aug 18 23:02:49 2021 From: richard.wilbur at gmail.com (Richard Wilbur) Date: Wed, 18 Aug 2021 16:02:49 -0600 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops Message-ID: <01E4886F-B4F7-4311-9CEB-C7BE4D14E69C@gmail.com> On Aug 18, 2021, at 13:06, lkcl wrote: > basically, to do large DCT / FFT recursively, you split into two halves, do each half at half the DCT/FFT size, then recombine the results. Each half could use the same scalar coefficients. Seems for a particular size data set that if we are doing recursive sizes of transforms to compute the transforms. 
If they are always related by powers of two then one time calculating the coefficients should be sufficient if we could calculate them and store them either in the order they are used (in a non-destructive FIFO with capability to set a step size) or with an easy scheme to access them via an index, we might at once calculate the coefficients using our vector engine and then use them on each of the subdivisions of the transform below a certain size—avoiding hammering cache and external memory for the coefficients in the process! […] > which in turn makes the VFHint field also kinda unnecessary. If we had such a coefficient cache, I think VFHint could still be useful. From luke.leighton at gmail.com Wed Aug 18 23:16:20 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 18 Aug 2021 22:16:20 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: <01E4886F-B4F7-4311-9CEB-C7BE4D14E69C@gmail.com> References: <01E4886F-B4F7-4311-9CEB-C7BE4D14E69C@gmail.com> Message-ID: On August 18, 2021 10:02:49 PM UTC, Richard Wilbur wrote: >On Aug 18, 2021, at 13:06, lkcl wrote: >> basically, to do large DCT / FFT recursively, you split into two >halves, do each half at half the DCT/FFT size, then recombine the >results. > >Each half could use the same scalar coefficients. could... but remember: FFT of size N you need N coefficients. now you can only hold in regfile half an FFT as if you did it with Vertical-First Mode for DCT it is *N ln N* coefficients needed for a DCT of size N. DCT of size 32 needs 32+16+8+4+2+1 registers for the COS coefficients! we just used the ENTIRE regfile! or... you can use only 1/2 the regfile and do a 64-wide DCT > Seems for a >particular size data set that if we are doing recursive sizes of >transforms to compute the transforms. 
If they are always related by >powers of two then one time calculating the coefficients should be >sufficient if we could calculate them and store them either in the >order they are used (in a non-destructive FIFO with capability to set a >step size) or with an easy scheme to access them via an index, we might >at once calculate the coefficients using our vector engine and then use DCT unfortunately doesn't work that way. in order to complete all butterflies you need, in each row, cos((i+0.5)/n) from i=0..n-1 where n goes up in powers of two per butterfly row. you can share those values *in* a row but unlike an FFT you cannot *reuse* them on a *different* row due to the +0.5 >If we had such a coefficient cache, I think VFHint could still be >useful. interesting idea, to have a special separate cache for coefficients. it is however pretty specialist. if it really becomes really a focus for performance it's worth pursuing. right now issuing cos instructions is "generic". specialist single-purpose instructions make me twitchy. for 3D texture interpolation it's fine / great / obvious payoff. l. From programmerjake at gmail.com Thu Aug 19 00:31:24 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Wed, 18 Aug 2021 16:31:24 -0700 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: <01E4886F-B4F7-4311-9CEB-C7BE4D14E69C@gmail.com> Message-ID: On Wed, Aug 18, 2021, 15:17 lkcl wrote: > > > On August 18, 2021 10:02:49 PM UTC, Richard Wilbur < > richard.wilbur at gmail.com> wrote: > >On Aug 18, 2021, at 13:06, lkcl wrote: > >> basically, to do large DCT / FFT recursively, you split into two > >halves, do each half at half the DCT/FFT size, then recombine the > >results. > > > >Each half could use the same scalar coefficients. > > could... but remember: FFT of size N you need N coefficients. 
now you can > only hold in regfile half an FFT as if you did it with Vertical-First Mode > > for DCT it is *N ln N* coefficients needed for a DCT of size N. DCT of > size 32 needs 32+16+8+4+2+1 registers for the COS coefficients! > > we just used the ENTIRE regfile! > well, you still need the registers for cos coefficients if you either load them from memory or if you compute them with a cos instruction... Jacob From richard.wilbur at gmail.com Thu Aug 19 00:50:41 2021 From: richard.wilbur at gmail.com (Richard Wilbur) Date: Wed, 18 Aug 2021 17:50:41 -0600 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops Message-ID: > On Aug 18, 2021, at 16:17, lkcl wrote: > >> On August 18, 2021 10:02:49 PM UTC, Richard Wilbur wrote: > >> Each half [of FFT] could use the same scalar coefficients. > > could... but remember: FFT of size N you need N coefficients. now you can only hold in regfile half an FFT as if you did it with Vertical-First Mode That’s why I proposed a coefficient cache. >> Seems for a >> particular size data set that if we are doing recursive sizes of >> transforms to compute the transforms. If they are always related by >> powers of two then one time calculating the coefficients should be >> sufficient if we could calculate them and store them either in the >> order they are used (in a non-destructive FIFO with capability to set a >> step size) or with an easy scheme to access them via an index, we might >> at once calculate the coefficients using our vector engine and then use > > DCT unfortunately doesn't work that way. in order to complete all butterflies you need, in each row, cos((i+0.5)/n) from i=0..n-1 where n goes up in powers of two per butterfly row. > > you can share those values *in* a row but unlike an FFT you cannot *reuse* them on a *different* row due to the +0.5 That stinks for the DCT. Thanks for reminding me! How much coefficient sharing could be done on a single row? 
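A small Python sketch of the point being discussed here: FFT rows can share coefficients, DCT rows cannot. It is illustrative only. The function names are made up for this note, the angles are kept as exact fractions instead of being passed through cos, the DCT angle (i + 0.5)/n for i = 0..n-1 is the one quoted in the thread, and the FFT twiddle angle k/n is the usual radix-2 convention, which is an assumption here and is not stated in the thread.

```python
from fractions import Fraction

def fft_angles(n):
    """Twiddle angles k/n (fractions of a full turn) for a size-n radix-2 FFT (assumed convention)."""
    return {Fraction(k, n) for k in range(n // 2)}

def dct_angles(n):
    """DCT butterfly angles (i + 0.5)/n for i = 0..n-1, as quoted in the thread."""
    return {Fraction(2 * i + 1, 2 * n) for i in range(n)}

# FFT: the half-size row only needs angles the full-size row already has
fft_shared = fft_angles(4) <= fft_angles(8)

# DCT: rows of different size have no angle in common, because of the +0.5
dct_shared = dct_angles(4) & dct_angles(8)

print(fft_shared)   # True
print(dct_shared)   # set()
```

With n a power of two, (2i+1)/(2n) is always in lowest terms with denominator 2n, so two rows of different size can never produce the same angle. Sharing within a single row is not affected by this.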
>> If we had such a coefficient cache, I think VFHint could still be >> useful. > > interesting idea, to have a special separate cache for coefficients. it is however pretty specialist. if it really becomes really a focus for performance it's worth pursuing. > > right now issuing cos instructions is "generic". specialist single-purpose instructions make me twitchy. I agree about specialist single-purpose instructions unless we can make a good case for how such instructions would be clearly superior performance-wise for an important algorithm! > for 3D texture interpolation it's fine / great / obvious payoff. If FFT turns out to win handsomely with said coefficient cache, I would suggest we implement a status register for the cache that stores relevant info. like characteristics of the stored coefficients: n, …. Then when we prepare to perform another FFT we can quickly check whether we can reuse the coefficients or need to recalculate them. An MRI (Magnetic Resonance Imaging) workload would likely use a large number of same-sized FFTs to reconstruct the 2-D slice images. From luke.leighton at gmail.com Thu Aug 19 01:13:56 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 19 Aug 2021 00:13:56 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: Message-ID: On August 18, 2021 11:50:41 PM UTC, Richard Wilbur wrote: >> you can share those values *in* a row but unlike an FFT you cannot >*reuse* them on a *different* row due to the +0.5 > >That stinks for the DCT. yes it does. > Thanks for reminding me! How much >coefficient sharing could be done on a single row? each row being a butterfly goes double the distance and therefore needs half the coefficients compared to the previous row. 
* when you have 16 values you need twin mul-add on 8 values and therefore 8 coefficients * next row you have 2 batches of 8 values therefore the 4 coefficients can be used 2x * next row you have 4 batches of 4 values, 2 coefficients can be used 4x * next row 8 batches of 2 values, the 1 coefficient can be used 8x 8 4 2 1 on batch sizes and num coefficients 1 2 4 8 times reuse on coefficients >I agree about specialist single-purpose instructions unless we can make >a good case for how such instructions would be clearly superior >performance-wise for an important algorithm! if the cache was purely transparent i.e. spotted input values and returned a lookup that would need no ISA augmentation. only thing being: multiple lookups of 64 bit FP numbers is one hell of a lot of gates. 10 gates per XOR bit, 640 gates per reg, you want say 32 cache entries, that's 20k gates just in lookups let alone DFFs. that's likely competing with an efficient COS pipeline for gate count >An MRI (Magnetic Resonance Imaging) workload would likely use a large >number of same-sized FFTs to reconstruct the 2-D slice images. FFT you can do a couple of tricks, one of which is a Matrix variant of FFT. * FFT small rows * transpose * FFT small col.. errr now rows again * transpose something like that. if it's a square matrix you can reuse the same coefficients. l. From luke.leighton at gmail.com Thu Aug 19 01:19:58 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 19 Aug 2021 00:19:58 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: <01E4886F-B4F7-4311-9CEB-C7BE4D14E69C@gmail.com> Message-ID: On August 18, 2021 11:31:24 PM UTC, Jacob Lifshay wrote: >well, you still need the registers register singular. quantity ONE. > for cos coefficients if you either >load >them from memory or if you compute them with a cos instruction... 
to reiterate what i've said throughout the whole thread, many times: * one register (QTY ONE) for the cos coefficient in VF Mode vs * (N ln N) registers for HF Mode DCT COS coeffs. this because in Horizontal Mode the *entire* triple-loop butterfly is computed in one single instruction, and there is no other option but to have the entire coefficient set in regs [it is possible to do one row at a time but please let's not complicate the discussion] breakdown: * cost in registers and memory for HF variant: - N ln N registers for cos coefficients - N registers for input - N ln N LDs of coefficients from memory - N LDs for input - N STs for output total: - N + (N ln N) regs - 3N + (N ln N) memory accesses * cost in regs and mem for VF: - ONE scalar reg for cos coeff - N regs for input - ZERO LDs for coeffs - N LDs for input - N STs for output total: - N + 1 regs - 2N memory accesses it is therefore blindingly obvious that when COS can be done efficiently in hardware that it significantly reduces resource utilisation to use VF Mode. this is the total opposite of "normal" processors which often don't even have a hardware COS instruction and consequently the cost of calculating COS far exceeds even the worst strip-mining scenarios. l. From programmerjake at gmail.com Thu Aug 19 02:04:02 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Wed, 18 Aug 2021 18:04:02 -0700 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: <01E4886F-B4F7-4311-9CEB-C7BE4D14E69C@gmail.com> Message-ID: On Wed, Aug 18, 2021, 17:20 lkcl wrote: > > On August 18, 2021 11:31:24 PM UTC, Jacob Lifshay < > programmerjake at gmail.com> wrote: > > >well, you still need the registers > > register singular. quantity ONE. > i completely spaced that you were talking about vertical-first mode...oops! if you only are running one element per inner loop, it makes me think this won't be any faster than scalar code ... not a good look. 
I watched the video you made earlier, and it mostly matches what I expected from vertical-first mode. Jacob From richard.wilbur at gmail.com Thu Aug 19 02:46:59 2021 From: richard.wilbur at gmail.com (Richard Wilbur) Date: Wed, 18 Aug 2021 19:46:59 -0600 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops Message-ID: > On Aug 18, 2021, at 18:13, lkcl wrote: >> On August 18, 2021 11:50:41 PM UTC, Richard Wilbur wrote: >> How much coefficient sharing could be done on a single row? > > each row being a butterfly goes double the distance and therefore needs half the coefficients compared to the previous row. > > * when you have 16 values you need twin mul-add on 8 values and therefore 8 coefficients > * next row you have 2 batches of 8 values therefore the 4 coefficients can be used 2x > * next row you have 4 batches of 4 values, 2 coefficients can be used 4x > * next row 8 batches of 2 values, the 1 coefficient can be used 8x > > 8 4 2 1 on batch sizes and num coefficients > 1 2 4 8 times reuse on coefficients Is this for FFT? Very cool, I suspected it would be pretty good reuse. I wasn’t specific enough when I asked, “How much coefficient reuse in a particular row?” I meant to ask concerning the DCT since it isn’t an option to share coefficients between rows in that algorithm. >> I agree about specialist single-purpose instructions unless we can make >> a good case for how such instructions would be clearly superior >> performance-wise for an important algorithm! > > if the cache was purely transparent i.e. spotted input values and returned a lookup that would need no ISA augmentation. > > only thing being: multiple lookups of 64 bit FP numbers is one hell of a lot of gates. 10 gates per XOR bit, 640 gates per reg, you want say 32 cache entries, that's 20k gates just in lookups let alone DFFs. 
> > that's likely competing with an efficient COS pipeline for gate count Except that the input numbers are rationals with a common denominator for a particular row in DCT. I think we could effectively store them with a particular structure based on the denominator, indexed with the integer count along the row. More of a coefficient array/RAM than cache (your usage of this term was more loaded than mine, I simply was referring to a convenient place to stow the numbers where we could easily and quickly get them back when needed). >> An MRI (Magnetic Resonance Imaging) workload would likely use a large >> number of same-sized FFTs to reconstruct the 2-D slice images. > > FFT you can do a couple of tricks, one of which is a Matrix variant of FFT. > > * FFT small rows > * transpose > * FFT small col.. errr now rows again > * transpose > > something like that. > > if it's a squa Did you mean to describe the case where the matrix is square? From richard.wilbur at gmail.com Thu Aug 19 03:00:42 2021 From: richard.wilbur at gmail.com (Richard Wilbur) Date: Wed, 18 Aug 2021 20:00:42 -0600 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: Message-ID: <632C595C-3E0C-4A1E-9FB6-E208108FFE3A@gmail.com> > On Aug 18, 2021, at 18:20, lkcl wrote: […] > * cost in registers and memory for HF variant: > > - N ln N registers for cos coefficients > - N registers for input > - N ln N LDs of coefficients from memory > - N LDs for input > - N STs for output > > total: > > - N + (N ln N) regs > - 3N + (N ln N) memory accesses by my count that adds to 2N + (N ln N) memory accesses > * cost in regs and mem for VF: > > - ONE scalar reg for cos coeff > - N regs for input > - ZERO LDs for coeffs > - N LDs for input > - N STs for output > > total: > > - N + 1 regs > - 2N memory accesses Hence the excess is around (N ln N) for both registers and memory accesses. Still non-trivial overhead. 
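The tallies quoted above can be written out as a short Python sketch, using the corrected 2N figure for Horizontal-First memory accesses. This is an illustration under stated assumptions, not a measurement: the function names are hypothetical, the coefficient count is passed in as a parameter because the thread gives more than one figure for it, and (N/2) * log2(N) is used in the example only as one of those figures.

```python
def dct_coeff_uses(n):
    """(N/2) * log2(N), one coefficient count given in the thread (N a power of two)."""
    log2n = n.bit_length() - 1
    return (n // 2) * log2n

def hf_cost(n, coeffs):
    """Horizontal-First: coefficients are loaded from memory and held in registers."""
    return {"regs": n + coeffs, "mem": 2 * n + coeffs}

def vf_cost(n):
    """Vertical-First: one scalar coefficient register, value computed on demand."""
    return {"regs": n + 1, "mem": 2 * n}

n = 16
hf = hf_cost(n, dct_coeff_uses(n))
vf = vf_cost(n)
print(hf)  # {'regs': 48, 'mem': 64}
print(vf)  # {'regs': 17, 'mem': 32}
```

The difference between the two rows is the coefficient count, less the one scalar register, which is the excess described above.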
From abhisheksharma at object-automation.com Thu Aug 19 06:42:32 2021 From: abhisheksharma at object-automation.com (Abhishek Sharma) Date: Thu, 19 Aug 2021 11:12:32 +0530 Subject: [Libre-soc-dev] =?utf-8?q?=28no_subject=29?= Message-ID: Hi All, I am Abhishek Sharma, from Kanpur, the northern part of India. Having skillsets of digital circuit designs, know Python, C. Excited to contribute to this project and work with awesome developers. From sreerekha at object-automation.com Thu Aug 19 06:43:36 2021 From: sreerekha at object-automation.com (sreerekha at object-automation.com) Date: Thu, 19 Aug 2021 01:43:36 -0400 Subject: [Libre-soc-dev] Introduction Message-ID: I am Sree Rekha K P from Karnataka, India. I am a pre final year student and I'm pursuing my BTech in electronics and communication engineering. I am quite good at Verilog, microcontroller 8051, Digital electronics and basics C. From luke.leighton at gmail.com Thu Aug 19 11:43:22 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 19 Aug 2021 10:43:22 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: Message-ID: <57B487F3-3B1F-4F36-AD77-E36E08CD7B03@gmail.com> On August 19, 2021 1:46:59 AM UTC, Richard Wilbur wrote: > >> On Aug 18, 2021, at 18:13, lkcl wrote: >> 8 4 2 1 on batch sizes and num coefficients >> 1 2 4 8 times reuse on coefficients > >Is this for FFT? no, DCT. N ln N coefficients actually N/2 ln N > Very cool, I suspected it would be pretty good reuse. it's not. RADIX2 FFT on the other hand there are N coefficients and you *can* reuse them, for row 2 you jump every other coefficient for all 2 sub-crossbars, for row 3 you jump every 4th coefficient for sub-sub-crossbars but they are the same N coefficients. 
for DCT the same thing happens as far as jumping is concerned and in-row reuse, but because of the i+0.5 they are NOT THE SAME IN EACH ROW >I wasn’t specific enough when I asked, “How much coefficient reuse in a >particular row?” I meant to ask concerning the DCT since it isn’t an >option to share coefficients between rows in that algorithm. and i answered as per your question. >Except that the input numbers are rationals with a common denominator >for a particular row in DCT. I think we could effectively store them >with a particular structure based on the denominator, indexed with the >integer count along the row. More of a coefficient array/RAM than >cache (your usage of this term was more loaded than mine, I simply was >referring to a convenient place to stow the numbers where we could >easily and quickly get them back when needed). indeed... at the cost of designing and adding yet more instructions, this time with an extremely small probability that they can be put to use elsewhere. the butterfly REMAP schedule is generic and i can foresee it being used elsewhere. >Did you mean to describe the case where the matrix is square? as a special case yes although implementations i've seen try to do at least one dimension as power-two then use bernstein convolution for the other. even with a power of 2 you may end up with e.g. 128 (2^7), which is an odd power of 2, i.e. not square: it breaks into 2^3 x 2^4 l. From akshara at object-automation.com Thu Aug 19 06:43:09 2021 From: akshara at object-automation.com (akshara at object-automation.com) Date: Thu, 19 Aug 2021 01:43:09 -0400 Subject: [Libre-soc-dev] (no subject) Message-ID: <88c55b00dfa81e65b82bd819e3a1047e.squirrel@email.powweb.com> I am Akshara, from Karnataka, India. I am a pre final year student and I'm pursuing my BTech in electronics and communication engineering.
I have a basic knowledge in verilog, C and python From madan at object-automation.com Thu Aug 19 13:10:52 2021 From: madan at object-automation.com (Madan Kartheessan) Date: Thu, 19 Aug 2021 17:40:52 +0530 Subject: [Libre-soc-dev] =?utf-8?q?MoM_of_Object_Automation_and_Libre-SoC?= =?utf-8?b?4oCUQXVndXN0IDE3?= Message-ID: Luke, David and the Object Automation team Please use the below link to access the MoM of Object Automation and Libre-SoC .https://libre-soc.org/oa/minutes/2021aug17/ Regards Madan K From luke.leighton at gmail.com Thu Aug 19 14:15:01 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 19 Aug 2021 13:15:01 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: <01E4886F-B4F7-4311-9CEB-C7BE4D14E69C@gmail.com> Message-ID: On August 19, 2021 1:04:02 AM UTC, Jacob Lifshay wrote: >i completely spaced that you were talking about vertical-first >mode...oops! :v >if you only are running one element per inner loop, it makes me think >this >won't be any faster than scalar code ... not a good look. that's where out-of-order multi-issue comes into play. Mitch Alsup's VVM system is designed around exactly and precisely the Vertical First Vector concept. Mitch has been describing how it works using OoO for 3 years, it just took me 2 to understand it :) the multi-issue engine analyses loops, spots that the element slots in the in-flight data and merges them into the same SIMD Reservation Station (if they are smaller data width than the SIMD ALUs) or just goes with the flow if the data width is the same as the ALU width. thus you have to ensure that the total available RSes is big enough to be able to cover at least the entire loop, preferably 2x or 3x bigger. Mitch points out in many many discussions over the past 3 years that the majority of scenarios and algorithms for which Vertical First can be deployed successfully and parallelism exploited through OoO in-flight "RS stuffing" are in fact short loops. 
he also points out that when not possible the fallback is simple scalar. and with a Monster Multi-Issue Engine that scalar execution is going to scream along even without parallelism in most situations. if however elwidth overrides are deployed and there are no scalar 8-bit or 16-bit ALUs then, yeah, things run sub-optimally. i can live with that. >I watched the video you made earlier, and it mostly matches what I >expected >from vertical-first mode. yeah, the mistake i made though, you can see i realised it towards the end, is that the VFirst batching has no src/dst-step of its own. basically VFirst Batching would be: for i in srcstep .. srcstep+VFHint-1 whilst also running j on dststep... but also skipping masked-out elements... *then winding back* on the next instruction in order to do srcstep..srcstep+VFHint+1 *again*, and that needs a pair of independent counters (subsrcstep, subdststep) which reset back to srcstep/dststep after each VFirst instruction. then, what do you do with the COS coefficients? they're supposed to be scalar. where do you put the 2nd *scalar* coefficient if the VFirst Batch size is 2, or 3? nowhere, it's impossible. there isn't enough spare space in SVSTATE to add a pair of 7 bit sub-steps in, and to be honest it's getting scarily complex. therefore i think the simplest thing is just to have svstep increment only by ONE, to have the VFirst Mode only do scalar (one element) interaction, and use Mitch's research and OoO multi-issue concepts. l. 
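The conclusion above (svstep advancing by one element at a time) can be pictured with a toy Python model of the issue order in the two modes. It is only a sketch: it models the order of (instruction, element) pairs and nothing else, ignoring predication, REMAP, elwidth overrides and any hardware detail. The function names and the mnemonics in `loop_body` are hypothetical.

```python
def horizontal_first(instrs, vl):
    """Each instruction runs over all VL elements before the next instruction issues."""
    return [(ins, step) for ins in instrs for step in range(vl)]

def vertical_first(instrs, vl):
    """Every instruction in the loop body runs for one element, then the step advances by one."""
    return [(ins, step) for step in range(vl) for ins in instrs]

loop_body = ["fcos", "fmul"]   # hypothetical mnemonics, for illustration only

print(horizontal_first(loop_body, 2))
# [('fcos', 0), ('fcos', 1), ('fmul', 0), ('fmul', 1)]
print(vertical_first(loop_body, 2))
# [('fcos', 0), ('fmul', 0), ('fcos', 1), ('fmul', 1)]
```

In the vertical ordering the value produced by the first instruction is consumed immediately for the same element, which is why a single scalar register is enough for it.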
From luke.leighton at gmail.com Thu Aug 19 14:22:35 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 19 Aug 2021 13:22:35 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: <632C595C-3E0C-4A1E-9FB6-E208108FFE3A@gmail.com> References: <632C595C-3E0C-4A1E-9FB6-E208108FFE3A@gmail.com> Message-ID: <1B05E0B8-032F-465C-9AD2-A25FD3A318E5@gmail.com> On August 19, 2021 2:00:42 AM UTC, Richard Wilbur wrote: >by my count that adds to > 2N + (N ln N) memory accesses doh >Hence the excess is around (N ln N) for both registers and memory >accesses. Still non-trivial overhead. yes. when strip-mining occurs it is as if you had no L1 cache at all: LD/ST runs 3-5 times slower. even bigger, you strip-mine L2 as well and that ends up 8-10x slower. i had an idea here which might help: an L1 cache hint which swaps over to using the next 6 LSBs for cache line lookup: cache_row_num = MUX(Hint, ADDRESS[6..11], ADDRESS[0..5]) however you'd need to do a full cache flush to swap modes, so you'd better be damn sure you want to do that. l. From aksharas260 at gmail.com Thu Aug 19 06:38:45 2021 From: aksharas260 at gmail.com (Akshara S) Date: Thu, 19 Aug 2021 11:08:45 +0530 Subject: [Libre-soc-dev] (no subject) Message-ID: I am Akshara, from Karnataka, India. I am a pre final year student and I'm pursuing my BTech in electronics and communication engineering. I have a basic knowledge in verilog, C and python. From luke.leighton at gmail.com Thu Aug 19 15:44:38 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 19 Aug 2021 14:44:38 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: <1B05E0B8-032F-465C-9AD2-A25FD3A318E5@gmail.com> References: <632C595C-3E0C-4A1E-9FB6-E208108FFE3A@gmail.com> <1B05E0B8-032F-465C-9AD2-A25FD3A318E5@gmail.com> Message-ID: one thing that does get expensive in Vertical-First Mode: predicated execution.
for CR-based predication, not so much of a problem: the CRs are 4 bit wide, the CR regfile port width will be 8x4=32, so not so many wires. INT predication, the entire 64 bit INT GPR is read multiple times, only 1 bit extracted (1< > On Aug 18, 2021, at 18:13, lkcl wrote: > > each row being a butterfly goes double the distance and therefore needs half the coefficients compared to the previous row. > > * when you have 16 values you need twin mul-add on 8 values and therefore 8 coefficients > * next row you have 2 batches of 8 values therefore the 4 coefficients can be used 2x > * next row you have 4 batches of 4 values, 2 coefficients can be used 4x > * next row 8 batches of 2 values, the 1 coefficient can be used 8x > > 8 4 2 1 on batch sizes and num coefficients > 1 2 4 8 times reuse on coefficients Looks like, for this example, N = 16 values, number of unique coefficients = 8 + 4 + 2 + 1 = 15 = N - 1 That’s interesting. The total number of coefficients = (N / 2) * log(base=2, N) Number of reuses = (N / 2) * log(base=2, N) - (N - 1) = 32 - 15 = 17 From lkcl at lkcl.net Fri Aug 20 00:20:35 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Fri, 20 Aug 2021 00:20:35 +0100 Subject: [Libre-soc-dev] harmful SIMD again Message-ID: https://news.ycombinator.com/item?id=28114934 https://groups.google.com/g/comp.arch/c/GaA6ywyEyoo https://news.ycombinator.com/item?id=28114934 --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 From programmerjake at gmail.com Fri Aug 20 03:07:28 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Thu, 19 Aug 2021 19:07:28 -0700 Subject: [Libre-soc-dev] (no subject) In-Reply-To: References: Message-ID: On Thu, Aug 19, 2021 at 7:37 AM Akshara S wrote: > > I am Akshara, from Karnataka, India. Welcome! > I am a pre final year student and I'm > pursuing my BTech in electronics and communication engineering. I have a > basic knowledge in verilog, C and python. Neat! 
Jacob Lifshay From programmerjake at gmail.com Fri Aug 20 03:09:10 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Thu, 19 Aug 2021 19:09:10 -0700 Subject: [Libre-soc-dev] (no subject) In-Reply-To: References: Message-ID: On Wed, Aug 18, 2021 at 10:43 PM Abhishek Sharma wrote: > > Hi All, > > I am Abhishek Sharma, from Kanpur, the northern part of India. Welcome! > Having > skillsets of digital circuit designs, know Python, C. Excited to contribute > to this project and work with awesome developers. Glad to have more people interested! Jacob From programmerjake at gmail.com Fri Aug 20 03:10:18 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Thu, 19 Aug 2021 19:10:18 -0700 Subject: [Libre-soc-dev] Introduction In-Reply-To: References: Message-ID: On Wed, Aug 18, 2021 at 10:44 PM wrote: > > I am Sree Rekha K P from Karnataka, India. Welcome! > I am a pre final year student > and I'm pursuing my BTech in electronics and communication engineering. I > am quite good at Verilog, microcontroller 8051, Digital electronics and > basics C. Neat! Jacob From programmerjake at gmail.com Fri Aug 20 03:19:57 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Thu, 19 Aug 2021 19:19:57 -0700 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: <01E4886F-B4F7-4311-9CEB-C7BE4D14E69C@gmail.com> Message-ID: On Thu, Aug 19, 2021 at 6:15 AM lkcl wrote: > > > > On August 19, 2021 1:04:02 AM UTC, Jacob Lifshay wrote: > > >i completely spaced that you were talking about vertical-first > >mode...oops! > > :v > > >if you only are running one element per inner loop, it makes me think > >this > >won't be any faster than scalar code ... not a good look. > > that's where out-of-order multi-issue comes into play. 
But you still have the issue of needing to re-fetch and re-decode the loop, and predict the branch (usually you can only predict one branch per cycle, unless you have an absolutely monster core), making it not much faster or power-efficient than a standard OoO processor executing a scalar loop. SV is faster precisely because the fetch/decode pipe stops and sends a firehose of usually pre-simd-packed ops at the execution units. Seems like we're throwing away our advantage... Jacob From programmerjake at gmail.com Fri Aug 20 03:34:16 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Thu, 19 Aug 2021 19:34:16 -0700 Subject: [Libre-soc-dev] harmful SIMD again In-Reply-To: References: Message-ID: On Thu, Aug 19, 2021 at 4:21 PM Luke Kenneth Casson Leighton wrote: > > https://news.ycombinator.com/item?id=28114934 > https://groups.google.com/g/comp.arch/c/GaA6ywyEyoo > https://news.ycombinator.com/item?id=28114934 duplicate link :) I commented on the ycombinator thread about Rust's project-portable-simd, which I'm helping implement. We're figuring out fixed-length vectors first (since they're simpler and waay more common), we'll add support for variable-length vectors later. Jacob From madan at object-automation.com Fri Aug 20 06:26:38 2021 From: madan at object-automation.com (Madan Kartheessan) Date: Fri, 20 Aug 2021 10:56:38 +0530 Subject: [Libre-soc-dev] Charter and About-us page Message-ID: I believe 100% of the first point of the Charter and follow it, which is "Always do good". I have read the Charter and accept it. I have also filled in my details in the "libre-soc.org/about-us" page Regards Madan K. From sparsha at object-automation.com Fri Aug 20 06:45:32 2021 From: sparsha at object-automation.com (sparsha at object-automation.com) Date: Fri, 20 Aug 2021 01:45:32 -0400 Subject: [Libre-soc-dev] =?utf-8?q?=28no_subject=29?= Message-ID: Hi! I'm Sparsha from Bangalore,Karnataka, India. 
I'm currently doing my 4th year B.tech in Electronics and Communication. I have knowledge of Verilog and C programming. I'm happy to work with you all. From lkcl at lkcl.net Fri Aug 20 09:48:52 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Fri, 20 Aug 2021 09:48:52 +0100 Subject: [Libre-soc-dev] harmful SIMD again In-Reply-To: References: Message-ID: --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 On Fri, Aug 20, 2021 at 3:34 AM Jacob Lifshay wrote: > > On Thu, Aug 19, 2021 at 4:21 PM Luke Kenneth Casson Leighton > wrote: > > > > https://news.ycombinator.com/item?id=28114934 > > https://groups.google.com/g/comp.arch/c/GaA6ywyEyoo > > https://news.ycombinator.com/item?id=28114934 > > duplicate link :) yeah i meant to put this one https://www.reddit.com/r/programming/comments/p0yn45/three_fundamental_flaws_of_simd/ > I commented on the ycombinator thread about Rust's > project-portable-simd, which I'm helping implement. We're figuring out > fixed-length vectors first (since they're simpler and waay more > common), we'll add support for variable-length vectors later. that'll be interesting. SVE2 is already in llvm so the concept of variable-length vectors is already in the LLVM-IR, in a basic form. l. From lkcl at lkcl.net Fri Aug 20 09:51:19 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Fri, 20 Aug 2021 09:51:19 +0100 Subject: [Libre-soc-dev] Charter and About-us page In-Reply-To: References: Message-ID: --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 On Fri, Aug 20, 2021 at 6:42 AM Madan Kartheessan wrote: > > I believe 100% of the first point of the Charter and follow it, which is > "Always do good". I have read the Charter and accept it.
> > I have also filled in my details in the "libre-soc.org/about-us" page fantastic, Madan, do keep moving through the steps, don't wait for me (this goes for everyone else, too), now you can follow these instructions https://libre-soc.org/HDL_workflow/#gitolite3_access send me an ssh public key. l. From luke.leighton at gmail.com Fri Aug 20 10:11:34 2021 From: luke.leighton at gmail.com (lkcl) Date: Fri, 20 Aug 2021 10:11:34 +0100 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: <01E4886F-B4F7-4311-9CEB-C7BE4D14E69C@gmail.com> Message-ID: On Fri, Aug 20, 2021 at 3:20 AM Jacob Lifshay wrote: > > >if you only are running one element per inner loop, it makes me think > > >this > > >won't be any faster than scalar code ... not a good look. > > > > that's where out-of-order multi-issue comes into play. > > But you still have the issue of needing to re-fetch and re-decode the > loop, and predict the branch (usually you can only predict one branch > per cycle, unless you have an absolutely monster core), the demarcation in Mitch's VVM is done with a pair of instructions (start-loop, end-loop) where start-loop indicates which register is to be the "loop counter". consequently, like all Zero-Overhead Loop ISAs there is zero branch prediction miss. this is why i am adding CTR mode to SVP64 Branches. > making it not > much faster or power-efficient than a standard OoO processor executing > a scalar loop. SV is faster precisely because the fetch/decode pipe > stops and sends a firehose of usually pre-simd-packed ops at the > execution units. Seems like we're throwing away our advantage... no, not at all. there's nothing to stop you from doing the Cray-style Horizontal-First. if however you try that with DCT it requires the extra registers. Vertical-First Mode is actually a hell of a lot easier to understand and teach compilers about. l. 
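The difference between the two modes discussed in the message above comes down to loop nesting order. A minimal sketch, assuming a toy model in which an instruction is just a name and an element is just an index; `horizontal_first`, `vertical_first` and the trace format are illustrative names only and are not part of SVP64 or the simulator:

```python
def horizontal_first(ops, vl):
    """Cray-style ordering: each instruction is applied to all VL
    elements before the next instruction is issued."""
    return [(op, element) for op in ops for element in range(vl)]


def vertical_first(ops, vl):
    """Vertical-First ordering: every pass through the loop body
    handles one element, then the step is advanced explicitly."""
    trace = []
    step = 0
    while step < vl:              # loop-end test (the role a CTR-style branch plays)
        for op in ops:            # the loop body is re-executed on every pass
            trace.append((op, step))
        step += 1                 # explicit step increment (assumed svstep-like)
    return trace


if __name__ == "__main__":
    body = ["ld", "mul", "st"]    # hypothetical three-instruction loop body
    print(horizontal_first(body, 2))
    print(vertical_first(body, 2))
```

Both functions visit exactly the same (instruction, element) pairs. Only the order differs, which is why the fetch/decode cost and the branch at the end of each pass matter for Vertical-First Mode and not for Horizontal-First.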
From lkcl at lkcl.net Fri Aug 20 12:23:29 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Fri, 20 Aug 2021 12:23:29 +0100 Subject: [Libre-soc-dev] harmful SIMD again In-Reply-To: References: Message-ID: https://www.reddit.com/r/programming/comments/p0yn45/three_fundamental_flaws_of_simd/h9ncoft/?utm_source=reddit&utm_medium=web2x&context=3 FUZxxl is advocating the case for SIMD by implementing this algorithm in AVX512, SSE and NEON: https://github.com/clausecker/pospop/blob/master/safe.go i initially made the mistake of thinking it was a straight vectorised popcount, it's not. for count8safe there are 8 accumulators:
* accumulator 0 receives the count of input bytes that have bit 0 set, over ALL of the input vector.
* accumulator 1 receives the count of input bytes that have bit 1 set, over the entire input vector

    for i := range buf {
        for j := 0; j < 8; j++ {
            counts[j] += int(buf[i] >> j & 1)
        }
    }

NOT

    for i := range buf {
        for j := 0; j < 8; j++ {
            // counts is NOT the same length as buf.
            counts[i] += int(buf[i] >> j & 1) // indexed by i here, NOT j
        }
    }

turns out that SVP64 can do this algorithm in about 12 or so instructions, which is 35x fewer than the instruction count needed for AVX512 or NEON. l. From abhisheksharma at object-automation.com Fri Aug 20 12:51:02 2021 From: abhisheksharma at object-automation.com (Abhishek Sharma) Date: Fri, 20 Aug 2021 17:21:02 +0530 Subject: [Libre-soc-dev] Charter and About-US section Message-ID: I have gone through the charter and fully agree and accept it. Also, I have filled in my details in the about us section. Regards, Abhishek From sreerekha at object-automation.com Fri Aug 20 12:58:27 2021 From: sreerekha at object-automation.com (sreerekha at object-automation.com) Date: Fri, 20 Aug 2021 07:58:27 -0400 Subject: [Libre-soc-dev] Charter and About-Us Message-ID: <6bc97c1c029dcdabb837e9d42844e223.squirrel@email.powweb.com> I have gone through the charter and I agree and accept it. And also, I have filled in my details in the about us section.
Regards, Sree Rekha K P From akshara at object-automation.com Fri Aug 20 13:19:05 2021 From: akshara at object-automation.com (akshara at object-automation.com) Date: Fri, 20 Aug 2021 08:19:05 -0400 Subject: [Libre-soc-dev] Acceptance of the charter Message-ID: <31af7b014e18222a51aa23d331681e35.squirrel@email.powweb.com> I have gone through the charter and I agree and accept it. I have also filled in my details in the 'about us' section. Regards, Akshara S From lkcl at lkcl.net Fri Aug 20 15:17:36 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Fri, 20 Aug 2021 15:17:36 +0100 Subject: [Libre-soc-dev] Charter and About-US section In-Reply-To: References: Message-ID: fantastic, i have added you to the list --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 On Fri, Aug 20, 2021 at 12:51 PM Abhishek Sharma wrote: > > I have gone through the charter and fully agree and accept it. > Also, I have filled in my details in the about us section. > > Regards, > Abhishek > _______________________________________________ > Libre-soc-dev mailing list > Libre-soc-dev at lists.libre-soc.org > http://lists.libre-soc.org/mailman/listinfo/libre-soc-dev From lkcl at lkcl.net Fri Aug 20 15:57:22 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Fri, 20 Aug 2021 15:57:22 +0100 Subject: [Libre-soc-dev] Acceptance of the charter In-Reply-To: <31af7b014e18222a51aa23d331681e35.squirrel@email.powweb.com> References: <31af7b014e18222a51aa23d331681e35.squirrel@email.powweb.com> Message-ID: brilliant, ok so if you can create a checklist bugreport, https://libre-soc.org/HDL_workflow/new_checklist/?updated also send me an ssh public key. --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 On Fri, Aug 20, 2021 at 1:20 PM wrote: > > I have gone through the charter and I agree and accept it. I have also > filled in my details in the 'about us' section. 
> > Regards, > Akshara S > > > _______________________________________________ > Libre-soc-dev mailing list > Libre-soc-dev at lists.libre-soc.org > http://lists.libre-soc.org/mailman/listinfo/libre-soc-dev From programmerjake at gmail.com Fri Aug 20 18:19:57 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Fri, 20 Aug 2021 10:19:57 -0700 Subject: [Libre-soc-dev] (no subject) In-Reply-To: References: Message-ID: On Thu, Aug 19, 2021, 22:46 wrote: > Hi! I'm Sparsha from Bangalore,Karnataka, India. Welcome! I'm currently doing my > 4th year B.tech in Electronics and Communication. I have knowledge of > Verilog and C programming. I'm happy to work with you all. > Glad to have more people interested! Jacob > From luke.leighton at gmail.com Fri Aug 20 19:22:15 2021 From: luke.leighton at gmail.com (lkcl) Date: Fri, 20 Aug 2021 18:22:15 +0000 Subject: [Libre-soc-dev] (no subject) In-Reply-To: References: Message-ID: <911CBEA1-5256-42FE-9B83-5B638CFCAD1A@gmail.com> On August 20, 2021 5:45:32 AM UTC, sparsha at object-automation.com wrote: >Hi! I'm Sparsha from Bangalore,Karnataka, India. I'm currently doing my >4th year B.tech in Electronics and Communication. I have knowledge of >Verilog and C programming. I'm happy to work with you all. fantastic, great to hear from you. do put the bits about your experience and skills on the http://libre-soc/about_us page, i will create a section for you. also do create a bugreport with the checklist, i have just changed your bugzilla login from gmail to sparsha at object-automation.com for you, use this one as a template https://bugs.libre-soc.org/show_bug.cgi?id=670 best, l.
From luke.leighton at gmail.com Fri Aug 20 19:55:28 2021 From: luke.leighton at gmail.com (lkcl) Date: Fri, 20 Aug 2021 18:55:28 +0000 Subject: [Libre-soc-dev] Charter and About-Us In-Reply-To: <6bc97c1c029dcdabb837e9d42844e223.squirrel@email.powweb.com> References: <6bc97c1c029dcdabb837e9d42844e223.squirrel@email.powweb.com> Message-ID: <211382DB-58EF-43A5-AAB0-CDDCEE347D46@gmail.com> On August 20, 2021 11:58:27 AM UTC, sreerekha at object-automation.com wrote: > >I have gone through the charter and I agree and accept it. >And also, I have filled in my details in the about us section. fantastic, when you are ready generate an ssh key, and email the public key to me. also remember the checklist, mark that task as "done" (then keep going through the other tasks) best, l. From colepoirier at gmail.com Fri Aug 20 21:08:27 2021 From: colepoirier at gmail.com (Cole Poirier) Date: Fri, 20 Aug 2021 13:08:27 -0700 Subject: [Libre-soc-dev] Bugzilla Libre-SOC-Org default assignee Message-ID: <8BFE2240-D889-4A76-8FA7-2D7D59A94871@gmail.com> Hi Luke, Please change the default assignee for the Bugzilla libre-soc-org from me to someone else.
Thank you, Cole From programmerjake at gmail.com Fri Aug 20 21:18:22 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Fri, 20 Aug 2021 13:18:22 -0700 Subject: [Libre-soc-dev] Bugzilla Libre-SOC-Org default assignee In-Reply-To: <8BFE2240-D889-4A76-8FA7-2D7D59A94871@gmail.com> References: <8BFE2240-D889-4A76-8FA7-2D7D59A94871@gmail.com> Message-ID: On Fri, Aug 20, 2021, 13:08 Cole Poirier wrote: > Hi Luke, > > Please change the default assignee for the Bugzilla libre-soc-org from me > to someone else. > I changed it to Luke. No other bug categories/products default to you. Jacob From lkcl at lkcl.net Fri Aug 20 22:56:11 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Fri, 20 Aug 2021 22:56:11 +0100 Subject: [Libre-soc-dev] Bugzilla Libre-SOC-Org default assignee In-Reply-To: <8BFE2240-D889-4A76-8FA7-2D7D59A94871@gmail.com> References: <8BFE2240-D889-4A76-8FA7-2D7D59A94871@gmail.com> Message-ID: On Fri, Aug 20, 2021 at 9:08 PM Cole Poirier wrote: > Hi Luke, > > Please change the default assignee for the Bugzilla libre-soc-org from me to someone else. you can take that action yourself, you do not need to burden me with spending the time doing something that you yourself can do. l. From wielgusmikolaj at gmail.com Sat Aug 21 00:09:17 2021 From: wielgusmikolaj at gmail.com (Mikolaj Wielgus) Date: Sat, 21 Aug 2021 01:09:17 +0200 Subject: [Libre-soc-dev] New contributor Message-ID: Hello! I'm Mikolaj (mikolajw on IRC). I've recently found this project and decided to contribute to it. Prepare for a lot of stupid questions and bad jokes. 
;) I would like to ask you to register my key for git repository access: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDQksH+iy+K7JBcqZXix8Q8sl0CVBPFvc0ebhFyRrDc0eEDVqtyXIfZqlRAu2nnMTTk/KGeQw71/MpspXnK2U8vks/SvzhcQptoobRbfggMElKRQpWjt37ORClZE4Swr+bSF080KDhTjCpaQXBf4RP6sRX+E5k5V7UxponpOQrn4tpyIa72h7RIHdEWmszCBm+nDwLqpLG9HyqLq5e0iTXqLqA4hE46wtNU+fc4iPS0QBfAsNT2ouYSr9RCeKYO72GrigngAAC/g5FnGZI90z/fsfGNAS9ITKbQMT2WXSWcUYp1NfM2O0LjWP0FEiEz8hSZjk0lSLMuOFvKyrr/w7/J mikolaj at lilienfeld -Mikolaj From lkcl at lkcl.net Sat Aug 21 00:46:21 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sat, 21 Aug 2021 00:46:21 +0100 Subject: [Libre-soc-dev] New contributor In-Reply-To: References: Message-ID: (cc'ing you as you're not subscribed to the list) On Sat, Aug 21, 2021 at 12:42 AM Mikolaj Wielgus wrote: > > Hello! > > I'm Mikolaj (mikolajw on IRC). > > I've recently found this project and decided to contribute to it. fantastic, welcome. thanks for agreeing to the charter on IRC. > Prepare for a lot of stupid questions and bad jokes. ;) funny man :) > I would like to ask you to register my key for git repository access: done, added you to hdl-dev-setup and the wiki. you should be able to continue with the instructions at https://libre-soc.org/HDL_workflow/#gitolite3_access take care when ssh'ing to the server when testing access, if you get banned by fail2ban let me know, i can whitelist your IP address. best, l. From programmerjake at gmail.com Sat Aug 21 01:44:11 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Fri, 20 Aug 2021 17:44:11 -0700 Subject: [Libre-soc-dev] New contributor In-Reply-To: References: Message-ID: On Fri, Aug 20, 2021, 16:43 Mikolaj Wielgus wrote: > Hello! > > I'm Mikolaj (mikolajw on IRC). > > I've recently found this project and decided to contribute to it. Welcome! Prepare for a > lot of stupid questions and bad jokes. 
;) :) Jacob From sukhanshu at object-automation.com Sat Aug 21 12:45:51 2021 From: sukhanshu at object-automation.com (sukhanshu at object-automation.com) Date: Sat, 21 Aug 2021 07:45:51 -0400 Subject: [Libre-soc-dev] =?utf-8?q?=28no_subject=29?= Message-ID: Hello all, I am Sukhanshu D. from Nagpur. I'm currently in my B.tech Final year at VIT Pune. I'm glad to be a part of the team and looking forward to learning new things. From lkcl at lkcl.net Sat Aug 21 13:02:08 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sat, 21 Aug 2021 13:02:08 +0100 Subject: [Libre-soc-dev] (no subject) In-Reply-To: References: Message-ID: On Sat, Aug 21, 2021 at 12:46 PM wrote: > Hello all, I am Sukhanshu D. from Nagpur. I'm currently in my B.tech Final > year at VIT Pune. > I'm glad to be a part of the team and looking forward to learn new things. great to hear from you, that's really encouraging. i've created a bugzilla account for you, please log in, set your own password, and go through the on-boarding process. you can see an example here: https://bugs.libre-soc.org/show_bug.cgi?id=668 l. From richard.wilbur at gmail.com Sat Aug 21 22:30:21 2021 From: richard.wilbur at gmail.com (Richard Wilbur) Date: Sat, 21 Aug 2021 15:30:21 -0600 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops Message-ID: > On Aug 19, 2021, at 04:43, lkcl wrote: > >> On August 19, 2021 1:46:59 AM UTC, Richard Wilbur wrote: >> >>>>> On Aug 18, 2021, at 18:13, lkcl wrote: >>> >>> 8 4 2 1 on batch sizes and num coefficients >>> 1 2 4 8 times reuse on coefficients >> >> Is this for FFT? > > no, DCT. N ln N coefficients actually N/2 ln N > >> Very cool, I suspected it would be pretty good reuse. > > it's not. Well, from my analysis above, the DCT for N elements (where N is a power of 2) seems to require calculating N-1 unique coefficients which have N+1 total reuses in the course of the transform, thus providing 2*N total coefficients.
If I’m correct, that is a safe bet on doing half of the hard work of calculating coefficients! Now, let us turn to the RADIX2 FFT. > RADIX2 FFT on the other hand there are N coefficients and you *can* reuse them, for row 2 you jump every other coefficient for all 2 sub-crossbars, for row 3 you jump every 4th coefficient for sub-sub-crossbars but they are the same N coefficients. So, if I understand you correctly, with dataset comprising 16 samples: N = 16 coefficients 16 8 4 2 1 1 2 4 8 16 reuse in row unique coefficients = 16 = N total coefficients = 5 * 16 = N (log(base=2, N) + 1) reuses = 1 * (2 + 4 + 8 + 16) + 1 * (2 + 4 + 8) + 2 * (2 + 4) + 4 * 2 = 1 * 30 + 1 * 14 + 2 * 6 + 4 * 2 = 30 + 14 + 12 + 8 = 64 = N log(base=2, N) So, if my calculations are correct, the DCT and the FFT for those two algorithms and same sample size require basically the same number of unique coefficients to be calculated (the hard part). The reuse is considerably better on the FFT but it still could heavily impact both algorithms. >> Except that the input numbers are rationals with a common denominator >> for a particular row in DCT. I think we could effectively store them >> with a particular structure based on the denominator, indexed with the >> integer count along the row. More of a coefficient array/RAM than >> cache (your usage of this term was more loaded than mine, I simply was >> referring to a convenient place to stow the numbers where we could >> easily and quickly get them back when needed). > > indeed... at the cost of designing and adding yet more instructions, this time with an extremely small probability that they can be put to use elsewhere. The overhead of the following proposal is, I believe, two new instructions, described near the bottom of the proposal in order to benefit from the context established between here and there. > the butterfly REMAP schedule is generic and i can foresee it being used elsewhere. 
Here’s where we can use the properties of the REMAP schedule to our considerable advantage. I envision a hard result cache with tags structured as follows: algorithm index = 8 bits loop1 index = 8 bits loop2 index = 8 bits loop3 index = 8 bits This buys us some headroom to grow if and when we decide to further expand our register set but in effect puts off expanding it to simply hold coefficients. (It also means that initial implementations could be more sparse depending on the size of the register set and the semantics of the REMAP schedule. The loop indices might require only 7, 6, and 5 bits to represent for a register set size of 128=2^7—thus 18-bits instead of 24.) It would likely be useful to make the allocation of bits below the algorithm id something that could be set when the algorithm is initialized in the hard result management unit(a knock off of the memory management unit?). This would allow us to tailor use of the 16.7M tag space/algorithm to the loop or index sizes of particular algorithms. Also associated with the hard result management unit is a list of vectors to subroutines (indexed by algorithm id) that implement the calculation of the hard result should it not be present in the hard result cache. The semantics of the routine are restricted to use whatever is implied by the REMAP schedule (indices in particular registers). When a particular hard calculation routine (algorithm id) is changed the cache for all associated hard results would be invalidated. When a particular algorithm was set up in a REMAP section, the decoder or dispatch unit could schedule the calculation of vectors of coefficients to pre-load the hard result cache if that would be on an interference basis with the calculations (functional unit contention) or data flow (register usage contention) of the algorithm. The proposed two new instructions: 1. Load or calculate hard result, with parameters destination register, algorithm id, loop1 index, loop2 index, loop3 index 2. 
Register algorithm with the hard result management unit, with parameters to include algorithm id, entry point of routine to calculate hard result, bit widths of loop indices in the tag. The effects of the hard result cache I can think of right now will be to: 1. Cut down the pressure on the register set to hold algorithm-specific coefficients that often are reused many times in the course of a particular calculation. 2. On the other side of that same coin, it will reduce the number of times we have to calculate the same coefficient, thus considerably reducing the load on the arithmetic units for redundant calculations and reserving the arithmetic and memory bandwidth for the data to which the algorithm is being applied. 3. Being able to dedicate a higher percentage of our register set to data and results and avoiding the recalculation of coefficients should both work together to improve the performance of the libre-soc Simple-V architecture. (The hard result cache needn’t be tied specifically to REMAP, it could be used by normal vector or scalar code.) Richard From luke.leighton at gmail.com Sat Aug 21 22:42:30 2021 From: luke.leighton at gmail.com (lkcl) Date: Sat, 21 Aug 2021 21:42:30 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: Message-ID: On August 21, 2021 9:30:21 PM UTC, Richard Wilbur wrote: > >(The hard result cache needn’t be tied specifically to REMAP, it could >be used by normal vector or scalar code.) ya know... another name for "fast small hard result cache" is "register file"? everything you described has the identical properties of a register file... :) l. 
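The tag layout proposed earlier in this thread (an 8-bit algorithm id followed by three loop-index fields) can be sketched as plain bit-packing. This is only an illustration under stated assumptions: the field order (algorithm id in the top bits), the function names `pack_tag` and `unpack_tag`, and the `widths` parameter are hypothetical and are not taken from any Libre-SOC code or from the proposal itself.

```python
ALGO_BITS = 8  # width of the algorithm-id field, as given in the proposal


def pack_tag(algo, indices, widths=(8, 8, 8)):
    """Pack an algorithm id and three loop indices into one integer tag.

    Assumed layout, most significant field first:
    algo | loop1 | loop2 | loop3
    """
    if len(indices) != len(widths):
        raise ValueError("need exactly one index per field width")
    if not 0 <= algo < (1 << ALGO_BITS):
        raise ValueError("algorithm id does not fit in its field")
    tag = algo
    for index, width in zip(indices, widths):
        if not 0 <= index < (1 << width):
            raise ValueError("loop index does not fit in its field")
        tag = (tag << width) | index
    return tag


def unpack_tag(tag, widths=(8, 8, 8)):
    """Inverse of pack_tag: return (algo, (loop1, loop2, loop3))."""
    indices = []
    for width in reversed(widths):       # peel fields off the low end
        indices.append(tag & ((1 << width) - 1))
        tag >>= width
    return tag, tuple(reversed(indices))


if __name__ == "__main__":
    tag = pack_tag(1, (2, 3, 4))
    print(hex(tag), unpack_tag(tag))
```

With the 8/8/8 split there are 2^24 = 16,777,216 tags below each algorithm id, which matches the "16.7M" figure in the proposal. Passing `widths=(7, 6, 5)` gives the 18-bit variant mentioned for a 128-entry register set.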
From richard.wilbur at gmail.com Sat Aug 21 23:52:01 2021 From: richard.wilbur at gmail.com (Richard Wilbur) Date: Sat, 21 Aug 2021 16:52:01 -0600 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: Message-ID: On Sat, Aug 21, 2021 at 3:42 PM lkcl wrote: > On August 21, 2021 9:30:21 PM UTC, Richard Wilbur wrote: > >(The hard result cache needn’t be tied specifically to REMAP, it could > >be used by normal vector or scalar code.) > > ya know... another name for "fast small hard result cache" is "register file"? Is it? "fast", yes. "small", not necessarily. > everything you described has the identical properties of a register file... :) That's sort of what we want but don't have space in the instruction format for the bits to specify the register numbers, right? So I see this as an opportunity to create an algorithm-specific method of addressing the new "registers". Another advantage of this scheme is that it is never in need of saving and restoring with a context switch. From mehul at object-automation.com Sun Aug 22 10:36:26 2021 From: mehul at object-automation.com (mehul at object-automation.com) Date: Sun, 22 Aug 2021 05:36:26 -0400 Subject: [Libre-soc-dev] =?utf-8?q?=28no_subject=29?= Message-ID: Hello, I am Mehul Nachankar. I recently completed my Bachelor's in July'21. I worked as a SoC Verification Intern at Sion Semiconductors for the past 6 months. I have also worked as a Research Intern at Korea Institute of Science and Technology (KIST) in 2019. I am really looking forward for this opportunity and eager to give my contribution in any way possible. From lkcl at lkcl.net Sun Aug 22 10:47:00 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sun, 22 Aug 2021 10:47:00 +0100 Subject: [Libre-soc-dev] (no subject) In-Reply-To: References: Message-ID: On Sun, Aug 22, 2021 at 10:37 AM wrote: > > Hello, I am Mehul Nachankar. I recently completed my Bachelor's in > July'21. 
I worked as a SoC Verification Intern at Sion Semiconductors for > the past 6 months. I have also worked as a Research Intern at Korea > Institute of Science and Technology (KIST) in 2019. I am really looking > forward for this opportunity and eager to give my contribution in any way > possible. fantastic, great to hear from you. please go through the on-boarding process. please can someone else - one of the other interns - reply and explain what that process is. i would like all of you to begin helping each other out. l. From umbertocerrato at outlook.it Sun Aug 22 11:17:45 2021 From: umbertocerrato at outlook.it (Umberto Cerrato) Date: Sun, 22 Aug 2021 10:17:45 +0000 Subject: [Libre-soc-dev] (no subject) In-Reply-To: References: Message-ID: <364436EF-C514-4218-9218-284E475DA018@outlook.it> Hi Mehul, Here you can find the list of tasks for a new developer: https://libre-soc.org/#help_as_developer You may have already seen it. Once you have joined the mailing list, you should accept the charter that you can find here: https://libre-soc.org/charter/ Then, if you have IRC, you should join that too. It is easier to ask stuff there than in the mailing list, in my opinion. Some info here: Libera IRC. channel: #libre-soc at irc.libera.chat port 6697 (see https://libera.chat/guides/connect in case). About IRC, do not expect people to reply to you immediately. Not everyone online is next to their computer. For this reason, you should find a way to store chat logs in order to not lose the thread of the conversation. Anyway, logs are stored here: https://libre-soc.org/irclog/index.html Next, if you plan to do HDL work, you should familiarize yourself with this: https://libre-soc.org/HDL_workflow/ You might want to edit the pages of the website, for example adding information, notes, or suggestions to make it easier for new people to get started, and so on. If you would like to do so, ask in the mailing list or the IRC. You will likely need an account to make the edits.
Last but not least, pay attention to git. If you are unsure about something, ask. Best, umc From luke.leighton at gmail.com Sun Aug 22 12:10:32 2021 From: luke.leighton at gmail.com (lkcl) Date: Sun, 22 Aug 2021 12:10:32 +0100 Subject: [Libre-soc-dev] [RFC] SVP64 Vertical-First Mode loops In-Reply-To: References: Message-ID: On August 21, 2021 10:52:01 PM UTC, Richard Wilbur wrote: >On Sat, Aug 21, 2021 at 3:42 PM lkcl wrote: >> On August 21, 2021 9:30:21 PM UTC, Richard Wilbur > wrote: >> >(The hard result cache needn’t be tied specifically to REMAP, it >could >> >be used by normal vector or scalar code.) >> >> ya know... another name for "fast small hard result cache" is >"register file"? > >Is it? yes. > "fast", yes. "small", not necessarily. which would need explaining to the ISA WG, "why we are duplicating the functionality of a register file including adding explicit instructions which are to transfer between the new type of register file and the standard GPR/FPR" also if it is particularly large you run into latency issues. > >> everything you described has the identical properties of a register >file... :) > >That's sort of what we want but don't have space in the instruction >format for the bits to specify the register numbers, right? correct. and don't want to (a) modify v3.0B or (b) go retrospectively back and alter the SVP64 RM field. > So I see >this as an opportunity to create an algorithm-specific method of >addressing the new "registers". which in turn requires a means and method of actually accessing those new registers. > Another advantage of this scheme is >that it is never in need of saving and restoring with a context >switch. this isn't true: i can foresee circumstances where two processes will need to use different constants. honestly richard although at first glance it seems like a good idea, it's really no different from "A Register File".
plus, really, a way is needed for *all* instructions to read from "The Registers/Cache" not just one or two, because if it's just one ("move from one register/cache to the GPR/FPR") then that's one extra instruction inside inner loops and if it's merged into a "specialist" instruction (DCT coefficient multiply) we just caused what was previously a potentially useful generic twin mul-add instruction to become a non-generic one. all these things need to be thought through - in full - unfortunately, when it comes to ISA design. then, when you've spent several days/weeks outlining the entire lot, you then have to spend several more days/weeks making a comparative analysis against *existing* schemes. part of that analysis involves * "what's the cost of implementing this" as well as * "what's the cost to CHANGE an EXISTING implementation" and * "how much work is it to create a Conformance Validation Test Suite" and * "what will the ISA WG think about this proposal, what will they ask" l. From lkcl at lkcl.net Sun Aug 22 12:24:01 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Sun, 22 Aug 2021 12:24:01 +0100 Subject: [Libre-soc-dev] nmigen users / contributors re-creating nmutil Stage API Message-ID: https://github.com/nmigen/nmigen/issues/317 drat, we've been so busy that interacting with other nmigen users has been on the back-burner. it looks like there's a concerted effort to redesign and duplicate pretty much the entirety of nmutil's ready/valid pipeline API. i hadn't realised, the discussion dates back *over a year*. could someone please cross-reference and reply to the github comments, https://libera.irclog.whitequark.org/nmigen/2021-08-22#30718156; l. 
--- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 From programmerjake at gmail.com Sun Aug 22 19:59:15 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Sun, 22 Aug 2021 11:59:15 -0700 Subject: [Libre-soc-dev] (no subject) In-Reply-To: References: Message-ID: On Sat, Aug 21, 2021, 04:46 wrote: > Hello all, I am Sukhanshu D. from Nagpur. I'm currently in my B.tech Final > year at VIT Pune. > Welcome! Jacob From programmerjake at gmail.com Sun Aug 22 20:01:41 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Sun, 22 Aug 2021 12:01:41 -0700 Subject: [Libre-soc-dev] (no subject) In-Reply-To: References: Message-ID: On Sun, Aug 22, 2021, 02:37 wrote: > Hello, I am Mehul Nachankar. Welcome! I recently completed my Bachelor's in > July'21. I worked as a SoC Verification Intern at Sion Semiconductors for > the past 6 months. I have also worked as a Research Intern at Korea > Institute of Science and Technology (KIST) in 2019. Neat! I am really looking > forward for this opportunity and eager to give my contribution in any way > possible. > Glad to have more people interested! Jacob From adithya at object-automation.com Mon Aug 23 06:10:59 2021 From: adithya at object-automation.com (adithya at object-automation.com) Date: Mon, 23 Aug 2021 01:10:59 -0400 Subject: [Libre-soc-dev] Unable to login to Bugzilla account Message-ID: <48adbf0f8a93c983b35e076ce81ca660.squirrel@email.powweb.com> I am Adithya. I am unable to login to my bugzilla account. I tried to reset the password, but have not received any mail to change the password (I did check the spam). Please help. 
From programmerjake at gmail.com Mon Aug 23 06:17:44 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Sun, 22 Aug 2021 22:17:44 -0700 Subject: [Libre-soc-dev] Unable to login to Bugzilla account In-Reply-To: <48adbf0f8a93c983b35e076ce81ca660.squirrel@email.powweb.com> References: <48adbf0f8a93c983b35e076ce81ca660.squirrel@email.powweb.com> Message-ID: On Sun, Aug 22, 2021, 22:11 wrote: > I am Adithya. I am unable to login to my bugzilla account. I tried to > reset the > password, but have not received any mail to change the password (I did > check the > spam). Please help. > It looks like luke used the wrong email address when he created your account (aditya instead of adithya), I fixed that, try resetting your password again. Jacob From adithya at object-automation.com Mon Aug 23 07:23:39 2021 From: adithya at object-automation.com (Adithya Gopan) Date: Mon, 23 Aug 2021 11:53:39 +0530 Subject: [Libre-soc-dev] Unable to login to Bugzilla account In-Reply-To: References: <48adbf0f8a93c983b35e076ce81ca660.squirrel@email.powweb.com> Message-ID: On Mon, Aug 23, 2021 at 10:48 AM Jacob Lifshay wrote: > On Sun, Aug 22, 2021, 22:11 wrote: > > > I am Adithya. I am unable to login to my bugzilla account. I fixed that, > try resetting your > password again. > > Jacob > _______________________________________________ > Libre-soc-dev mailing list > Libre-soc-dev at lists.libre-soc.org > http://lists.libre-soc.org/mailman/listinfo/libre-soc-dev Thank You, Jacob. I am able to log in. From programmerjake at gmail.com Mon Aug 23 07:33:40 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Sun, 22 Aug 2021 23:33:40 -0700 Subject: [Libre-soc-dev] Unable to login to Bugzilla account In-Reply-To: References: <48adbf0f8a93c983b35e076ce81ca660.squirrel@email.powweb.com> Message-ID: On Sun, Aug 22, 2021, 23:24 Adithya Gopan wrote: > Thank You, Jacob. I am able to log in. Yay! Sorry for our mistake... 
Jacob From lkcl at lkcl.net Tue Aug 24 12:07:05 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Tue, 24 Aug 2021 12:07:05 +0100 Subject: [Libre-soc-dev] naming change in nmutil Message-ID: https://libera.irclog.whitequark.org/nmigen/2021-08-24#30728292; a discussion there, prompted me to do a global search/replace of the naming of ready/valid in the StageAPI. i have of course just realised that data_i and data_o need the same treatment. apparently nmigen gained a convention to place o_ and i_ at the front of signals since the time that nmutil's StageAPI was written. on balance it's a good idea to follow that convention. l. --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 From lkcl at lkcl.net Tue Aug 24 15:33:22 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Tue, 24 Aug 2021 15:33:22 +0100 Subject: [Libre-soc-dev] wiki edits for OA minutes Message-ID: Madan, hi, i investigated the edits you made (which you can see on git.libre-soc.org yourself) 1) you had edited the page oa/madan.mdwn but had expected the change to appear on a completely different, unrelated page, about_us.mdwn 2) you had edited the template.mdwn file inserting the minutes of 20th august onto template.mdwn not its own page. use the button to create the new page, but copy the contents of template.mdwn before doing so then paste them into the new page. (i have yet to work out how this can be done in an automated fashion with ikiwiki) l. 
diff --git a/about_us.mdwn b/about_us.mdwn index 5db025ed..329bcdd5 100644 --- a/about_us.mdwn +++ b/about_us.mdwn @@ -141,7 +141,7 @@ Alain's website: ### [[oa/madan]] -* Experience: Programming in Python and Knowledge of ML algorithms and NLP +* Interests: Programming in Python and Knowledge of ML algorithms and NLP * Availability: 5 hours per week * Statistician diff --git a/oa/madan.mdwn b/oa/madan.mdwn index b9366a20..80731368 100644 --- a/oa/madan.mdwn +++ b/oa/madan.mdwn @@ -1,4 +1,4 @@ -* Experience: Programming in Python and Knowledge of ML algorithms and NLP +* Interests: Programming in Python and Knowledge of ML algorithms and NLP * Availability: 5 hours per week * Statistician --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 From lechenko at bsuir.by Tue Aug 24 23:01:09 2021 From: lechenko at bsuir.by (Anton Lechanka) Date: Wed, 25 Aug 2021 01:01:09 +0300 Subject: [Libre-soc-dev] Introduction - Anton Message-ID: Hi All, My name is Anton Lechanka, I am a software engineer and assistant professor in Computer Architecture. I used to do some software simulation and microarchitecture tasks a few years ago. Now I would like to join your community and work on this fascinating project. I have read the charter and I agree to abide by it. Thank you, Anton. From programmerjake at gmail.com Wed Aug 25 03:32:32 2021 From: programmerjake at gmail.com (Jacob Lifshay) Date: Tue, 24 Aug 2021 19:32:32 -0700 Subject: [Libre-soc-dev] Introduction - Anton In-Reply-To: References: Message-ID: On Tue, Aug 24, 2021 at 3:02 PM Anton Lechanka wrote: > > Hi All, Welcome! > My name is Anton Lechanka, I am a software engineer and assistant > professor in Computer Architecture. Neat! > I used to do some software simulation and microarchitecture tasks a > few years ago. > Now I would like to join your community and work on this fascinating project. > I have read the charter and I agree to abide by it. Glad to have you! 
Luke was describing some of your experience on the weekly meeting today (assuming he got the right person), It sounds waay more impressive than what you covered in your intro email. :) Jacob Lifshay From madan at object-automation.com Wed Aug 25 07:45:34 2021 From: madan at object-automation.com (Madan Kartheessan) Date: Wed, 25 Aug 2021 12:15:34 +0530 Subject: [Libre-soc-dev] MoM August 20 Message-ID: *Luke, David and the Object Automation team:* Please use the URL below to read the August 20 MoM . https://libre-soc.org/oa/minutes/2021aug20/ *Luke* Thanks a lot for rectifying those issues that I brought up yesterday in the call. Regards Madan K. From vklr at vkten.in Wed Aug 25 10:20:38 2021 From: vklr at vkten.in (Veera) Date: Wed, 25 Aug 2021 14:50:38 +0530 Subject: [Libre-soc-dev] Any new work for me Message-ID: <20210825092037.GA1720@lily.local> Hi, Any new work for me, small to moderate, paid or unpaid. Paid one will be given priority. Thanks, Veera From umbertocerrato at outlook.it Wed Aug 25 10:26:07 2021 From: umbertocerrato at outlook.it (Umberto Cerrato) Date: Wed, 25 Aug 2021 09:26:07 +0000 Subject: [Libre-soc-dev] Any new work for me In-Reply-To: <20210825092037.GA1720@lily.local> References: <20210825092037.GA1720@lily.local> Message-ID: <94A037ED-0994-4703-B68C-35E04DCB832C@outlook.it> Same for me :) Small preferred. Thank you > Il giorno 25 ago 2021, alle ore 11:20, Veera ha scritto: > > Hi, > > Any new work for me, small to moderate, paid or unpaid. > Paid one will be given priority. 
> > Thanks, > Veera > > _______________________________________________ > Libre-soc-dev mailing list > Libre-soc-dev at lists.libre-soc.org > http://lists.libre-soc.org/mailman/listinfo/libre-soc-dev From lkcl at lkcl.net Wed Aug 25 11:50:44 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Wed, 25 Aug 2021 11:50:44 +0100 Subject: [Libre-soc-dev] MoM August 20 In-Reply-To: References: Message-ID: On Wed, Aug 25, 2021 at 7:45 AM Madan Kartheessan wrote: > Thanks a lot for rectifying those issues that I brought up yesterday in > the call. no problem. i created a stub 2021aug24 one for you as well https://libre-soc.org/oa/minutes/2021aug24/ can i recommend that you carry on with regular meetings (run by yourselves), perhaps using this link? https://meet.jit.si/Libre-SOC-OA that works in a web browser however it also works with the Jitsi Android app https://play.google.com/store/apps/details?id=org.jitsi.meet&hl=en_GB&gl=US l. From lkcl at lkcl.net Wed Aug 25 11:55:20 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Wed, 25 Aug 2021 11:55:20 +0100 Subject: [Libre-soc-dev] Any new work for me In-Reply-To: <20210825092037.GA1720@lily.local> References: <20210825092037.GA1720@lily.local> Message-ID: --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 On Wed, Aug 25, 2021 at 10:20 AM Veera wrote: > > Hi, > > Any new work for me, small to moderate, paid or unpaid. the symbiflow one. https://bugs.libre-soc.org/show_bug.cgi?id=654 umberto, create yourself a section on the about_us page and also create a home page, it helps having what you can do, so we can work out a good fit. and your home page is used for tracking payments / tasks. l. 
From madan.kartheessan at gmail.com Wed Aug 25 12:05:46 2021 From: madan.kartheessan at gmail.com (Madan Kartheessan) Date: Wed, 25 Aug 2021 16:35:46 +0530 Subject: [Libre-soc-dev] MoM August 20 In-Reply-To: References: Message-ID: can i recommend that you carry on with regular meetings (run by yourselves), perhaps using this link? Luke Thanks Luke, I will do that. On Wed, Aug 25, 2021 at 4:20 PM Luke Kenneth Casson Leighton wrote: > On Wed, Aug 25, 2021 at 7:45 AM Madan Kartheessan > wrote: > > > Thanks a lot for rectifying those issues that I brought up yesterday in > > the call. > > no problem. i created a stub 2021aug24 one for you as well > https://libre-soc.org/oa/minutes/2021aug24/ > > can i recommend that you carry on with regular meetings (run by > yourselves), > perhaps using this link? > > https://meet.jit.si/Libre-SOC-OA > > that works in a web browser however it also works with the Jitsi > Android app > > https://play.google.com/store/apps/details?id=org.jitsi.meet&hl=en_GB&gl=US > > l. > > _______________________________________________ > Libre-soc-dev mailing list > Libre-soc-dev at lists.libre-soc.org > http://lists.libre-soc.org/mailman/listinfo/libre-soc-dev > From luke.leighton at gmail.com Wed Aug 25 12:29:53 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 25 Aug 2021 11:29:53 +0000 Subject: [Libre-soc-dev] Introduction - Anton In-Reply-To: References: Message-ID: On August 24, 2021 10:01:09 PM UTC, Anton Lechanka wrote: >Hi All, > >My name is Anton Lechanka, I am a software engineer and assistant >professor in Computer Architecture. like jacob said, great to hear from you. >I used to do some software simulation and microarchitecture tasks a >few years ago. interesting. you may be intrigued to know, we put in a new Grant request to use Peter Hsu's cavatools for the basis of a new Power ISA simulator. >Now I would like to join your community and work on this fascinating >project. >I have read the charter and I agree to abide by it.
fantastic. i'm assuming that you'd like to receive donations from NLnet, if so please do carry on through the checklist on the "how can i help as a developer" section, we use people's wiki homepage to track payments (example http://libre-soc.org/lkcl) second, do send me an ssh public key i'll add you to the list. third, you no doubt saw we use nmigen, this is because we have some Serious OO Design work to do, and the evaluation for all other HDL tools came up either short or so-short-you-need-a-microscope. to illustrate, here's what can be done by using operator-overloading at the python level: https://libre-soc.org/3d_gpu/architecture/dynamic_simd/ if you'd like to jump straight in, an immediate useful task would be binutils support for SVP64. i am currently using a python class to generate ".long xxx; v3.0b asmop" and in some cases ".long xxx .long yyyy" the bugreport is here: https://bugs.libre-soc.org/show_bug.cgi?id=550 it will give you an immediate feel for the SVP64 format. the only thing: it is *not* a good idea to hand-create the tables needed by binutils. these should be *auto-generated*, teaching sv_analysis.py how to do that. https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/sv_analysis.py;hb=HEAD there's nothing particularly sophisticated or clever about that program: it's written in a bland, non-OO "Get It Done" style. it: * reads OpenPOWER ISA v3.0B CSV files containing micro-code-style instruction format information (exactly like the tables in binutils) * identifies and groups v3.0B instructions by identical register file profile (number of Read regs, number of Write regs, number of CR regs read etc) * assigns an SVP64 "Style" to each (Twin/Single-predicate, 2 or 3 EXTRA bits for reg extension) * spits out *more* CSV files with that grouping information in it, to assist in decoding thus rather than hand-create the SVP64 decoding information in binutils it should be trivial to autogenerate c header files and c structs. 
this will give you a good running start directly into how SVP64 is formatted. interested? :) l. From luke.leighton at gmail.com Wed Aug 25 12:38:38 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 25 Aug 2021 11:38:38 +0000 Subject: [Libre-soc-dev] MoM August 20 In-Reply-To: References: Message-ID: <26890C2A-8942-42DD-9D13-96DF3756CF7D@gmail.com> On August 25, 2021 11:05:46 AM UTC, Madan Kartheessan wrote: >can i recommend that you carry on with regular meetings (run by >yourselves), >perhaps using this link? Madan: please just reply inline rather than copy the text that somebody else wrote. by copying my words it looks like *you* are saying "carry on with regular meetings". are you instructing *me* to carry on with regular meetings? look above at the header. it says, "On aug 25th Madan wrote: can i recommend you carry on with regular meetings" you also replied with top-posting. please can you look up and read "mailing list netiquette" use google search to find some pages on that. the internet joke with "A" before "Q" is particularly funny but is also illustrative at the same time. best, l. From luke.leighton at gmail.com Wed Aug 25 14:08:08 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 25 Aug 2021 13:08:08 +0000 Subject: [Libre-soc-dev] Introduction - Anton In-Reply-To: References: Message-ID: On August 25, 2021 12:37:51 PM UTC, Anton Lechanka wrote: >> the bugreport is here: >> https://bugs.libre-soc.org/show_bug.cgi?id=550 >> >> interested? :) >> > >Sure! Do you have any deadlines or time estimates for this task to be >done? with autogeneration by svanalysis i really would be surprised if it was longer than 3 weeks. however ramp-up / questions, obviously, not included in that. no deadlines given that i am using the python class, which has a mode where it can do .S processing. i actually had to add gas macro recognition to get that to work. so there is a temporary workaround. 
however it will become increasingly more of a priority particularly for Lauri who is working at assembler level for Video/Audio CODECs, and later for compilers. the function entrypoint is asm_process() https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;h=45b292b4c4c32bbff548f2bf299235633d31db6c;hb=HEAD#l1052 you can see it looks for ".set" macros of the utmost basic form, example where this is used: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/Makefile;h=4dd904b6ba48f3fcae3b1ab04e1b0479e460abd4;hb=HEAD#l34 and some actual assembler containing sv.xxx opcodes, which get translated by asm_process() line by line into ".long xxxxx; some_v3.0b_asmopcode" https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/audio/mp3/mp3_0_apply_window_float_basicsv.s;hb=HEAD you've seen the spec page which contains the format? https://libre-soc.org/openpower/sv/svp64/ it's very deliberately only describing the format, not why it is what it is, or how to *use* that format (how to implement hardware etc i mean). l. From lechenko at bsuir.by Wed Aug 25 15:13:39 2021 From: lechenko at bsuir.by (Anton Lechanka) Date: Wed, 25 Aug 2021 17:13:39 +0300 Subject: [Libre-soc-dev] Introduction - Anton In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 at 16:08, lkcl wrote: > > > On August 25, 2021 12:37:51 PM UTC, Anton Lechanka > wrote: > > > > >Sure! Do you have any deadlines or time estimates for this task to be > >done? > > with autogeneration by svanalysis i really would be surprised if it was > longer than 3 weeks. however ramp-up / questions, obviously, not included > in that. > > no deadlines given that i am using the python class, which has a mode > where it can do .S processing. i actually had to add gas macro recognition > to get that to work. I see. I must warn you that I cannot provide more than 8 hours per week, due to my other commitments. So it may take even longer for me.
If it is okay with you, then I’ll proceed with this task as soon as environment is ready. Anton. From luke.leighton at gmail.com Wed Aug 25 16:06:29 2021 From: luke.leighton at gmail.com (lkcl) Date: Wed, 25 Aug 2021 15:06:29 +0000 Subject: [Libre-soc-dev] Introduction - Anton In-Reply-To: References: Message-ID: <2DC9A197-869D-46B3-A725-D04098D62550@gmail.com> On August 25, 2021 2:13:39 PM UTC, Anton Lechanka wrote: >On Wed, 25 Aug 2021 at 16:08, lkcl wrote: > >> >> >> On August 25, 2021 12:37:51 PM UTC, Anton Lechanka > >> wrote: >> >> > >> >Sure! Do you have any deadlines or time estimates for this task to >be >> >done? >> >> with autogeneration by svanalysis i really would be surprised if it >was >> longer than 3 weeks. however ramp-up / questions, obviously, not >included >> in that. >> >> no deadlines given that i am using the python class, which has a mode >> where it can do .S processing. i actually had to add gas macro >recognition >> to get that to work. > > >I see. I must warn you that I cannot provide more than 8 hours per >week, >due to my other commitments. So it may take even longer for me. no problem. we have a few part time people. and, honestly, i have found many times that part time results in better quality work, due to a lot more "thinking time" in between "typing time" > If it >is >okay with you, then I’ll proceed with this task as soon as environment >is >ready. sure, that'd be fantastic. i will add you a bugzilla account (i had to disable account creation after a lovely bugreport about the price and availability of juniper network routers) you can then set a password. l. From mehul at object-automation.com Thu Aug 26 06:54:49 2021 From: mehul at object-automation.com (mehul at object-automation.com) Date: Thu, 26 Aug 2021 01:54:49 -0400 Subject: [Libre-soc-dev] (no subject) Message-ID: <9146f830489a1a81df4c4348b1a7857f.squirrel@email.powweb.com> Hi All, I am Mehul Nachankar.
I have read the charter and I agree to abide by it. From lkcl at lkcl.net Thu Aug 26 10:48:39 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Thu, 26 Aug 2021 10:48:39 +0100 Subject: [Libre-soc-dev] (no subject) In-Reply-To: <9146f830489a1a81df4c4348b1a7857f.squirrel@email.powweb.com> References: <9146f830489a1a81df4c4348b1a7857f.squirrel@email.powweb.com> Message-ID: On Thu, Aug 26, 2021 at 6:55 AM wrote: > > Hi All, > > I am Mehul Nachankar. I have read the charter and I agree to abide by it. fantastic, mehul, i got the ssh key and have added you to the gitolite3 access. you have write permission to the wiki now via ssh. do update your checklist and continue the onboarding process. best, l. From sukhanshu at object-automation.com Thu Aug 26 13:12:34 2021 From: sukhanshu at object-automation.com (sukhanshu at object-automation.com) Date: Thu, 26 Aug 2021 08:12:34 -0400 Subject: [Libre-soc-dev] Charter Message-ID: <016eb8d3e3af16fc195779d720005b98.squirrel@email.powweb.com> I have read the charter and i agree to abide by it. From luke.leighton at gmail.com Thu Aug 26 16:48:06 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 26 Aug 2021 15:48:06 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Branches (contd) Message-ID: <31A70FCD-BE6F-4257-AE69-F1C12B3BB56E@gmail.com> https://libre-soc.org/openpower/sv/branches/ after realising that CTR can be used to save instructions in tight Vector loops by not needing an explicit GPR subtraction, comparison, and etc etc, i also realised that there are even more possible modes that might be useful. * only reduce CTR on predicated elements. you know there's 50 elements somewhere in a list of length e.g. 5000, you want to process those but stop when the 50th has been done. * only reduce CTR on condition tests that succeed * only reduce CTR on condition tests that *fail* there are plenty more options like this, what are they? what is actually useful? l. 
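To make the CTR-decrement variants listed above easier to compare, here is a rough Python model. It is only a sketch of one possible reading of the RFC: the mode names ("element", "predicated", "test_pass", "test_fail"), the function name and the rule that the loop stops once CTR reaches zero are all assumptions made for illustration, not anything defined by the SVP64 branches page.

```python
def run_with_ctr(ctr, predicate, tests, mode):
    """Walk a vector of elements, decrementing CTR according to `mode`.

    ctr       -- initial counter value (hypothetical stand-in for CTR)
    predicate -- per-element predicate bits (1 = element is active)
    tests     -- per-element condition-test results (1 = test succeeded)
    mode      -- hypothetical name for the decrement rule being compared

    Returns (final ctr, indices of active elements visited before CTR hit 0).
    """
    visited = []
    for i, (p, t) in enumerate(zip(predicate, tests)):
        if ctr == 0:
            break                      # assumption: stop once CTR is exhausted
        if p:
            visited.append(i)
        if mode == "element":          # decrement for every element
            decrement = True
        elif mode == "predicated":     # only predicated (active) elements
            decrement = bool(p)
        elif mode == "test_pass":      # only active elements whose test succeeded
            decrement = bool(p and t)
        elif mode == "test_fail":      # only active elements whose test failed
            decrement = bool(p and not t)
        else:
            raise ValueError("unknown mode: %s" % mode)
        if decrement:
            ctr -= 1
    return ctr, visited


predicate = [1, 0, 1, 1, 0, 1]
tests     = [1, 1, 0, 1, 1, 0]
for mode in ("element", "predicated", "test_pass", "test_fail"):
    print(mode, run_with_ctr(2, predicate, tests, mode))
```

With the "predicated" rule, a CTR of 50 over a 5000-element list would stop after the 50th active element, which is the use case described in the email.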
From luke.leighton at gmail.com Thu Aug 26 22:59:01 2021 From: luke.leighton at gmail.com (lkcl) Date: Thu, 26 Aug 2021 21:59:01 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Branches (contd) In-Reply-To: <31A70FCD-BE6F-4257-AE69-F1C12B3BB56E@gmail.com> References: <31A70FCD-BE6F-4257-AE69-F1C12B3BB56E@gmail.com> Message-ID: <4DA99921-B2B1-4786-A93B-2CD29B7CFABA@gmail.com> On August 26, 2021 3:48:06 PM UTC, lkcl wrote: >https://libre-soc.org/openpower/sv/branches/ > >after realising that CTR can be used to save instructions in tight >Vector loops by not needing an explicit GPR subtraction, comparison, >and etc etc, i also realised that there are even more possible modes >that might be useful. again we are running out of bits, or have to start overloading even more fields from the 24 bit RM. question: does it make sense for CTR to *always* be decremented "per element"? this would kinda match with the whole Sub-PC thing: CTR counts elements not SVP64 instructions. are there any circumstances where you would want to count *batches*? i.e. the number of SVP64 Branch Instructions and, if so, is it sufficient to simply pre-multiply CTR by VL, such that, again, CTR may be decremented by "number of elements", not "number of times sv.bc is called"? l. From lkcl at lkcl.net Fri Aug 27 11:35:39 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Fri, 27 Aug 2021 11:35:39 +0100 Subject: [Libre-soc-dev] Charter In-Reply-To: <016eb8d3e3af16fc195779d720005b98.squirrel@email.powweb.com> References: <016eb8d3e3af16fc195779d720005b98.squirrel@email.powweb.com> Message-ID: On Thu, Aug 26, 2021 at 1:13 PM wrote: > > I have read the charter and i agree to abide by it. fantastic, i've received your ssh public key, and have added it to the gitolite3 repository, and given you write permission to the wiki. do carry on with the on-boarding process. best, l. 
From lkcl at lkcl.net Fri Aug 27 17:43:08 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Fri, 27 Aug 2021 17:43:08 +0100 Subject: [Libre-soc-dev] L0 Cache Buffer Message-ID: anton, hi, here is the page containing the diagram for what we called the "L0 Cache Buffer" https://libre-soc.org/3d_gpu/architecture/memory_and_cache/ we have a standard interface (a nmigen object / record) called "PortInterface" which has all the usual signals: data, address, LD/ST request, addr_ok, exception, but where the length is specified as a "mask" with one bit per byte. this is processed by LDSTSplitter into *two* PortInterfaces, where the addresses are now 8-byte aligned, and the non-aligned "masks" have been split across the two. example: LD 0x0003 8-bytes => * even port LD 0x0000 5-bit mask 0b11111000 * odd port LD 0x0008 3-bit mask 0b00000111 you can then take all the EVEN requests, separated from ODD requests, and put all EVEN requests into the EVEN DataMerger, and all ODD requests into the ODD DataMerger then, because all of the requests comprise bit-level masks, you can merge all of the bit-masks in the EVEN side, and (separately) merge all of the bit-masks on the ODD side... ... this of course if they have the same MSBs in Addr[5:11] and Addr[12:48]... and you can then pick one of them (PriorityPicker) on each side to pass through to a left-side L1 Cache and a right-side L1 Cache. or, other such scheme, making a 256-bit-wide single request etc. etc. etc. but the primary focus here is on the address and data merging. l.
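The split-and-merge arithmetic in the example above can be sketched in plain Python. This is not the nmigen LDSTSplitter or DataMerger code: the function names are hypothetical, the mask is taken to have bit 0 as the lowest-addressed byte (which matches the 0b11111000 / 0b00000111 example), and choosing the even/odd port from address bit 3 is an assumption inferred from that example.

```python
WIDTH = 8  # bytes per aligned request, as in the example above


def split_ldst(addr, length):
    """Split a byte-addressed access of `length` bytes (length <= WIDTH)
    into two WIDTH-aligned requests, each given as (aligned_addr, byte_mask).
    One mask bit per byte, bit 0 being the lowest-addressed byte."""
    offset = addr % WIDTH
    base = addr - offset
    span = ((1 << length) - 1) << offset      # mask across both aligned words
    lo_mask = span & ((1 << WIDTH) - 1)       # bytes in the first word
    hi_mask = span >> WIDTH                   # bytes spilling into the next word
    return (base, lo_mask), (base + WIDTH, hi_mask)


def port_of(aligned_addr):
    """Hypothetical port selection: address bit 3 picks even or odd."""
    return "odd" if (aligned_addr >> 3) & 1 else "even"


def merge_masks(requests):
    """OR together the byte masks of requests that share an aligned address."""
    merged = {}
    for aligned_addr, mask in requests:
        merged[aligned_addr] = merged.get(aligned_addr, 0) | mask
    return merged


first, second = split_ldst(0x0003, 8)
print(port_of(first[0]), hex(first[0]), bin(first[1]))
print(port_of(second[0]), hex(second[0]), bin(second[1]))
```

Merging then amounts to OR-ing the masks of every request that lands on the same aligned address on one side, after which a single request per side can be picked and passed on.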
--- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 From luke.leighton at gmail.com Sat Aug 28 13:21:45 2021 From: luke.leighton at gmail.com (lkcl) Date: Sat, 28 Aug 2021 12:21:45 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Fail-first Mode "VL inclusive" Message-ID: when working on SV Branches i added a VLSET Mode which truncates the Vector Length at the point where the first CR Test fails, it is effectively identical to Data-dependent Fail-First mode. however due to the combinations of Great-Big-{AND/OR/NAND/NOR} on the CR tests i realised we also need to be able to either *exclude* or *include* the element being tested from the count that ends up in VL. once this realisation had sunk in i realised that FFirst could also benefit from the same. i'm therefore proposing replacing the dz bit in ffirst Rc=0 mode with a VLi bit. the dz bit is for setting zeroing, which, as i found out from the SV Branches, makes no sense or is incomplete. SV Branches allow a bit to set whether the masked-out element be replaced with a one *or* a zero, but there is nowhere near enough bits available for this additional sophistication in Data-dependent FFirst Mode. therefore dropping zeroing entirely from ffirst is the more logical choice, i feel, leaving just "skipping" (predicate masking) thoughts appreciated. l. From luke.leighton at gmail.com Sat Aug 28 15:57:13 2021 From: luke.leighton at gmail.com (lkcl) Date: Sat, 28 Aug 2021 14:57:13 +0000 Subject: [Libre-soc-dev] ARM China a separate company Message-ID: <5D0534E4-0796-4F6A-81A2-A2E66673BB8D@gmail.com> https://semianalysis.substack.com/p/the-semiconductor-heist-of-the-century i'm not sure what to say :) l. 
From luke.leighton at gmail.com Sat Aug 28 17:36:02 2021 From: luke.leighton at gmail.com (lkcl) Date: Sat, 28 Aug 2021 16:36:02 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Data-dependent fail-first on CR operations (crand, cror, etc) Message-ID: <2288D2DB-FFAF-4050-BD8B-721827579A91@gmail.com> https://libre-soc.org/openpower/sv/svp64/appendix/ just one of those things, i realised we have not thought through yet the full implications of combining Data-Dependent Fail-First with crand, crxor, etc. the idea of DDFF is that a CR bit test (like BO for branches) is given, "is CRField[idx] == BO[1]" and if that test fails it TERMINATES the current instruction Vector Loop and truncates VL to that point. subsequent Horizontal instructions will then only run at the truncated VL loop size. usually this would be used with Rc=1. example: you do a subtract (sv.subf.) and if any one of those subtracts is less than zero the loop is terminated at that point, VL set to a length that excludes that failed element. here's the thing: some instructions like crand do not then also have an "Rc=1" option, but they still produce CR field modifications that would be useful to test. example: two sv.cmp operations are carried out, with different numbers, a and b as scalar limits. a sv.crand is performed, you want the loop to terminate at the point where: * first LE comparison against scalar A failed OR * second GE comparison against scalar B failed. a FFirst crand of the A LE bits with the B GE bits would achieve this effect. we would think in this case it would be necessary to use the 3 bits (inv-test, index, just like in v3.0B Branches) from the 24-bit RM Mode field howeverrrrr.. let us look more carefully at crand (etc) crand etc actually pass in 5 bit arguments, for a full 32-bit in each case. * BA selects a CR Field 0-7 and selects which bit EQ LE GE SO to use * likewise BB * likewise BC thus we *have* the bit to select for the FFirst testing *already*, from BC.
thus, we can use the *other* type of Ffirst mode, called RC1 mode. this mode is normally reserved for operations that do not have an Rc=0/1 option, but there are also two other bits: * inv (to test if our bit selected by BC is 1/0) * VLi which is "VL Inclusive" mode VLi mode will, if VL is truncated, *include* the current element (the one whose CR bit test failed) in the count that goes into VL. this is extremely useful for things like strncpy where you want to include the terminating zero in a copy operation. other operations which *actually* operate on entire (complete) CR Fields are usually only 3 bits for specifying the CR Field. these *would* need the index mode. it is quite... a lot of analysis. l. From luke.leighton at gmail.com Sun Aug 29 14:01:40 2021 From: luke.leighton at gmail.com (lkcl) Date: Sun, 29 Aug 2021 13:01:40 +0000 Subject: [Libre-soc-dev] [RFC] SVP64 Data-dependent fail-first on CR operations (crand, cror, etc) In-Reply-To: <2288D2DB-FFAF-4050-BD8B-721827579A91@gmail.com> References: <2288D2DB-FFAF-4050-BD8B-721827579A91@gmail.com> Message-ID: <1EAEFC8F-10C2-4DE6-9C72-E14399E4B1FF@gmail.com> On August 28, 2021 4:36:02 PM UTC, lkcl wrote: >https://libre-soc.org/openpower/sv/svp64/appendix/ >other operations which *actually* operate on entire (complete) CR >Fields are usually only 3 bits for specifying the CR Field. these >*would* need the index mode. question: is it worthwhile to use the elwidth bits of the 24-bit RM, which are meaningless for at least the result if the result is a 4 bit CR Field, to provide additional fields? this is already done for SV Branches, the VLi field is for example in elwidth bit zero in SV Branch RM. l.
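The fail-first truncation and the VLi option discussed in this thread can be modelled roughly as follows. This is a minimal sketch, not the simulator's implementation: the function name is hypothetical, and the polarity (the test passes when the selected CR bit is 1, or 0 when inv is set) is an assumption made for the example.

```python
def ffirst_vl(cr_bits, vl, inv=False, vli=False):
    """Return the (possibly truncated) vector length after fail-first.

    cr_bits -- the CR bit selected for testing, one per element
    vl      -- the vector length before the instruction runs
    inv     -- assumption: when set, the test passes on 0 instead of 1
    vli     -- "VL inclusive": count the failing element in the new VL
    """
    expected = 0 if inv else 1
    for i in range(vl):
        if cr_bits[i] != expected:
            # first failing element: truncate here, optionally including it
            return i + 1 if vli else i
    return vl  # no element failed, VL is unchanged


# strncpy-style use: copy up to and including the terminating zero
data = b"hi\x00xx"
nonzero = [1 if byte != 0 else 0 for byte in data]
print(ffirst_vl(nonzero, len(data)))            # excludes the terminator
print(ffirst_vl(nonzero, len(data), vli=True))  # includes the terminator
```

The inclusive form is the one that suits the strncpy case mentioned above, since the element that fails the test (the zero byte) still has to be copied.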
From richard.wilbur at gmail.com Mon Aug 30 17:07:59 2021 From: richard.wilbur at gmail.com (Richard Wilbur) Date: Mon, 30 Aug 2021 10:07:59 -0600 Subject: [Libre-soc-dev] [RFC] SVP64 Data-dependent fail-first on CR operations (crand, cror, etc) In-Reply-To: <1EAEFC8F-10C2-4DE6-9C72-E14399E4B1FF@gmail.com> References: <1EAEFC8F-10C2-4DE6-9C72-E14399E4B1FF@gmail.com> Message-ID: > On Aug 29, 2021, at 07:02, lkcl wrote: >> On August 28, 2021 4:36:02 PM UTC, lkcl wrote: >> https://libre-soc.org/openpower/sv/svp64/appendix/ > >> other operations which *actually* operate on entire (complete) CR >> Fields are usually only 3 bits for specifying the CR Field. these >> *would* need the index mode. > > question: > > is it worthwhile to use the elwidth bits of the 24-bit RM, which are meaningless for at least the result if the result is a 4 bit CR Field, to provide additional fields? > > this is already done for SV Branches, the VLi field is for example in elwidth bit zero in SV Branch RM. I think making meaningful use of resources at hand is a very good thing! Id est, to take a field which is meaningless in this context and be able to assign a useful meaning is more than worthwhile—it improves the expressiveness and POWER of the instruction set. From lkcl at lkcl.net Mon Aug 30 17:48:28 2021 From: lkcl at lkcl.net (Luke Kenneth Casson Leighton) Date: Mon, 30 Aug 2021 17:48:28 +0100 Subject: [Libre-soc-dev] [RFC] SVP64 Data-dependent fail-first on CR operations (crand, cror, etc) In-Reply-To: References: <1EAEFC8F-10C2-4DE6-9C72-E14399E4B1FF@gmail.com> Message-ID: On Mon, Aug 30, 2021 at 5:08 PM Richard Wilbur wrote: > I think making meaningful use of resources at hand is a very good thing! Id est, to take a field which is meaningless in this context and be able to assign a useful meaning is more than worthwhile—it improves the expressiveness and POWER of the instruction set. indeed. 
some care does have to be taken in the design, however, not to add so
many overloaded meanings for different bits that the decoder is
compromised for gate delay.

l.

From mehul at object-automation.com Mon Aug 30 21:06:45 2021
From: mehul at object-automation.com (mehul at object-automation.com)
Date: Mon, 30 Aug 2021 16:06:45 -0400
Subject: [Libre-soc-dev] (no subject)
Message-ID:

Hi All!

I am Mehul Nachankar. I have read the charter and agree to abide by it.

From richard.wilbur at gmail.com Tue Aug 31 02:06:50 2021
From: richard.wilbur at gmail.com (Richard Wilbur)
Date: Mon, 30 Aug 2021 19:06:50 -0600
Subject: [Libre-soc-dev] (no subject)
In-Reply-To:
References:
Message-ID: <5083E6B9-A36B-4DF9-AC6B-3D1F1B7118D0@gmail.com>

Welcome Mehul! Great to have you in the group! Do you have a specific
interest in one or more areas of the design or a more general interest
in the project?

Richard

From mehul at object-automation.com Tue Aug 31 09:07:13 2021
From: mehul at object-automation.com (mehul at object-automation.com)
Date: Tue, 31 Aug 2021 04:07:13 -0400
Subject: [Libre-soc-dev] (no subject)
In-Reply-To: <5083E6B9-A36B-4DF9-AC6B-3D1F1B7118D0@gmail.com>
References: <5083E6B9-A36B-4DF9-AC6B-3D1F1B7118D0@gmail.com>
Message-ID: <00c568db2e140e786da0353b0d2ecb16.squirrel@email.powweb.com>

> Welcome Mehul! Great to have you in the group! Do you have a specific
> interest in one or more areas of the design or a more general interest in
> the project?
>
> Richard
> _______________________________________________
> Libre-soc-dev mailing list
> Libre-soc-dev at lists.libre-soc.org
> http://lists.libre-soc.org/mailman/listinfo/libre-soc-dev
>

Hi Richard!!

I have experience working with Verilog, SystemVerilog and the basics of
UVM. Furthermore, I am also eager to explore new areas.
From luke.leighton at gmail.com Tue Aug 31 13:15:39 2021
From: luke.leighton at gmail.com (lkcl)
Date: Tue, 31 Aug 2021 13:15:39 +0100
Subject: [Libre-soc-dev] binutils svp64
In-Reply-To: <2DC9A197-869D-46B3-A725-D04098D62550@gmail.com>
References: <2DC9A197-869D-46B3-A725-D04098D62550@gmail.com>
Message-ID:

https://bugs.libre-soc.org/show_bug.cgi?id=550

On Wed, Aug 25, 2021 at 4:06 PM lkcl wrote:

> i will add you a bugzilla account (i had to disable account creation after a lovely bugreport about the price and availability of juniper network routers) you can then set a password.

done. got the ssh key, added you with write-permission to the wiki,
which you can use to test.

anton, i remembered this morning that contributions to binutils require
a Copyright Assignment to the FSF. i will forward you the forms i
enquired about.

you assign copyright to the FSF, and they *reassign* your copyrighted
material back to you (keeping a "copy" of those rights themselves).

this is not essential / critical / blocker for *writing* patches, but
it is essential for submitting them upstream, and we want, ultimately,
everything to be upstream.

l.