     # look up the output bit in the lookup table
     bit_o = Signal()
     comb += bit_o.eq(Mux(bit_b,
                          Mux(bit_a, lut[3], lut[1]),
                          Mux(bit_a, lut[2], lut[0])))

hmm.  this should not have been done this way: it is a manual construction
(replication) of a pmux out of nested Muxes.

that prevents the HDL tools from identifying a pmux opportunity, which
would in turn mean that if an optimised ASIC-grade pmux Cell is ever
created, the tools would not use it.

this code should be replaced with:

     # Cat places its first argument in the LSB: index = (bit_a << 1) | bit_b
     lut_array = Array([lut[i] for i in range(4)])
     comb += bit_o.eq(lut_array[Cat(bit_b, bit_a)])

or, better, with use of the LUT module, once that module has itself been
converted to the (much simpler) Array-based form.
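
as a sanity check that the Array index really does select the same entry as
the nested Muxes, here is a rough plain-Python sketch (no nmigen needed).
mux() and cat() are hypothetical stand-ins that only model the behaviour of
Mux(sel, val1, val0) and of Cat (first argument in the LSB) on single bits.
they are not the real HDL objects, and the other function names are made up
for illustration as well:

```python
from itertools import product


def mux(sel, val1, val0):
    """Stand-in for Mux(sel, val1, val0): returns val1 when sel is set."""
    return val1 if sel else val0


def cat(*bits):
    """Stand-in for Cat on 1-bit values: the first argument is the LSB."""
    return sum((bit & 1) << i for i, bit in enumerate(bits))


def nested_mux_lookup(lut, bit_a, bit_b):
    """The original hand-built selection tree."""
    return mux(bit_b,
               mux(bit_a, lut[3], lut[1]),
               mux(bit_a, lut[2], lut[0]))


def array_lookup(lut, bit_a, bit_b):
    """The proposed replacement: index the table directly."""
    return lut[cat(bit_b, bit_a)]


def lookups_agree():
    """Compare both forms over all 16 LUT contents and all 4 input pairs."""
    for lut_value in range(16):
        lut = [(lut_value >> i) & 1 for i in range(4)]
        for bit_a, bit_b in product((0, 1), repeat=2):
            if nested_mux_lookup(lut, bit_a, bit_b) != array_lookup(lut, bit_a, bit_b):
                return False
    return True


if __name__ == "__main__":
    print("nested Mux and Array lookup agree:", lookups_agree())
```

this only checks the index ordering. it says nothing about what the
synthesis tools actually infer from either form.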

