[Libre-soc-bugs] [Bug 602] low performance bare minimum functionality SIMD emulator required

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Mon Jun 7 11:06:05 BST 2021


--- Comment #16 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #15)
> (In reply to Jacob Lifshay from comment #14)
> > I ended up looking through Wikipedia's list of OCR programs, and I noticed
> > Tesseract (and several others) supports outputting to hOCR format, an
> > HTML-based format, which seems like it would be waay easier to parse than
> > trying to manually roll-your-own text column/row/formatting detector based
> > on Octave and FFTs
> my feeling is it's better to let richard do what he's doing.
> also in this particular case we don't need to know the contents of
> the formatting: all that is needed is the XY WidthHeight to pass
> to the OCR to extract the required text.

yeah, mostly posting the above for richard's benefit -- if the OCR program
already extracts the required information into the hOCR output, why duplicate
the logic when you can just use an XML processor and save tons of effort?
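
To illustrate the idea, here is a minimal sketch of pulling the XY and
Width/Height of each word out of hOCR using only the Python standard library.
It assumes the hOCR is well-formed XHTML, and that words are marked with
class "ocrx_word" and carry a "bbox x0 y0 x1 y1" entry in their title
attribute, as the hOCR format describes. The sample document and the function
name extract_word_boxes are made up for illustration, not taken from any
existing tool's output.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical hOCR fragment, hand-written for this example.
SAMPLE_HOCR = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="ocr_page" title="bbox 0 0 640 480">
<span class="ocr_line" title="bbox 36 92 220 116">
<span class="ocrx_word" title="bbox 36 92 96 116; x_wconf 95">Hello</span>
<span class="ocrx_word" title="bbox 104 92 220 116; x_wconf 91">world</span>
</span>
</div>
</body>
</html>"""

# hOCR stores the box as "bbox x0 y0 x1 y1" inside the title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def extract_word_boxes(hocr_text):
    """Return a list of (text, x, y, width, height) for each ocrx_word."""
    root = ET.fromstring(hocr_text)
    boxes = []
    # Walk every element and filter on the class attribute, so the XHTML
    # namespace prefix on tag names does not matter.
    for elem in root.iter():
        if "ocrx_word" not in elem.get("class", "").split():
            continue
        match = BBOX_RE.search(elem.get("title", ""))
        if match is None:
            continue
        x0, y0, x1, y1 = map(int, match.groups())
        text = "".join(elem.itertext()).strip()
        boxes.append((text, x0, y0, x1 - x0, y1 - y0))
    return boxes


if __name__ == "__main__":
    for box in extract_word_boxes(SAMPLE_HOCR):
        print(box)
```

If the hOCR produced by a given OCR program turns out not to be well-formed
XML, a lenient HTML parser would be needed instead of ElementTree.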
