[Libre-soc-bugs] [Bug 602] low performance bare minimum functionality SIMD emulator required

bugzilla-daemon at libre-soc.org
Mon Jun 7 11:22:39 BST 2021


--- Comment #18 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #17)
> (In reply to Jacob Lifshay from comment #16)
> >
> > yeah, mostly posting the above for richard's benefit -- if it already
> > extracts the required information into the hOCR, why duplicate the logic
> > when you can just use a xml processor and save tons of effort?
> because richard's efforts are only about 1000 lines long.

well, assuming you can use something like `jq` but for xml, it could be like 3
lines of code:
1. use imagemagick or something to convert the pdf to a list of png images
2. use tesseract or similar to convert the pngs to hOCR
3. use a jq-like program to extract the right part

now you have the text for all the sections you care about
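
As a rough illustration of the last step only, here is a minimal Python sketch that pulls the text of the lines inside a vertical region out of an hOCR document. It assumes the hOCR is well-formed XHTML with `ocr_line` and `ocrx_word` classes and `bbox x0 y0 x1 y1` entries in the `title` attribute, which is the shape the hOCR format describes. The sample document, the function names and the region coordinates are all made up for the example; real output from an OCR tool may carry a DOCTYPE and extra attributes that need more care.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written, simplified hOCR sample (hypothetical content, not real OCR output).
SAMPLE_HOCR = """<html xmlns="http://www.w3.org/1999/xhtml">
 <body>
  <div class="ocr_page" title="bbox 0 0 1000 1400">
   <span class="ocr_line" title="bbox 100 50 900 80">
    <span class="ocrx_word" title="bbox 100 50 200 80">Section</span>
    <span class="ocrx_word" title="bbox 210 50 300 80">heading</span>
   </span>
   <span class="ocr_line" title="bbox 100 400 900 430">
    <span class="ocrx_word" title="bbox 100 400 180 430">body</span>
    <span class="ocrx_word" title="bbox 190 400 400 430">text</span>
   </span>
  </div>
 </body>
</html>
"""

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    match = BBOX_RE.search(title or "")
    return tuple(int(g) for g in match.groups()) if match else None


def lines_in_region(hocr_text, y_min, y_max):
    """Return the text of every ocr_line lying fully between y_min and y_max."""
    root = ET.fromstring(hocr_text)
    result = []
    # Attributes are not namespaced, so matching on "class" sidesteps
    # the XHTML namespace that prefixes every element tag.
    for element in root.iter():
        if element.get("class") != "ocr_line":
            continue
        bbox = parse_bbox(element.get("title"))
        if bbox is None:
            continue
        _, top, _, bottom = bbox
        if top >= y_min and bottom <= y_max:
            words = [
                word.text.strip()
                for word in element.iter()
                if word.get("class") == "ocrx_word" and word.text
            ]
            result.append(" ".join(words))
    return result


if __name__ == "__main__":
    for line in lines_in_region(SAMPLE_HOCR, 300, 500):
        print(line)
```

The first two steps would stay as external commands producing the page images and the hOCR files; the selection by bounding box above is just one possible way to pick "the right part", and selecting by element id or by matching a heading would work equally well.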

