This is the syntax of #Tesseract-OCR:
tesseract IMAGEFILE TEXTFILE [OPTIONS]
It is that simple.
The resulting text file may not be perfect, and you may have to tweak it somewhat.
See the man page for more options.
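A minimal sketch of what a call with a couple of common options might look like. The function name `ocr_page` is made up for illustration, and it assumes the English language data is installed. Note that tesseract treats the second argument as an output base and appends `.txt` itself.

```bash
# ocr_page IMAGE [LANG]: OCR one image into "<image basename>.txt".
# ocr_page is a hypothetical helper name, not part of tesseract.
ocr_page() {
    img=$1
    lang=${2:-eng}   # assumption: language data for "eng" is installed
    # -l selects the language; --psm 6 assumes a single uniform block of text.
    # The output base is the image name without its extension.
    tesseract "$img" "${img%.*}" -l "$lang" --psm 6
}

# Usage: ocr_page scan.png        (writes scan.txt)
#        ocr_page page.tif deu    (German, writes page.txt)
```

If the result is poor, trying a different `--psm` value is usually the first thing to tweak.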
Has anyone ever used #imagemagick to prepare a photo of a digital display for #tesseract OCR? What would a reasonably clean-working #convert line look like with regard to sharpening, contrast enhancement, and grayscale conversion? Retoots welcome.
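One possible starting point for such a line, as a rough sketch rather than a tuned recipe. The function name `prep_display` and the values `200%` and `0x1` are assumptions to adjust per image. With ImageMagick 7 the command is `magick` instead of `convert`.

```bash
# prep_display INPUT OUTPUT: grayscale, upscale, stretch contrast, sharpen.
# prep_display is a hypothetical helper name; the numeric values are guesses.
prep_display() {
    # -colorspace Gray : grayscale conversion
    # -resize 200%     : enlarge small characters before OCR
    # -normalize       : stretch contrast over the full range
    # -unsharp 0x1     : mild sharpening
    convert "$1" -colorspace Gray -resize 200% -normalize -unsharp 0x1 "$2"
}

# Usage: prep_display photo.jpg clean.png && tesseract clean.png result
```

Whether this is enough for a given display depends heavily on the photo, so expect to experiment with the values.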
Every now & then, I give #ChatGPT a scan of my handwriting to test its skills in working with #handwrittentexts. Initially, it responded that it could not process the scans or gave me entirely fictional output, but today it got almost everything right. These results are better than those I achieved with #HWR models in #Tesseract & #OCR4all without additional training. I also asked ChatGPT what it "thought" about my writing & it called it "consistently shaped & large with stylistic strokes."
Looks like #TheDearHunter is coming back to Europe this year at #BeProgMyFriend in Spain and #Euroblast in Germany, both in September. Alongside #TesseracT!
I think "A Haven with Two Faces" from the new Spiritbox album "Tsunami Sea" is just awesome!
I've mentioned it before, but I really like the drift into atmospheric, progressive music and it reminds me of TesseracT, one of my favourite bands
#WSL is nice because I can use #OCRmyPDF on #Ubuntu. I set it up to watch a folder for any new #PDF then automatically #deskew #rotate #OCR then #export to a "done" folder. It is very nice to have it done automatically in the background. No more opening, clicking to OCR and waiting on the software and unable to open other PDFs. Plus, this process is way lighter on resources. Man, I love #OpenSource.
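A setup like this could be sketched with a simple polling loop. This is not the poster's actual script: the function name `process_inbox`, the folder paths, and the 30-second interval are assumptions. `--deskew` and `--rotate-pages` are OCRmyPDF options.

```bash
# process_inbox INBOX DONE: OCR every PDF in INBOX into DONE, once each.
# process_inbox and the folder layout are hypothetical.
process_inbox() {
    inbox=$1
    done_dir=$2
    for pdf in "$inbox"/*.pdf; do
        [ -e "$pdf" ] || continue            # glob matched nothing
        out="$done_dir/$(basename "$pdf")"
        [ -e "$out" ] && continue            # already processed earlier
        ocrmypdf --deskew --rotate-pages "$pdf" "$out"
    done
}

# Usage (runs until interrupted):
# while true; do process_inbox ~/scans/inbox ~/scans/done; sleep 30; done
```

A polling loop may pick up a file that is still being copied, so a real setup might prefer a proper file watcher.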
#Tesseract
I’m still annoyed with the state of #OCR in #Linux (or #FLOSS in general). Not that the need for OCR’ing hasn’t diminished over the years, as more and more publications are already in electronic form, but every once in a while a need arises. #Tesseract’s quality is #abysmal (and not in Joey’s sense). #ABBYYFineReader used to be the best in #Windows, and once upon a time they provided a #CLI-usable OCR engine for Linux too, but not any more. #atkjuttuja #computers
#OpenSource programs I need:
1. Some sort of Apple-tags-like variant for the open source world (the best at this point is the file manager from #elementaryos, but it only supports tagging with 8 colours, no #
(Automation like the Mac's #hazel or #defaultfolderx would be nice to look at.)
2. A #preview replacement (a reader for PDF and other files with most of the pro features and some sort of working #ocr, possibly a GUI for #tesseract?) for #Linux and #android, preferably (the best I found was #pdfsambasic)
@SchnDa If you are thinking less GUI and more workflow, then you might also want to check #ocrd https://ocr-d.de/. Put simply, it provides an abstraction over several #ocr tools including #tesseract, #calamari, #kraken, ... to build customized OCR workflows.
Hi #histodons,
I need your expertise. We want to integrate an #opensource #ocr tool into our #useGalaxy Platform so you can better analyse your texts, etc.
I worked with #tesseract some years ago, and I heard about #ocr4all.
Do you have experience with any of these - or other recommendations?
We are also integrating #transkribus via API but want another OCR-specific option.
Looking forward to your experiences!
ImageToolbox¹ can extract text from images!
The tool uses Tesseract² for OCR.
Both tools are free software under the FSF's GPL license³.
For everyone who posts images containing passages of text!
¹ https://github.com/T8RIN/ImageToolbox
² https://github.com/tesseract-ocr/tesseract
³ https://de.m.wikipedia.org/wiki/GNU_General_Public_License
#ImageToolbox #Tesseract
#TesseracT, #Leprous, #GreenLung and #Sungazer among others announced for this year's #ArcTanGent. Tempting
@lavaeolus have worked with #tesseract but not via a GUI - Thanks for this.
#DeepStash tries to be an alternative to algorithmic #SocialMedia by providing small bites of information from books, articles, and/or quotes. It has the ability to bookmark content, though this is limited for free accounts.
Alternatively, one can just screenshot the specific chunks of data and potentially make one's own filtered timeline by combining the data from the screenshots in #Anki. This can either be done directly by using the images, or one can quickly extract the text via #tesseract and some #python to generate a #csv file which can be imported into Anki.
#TIL that it looks like #tesseract is preinstalled on #fedora.
This means I do not need to battle with #python to extract quotes from images and instead can do it all via #bash like
```
for f in *.png; do tesseract "$f" "${f%.*}"; done
```
within the specific directory
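Going one step further, the CSV for Anki could probably also be built in the shell. This is a rough sketch, not a tested workflow: `csv_field` and `screenshots_to_csv` are made-up names, and the two-column layout (file name, text) is an assumption about what the import should look like.

```bash
# csv_field: read text on stdin, print it as a single quoted CSV field.
# Hypothetical helper: joins lines with spaces and doubles embedded quotes.
csv_field() {
    # Recent tesseract versions end each page with a form feed; drop it.
    tr -d '\f' | awk '
        NF  { gsub(/"/, "\"\""); line = (line == "" ? "" : line " ") $0 }
        END { printf "\"%s\"", line }'
}

# screenshots_to_csv: one CSV row (file name, OCR text) per PNG in the
# current directory. "stdout" makes tesseract print instead of writing a file.
screenshots_to_csv() {
    for f in *.png; do
        [ -e "$f" ] || continue              # glob matched nothing
        printf '%s\n' "$f" | csv_field
        printf ','
        tesseract "$f" stdout 2>/dev/null | csv_field
        printf '\n'
    done
}

# Usage: screenshots_to_csv > quotes.csv
```

It is worth checking the import preview in Anki before committing, since OCR output often needs some cleanup first.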