
#spaCy

At Code4Lib 2024, Zoe Tucker and Kristian Allen of the UCLA Library presented an #OpenSource #metadata extraction pipeline for the automatic #indexing of digitized materials with complex layouts.
https://yewtu.be/watch?v=tujc_9nVg3o&t=10445
In a second iteration, they switched to the following combination of components to get better results: PaddleOCR (instead of #Tesseract) for #OCR, Amazon Science ReFinED (instead of #spaCy) for #NER, and Ollama (instead of #ChatGPT and #Gemini) for generating metadata in Dublin Core or MODS.
The experimental toolkit is written in Python and is available on GitHub as a Docker container with a JupyterLab environment: https://github.com/UCLALibrary/metadata-extraction-lab
#KIinBibliotheken #Bibliotheken #GenerativeKI #LLMs #KI #Erschliessung #Katalogisierung #c4l24

[SOLVED] Please help, dear corpus and computational linguistics friends! Is there a #multilingual #model for #TreeTagger, even with a very basic tagset?

I would like to annotate lemma + POS in a #corpus of short #texts in 3-4 European #languages (mainly #German, #English, #French) within #TXM, a process that requires using TreeTagger.

I know I could do that with #spaCy, selecting the right model for each text. But then I need to get those #annotations into shape for import into TXM.

#EasyWayOut?
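
For anyone weighing the spaCy route, here is a minimal sketch of what "selecting the right model for each text" and reshaping the annotations could look like. It assumes the small spaCy pipelines named in `MODELS` are installed, and it writes the word / POS / lemma tab-separated layout that TreeTagger prints. The names `MODELS`, `Tok`, `load_pipeline`, `annotate` and `to_treetagger_lines` are made up for illustration, and whether TXM accepts this output directly is an assumption to check against the TXM import documentation. Note that spaCy's `pos_` values are Universal POS tags, not a TreeTagger tagset.

```python
from functools import lru_cache
from typing import Iterable, List, NamedTuple

# Assumed language-to-pipeline mapping. Each pipeline has to be installed
# first, e.g. `python -m spacy download de_core_news_sm`.
MODELS = {
    "de": "de_core_news_sm",
    "en": "en_core_web_sm",
    "fr": "fr_core_news_sm",
}


class Tok(NamedTuple):
    """One annotated token: surface form, coarse POS tag, lemma."""
    text: str
    pos: str
    lemma: str


@lru_cache(maxsize=None)
def load_pipeline(lang: str):
    """Load the spaCy pipeline for a language code once and reuse it."""
    import spacy  # lazy import: the formatting helper below works without spaCy
    return spacy.load(MODELS[lang])


def annotate(text: str, lang: str) -> List[Tok]:
    """Tag a text with the pipeline chosen for its language."""
    nlp = load_pipeline(lang)
    return [Tok(t.text, t.pos_, t.lemma_) for t in nlp(text) if not t.is_space]


def to_treetagger_lines(tokens: Iterable[Tok]) -> str:
    """Render tokens one per line as word<TAB>POS<TAB>lemma."""
    return "\n".join(f"{t.text}\t{t.pos}\t{t.lemma}" for t in tokens)
```

Usage would be something like `print(to_treetagger_lines(annotate("Die Hunde bellen.", "de")))`, with the language code supplied per text (from metadata or a language detector).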


Prys: Two types of outputs: language tech building blocks that companies and others can use (distributed under permissive open source licenses) and end-user products (e.g. apps) for the language.

These tools from techiaith.cymru/ have also been integrated into #Spacy (spacy.io/).

They also have a Welsh virtual assistant called Macsen which you can run on iOS or Android to interact with your phone by voice in Welsh.

Long overdue 💖

Hi, I'm Ines, the co-founder and CEO of @explosion – we're the makers of the #spaCy library, the Prodigy annotation tool and more. I love working on applied real-world NLP, open-source and developer experience.

👩‍💻 My Website: ines.io
🧠 Applied NLP Thinking: explosion.ai/blog/applied-nlp-
📺 How to Ignore Most Startup Advice and Build a Decent Software Business: youtube.com/watch?v=74AsJ7RET2

ines.io: I'm a software developer working on Artificial Intelligence and Natural Language Processing technologies, and the co-founder and CEO of Explosion, makers of the popular NLP library spaCy and Prodigy, a modern annotation tool for machine learning.