sigmoid.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A social space for people researching, working with, or just interested in AI!

#ner


[LangExtract](developers.googleblog.com/en/i) has got me curious, but I don't get what makes it different from a [spacy-llm/prodigy](prodi.gy/docs/large-language-m) setup. Is it just that I am spared the effort of chunking long inputs and/or constructing output JSON from entities and offsets by writing the corresponding Python code myself?...

Ah, one more difference is that langextract is #OpenSource whereas prodigy is not (?). (On the other hand, prodigy has better integration with a correction+training workflow.)

Link preview: developers.googleblog.com, "Introducing LangExtract: A Gemini-powered information extraction library" (Google Developers Blog). Explore LangExtract: a Gemini-powered, open-source Python library for reliable, structured information extraction from unstructured text with precise source grounding.
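
For reference, the LangExtract quick start looks roughly like this. This is a minimal sketch following the project's README; the prompt wording, example data, and model id are illustrative assumptions, not code from either post.

```python
import langextract as lx

# Minimal LangExtract sketch (prompt text, example data, and model choice
# are illustrative; API names follow the project's README).
prompt = "Extract person and place names. Use the exact source text; do not paraphrase."

# One few-shot example showing the desired output schema.
examples = [
    lx.data.ExampleData(
        text="Ada Lovelace met Charles Babbage in London.",
        extractions=[
            lx.data.Extraction(extraction_class="person", extraction_text="Ada Lovelace"),
            lx.data.Extraction(extraction_class="person", extraction_text="Charles Babbage"),
            lx.data.Extraction(extraction_class="place", extraction_text="London"),
        ],
    )
]

# The library chunks long inputs itself and returns extractions grounded
# to their positions in the source text, which is the work the post asks about.
result = lx.extract(
    text_or_documents="Marie Curie was born in Warsaw and worked in Paris.",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",  # assumption: any supported Gemini model id
)

for extraction in result.extractions:
    print(extraction.extraction_class, extraction.extraction_text)
```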

🔠 Panel: More than Chatbots: Multimodal Large Language Models in Humanities Workflows

At #DHd2025, Nina Rastinger explores how well #AI handles abbreviations & NER:

✅ NER works well, even with small, low-cost models
❌ Abbreviations are tricky—costs & resource demands skyrocket
🚀 GPT o1 improves performance, even on abbreviations, but remains resource-intensive
Balancing accuracy & efficiency in text processing remains a challenge! ⚖️

🥁 We are happy to announce that we just published our first preprint on arXiv: "NER4all or Context is All You Need: Using LLMs for low-effort, high-performance NER on historical texts. A humanities informed approach".🎉

👉 arxiv.org/abs/2502.04351 👈

It is also our first endeavour into collaborative work with such a large number of collaborators & contributors from the Chair of Digital History, NFDI4Memory's Methods Innovation Lab, & AI-Skills.

Link preview: arXiv.org, "NER4all or Context is All You Need: Using LLMs for low-effort, high-performance NER on historical texts. A humanities informed approach". Abstract: Named entity recognition (NER) is a core task for historical research in automatically establishing all references to people, places, events and the like. Yet, due to the high linguistic and genre diversity of sources, only limited canonisation of spellings, the level of required historical domain knowledge, and the scarcity of annotated training data, established approaches to natural language processing (NLP) have been both extremely expensive and have yielded only unsatisfactory results in terms of recall and precision. Our paper introduces a new approach. We demonstrate how readily available, state-of-the-art LLMs significantly outperform two leading NLP frameworks, spaCy and flair, for NER in historical documents, with seven to twenty-two percent higher F1 scores. Our ablation study shows how providing historical context to the task and a bit of persona modelling that turns the focus away from a purely linguistic approach are core to a successful prompting strategy. We also demonstrate that, contrary to our expectations, providing increasing numbers of examples in few-shot approaches does not improve recall or precision below a threshold of 16-shot. In consequence, our approach democratises access to NER for all historians by removing the barrier of scripting languages and computational skills required for established NLP tools, instead leveraging natural language prompts and consumer-grade tools and frontends.
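
The prompting strategy the abstract describes (a persona plus historical context rather than a purely linguistic task framing) could be sketched like this. The client, model name, prompt wording, and sample passage are all illustrative assumptions, not the paper's actual setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any chat-capable LLM would do

# Persona + historical context, as the abstract suggests, instead of a
# purely linguistic task framing. All wording here is illustrative.
system_prompt = (
    "You are a historian specialising in 18th-century German-language sources. "
    "Spelling is not canonised; expect variant spellings and abbreviations."
)
task_prompt = (
    "Identify all references to persons, places, and events in the passage below. "
    "Return one entity per line in the form: TYPE<TAB>exact surface form.\n\n"
    "Passage:\n{passage}"
)

passage = "Anno 1756 reiste Hr. v. Humboldt von Potsdamb nach Breßlau."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any capable instruction-following model
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task_prompt.format(passage=passage)},
    ],
)
print(response.choices[0].message.content)
```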

ReadMe2KG: GitHub README to Knowledge Graph has been published as part of the Natural Scientific Language Processing and Research Knowledge Graphs workshop co-located with ESWC 2025. This task aims to complement the NFDI4DataScience KG via information extraction from GitHub README files.

task description: nfdi4ds.github.io/nslp2025/doc
website: codabench.org/competitions/539
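
As a generic illustration of information extraction from a README (not the organisers' baseline; the entity types, repository name, and triple format are assumptions), an off-the-shelf NER pass could seed candidate KG triples:

```python
import spacy

# Generic sketch: run off-the-shelf NER over README text and emit candidate
# triples. Entity types and the triple format are assumptions, not the
# task's actual schema.
nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

readme_text = (
    "AwesomeLib is developed at Example University and builds on PyTorch. "
    "It was presented at NeurIPS 2024."
)

repo = "github.com/example/awesomelib"  # hypothetical repository identifier

doc = nlp(readme_text)
for ent in doc.ents:
    # Each entity becomes a candidate (repo, mentions, entity) triple
    # that a later entity-linking or human-correction step would refine.
    print((repo, "mentions", f"{ent.text} [{ent.label_}]"))
```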

@eswc_conf @GenAsefa @shufan @NFDI4DS