At #EMNLP2023, our colleague Jonathan Tonglet presented his master's thesis, conducted at KU Leuven. Find out more about »SEER: A Knapsack approach to Exemplar Selection for In-Context HybridQA« in this thread:
A group photo from the poster presentation of »AmbiFC: Fact-Checking Ambiguous Claims with Evidence«, co-authored by our colleague Max Glockner, @ievaraminta, James Thorne, Gisela Vallejo, Andreas Vlachos and Iryna Gurevych. #EMNLP2023
A successful #EMNLPMeeting has come to an end! A group photo of our colleagues Yongxin Huang, Jonathan Tonglet, Aniket Pramanick, Sukannya Purkayastha, Dominic Petrak and Max Glockner, who represented the UKP Lab in Singapore! #EMNLP2023
You can find our paper here: https://arxiv.org/abs/2311.00408
and our code here: https://github.com/UKPLab/AdaSent
Check out the work of our authors Yongxin Huang, Kexin Wang, Sourav Dutta, Raj Nath Patel, Goran Glavaš and Iryna Gurevych! (6/) #EMNLP2023 #AdaSent #NLProc
What makes the difference?
We attribute the effectiveness of the sentence-encoding adapter to the consistency between the pre-training and DAPT objectives of the base PLM. If the base PLM is domain-adapted with a different loss, the adapter is no longer compatible, which is reflected in a performance drop. (5/) #EMNLP2023
AdaSent decouples DAPT and SEPT by storing the sentence-encoding ability in an adapter, which is trained only once in the general domain and then plugged into various DAPT-ed PLMs. It can match or surpass the performance of DAPT→SEPT with more efficient training. (4/) #EMNLP2023
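Below is a minimal sketch of this adapter-reuse idea, using the AdapterHub adapters package on top of Hugging Face Transformers. The checkpoint name and adapter path are hypothetical placeholders, and this is not the AdaSent training code itself; see https://github.com/UKPLab/AdaSent for the actual implementation.

```python
import torch
import adapters
from transformers import AutoModel, AutoTokenizer

# A base PLM that was domain-adapted (DAPT) with masked language modelling.
# The checkpoint name is a hypothetical placeholder.
domain_plm = "my-org/distilroberta-dapt-biomedical"
tokenizer = AutoTokenizer.from_pretrained(domain_plm)
model = AutoModel.from_pretrained(domain_plm)

# Enable adapter support and plug in a sentence-encoding adapter trained once
# on general-domain data (SEPT); the same adapter can be reused across
# differently domain-adapted backbones instead of re-running SEPT per domain.
adapters.init(model)
adapter_name = model.load_adapter("path/to/general-sentence-adapter")  # hypothetical path
model.set_active_adapters(adapter_name)

# Mean-pool token embeddings to obtain sentence embeddings.
def encode(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # [batch, seq_len, dim]
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

print(encode(["Patient reports mild dyspnea.", "No adverse events observed."]).shape)
```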
Domain-adapted sentence embeddings can be created by applying general-domain SEPT on top of a domain-adapted base PLM (DAPT→SEPT). But this requires the same SEPT procedure to be done on each DAPT-ed PLM for every domain, resulting in computational inefficiency. (3/) #EMNLP2023
In our #EMNLP2023 paper we demonstrate AdaSent's effectiveness in extensive experiments on 17 different few-shot sentence classification datasets! It matches or surpasses the performance of full SEPT on a DAPT-ed PLM (DAPT→SEPT) while substantially reducing training costs. (2/)
Need a lightweight solution for few-shot domain-specific sentence classification?
We propose #AdaSent!
- Up to 7.2 points accuracy gain in 8-shot classification with 10K unlabeled data
- Small backbone with 82M parameters
- Reusable general-domain sentence adapter across domains
(1/) #EMNLP2023
Which factors shape #NLProc research over time? This was the topic of the talk by our colleague Aniket Pramanick at #EMNLP2023!
Learn more about his paper with Yufang Hou, Saif M. Mohammad & Iryna Gurevych here: https://arxiv.org/abs/2305.12920
If you are around at #EMNLP2023, look out for our colleague Sukannya Purkayastha, who today presented our paper on the use of Jiu-Jitsu argumentation in #PeerReview, authored by her, Anne Lauscher (Universität Hamburg) and Iryna Gurevych.
Check out the full paper on arXiv and the code on GitLab – we look forward to your thoughts and feedback! (9/9) #NLProc #eRisk #EMNLP2023
Paper https://arxiv.org/abs/2211.07624
Code https://gitlab.irlab.org/anxo.pvila/semantic-4-depression
We also illustrate how our semantic retrieval pipeline makes the symptom estimation interpretable by highlighting the most relevant sentences. (8/) #EMNLP2023 #NLProc
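As an illustration of this kind of symptom-oriented retrieval, the sketch below ranks a user's sentences by cosine similarity to a symptom description with an off-the-shelf sentence encoder. The model name and example texts are placeholders rather than the paper's exact setup; the full pipeline is in the GitLab repository linked above.

```python
from sentence_transformers import SentenceTransformer, util

# Any general-purpose sentence encoder works for this illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

# A BDI-style symptom description (placeholder text) and a user's sentences.
symptom = "I have lost interest in activities I used to enjoy."
user_sentences = [
    "Went hiking with friends last weekend, it was great.",
    "Lately I just can't bring myself to do the things I loved.",
    "Work has been busy but manageable.",
]

# Embed the symptom and the sentences, then rank sentences by cosine similarity.
symptom_emb = model.encode(symptom, convert_to_tensor=True)
sentence_emb = model.encode(user_sentences, convert_to_tensor=True)
scores = util.cos_sim(symptom_emb, sentence_emb)[0]

# The top-ranked sentences serve as human-readable evidence for the estimate.
for score, sentence in sorted(zip(scores.tolist(), user_sentences), reverse=True):
    print(f"{score:.2f}  {sentence}")
```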
Our approaches achieve good performance on two Reddit benchmark collections (DCHR metric). (7/) #EMNLP2023 #NLProc
With this aim, we introduce two data selection strategies to detect representative sentences: one unsupervised and one semi-supervised.
For the latter, we propose an annotation schema to obtain relevant training samples. (6/) #EMNLP2023