---
date: '2022-11-23T20:47:05'
hypothesis-meta:
  created: '2022-11-23T20:47:05.414293+00:00'
  document:
    title:
    - 'Towards Automatic Curation of Antibiotic Resistance Genes via Statement Extraction
      from Scientific Papers: A Benchmark Dataset and Models'
  flagged: false
  group: __world__
  hidden: false
  id: _Vj2omtvEe2z-rfNY4eZiw
  links:
    html: https://hypothes.is/a/_Vj2omtvEe2z-rfNY4eZiw
    incontext: https://hyp.is/_Vj2omtvEe2z-rfNY4eZiw/aclanthology.org/2022.bionlp-1.40.pdf
    json: https://hypothes.is/api/annotations/_Vj2omtvEe2z-rfNY4eZiw
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - prompt-models
  - NLProc
  target:
  - selector:
    - end: 1532
      start: 444
      type: TextPositionSelector
    - exact: "Antibiotic resistance has become a growingworldwide concern as new resistance\
        \ mech-anisms are emerging and spreading globally,and thus detecting and collecting\
        \ the cause\u2013 Antibiotic Resistance Genes (ARGs), havebeen more critical\
        \ than ever. In this work,we aim to automate the curation of ARGs byextracting\
        \ ARG-related assertive statementsfrom scientific papers. To support the researchtowards\
        \ this direction, we build SCIARG, anew benchmark dataset containing 2,000\
        \ man-ually annotated statements as the evaluationset and 12,516 silver-standard\
        \ training state-ments that are automatically created from sci-entific papers\
        \ by a set of rules. To set upthe baseline performance on SCIARG, weexploit\
        \ three state-of-the-art neural architec-tures based on pre-trained language\
        \ modelsand prompt tuning, and further ensemble themto attain the highest\
        \ 77.0% F-score. To the bestof our knowledge, we are the first to leveragenatural\
        \ language processing techniques to cu-rate all validated ARGs from scientific\
        \ papers.Both the code and data are publicly availableat https://github.com/VT-NLP/SciARG."
      prefix: g,clb21565,lifuh}@vt.eduAbstract
      suffix: 1 IntroductionAntibiotic resista
      type: TextQuoteSelector
    source: https://aclanthology.org/2022.bionlp-1.40.pdf
  text: The authors use prompt training on LLMs to build a classifier that can identify
    statements that describe whether or not micro-organisms have antibiotic resistant
    genes in scientific papers.
  updated: '2022-11-23T20:47:05.414293+00:00'
  uri: https://aclanthology.org/2022.bionlp-1.40.pdf
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://aclanthology.org/2022.bionlp-1.40.pdf
tags:
- prompt-models
- NLProc
- hypothesis
type: annotation
url: /annotation/2022/11/23/1669236425

---


 <blockquote>Antibiotic resistance has become a growing worldwide concern as new resistance mechanisms are emerging and spreading globally, and thus detecting and collecting the cause – Antibiotic Resistance Genes (ARGs), have been more critical than ever. In this work, we aim to automate the curation of ARGs by extracting ARG-related assertive statements from scientific papers. To support the research towards this direction, we build SCIARG, a new benchmark dataset containing 2,000 manually annotated statements as the evaluation set and 12,516 silver-standard training statements that are automatically created from scientific papers by a set of rules. To set up the baseline performance on SCIARG, we exploit three state-of-the-art neural architectures based on pre-trained language models and prompt tuning, and further ensemble them to attain the highest 77.0% F-score. To the best of our knowledge, we are the first to leverage natural language processing techniques to curate all validated ARGs from scientific papers. Both the code and data are publicly available at https://github.com/VT-NLP/SciARG.</blockquote>

The authors use prompt tuning on pre-trained language models to build a classifier that identifies statements in scientific papers describing whether micro-organisms carry antibiotic resistance genes.