add crossplag

James Ravenscroft 2023-04-11 10:04:20 +01:00
parent 7e2f7ddedc
commit 54bba72f37
1 changed file with 2 additions and 1 deletion


@@ -36,7 +36,7 @@ As use of LLMs becomes more widespread and people ask it questions and use it to
 ## Bot Detection
-There are certainly opportunities in bot vs human detection. Solutions like GPTZero [^GPTZero] and GLTR[^GLTRGlitterV0] rely on the statistical likelihood that a model would use a given sequence of words based on historical output (for example if the words "bananas in pajamas" never appear in known GPT output but they appear in the input document, the probability that it was written by a human is increased). Approaches like DetectGPT [^mitchellDetectGPTZeroShotMachineGenerated2023] use a model to perturb (subtly change) the output and compare the probabilities of the strings being generated to see if the original "sticks out" as being unusual and thus more human-like.
+There are certainly opportunities in bot vs human detection. Solutions like GPTZero [^GPTZero] and GLTR[^GLTRGlitterV0] rely on the statistical likelihood that a model would use a given sequence of words based on historical output (for example if the words "bananas in pajamas" never appear in known GPT output but they appear in the input document, the probability that it was written by a human is increased). Approaches like DetectGPT [^mitchellDetectGPTZeroShotMachineGenerated2023] use a model to perturb (subtly change) the output and compare the probabilities of the strings being generated to see if the original "sticks out" as being unusual and thus more human-like. ***edit: I was also contacted by Tracey Deacker - a computer science student in Reykjavik, who recommended CrossPlag[^CrossPlag] - another such detection tool.***
 It seems like bot detection and evading detection are likely to be a new arms race: as new detection methods emerge, people will build more and more complex methods for evading detection or rely on adversarial training approaches to train existing models to evade new detection approaches automatically.
@@ -119,3 +119,4 @@ https://link.medium.com/6Bz5jc2hsyb - a blog post from an NLP professor about fi
 [^ribeiroAccuracyBehavioralTesting2020]: Ribeiro, M. T., Wu, T., Guestrin, C., & Singh, S. (2020). Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4902–4912. https://doi.org/10.18653/v1/2020.acl-main.442
 [^morrisTextAttackFrameworkAdversarial2020]: Morris, J. X., Lifland, E., Yoo, J. Y., Grigsby, J., Jin, D., & Qi, Y. (2020). TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP (arXiv:2005.05909). arXiv. https://doi.org/10.48550/arXiv.2005.05909
 [^knightSloppyUseMachine]: Knight, W. (n.d.). Sloppy Use of Machine Learning Is Causing a Reproducibility Crisis in Science. Wired. Retrieved 25 March 2023, from https://www.wired.com/story/machine-learning-reproducibility-crisis/
+[^CrossPlag]: AI Content Detector - Crossplag - https://crossplag.com/ai-content-detector/
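
Illustrative note on the Bot Detection paragraph in the first hunk: the likelihood idea it describes (words never seen in known model output make human authorship more probable) can be sketched roughly as below. This is a toy under stated assumptions, not how GPTZero, GLTR or CrossPlag are implemented: the function names `train_unigram` and `avg_log_likelihood`, the two-sentence `known_output` corpus, and the smoothed unigram model are all hypothetical stand-ins, whereas a real detector would read token probabilities from the language model itself.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Count word frequencies in sample text assumed to be model output."""
    counts = Counter(word for doc in corpus for word in doc.lower().split())
    return counts, sum(counts.values())


def avg_log_likelihood(text, counts, total):
    """Mean per-word log-probability under add-one smoothing.

    Scores closer to zero mean the text resembles the reference corpus;
    words never seen in that corpus pull the score down.
    """
    words = text.lower().split()
    if not words:
        raise ValueError("text must contain at least one word")
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words
    log_probs = [
        math.log((counts[w] + 1) / (total + vocab_size)) for w in words
    ]
    return sum(log_probs) / len(words)


# Hypothetical stand-in for "known GPT output" (assumption, not real data).
known_output = [
    "the model generates fluent text",
    "the model predicts the next word",
]
counts, total = train_unigram(known_output)

familiar = avg_log_likelihood("the model generates text", counts, total)
unfamiliar = avg_log_likelihood("bananas in pajamas", counts, total)

# The phrase made only of unseen words scores lower, i.e. looks less
# like the reference output and so, under this heuristic, more human.
print(familiar > unfamiliar)
```

A unigram count is the simplest possible proxy for "statistical likelihood"; it ignores word order entirely, which is one reason such a toy would be trivial to evade in the arms race the post describes.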