The Theoretical Argument for Disproving Asymptotic Upper-Bounds on the Accuracy of Part-of-Speech Tagging Algorithms: Adopting a Linguistics, Rule-Based Approach

Foley, William (2016) The Theoretical Argument for Disproving Asymptotic Upper-Bounds on the Accuracy of Part-of-Speech Tagging Algorithms: Adopting a Linguistics, Rule-Based Approach. Undergraduate thesis, under the direction of Allison Burkette from Modern Languages, University of Mississippi.

Abstract

This paper examines one area of the interdisciplinary field of Computational Linguistics: Part-of-Speech Tagging algorithms. The author relies primarily on scholarly Computer Science and Linguistics papers to describe previous approaches to this task and the often-hypothesized asymptotic accuracy rate of around 98% by which the task is allegedly bound. After further research into why the accuracy of previous algorithms has behaved in this asymptotic manner, however, the author identifies valid, empirically backed reasons why the accuracy of previous approaches does not necessarily reflect any general asymptotic bound on automated Part-of-Speech Tagging. In response, a theoretical argument is proposed to circumvent the shortcomings of previous approaches. The argument abandons the flawed status quo of training machine learning algorithms and predictive models on outdated corpora, and instead walks the reader from conception through implementation of a rule-based algorithm with roots in both practical and theoretical Linguistics. While the resulting algorithm is only a prototype and cannot currently be verified as achieving a tagging accuracy rate of over 98%, its multi-tiered methodology, designed to mirror aspects of human cognition in Natural Language Understanding, is meant to serve as a theoretical blueprint for a new and inevitably more reliable way to deal with the challenges of Part-of-Speech Tagging, and to provide much-needed advances in the popular area of Natural Language Processing.

Item Type: Thesis (Undergraduate)
Creators: Foley, William
Student's Degree Program(s): B.S. in Computer Science, Linguistics
Thesis Advisor: Allison Burkette
Thesis Advisor's Department: Modern Languages
Institution: University of Mississippi
Subjects: P Language and Literature > P Philology. Linguistics; T Technology > T Technology (General)
Depositing User: Will Foley
Date Deposited: 26 Apr 2016 20:05
Last Modified: 26 Apr 2016 20:05
URI: http://thesis.honors.olemiss.edu/id/eprint/495
