| Year | 2015 |
| Category | Robotics and Intelligent Machines |
| Country/State | United States of America |
Development of an Authorship Identification Algorithm for Twitter Using Stylometric Techniques
I developed software that uses semi-supervised learning to dramatically improve accuracy when stylometrically attributing an unidentified tweet to the correct author from a set of known Twitter authors. Existing stylometric techniques generally perform poorly on short texts. Software written in Python streamed, preprocessed, and stored 1000 tweets each from up to 30 prolific Twitter authors. Traditional and flexible bigrams, along with their frequencies of occurrence, were extracted from both the authors' known tweets and the unknown tweet, forming each author's profile. These bigrams were then used as tokens for a Naive Bayes classifier, which returned the probability of each author having written the unknown tweet. The classifier determined the first, second, and third most likely authors and wrote them as output. After this process was repeated multiple times, the percent accuracy of identifying the correct author was calculated. The completed program identified the author of an unknown tweet with a significant degree of accuracy. Furthermore, excluding retweets and combining flexible and traditional bigrams, among other techniques, produced the most effective algorithm for stylometrically identifying the author of a tweet. With 10 authors, the algorithm correctly identified the author of the tweet 73 percent of the time on the first guess and 87 percent of the time within the top three guesses, showing the potential of stylometric techniques applied to extremely short messages. Moreover, this algorithm has significant potential for investigating anonymous cybercrimes committed over social media.
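The pipeline described in the abstract (bigram profiles, a Naive Bayes classifier, and a top-three ranking) can be illustrated with a minimal sketch. This is not the author's code. The abstract does not define "flexible bigrams", so the sketch assumes they are skip-one word pairs. The function names, the add-one smoothing, and the uniform prior over authors are also assumptions made for illustration, and the semi-supervised component is not modelled.

```python
import math
from collections import Counter


def bigram_tokens(text):
    """Return traditional (adjacent) bigrams plus skip-one bigrams.

    ASSUMPTION: skip-one pairs stand in for the abstract's "flexible
    bigrams", whose exact definition is not given in the source.
    """
    words = text.lower().split()
    traditional = [("adj", a, b) for a, b in zip(words, words[1:])]
    flexible = [("skip", a, b) for a, b in zip(words, words[2:])]
    return traditional + flexible


def build_profiles(tweets_by_author):
    """Map each author to a Counter of bigram frequencies (the profile)."""
    return {
        author: Counter(tok for tweet in tweets for tok in bigram_tokens(tweet))
        for author, tweets in tweets_by_author.items()
    }


def rank_authors(profiles, unknown_tweet, top_k=3):
    """Rank authors by multinomial Naive Bayes log-likelihood.

    ASSUMPTIONS: uniform prior over authors and add-one smoothing.
    """
    vocab = set().union(*profiles.values())
    tokens = bigram_tokens(unknown_tweet)
    scores = {}
    for author, counts in profiles.items():
        # The extra +1 reserves probability mass for bigrams never seen in training.
        denom = sum(counts.values()) + len(vocab) + 1
        scores[author] = sum(math.log((counts[tok] + 1) / denom) for tok in tokens)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def top_k_accuracy(profiles, labelled_tweets, k):
    """Fraction of (tweet, true_author) pairs whose author is in the top k."""
    hits = sum(
        true_author in rank_authors(profiles, tweet, k)
        for tweet, true_author in labelled_tweets
    )
    return hits / len(labelled_tweets)


# Toy data, invented purely for illustration.
training = {
    "alice": [
        "good morning to all my lovely friends",
        "good morning to all the lovely people",
    ],
    "bob": [
        "stock markets fell sharply again today",
        "stock markets rose sharply again today",
    ],
}
profiles = build_profiles(training)
print(rank_authors(profiles, "good morning to all", top_k=1))
```

Calling `top_k_accuracy` with `k=1` and `k=3` on held-out tweets corresponds to the first-guess and top-three figures reported in the abstract. The toy data above is far too small to reproduce those numbers.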
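The Intel International Science and Engineering Fair (ISEF), organized by the U.S.-based Society for Science and the Public and title-sponsored by Intel, is the world's largest and highest-level research and innovation competition for secondary school students. ISEF's categories cover all areas of mathematics, the natural sciences, and engineering, as well as some of the social sciences. Often called the "World Cup" of youth science, ISEF aims to encourage students to collaborate in teams, innovate, and pursue sustained, focused, in-depth research on topics that interest them.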
Studies in which the use of machine intelligence is paramount to reducing the reliance on human intervention.
Biomechanics (BIE): Studies and apparatus which mimic the role of mechanics in biological systems.
Cognitive Systems (COG): Studies/apparatus that operate similarly to the ways humans think and process information. Systems that provide for increased interaction of people and machines to more naturally extend and magnify human expertise, activity, and cognition.
Control Theory (CON): Studies that explore the behavior of dynamical systems with inputs, and how their behavior is modified by feedback. This includes new theoretical results and the applications of new and established control methods, system modelling, identification and simulation, the analysis and design of control systems (including computer-aided design), and practical implementation.
Machine Learning (MAC): Construction and/or study of algorithms that can learn from data.
Robot Kinematics (KIN): The study of movement in robotic systems.
Other (OTH): Studies that cannot be assigned to one of the above subcategories. If the project involves multiple subcategories, the principal subcategory should be chosen instead of Other.
