| Year | 2017 |
| Subject | Mathematics |
| Country/State | United States of America |
An Exploration in Textual Analysis
The internet’s expansion has increased the data available to researchers, enabling analyses that yield insights into human behavior. This project statistically analyzes trends in computer science by studying the arXiv archive’s CS articles published between 1989 and 2016. The analysis shows that the meaningful words in a document have a quantitative significance that a machine can extract without understanding the text.
Measurements were taken from a sample of ~8,600 documents. Each word was weighted positively by its normalized frequency and negatively by one of two quantities, in two separate measurements: its presence in other documents (TF-IDF) or its probability of appearing in the body of a web page. Both penalties reduce the noise that “structure words” like “the” and “a” add to plain word frequencies. The observed probabilities of words with large weights were graphed to show their popularity in each year. Finally, vector representations of the words were used to find contextually similar words and to examine how changes in popularity may be caused by shifts in terminology.
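The two weightings described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the abstract gives no formulas, so the logarithmic IDF, the `-log(p)` penalty in `background_weights`, the `floor` value for unseen words, and the toy documents and web probabilities are all assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per document (docs is a list of token lists).

    TF is the word count divided by the document length (normalized frequency).
    IDF is log(N / number of documents containing the word), an assumed form.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


def background_weights(tokens, web_prob, floor=1e-6):
    """Hypothetical second measure: normalized frequency times -log(p),
    where p is the word's probability of appearing in a web page body.
    Words missing from web_prob get the assumed floor probability."""
    counts = Counter(tokens)
    return {
        word: (count / len(tokens)) * -math.log(max(web_prob.get(word, floor), floor))
        for word, count in counts.items()
    }


# Toy data, invented for illustration only.
docs = [
    "the network learns the weights".split(),
    "the compiler parses the grammar".split(),
    "the network routes the packets".split(),
]
web_prob = {"the": 0.9, "network": 0.01}

tfidf_weights = tf_idf(docs)
bg = background_weights(docs[0], web_prob)
```

In both sketches a structure word such as “the” ends up with a weight at or near zero even though it is the most frequent token, which is the noise reduction the paragraph describes.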
The secondary word-relevance measurement was found to be comparable to TF-IDF, an algorithm used by search engines to retrieve documents: 54.9% of the 30 highest-weighted words under the secondary measure are also among the 30 highest TF-IDF-weighted words. A significant design challenge was testing the integrity of the measurements, as there is little control data; this was addressed with a secondary dataset of words’ probabilities of appearing in web pages. In both measurements, words with large weights can be verified by hand to be “important” to the documents they appear in, so the results show that words that are meaningful to a document have an identifiable quantitative significance.
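The top-30 comparison can be illustrated with a small overlap function. This is a sketch under assumptions: the helper names and toy weights are hypothetical, and the abstract does not say how the 54.9% figure was aggregated. Since 54.9% of 30 is not a whole number of words, the reported value was presumably averaged over several rankings (for example per document or per year), which is an inference rather than something the abstract states.

```python
def top_k(weights, k):
    """Return the set of the k highest-weighted words in a {word: weight} dict."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {word for word, _ in ranked[:k]}


def top_k_overlap(weights_a, weights_b, k):
    """Fraction of the top-k words under one measure that are also top-k under the other."""
    return len(top_k(weights_a, k) & top_k(weights_b, k)) / k


# Toy weights, invented for illustration only.
tfidf = {"network": 0.9, "graph": 0.8, "kernel": 0.7, "the": 0.1}
secondary = {"proof": 0.6, "network": 0.5, "kernel": 0.4, "the": 0.0}

overlap = top_k_overlap(tfidf, secondary, k=3)
```

Here the two top-3 sets share “network” and “kernel”, so the overlap is 2/3. The project's comparison corresponds to `k=30`.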
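The Intel International Science and Engineering Fair (ISEF), hosted by the U.S.-based Society for Science and the Public and title-sponsored by Intel, is the world's largest and highest-level research and innovation competition for secondary school students. ISEF's disciplines cover all areas of mathematics, natural science, and engineering, along with some social sciences. Often called the "World Cup" of youth science activities worldwide, ISEF aims to encourage students to collaborate in teams, innovate, and pursue sustained, focused, in-depth research on topics that interest them.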
The study of the measurement, properties, and relationships of quantities and sets, using numbers and symbols. The deductive study of numbers, geometry, and various abstract constructs, or structures.
Algebra (ALB): The study of algebraic operations and/or relations and the structures which arise from them. An example is given by (systems of) equations which involve polynomial functions of one or more variables.
Analysis (ANL): The study of infinitesimal processes in mathematics, typically involving the concept of a limit. This begins with differential and integral calculus, for functions of one or several variables, and includes differential equations.
Combinatorics, Graph Theory, and Game Theory (CGG): The study of combinatorial structures in mathematics, such as finite sets, graphs, and games, often with a view toward classification and/or enumeration.
Geometry and Topology (GEO): The study of the shape, size, and other properties of figures and spaces. Includes such subjects as Euclidean geometry, non-Euclidean geometries (spherical, hyperbolic, Riemannian, Lorentzian), and knot theory (classification of knots in 3-space).
Number Theory (NUM): The study of the arithmetic properties of integers and related topics such as cryptography.
Probability and Statistics (PRO): Mathematical study of random phenomena and the study of statistical tools used to analyze and interpret data.
Other (OTH): Studies that cannot be assigned to one of the above subcategories. If the project involves multiple subcategories, the principal subcategory should be chosen instead of Other.

