1) Sentence completion
task using web-scale data
We propose a method to
automatically answer SAT-style sentence completion questions using web-scale
data. Web-scale data have been used in many language studies and have been
found to be a very useful resource for improving accuracy on the sentence
completion task. Our method employs assorted N-gram probability information
for each candidate word. We also propose a back-off strategy to remove zero
probabilities. We found that the accuracy of our proposed method improved by
52-87% over the current state-of-the-art.
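The combination of N-gram scoring and back-off described above can be sketched as follows. This is a minimal illustration, not the paper's actual method: the `ngram_counts` table stands in for web-scale N-gram statistics, and the stupid-backoff-style discount and add-one unigram fallback are assumptions chosen so that no candidate ever receives a zero score.

```python
def backoff_score(ngram_counts, context, word, alpha=0.4):
    """Score a candidate word for a blank. If the full N-gram is unseen,
    back off to shorter contexts, discounting by alpha at each step.
    ngram_counts maps token tuples to counts (a toy stand-in for
    web-scale N-gram data)."""
    discount = 1.0
    while context:
        full = tuple(context) + (word,)
        ctx = tuple(context)
        if ngram_counts.get(full, 0) > 0 and ngram_counts.get(ctx, 0) > 0:
            return discount * ngram_counts[full] / ngram_counts[ctx]
        context = context[1:]   # drop the oldest context token
        discount *= alpha       # penalize each back-off step
    # Add-one unigram fallback so unseen words still get a nonzero score.
    total = sum(c for k, c in ngram_counts.items() if len(k) == 1)
    return discount * (ngram_counts.get((word,), 0) + 1) / (total + 1)

def complete(sentence_tokens, blank_index, candidates, ngram_counts, n=3):
    """Pick the candidate with the highest back-off score for the blank."""
    context = sentence_tokens[max(0, blank_index - n + 1):blank_index]
    return max(candidates,
               key=lambda w: backoff_score(ngram_counts, context, w))
```

With counts favoring "the cat" over "the dog", `complete(["the", "_"], 1, ["cat", "dog"], counts)` selects "cat".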
2) Semantic graph based approach for text mining
A semantic network is a graphical notation for representing knowledge in the form of interconnected nodes and arcs. In this paper we propose a novel approach to constructing a semantic graph from a text document. Our approach considers all the nouns of a document and builds a semantic graph that represents the entire document. We believe that this graph captures many properties of the text and can be used for various applications in text mining and NLP, such as keyword extraction and determining the nature of a document. Our approach to constructing the semantic graph is language independent. We performed an experimental analysis to validate our results for extracting keywords and deriving the nature of the graph. We present experimental results on graph construction over the FIRE data set and present its application to keyword extraction and commenting on the nature of a document.
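One plausible reading of the approach above is a noun co-occurrence graph. The sketch below assumes nouns are already identified (the abstract does not specify the extraction step or the edge criterion, so sentence-level co-occurrence is an assumption for illustration), and ranks nouns by weighted degree as a simple keyword heuristic.

```python
from collections import defaultdict
from itertools import combinations

def build_semantic_graph(sentences, nouns):
    """Nodes are nouns; an edge links two nouns appearing in the same
    sentence, weighted by the number of such co-occurrences."""
    graph = defaultdict(int)
    for sent in sentences:
        present = sorted({w for w in sent if w in nouns})
        for a, b in combinations(present, 2):
            graph[(a, b)] += 1
    return dict(graph)

def keywords(graph, k=3):
    """Rank nouns by weighted degree: a simple keyword-extraction heuristic."""
    degree = defaultdict(int)
    for (a, b), w in graph.items():
        degree[a] += w
        degree[b] += w
    return [n for n, _ in sorted(degree.items(), key=lambda x: -x[1])][:k]
```

Because only tokenized sentences and a noun set are required, the construction itself carries no language-specific logic, which matches the language-independence claim.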
3) Text Simplification Tools: Using Machine Learning to Discover
Features that Identify Difficult Text
Although providing
understandable information is a critical component in healthcare, few tools
exist to help clinicians identify difficult sections in text. We systematically
examine sixteen features for predicting the difficulty of health texts using
six different machine learning algorithms. Three represent new features not
previously examined: medical concept density; specificity (calculated using
word-level depth in MeSH); and ambiguity (calculated using the number of UMLS
Metathesaurus concepts associated with a word). We examine these features for a
binary prediction task on 118,000 simple and difficult sentences from a
sentence-aligned corpus. Using all features, random forests are the most
accurate, with 84% accuracy. An analysis of the six models and a
complementary ablation study show that the specificity and ambiguity features
are the strongest predictors (24% combined impact on accuracy). Notably, a
training size study showed that even with a 1% sample (1,062 sentences) an
accuracy of 80% can be achieved.
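The three new features above can be sketched as simple per-sentence statistics. The lookup tables here are toy stand-ins: the real features use MeSH tree depths and UMLS Metathesaurus sense counts, which are not reproduced in this sketch.

```python
def concept_density(tokens, medical_terms):
    """Fraction of tokens that are medical concepts."""
    hits = sum(1 for t in tokens if t in medical_terms)
    return hits / len(tokens) if tokens else 0.0

def specificity(tokens, mesh_depth):
    """Mean tree depth of tokens found in the (toy) MeSH depth table;
    deeper terms are more specific."""
    depths = [mesh_depth[t] for t in tokens if t in mesh_depth]
    return sum(depths) / len(depths) if depths else 0.0

def ambiguity(tokens, sense_counts):
    """Mean number of (toy) UMLS senses per token; unknown tokens are
    treated as having one sense."""
    counts = [sense_counts.get(t, 1) for t in tokens]
    return sum(counts) / len(counts) if counts else 0.0
```

Feature vectors built this way would then feed any of the six classifiers, such as a random forest.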
Dhiya:
A stemmer for morphological level analysis of Gujarati language
To understand a language,
analysis has to be done at the word, sentence, context, and discourse levels.
Morphological analysis lies at the base of all of these, as it is the first
step in understanding a given sentence. Stemming, the task of identifying the
stem of a given word, is performed at the morphological level. It is an
important activity not only in the Natural Language Processing domain but
equally so in Information Retrieval. In this paper, the authors present
DHIYA, a stemmer for the Gujarati language, based on Gujarati morphology. To
develop the stemmer, the inflections that appear most often in Gujarati text
were identified, and a rule set was created from them. The EMILLE corpus is
used for training and for evaluating the stemmer's performance. The accuracy
of the stemmer is 92.41%.
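A rule-based stemmer of the kind described above can be sketched as a longest-match suffix stripper. The rules below are hypothetical English placeholders; the actual Gujarati suffix rules are not given in the abstract.

```python
def stem(word, suffix_rules):
    """Strip the longest matching suffix from the rule set, keeping at
    least two characters of the stem; return the word unchanged if no
    rule applies."""
    for suffix in sorted(suffix_rules, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word
```

Trying longer suffixes first avoids, for example, stripping only "s" from a word that ends in a longer inflection containing "s".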
Medical Document Classification Based on
MeSH
One of the most challenging projects in information systems is extracting information from unstructured text, including medical document classification. I am developing a classification algorithm that classifies a medical document by analyzing its content and categorizing it under predefined topics from the Medical Subject Headings (MeSH). I collected a corpus of 50 full-text journal articles (N=50) from MEDLINE, which had already been indexed by experts based on MeSH. Using natural language processing (NLP), my algorithm classifies the collected articles under MeSH subject headings. I evaluated the algorithm's output by measuring the precision and recall of its subject headings against the documents' actual subject headings. The algorithm classified the articles correctly under 45% to 60% of the actual subject headings and got 40% to 53% of the total subject headings correct. This holds promise for the global health arena as a way to index and classify medical documents expeditiously.
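The evaluation described above, comparing predicted headings against the expert-assigned ones, amounts to set-based precision and recall per document. A minimal sketch (the heading strings are invented examples, not real MeSH terms):

```python
def precision_recall(predicted, actual):
    """Precision: share of predicted headings that are correct.
    Recall: share of the expert-assigned headings that were recovered."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall
```

Averaging these two numbers over the 50 articles yields the per-corpus ranges reported in the abstract.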
Korean-Thai
Lexicon for Natural Language Processing
This paper presents a
Korean-Thai lexicon. This research aims to study and collect the features
necessary to construct a Korean-Thai lexicon for natural language processing
(NLP) and speech processing research. The research method consisted of (1)
creating a Korean-Thai lexicon with seven parts: Korean words, Korean Revised
Romanization, part of speech, sub-part of speech, special characteristic,
Thai meaning, and description of meaning; and (2) Korean transcription. Given
the lack of useful tools for Korean-Thai machine translation, we propose this
Korean-Thai lexicon for machine translation. The lexicon consists of 36,000
Korean words. As it would take a lot of time and effort to gather enough
Korean words to cover all domains, Korean Revised Romanization was applied
for some words such as terminology, names, and places.
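The seven-part record structure listed above maps naturally onto a typed record. The sketch below is an illustration of that structure only; the example entry is invented and is not taken from the 36,000-word lexicon.

```python
from dataclasses import dataclass

@dataclass
class KoreanThaiEntry:
    korean_word: str   # Korean headword
    romanization: str  # Korean Revised Romanization
    pos: str           # part of speech
    sub_pos: str       # sub-part of speech
    special: str       # special characteristic (may be empty)
    thai_meaning: str  # Thai translation
    description: str   # description of meaning

# Hypothetical example entry for illustration.
example = KoreanThaiEntry("학교", "hakgyo", "noun", "common noun",
                          "", "โรงเรียน", "school; place of education")
```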
Analysis
of stock market using text mining and natural language processing
The stock market has become one of the major components of the economy,
not only in developed countries but also in developing countries.
Making decisions in the stock market is difficult because many factors are
involved in every choice we make. Therefore, a great deal of analysis is
required to make an optimal move, which may involve price trends, the
market's nature, a company's stability, and various news and rumors about
stocks. The objective of this study is to extract fundamental information
from relevant news sources and use it to analyze, and sometimes forecast,
the stock market from the common investor's viewpoint. We surveyed the
existing business text mining research and proposed a framework that uses our
text parser and analyzer algorithm with an open-source natural language
processing tool to analyze (machine learning and text mining), retrieve
(natural language processing), and forecast (comparison with historic data)
investment decisions from any text data source on the stock market.
For our research we used data from the Dhaka Stock Exchange (DSE), the
capital market of Bangladesh.
CyberMate
∼
Artificially intelligent business help desk assistant with instant messaging
services
Most existing artificially intelligent
software agents serve a single specific purpose, whereas CyberMate is a
multipurpose AI agent based on pattern recognition with a chatterbot approach
to interacting with remote users. The system introduces a new technology to
automate customer care services by hosting business information and technical
support to address inquiries from end users. CyberMate allows remote end
users to connect via instant messaging services and get the requested
information in English, Tamil, Sinhala, or any natural language. The system
can act as a dedicated AI agent between the client and the service-hosting
party, providing information via instant messages. This is a faster and more
familiar solution than hosting a Frequently Asked Questions (FAQ) page on a
business website, and the system can also reduce the workload of the customer
care service queue, which decreases customer dissatisfaction and makes the
service accessible from any device with Internet connectivity. Business
information can be easily modeled in the CyberMate system using the newly
introduced CyberMate Scripting Language (CSL) and its own development
environment (CSL-IDE).
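The pattern-recognition chatterbot approach described above can be sketched as an ordered rule table matched against the user's message. This is a generic illustration, not CSL itself (CSL is not publicly documented); the patterns and replies below are invented, and a catch-all rule at the end guarantees a response.

```python
import re

# Ordered (pattern, reply) rules; earlier rules take priority.
RULES = [
    (r"what is your name", "I am CyberMate, your help desk assistant."),
    (r"opening hours", "We are open from 9am to 5pm on weekdays."),
    (r".*", "Sorry, I did not understand. Could you rephrase?"),
]

def reply(message):
    """Return the reply of the first rule whose pattern matches the
    lowercased message; the final catch-all always matches."""
    text = message.lower()
    for pattern, template in RULES:
        if re.search(pattern, text):
            return template
    return ""
```

A production system would add wildcard capture, per-topic rule files, and language-specific rule sets for Tamil and Sinhala, but the matching loop stays the same.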
On a
novel chatterbot system by means of web information
Recently, the use of various chatterbots has
been proposed to simulate conversation with human users. Several chatterbots
can talk with users very well without a high-level contextual understanding.
However, it may be difficult for chatterbots to reply to specific and
interesting sentences because they lack intelligence. To solve this
problem, we propose a novel chatterbot that can directly use Web information. We carried out
computational experiments by applying the proposed chatterbot to “2channel” (2ch) and “Twitter”.
CHARLIE:
An AIML-based chatterbot which works as an interface among INES and humans
INES (INtelligent Educational System) is a
functional prototype of an online learning platform, which combines three
essential capabilities related to e-learning activities: those of an LMS
(learning management system), an LCMS (learning content management system),
and an ITS (intelligent tutoring system). To carry out all these
functionalities, our system as a whole comprises a set of different tools and
technologies: semantic tools for managing users (administrators, teachers,
students...) and contents; an intelligent chatterbot able to communicate with
students in natural language; an intelligent agent based on BDI (beliefs,
desires, intentions) technology that acts as the brain of the system; an
inference engine based on JESS (a rule engine for the Java platform); and
ontologies (to model the user, his/her activities, and the learning contents)
that contribute the semantics of the system. In the present paper we focus on
the chatterbot, CHARLIE (CHAtteR Learning Interface Entity), developed and
used in the platform, which is an AIML-based (Artificial Intelligence Markup
Language) bot. We will specifically address its performance and its
contribution to INES.
Extending
chatterbot system into multimodal interaction framework with embodied
contextual understanding
This work aims to realize multimodal
interaction with embodied contextual understanding based on a simple
chatterbot system. A system framework is proposed to integrate the dialogue
system into a 3D simulation platform, SIGVerse, to attain multimodal
interaction. The chatterbot's AIML implementations are described for
achieving conversations with embodied contextual understanding in HRI
simulations.