Demystifying Text Generation Approaches

1. I. INTRODUCTION

Natural Language Processing (NLP) is a subfield of Artificial Intelligence focused on enabling computers to understand and process human languages, bringing them closer to a human-level understanding of language. Humans have been writing things down for thousands of years, and over that time our brains have gained a tremendous amount of data and experience in understanding natural language [9]. The goal of NLP is to accomplish human-like language processing using a theoretically motivated range of computational techniques. Its applications include machine translation, speech synthesis, automatic summarization, word processing, text prediction, dialogue systems, named entity recognition, story understanding, language teaching, and assistive computing.

The process of generating text can be divided into four phases: first, collecting a dataset; second, cleaning that dataset; third, loading the cleaned text; and finally, generating new text.

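As a minimal illustration of these four phases, the following Python sketch collects a local plain-text corpus (the file name and parameters are hypothetical placeholders), cleans it, loads it into a simple character-level Markov chain, and generates new text; the Markov chain merely stands in for the neural models discussed later.

```python
import random
import re
from collections import defaultdict

# Phase 1: dataset collection -- here, a local plain-text file
# ("corpus.txt" is a hypothetical placeholder).
with open("corpus.txt", encoding="utf-8") as f:
    raw_text = f.read()

# Phase 2: cleaning -- lowercase and collapse whitespace.
text = re.sub(r"\s+", " ", raw_text.lower()).strip()

# Phase 3: loading the cleaned text into a simple model.
# A character-level Markov chain maps each k-gram to the
# characters that follow it in the corpus.
k = 4
model = defaultdict(list)
for i in range(len(text) - k):
    model[text[i:i + k]].append(text[i + k])

# Phase 4: generating text by repeatedly sampling a next
# character given the current k-gram context.
state = text[:k]
generated = state
for _ in range(200):
    choices = model.get(state)
    if not choices:
        break
    generated += random.choice(choices)
    state = generated[-k:]

print(generated)
```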

In 2016, Artificial Intelligence generated the movie script "Sunspring", created by Ross Goodwin and directed by Oscar Sharp. It was written by a program originally called Jetson, which named itself Benjamin; Benjamin's other films are "Zone Out" and "This Wild". In addition, a new chapter of J. K. Rowling's famous Harry Potter series was published by Botnik Studios, titled "Harry Potter and the Portrait of What Looked Like a Large Pile of Ash". Many songs have also been generated by Artificial Intelligence, such as "Daddy's Car" and "Break Free". Another experiment is Wikipedia text generation. Poems, too, can be generated by Artificial Intelligence; in Chinese literature, poems have been generated by AI. In what follows, we will discuss how neural networks are useful for generating such text.

2. II. RELATED WORK

Alex Graves (2014) [1] demonstrated that the LSTM can use its memory to generate complex, realistic sequences containing long-range structure. In this paper, Graves took a sequence-generation approach to text and showed how recurrent neural networks can generate complex sequences with long-range structure simply by predicting one data point at a time. He showed how Recurrent Neural Networks can be trained for sequence generation by processing real data sequences one step at a time and predicting what comes next. Predictions are assumed to be probabilistic, and it is assumed that sequences can be generated from a trained network by iteratively sampling from the network's output and then feeding the sample in as input at the next step. The paper states that, in practice, standard Recurrent Neural Networks are unable to store information about past inputs for very long. The word-level Recurrent Neural Network performed better than the character-level network, but that gap appeared to close when regularization was used.

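As a minimal sketch of this sampling loop, assume a hypothetical trained character-level recurrent model exposing a step method that returns a probability distribution over the next character together with its updated state; the interface and the temperature parameter are illustrative assumptions, not Graves's implementation.

```python
import numpy as np

def sample_sequence(model, vocab, seed_char, length, temperature=1.0):
    """Generate text by repeatedly sampling the model's output
    distribution and feeding the sample back in as the next input.

    `model.step(char_index, state)` is a hypothetical interface that
    returns (probs, new_state) for a trained recurrent network.
    """
    state = None
    index = vocab.index(seed_char)
    out = [seed_char]
    for _ in range(length):
        probs, state = model.step(index, state)
        # Temperature rescaling: <1 sharpens, >1 flattens the distribution.
        logits = np.log(probs) / temperature
        probs = np.exp(logits) / np.sum(np.exp(logits))
        index = np.random.choice(len(vocab), p=probs)
        out.append(vocab[index])
    return "".join(out)
```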

3. Lipton et al. (2015)

Lipton et al. (2015) [2] have given a critical review of recurrent neural networks and how they learn sequences. Recurrent Neural Networks are connectionist models. Neural networks in general are powerful learning models that give state-of-the-art results in a wide range of supervised and unsupervised machine learning tasks, but standard neural networks have limitations too: there is no dependency between concurrent states or layers, so these models are not useful when data is related through time or space. Examples of such data are frames from a video, audio snippets, and words pulled from sentences.

Thus the requirement for Recurrent Neural Networks came into the picture: because they are connected through time, all data that is related through time can be modelled. The recurrent neural network is depicted in figure (1).

4. Zhengdong et al. (2014)

Zhengdong et al. (2014) [3] proposed two convolutional neural network models for matching two sentences, adapting the convolutional strategy used in vision and speech. The proposed models not only represent the hierarchical structure of sentences (phrases nested within phrases) through their layer-by-layer composition and pooling, but also capture rich matching patterns at different levels. A successful sentence-matching algorithm needs to capture the whole structure, including the internal structures of the sentences as well as the rich patterns in their interactions.

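A rough sketch of this convolution-and-pooling idea for sentence matching follows: each sentence is embedded, convolved over sliding windows, and max-pooled into a fixed-length vector, and the two vectors are compared by cosine similarity. This is a simplified stand-in for the models in [3]; the vocabulary, dimensions, and random weights are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = {"the": 0, "cat": 1, "sat": 2, "dog": 3, "ran": 4}
EMBED_DIM, NUM_FILTERS, WINDOW = 8, 6, 2

# Toy embedding table and convolution filters (randomly initialized;
# a real model would learn these from data).
embeddings = rng.normal(size=(len(VOCAB), EMBED_DIM))
filters = rng.normal(size=(NUM_FILTERS, WINDOW * EMBED_DIM))

def sentence_vector(tokens):
    """Embed, convolve with a sliding window, then max-pool over time."""
    x = embeddings[[VOCAB[t] for t in tokens]]              # (len, EMBED_DIM)
    windows = [x[i:i + WINDOW].ravel()                      # concatenated window
               for i in range(len(tokens) - WINDOW + 1)]
    feature_maps = np.tanh(np.array(windows) @ filters.T)   # (n_windows, F)
    return feature_maps.max(axis=0)                         # global max-pool

def match_score(sent_a, sent_b):
    """Cosine similarity between the pooled sentence representations."""
    a, b = sentence_vector(sent_a), sentence_vector(sent_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(match_score(["the", "cat", "sat"], ["the", "dog", "ran"]))
```
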
Kalchbrenner et al. (2014) described a convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) for the semantic modelling of sentences. The network uses dynamic k-max pooling, a global pooling operation over linear sequences. The main aim of that work is to analyze and represent the semantic content of a sentence for the purpose of classification or generation; a sketch of the pooling operation follows below.

Manurung et al. (2012) [4] implemented a system, McGonagall, which uses a genetic algorithm to construct text. The authors' main goal was to generate texts that are syntactically well formed, meet certain pre-specified patterns of metre, and convey some meaning. They showed that when some constraints on metre were relaxed, their experiments could generate relatively meaningful text. Poetry generation involves many aspects of language, so the automatic generation of such poetic text is challenging; they adopted a restricted definition of poetry as a text that embodies meaningfulness, grammaticality, and poeticness.

Malmi et al. (2016) [5] focus on generating rap lyrics. Their model is based on two machine learning techniques: the RankSVM algorithm and automatic rhyme detection. They divided the next-line prediction problem into three groups: rhyming, structural similarity, and semantic similarity. They built a tool, DeepBeat (deepbeat.org), which generates rap lyrics from a keyword given as input.

Wei et al. (2018) [6] tried to generate classical Chinese poetry, which often incorporates expressive folk influences filtered through the minds of Chinese poets and has consistently been held in extremely high regard in China. They proposed a poet-based poetry generation method which produces poems by controlling not only content selection but also a poetic style factor, and their studies improve the content-quality issues of poetry generation systems. The PoetPG framework takes the content of the current line and a poet's name as input, then generates a poem in two stages: poetic style modelling and poem generation.

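Dynamic k-max pooling, the operation at the heart of the DCNN described above, keeps the k largest activations of a feature map while preserving their original left-to-right order. The following is a minimal sketch with k fixed, whereas the full DCNN computes k dynamically from the sentence length:

```python
import numpy as np

def k_max_pooling(feature_map, k):
    """Keep the k largest values of a 1-D feature map, preserving
    their original left-to-right order (unlike a plain top-k sort)."""
    if k >= len(feature_map):
        return np.asarray(feature_map)
    # Indices of the k largest activations, re-sorted by position.
    top_idx = np.argpartition(feature_map, -k)[-k:]
    return np.asarray(feature_map)[np.sort(top_idx)]

# Example: the 3 largest activations, in their original order.
print(k_max_pooling(np.array([0.1, 0.9, 0.3, 0.7, 0.2, 0.8]), 3))
# -> [0.9 0.7 0.8]
```
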
5. III. METHODOLOGY

When a writer or poet decides to write about a particular topic, he or she first has to gather abundant knowledge about that topic. That knowledge works as the raw material for building a new block: from it, the writer is able to write new, original things about the topic. The process of generating new text is the same for a computer as it is for humans. Text generation is a part of Natural Language Generation, and neural networks are used to model these faculties in computers.

Connectionist models are used to model aspects of human perception, cognition, and behavior, the learning processes underlying such behavior, and the storage and retrieval of information from memory. Neural networks, which are a subset of connectionist models, are models that mimic how the human brain works.

Basically, ANNs (Artificial Neural Networks) play an important role in supervised learning. If we compare them to the human brain, we can think of the ANN as working like the temporal lobe, the CNN (Convolutional Neural Network) like the occipital lobe, and the RNN (Recurrent Neural Network) like the frontal lobe.

ANNs are a very powerful tool for learning machine perception tasks and give state-of-the-art results in a wide range of supervised and unsupervised machine learning tasks. But standard neural networks have a major shortcoming: the current output is independent of previous outputs, which is not suitable for our task. Humans have context about things, and that context lets them grasp the meaning of new things: when reading a textbook, we can understand the current paragraph only if we have understood the previous one. So we can conclude that the current output depends on the previous one. RNNs, which address this issue, are therefore very helpful for our task: these networks contain loops which allow information to persist.

Basically, in these networks, neurons are connected to themselves through time, so that they have a short-term memory: they can remember what has just happened in the previous step, which helps in generating sequences. The representation of an RNN is depicted in figure (1).

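A minimal sketch of this recurrence follows, where a hidden state is carried from one step to the next so each output depends on everything seen so far; the dimensions and random weights are illustrative placeholders, since a real network would learn them by training.

```python
import numpy as np

rng = np.random.default_rng(0)
INPUT_DIM, HIDDEN_DIM = 4, 8

# Placeholder weights; in practice these are learned by training.
W_xh = rng.normal(scale=0.1, size=(HIDDEN_DIM, INPUT_DIM))
W_hh = rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM))
b_h = np.zeros(HIDDEN_DIM)

def rnn_forward(inputs):
    """Run a vanilla RNN over a sequence of input vectors.

    The hidden state h acts as a short-term memory: each step's
    output depends on the current input *and* on what came before.
    """
    h = np.zeros(HIDDEN_DIM)
    states = []
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states

sequence = [rng.normal(size=INPUT_DIM) for _ in range(5)]
print(rnn_forward(sequence)[-1])  # final hidden state summarizes the sequence
```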

6. IV. RESULTS AND DISCUSSION

Fig. 1: Representation of a recurrent neural network.

7. V. CONCLUSION AND FUTURE SCOPE

From going through the various methods used in these papers, we can conclude that there are different methods available for modelling sequences of words.

Appendix A

  1. , P. T
  2. Christopher Olah's blog.
  3. A. Graves. Generating Sequences with Recurrent Neural Networks. Computing Research Repository (CoRR), arXiv, 2014.
  4. R. Manurung, G. Ritchie, H. Thompson. Using genetic algorithms to create meaningful poetic text. Journal of Experimental & Theoretical Artificial Intelligence, 24, 2013.
  5. N. Tosa et al. Hitch Haiku: An Interactive Supporting System for Composing Haiku Poem. International Federation for Information Processing, 2008.
  6. E. Malmi et al. DopeLearning: A Computational Approach to Rap Lyrics Generation. Knowledge Discovery and Data Mining, Association for Computing Machinery, 2016.
  7. Z. C. Lipton, J. Berkowitz, C. Elkan. A Critical Review of Recurrent Neural Networks for Sequence Learning. Computing Research Repository (CoRR), arXiv, 2015.
  8. Wei et al. Poet-based Poetry Generation: Controlling Personal Style with Recurrent Neural Networks. 2018 Workshop on Computing, Networking and Communications (CNC), 2018.
  9. B. Hu, Z. Lu, H. Li, Q. Chen. Convolutional Neural Network Architectures for Matching Natural Language Sentences. Neural Information Processing Systems Foundation, 2014.

© 2023 London Journals Press
