Demystifying Text Generation Approaches

1. I. INTRODUCTION

Natural Language Processing (NLP) is a subfield of Artificial Intelligence focused on enabling computers to understand and process human languages, bringing them closer to a human-level understanding of language. Humans have been writing things down for thousands of years, and over that time our brains have gained a tremendous amount of data and experience in understanding natural language [9]. The goal of NLP is to accomplish human-like language processing using a theoretically motivated range of computational techniques. Its applications include machine translation, speech synthesis, automatic summarization, word processing, text prediction, dialogue systems, named entity recognition, story understanding, language teaching, and assistive computing.

The process of generating text can be divided into four phases: first, collecting a dataset; second, cleaning that dataset; third, loading the cleaned text; and finally, generating new text.

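As a minimal illustration of these four phases, the following Python sketch collects a local plain-text corpus (the file name and parameters are hypothetical placeholders), cleans it, loads it into a simple character-level Markov chain, and generates new text; the Markov chain merely stands in for the neural models discussed later.

```python
import random
import re
from collections import defaultdict

# Phase 1: dataset collection -- here, a local plain-text file
# ("corpus.txt" is a hypothetical placeholder).
with open("corpus.txt", encoding="utf-8") as f:
    raw_text = f.read()

# Phase 2: cleaning -- lowercase and collapse whitespace.
text = re.sub(r"\s+", " ", raw_text.lower()).strip()

# Phase 3: loading the cleaned text into a simple model.
# A character-level Markov chain maps each k-gram to the
# characters that follow it in the corpus.
k = 4
model = defaultdict(list)
for i in range(len(text) - k):
    model[text[i:i + k]].append(text[i + k])

# Phase 4: generating text by repeatedly sampling a next
# character given the current k-gram context.
state = text[:k]
generated = state
for _ in range(200):
    choices = model.get(state)
    if not choices:
        break
    generated += random.choice(choices)
    state = generated[-k:]

print(generated)
```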

In 2016, Artificial Intelligence generated the movie script "Sunspring", created by Ross Goodwin and directed by Oscar Sharp. It was written by a program originally called Jetson, which named itself Benjamin; Benjamin's other films are "Zone Out" and "This Wild". In addition, a new chapter of J. K. Rowling's famous Harry Potter series was published by Botnik Studios, titled "Harry Potter and the Portrait of What Looked Like a Large Pile of Ash". Many songs have also been generated by Artificial Intelligence, such as "Daddy's Car" and "Break Free". Another experiment is Wikipedia text generation. Poems, too, can be generated by Artificial Intelligence; in Chinese literature, poems have been generated by AI. In what follows, we will discuss how neural networks are useful for generating such text.

2. II. RELATED WORK

Alex Graves (2014) [1] demonstrated that the LSTM can use its memory to generate complex, realistic sequences containing long-range structure. In this paper, Graves took a sequence-generation approach to text and showed how recurrent neural networks can generate complex sequences with long-range structure simply by predicting one data point at a time. He showed how Recurrent Neural Networks can be trained for sequence generation by processing real data sequences one step at a time and predicting what comes next. Predictions are assumed to be probabilistic, and it is assumed that sequences can be generated from a trained network by iteratively sampling from the network's output and then feeding the sample in as input at the next step. The paper states that, in practice, standard Recurrent Neural Networks are unable to store information about past inputs for very long. The word-level Recurrent Neural Network performed better than the character-level network, but that gap appeared to close when regularization was used.

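As a minimal sketch of this sampling loop, assume a hypothetical trained character-level recurrent model exposing a step method that returns a probability distribution over the next character together with its updated state; the interface and the temperature parameter are illustrative assumptions, not Graves's implementation.

```python
import numpy as np

def sample_sequence(model, vocab, seed_char, length, temperature=1.0):
    """Generate text by repeatedly sampling the model's output
    distribution and feeding the sample back in as the next input.

    `model.step(char_index, state)` is a hypothetical interface that
    returns (probs, new_state) for a trained recurrent network.
    """
    state = None
    index = vocab.index(seed_char)
    out = [seed_char]
    for _ in range(length):
        probs, state = model.step(index, state)
        # Temperature rescaling: <1 sharpens, >1 flattens the distribution.
        logits = np.log(probs) / temperature
        probs = np.exp(logits) / np.sum(np.exp(logits))
        index = np.random.choice(len(vocab), p=probs)
        out.append(vocab[index])
    return "".join(out)
```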

3. Lipton et al. (2015)

Lipton et al. (2015) [2] have given a critical review of recurrent neural networks and how they learn sequences. Recurrent Neural Networks are connectionist models. Neural networks in general are powerful learning models that give state-of-the-art results in a wide range of supervised and unsupervised machine learning tasks, but standard neural networks have limitations too: there is no dependency between concurrent states or layers, so these models are not useful when data is related through time or space. Examples of such data are frames from a video, audio snippets, and words pulled from sentences.

Thus the requirement for Recurrent Neural Networks came into the picture: because they are connected through time, all data that is related through time can be modelled. The recurrent neural network is depicted in figure (1).

4. Zhengdong et al. (2014)

Zhengdong et al. (2014) [3] proposed two convolutional neural network models for matching two sentences, adapting the convolutional strategy used in vision and speech. The proposed models not only represent the hierarchical structure of sentences (phrases nested within phrases) through their layer-by-layer composition and pooling, but also capture rich matching patterns at different levels. A successful sentence-matching algorithm needs to capture the whole structure, including the internal structures of the sentences as well as the rich patterns in their interactions.

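A rough sketch of this convolution-and-pooling idea for sentence matching follows: each sentence is embedded, convolved over sliding windows, and max-pooled into a fixed-length vector, and the two vectors are compared by cosine similarity. This is a simplified stand-in for the models in [3]; the vocabulary, dimensions, and random weights are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = {"the": 0, "cat": 1, "sat": 2, "dog": 3, "ran": 4}
EMBED_DIM, NUM_FILTERS, WINDOW = 8, 6, 2

# Toy embedding table and convolution filters (randomly initialized;
# a real model would learn these from data).
embeddings = rng.normal(size=(len(VOCAB), EMBED_DIM))
filters = rng.normal(size=(NUM_FILTERS, WINDOW * EMBED_DIM))

def sentence_vector(tokens):
    """Embed, convolve with a sliding window, then max-pool over time."""
    x = embeddings[[VOCAB[t] for t in tokens]]              # (len, EMBED_DIM)
    windows = [x[i:i + WINDOW].ravel()                      # concatenated window
               for i in range(len(tokens) - WINDOW + 1)]
    feature_maps = np.tanh(np.array(windows) @ filters.T)   # (n_windows, F)
    return feature_maps.max(axis=0)                         # global max-pool

def match_score(sent_a, sent_b):
    """Cosine similarity between the pooled sentence representations."""
    a, b = sentence_vector(sent_a), sentence_vector(sent_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(match_score(["the", "cat", "sat"], ["the", "dog", "ran"]))
```
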
Kalchbrenner et al. (2014) described a convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) for the semantic modelling of sentences. The network uses dynamic k-max pooling, a global pooling operation over linear sequences. The main aim of that work is to analyze and represent the semantic content of a sentence for the purpose of classification or generation; a sketch of the pooling operation follows below.

Manurung et al. (2012) [4] implemented a system, McGonagall, which uses a genetic algorithm to construct text. The authors' main goal was to generate texts that are syntactically well formed, meet certain pre-specified patterns of metre, and convey some meaning. They showed that when some constraints on metre were relaxed, their experiments could generate relatively meaningful text. Poetry generation involves many aspects of language, so the automatic generation of such poetic text is challenging; they adopted a restricted definition of poetry as a text that embodies meaningfulness, grammaticality, and poeticness.

Malmi et al. (2016) [5] focus on generating rap lyrics. Their model is based on two machine learning techniques: the RankSVM algorithm and automatic rhyme detection. They divided the next-line prediction problem into three groups: rhyming, structural similarity, and semantic similarity. They built a tool, DeepBeat (deepbeat.org), which generates rap lyrics from a keyword given as input.

Wei et al. (2018) [6] tried to generate classical Chinese poetry, which often incorporates expressive folk influences filtered through the minds of Chinese poets and has consistently been held in extremely high regard in China. They proposed a poet-based poetry generation method which produces poems by controlling not only content selection but also a poetic style factor, and their studies improve the content-quality issues of poetry generation systems. The PoetPG framework takes the content of the current line and a poet's name as input, then generates a poem in two stages: poetic style modelling and poem generation.

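Dynamic k-max pooling, the operation at the heart of the DCNN described above, keeps the k largest activations of a feature map while preserving their original left-to-right order. The following is a minimal sketch with k fixed, whereas the full DCNN computes k dynamically from the sentence length:

```python
import numpy as np

def k_max_pooling(feature_map, k):
    """Keep the k largest values of a 1-D feature map, preserving
    their original left-to-right order (unlike a plain top-k sort)."""
    if k >= len(feature_map):
        return np.asarray(feature_map)
    # Indices of the k largest activations, re-sorted by position.
    top_idx = np.argpartition(feature_map, -k)[-k:]
    return np.asarray(feature_map)[np.sort(top_idx)]

# Example: the 3 largest activations, in their original order.
print(k_max_pooling(np.array([0.1, 0.9, 0.3, 0.7, 0.2, 0.8]), 3))
# -> [0.9 0.7 0.8]
```
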
5. III. METHODOLOGY

When a writer or poet decides to write about a particular topic, he or she first has to gather abundant knowledge about that topic. That knowledge works as the raw material for building a new block: from it, the writer is able to write new, original things about the topic. The process of generating new text is the same for a computer as it is for humans. Text generation is a part of Natural Language Generation, and neural networks are used to model these faculties in computers.

Connectionist models are used to model aspects of human perception, cognition, and behavior, the learning processes underlying such behavior, and the storage and retrieval of information from memory. Neural networks, which are a subset of connectionist models, are models that mimic how the human brain works.

Basically, ANNs (Artificial Neural Networks) play an important role in supervised learning. If we compare them to the human brain, we can think of the ANN as working like the temporal lobe, the CNN (Convolutional Neural Network) like the occipital lobe, and the RNN (Recurrent Neural Network) like the frontal lobe.

ANNs are a very powerful tool for learning machine perception tasks and give state-of-the-art results in a wide range of supervised and unsupervised machine learning tasks. But standard neural networks have a major shortcoming: the current output is independent of previous outputs, which is not suitable for our task. Humans have context about things, and that context lets them grasp the meaning of new things: when reading a textbook, we can understand the current paragraph only if we have understood the previous one. So we can conclude that the current output depends on the previous one. RNNs, which address this issue, are therefore very helpful for our task: these networks contain loops which allow information to persist.

Basically, in these networks, neurons are connected to themselves through time, so that they have a short-term memory: they can remember what has just happened in the previous step, which helps in generating sequences. The representation of an RNN is depicted in figure (1).

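A minimal sketch of this recurrence follows, where a hidden state is carried from one step to the next so each output depends on everything seen so far; the dimensions and random weights are illustrative placeholders, since a real network would learn them by training.

```python
import numpy as np

rng = np.random.default_rng(0)
INPUT_DIM, HIDDEN_DIM = 4, 8

# Placeholder weights; in practice these are learned by training.
W_xh = rng.normal(scale=0.1, size=(HIDDEN_DIM, INPUT_DIM))
W_hh = rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM))
b_h = np.zeros(HIDDEN_DIM)

def rnn_forward(inputs):
    """Run a vanilla RNN over a sequence of input vectors.

    The hidden state h acts as a short-term memory: each step's
    output depends on the current input *and* on what came before.
    """
    h = np.zeros(HIDDEN_DIM)
    states = []
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states

sequence = [rng.normal(size=INPUT_DIM) for _ in range(5)]
print(rnn_forward(sequence)[-1])  # final hidden state summarizes the sequence
```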

6. IV. RESULTS AND DISCUSSION

Fig. 1: Representation of a recurrent neural network.

7. V. CONCLUSION AND FUTURE SCOPE

From going through the various methods used in these papers, we can conclude that there are different methods available for modelling sequences of words.

Appendix A

  1. , P. T
  2. Christopher Olah's blog.
  3. A. Graves. Generating Sequences with Recurrent Neural Networks. Computing Research Repository (CoRR), arXiv, 2014.
  4. R. Manurung, G. Ritchie, H. Thompson. Using genetic algorithms to create meaningful poetic text. Journal of Experimental & Theoretical Artificial Intelligence, 24, 2013.
  5. N. Tosa et al. Hitch Haiku: An Interactive Supporting System for Composing Haiku Poem. International Federation for Information Processing, 2008.
  6. E. Malmi et al. DopeLearning: A Computational Approach to Rap Lyrics Generation. Knowledge Discovery and Data Mining, Association for Computing Machinery, 2016.
  7. Z. C. Lipton, J. Berkowitz, C. Elkan. A Critical Review of Recurrent Neural Networks for Sequence Learning. Computing Research Repository (CoRR), arXiv, 2015.
  8. Wei et al. Poet-based Poetry Generation: Controlling Personal Style with Recurrent Neural Networks. 2018 Workshop on Computing, Networking and Communications (CNC), 2018.
  9. B. Hu, Z. Lu, H. Li, Q. Chen. Convolutional Neural Network Architectures for Matching Natural Language Sentences. Neural Information Processing Systems Foundation, 2014.

© 2023 London Journals Press
