Do you want to use Python to generate text? Thanks to significant advances in the field of Natural Language Processing (NLP), text generators are now able to pick up context by themselves.
Read this article through to the end to understand how text generators work and how to build a machine learning model that can write stories and sonnets.
What Is a Text Generator?
These days, a huge amount of data can be classified as sequential. It comes in the form of time series, text, video, sensor data, and so on. What sets this data apart is that order matters: event A occurring before event B is an entirely different scenario from event A occurring after event B.
In conventional machine learning problems, by contrast, it barely matters whether one data point was recorded before another. This consideration is why sequence prediction problems call for a different approach.
Text generation has many applications, from creating original art to regenerating lost data. It is a hard problem, however, because text is a stream of characters lined up one after another. A model can be trained to make very accurate predictions, but if a single prediction goes wrong, it can make the entire sentence meaningless. In numerical sequence prediction problems, on the other hand, even a prediction that goes wrong can still be considered a valid, if inaccurate, prediction.
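To make the point about ordering concrete, here is a minimal sketch. The two sentences are made-up examples, not taken from the sonnets data. An order-insensitive view of the words cannot tell them apart, while the sequences themselves differ.

```python
from collections import Counter

# Two made-up sentences containing exactly the same words.
a = "dog bites man".split()
b = "man bites dog".split()

# An order-insensitive view (a bag of words) cannot tell them apart...
same_bag = Counter(a) == Counter(b)

# ...while the sequences themselves are different.
same_sequence = a == b

print(same_bag, same_sequence)
```

This is why a text generator has to be trained on ordered sequences rather than on unordered collections of characters or words.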
What are the Different Steps of Text Generation?
Text generation involves the steps described below; go through each one carefully.
Importing Dependencies
This step simply imports all the libraries needed for the project:
import numpy as np
import pandas as pd
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.utils import to_categorical  # older Keras releases expose this as np_utils.to_categorical
from keras.callbacks import ModelCheckpoint
Loading the data
Here, we load a downloadable text file containing a combined collection of Shakespeare’s sonnets. The file is opened and its contents are stored in text. The content is then converted to lowercase to reduce the number of possible words and characters.
text = open("/Users/pranjal/Desktop/text_generator/sonnets.txt").read()
text = text.lower()
Creating Character/Word mappings
Character or word mapping assigns an arbitrary number to each character or word in the text, so that every unique character (or word) maps to its own number. This is necessary because machines work with numbers rather than text, and it makes the training process easier later.
characters = sorted(list(set(text)))
n_to_char = {n: char for n, char in enumerate(characters)}
char_to_n = {char: n for n, char in enumerate(characters)}
We have created a dictionary with a number assigned to each unique character in the text. The unique characters are first collected in characters and then enumerated. Note that we have used character-level mapping here, not word mapping. A word-based model generally shows higher accuracy than a character-based one, because the character-based model requires a much larger network to learn long-term dependencies: it has to remember sequences of words and also has to learn to produce a correct word.
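As a small illustration of the mapping step, the sketch below applies the same two dictionaries to a short sample string. The string 'hello india' is only a stand-in for the sonnets text, and encoded and decoded are names introduced here for illustration.

```python
# Hypothetical stand-in for the full sonnets text.
sample = "hello india"

# Same mapping logic as above, applied to the sample.
characters = sorted(set(sample))
n_to_char = {n: char for n, char in enumerate(characters)}
char_to_n = {char: n for n, char in enumerate(characters)}

# Encode the text as numbers, then decode it back to check the round trip.
encoded = [char_to_n[c] for c in sample]
decoded = "".join(n_to_char[n] for n in encoded)

print(characters)
print(encoded)
print(decoded)
```

Because the two dictionaries are inverses of each other, decoding the encoded list gives back the original string.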
Data Preprocessing
Preparing the data is the trickiest part of building an LSTM model. Transforming the data at hand into a format the network can use is a challenging task.
X = []
Y = []
length = len(text)
seq_length = 100
for i in range(0, length - seq_length, 1):
    sequence = text[i:i + seq_length]
    label = text[i + seq_length]
    X.append([char_to_n[char] for char in sequence])
    Y.append(char_to_n[label])
Here, X is the training array and Y is the target array. seq_length is the number of characters we consider before predicting the next character.
For a sequence length of 4 and the text ‘hello india’, X and Y (shown as characters rather than numbers) would begin like this:
X                 Y
[h, e, l, l]      [o]
[e, l, l, o]      [ ]
[l, l, o,  ]      [i]
[l, o,  , i]      [n]
…                 …
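The windows in the table can be reproduced with a few lines of Python. This is a minimal sketch that keeps the characters as they are, without mapping them to numbers, so the output is easy to compare with the table.

```python
text = "hello india"
seq_length = 4

X, Y = [], []
# Slide a window of seq_length characters over the text, one step at a time.
for i in range(len(text) - seq_length):
    X.append(list(text[i:i + seq_length]))  # the input window
    Y.append(text[i + seq_length])          # the character that follows it

for x, y in zip(X, Y):
    print(x, "->", repr(y))
```

An 11-character text with a window of 4 gives 7 training pairs, since the last window has to leave one character over to serve as its target.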
Modelling
Here, we build a sequential model with two LSTM layers of 400 units each. The first layer must be given the input shape. So that the next LSTM layer can process the same sequences, we set the return_sequences parameter to True on the first layer. Each LSTM layer is followed by a dropout layer with a rate of 0.2.
The last layer is a softmax layer that outputs a vector with one entry per character, in the same form as the one-hot encoded targets; the character with the highest probability is taken as the output.
model = Sequential()
model.add(LSTM(400, input_shape=(X_modified.shape[1], X_modified.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(400))
model.add(Dropout(0.2))
model.add(Dense(Y_modified.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
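The model listing refers to X_modified and Y_modified, which are not defined in the snippets shown in this article. One plausible way to derive them from X and Y is sketched below. It assumes the inputs are reshaped to (samples, time steps, features) and scaled by the number of unique characters, and that the targets are one-hot encoded. The toy values and the use of np.eye are assumptions made for illustration; in a Keras project, to_categorical with num_classes set would typically do the one-hot step.

```python
import numpy as np

# Toy stand-ins for the X and Y lists built during preprocessing (assumed values).
X = [[4, 3, 6, 6], [3, 6, 6, 8], [6, 6, 8, 0]]
Y = [8, 0, 5]
n_chars = 9      # number of unique characters, i.e. len(characters)
seq_length = 4

# LSTM layers expect input shaped (samples, time steps, features).
X_modified = np.reshape(X, (len(X), seq_length, 1))
# Scale the integer codes into the range [0, 1).
X_modified = X_modified / float(n_chars)

# One-hot encode the targets: one row per sample, one column per character.
Y_modified = np.eye(n_chars)[Y]

print(X_modified.shape, Y_modified.shape)
```

With these shapes, X_modified.shape[1] and X_modified.shape[2] give the time steps and features for the first LSTM layer, and Y_modified.shape[1] gives the number of output units for the Dense layer.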
Generating text
We start with one row of the X array (here, row 99), which holds 100 characters, and aim to predict the 100 characters that follow it. At each step the input is reshaped and scaled, and the character with the highest predicted probability is chosen as the next character.
seq stores the decoded form of the current window, which includes the characters predicted so far.
string_mapped = list(X[99])  # copy the seed so that X itself is not modified
# generating characters
for i in range(seq_length):
    x = np.reshape(string_mapped, (1, len(string_mapped), 1))
    x = x / float(len(characters))
    pred_index = np.argmax(model.predict(x, verbose=0))
    seq = [n_to_char[value] for value in string_mapped]
    string_mapped.append(pred_index)
    string_mapped = string_mapped[1:len(string_mapped)]
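The generation loop follows a predict, append, slide pattern that can be tried without a trained network. In the sketch below, generate and toy_predictor are hypothetical names: toy_predictor stands in for the argmax over model.predict and simply returns the last index plus one, modulo 5. Unlike the listing above, this version also collects the predicted indices so they can be decoded into an output string.

```python
def generate(seed, predict_next, n_chars):
    """Greedy generation: predict, append, then slide the window by one."""
    window = list(seed)          # copy so the caller's seed is untouched
    generated = []
    for _ in range(n_chars):
        next_index = predict_next(window)
        generated.append(next_index)
        window.append(next_index)
        window = window[1:]      # keep the window length constant
    return generated


# Hypothetical stand-in for the trained network.
def toy_predictor(window):
    return (window[-1] + 1) % 5


n_to_char = {0: "a", 1: "b", 2: "c", 3: "d", 4: "e"}
indices = generate([0, 1, 2], toy_predictor, 6)
output = "".join(n_to_char[i] for i in indices)
print(output)
```

Swapping toy_predictor for a function that reshapes and scales the window and returns the argmax of the model's prediction would give the same loop as the Keras version.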
How to read and write text files in Python (Text Generator)?
Python provides built-in functions for creating, reading, and writing files. It handles two types of files: normal text files and binary files.
- Text file – each line of text is terminated with a special character called EOL (end of line), which by default is the newline character ('\n') in Python.
- Binary file – there is no line terminator, and the data is stored after being converted into machine-understandable binary form.
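The difference between the two file types shows up in what read() returns. The following sketch writes a short string to a temporary file (the file name and contents are arbitrary) and reads it back in text mode and in binary mode.

```python
import os
import tempfile

# Temporary file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("héllo")

# Text mode decodes the bytes and returns a str.
with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()

# Binary mode returns the raw bytes exactly as stored on disk.
with open(path, "rb") as f:
    as_bytes = f.read()

print(type(as_text), len(as_text))
print(type(as_bytes), len(as_bytes))
```

The text has five characters but takes six bytes on disk, because 'é' is encoded as two bytes in UTF-8.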
6 main file access modes
There are six access modes for text files in Python.
Read Only (‘r’) – Opens the text file for reading. The handle is positioned at the beginning of the file. If the file doesn’t exist, an I/O error is raised (FileNotFoundError in Python 3). This is also the default mode in which a file is opened.
Read and Write (‘r+’) – Opens the text file for reading and writing. The handle is positioned at the beginning of the file. An I/O error is also raised if the file doesn’t exist.
Write Only (‘w’) – Opens the text file for writing. For an existing file, the data is truncated and over-written. The handle is positioned at the beginning of the file. The file is created if it doesn’t exist.
Write and Read (‘w+’) – Opens the text file for reading and writing. For an existing file, the data is truncated and over-written. The handle is positioned at the beginning of the file. The file is created if it doesn’t exist.
Append Only (‘a’) – Opens the text file for writing. The file is created if it does not exist. The handle is positioned at the end of the file, so new data is written after the existing data.
Append and Read (‘a+’) – Opens the text file for reading and writing. The file is created if it does not exist. The handle is positioned at the end of the file, so new data is written after the existing data.
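The behaviour of the six modes can be checked with a short script. This sketch works on a temporary file whose name is arbitrary, and the variable names are introduced here only to hold the results.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# 'r' on a missing file raises FileNotFoundError.
try:
    open(path, "r")
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

with open(path, "w") as f:       # creates the file
    f.write("first")
with open(path, "a") as f:       # appends after the existing data
    f.write(" second")
with open(path, "r+") as f:      # handle at the start: overwrites in place
    f.write("FIRST")
with open(path, "r") as f:
    after_r_plus = f.read()

with open(path, "w+") as f:      # truncates, then allows reading back
    f.write("new")
    f.seek(0)
    after_w_plus = f.read()

with open(path, "a+") as f:      # appends, then allows reading back
    f.write("!")
    f.seek(0)
    after_a_plus = f.read()

print(missing_raises, after_r_plus, after_w_plus, after_a_plus)
```

Note the seek(0) calls: after writing in 'w+' or 'a+' mode, the handle sits at the end of the file, so it has to be moved back to the start before reading.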
How to write files?
There are two ways to write to a file:
1. write() – Writes the string str1 to the text file. No newline is added automatically.
File_object.write(str1)
2. writelines() – Writes each string from a list of strings to the text file, one after another, without adding line separators.
File_object.writelines(L), where L = [str1, str2, str3]
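A short sketch of both calls follows, using a temporary file and made-up strings. It also shows that neither method adds newlines on its own, so they have to be included in the strings.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "write_demo.txt")
lines = ["first line", "second line", "third line"]

with open(path, "w") as f:
    f.write("header\n")                           # one string, newline added by hand
    f.writelines(line + "\n" for line in lines)   # several strings, newlines added by hand

with open(path) as f:
    content = f.read()

# Without explicit newlines, writelines() simply concatenates the strings.
with open(path, "w") as f:
    f.writelines(["a", "b"])
with open(path) as f:
    joined = f.read()

print(repr(content))
print(repr(joined))
```

The with statement closes the file automatically at the end of each block, which ensures the written data is flushed to disk.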
How to read files?
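There are three ways to read from a file:
• read() – File_object.read([n]): returns the content as a string; reads at most n characters if n is given, otherwise the entire file.
• readline() – File_object.readline([n]): reads a single line and returns it as a string; reads at most n characters of that line if n is given.
• readlines() – File_object.readlines(): reads all the remaining lines and returns them as a list of strings.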
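The three reading methods can be compared on a small file. This is a minimal sketch with a temporary file and made-up contents; note how each call continues from where the previous one left the handle.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path) as f:
    first_three = f.read(3)        # at most 3 characters
    rest_of_line = f.readline()    # the rest of the current line
    remaining = f.readlines()      # all remaining lines, as a list

with open(path) as f:
    everything = f.read()          # no argument: the whole file

print(repr(first_three), repr(rest_of_line), remaining)
```

readline() and readlines() keep the trailing newline of each line, so strip() or rstrip("\n") is often applied to the results.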
Final Words
Text generators have many applications, from creating original art to regenerating lost data. We hope this article has cleared up your doubts. Have a great day!