A text paraphrasing program comes in handy for numerous purposes, such as rewriting a block of sentences in an article, post, or email.
The task of paraphrasing a text usually requires building and training a Natural Language Processing (NLP) model.
NLP is challenging not only because language is complex, but also because the amount of data available to train a model for tasks such as paraphrasing heavily affects its performance.
If the model is not properly trained, you get strange outputs.
Also, acquiring and labeling additional observations for an NLP model can be expensive and very time-consuming.
One common approach to building a text paraphraser, especially in Python, has been to apply data augmentation to the labeled text data and rewrite the text using back translation (e.g., en -> de -> en, that is, English to German and back to English).
What Is the PEGASUS Transformer Model?
Google’s research team introduced a state-of-the-art abstractive summarization model called PEGASUS. The name stands for Pre-training with Extracted Gap-sentences for Abstractive Summarization.
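To make "gap sentences" concrete, here is a simplified, hypothetical sketch of how one pre-training example might be built. PEGASUS masks whole sentences that matter most to a document and trains the model to generate them as a pseudo-summary. This sketch scores each sentence against the rest of the text with a unigram-overlap F1 (a rough stand-in for ROUGE-1) and uses `[MASK1]` as the mask token; the function names are illustrative only and not part of any library.

```python
import re

def _tokens(text):
    """Lowercased word tokens (a crude stand-in for real tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def unigram_f1(candidate, reference):
    """Unigram-overlap F1 between two texts: a simplified proxy for ROUGE-1 F1."""
    cand, ref = _tokens(candidate), _tokens(reference)
    overlap = sum(min(cand.count(w), ref.count(w)) for w in set(cand))
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def gap_sentence_example(sentences, num_gaps=1, mask_token="[MASK1]"):
    """Build one (input, target) pre-training pair.

    Each sentence is scored independently against the rest of the document;
    the top `num_gaps` sentences are replaced by `mask_token` in the input and
    concatenated, in document order, to form the target pseudo-summary.
    """
    scores = []
    for i, sentence in enumerate(sentences):
        rest = " ".join(s for j, s in enumerate(sentences) if j != i)
        scores.append(unigram_f1(sentence, rest))
    # Highest score first; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = set(ranked[:num_gaps])
    source = " ".join(mask_token if i in chosen else s for i, s in enumerate(sentences))
    target = " ".join(s for i, s in enumerate(sentences) if i in chosen)
    return source, target
```

For example, `gap_sentence_example(["Cats love fish.", "Cats chase mice.", "Fish swim fast."])` returns `("[MASK1] Cats chase mice. Fish swim fast.", "Cats love fish.")`, because the first sentence shares the most words with the rest of the text.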
Because PEGASUS is a sequence-to-sequence (seq2seq) transformer model, we can adapt it to paraphrase a sentence or a longer text.
Seq2seq transformer models also make it easy to rewrite text directly, without the back translation process.
This post does not in any way promote stealing content from other websites using a method popularly called article spinning. It is solely intended for research and testing purposes.
NB: Running this program downloads several files. One of them, the model, is about 2 GB or more in size.
How to Build a Text Paraphraser Using Python with Pegasus Transformer for NLP
Adapting this model for paraphrasing means fine-tuning the Google PEGASUS model on a paraphrasing task and converting the TensorFlow (TF) checkpoints to PyTorch using this script in Hugging Face's transformers library.
Install the Dependencies
The first step is to install the required dependencies for our paraphrasing model.
We use PyTorch and the transformers package to work with the PEGASUS model.
Also, we use the sentence-splitter package to split our paragraphs into sentences and the SentencePiece package to encode and decode sentences.
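The four packages can be installed with `pip install torch transformers sentence-splitter sentencepiece`. As an optional sanity check, the sketch below uses the standard library's `importlib.util.find_spec` to report which of them Python cannot find. The helper name `missing_packages` is hypothetical; note that the pip name `sentence-splitter` is imported as `sentence_splitter`.

```python
# Install first (shell): pip install torch transformers sentence-splitter sentencepiece
from importlib.util import find_spec

# pip package name -> importable module name (they differ for sentence-splitter).
REQUIRED_MODULES = {
    "torch": "torch",
    "transformers": "transformers",
    "sentence-splitter": "sentence_splitter",
    "sentencepiece": "sentencepiece",
}

def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All dependencies are installed.")
```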
Set Up the PEGASUS Model
Next, we set up the PEGASUS transformer model: we import the dependencies and configure the required settings, such as the maximum sentence length.
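A minimal setup sketch follows. The checkpoint name `tuner007/pegasus_paraphrase` is an assumption (a community PEGASUS paraphrase checkpoint on the Hugging Face Hub that this post does not name), and the 60-token maximum length is an illustrative setting. `PegasusTokenizer`, `PegasusForConditionalGeneration`, and `from_pretrained` are standard transformers APIs; the heavy imports sit inside `load_model` so the settings can be read without loading torch.

```python
# Assumed checkpoint: a PEGASUS model fine-tuned for paraphrasing (not named in the post).
MODEL_NAME = "tuner007/pegasus_paraphrase"
MAX_LENGTH = 60  # illustrative maximum sentence length, in tokens

def load_model(model_name=MODEL_NAME):
    """Load the tokenizer and model, downloading them on first use.

    Returns (tokenizer, model, device). Imports are local so this module can
    be imported even where torch/transformers are not installed.
    """
    import torch
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    # Use a GPU when one is available; otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name).to(device)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model, device
```

Calling `tokenizer, model, device = load_model()` the first time triggers the large download mentioned above; later calls reuse the local cache.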
Access the Model
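To access the model, a small wrapper can tokenize one input, run beam search with `model.generate`, and decode the results. The sketch below is one way to write it, not necessarily the original author's code; the function name `paraphrase` and its defaults (10 beams, 10 returned sequences, 60 tokens) are assumptions. It takes any Hugging Face Pegasus tokenizer and model loaded with `from_pretrained`, and it checks that `num_beams` is at least `num_return_sequences`, since beam search cannot return more sequences than it keeps.

```python
def paraphrase(text, tokenizer, model, device="cpu",
               num_return_sequences=10, num_beams=10, max_length=60):
    """Return `num_return_sequences` paraphrases of `text`, best first."""
    if num_beams < num_return_sequences:
        raise ValueError("num_beams must be >= num_return_sequences")
    # Tokenize into PyTorch tensors, truncating inputs longer than max_length.
    batch = tokenizer([text], truncation=True, padding="longest",
                      max_length=max_length, return_tensors="pt").to(device)
    # Beam search keeps `num_beams` hypotheses and returns the top ones.
    outputs = model.generate(**batch, max_length=max_length, num_beams=num_beams,
                             num_return_sequences=num_return_sequences)
    # Convert token IDs back to text, dropping special tokens such as </s>.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With a loaded tokenizer and model, `paraphrase("The weather is nice today.", tokenizer, model)` returns a list of ten candidate strings.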
Test the Model
Paraphrase a single sentence:
The model returned ten different paraphrased sentences because we set the number of returned responses to 10.
Paraphrase a paragraph:
The model works well on a single sentence, so to paraphrase a paragraph we first have to break it into individual sentences.
To do so, we take the input paragraph and split it into a list of sentences.
Then we loop over the list and paraphrase each sentence in turn.
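The split-then-loop steps above can be sketched as a small function. The name `paraphrase_paragraph` is hypothetical. It accepts any splitter object with a `split(text)` method (for example `SentenceSplitter(language="en")` from the sentence-splitter package) and any function that maps one sentence to a list of candidate paraphrases, which keeps the loop independent of how the model is loaded.

```python
def paraphrase_paragraph(paragraph, paraphrase_sentence, splitter):
    """Split `paragraph` into sentences and paraphrase each one.

    `splitter` is any object with a `split(text)` method that returns a list
    of sentences. `paraphrase_sentence` maps one sentence to a list of
    candidate paraphrases. Returns one list of candidates per sentence.
    """
    sentences = splitter.split(paragraph)
    # Paraphrase each sentence separately, preserving the original order.
    return [paraphrase_sentence(sentence) for sentence in sentences]
```

In practice, `paraphrase_sentence` would call the model with a single returned sequence per sentence, since only the top candidate is needed when rebuilding the paragraph.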
Combine the resulting lists into a paragraph:
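One way to combine the lists, sketched under the assumption that each inner list holds the candidates for one sentence with the best candidate first (beam search returns sequences in score order), is to take the first candidate of each list and join them with spaces. The function name `combine_paraphrases` is hypothetical.

```python
def combine_paraphrases(candidates_per_sentence):
    """Join the top candidate for each sentence into one paragraph.

    `candidates_per_sentence` is a list of lists: one inner list of candidate
    paraphrases per original sentence, best candidate first. Sentences with
    no candidates are skipped.
    """
    return " ".join(c[0].strip() for c in candidates_per_sentence if c)
```

For example, `combine_paraphrases([["Hi there."], ["How are you?", "How do you do?"]])` returns `"Hi there. How are you?"`.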
You learned how to build a text paraphraser using NLP methods.
You also learned what the PEGASUS transformer model is and how a seq2seq model simplifies paraphrasing compared with back translation.