Creating a Summarization Tool with OpenAI’s GPT-3: A Step-by-Step Guide

Sajid Hasan Sifat
8 min read · Jan 28, 2023

Unleashing the power of GPT-3 to automatically summarize large paragraphs with ease

Image from https://openai.com/

Introduction

Summarization is the process of reducing a large piece of text to its most important points. This can be useful for a variety of applications, such as news summarization, document summarization, and more. In this article, we will show you how to use the OpenAI API to create a summarization tool that can automatically summarize large pieces of text.

Setting up the OpenAI API

To use the OpenAI API, you will need to sign up for an account on the OpenAI website (https://beta.openai.com/signup/). Once you have created an account and logged in, you can generate an API key from the settings page. You will also need to add a payment method to your account in order to use the GPT-3 API.
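Try to keep the key out of your source code. One common pattern is to export it as an environment variable and read it in Python; the variable name OPENAI_API_KEY below is a convention the openai library also recognizes, and this is a minimal sketch rather than a required setup:

import os
import openai

# Assumes you have exported the key in your shell first, e.g.
#   export OPENAI_API_KEY="your-key-here"
openai.api_key = os.environ["OPENAI_API_KEY"]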

Installing the OpenAI library

To use the OpenAI API in your Python code, you will need to install the openai library. You can do this by running the following command in your command line:

pip install openai
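The code in this article uses the openai.Completion interface from the pre-1.0 versions of the library. If pip installs a newer release that no longer exposes it, you can pin an older version (this pin is my suggestion, not an OpenAI requirement):

pip install "openai<1.0"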

Creating the summarization tool

Once you have set up the OpenAI API and installed the openai library, you can start creating your summarization tool. Here is an example of how you can use the GPT-3 API to summarize a large paragraph.

Here is a sample paragraph (the code that follows refers back to this text):

"Up to the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing. This was due to both the steady increase in computational power (see Moore’s law) and the gradual lessening of the dominance of Chomskyan theories of linguistics (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing.[3] Some of the earliest-used machine learning algorithms, such as decision trees, produced systems of hard if-then rules similar to existing hand-written rules. However, part-of-speech tagging introduced the use of hidden Markov models to natural language processing, and increasingly, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to the features making up the input data. The cache language models upon which many speech recognition systems now rely are examples of such statistical models. Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks.Many of the notable early successes occurred in the field of machine translation, due especially to work at IBM Research, where successively more complicated statistical models were developed. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, most other systems depended on corpora specifically developed for the tasks implemented by these systems, which was (and often continues to be) a major limitation in the success of these systems. As a result, a great deal of research has gone into methods of more effectively learning from limited amounts of data. Recent research has increasingly focused on unsupervised and semi-supervised learning algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers or using a combination of annotated and non-annotated data. Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the World Wide Web), which can often make up for the inferior results if the algorithm used has a low enough time complexity to be practical.In the 2010s, representation learning and deep neural network-style machine learning methods became widespread in natural language processing, due in part to a flurry of results showing that such techniques[4][5] can achieve state-of-the-art results in many natural language tasks, for example in language modeling,[6] parsing,[7][8] and many others. 
Popular techniques include the use of word embeddings to capture semantic properties of words, and an increase in end-to-end learning of a higher-level task (e.g., question answering) instead of relying on a pipeline of separate intermediate tasks (e.g., part-of-speech tagging and dependency parsing). In some areas, this shift has entailed substantial changes in how NLP systems are designed, such that deep neural network-based approaches may be viewed as a new paradigm distinct from statistical natural language processing. For instance, the term neural machine translation (NMT) emphasizes the fact that deep learning-based approaches to machine translation directly learn sequence-to-sequence transformations, obviating the need for intermediate steps such as word alignment and language modeling that was used in statistical machine translation (SMT)."
And here is the code that sends this paragraph to the API:

import openai

# Use your own API key
openai.api_key = "YOUR_API_KEY"

# The sample paragraph shown above, shortened here so it is not repeated in full
text = """Up to the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing. [...] For instance, the term neural machine translation (NMT) emphasizes the fact that deep learning-based approaches to machine translation directly learn sequence-to-sequence transformations, obviating the need for intermediate steps such as word alignment and language modeling that was used in statistical machine translation (SMT)."""

# Use the GPT-3 engine to summarize the text
# (this calls the legacy Completions API of the openai library, version < 1.0)
response = openai.Completion.create(
    engine="text-davinci-002",
    prompt=f"Please summarize the following text: {text}",
    temperature=0.5,
    max_tokens=150,
)

# Print the summary
print(response.choices[0].text)

Output

The text discusses the history of natural language processing, with a focus on the shift from hand-written rules to machine learning algorithms in the late 1980s. This shift was due to both the increasing computational power of computers and the lessening dominance of Chomskyan theories of linguistics. The machine learning approach to language processing is more robust when given unfamiliar input, and produces more reliable results when integrated into a larger system. However, most machine learning systems for natural language processing depend on corpora specifically developed for the tasks they are designed to perform, which can be a major limitation.

This script uses the openai.Completion.create() function to send a request to the GPT-3 API along with the text to be summarized. The engine parameter specifies which GPT-3 model to use, the prompt parameter carries the instruction together with the text to be summarized, the temperature parameter controls how random the output is, and the max_tokens parameter caps the length of the generated summary.
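If you plan to reuse this in a larger script, it can help to wrap the call in a small function. The sketch below is one way to do that with the same legacy openai library; the function name summarize and its default arguments are my own choices, not part of the OpenAI API:

import openai

openai.api_key = "YOUR_API_KEY"

def summarize(text, model="text-davinci-002", max_tokens=150, temperature=0.5):
    """Return a short summary of `text` using the GPT-3 Completions API."""
    response = openai.Completion.create(
        engine=model,
        prompt=f"Please summarize the following text: {text}",
        temperature=temperature,
        max_tokens=max_tokens,
    )
    # The API returns a list of choices; take the first one and strip whitespace
    return response.choices[0].text.strip()

# Example usage:
# print(summarize(open("article.txt").read()))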

Conclusion

In this article, we showed you how to use the OpenAI API to create a summarization tool that can automatically summarize large pieces of text. The OpenAI API is a powerful tool for a variety of natural language processing tasks, and it is easy to use with the openai library. However, it is worth noting that while GPT-3 can produce a good summary, it might also include irrelevant information or generate a summary that is not coherent with the original text, depending on the complexity and type of the input. It is therefore important to review and check the generated summary before using it.
