Text summarization is an NLP process that reduces the amount of text from a given input while preserving key information and contextual meaning. Given the time and resources manual summarization requires, it's no surprise that automatic summarization with NLP has grown across a number of use cases and document lengths.

The summarization space has grown rapidly, with a new focus on condensing very large text inputs down to a few lines. Increased demand for summarizing longer documents such as news articles and research papers has driven growth in the space. The key changes behind this push are transformer models such as BERT and GPT-3, which can handle much longer input sequences in a single run, and a better understanding of chunking algorithms. Past architectures such as LSTMs and RNNs were neither as efficient nor as accurate as transformer-based models, which made long-document summarization much harder. Learning how to build and use chunking algorithms that preserve the structure of contextual information and reduce data variance at runtime has been key as well.

Difference Between Large & Small Text Summarization

Packing all the contextual information from a document into a short summary is much harder with long text. Chunking algorithms are often required, but they increase the data variance a model must cover to stay accurate. These algorithms control how much of a larger document we pass into the summarizer, based on the model's maximum token limit and the parameters we've set. If our summary must be, say, five sentences at most, deciding which information is valuable enough to include is much harder with 50,000 words than with 500. Longer documents also tend to have much more internal variance and swings in the information they carry, which means the data variance is much larger than what is seen with smaller text. Use cases such as blog posts, interviews, and transcripts have multiple swings in the dialog that make it harder to tell which contextual information is valuable for the summary. As the text grows, models have to learn a much deeper relationship between specific keywords, topics, and phrases.
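To make the chunking idea concrete, here is a minimal sketch of a fixed-size chunking step. It uses whitespace tokens as a stand-in for real model tokens, and the `max_tokens` and `overlap` values are illustrative assumptions, not values from any specific summarizer:

```python
# Minimal chunking sketch: split a long document into overlapping
# windows that each fit a model's max input size.
# Whitespace tokens approximate model tokens; max_tokens and overlap
# are hypothetical parameters for illustration.
def chunk_text(text, max_tokens=512, overlap=50):
    tokens = text.split()
    chunks = []
    step = max_tokens - overlap  # advance by window size minus overlap
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # last window already reaches the end of the document
    return chunks

# Example: a 1,200-token document becomes three overlapping chunks,
# each small enough to pass to the summarizer in a single run.
doc = ("word " * 1200).strip()
pieces = chunk_text(doc, max_tokens=512, overlap=50)
print(len(pieces))  # prints 3
```

The overlap between windows is one common way to keep contextual information from being cut in half at a chunk boundary; each chunk would then be summarized separately and the partial summaries combined.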