First attempt at question answering by ML

The goal of this experiment is to provide a summary of a text with sources.

I’m aiming for a self-hosted solution for answering questions, my inspiration being

A basic code example:

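from transformers import pipeline

# Load a summarization pipeline with a long-context T5 model from the Hugging Face Hub
summarizer = pipeline(
  'summarization',
  model='pszemraj/long-t5-tglobal-base-16384-book-summary',
)

# Placeholder: paste the text to summarize here
long_text = """
Text_GO_here
"""

# The pipeline returns a list with one dict per input
result = summarizer(long_text)
print(result[0]['summary_text'])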

The funny thing about the output is that “Wuthering Heights” is a book title, yet the model treats it as a person :grin:

In this chapter, Wuthering Heights explains the state of affairs in China. 
He compares Shanghai to Beijing and notes that it is still very different from the two big cities in the world. 
Shanghai is more culture-oriented than the rest of the country, but also more business-oriented.

Thanks for reading and feedback welcome!


UPDATE: I’ve managed to put the documents in a vector store, which enables semantic search over them (example query: “apples are fruits”).
Now I can retrieve the top four paragraphs and do question answering over them.
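
To make the retrieval step concrete, here is a rough sketch that uses only the Python standard library. The bag-of-words `embed` function is a toy stand-in for a real embedding model, the helpers `embed`, `cosine` and `top_k` are hypothetical names of my own (not from any library), and the paragraphs are made-up sample data.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, paragraphs, k=4):
    """Return the k paragraphs most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(p)), p) for p in paragraphs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


# Made-up sample paragraphs, for illustration only
paragraphs = [
    "Apples are fruits that grow on trees.",
    "Shanghai is a large city in China.",
    "Bananas and apples are popular fruits.",
    "Faiss is a library for similarity search.",
    "Fruits such as apples contain fibre.",
    "Elasticsearch is a search engine.",
]

context = top_k("apples are fruits", paragraphs, k=4)
# The retrieved paragraphs would then be handed to a QA model as context.
print("\n\n".join(context))
```

With a real embedding model the ranking would reflect meaning rather than shared words, but the flow stays the same: embed, rank, keep the top four, then answer from those.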

For the vector store I started with Faiss since it was simple to get going, though I may move to something more long-term like Elasticsearch.