Automatic abstractive summarization for news articles.
- Download the repository.
- Install the project:
> cd /path/to/abstractive-summarization
> ./setup [path/to/target/directory]
- The project will be moved to the target directory. If no target directory is specified, the project is installed in the current working directory.
- Stanford JARs will be downloaded into the lib directory.
- Run demo:
> ./demo [arguments]
- If no arguments are specified, an excerpt from Tolstoy's biography (./resources/article-tolstoy.txt) will be summarized for the demo.
- If any arguments are specified for demo, the default text file will be ignored.
- Arguments:
-h or --help .......... Display help
-f or --file [filename] Path to the file containing the text to be summarized
-m .................... Write metadata to a file
-s .................... Write the summary to a file
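The argument handling described above can be sketched with Python's argparse. This is a hypothetical illustration, not the project's demo script: the names build_parser, parse_demo_args, write_meta and write_summary are assumptions. It follows the rules stated above: with no arguments at all the Tolstoy excerpt is used, and with any arguments the default file is ignored.

```python
import argparse

# Default input named in this README.
DEFAULT_FILE = "./resources/article-tolstoy.txt"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented demo flags.

    argparse adds -h/--help on its own, so only -f/--file, -m and -s are declared.
    """
    parser = argparse.ArgumentParser(
        prog="demo", description="Abstractive summarization for news articles."
    )
    parser.add_argument("-f", "--file", metavar="filename",
                        help="path to the file containing the text to be summarized")
    parser.add_argument("-m", dest="write_meta", action="store_true",
                        help="write metadata to a file")
    parser.add_argument("-s", dest="write_summary", action="store_true",
                        help="write the summary to a file")
    return parser


def parse_demo_args(argv):
    """Parse demo arguments (hypothetical helper).

    Only the no-argument case falls back to the default file; if any arguments
    are given, the default is ignored and file stays None unless -f is passed.
    """
    args = build_parser().parse_args(argv)
    if not argv:
        args.file = DEFAULT_FILE
    return args
```

For example, parse_demo_args(["-f", "news.txt", "-m"]) selects news.txt and requests the metadata file, while parse_demo_args([]) selects the Tolstoy excerpt.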
- The program reads in the input file.
- Extracts important semantic information and writes it to a file:
  - Extracts semantic triples.
    Example: for "Bob likes puppies more than cats.", the system extracts multiple triples:
    [Bob | likes | puppies]
    [Bob | likes | puppies more than cats]
  - Extracts named entities.
    Example:
    Bob -> Person
- Removes semantic information with low confidence scores.
- Removes other problematic extracted triples based on a series of rules.
- Removes sentences that were not assigned triples or had all of their triples removed.
- Generates new sentences from the remaining information.
- Adds back the time-related named entity information.
- Performs formatting.
- Displays the summary.
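The confidence filtering, sentence removal and sentence generation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's implementation: the Triple class, the 0.5 confidence threshold, the choice to realize only the highest-confidence triple of each sentence, and the naive "subject relation object." template are all hypothetical. The rule-based filtering and the time-entity step are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str
    confidence: float  # hypothetical extractor score in [0, 1]

    def __str__(self) -> str:
        # Same bracketed notation as the example above: [Bob | likes | puppies]
        return f"[{self.subject} | {self.relation} | {self.obj}]"


def filter_triples(triples, threshold=0.5):
    """Drop triples whose confidence is below the (assumed) threshold."""
    return [t for t in triples if t.confidence >= threshold]


def keep_sentences_with_triples(sentence_triples, threshold=0.5):
    """Keep only sentences that still have at least one triple after filtering."""
    kept = {}
    for sentence, triples in sentence_triples.items():
        survivors = filter_triples(triples, threshold)
        if survivors:
            kept[sentence] = survivors
    return kept


def generate_sentence(triple):
    """Naive realization: join subject, relation and object into one sentence."""
    text = f"{triple.subject} {triple.relation} {triple.obj}"
    return text[0].upper() + text[1:] + "."


def summarize(sentence_triples, threshold=0.5):
    kept = keep_sentences_with_triples(sentence_triples, threshold)
    # Hypothetical choice: one generated sentence per kept source sentence,
    # built from its highest-confidence triple, in original sentence order.
    return " ".join(
        generate_sentence(max(triples, key=lambda t: t.confidence))
        for triples in kept.values()
    )


# Hypothetical input: sentences mapped to extracted triples with made-up scores.
EXAMPLE = {
    "Bob likes puppies more than cats.": [
        Triple("Bob", "likes", "puppies", 0.9),
        Triple("Bob", "likes", "puppies more than cats", 0.7),
    ],
    "The weather was mild.": [Triple("weather", "was", "mild", 0.3)],
}
```

With these made-up scores, summarize(EXAMPLE) drops the weather sentence (its only triple falls below 0.5) and returns "Bob likes puppies.".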
- This program still needs work, but it does produce a summary of a body of text.
- Sample text files are provided in the resources directory.
- By default, the summarized text is sent to standard output.
- The metadata (i.e., named entity information and triples) is written to the file originalfilename-meta.txt, where originalfilename is the name of the input file.
- The summary is written to the file originalfilename-summary.txt.
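The output behavior described above can be sketched in Python with pathlib. This is a hypothetical illustration: output_paths and emit are made-up names, and it assumes that "originalfilename" means the input file name without its .txt extension and that both files are written next to the input file.

```python
from pathlib import Path


def output_paths(input_file):
    """Derive metadata and summary paths from the input file name (assumed scheme)."""
    path = Path(input_file)
    stem = path.stem  # "article-tolstoy" for "article-tolstoy.txt"
    return path.with_name(f"{stem}-meta.txt"), path.with_name(f"{stem}-summary.txt")


def emit(summary, metadata, input_file, write_meta=False, write_summary=False):
    """Print the summary and optionally write the metadata and summary files."""
    print(summary)  # default: the summary goes to standard output
    meta_path, summary_path = output_paths(input_file)
    if write_meta:
        meta_path.write_text(metadata, encoding="utf-8")
    if write_summary:
        summary_path.write_text(summary + "\n", encoding="utf-8")
```

Under these assumptions, resources/article-tolstoy.txt yields resources/article-tolstoy-meta.txt and resources/article-tolstoy-summary.txt.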