Keyword extraction and headline generation using novel word features

Songhua Xu, Shaohui Yang, Francis C.M. Lau

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

28 Citations (Scopus)

Abstract

We introduce several novel word features for keyword extraction and headline generation. These new word features are derived according to the background knowledge of a document as supplied by Wikipedia. Given a document, to acquire its background knowledge from Wikipedia, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the query to find articles in the Wikipedia corpus that are closely related to the contents of the document. With the Wikipedia search result article set, we extract the inlink, outlink, category and infobox information in each article to derive a set of novel word features which reflect the document's background knowledge. These newly introduced word features offer valuable indications on individual words' importance in the input document. They serve as nice complements to the traditional word features derivable from explicit information of a document. In addition, we also introduce a word-document fitness feature to characterize the influence of a document's genre on the keyword extraction and headline generation process. We study the effectiveness of these novel word features for keyword extraction and headline generation by experiments and have obtained very encouraging results.
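The abstract does not state how the Wikipedia-derived features are computed, so the following is only a toy sketch of the general idea: given a set of related articles, count how often each document word appears in their inlink, outlink, category and infobox text. The in-memory `ARTICLES` list, the field names, the naive tokenizer and the count-based scoring are all assumptions made for illustration; they are not the paper's method and not a real Wikipedia API.

```python
from collections import Counter

# Hypothetical in-memory stand-in for a Wikipedia search result set.
# The field names are illustrative assumptions, not the paper's data model.
ARTICLES = [
    {
        "title": "Solar power",
        "inlinks": ["Renewable energy", "Photovoltaics"],
        "outlinks": ["Solar cell", "Energy storage"],
        "categories": ["Solar energy", "Renewable energy"],
        "infobox": {"type": "energy source"},
    },
    {
        "title": "Wind power",
        "inlinks": ["Renewable energy"],
        "outlinks": ["Wind turbine", "Energy storage"],
        "categories": ["Wind power", "Renewable energy"],
        "infobox": {"type": "energy source"},
    },
]

FIELDS = ("inlinks", "outlinks", "categories", "infobox")


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def field_counts(articles, field):
    """Count how often each word occurs in one metadata field across articles."""
    counts = Counter()
    for article in articles:
        value = article[field]
        # The infobox is a dict of attribute -> value; other fields are lists of titles.
        strings = value.values() if isinstance(value, dict) else value
        for s in strings:
            counts.update(tokenize(s))
    return counts


def background_features(document, articles):
    """Map each distinct document word to a per-field count vector."""
    per_field = {f: field_counts(articles, f) for f in FIELDS}
    return {
        word: {f: per_field[f][word] for f in FIELDS}
        for word in set(tokenize(document))
    }


features = background_features("solar energy storage grows", ARTICLES)
for word in sorted(features):
    print(word, features[word])
```

In this sketch a word such as "energy" scores in all four fields, while "grows" scores zero everywhere. Per-field counts like these would sit alongside features computed from the document text itself, in the spirit of the abstract's description of the new features as complements to traditional ones.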

Original language: English (US)
Title of host publication: AAAI-10 / IAAI-10 - Proceedings of the 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference
Pages: 1461-1466
Number of pages: 6
State: Published - Nov 1 2010
Event: 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference, AAAI-10 / IAAI-10 - Atlanta, GA, United States
Duration: Jul 11 2010 → Jul 15 2010

Publication series

Name: Proceedings of the National Conference on Artificial Intelligence
Volume: 3

Other

Other: 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference, AAAI-10 / IAAI-10
Country: United States
City: Atlanta, GA
Period: 7/11/10 → 7/15/10

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence

Cite this

Xu, S., Yang, S., & Lau, F. C. M. (2010). Keyword extraction and headline generation using novel word features. In AAAI-10 / IAAI-10 - Proceedings of the 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference (pp. 1461-1466). (Proceedings of the National Conference on Artificial Intelligence; Vol. 3).
Xu, Songhua ; Yang, Shaohui ; Lau, Francis C.M. / Keyword extraction and headline generation using novel word features. AAAI-10 / IAAI-10 - Proceedings of the 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference. 2010. pp. 1461-1466 (Proceedings of the National Conference on Artificial Intelligence).
@inproceedings{0328be6512264f9fa0169c676bfb807d,
title = "Keyword extraction and headline generation using novel word features",
abstract = "We introduce several novel word features for keyword extraction and headline generation. These new word features are derived according to the background knowledge of a document as supplied by Wikipedia. Given a document, to acquire its background knowledge from Wikipedia, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the query to find articles in the Wikipedia corpus that are closely related to the contents of the document. With the Wikipedia search result article set, we extract the inlink, outlink, category and infobox information in each article to derive a set of novel word features which reflect the document's background knowledge. These newly introduced word features offer valuable indications on individual words' importance in the input document. They serve as nice complements to the traditional word features derivable from explicit information of a document. In addition, we also introduce a word-document fitness feature to characterize the influence of a document's genre on the keyword extraction and headline generation process. We study the effectiveness of these novel word features for keyword extraction and headline generation by experiments and have obtained very encouraging results.",
author = "Songhua Xu and Shaohui Yang and Lau, {Francis C.M.}",
year = "2010",
month = "11",
day = "1",
language = "English (US)",
isbn = "9781577354666",
series = "Proceedings of the National Conference on Artificial Intelligence",
pages = "1461--1466",
booktitle = "AAAI-10 / IAAI-10 - Proceedings of the 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference",

}

Xu, S, Yang, S & Lau, FCM 2010, Keyword extraction and headline generation using novel word features. in AAAI-10 / IAAI-10 - Proceedings of the 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference. Proceedings of the National Conference on Artificial Intelligence, vol. 3, pp. 1461-1466, 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference, AAAI-10 / IAAI-10, Atlanta, GA, United States, 7/11/10.

Keyword extraction and headline generation using novel word features. / Xu, Songhua; Yang, Shaohui; Lau, Francis C.M.

AAAI-10 / IAAI-10 - Proceedings of the 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference. 2010. p. 1461-1466 (Proceedings of the National Conference on Artificial Intelligence; Vol. 3).

TY - GEN

T1 - Keyword extraction and headline generation using novel word features

AU - Xu, Songhua

AU - Yang, Shaohui

AU - Lau, Francis C.M.

PY - 2010/11/1

Y1 - 2010/11/1

N2 - We introduce several novel word features for keyword extraction and headline generation. These new word features are derived according to the background knowledge of a document as supplied by Wikipedia. Given a document, to acquire its background knowledge from Wikipedia, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the query to find articles in the Wikipedia corpus that are closely related to the contents of the document. With the Wikipedia search result article set, we extract the inlink, outlink, category and infobox information in each article to derive a set of novel word features which reflect the document's background knowledge. These newly introduced word features offer valuable indications on individual words' importance in the input document. They serve as nice complements to the traditional word features derivable from explicit information of a document. In addition, we also introduce a word-document fitness feature to characterize the influence of a document's genre on the keyword extraction and headline generation process. We study the effectiveness of these novel word features for keyword extraction and headline generation by experiments and have obtained very encouraging results.

UR - http://www.scopus.com/inward/record.url?scp=77958586107&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77958586107&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781577354666

T3 - Proceedings of the National Conference on Artificial Intelligence

SP - 1461

EP - 1466

BT - AAAI-10 / IAAI-10 - Proceedings of the 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference

ER -

Xu S, Yang S, Lau FCM. Keyword extraction and headline generation using novel word features. In AAAI-10 / IAAI-10 - Proceedings of the 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference. 2010. p. 1461-1466. (Proceedings of the National Conference on Artificial Intelligence).