On the synthesis of metadata tags for HTML files. (3rd September 2020)
- Record Type:
- Journal Article
- Title:
- On the synthesis of metadata tags for HTML files. (3rd September 2020)
- Main Title:
- On the synthesis of metadata tags for HTML files
- Authors:
- Jiménez, Patricia
Roldán, Juan C.
Gallego, Fernando O.
Corchuelo, Rafael
- Abstract:
- Summary: RDFa, JSON‐LD, Microdata, and Microformats make it possible to endow the data in HTML files with metadata tags that help software agents understand them. Unfortunately, many HTML files do not have any metadata tags, which has motivated many authors to work on proposals to synthesize them. These proposals have some problems: the authors either provide an overall picture of their designs without many details on the underlying techniques, or focus on the techniques but do not describe the design of the software systems that support them; many of them cannot deal with data that are encoded using semistructured formats like forms, listings, or tables; and the few proposals that can work on tables can deal with horizontal listings only. In this article, we describe the design of a system that overcomes the previous limitations using a novel embedding approach that has proven to outperform four state‐of‐the‐art techniques on a repository with randomly selected HTML files from 40 different sites. According to our experimental analysis, our proposal can achieve an F1 score that outperforms the others by 10.14%; this difference was confirmed to be statistically significant at the standard confidence level.
- Is Part Of:
- Software, practice & experience. Volume 50:Number 12(2020)
- Journal:
- Software, practice & experience
- Issue:
- Volume 50:Number 12(2020)
- Issue Display:
- Volume 50, Issue 12 (2020)
- Year:
- 2020
- Volume:
- 50
- Issue:
- 12
- Issue Sort Value:
- 2020-0050-0012-0000
- Page Start:
- 2169
- Page End:
- 2192
- Publication Date:
- 2020-09-03
- Subjects:
- embedding techniques -- HTML files -- metadata tags
Computer software -- Periodicals
Computer programming -- Periodicals
Computer programs -- Periodicals
005.3
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/spe.2886
- Languages:
- English
- ISSNs:
- 0038-0644
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8321.453000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 14702.xml