Bounded seas. (December 2015)
- Record Type:
- Journal Article
- Title:
- Bounded seas. (December 2015)
- Main Title:
- Bounded seas
- Authors:
- Kurš, Jan
Lungu, Mircea
Iyadurai, Rathesan
Nierstrasz, Oscar
- Abstract:
- Imprecise manipulation of source code (semi-parsing) is useful for tasks such as robust parsing, error recovery, lexical analysis, and rapid development of parsers for data extraction. An island grammar precisely defines only a subset of a language syntax (islands), while the rest of the syntax (water) is defined imprecisely. Usually water is defined as the negation of islands. Albeit simple, such a definition of water is naïve and impedes composition of islands. When developing an island grammar, sooner or later a language engineer has to create water tailored to each individual island. Such an approach is fragile, because water can change with any change of a grammar. It is time-consuming, because water is defined manually by an engineer and not automatically. Finally, an island surrounded by water cannot be reused because water has to be defined for every grammar individually. In this paper we propose a new technique of island parsing: bounded seas. Bounded seas are composable, robust, reusable and easy to use because island-specific water is created automatically. Our work focuses on applications of island parsing to data extraction from source code. We have integrated bounded seas into a parser combinator framework as a demonstration of their composability and reusability.
Highlights: Traditional island grammars are difficult to define and are not flexible enough. Bounded seas, a new technique of island parsing, are composable, robust, reusable and easy to define. Bounded seas are specified using our extension of parsing expression grammars. Parsers utilizing bounded seas require less effort to define and provide both good precision and performance in the two performed case studies.
- Is Part Of:
- Computer languages, systems & structures. Volume 44:Part A(2015)
- Journal:
- Computer languages, systems & structures
- Issue:
- Volume 44:Part A(2015)
- Issue Display:
- Volume 44, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 44
- Issue:
- 2015
- Issue Sort Value:
- 2015-0044-2015-0000
- Page Start:
- 114
- Page End:
- 140
- Publication Date:
- 2015-12
- Subjects:
- Semi-parsing -- Island parsing -- Parsing expression grammars
Programming languages (Electronic computers) -- Periodicals
Computer networks -- Periodicals
Computer architecture -- Periodicals
Computer systems -- Periodicals
Langage de programmation
Réseau d'ordinateurs
Architecture d'ordinateur
Périodique électronique (Descripteur de forme)
Ressource Internet (Descripteur de forme)
005.13
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14778424/40
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cl.2015.08.002
- Languages:
- English
- ISSNs:
- 1477-8424
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.071000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 10091.xml