Uniform probabilistic generation of relation instances satisfying a functional dependency. Issue 103 (January 2022)
- Record Type:
- Journal Article
- Title:
- Uniform probabilistic generation of relation instances satisfying a functional dependency. Issue 103 (January 2022)
- Main Title:
- Uniform probabilistic generation of relation instances satisfying a functional dependency
- Authors:
- Berens, Maximilian
Biskup, Joachim
Preuß, Marcel - Abstract:
- Abstract: Software engineering includes the runtime evaluation of a prototype by experiments with carefully selected sample inputs. For the development of software intended to operate on relation instances for a given relational schema with a single functional dependency, we are then challenged to generate appropriate sample instances of increasing size. Moreover, studying the impact of varying the sizes of the attribute domains might be important. We focus on seeing uniformly distributed collections of sample instances to be appropriate. If the schema is normalized (in Boyce–Codd normal form with the left-hand side of the functional dependency forming a unique key), then a straightforward heuristic insertion procedure complies with our requirements. However, for a non-normalized schema (the left-hand side not forming a key), based on a complex combinatorial analysis and exploiting an algorithm for restricted integer partitions, we develop a sophisticated probabilistic procedure of high computational complexity for generating any element of such a collection with equal probability. Moreover, we demonstrate that simpler approaches based on uniform local selections, including the heuristic insertion procedure, fail to achieve global uniformity. The conceptual design and analysis are complemented by an experimental evaluation of a prototype implementation. Highlights: Schema normalization affects the complexity of uniform random instance generation. Naive approaches fail toAbstract: Software engineering includes the runtime evaluation of a prototype by experiments with carefully selected sample inputs. For the development of software intended to operate on relation instances for a given relational schema with a single functional dependency, we are then challenged to generate appropriate sample instances of increasing size. Moreover, studying the impact of varying the sizes of the attribute domains might be important. We focus on seeing uniformly distributed collections of sample instances to be appropriate. If the schema is normalized (in Boyce–Codd normal form with the left-hand side of the functional dependency forming a unique key), then a straightforward heuristic insertion procedure complies with our requirements. However, for a non-normalized schema (the left-hand side not forming a key), based on a complex combinatorial analysis and exploiting an algorithm for restricted integer partitions, we develop a sophisticated probabilistic procedure of high computational complexity for generating any element of such a collection with equal probability. Moreover, we demonstrate that simpler approaches based on uniform local selections, including the heuristic insertion procedure, fail to achieve global uniformity. The conceptual design and analysis are complemented by an experimental evaluation of a prototype implementation. Highlights: Schema normalization affects the complexity of uniform random instance generation. Naive approaches fail to achieve uniform distribution for non-normalized schemas. Duplicate-freeness and dependency satisfaction might strongly interfere. Achieving uniformity has to observe number-theoretic issues like integer partitions. Cases of exponential runtimes require substantial optimization and approximation. … (more)
- Is Part Of:
- Information systems. Issue 103(2022)
- Journal:
- Information systems
- Issue:
- Issue 103(2022)
- Issue Display:
- Volume 103, Issue 103 (2022)
- Year:
- 2022
- Volume:
- 103
- Issue:
- 103
- Issue Sort Value:
- 2022-0103-0103-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-01
- Subjects:
- Boyce–Codd normal form -- Combinatorial analysis -- Domain cardinality -- Duplicate-freeness -- Functional dependency -- Probabilistic instance generation
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7 - Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379 ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.is.2021.101848 ↗
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 19213.xml