A branch and bound irredundant graph algorithm for large-scale MLCS problems. (November 2021)
- Record Type:
- Journal Article
- Title:
- A branch and bound irredundant graph algorithm for large-scale MLCS problems. (November 2021)
- Main Title:
- A branch and bound irredundant graph algorithm for large-scale MLCS problems
- Authors:
- Wang, Chunyang
Wang, Yuping
Cheung, Yiuming - Abstract:
- Highlights: Design a branch and bound strategy for identifying non-contributed points and non-longest paths. Construct a much smaller DAG than those constructed by the existing algorithms. Design a strategy for deleting points in the Hash table timely. Propose a new data structure for storing Small-DAG to avoid topological sorting. Propose a new algorithm for larger-scale MLCS problems with lower time and space cost. Abstract: Finding the multiple longest common subsequences (MLCS) among many long sequences (i.e., the large scale MLCS problem) has many important applications, such as gene alignment, disease diagnosis, and documents similarity check, etc. It is an NP-hard problem (Maier et al., 1978). The key bottle neck of this problem is that the existing state-of-the-art algorithms must construct a huge graph (called direct acyclic graph, briefly DAG), and the computer usually has no enough space to store and handle this graph. Thus the existing algorithms cannot solve the large scale MLCS problem. In order to quickly solve the large-scale MLCS problem within limited computer resources, this paper therefore proposes a branch and bound irredundant graph algorithm called Big-MLCS, which constructs a much smaller DAG (called Small-DAG) than the existing algorithms do by a branch and bound method, and designs a new data structure to efficiently store and handle Small-DAG. By these schemes, Big-MLCS is more efficient than the existing algorithms. Also, we compare the proposedHighlights: Design a branch and bound strategy for identifying non-contributed points and non-longest paths. Construct a much smaller DAG than those constructed by the existing algorithms. Design a strategy for deleting points in the Hash table timely. Propose a new data structure for storing Small-DAG to avoid topological sorting. Propose a new algorithm for larger-scale MLCS problems with lower time and space cost. Abstract: Finding the multiple longest common subsequences (MLCS) among many long sequences (i.e., the large scale MLCS problem) has many important applications, such as gene alignment, disease diagnosis, and documents similarity check, etc. It is an NP-hard problem (Maier et al., 1978). The key bottle neck of this problem is that the existing state-of-the-art algorithms must construct a huge graph (called direct acyclic graph, briefly DAG), and the computer usually has no enough space to store and handle this graph. Thus the existing algorithms cannot solve the large scale MLCS problem. In order to quickly solve the large-scale MLCS problem within limited computer resources, this paper therefore proposes a branch and bound irredundant graph algorithm called Big-MLCS, which constructs a much smaller DAG (called Small-DAG) than the existing algorithms do by a branch and bound method, and designs a new data structure to efficiently store and handle Small-DAG. By these schemes, Big-MLCS is more efficient than the existing algorithms. Also, we compare the proposed algorithm with two state-of-the-art algorithms through the experiments, and the results show that the proposed algorithm outperforms the compared algorithms and is more suitable to large-scale MLCS problems. … (more)
- Is Part Of:
- Pattern recognition. Volume 119(2021)
- Journal:
- Pattern recognition
- Issue:
- Volume 119(2021)
- Issue Display:
- Volume 119, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 119
- Issue:
- 2021
- Issue Sort Value:
- 2021-0119-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-11
- Subjects:
- Multiple longest common subsequences -- Small DAG -- Branch and bound -- Gene alignment
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4 - Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203 ↗
http://www.sciencedirect.com/ ↗ - DOI:
- 10.1016/j.patcog.2021.108059 ↗
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 17786.xml