Literature Report

CSC 5593 - Practical Construction of Suffix Tree

Posted by Nivin Anton Alexis Lawrence on January 26, 2017

Recently by the same author:


Introduction to Functional Programming [Part 2]

Think Functional


Nivin Anton Alexis Lawrence

Human

CSC5593

Literature Report

The paper is about the practical construction of suffix tree that focuses on disk-based solution. The researchers talk about already existing methods for suffix tree construction in O(n). But, the problem with O(n) Ukkenons algorithm is that, suffers poor locality of reference. The random I/O becomes a bottleneck when suffix tree construction exceeds main memory. So, to overcome the above limitation the author suggested O(n^2) Top Down Disk Based algorithm that outperforms already existing method. The main reason for poor locality is because of suffix links that randomly traverses from one subtree to other costing random disk I/O.

In this method, they don’t use suffix link for the construction, instead, it uses WOTD algorithm. The algorithm consists of two phases - partitioning & construction of suffix subtree. In simple terms, the algorithm partitions the input suffix string into n partition. N denotes the character set in the input string. In the second phase, it uses the WOTD algorithm for construction of sub-tree, which is later merged to the primary suffix tree. The random I/O cost is reduced by cleverly allocating memory and using the right buffer replacement policy. The WOTD algorithm uses 4 data-structure - String, Suffixes, Temp and Tree Buffer. After a thorough study of memory used and access pattern, they allocate more buffer to String and the least buffer to the Tree. Main reason for String to be allocated the huge chunk is because of random access pattern.

At last, it’s compared with existing disk-based algorithm and its studied that it outperforms the existing techniques like TOP-Q, Ukkenons, and Hunt. The algorithm performs even better for the large character set. So to conclude, understanding the memory access and addressing the locality of reference, an algorithm with O(n^2) can outperform the theoretical O(n) algorithm.

Reference:

Tata, S., Hankins, R.A., Patel, J.M.: Practical suffix tree construction. In: Proceedings of 30th International Conference on Very Large Data Bases, pp. 36–47 (2004)