The candidate's post-doctoral project deals with algorithms for computing the Burrows-Wheeler transform (BWT) and investigates string similarity measures based on the BWT. Recently, Egidi, Louza, Manzini and Telles introduced an external memory algorithm that computes the BWT together with the longest common prefix (LCP) array for a string collection, and showed how it can also output the document array (DA). The authors also suggested a simple, scan-based external memory algorithm that uses the BWT and DA to construct de Bruijn graphs in a succinct representation. The objective of this internship project is to implement the computation of the DA together with the BWT, and to present a practical external memory algorithm for computing succinct de Bruijn graphs for large string collections.
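To make the two central objects concrete, the following is a minimal in-memory sketch of the BWT and the document array for a small string collection. It is not the external memory algorithm of Egidi, Louza, Manzini and Telles: it simply sorts all suffixes naively, which is only feasible for tiny inputs. The function name `bwt_and_da` and the convention that each string ends with its own terminator, printed as `$` and ordered by document index, are assumptions made for illustration.

```python
def bwt_and_da(docs):
    """Naive BWT and document array (DA) for a list of strings.

    Assumption: every string is implicitly followed by its own terminator,
    which sorts before all ordinary characters; terminators of different
    strings are ordered by document index. All terminators print as '$'.
    """
    def key(doc, pos):
        # Encode symbols as tuples: (1, code) for ordinary characters and
        # (0, doc) for the terminator, so terminators always sort first.
        return [(1, ord(c)) for c in docs[doc][pos:]] + [(0, doc)]

    # One suffix per position of each string, including the terminator.
    suffixes = [(d, p) for d, s in enumerate(docs) for p in range(len(s) + 1)]
    suffixes.sort(key=lambda dp: key(*dp))

    # BWT[i] is the symbol preceding the i-th smallest suffix in its string;
    # for a suffix starting at position 0 we emit the terminator.
    bwt = "".join(docs[d][p - 1] if p > 0 else "$" for d, p in suffixes)
    # DA[i] is the index of the string that the i-th smallest suffix comes from.
    da = [d for d, _ in suffixes]
    return bwt, da
```

For example, under these assumptions `bwt_and_da(["ab", "b"])` returns the BWT `"bb$a$"` and the document array `[0, 1, 0, 0, 1]`. A scan-based construction of the kind described above would read these two arrays sequentially from disk rather than building the suffix list in memory.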