FM-index

In com­put­er sci­ence, an FM-index is a com­pressed full-text sub­string index based on the Bur­rows-Wheel­er trans­form, with some sim­i­lar­i­ties to the suf­fix array. It was cre­at­ed by Paolo Fer­rag­i­na and Gio­van­ni Manzini, who describe it as an oppor­tunis­tic data struc­ture as it allows com­pres­sion of the input text while still per­mit­ting fast sub­string queries. The name stands for Full-text index in Min­ute space.

It can be used to effi­cient­ly find the num­ber of occur­rences of a pat­tern with­in the com­pressed text, as well as locate the posi­tion of each occur­rence. Both the query time and stor­age space require­ments are sublinear[clarification need­ed] with respect to the size of the input data.

The orig­i­nal authors have devised improve­ments to their orig­i­nal approach and dubbed it “FM-Index ver­sion 2″. A fur­ther improve­ment, the alpha­bet-friend­ly FM-index, com­bi­nes the use of com­pres­sion boost­ing and wavelet trees to sig­nif­i­cant­ly reduce the space usage for large alpha­bets.

The FM-index has found use in, among oth­er places, bioin­for­mat­ics.

Leave a Reply

Your email address will not be published. Required fields are marked *