Mnesia performance basics

I thought I’d continue the whirlwind guide to Mnesia that I began in Getting started with Mnesia. I have no idea how many of the thousands of readers actually found the last part useful. At least one, judging by the comments, but more to the point it’s rather handy for me to write this stuff down as I’m flitting between various new (to me) things at the moment.

Let’s start with indexing. Last time we defined a table and queried it as follows:

-record(animal, {name,legs,noise}).
 
mnesia:create_table(animal,
                         [{disc_copies, [node()]},
                          {attributes, record_info(fields,animal)}]),
 
Fun = fun() ->
  mnesia:match_object({animal,'_',4,'_'})
  end,
{atomic,Results}=mnesia:transaction(Fun).

Our query returns all animals with 4 legs, but if we’ve got any sizable number of records in that table it’s going to run like a dog, because every record has to be scanned. The solution is simple:

mnesia:add_table_index(animal,legs).

The legs field is now indexed, and the match_object function will automatically pick up on that and give us more sensible performance that doesn’t involve full table scanning. Remember, as I said last time, that the first field is always considered to be the key, and is always indexed, so you only need to explicitly create an index for additional fields you intend to match against.
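To make that concrete, here’s a minimal sketch rather than anything definitive. It assumes a scratch node, so it uses ram_copies instead of the disc_copies table above (no on-disc schema needed). The module name animal_index_demo and the sample animals are made up for illustration. It adds the index with mnesia:add_table_index/2 and then reads through it explicitly with mnesia:index_read/3.

```erlang
-module(animal_index_demo).
-export([run/0]).

-record(animal, {name, legs, noise}).

%% Assumption: the animal table does not exist yet on this node.
run() ->
    ok = mnesia:start(),
    %% ram_copies keeps the sketch free of any on-disc schema setup.
    {atomic, ok} = mnesia:create_table(animal,
        [{ram_copies, [node()]},
         {attributes, record_info(fields, animal)}]),
    %% Secondary index on the legs attribute.
    {atomic, ok} = mnesia:add_table_index(animal, legs),
    {atomic, ok} = mnesia:transaction(fun() ->
        mnesia:write(#animal{name = dog, legs = 4, noise = woof}),
        mnesia:write(#animal{name = hen, legs = 2, noise = cluck}),
        mnesia:write(#animal{name = cat, legs = 4, noise = meow})
    end),
    %% Look up every record whose legs field is 4 via the index.
    {atomic, Found} = mnesia:transaction(fun() ->
        mnesia:index_read(animal, 4, #animal.legs)
    end),
    %% Drop the table again so the sketch can be re-run.
    {atomic, ok} = mnesia:delete_table(animal),
    lists:sort([A#animal.name || A <- Found]).
```

Calling animal_index_demo:run() returns the sorted names of the four-legged animals. A match_object call with the legs field bound would go through the same index; index_read just makes the lookup explicit.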

Another way of speeding things up is using ‘dirty’ access functions. These bypass the transaction layer altogether, losing the Atomicity and Isolation from Mnesia’s ACID properties, but gaining a major reduction in overhead. Most of the normal access functions have a dirty equivalent – for example, dirty_match_object is the direct equivalent of the match_object function we’re using above. Functions specific to dirty operations are also available, including the wonderfully specific dirty_update_counter.
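As a rough illustration of the dirty calls, here’s a small sketch under a few assumptions: a scratch node, ram_copies tables, and a visits counter table that I’ve invented purely to give dirty_update_counter something to work on. The module name animal_dirty_demo is likewise hypothetical. Note that none of these calls is wrapped in mnesia:transaction.

```erlang
-module(animal_dirty_demo).
-export([run/0]).

-record(animal, {name, legs, noise}).
%% Hypothetical counter table: a key plus one integer field,
%% which is the shape dirty_update_counter expects.
-record(visits, {page, count}).

%% Assumption: neither table exists yet on this node.
run() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(animal,
        [{ram_copies, [node()]},
         {attributes, record_info(fields, animal)}]),
    {atomic, ok} = mnesia:create_table(visits,
        [{ram_copies, [node()]},
         {attributes, record_info(fields, visits)}]),
    %% Dirty writes: no transaction, no locks.
    ok = mnesia:dirty_write(#animal{name = dog, legs = 4, noise = woof}),
    ok = mnesia:dirty_write(#animal{name = hen, legs = 2, noise = cluck}),
    %% Dirty equivalent of match_object; returns the list directly.
    FourLegged = mnesia:dirty_match_object(#animal{legs = 4, _ = '_'}),
    %% Creates the counter on first use, then increments it.
    _ = mnesia:dirty_update_counter(visits, home, 1),
    Count = mnesia:dirty_update_counter(visits, home, 2),
    %% Drop both tables so the sketch can be re-run.
    {atomic, ok} = mnesia:delete_table(animal),
    {atomic, ok} = mnesia:delete_table(visits),
    {FourLegged, Count}.
```

The dirty functions return their results bare rather than wrapped in {atomic, ...}, and they exit on failure instead of returning {aborted, Reason}, so calling code needs to be written with that in mind.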

Using replicated copies of a table across multiple nodes, as well as providing redundancy, allows parallel access to the data. I can see a case for using this to improve performance by distributing the workload across multiple nodes, although in this case it needs to be a table which is read often and written occasionally, since there is a transactional penalty to writing to replicated tables.

Table fragmentation (i.e. the ability to distribute a table across multiple nodes) has performance possibilities too, since with different parts of the table on different nodes, operations can be conducted in parallel between them. I haven’t got fragmentation up and running yet though, so I don’t know how well this will work in practice.
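For reference, this is roughly what a fragmented table looks like going by the mnesia_frag interface in the Mnesia documentation. Treat it as a sketch under stated assumptions, not a tested setup: the fragment count of 4 is arbitrary, the module name animal_frag_demo is made up, and the node pool contains only the local node, so it shows the mechanics but none of the parallelism.

```erlang
-module(animal_frag_demo).
-export([run/0]).

-record(animal, {name, legs, noise}).

%% Assumption: the animal table does not exist yet on this node.
run() ->
    ok = mnesia:start(),
    %% Four fragments, all placed on the local node (arbitrary choices).
    {atomic, ok} = mnesia:create_table(animal,
        [{attributes, record_info(fields, animal)},
         {frag_properties, [{n_fragments, 4},
                            {node_pool, [node()]},
                            {n_ram_copies, 1}]}]),
    %% Access has to go through mnesia:activity/4 with the mnesia_frag
    %% module so that keys are hashed to the right fragment.
    ok = mnesia:activity(transaction, fun() ->
        mnesia:write(#animal{name = dog, legs = 4, noise = woof}),
        mnesia:write(#animal{name = hen, legs = 2, noise = cluck})
    end, [], mnesia_frag),
    [Dog] = mnesia:activity(transaction, fun() ->
        mnesia:read(animal, dog)
    end, [], mnesia_frag),
    NFrags = mnesia:activity(async_dirty, fun() ->
        mnesia:table_info(animal, n_fragments)
    end, [], mnesia_frag),
    %% Deleting the base table removes all of its fragments.
    {atomic, ok} = mnesia:delete_table(animal),
    {Dog#animal.noise, NFrags}.
```

Unlike mnesia:transaction, mnesia:activity returns the fun’s result directly and exits if the transaction aborts. To spread the fragments out, the node_pool list would name the other nodes in the cluster.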

When you consider that in addition to the above, Mnesia already has a head start because it runs in the same address space as the code, with no mismatch between Mnesia’s data types and Erlang’s to slow things down, the potential for high performance looks very good indeed. I’ve already got an Mnesia/Erlang back-end up and running which seems to be outperforming the MySQL/Erlang equivalent, although my tests so far are rather unscientific. More to come on this when time allows.
