Here's what I've discovered about arena mechanics so far:

In the Scavenger Hunt, Obstacle Course, and Hide and Seek, your familiar's performance seems to be independent of your opponent's. (That is, your familiar doesn't get better or worse depending on the competition.) This may also be the case in the Ultimate Cage Match, in which case the displayed result would just reflect the difference between the two independently generated results, but I have no way to verify that. Note that ties go to you; the "20-round" cage match is probably also a tie, since it does not seem possible to lose a 20-round cage match.

Most familiars have one event in which they are absolutely terrible (the results text will mention it). A terrible familiar will generally always lose to an opponent that is not terrible in that event, regardless of level, and a match against another familiar that is terrible in that event is pretty much a tossup, again regardless of level. Familiars also have strengths and weaknesses in the other three events, though, and taking advantage of these is important to winning. It appears that each familiar level is worth about 1 point in the competition (see the table at the bottom for full data).

The following are the results of my testing. All testing was done with 25-pound familiars (using either empathy or the 5-lb. equipment), and all cage matches were against a 25 lb. star starfish. In the cage match results, "16L" indicates a loss in 16 rounds, "18W" a win in 18 rounds, and so forth. Results are based on 5 data points.

Experience: If you win by 5 or fewer seconds or items, you gain 5 experience, winning by 6 yields 4 experience, winning by 7 yields 3 experience, and winning by 8 or more yields 2 experience. Cage matches work the same way except they're measured as distance from 20 rounds (i.e. winning in 15 or more rounds gives 5 experience, 14 rounds is 4 experience, and so forth). If you're trying to maximize your experience, therefore, you should try to compete with a house familiar of similar or slightly higher level in a competition where they're weak and your familiar is strong; or, you can compete with a house familiar of lower level in a competition where they're strong and your familiar is weak. The best strategy will probably be determined by whatever house familiars are present on a given day.
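The thresholds above can be written down as a small function. This is only a sketch in Python of the rule as described, not anything taken from the game itself: the function names are made up for illustration, a tie is assumed to count as a win by 0 (since ties go to you), cage matches are assumed to last at most 20 rounds, and losses are not modeled because nothing here says what they pay.

```python
def experience_for_win(margin: int) -> int:
    """Experience for winning by `margin` seconds or items.

    Assumption: a tie counts as a win with margin 0.
    """
    if margin < 0:
        raise ValueError("margin must be non-negative; losses are not modeled")
    # Winning by 5 or fewer gives 5, by 6 gives 4, by 7 gives 3,
    # and by 8 or more gives 2.
    return max(2, min(5, 10 - margin))


def experience_for_cage_win(rounds: int) -> int:
    """Experience for winning a cage match in `rounds` rounds.

    Assumption: a cage match lasts between 1 and 20 rounds.
    """
    if not 1 <= rounds <= 20:
        raise ValueError("rounds must be between 1 and 20")
    # Cage matches are measured as distance from 20 rounds.
    return experience_for_win(20 - rounds)
```

For example, `experience_for_win(6)` gives 4 and `experience_for_cage_win(14)` also gives 4, matching the figures in the paragraph above.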

Here are two tables with all of the familiar information that I have. The first gives a general qualitative description of each familiar's performance; the second has the full data. Note that a familiar that is "terrible" at an event does much, much worse than any familiar that isn't; brackets in the second table mark an event in which the familiar is terrible.

The Rye also has a nice chart of familiar arena performance here. And if you want something even more convenient, Xylpher has created a very useful page which will give you advice on how best to level up your familiar here.

Familiar        Cage Match  Scavenger Hunt  Obstacle Course  Hide and Seek
mosquito        poor        poor            very good        terrible
leprechaun      very poor   very good       terrible         average
potato          terrible    poor            average          good
goat            very good   terrible        good             poor
lime            very good   terrible        average          poor
dice            good        good            average          average
skeleton        average     very good       very poor        terrible
barrrnacle      terrible    average         poor             very good
monkey          very poor   very good       average          terrible
stab bat        very good   average         very poor        terrible
grue            average     terrible        poor             very good
volleyball      terrible    poor            very good        good
ghuol           poor        average         terrible         very good
gravy fairy     terrible    very good       very poor        average
cocoabo         poor        very good       terrible         very poor
starfish        average     poor            good             terrible
sombrero        terrible    very good       average          very poor
pickle          very good   average         terrible         good
killer bee      good        poor            good             terrible
Jill-O-Lantern  very good   poor            average          very poor
hand turkey     very good   good            poor             poor
Crimbo elf      very poor   good            poor             very good
dreidl          good        poor            very good        very poor
baby yeti       very good   poor            poor             average
feather boa     poor        very good       very poor        poor
raincloud       average     poor            good             very poor
doppelshifter   very good   very good       very good        good

Here are the raw numbers:

Familiar        Cage Match  Scavenger Hunt  Obstacle Course  Hide and Seek
mosquito        16L-18W     32-35           21-23            [29-31]
leprechaun      16L-20W     38-40           [49-51]          54-56
potato          [2L]        31-35           24-26            57-59
goat            15W-19W     [9-11]          23-26            52-54
lime            15W-19W     [11-13]         24-26            51-54
dice            19W-20W     35-38           25-27            53-56
skeleton        18L-19W     38-41           28-30            [30-31]
barrrnacle      [2L]        35-36           26-29            58-59
monkey          15L-18L     37-41           24-25            [30-32]
stab bat        16W-18W     34-36           26-30            [29-31]
grue            18L-18W     [9-12]          26-29            58-60
volleyball      [2L]        33-34           21-23            54-57
ghuol           16L-20W     34-38           [49-50]          58-60
gravy fairy     [2L]        37-40           26-30            54-56
cocoabo         18L-20W     38-40           [48-51]          50-52
starfish        19L-19W     31-35           22-23            [29-31]
sombrero        [2L]        38-40           24-26            50-54
pickle          14W-18W     34-35           [49-52]          54-57
killer bee      19L-17W     32-35           24-25            [29-31]
Jill-O-Lantern  15W-18W     32-34           24-27            50-53
hand turkey     14W-20W     36-37           27-28            51-54
Crimbo elf      15L-20W     35-38           27-29            57-60
dreidl          19L-17W     32-34           20-22            50-54
baby yeti       16W-17W     32-35           27-29            53-57
feather boa     16L-19W     38-41           27-30            51-54
raincloud       19L-18W     32-34           21-24            50-53
doppelshifter   (17W-19W)   38-41           20-24            56-59
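
For anyone who wants to work with these numbers in a script, the notation can be parsed mechanically. The Python sketch below assumes only the conventions already described: a number followed by W or L is a cage match win or loss in that many rounds, a plain number is a count of seconds or items, and square brackets mark an event in which the familiar is terrible. `Cell` and `parse_cell` are hypothetical names chosen for this example, and the parentheses around the doppelshifter's cage match entry are simply stripped, since their meaning isn't explained.

```python
import re
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    first: int                     # first number of the range
    second: int                    # second number (same as first for a single value)
    first_outcome: Optional[str]   # "W" or "L" for cage matches, else None
    second_outcome: Optional[str]
    terrible: bool                 # True when the entry was written in [brackets]


_PART = re.compile(r"^(\d+)([WL]?)$")


def parse_cell(text: str) -> Cell:
    """Parse one entry such as '16L-18W', '[29-31]', or '[2L]'."""
    text = text.strip()
    terrible = text.startswith("[") and text.endswith("]")
    # Drop surrounding brackets or parentheses before splitting the range.
    parts = text.strip("[]()").split("-")
    if len(parts) == 1:
        parts = parts * 2  # a single value like '2L' is a range of one
    if len(parts) != 2:
        raise ValueError(f"unrecognized entry: {text!r}")
    parsed = []
    for part in parts:
        match = _PART.match(part)
        if match is None:
            raise ValueError(f"unrecognized entry: {text!r}")
        parsed.append((int(match.group(1)), match.group(2) or None))
    (first, first_outcome), (second, second_outcome) = parsed
    return Cell(first, second, first_outcome, second_outcome, terrible)
```

With this, `parse_cell("[2L]")` reports a terrible event with a loss in 2 rounds, and `parse_cell("32-35")` reports a plain range with no win/loss markers.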

Information on the coffee pixie, cheshire bat, and inflatable dodecapede will be coming shortly. Thanks to Derilkio for contributing the Doppelshifter data.

Note: The grue's performance seems to be pretty independent of the moon phase.

Here are some results from my testing of the effects of familiar level, which seem to indicate that, in general, a gain of one level is worth about 1 point. All testing was done with a star starfish in the Scavenger Hunt event (with empathy in all cases except the first five).
level  items found
1 9.6
2 9.0
3 9.4
4 12.0
5 13.0
6 13.6
7 14.2
8 15.2
9 16.6
10 17.2
11 19.6
12 20.6
13 20.0
14 22.0
15 23.3
16 25.0
17 24.7
18 26.8
19 27.0
20 27.7
21 28.6
22 29.5
23 30.4
24 32.2
25 32.2
26 33.5
27 35.0
28 36.3
29 37.4
30 37.8
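
One way to check the "about 1 point per level" reading is to fit a straight line through the table. The Python sketch below does an ordinary least-squares fit on the thirty values above; the names `ITEMS_FOUND` and `least_squares_line` are made up for this example, and the fit treats every level equally, including the first five, which were tested without empathy.

```python
# Items found at each level, copied from the table above (levels 1 to 30).
ITEMS_FOUND = [
    9.6, 9.0, 9.4, 12.0, 13.0, 13.6, 14.2, 15.2, 16.6, 17.2,
    19.6, 20.6, 20.0, 22.0, 23.3, 25.0, 24.7, 26.8, 27.0, 27.7,
    28.6, 29.5, 30.4, 32.2, 32.2, 33.5, 35.0, 36.3, 37.4, 37.8,
]


def least_squares_line(ys, first_x=1):
    """Return (slope, intercept) of the least-squares line through
    the points (first_x, ys[0]), (first_x + 1, ys[1]), ..."""
    n = len(ys)
    xs = range(first_x, first_x + n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(ITEMS_FOUND)
print(f"{slope:.2f} items per level, intercept {intercept:.2f}")
```

The fitted slope comes out close to 1 item per level, which agrees with the rough estimate from the endpoints: (37.8 - 9.6) / 29 is about 0.97.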

