Monday, December 5, 2011

Checkers AI, part 2

Today I’d like to wrap up the game logic of our checkers-playing AI. Last time we defined the players, pieces, and the board, and operations on them. Today let’s write the game-playing code.

moves module

The moves module simply links the position module to the board module, moving pieces around a board in accordance with the patterns defined in position.

Just at position defines two movement patterns (move and jump), we will define two predicates to update a board based on these patterns: do_move and do_jump.
:- pred do_move(color, position, position, board, board).
:- mode do_move(in, in, in, in, out) is semidet.
:- mode do_move(in, in, out, in, out) is nondet.
:- mode do_move(in, out, out, in, out) is nondet.

:- pred do_jump(color, position, position, board, board).
:- mode do_jump(in, in, in, in, out) is semidet.
:- mode do_jump(in, in, out, in, out) is nondet.
:- mode do_jump(in, out, out, in, out) is nondet.
The three modes defined for each predicate allow us to check if a given move or jump is valid for the given player, to enumerate the moves or jumps available to a piece at a given position, and to enumerate all moves and jumps available on the board, respectively. Implementation is straightforward:
:- import_module piece.

do_move(Color, From, To, !Board) :-
        move(Piece, From, To),  
        move_piece(From, To, Piece, !Board),
        color(Piece) = Color.

do_jump(Color, From, To, !Board) :-
        jump(Piece, From, Jumped, To),
        move_piece(From, To, Piece, !Board),
        color(Piece) = Color,
        remove_piece(Jumped, Other, !Board),
        opponent(color(Piece), color(Other)).
Since in checkers, a piece must jump if it can jump, we will later find it useful to check whether a piece can jump from a given position without the overhead of actually updating the board. Checking whether a piece can move will also be useful for determining the end of game, so let’s define both can_move and can_jump:
:- pred can_move(color, position, board).
:- mode can_move(in, in, in) is semidet.
:- mode can_move(in, out, in) is nondet.

can_move(Color, From, Board) :-
        move(Piece, From, To),
        piece_at(Board, From, Piece),
        no_piece_at(Board, To),
        color(Piece) = Color.

:- pred can_jump(color, position, board).
:- mode can_jump(in, in, in) is semidet.
:- mode can_jump(in, out, in) is nondet.

can_jump(Color, From, Board) :-
        jump(Piece, From, Jumped, To),
        piece_at(Board, From, Piece),
        no_piece_at(Board, To),  
        color(Piece) = Color,
        piece_at(Board, Jumped, Other),
        opponent(color(Piece), color(Other)).
Note that we could simply have defined can_move(Color, From, Board) :- do_move(Color, From, _, Board, _). (and similarly for can_jump, but, since Mercury neither is lazy nor tracks unused arguments, this would entail the unnecessary computational overhead of updating the board.

game module

The game module will house the state of the game, and logic for generating possible successor states based on the rules of the game. The state of the game is given by whose turn it is, whether they are in the middle of a jump or not (in checkers, after jumping a piece, you may immediately jump another with the same piece), the history of moves (to check for draws) and the state of the board:
:- import_module board, color, position.
:- type game --->
    game(turn::color,
         jumping_state::jumping,
         history::history,      
         board::board).
:- type jumping_state ---> not_jumping; jumping(position).
In history, we need only remember past board states for each player. (This is because, as we’ll see later, we don’t need to track history of jumps.)
:- import_module set.
:- type history == set({color, board}).

Initialization of the game

Initializing the game state is easy. Red goes first and is not in the middle of a jump:
:- func init(int, int) = game.
init(Width, Height) = game(red, not_jumping, init, init(Width, Height)).
Checkers is played on an 8×8 board by default, so let’s provide initializers for this case as well:
:- func init = game.
init = init(8, 8).

:- pred init(game::out) is det.
init(init).
(The init predicate is redundant but some programmers prefer that style over zero-argument functions so we’ll provide it.)

Rules of the game

The change from one game state to the next is uniquely identified by the starting and ending locations of some piece on the board. We can declare a predicate which updates the game state based on this information, or generates possible new game states, as follows:
:- pred step(position, position, game, game).
:- mode step(in, in, in, out) is semidet.
:- mode step(in, out, in, out) is nondet.
:- mode step(out, out, in, out) is nondet.
The definition of step itself is slightly hairy, as it must follow the rules of checkers. I’ve broken it into two cases, corresponding to when a player begins their move (jumping_state is not_jumping), and when a player is forced to continue a move because a further jump is available (jumping_state is jumping):
step(From, To, game(Color0, not_jumping, !.History, !.Board) @ Game0,
               game(Color, Jumping, !:History, !:Board)) :-          
    % The game can only proceed if it is not a draw.
    not is_draw(Game0),
    % We must jump if it is possible.
    (can_jump(Color0, _, !.Board) ->
        % Since are removing a piece, history doesn't matter any more.
        init(!:History),
        % Update the board.
        do_jump(Color0, From, To, !Board),
        % If more jumps are possible, remember this fact.
        (can_jump(Color0, To, !.Board) ->
            Color = Color0,              
            Jumping = jumping(To)
        ;
            opponent(Color, Color0),
            Jumping = not_jumping   
        )
    ;    
        % Add the last position to our history.
        insert({Color0, !.Board}, !History),
        % Update the board.
        do_move(Color0, From, To, !Board),
        % It is now our opponent’s turn.
        opponent(Color0, Color),
        Jumping = not_jumping
    ),
    % If our move or jump ended in the king’s row, king it.
    (in_kings_row(!.Board, Color0, To), king_piece(To, !Board) -> true; true).

step(From, To, game(Color0, jumping(Pos), _, !.Board),
               game(Color, Jumping, init, !:Board)) :-
    % History is ignored while we are jumping.
    % We can only continue a jump from where we left off.
    From = Pos,
    % Update the board.
    do_jump(Color0, From, To, !Board),
    % If more jumps are possible, remember this fact.
    (can_jump(Color0, To, !.Board) ->
        Color = Color0,
        Jumping = jumping(To)
    ;
        opponent(Color, Color0),
        Jumping = not_jumping
    ),
    % If our jump ended in the king’s row, king it.
    (in_kings_row(!.Board, Color0, To), king_piece(To, !Board) -> true; true).
(is_draw is defined below.) Phew! That, fortunately, is the single most complex chunk of code in the entire implementation.

Checking the state of the game

It is important to provide predicates to determine whether a game has finished, and if so, who (if anyone) has won. First, we define can_step as a proxy for step which does not actually create a new game state:
:- pred can_step(game::in) is semidet.

can_step(game(Color, not_jumping, _, Board) @ Game) :-
    not is_draw(Game),
    (can_jump(Color, _, Board); can_move(Color, _, Board)).

can_step(game(_, jumping(_), _, _)).
(Again, as with can_move and can_jump, we could have defined this in terms of step, but we would perform unnecessary computation.)

We can then define is_draw to determine whether the current game state is a draw or not (i.e. whether the current state exists in previous history):
:- pred is_draw(game::in) is semidet.

is_draw(game(Color, not_jumping, History, Board)) :-
    member({Color, Board}, History).
Finally, we can determine whether a game is over, and if so, who the winner is (if any) as follows:
:- pred game_over(game::in, maybe(color)::out) is semidet.

game_over(Game, Winner) :-
    if is_draw(Game) then Winner = no
    else
        not can_step(Game),
        Winner = yes(opponent(Game^turn)).

game_io module

We’re almost ready to test our game logic with human players. To do this, we’ll first need to declare predicates to print the state of the game, and to allow the human player(s) to choose a move:
:- import_module board, color, game, piece.
:- import_module io, maybe.

:- pred print_color(color::in, io::di, io::uo) is det.
:- pred print_piece(piece::in, io::di, io::uo) is det.
:- pred print_board(board::in, io::di, io::uo) is det.
:- pred print_game(game::in, io::di, io::uo) is det.
:- pred print_winner(maybe(color)::in, io::di, io::uo) is det.

:- pred choose_move(list(game)::in(non_empty_list), result(game)::out,
        io::di, io::uo) is det.
These predicates are all fairly straightforward, so I’ll elide most of them here to save space. Of note is choose_move, which presents each game state in the list in turn to the player, asking if the player wishes to choose that move. If the player chooses none, the sequence is repeated. Note that the mode of this argument is given as in(non_empty_list) to ensure that an empty list of moves is not passed, which would result in an infinite loop. Here is the code, which is mostly implemented by choose_move_aux:
choose_move(Games, Chosen, !IO) :- choose_move_aux(Games, Games, Chosen, !IO).

:- pred choose_move_aux(list(game)::in(non_empty_list), list(game)::in,
    result(game)::out, io::di, io::uo) is det.

choose_move_aux(AllGames, [], Chosen, !IO) :-
    write_string("Repeating...\n\n", !IO),
    choose_move_aux(AllGames, AllGames, Chosen, !IO).

choose_move_aux(AllGames, [Game | Games], Chosen, !IO) :-
    print_board(Game^board, !IO), nl(!IO),
    write_string("Ok? ", !IO),
    read_line_as_string(Res, !IO), nl(!IO),
    (
        Res = ok(String),
        (prefix(String, "y") -> Chosen = ok(Game);
         choose_move_aux(AllGames, Games, Chosen, !IO))
    ;
        Res = eof, Chosen = eof
    ;
        Res = error(Err), Chosen = error(Err)
    ).

checkers module

Finally we can wrap everything together with our main module, checkers.

Players

To make our code modular, let’s define a typeclass player:
:- import_module game, io.
:- typeclass player(Player) where [
    pred make_move(Player::in, game::in, game::out, io::di, io::uo)
        is cc_multi
].
This means that a player is any type Player which has defined for it a make_move predicate with the given signature. This allows us to implement various “players”, each with a different choose_move predicate.

By including the IO state and declaring the determinism of make_move as cc_multi, we allow both for players which must interact with IO (e.g., humans), and for players which may choose a move arbitrarily (e.g., search algorithms which find several optimal choices).

Now we can implement various human and AI players which conform to this interface. Since we’re just testing for now, we’ll define a human player like so:
:- import_module game_io, list, require, solutions.

:- type human_player ---> human_player.
:- instance player(human_player) where [
    (choose_move(human_player, Game0, Game1, !IO) :-
        unsorted_solutions(
            pred(Game::out) is nondet :- step(_, _, Game0, Game),
            Moves),                                              
        (
            Moves = [_|_],
            make_move(Moves, Res, !IO),
            (                            
                Res = ok(Game1)
            ;                  
                Res = eof, error("End of file")
            ;                                  
                Res = error(Err), error(error_message(Err))
            )                                              
        ;    
            Moves = [],
            unexpected($module, $pred, "Game is not over but no more moves!")
        )
    )    
].
(The dummy type human_player is needed so we can declare it as an instance of the typeclass player.) unsorted_solutions returns an arbitrarily ordered list of all the possible moves from a given position. We assert that this list should not be empty (as the player should not have been asked to make a move when there are no moves available, as this means that the game is over!), and forward the rest of the work to our previously-defined game_io.choose_move.

We can also define a dummy AI that chooses a move arbitrarily like so:
:- type dummy_player ---> dummy_player.
:- instance player(dummy_player) where [
    (make_move(dummy_player, Game0, Game1, !IO) :-
        unsorted_solutions(                       
            pred(Game::out) is nondet :- step(_, _, Game0, Game),
            Moves),                                              
        Game1 = det_head(Moves))
].

Why typeclasses?
You may rightly ask, “why not simply declare a type player ---> human_player; dummy_player and forget the whole typeclass business?” By using typeclasses, we can provide new player implementations in other modules, rather than stuffing them all into one possibly large module.

Of course, another way of doing this would be using higher-order predicates: we could declare players as a predicate type. The problem with this approach is that we may at some point determine that a player should have more predicates associated with it (such as one defining their name or difficulty), at which point we will need to pass around multiple predicates having messy types and instantiations. Using typeclasses allows us to redefine what a player is more easily.

The main loop

Last but not least, we will define a loop play_game, which takes a game state and two values whose types are instances of the player typeclass to represent the two players. The implementation is straightforward:
:- pred play_game(game::in, Red::in, White::in, io::di, io::uo) is cc_multi
    <= (player(Red), player(White)).

play_game(Game0, Red, White, !IO) :-
    print_game(Game0, !IO), nl(!IO),
    (game_over(Game0, _) -> true;
        (
            Game0^turn = red,
            make_move(Red, Game0, Game1, !IO)
        ;
            Game0^turn = white,
            make_move(White, Game0, Game1, !IO)
        ),
        play_game(Game1, Red, White, !IO)
    ).
In this predicate, make_move will call the make_move predicate associated with the type of Red or White as appropriate.

Last but not least, the main predicate will start a game between two players (in this case, a human_player and a dummy_player):
:- pred main(io::di, io::uo) is cc_multi.
main(!IO) :- play_game(init, human_player, dummy_player, !IO).

Trying it out

Thanks to Mercury’s built-in build system, we can compile the whole shebang with simply mmc --make checkers. Running it lets us play a game versus the dummy AI:
 o o o o
o o o o 
 o o o o
· · · · 
 · · · ·
x x x x 
 x x x x
x x x x 

X's move.
It’s our move first. Let’s cycle through the options:
 o o o o
o o o o 
 o o o o
· · · · 
 · · · x
x x x · 
 x x x x
x x x x 

Ok? n

 o o o o
o o o o 
 o o o o
· · · · 
 · · x ·
x x x · 
 x x x x
x x x x 

Ok? n

 o o o o
o o o o 
 o o o o
· · · · 
 · · x ·
x x · x 
 x x x x
x x x x 

Ok? n

 o o o o
o o o o 
 o o o o
· · · · 
 · x · ·
x x · x 
 x x x x
x x x x 

Ok? n

 o o o o
o o o o 
 o o o o
· · · · 
 · x · ·
x · x x 
 x x x x
x x x x 

Ok? n

 o o o o
o o o o 
 o o o o
· · · · 
 x · · ·
x · x x 
 x x x x
x x x x 

Ok? n

 o o o o
o o o o 
 o o o o
· · · · 
 x · · ·
· x x x 
 x x x x
x x x x 

Ok? n

Repeating...
OK, we’ve seen all of them. Let’s choose the first:
 o o o o
o o o o 
 o o o o
· · · · 
 · · · x
x x x · 
 x x x x
x x x x 

Ok? y

 o o o o
o o o o 
 o o o o
· · · · 
 · · · x
x x x · 
 x x x x
x x x x 
Let’s make sure the computer picks a valid move.
O's move.

 o o o o
o o o o 
 o o o ·
· · · o 
 · · · x
x x x · 
 x x x x
x x x x 
Hooray! Of course, now it’s our move again. This could go on for a while. Testing it proves that everything works however!

Conclusion

Today we covered building the game logic in the moves and game module, and I/O and the main loop in the game_io and checkers modules. Next week we’ll finish up by adding an AI based on the minimax algorithm.

Tuesday, November 29, 2011

Checkers AI, part 1

Logic languages such as Mercury love search problems. Game artifical intelligence is a great application of searches. AIs for many zero-sum two-player board games, such as checkers and chess, can be implemented as a minimax search; a search which assumes that each player makes a move which maximally benefits himself. In today’s post, I’d like to walk through modeling checkers in Mercury.

Architecture

The overall architecture of our checkers game will consist of the following modules:
color
The two colors (players) in the game and their relation (opponents).
piece
The four kinds of pieces in the game and functions over them (namely, kinging).
position
A representation of the position of a piece, and predicates defining useful relationships between positions (namely, moving and jumping).
board
A representation of the checkers board, predicates about positions on it, predicates about pieces on it, and predicates to modify a board in useful ways (namely, moving, removing, and kinging pieces).
moves
Predicates to update a board via the patterns described in position.
game
An encoding of the state of a game, a predicate to modify it (by making a move according to the rules of checkers), and predicates to query the state of a game (namely, whether the game has ended, and the result if so).
util
Some utility functions for dealing with integers and lists; used primarily by minimax.
minimax
The minimax algorithm and related typeclasses.
heuristic
A declaration of the game as a minimax problem, with associated heuristics.
game_io
Predicates for printing various game structures in a human-readable format, and obtaining user input.
checkers
The main module, allowing configuration of the game and defining the overall UI.
This architecture ensures that modules are loosely coupled and small in size (< 120 loc each, average ~ 60 loc), making modifications and debugging easy. A similar overall architecture should prove useful for other game AI implementations. Let’s go into each module in detail. In today’s post, we’ll focus on color, piece, position, and board.

color module

There are two colors in checkers, corresponding to the two players:
:- type color ---> red; white.
They are related as opponents:
:- pred opponent(color, color).
:- mode opponent(in, out) is det.
:- mode opponent(out, in) is det.

opponent(red, white).
opponent(white, red).
We can also define an opponent function for convenience:
:- func opponent(color) = color.
opponent(Color) = Opponent :- opponent(Color, Opponent).
That’s it! Short and to-the-point modules are good. We’ll see that most modules in this game are similarly sized.

piece module

Pieces come in two colors, defined in color, and can be either men or kings:
:- import_module color.
:- type kind ---> man; king.
:- type piece ---> piece(color::color, kind::kind).
It is also useful to talk about “kinging” a piece; changing a man into a king:
:- func king(piece) = piece is semidet.
king(piece(Color, man)) = piece(Color, king).
This module is also short and sweet.

position module

A position is simply an X, Y tuple:
:- type position ---> position(int, int).
It is useful to talk about pieces changing position by moving and jumping. Let’s declare two predicates, move and jump, which will define the relationship between the starting and ending locations of a piece which moves or jumps, and in the case of a jump, the position which was jumped over.
:- pred move(piece, position, position).
:- pred jump(piece, position, position, position).
It will be most useful to know whether a particular change in position describes a move or jump, and given a starting position, what ending positions would result from a move or jump. We will declare two modes, corresponding to these use cases, for each predicate:
:- mode move(in, in, in) is semidet.
:- mode move(in, in, out) is multi.
:- mode jump(in, in, out, in) is semidet.
:- mode jump(in, in, out, out) is multi.
Because Mercury provides bidirectional arithmetic (i.e. + and - can deduce the values of one operand if the result is known), we can define these predicates with only one clause each:
:- import_module color.
:- import_module int.

move(Piece, position(X, Y), position(X + XD, Y + YD)) :-
    (XD = 1; XD = -1),
    (Piece = piece(red, man), YD = 1;
     Piece = piece(white, man), YD = -1;
     Piece = piece(_, king), YD = 1;
     Piece = piece(_, king), YD = -1).

jump(Piece, position(X, Y), position(X + XD, Y + YD),
     position(X + XD2, Y + YD2)) :-
    (XD = 1, XD2 = 2; XD = -1, XD2 = -2),
    (Piece = piece(red, man), YD = 1, YD2 = 2;
     Piece = piece(white, man), YD = -1, YD2 = -2;
     Piece = piece(_, king), YD = 1, YD2 = 2;
     Piece = piece(_, king), YD = -1, YD2 = -2).
Unfortunately any attempt to compact this code results in mode errors. This is because Mercury’s switch detection code is not very robust in the face of complicated switches, and because Mercury seems not to reorder conjunctions to obtain mode-correctness.

board module

First we define the type of a board, which should contain its width, height, a mapping from position to piece, and (for efficiency) a count (using the handy bag module) of how many of each type of piece is on the board:
:- import_module color, piece, position.
:- import_module bag, int, map.

:- type board --->
    board(
        width::int,
        height::int,
        positions::map(position, piece),
        counts::bag(piece)
    ).
Of course we could fix the width and height to 8, but it is easy and interesting to allow for other size boards. Since Mercury automatically generates “getter” functions for width and height, we need only declare them in our interface, even if we choose to hide the type of board:
:- func width(board) = int.
:- func height(board) = int.
Another trivial but important state accessor is count_pieces, which counts the number of a given kind of piece on the board:
:- func count_pieces(board, piece) = int.
count_pieces(Board, Piece) = count_value(Board^counts, Piece).
This function is implemented using the bag we added to the board type so that it may be quickly executed by our heuristic function (defined later).

Initializing the board

Next we define how to initialize a board of a given size:
:- func init(int, int) = board.
init(Width, Height) = !:Board :-
    !:Board = board(Width, Height, init, init),
    fold_up(init_row(piece(red, man)), 0, Height / 2 - 2, !Board),
    fold_up(init_row(piece(white, man)), (Height + 1) / 2 + 1, Height - 1, !Board).
The int.fold_up predicate used here is Mercury’s idea of a for loop. init_row uses solutions.aggregate and legal_position (defined later) to iterate over the valid positions in a row:
init_row(Piece, Y, !Board) :-
    aggregate(
        (pred(Pos::out) is nondet :-
            legal_position(!.Board, position(_, Y) @ Pos)),
        (pred(Pos::in, !.Board0::in, !:Board0::out) is det :-
            !Board0^positions :=
                det_insert(!.Board0^positions, Pos, Piece),
            !Board0^counts :=
                insert(!.Board0^counts, Piece)),
        !Board).
Note that we call legal_position with a partially instantiated position, position(_, Y), which is also bound to Pos. This means that we need a mode of legal_position which can fill in valid X coordinates of a given row.

Legal positions on the board

Here is a suitable declaration of legal_position:
:- pred legal_position(board, position).  
:- mode legal_position(in, in) is semidet.
:- mode legal_position(in, out) is nondet.
:- mode legal_position(in, bound(position(ground, free)) >> ground) is nondet.
:- mode legal_position(in, bound(position(free, ground)) >> ground) is nondet.
All those modes! Beside the modes where the position argument is in and out (corresponding to checking a legal position and generating all legal positions), the other two modes work with partially instantiated positions, where we wish to know the legal positions in a given column or row (respectively). Thankfully, all four modes can be implemented with one body:
legal_position(Board, position(X, Y)) :-
    int_in_range(0, Board^height - 1, Y),
    int_in_range(0, Board^width - 1, X),
    (even(X), even(Y); odd(X), odd(Y)).
Unfortunately the predicate nondet_int_in_range in Mercury’s int module does not provide a semidet mode and is thus unsuitable for bidirectional programming. Instead, we declare a similar predicate, int_in_range, in our util module:
:- pred int_in_range(int, int, int).
:- mode int_in_range(in, in, in) is semidet.
:- mode int_in_range(in, in, out) is nondet.
To define this predicate, we can make use of Mercury’s support for defining different clauses for different modes:
:- pragma promise_equivalent_clauses(int_in_range/3).
int_in_range(Lo::in, Hi::in, X::out) :- nondet_int_in_range(Lo, Hi, X).  
int_in_range(Lo::in, Hi::in, X::in) :- Lo =< X, X =< Hi.
Because we know that these two clauses are logically equivalent, we promise such to Mercury using promise_equivalent_clauses.

Locating pieces

To query which pieces are at which positions, we declare piece_at. piece_at has two modes, one for checking which piece (if any) is at a given position, and one for iterating over all pieces currently on the board:
:- pred piece_at(board, position, piece).
:- mode piece_at(in, in, out) is semidet.
:- mode piece_at(in, out, out) is nondet.
Our implementation of piece_at should be a thin layer over the positions map, but Mercury’s map module does not provide one predicate which serves both purposes. Again, we can remedy this by using different clauses for each mode:
:- pragma promise_equivalent_clauses(piece_at/3).

piece_at(Board::in, Position::in, Piece::out) :-
    search(Board^positions, Position, Piece).   

piece_at(Board::in, Position::out, Piece::out) :-
    member(Board^positions, Position, Piece).
We can also define no_piece_at, which has an identical signature, as:
no_piece_at(Board, Position) :-
    legal_position(Board, Position),
    not piece_at(Board, Position, _).
Note that no_piece_at(…) is not simply not piece_at(…)! By additionally requiring that Position be a legal position, we restrict the domain of no_piece_at and thus allow it to output the finite number of positions where no piece exists (as opposed to the infinite number of positions which either have no piece or are not legal).

Modifying the board

There are three predicates in board to modify the board: move_piece, remove_piece, and king_piece. move_piece and remove_piece have modes which allow for both modifying a given location(s), and for generating all possible such modifications of the board. king_piece has only one mode to modify a given location. All three have similar code utilizing state variable field update syntax. Here is the implementation of king_piece as an example:
king_piece(Pos, !Board) :-
    piece_at(!.Board, Pos, Piece),
    King = king(Piece),
    !Board^positions := update(!.Board^positions, Pos, King),
    !Board^counts := det_remove(!.Board^counts, Piece),
    !Board^counts := insert(!.Board^counts, King).
It is my opinion that Mercury would benefit from an extension to state variable update syntax which allows state variable record syntax for data terms. Such a syntax would, in concert with the standard library argument order modifications introduced in Mercury 11.07, allow the above predicate to be written more concisely as:
king_piece(Pos, !Board) :-
    piece_at(!.Board, Pos, Piece),
    King = king(Piece),
    update(Pos, King, !Board^positions),
    det_remove(Piece, !Board^counts),
    insert(King, !Board^counts).
(Of course, the update functions allowed to be used with such a syntax must be restricted to those which commute with each other.)

Review

That’s all for now, this post is long enough! Today we saw the overall structure of our checkers AI – ten small modules – and the implementations of color, piece, position, and board, the modules defining the “parts” of the game. Next week we’ll see the implementations of the “rules” and “brains” of the game.

Wednesday, November 9, 2011

Making the grade

I know it’s been a while since my last post, but I have something fantastic in the works, I promise. Today though I’d like to talk about the compiler grades.

Common grades

Compiler grades correspond roughly to combinations of different backends (e.g. C or Java) and code models (e.g. parallel or debugging). Those of you new to the language may be confounded by the immense number of these grades, and when initially compiling Mercury it can be difficult to decide which grades to enable. Here’s a list of the ones I use most. In parentheses next to the grade names are the options needed to select the grade, assuming that asm_fast.gc is the default (it usually is).
asm_fast.gc (default)
This is my default go-to grade. It compiles to low-level C code using the Boehm garbage collector and supports concurrency (though not parallelism). It is therefore very fast, can interface seamlessly with C libraries, and can run code that uses threads.
asm_fast.par.gc.stseg (--parallel --stack-segments)
This grade adds parallelism and stack segmentation to the asm_fast.gc grade. Parallelism allows programs to make use of multiple cores (though note that this isn’t necessary if all you need is concurrent execution). Stack segmentation allows pieces of the stack to be allocated on the heap. Beside allowing programs to grow beyond a fixed stack size, this option (only available for the asm_fast grades) greatly reduces the amount of memory allocated by thread spawns, which can otherwise measure in the gigabytes. Both these features come at a small runtime cost.
hlc.gc (-H)
This is the “high-level C” grade. It compiles to C code which is much closer to what a programmer might write than that generated by the asm_fast grade. Curiously this grade is sometimes faster than the low-level grade (likely due to an increased number of optimization opportunities available to GCC), so if you need speed, it is worth testing your application in this grade as well. Unlike asm_fast however, this grade does not support concurrency.
hlc.par.gc (-H --parallel)
This grade adds to hlc.gc support for concurrency and parallelism via native system threads. If your application requires a 1:1 mapping of Mercury threads to system threads, then this grade is a must.
java (--java)
As the name implies, this grade compiles to Java code. It is the only option if you need to interface with Java libraries. Due to the JVM’s speed the code it generates is often as fast as its C counterparts. This grade supports concurrency and parallelism via Java’s built-in threading mechanisms (--parallel is not needed). (Note that unlike in the C grades, a program compiled under the java grade will not wait for all spawned threads to terminate before exiting.) Unfortunately some standard library procedures are not yet implemented.
asm_fast.gc.prof (-p); hlc.gc.prof (-H -p)
These grades allow you to profile your code using mprof, of course at the expense of execution speed. They are handy to have around.
asm_fast.gc.debug (--debug)
This grade allows you to debug your code using mdb. Note that if you’re not debugging the standard library, then instead of compiling this grade, you can instead pass the --trace deep option to the compiler to enable debugging of just your modules.

Compiling extra grades

We all know that the Mercury build process takes forever. This is because it must compile the standard library in each of the required grades. Fortunately there’s a shortcut around this.

The first time you install the system, instead of make install, run make install LIBGRADES=asm_fast.gc. This will install the compiler, but only the default library grade. You can pass LIBGRADES= if you wish instead to install no library grades.

Later, when you wish to install additional grades, run make install_grades LIBGRADES='asm_fast.par.gc hlc.gc' (for example). This will compile only the given library grades. To complete the installation, you should edit /path/to/mercury/lib/mercury/conf/Mercury.config to add the appropriate --libgrade flags to the end of DEFAULT_MCFLAGS (near the end of the file), and /path/to/mercury/lib/mercury/mmake/Mmake.vars to add the grade names to the definition of LIBGRADES. (These steps are only necessary if you plan to build and install libraries.)

Don’t forget also that you can speed up compilation by passing PARALLEL=-j5 (where 5 is one plus the number of parallel cores in your system) to make (or simply -j5 to mmake).

Working with multiple grades

Finally, when testing with multiple grades, you may with to use the --use-grade-subdirs option in conjunction with mmc --make. This will cause mmc --make to use a different subdirectory for each grade in which you build your project, avoiding errors caused by linking incompatible grades, and allowing you not to have to rebuild the entire project each time you switch grades.

Monday, October 17, 2011

Parsing Expression Grammars part 3 (final)

So, last week, we built a C99 parser. A slow C99 parser. At the end we added one line of code that sped things up a bunch. Let’s look at that.

Why recursive descent parsing is slow

Right now our PEG parser is what’s known as a recursive descent parser – it recursively descends down the parse tree looking for matches. Unfortunately, recursive descent is very slow for two reasons:
  1. One grammar rule may try to parse a given span of input the same way multiple times.
  2. Multiple grammar rules may try to parse a given span of input the same way multiple times.
An example of the former is in the selection_statement production (Typedefs omitted for clarity):
selection_statement -->
    (kw("if"), p("("), expression, p(")"), statement, kw("else"), statement) -> [];
    (kw("if"), p("("), expression, p(")"), statement) -> [];
    (kw("switch"), p("("), expression, p(")"), statement).
The string “if (x > 0) x--; return;” will be parsed once by the first branch, which fails when it finds “return” instead of “else”, and will then be reparsed by the second branch, which will succeed. This sort of prefix-repetition can be eliminated by factoring out the repeated portion:
selection_statement -->
    (kw("if"), p("("), expression, p(")"), statement,
     ((kw("else"), statement) -> []; [])) -> [];
    (kw("switch"), p("("), expression, p(")"), statement).
Of course, this means that our code no longer is structurally identical to the grammar we are parsing, but this may be a welcome tradeoff.

An example of the latter, which is more pernicious, is given by, well, almost every production. Why? Consider statement:
statement -->
    labeled_statement -> [];
    compound_statement -> [];
    expression_statement -> [];
    selection_statement -> []; 
    iteration_statement -> []; 
    jump_statement.
Pretty innocuous, right? Take a look at each of the productions it invokes. Almost all of them start with a call to kw, which in turn calls token. In fact, all of them start with a call, directly or indirectly, to token. In fact, this is true of every production in the parser. token is not the only culprit but it is the most pervasive.

This problem can’t be solved through a simple restructuring of the grammar. For this we need a more powerful tool: memoization.

Memoization

Memoization is one of my favorite features. Simply put, memozation of a procedure means maintaining a table of its past input and output values, and looking up output values in this table when the procedure is called in the future with the same input values. Of course, this only works if the procedure has no desirable side effects; fortunately this is true of most Mercury code. Mercury provides built-in support for memoization via the memo pragma. Basic usage is so:
:- pragma memo(Name/Arity).
By default, Mercury memoizes inputs by value. This will catch all repeated calls to a procedure, but can be slow for large structures. We can specify that we would like Mercury to memoize inputs by address instead:
:- pragma memo(Name/Arity, [fast_loose]).
Memoizing by address is much faster but may miss calls with identical values at different addresses. However, because our primary data structure is a list of characters which is very large (meaning memoize-by-value will be slow) and which we are only destructing (meaning identical sublists of this list must be physically the same), we will memoize by address.

Profiling

Now, although we could memoize every procedure in our lexer and parser, memoization does have overhead, and so we should not use it on procedures which are not called often or with repeated inputs. We should limit our scope to grammar rules which are invoked by many different rules, or multiple times by one rule, since these are likely to be invoking the rule repeatedly on the same input.

To find such candidates for memoization, we can use runtime code profiling. To generate profiling data, we can simply compile with the -p switch and then run our program with our test from last week (minus the memoization of token we added at the end):
$ mmc -p --infer-all --make c_parser_direct
$ echo 'int main(void) {}' | ./c_parser_direct
Let that churn for a few seconds, and then kill it with Ctrl+C. Now run mprof to find out what’s been taking so long:
   %  cumulative    self              self    total                                                            
 time    seconds  seconds    calls  ms/call  ms/call name                                                      
 68.5       8.70     8.70 17602295     0.00     0.00 <function `string.to_char_list/1' mode 0> [1]
  7.1       9.60     0.90 17602483     0.00     0.00 <predicate `builtin.unify/2' mode 0> [2]     
  6.3      10.40     0.80   202300     0.00     0.00 <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [3]
  5.9      11.15     0.75   202388     0.00     0.00 <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [4]   
  4.7      11.75     0.60 10114349     0.00     0.00 <predicate `c_parser_aim.lexer.punctuator_aux/3' mode 0> [5]
  4.3      12.30     0.55  7487945     0.00     0.00 <predicate `c_parser_aim.lexer.keyword_aux/3' mode 0> [6]
Over 95% of that run was spent just checking for keywords and punctuators! In particular, the biggest culprit, string.to_char_list, is highly suspect. To find out who’s calling it, we can run mprof -c [note: sometimes I have had to run mprof -cd; I can’t reproduce this now so I’m not sure why] and search for the callers of string.to_char_list:
                3.70        3.70 7487945/17602295    <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]
                5.00        5.00 10114350/17602295    <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [22]
[21]    68.5    8.70        8.70 17602295         <function `string.to_char_list/1' mode 0> [21]
The [21] in the left-most column indicates that this section is dedicated to procedure #21, string.to_char_list, which was called 17,602,295 times. The two lines above this line indicate that both lexer.keyword_rec and lexer.punctuator_rec call string.to_char_list, 7,487,945 and 10,114,350 times respectively. Of course; we use string.to_char_list to convert our keywords and punctuators to character lists for these two procedures to match against the input data. But clearly we don't have 7,487,945 keywords or 10,114,350 punctuators, so string.to_char_list must be getting called with the same arguments repeatedly. Let’s fix that:
:- func to_char_list_memo(string) = list(char).
:- pragma memo(to_char_list_memo/1, [fast_loose]).
to_char_list_memo(S) = to_char_list(S).
Unfortunately we can’t instruct Mercury to memoize procedures in other modules, so we must define to_char_list_memo as above. We can replace the two calls to it in lexer.keyword_rec and lexer.punctuator_rec with the memoized version. Compiling and reprofiling tells us:
   %  cumulative    self              self    total                                                             
 time    seconds  seconds    calls  ms/call  ms/call name                                                       
 45.2       1.65     1.65 10400248     0.00     0.00 <function `c_parser_aim.to_char_list_memo/1' mode 0> [22]
 12.3       2.10     0.45 10400437     0.00     0.00 <predicate `builtin.unify/2' mode 0> [29]                
 12.3       2.55     0.45  5975244     0.00     0.00 <predicate `c_parser_aim.lexer.punctuator_aux/3' mode 0> [27]
 12.3       3.00     0.45   119606     0.00     0.01 <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]   
  8.2       3.30     0.30  4425004     0.00     0.00 <predicate `c_parser_aim.lexer.keyword_aux/3' mode 0> [28]
  4.1       3.45     0.15   119517     0.00     0.02 <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [21]

Tracking down token

We’ve reduced the percentage of total runtime consumed by to_char_list, but it’s still getting called an awful lot. We know keyword_rec and punctuator_rec don’t call it any more often than they need to, so we need to find out who’s calling them. Let’s follow keyword_rec:
                0.45        1.64  119606/119606      <predicate `c_parser_aim.lexer.keyword/3' mode 0> [23]
[24]    45.0    0.45        1.64  119606         <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]
                0.70        0.70 4425004/10400248    <function `c_parser_aim.to_char_list_memo/1' mode 0> [22]
                0.30        0.49 4425004/4425004     <predicate `c_parser_aim.lexer.keyword_aux/3' mode 0> [28]
keyword is the only caller, no surprise there. Let’s go up one more level:
                0.00        1.64  119606/119606      <predicate `c_parser_aim.lexer.token/3' mode 0> [1]       
[23]    45.0    0.00        1.64  119606         <predicate `c_parser_aim.lexer.keyword/3' mode 0> [23]
                0.00        0.00      57/239246      <predicate `c_parser_aim.lexer.identifier_nondigit/3' mode 0> [67]
                0.45        1.64  119606/119606      <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]        
                0.00        0.00      57/836707      <predicate `char.is_digit/1' mode 0> [59]
Again, no surprise – keyword is called only by token.
                0.00        0.21    7009/119606      <predicate `c_parser_aim.cast_expression/3' mode 0> [8]
                0.00        0.00       2/119606      <predicate `c_parser_aim.compound_statement/3' mode 0>  <cycle 3> [55]
                0.00        0.00     117/119606      <predicate `c_parser_aim.declaration_specifiers/4' mode 0>  <cycle 4> [35]
                0.00        0.00       4/119606      <predicate `c_parser_aim.direct_abstract_declarator/3' mode 0> [52]             
                0.00        0.00       9/119606      <predicate `c_parser_aim.direct_declarator/4' mode 0>  <cycle 2> [50]
                0.00        0.00       4/119606      <predicate `c_parser_aim.direct_declarator_rec/3' mode 0>  <cycle 2> [47]
                0.00        0.00      36/119606      <predicate `c_parser_aim.enum_specifier/3' mode 0> [42]                        
                0.01        0.43   14042/119606      <predicate `c_parser_aim.kw/3' mode 0> [30]            
                0.00        0.00       3/119606      <predicate `c_parser_aim.labeled_statement/3' mode 0> [53]
                0.00        0.00       8/119606      <predicate `c_parser_aim.p/3' mode 0> [51]                
                0.00        0.00       2/119606      <predicate `c_parser_aim.parameter_list/3' mode 0>  <cycle 2> [54]
                0.00        0.00       1/119606      <predicate `c_parser_aim.parameter_type_list/3' mode 0>  <cycle 2> [56]
                0.00        0.00      26/119606      <predicate `c_parser_aim.pointer/2' mode 0> [45]                             
                0.01        0.43   14019/119606      <predicate `c_parser_aim.postfix_expression/3' mode 0> [25]
                0.01        0.86   28036/119606      <predicate `c_parser_aim.primary_expression/3' mode 0> [26]
                0.00        0.00      48/119606      <predicate `c_parser_aim.struct_or_union/2' mode 0> [41]   
                0.00        0.00      36/119606      <predicate `c_parser_aim.type_qualifier/2' mode 0> [43] 
                0.00        0.00     120/119606      <predicate `c_parser_aim.type_specifier/3' mode 0> [37]
                0.00        0.00      12/119606      <predicate `c_parser_aim.typedef_name/3' mode 0> [48]  
                0.02        1.71   56072/119606      <predicate `c_parser_aim.unary_expression/3' mode 0> [20]
[1]    100.0    0.05        3.65  119606         <predicate `c_parser_aim.lexer.token/3' mode 0> [1]
                0.00        0.00  119517/119517      <function `c_parser_aim.lexer.punctuator_list/0' mode 0> [66]
                0.00        0.00      31/31          <function `string.from_char_list/1' mode 0> [60]             
                0.00        0.10  119517/119517      <predicate `c_parser_aim.lexer.constant/2' mode 0> [32]
                0.00        0.00  119548/239246      <predicate `c_parser_aim.lexer.identifier_nondigit/3' mode 0> [67]
                0.00        0.00      31/31          <predicate `c_parser_aim.lexer.identifier_rec/3' mode 0> [71]     
                0.00        1.64  119606/119606      <predicate `c_parser_aim.lexer.keyword/3' mode 0> [23]       
                0.15        1.81  119517/119517      <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [21]
                0.00        0.05  119606/119606      <predicate `c_parser_aim.lexer.whitespace/2' mode 0> [34]
And there’s the culprit. token is called by many other procedures, and as discussed above, almost certainly with repeated inputs. Let’s memoize token.
:- pragma memo(token/3, [fast_loose]).
Now (as we found out last week), parsing our simple test completes in a reasonable amount of time. Let’s give our parser something bigger to chew on.

Factoring out multiple calls from multiplicative_expression

For stress testing, I used a 6kb program from Glen McCluskey’s C99 test suite, tdes_047.c. (It is available in the set of free samples if you wish to try it yourself.) We can use this (or any C99 code) as follows:
$ cpp -P c9_samp/tdes_047.c | ./c_parser_direct
And of course, we are met with no end to the parsing. mprof still tells us that token is responsible for the most CPU usage, but since we’ve already memoized it, let’s work our way up the chain some more. Who is responsible for most of the calls to token?
                0.23        0.23 1358257/95678933    <predicate `c_parser_aim.additive_expression/3' mode 0>  <cycle 2> [19]
                0.00        0.00    7546/95678933    <predicate `c_parser_aim.and_expression/3' mode 0>  <cycle 2> [24]     
                0.00        0.00    2370/95678933    <predicate `c_parser_aim.assignment_operator/2' mode 0> [29]            
                1.36        1.36 8218754/95678933    <predicate `c_parser_aim.cast_expression/3' mode 0>  <cycle 2> [12]
                0.00        0.00       7/95678933    <predicate `c_parser_aim.compound_statement/3' mode 0> [14]              
                0.00        0.00     236/95678933    <predicate `c_parser_aim.conditional_expression/3' mode 0>  <cycle 2> [50]
                0.00        0.00      14/95678933    <predicate `c_parser_aim.declaration/4' mode 0> [5]                             
                0.00        0.00     457/95678933    <predicate `c_parser_aim.declaration_specifiers/4' mode 0>  <cycle 2> [43]
                0.00        0.00     392/95678933    <predicate `c_parser_aim.designator/3' mode 0> [48]                             
                0.00        0.00      95/95678933    <predicate `c_parser_aim.direct_abstract_declarator/3' mode 0> [53]
                0.00        0.00     368/95678933    <predicate `c_parser_aim.direct_declarator/4' mode 0>  <cycle 2> [49]
                0.00        0.00     890/95678933    <predicate `c_parser_aim.direct_declarator_rec/3' mode 0>  <cycle 2> [27]
                0.00        0.00     999/95678933    <predicate `c_parser_aim.enum_specifier/3' mode 0> [38]                        
                0.01        0.01   30183/95678933    <predicate `c_parser_aim.equality_expression/3' mode 0>  <cycle 2> [23]
                0.00        0.00    3773/95678933    <predicate `c_parser_aim.exclusive_or_expression/3' mode 0>  <cycle 2> [26]
                0.00        0.00       2/95678933    <predicate `c_parser_aim.identifier_list/2' mode 0> [57]                         
                0.00        0.00    1886/95678933    <predicate `c_parser_aim.inclusive_or_expression/3' mode 0>  <cycle 2> [31]
                0.00        0.00      17/95678933    <predicate `c_parser_aim.init_declarator/4' mode 0> [8]                          
                0.00        0.00       8/95678933    <predicate `c_parser_aim.init_declarator_list/4' mode 0> [7]
                0.00        0.00       2/95678933    <predicate `c_parser_aim.initializer/3' mode 0>  <cycle 2> [56]
                0.00        0.00      48/95678933    <predicate `c_parser_aim.initializer_list/3' mode 0>  <cycle 2> [45]
                0.02        0.02  139104/95678933    <predicate `c_parser_aim.kw/3' mode 0> [22]                               
                0.00        0.00     943/95678933    <predicate `c_parser_aim.logical_and_expression/3' mode 0>  <cycle 2> [39]
                0.00        0.00     472/95678933    <predicate `c_parser_aim.logical_or_expression/3' mode 0>  <cycle 2> [44] 
                1.01        1.01 6112154/95678933    <predicate `c_parser_aim.multiplicative_expression/3' mode 0>  <cycle 2> [17]
                0.00        0.00    1474/95678933    <predicate `c_parser_aim.p/3' mode 0> [34]                                         
                0.00        0.00    1172/95678933    <predicate `c_parser_aim.pointer/2' mode 0> [32]
                2.72        2.72 16437987/95678933    <predicate `c_parser_aim.postfix_expression/3' mode 0>  <cycle 2> [4]
                8.10        8.10 48898650/95678933    <predicate `c_parser_aim.postfix_expression_rec/3' mode 0> [9]             
                2.20        2.20 13264944/95678933    <predicate `c_parser_aim.primary_expression/3' mode 0> [13]   
                0.03        0.03  181101/95678933    <predicate `c_parser_aim.relational_expression/3' mode 0>  <cycle 2> [21]
                0.08        0.08  452752/95678933    <predicate `c_parser_aim.shift_expression/3' mode 0>  <cycle 2> [20]     
                0.00        0.00      68/95678933    <predicate `c_parser_aim.struct_declaration/3' mode 0>  <cycle 2> [54]
                0.00        0.00     136/95678933    <predicate `c_parser_aim.struct_declarator/3' mode 0>  <cycle 2> [52] 
                0.00        0.00      68/95678933    <predicate `c_parser_aim.struct_declarator_list/3' mode 0>  <cycle 2> [55]
                0.00        0.00    1404/95678933    <predicate `c_parser_aim.struct_or_union/2' mode 0> [36]                        
                0.00        0.00     102/95678933    <predicate `c_parser_aim.struct_or_union_specifier/3' mode 0>  <cycle 2> [33]
                0.00        0.00    2334/95678933    <predicate `c_parser_aim.type_qualifier/2' mode 0> [30]                            
                0.00        0.00    4079/95678933    <predicate `c_parser_aim.type_specifier/3' mode 0>  <cycle 2> [25]
                0.00        0.00     333/95678933    <predicate `c_parser_aim.typedef_name/3' mode 0> [51]                   
                0.09        0.09  553352/95678933    <predicate `c_parser_aim.unary_expression/3' mode 0>  <cycle 2> [18]
[6]     52.1   15.85       15.85 95678933         <predicate `c_parser_aim.lexer.token/3' mode 0> [6]                          
                0.00        0.00      96/96          <function `c_parser_aim.lexer.punctuator_list/0' mode 0> [81]
                0.00        0.00      58/58          <function `string.from_char_list/1' mode 0> [64]             
                0.00        0.00     108/108         <predicate `c_parser_aim.lexer.constant/2' mode 0> [88]
                0.00        0.00     166/412         <predicate `c_parser_aim.lexer.identifier_nondigit/3' mode 0> [86]
                0.00        0.00      58/58          <predicate `c_parser_aim.lexer.identifier_rec/3' mode 0> [97]     
                0.00        0.00     225/225         <predicate `c_parser_aim.lexer.keyword/3' mode 0> [83]       
                0.00        0.00      96/96          <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [95]
                0.00        0.00     225/225         <predicate `c_parser_aim.lexer.whitespace/2' mode 0> [82]
It seems like postfix_expression_rec is the winner; it is responsible for over half the calls to token. Tracing it back similarly to how we traced keyword_rec, we find that it is called solely by postfix_expression 8,149,775 times, which is called solely by unary_expression a similar number of times, which is called almost exclusively by cast_expression a similar number of times, which is called almost exclusively by multiplicative_expression, which is called… only 2,054,666 times:
                                 2054666             <predicate `c_parser_aim.additive_expression/3' mode 0>  <cycle 2> [19]
[17]     7.2    0.70        2.18 2054666         <predicate `c_parser_aim.multiplicative_expression/3' mode 0>  <cycle 2> [17]
                1.01        1.01 6112154/95678933    <predicate `c_parser_aim.lexer.token/3' mode 0> [6]                            
                0.47        0.47 6112154/87459316    <unification predicate for type `c_parser_aim.lexer.token/0' mode 0> [11]
                                 8218660             <predicate `c_parser_aim.cast_expression/3' mode 0>  <cycle 2> [12]
In other words, multiplicative_expression is responsible for multiplying the number of calls passing through it to token fourfold. Let’s look at multiplicative_expression (again, with Typedefs elided):
multiplicative_expression -->
    (cast_expression, p("*"), multiplicative_expression) -> [];
    (cast_expression, p("/"), multiplicative_expression) -> [];
    (cast_expression, p("%"), multiplicative_expression) -> [];
    cast_expression.
A-ha, a repeated prefix! We know how to fix this:
multiplicative_expression -->
    cast_expression,
    ((p("*"), multiplicative_expression) -> [];
     (p("/"), multiplicative_expression) -> [];
     (p("%"), multiplicative_expression) -> [];
     []).
Or, if we’d prefer to leave the structure of the grammar intact at the cost of memoization overhead, we can memoize cast_expression:
:- pred cast_expression(list(char)::in, list(char)::out) is semidet.
:- pragma memo(cast_expression/2, [fast_loose]).
(Note the :- pred declaration is necessary as you cannot memoize undeclared modes.)

Notice that most of the other _expression productions follow the same pattern; all of them will in fact benefit from prefix-factoring or memoization. Let’s do that and move on.

Memoizing multiple calls to unary_expression

Despite all our optimization so far, token (and postfix_expression_rec in turn) still reigns supreme. However, tracing the chain of calls back this time now reveals this:
                                  170474             <predicate `c_parser_aim.assignment_expression/3' mode 0>  <cycle 2> [15]
                                  198268             <predicate `c_parser_aim.cast_expression/3' mode 0>  <cycle 2> [32]      
[18]     7.2    0.10        0.32  368742         <predicate `c_parser_aim.unary_expression/3' mode 0>  <cycle 2> [18]
                0.03        0.06  171588/338952      <predicate `c_parser_aim.kw/3' mode 0> [30]                           
                0.13        0.13  686352/12057646    <predicate `c_parser_aim.lexer.token/3' mode 0> [7]
                0.03        0.03  686352/11652569    <unification predicate for type `c_parser_aim.lexer.token/0' mode 0> [17]
                                  368742             <predicate `c_parser_aim.postfix_expression/3' mode 0>  <cycle 2> [9]
unary_expression is called approximately an equal number of times by both assignment_expression and cast_expression. This makes it a perfect candidate for memoization:
:- pred cast_expression(typedefs::in, list(char)::in, list(char)::out) is semidet.
:- pragma memo(cast_expression/3, [fast_loose]).

Ad nauseum

And that’s about all there is to it: following this method I found I needed to memoize two more procedures and factor one more, and the parser parsed the test file faster than time could reliably distinguish. Of course, it would behoove us to test on different inputs in order to exercise different grammar productions, but the method is the same.

Summary

To recap:
  1. Use mprof -c to determine which procedure is the largest time hog.
  2. If the procedure is called by primarily one other procedure on a 1:1 basis, or if you have already optimized it, look instead at its most prominent caller (etc.).
  3. If the procedure is called by primarily one other procedure on a greater than 1:1 basis and it is a common prefix of this caller, factor it out.
  4. Otherwise, memoize it.
  5. Recompile and repeat.
This simple method will asymptotically increase the speed of almost any naïve PEG parser, in my experience, from unusable to quite fast, and will avoid the overhead associated with unnecessarily memoizing every grammar production in a parser.

Download c_parser_memo.m

Further reading

The direct encoding of PEGs in Mercury, and the performance implications thereof, were first explored in-depth by Ralph Becket and Zoltan Somogyi in the paper DCGs + Memoing = Packrat Parsing; But is it worth it?; I highly recommend it as further reading for those interested in the subject.

Sunday, October 9, 2011

Parsing Expression Grammars part 2

Continuing last week’s post, I’d like to give an example of implementing a real-life PEG. What better than the C language, specifically, preprocessed C99. (Note: It may be helpful to the reader to open the linked PDF to Annex A, “Language syntax summary”.)

Recall that PEGs differ from more commonly used CFGs in that they are completely deterministic, greedy, and don’t allow left recursion. This can make translating some grammars difficult or impossible if they rely on those features. Fortunately, C99’s grammar is straightforward enough that, with a little thought, we can work around these problems. Let’s see how.

The Lexer


First off, let’s create a submodule for the lexer like so:
:- module lexer.
:- interface.
From the lexer we will export two predicates, one which lexes single tokens (token), and one which lexes a sequence of whitespace (whitespace):
:- pred token(token::out, list(char)::in, list(char)::out) is semidet.
:- pred whitespace(list(char)::in, list(char)::out) is det.
Note: it’s important that we define the mode of the token argument of token as out rather than in, even though they are logically equivalent. This will allow us later to memoize token on one argument instead of two, greatly reducing our memory use at negligible (if any) processing overhead. This trick is possible precisely because PEGs are deterministic: the determinism of token with token as out will still be semidet, rather than nondet (and hence unmemoizable) as it would be with a CFG.

The C99 standard defines 5 kinds of lexical tokens our lexer will need to generate: keyword, identifier, constant, string-literal, and punctuator. A glance through the C99 grammar reveals that merely to parse the grammar, we do not need to know the specific values of constants and string-literals, so let’s define the type of a lexing token like so:
:- type token --->
    keyword(string);
    identifier(string);
    constant;
    string_literal;
    punctuator(string).
Of course, if we were constructing a parse tree, we would want to hold onto the values of constants and string-literals, but for now since we are just matching the grammar it is OK to leave them out. Onto the implementation!
:- implementation.

Whitespace, or, using built-in character classes


Let’s get whitespace out of the way. The C99 definition of whitespace corresponds to Mercury’s definition of whitespace, so we can import char and define whitespace like so:
whitespace --> ([W], {is_whitespace(W)}) -> whitespace; [].

Tokens, or, parameterizing rules


C99 defines token as follows:
token:
        keyword
        identifier
        constant
        string-literal
        punctuator
A straightforward translation of this to a PEG in Mercury would read:
token -->
    keyword -> [];
    identifier -> [];
    constant -> [];
    string-literal -> [];
    punctuator.
However because this is an unordered choice in the spec, but an ordered choice in Mercury, we need to apply a moment’s thought that choosing this ordering will not change the language. Fortunately this is easy to confirm: the latter four each must start with a different character (letter, digit, quote, and non-quote symbol, respectively), and the first (keyword) should be matched in favor of the second (identifier) anyway.

The C99 spec assumes that whitespace is removed in some phase in-between lexing and parsing, and that string literals are concatenated. Because we will be attaching our lexer directly to our parser (such is the magic of PEGs), we will need to perform these two steps in our tokenizer. Finally, we will want to actually produce a lexing token as well (rather than just match it). The final implementation of token is thus:
token(Token) -->
    whitespace,
    (keyword(KW) -> {Token = keyword(KW)};
     identifier(Ident) -> {Token = identifier(Ident)};
     constant -> {Token = constant};
     (string_literal, (token(string_literal) -> []; [])) ->
        {Token = string_literal};
     punctuator(Punct), {Token = punctuator(Punct)}).
Parameterizing rules like this can be very powerful, as we’ll see later in this post. However let’s tackle each kind of token in turn.

Keywords, or, fixing choice using negative lookahead assertions


keyword looks deceptively simple; it is merely a list of valid C99 keywords. We could be tempted to implement it as such:
keyword(Keyword) --> % WRONG
    ['a', 'u', 't', 'o'] -> {Keyword = "auto"};
    ['b', 'r', 'e', 'a', 'k'] -> {Keyword = "break"};
    etc.
But the devil is in the details. We must ensure that no keyword is matched against the prefix of another keyword or identifier (e.g. we should not match the “do” in the keyword “double”, or the “for” in the identifier “format”). The latter can be fixed by using a negative lookahead assertion:
keyword(Keyword) -->
    (['a', 'u', 't', 'o'] -> {Keyword = "auto"};
     ['b', 'r', 'e', 'a', 'k'] -> {Keyword = "break"};
     etc.),
    not (identifier_nondigit(_) -> []; digit(_)).
In other words, a keyword must not be followed immediately by one of the characters which is valid in a keyword or identifier (defined later). Note that this won’t fix partial keyword matching, since if the assertion fails the parse as a keyword will be rejected entirely. Either we should move the negative lookahead assertion into the bodies of the branch, or simply rearrange the offending keywords from longest to shortest. Since “do” and “double” are the only offenders, it’s easy enough to swap them. (We’ll encounter a better example later.)

(Note: In the completed lexer linked below, I add a little extra machinery to allow listing keywords as strings in a list. There is nothing PEG-specific about this technique, and it saves me only keystrokes.)

Identifiers, or, transforming left-recursive productions


Identifiers take a recursive form commonly seen in the C99 spec:
identifier:
        identifier-nondigit
        identifier identifier-nondigit
        identifier digit
That is, an identifier is a certain initial production followed by a sequence of either of two other productions. A direct implementation is no good:
identifier --> % WRONG
    identifier_nondigit --> [];
    (identifier, identifier_nondigit) --> [];
    (identifier, digit).
This will either match a single identifier_nondigit, or, failing to do so, will get caught in a loop calling itself again having consumed no input. Not what we wanted. We need to split the initial call to identifier_nondigit out, and write a right-recursive rule for the rest of the identifier:
identifier --> identifier_nondigit, identifier_rec.

identifier_rec -->
    (identifier_nondigit, identifier_rec) -> [];
    (digit, identifier_rec) -> [];
    [].
Because identifier-nondigit and digit are disjoint, order is not important. This formulation is a proper PEG and will behave as expected.

Of course, we determined earlier that it was important to us to know what identifier was parsed. We can easily add expressions to our rules to do just this:
identifier(from_char_list(Ident): string) -->:
    identifier_nondigit(C), identifier_rec(Ident0),
    {Ident = [C | Ident0]}.

identifier_rec(Ident) -->
    (identifier_nondigit(C), identifier_rec(Ident0)) ->
        {Ident = [C | Ident0]};
    (digit(D), identifier_rec(Ident0)) ->
        {Ident = [D | Ident0]};
    {Ident = []}.

identifier_nondigit(D) -->
    nondigit(D0) -> {D = D0};
    universal_character_name(D0), {D = D0}.
(The typecast to string in the head of identifier is necessary because from_char_list is overloaded and Mercury gets confused otherwise.)

identifier-nondigit, nondigit, and digit are straightforward and use the techniques described thus far. I’m going to skip over them and move onto constants.

Constants, or, fixing choice using order


C99 defines constants as:
constant:
       integer-constant
       floating-constant
       enumeration-constant
       character-constant
Now, the astute reader will notice that something will go wrong here: an integer literal is also a valid floating-point literal in C99, so if we wrote this production directly as a PEG, it would always match the integer part of a floating-point literal as an integer. Again, not what we want.

We could use a negative lookahead assertion as above to fix this, but easier still is to change the order, prioritizing floats about integers:
constant -->
    floating_constant -> [];
    integer_constant -> []; 
    enumeration_constant -> [];
    character_constant.
Easy enough! Since enumeration constants must always start with a letter (which floats and integers do not), and character constants must always start with a single quote, we are all set here.

The rest of the lexer for constants and string literals uses one of these techniques so let’s move on to the last lexing token, punctuators.

Punctuators, or, fixing choice by ordering by length


punctuator, like keyword, is merely a list of strings of symbols which are meaningful in C99. This list presents the same problem that the list of keywords presents. Namely, that a shorter punctuator may match the prefix of what is in fact a longer punctuator. (For example, < may match the symbol <<= in source text.)

For this production, we can use the solution of ordering the punctuators from longest to shortest. This works without much further thought. (Again, I use some extra code to allow me to define this production as a list of strings in my example code.) And that about does it for the lexer.
:- end module lexer.

The Parser


The parser is, largely, trivial, using the same four techniques described above for the lexer (rewriting left-recursion and the three ordering techniques). Since the grammar makes lots of reference to individual kinds of tokens, I defined a few shortcuts:
kw(KW) --> token(keyword(KW)).
id(Ident) --> token(identifier(Ident)).
const --> token(constant).
str_lit --> token(string_literal).
p(Punct) --> token(punctuator(Punct)).
The grammar also references the enumeration-constant production, which, although not a token directly, is merely an identifier and we can treat it as such:
enumeration_const --> id(_).
Ensuring that all calls to the lexer are funneled through token will make our lives easier when we optimize later.

Now before I sign off, there are two gotchas in the grammar, for both of which C is infamous.

The “dangling else”


C’s if/then/[else] has the problem that in a CFG, it is ambiguous. Does if (1) if (0) puts("hi"); else puts("bye"); print bye or nothing? The C99 standard indicates that an else should match the nearest if it legally can (so the above code would print bye).

Fortunately with PEGs, this ambiguity is trivial to solve. We need simply write the if/then/else case first in the grammar. Any else will be matched by the innermost (and thus nearest) if possible:
selection_statement -->
    (kw("if"), p("("), expression, p(")"), statement,
     kw("else"), statement) -> [];
    (kw("if"), p("("), expression, p(")"), statement) -> [];
    etc.

Typedefs


For the unfamiliar, a typedef in C is a means of defining an alias for an existing type specification. There’s nothing odd about that, except that types in C are keywords and the set of types determines the parse. In other words, C is not context-free.

Even though PEGs can directly represent some non-context-free grammars, C is not one of them. Fortunately the problem is easily solved by parameterizing the grammar and passing around a set of “typedefed” identifiers:
:- import bool, set.
:- type typedefs == set(string).
Typedefs are created in declarations, if one of the declaration-specifiers is a storage-class-specifier which is the keyword typedef:
declaration(!Typedefs) -->
    declaration_specifiers(!.Typedefs, HasTypedef),
    (init_declarator_list(Decls0) ->
        {Decls = Decls0}; {Decls = init}),
    p(";"), {if HasTypedef = yes then union(Decls, !Typedefs) else true}.

declaration_specifiers(Typedefs, HasTypedef) -->
    (storage_class_specifier(IsTypedef),
     declaration_specifiers_opt(Typedefs, HasTypedef0)) ->
        {HasTypedef: bool = IsTypedef `or` HasTypedef0};
    (type_specifier,
     declaration_specifiers_opt(Typedefs, HasTypedef0)) ->
        {HasTypedef = HasTypedef0};
    etc.

declaration_specifiers_opt(Typedefs, HasTypedef) -->
    declaration_specifiers(Typedefs, HasTypedef0) ->
        {HasTypedef = HasTypedef0};
    {HasTypedef = no}.

storage_class_specifier(IsTypedef) -->
    kw("typedef") -> {IsTypedef = yes};
    kw("extern")  -> {IsTypedef = no}; 
    kw("static")  -> {IsTypedef = no};
    kw("auto")    -> {IsTypedef = no};
    kw("register"),  {IsTypedef = no}.
(Note the use of state variable syntax, and the factored-out declaration_specifiers_opt, which makes the code passably legible.) Typedef declarations may legally exist in block-items, scoped within compound-statements:
compound_statement(Typedefs) -->
    p("{"), (block_item_list(Typedefs, _) -> []; []), p("}").

block_item_list(!Typedefs) -->
    (block_item(!Typedefs), block_item_list(!Typedefs)) -> [];
    block_item(!Typedefs).

block_item(!Typedefs) --> declaration(!Typedefs) -> []; statement.
and as external-declarations:
translation_unit(!Typedefs) -->
    (external_declaration(!Typedefs), translation_unit(!Typedefs)) -> [];
    external_declaration(!Typedefs).

external_declaration(!Typedefs) -->
    function_definition -> [];
    declaration(!Typedefs).
Finally, typedef-name itself references the set of typedefs:
typedef_name(Typedefs) --> id(Ident), {set.member(Ident, Typedefs)}.
The rest of the implementation of typedefs involves parameterizing most productions to pass the set of typedefs from external_declaration down to typedef_name, and parameterizing the few illegal uses of typedefs (for example, in function signatures) to simply drop the modified set of typedefs. Nothing brain-bending; just a lot of typing.

Summary & Test


In today’s post we saw how to translate a real language grammar, that of C99, into a Mercury program. We learned how to transform left-recursive rules into right-recursive ones, how to ensure proper production choice by using negative lookahead assertions, ordering by precedence, and ordering by length, and how to implement the non-context-free elements of C99.

Let’s add a main function similar to yesterday’s, compile, and try out our spiffy new parser:
$ ./c_parser_direct
int main(void) {}
…and wait, and wait, and wait, and… what’s going on? Why are we getting no response? Do we have an infinite loop?

Nope. Add this line in the lexer module:
:- pragma memo(token/3, [fast_loose]).
$ ./c_parser_direct
int main(void) {}
(wait for it…)
Success!
Success indeed, but it’s still slow. More on memoization and how to speed this up even more in part 3.

Download c_parser_direct.m

Sunday, October 2, 2011

Parsing Expression Grammars part 1

First, I’d like to say welcome back! AiM is returning from its hiatus since April. The summer has been busy for me, but I hope to update AiM weekly henceforth. Let’s start!

Parsing Expression Grammars


Parsing expression grammars (PEGs) are a new-ish formal grammar intended to describe computer languages. PEGs differ from the more commonplace context-free grammars most notably in that PEGs offer an ordered choice operator. The second branch of an ordered choice implicitly does not match any expression which is matched by the first branch. Since any text can only match exactly one branch of an ordered choice, this has the practical implication that a PEG cannot be ambiguous and rule precedence declarations are not needed.

Another natural consequence of ordered choice is that so-called repetition operators (* and +) and the option operator (?) are necessarily greedy. (Any time a grammar rule can recur, ordered choice ensures that it does recur, since the option of not recurring will implicitly not match.)

A PEG rule, as described in the Wikipedia entry, may be either:
  • a terminal (a symbol),
  • a nonterminal (another rule), or
  • the empty string.
or may be constructed from other rules by:
  • sequence: A B
  • ordered choice: A / B
  • zero or more: A*
  • one or more: A+
  • optional: A?
  • positive lookahead: &A
  • negative lookahead: !A
Positive lookahead and negative lookahead test for a rule match (or lack thereof) but do not consume input.

Translating into Mercury


It’s well-known that CFGs can be implemented directly in logic languages such as Mercury as recursive-descent parsers. Prolog and Mercury even have a special syntax known as definite clause grammar (DCG) to facilitate doing so. It’s similarly easy to implement PEGs, and there’s a surprise at the end!. Let’s see how.

PEG rules can be represented using DCG clauses like so:

rule --> body

The type and mode of rules can be inferred by Mercury, but if you need to declare them, you can do so as:

:- pred rule(list(char)::in, list(char)::out) is semidet.

The elements of the body of a PEG are translated as follows:
terminal C
[C]
nonterminal A
A
empty string
[]
sequence A B
A, B
ordered choice A / B
A -> []; B
Note that using -> [] ; (if/else) instead of ; gives us exactly the greedy semantics we require.
optional A?
A -> []; []
positive lookahead &A
not not A
or equivalently, =(S), A, :=(S)
negative lookahead !A
not A

Repetition constructions (A* and A+) are recursive in nature and cannot be represented inline. However we can define higher order rules to represent them as:
:- mode *(pred(in, out) is semidet, in, out) is det.
*(P) --> (P, *(P)) -> []; [].
:- mode +(pred(in, out) is semidet, in, out) is semidet.
+(P) --> (P, +(P)) -> []; P.
Then we can translate repetition constructions as:
zero or more: A*
*(A) or *(pred(in, out) is semidet --> A)
one or more: A+
+(A) or +(pred(in, out) is semidet --> A)
The first, shorter form, can only be used for named rules with defined modes. The longer form is applicable to any rule construct.

Example


Let’s translate the example of basic arithmetic expressions from Wikipedia:
Value   ← [0-9]+ / '(' Expr ')'
Product ← Value (('*' / '/') Value)*
Sum     ← Product (('+' / '-') Product)*
Expr    ← Sum
First let’s get some boilerplate out of the way. We’ll need the list module for DCG-syntax terminals:
:- module peg_example.

:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.

:- import_module list.
Now onto Value. Right off the bat Wikipedia throws us a curveball with the range expression [0-9]. Fortunately that’s easily translated as a specially-moded function:
:- mode in..in = in is semidet.
A..B = C :- A @=< C, C @=< B.
Even though (in fact, because) .. is a builtin operator, we can use it in definitions as well.

Armed with our new range operator, we can translate the Value rule as:
value -->
    +(pred(in, out) is semidet --> ['0'..'9']) -> [];
    (['('], expr, [')']).
Note that the lambda-predicate (pred(in, out) is semidet -->) is needed since ['0'..'9'] is not a predicate on its own. The translation is otherwise straightforward.

Product, Sum, and Expr follow similarly:
product -->
    value, *(pred(in, out) is semidet -->
             ((['*'] -> []; ['/']), value)).

sum -->
    product, *(pred(in, out) is semidet -->
               ((['+'] -> []; ['-']), value)).

expr --> sum.
Our main function will read a line, try to parse it using expr and make sure that only the newline at the end is left unparsed:
main(!IO) :-
    read_line(Res, !IO),
    (if Res = ok(Line) then
        (if expr(Line, ['\n'])
         then print("Success!\n", !IO)
         else print("Failure!\n", !IO)),
        main(!IO)
     else true).
Compile with mmc --infer-all --make peg_example, run it and try out some expressions, keeping in mind that our grammar doesn’t allow negatives, decimals, or whitespace:
$ ./peg_example
123+456
Success!
5/(10-20)
Success!
45-(40
Failure!
*12
Failure!
a+18
Failure!
Of course, a parser that merely tests whether input is well-formed is only mildly useful. Next time, we’ll look into creating parse trees, and the real-life example of parsing C code.

Download peg_example.m

Monday, April 4, 2011

Dependent typing: reversing non-empty lists

This is the first post in a series on dependent typing. For those not familiar with dependent typing, it is the ability to describe data using types which say something about the particular value of the data itself – the type of the data depends on the value of the data. For example, the phrase “even integers” could describe a dependent type which is a subtype of integers. Whether a number belongs to that type depends on whether its value is evenly divisible by 2. (Note that any subtype can be viewed as a simple kind of dependent type.)

The obvious use of dependent typing is to describe invariants of your program, for example, “a non-NULL pointer”, “an array of length 5”, or “a non-empty list”. While full-blown dependent typing requires a full-blown theorem prover (such as Coq or ATS/LF), many mainstream languages support basic dependent types. Java can infer whether a value is not null, OCaml can talk about types which are made up of only certain constructors, Mozilla’s new language Rust can attach arbitrary predicates to types, and I would be remiss not to mention Mercury.

Mercury has limited but useful support for dependent typing via its mode system. The mode system appears complicated, so let’s start with a brief overview.

The mode system


Each predicate (or function) in Mercury does something to its arguments. The most common possibilities are to look at an argument and leave it be (this would be an input argument), or to attempt to bind it to a value (this would be an output argument). Another possibility is that it destroys its argument after looking at it (this happens with the IO state). These possible actions are called modes.

Modes are described simply by the expected state of the argument before the call, and the guaranteed state of the argument after the call using the syntax Before >> After. These states (called instantiatednesses or insts for short) may be one of free (the argument is unbound), ground or bound (the argument is bound to a value), clobbered (the argument is destroyed or overwritten), and unique (the argument is bound to a value which is not referenced anywhere else). The latter two are used mainly for IO state and arrays, so we will focus on free and bound.

The modes we are all familiar with, in and out, are defined in the module builtin as ground >> ground and free >> ground, respectively. In other words, an in argument is expected to be bound to a value before the call and is left that way after, and an out argument may be unbound before the call, but will be bound after.

The magic is that we may tell Mercury that we expect or guarantee an argument to be bound only to certain constructors! To do this, we must first define the instantiatedness in which we are interested, say non-empty lists:
:- inst non_empty_list == bound([ground | ground]).
This defines the non_empty_list inst as a variable bound to a list cons cell, the head and tail of which are both bound. We could similarly define the inst of an empty list as:
:- inst empty_list == bound([]).
In other words, an empty_list is a variable which is bound to the empty list constructor.

Note that there is also a shorthand which allows the above two insts to be written similarly to type definitions:
:- inst non_empty_list ---> [ground | ground].
:- inst empty_list ---> [].
We could then define the mode of an input argument which is expected to be a non-empty list as:
:- mode in_non_empty_list == non_empty_list >> non_empty_list.
In other words, an argument with this mode would be expected to be bound to a non-empty list, and would remain bound that way afterward. Similarly:
:- mode out_non_empty_list == free >> non_empty_list.
An argument with this mode may be unbound before the call, but is guaranteed to be bound to a non-empty list after. Mercury actually has the mode-functions in() and out() already defined, so we may write in(non_empty_list) or out(non_empty_list) to describe these same modes.

Example


Note: due to bug 191, the following example won’t trigger the proper error messages from the compiler as of this post. Download and apply this patch against the compiler to fix the issue, or work around the bug by replacing the two out modes with out(list(ground)), which won’t change the meaning but will avoid triggering the bug. UPDATE: parts of the standard library won’t compile without the bug, so don’t go applying that patch just yet.

On a recent thread at Hacker News, a question was brought up about proving that [X | Xs] = reverse([Y | Ys]) was total (or in Mercury terms, det). Let’s see if we can do this in Mercury using our new tools. First let’s define reverse2 (named so not to conflict with built-in list.reverse) using a recursive auxiliary function with an accumulator:
:- import_module list.

:- func reverse2(list(A)) = list(A).
:- mode reverse2(in) = out.
reverse2(A) = reverse2_aux(A, []).

:- func reverse2_aux(list(A), list(A)) = list(A).
:- mode reverse2_aux(in, in) = out.
reverse2_aux([], Acc) = Acc.
reverse2_aux([A | As], Acc) = reverse2_aux(As, [A | Acc]).
(I have included the mode declarations as we will be adding to them later.) Testing this function in our main predicate is simple:
main(!IO) :-
    Y = 1, Ys = [2,3,4,5],  
    [X | Xs] = reverse2([Y | Ys]),
    print({X, Xs}, !IO), nl(!IO).
A sharp eye will spot the problem before we even compile: the deconstruction [X | Xs] = can fail, which would cause main to be semidet instead of det (that is, a partial function). The compiler spots this as well:
In `main'(di, uo):
  error: determinism declaration not satisfied.
  Declared `det', inferred `semidet'.
  Unification with `list.[X | Xs]' can fail.
Of course this isn’t true. What the compiler would love to see is a mode declaration stating that if reverse2 is passed a non-empty list, then it will return a non-empty list. Let’s write that:
:- mode reverse2(in(non_empty_list)) = out(non_empty_list).
Now we get a different error:
In clause for `reverse2(in((list.non_empty_list))) =
  out((list.non_empty_list))':
  mode error: argument 2 did not get sufficiently instantiated.
  Final instantiatedness of `HeadVar__2' was `ground',
  expected final instantiatedness was `bound(list.'[|]'(ground,
  ground))'.
(Note: if you didn’t/can’t apply the above compiler patch, as a workaround, replace out in the original mode declarations with out(list(ground)).)

Here, HeadVar__2 refers to the function result (what would be the second variable in the head were reverse2 a predicate). The compiler expected the function result to be non-empty (bound to list.[|]) but it was bound to an unknown value of type list. Of course this is because reverse2_aux doesn't guarantee a non-empty result on non-empty input, so let’s add a declaration stating so as well:
:- mode reverse2_aux(in(non_empty_list), in) = out(non_empty_list).
(Note that we don’t say anything about the accumulator in this declaration.) Unfortunately recompiling gives us a similar error, this time complaining about the second (the recursive) clause of reverse2_aux. Of course – if our initial list had only one item, the recursive call would pass reverse2_aux an empty list, triggering the first clause, and we haven’t made (and can’t make) any special guarantees about that case. Let's take a closer look at that clause:
reverse2_aux([], Acc) = Acc.
The only way that this clause will return a non-empty list is if the accumulator, Acc, is non-empty. Fortunately, it is called recursively with a non-empty accumulator (since elements are prepended on every recurrence), so this should be enough to convince the compiler. Let’s add a mode declaration stating this fact:
:- mode reverse2_aux(in, in(non_empty_list)) = out(non_empty_list).
Success! Compilation is successful, meaning main has been proved to be det (a total function), and running the program gives the expected output:
{5, [4, 3, 2, 1]}
Using Mercury’s mode system, we were able to prove that reversing a non-empty list results in a non-empty list, by adding descriptive mode declarations in strategic locations.

As an exercise, add an inst describing a list of at least two elements and add modes to reverse2 and reverse2_aux which make guarantees about lists of at least two elements.

Download reverse.m