Wednesday, 24 January 2024

SGF Grammar Parser with Peggy

Ideally, I would like to parse SGF's complete grammar. However, at this point, I'm stuck at trying to handle the recursive part only. Here's my feeble attempt at it so far:

import { generate } from "peggy"

const grammar = /* peggy */ `
  string = .*

  parensList = '(' string parensList ')'
             / string
`

const sgf1 =
  "(;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2023-12-25];B[pd];W[dd];B[pq];W[dp])"

const parser = generate(grammar)

const parse = parser.parse(sgf1)

console.log(parse)
// [
//   '(', ';', 'G', 'M', '[', '1', ']', 'F', 'F', '[', '4',
//   ']', 'C', 'A', '[', 'U', 'T', 'F', '-', '8', ']', 'A',
//   'P', '[', 'S', 'a', 'b', 'a', 'k', 'i', ':', '0', '.',
//   '5', '2', '.', '2', ']', 'K', 'M', '[', '6', '.', '5',
//   ']', 'S', 'Z', '[', '1', '9', ']', 'D', 'T', '[', '2',
//   '0', '2', '3', '-', '1', '2', '-', '2', '5', ']', ';',
//   'B', '[', 'p', 'd', ']', ';', 'W', '[', 'd', 'd', ']',
//   ';', 'B', '[', 'p', 'q', ']', ';', 'W', '[', 'd', 'p',
//   ']', ')'
// ]

Peggy is the successor of Peg.js.

I think I'm failing to identify how to make this recursive properly. How do I make it identify a ( and get into another level with parensList? (I think I need to define string without ( and ) as well...)

What I'm expecting as a result is some sort of tree or JSON like this:

<Branch>{
  moves: [
    <Move>{
      [property]: <Array<String>>
    },
    ...
  ],
  children: <Array<Branch>>
}

But this would be fine as well:

<NodeObject>{
  data: {
    [property]: <Array<String>>
  },
  children: <Array<NodeObject>>
}

SGF is basically a text-based tree format for saving Go (board game) records. Here's an example — SGF doesn't support comments, and it's usually a one-liner, the code below is just to make it easier to read and understand —:

(
  ;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2023-12-25] // Game Metadata
  ;B[pd] // Black's Move (`pd` = coordinates on the board)
  ;W[dd] // White's Move
    ( // Parentheses denote a branch in the tree
      ;B[pq]
      ;W[dp]
    )
    (
      ;B[dp]
      ;W[pp]
    )
)

You could also have more than one tree at the top, which would yield something like (tree)(tree)...:

(;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2023-12-25];B[pd];W[dd];B[pq];W[dp])(;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2023-12-25];B[pd];W[dd](;B[pq];W[dp])(;B[dp];W[pp]))

The whole grammar is this:

Collection     = { GameTree }
GameTree       = "(" RootNode NodeSequence { Tail } ")"
Tail           = "(" NodeSequence { Tail } ")"
NodeSequence   = { Node }
RootNode       = Node
Node           = ";" { Property }
Property       = PropIdent PropValue { PropValue }
PropIdent      = UcLetter { UcLetter }
PropValue      = "[" Value "]"
UcLetter       = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

You can use the editor Sabaki to create SGF files.



from SGF Grammar Parser with Peggy

No comments:

Post a Comment