Usage
=====

This page provides a quick introduction to using ``GrewTSE`` to create
minimal-pair datasets from dependency treebanks and evaluate language models.

Quick Start
-----------

The typical workflow is:

1. Parse a treebank
2. Define a GREW query and dependency node
3. Generate a dataset (masked or prompt-based)
4. Generate minimal pairs
5. Optionally evaluate a model and visualise results

Importing ``GrewTSE``:

.. code-block:: python

   from grewtse.pipeline import GrewTSEPipe


Parsing a Treebank
------------------

You must first load a CoNLL-U file, the standard format for Universal Dependencies (UD) treebanks. You can learn more about the ``.conllu`` format in the `CoNLL-U format documentation <https://universaldependencies.org/format.html>`_.

.. code-block:: python

   gpipe = GrewTSEPipe()

   path = "./treebanks"
   treebank_path = f"{path}/example-treebank.conllu"
   gpipe.parse_treebank(treebank_path)

Alternatively, you may supply a list of treebank files. We recommend that these all come from the same UD treebank, for example its train, dev and test splits.

.. code-block:: python

   gpipe = GrewTSEPipe()

   path = "./treebanks"
   treebank_paths = [
       f"{path}/example-treebank-train.conllu",
       f"{path}/example-treebank-dev.conllu",
       f"{path}/example-treebank-test.conllu",
   ]
   gpipe.parse_treebank(treebank_paths)
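
For orientation, the sketch below shows what a CoNLL-U file contains: one token per line with ten tab-separated columns, ``#`` comment lines, and a blank line after each sentence. It uses only the standard library and a made-up sample sentence; ``read_conllu`` is a hypothetical helper for illustration and says nothing about how ``GrewTSE`` parses treebanks internally.

.. code-block:: python

   # Illustrative only: a tiny CoNLL-U reader (not part of GrewTSE).
   CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                    "feats", "head", "deprel", "deps", "misc"]

   # Made-up sample sentence in CoNLL-U layout.
   SAMPLE = (
       "# text = The boy eats the cake\n"
       "1\tThe\tthe\tDET\t_\tDefinite=Def|PronType=Art\t2\tdet\t_\t_\n"
       "2\tboy\tboy\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_\n"
       "3\teats\teat\tVERB\t_\tNumber=Sing|Person=3|Tense=Pres\t0\troot\t_\t_\n"
       "4\tthe\tthe\tDET\t_\tDefinite=Def|PronType=Art\t5\tdet\t_\t_\n"
       "5\tcake\tcake\tNOUN\t_\tNumber=Sing\t3\tobj\t_\t_\n"
       "\n"
   )

   def read_conllu(text):
       """Return a list of sentences, each a list of token dicts."""
       sentences, current = [], []
       for line in text.splitlines():
           if not line.strip():
               # A blank line closes the current sentence.
               if current:
                   sentences.append(current)
                   current = []
           elif line.startswith("#"):
               # Sentence-level comments such as "# text = ..." are skipped.
               continue
           else:
               current.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
       if current:
           sentences.append(current)
       return sentences

   sentences = read_conllu(SAMPLE)
   print([tok["form"] for tok in sentences[0]])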

Defining a GREW Query
---------------------

A GREW query specifies the syntactic phenomenon to target. The
``dependency_node`` must be a variable in the query and represents the token
to manipulate when generating minimal pairs.
Remember to **include the** ``pattern { }`` **syntax** in the query.

.. code-block:: python

   grew_query = """
   pattern {
     V [upos=VERB];
     DirObj [Case=Acc];
     V -[obj]-> DirObj;
   }
   """

   dependency_node = "DirObj"
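
The two requirements above can be checked before running the pipeline. The sketch below is a rough pre-flight check, assuming the target node is declared with a feature block such as ``DirObj [...]``; ``check_query`` is a hypothetical helper, not part of ``GrewTSE``, and it will reject valid queries whose node appears only in an edge clause.

.. code-block:: python

   import re

   def check_query(grew_query, dependency_node):
       """Hypothetical sanity check for a query and its target node."""
       # The query must contain a `pattern { ... }` block.
       if not re.search(r"\bpattern\s*\{", grew_query):
           raise ValueError("query must contain a pattern { ... } block")
       # Assumption: the node is declared as `Name [features]`.
       node_decl = rf"\b{re.escape(dependency_node)}\s*\["
       if not re.search(node_decl, grew_query):
           raise ValueError(f"node {dependency_node!r} is not declared in the query")
       return True

   example_query = """
   pattern {
     V [upos=VERB];
     DirObj [Case=Acc];
     V -[obj]-> DirObj;
   }
   """

   print(check_query(example_query, "DirObj"))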


Generating Datasets
-------------------

Masked Dataset (for Masked Language Modelling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you want to test models trained with masked language modelling (MLM), you will likely want to create a masked dataset. This isolates the target word in each sentence and replaces it with a mask token (``[MASK]`` by default). Check whether the model you want to evaluate was trained with whole-word or token-level masking; the package's evaluation module can handle both types.

.. code-block:: python

   masked_df = gpipe.generate_masked_dataset(
       grew_query,
       dependency_node
   )


As an example, take the sentence "The boy eats the cake". With the query above, which isolates direct objects of verbs, the resulting string is "The boy eats the [MASK]".
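
The masking step in this example can be sketched in a few lines. This is a simplified illustration, assuming the sentence is already split into tokens and the index of the matched node is known; ``mask_target`` is a hypothetical helper, and joining on spaces ignores treebank spacing information that a real implementation would need to respect.

.. code-block:: python

   def mask_target(tokens, target_index, mask="[MASK]"):
       """Replace the token at target_index with the mask string."""
       masked = list(tokens)  # copy so the input list is left untouched
       masked[target_index] = mask
       return " ".join(masked)

   tokens = ["The", "boy", "eats", "the", "cake"]
   masked_sentence = mask_target(tokens, 4)  # index 4 is the direct object
   print(masked_sentence)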

Prompt Dataset (for Next-Token Prediction)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   prompt_df = gpipe.generate_prompt_dataset(
       grew_query,
       dependency_node
   )
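
This page does not spell out how prompts are built. A common construction for next-token prediction, shown here purely as an assumption, is to keep the words preceding the target as the prompt so that a decoder model can score the target as its continuation. ``make_prompt`` is a hypothetical helper and may not match what ``generate_prompt_dataset`` returns.

.. code-block:: python

   def make_prompt(tokens, target_index):
       """Assumed construction: the prompt is everything before the target."""
       return " ".join(tokens[:target_index])

   tokens = ["The", "boy", "eats", "the", "cake"]
   prompt = make_prompt(tokens, 4)  # index 4 is the target word "cake"
   target = tokens[4]
   print(repr(prompt), "->", repr(target))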


Creating Minimal Pairs
----------------------

Creating a minimal pair typically means changing a single morphological feature, such as *case*, *aspect* or *person*. You must supply this feature in the form ``GrewTSE`` expects, otherwise it cannot make the adjustment.
Start by finding your feature in the list of available morphological features, for instance using the code below.

.. code-block:: python

   features = gpipe.get_morphological_features()
   print("Adjust any of the following features when creating minimal pairs:")
   for f in features:
       print(f)

There are two additional things to note:

- **For all morphological features, the key is given in lower case**, even if it contains uppercase letters in the original treebank. The feature value itself is left unchanged.
- The list **does not include universal part-of-speech tags**, as their usefulness in this context is not immediately clear. This can be implemented if there is a use case.
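
To make the lower-casing rule concrete, the sketch below converts a CoNLL-U ``FEATS`` string into a dict with lower-case keys and unchanged values. ``parse_feats`` is a hypothetical helper written for illustration; it mirrors the convention described above but is not the package's own code.

.. code-block:: python

   def parse_feats(feats):
       """Turn 'Case=Acc|Number=Sing' into {'case': 'Acc', 'number': 'Sing'}."""
       if feats == "_":  # CoNLL-U uses "_" for "no features"
           return {}
       pairs = (item.split("=", 1) for item in feats.split("|"))
       # Keys are lower-cased; values keep their original casing.
       return {key.lower(): value for key, value in pairs}

   features = parse_feats("Case=Acc|Gender=Masc|Number=Sing")
   print(features)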

We then specify how to alter a feature from the list above to form the "ungrammatical" counterpart. For example:

.. code-block:: python

   morphological_feature_adjustment = {
       "case": "Gen"
   }

The above example converts our target word to the Genitive case.
Once you've determined the correct adjustment, generate the minimal pairs:

.. code-block:: python

   mp_dataset = gpipe.generate_minimal_pair_dataset(
       morphological_feature_adjustment
   )
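
Conceptually, the adjustment overrides one feature of the target word and then looks for a form of the same lemma that carries the new feature set. The sketch below illustrates that idea with a toy, hand-written lexicon for the Czech noun *kniha* ("book"); the lexicon, ``adjust_features`` and ``find_form`` are all hypothetical, and the actual lookup in ``GrewTSE`` may work differently.

.. code-block:: python

   # Toy lexicon: (lemma, feature set) -> word form. Hand-written for illustration.
   LEXICON = {
       ("kniha", frozenset({("case", "Nom"), ("number", "Sing")})): "kniha",
       ("kniha", frozenset({("case", "Acc"), ("number", "Sing")})): "knihu",
       ("kniha", frozenset({("case", "Gen"), ("number", "Sing")})): "knihy",
   }

   def adjust_features(features, adjustment):
       """Return a copy of features with the adjusted values applied."""
       return {**features, **adjustment}

   def find_form(lexicon, lemma, features):
       """Look up the form of lemma with exactly these features, or None."""
       return lexicon.get((lemma, frozenset(features.items())))

   original = {"case": "Acc", "number": "Sing"}
   adjusted = adjust_features(original, {"case": "Gen"})

   grammatical = find_form(LEXICON, "kniha", original)
   ungrammatical = find_form(LEXICON, "kniha", adjusted)
   print(grammatical, ungrammatical)

If no form with the adjusted features exists in the lexicon, ``find_form`` returns ``None``, so in this sketch such a sentence would yield no minimal pair.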

You may then save this dataset for use in TSE evaluation, or pass it to the Evaluator module, which runs the full evaluation for you as in the code below.
Note that currently only Hugging Face encoder (e.g. BERT) and decoder (e.g. GPT) models are supported.

.. code-block:: python

    # NOTE: GrewTSEvaluator must be imported before this point; its import
    # path is not shown on this page.
    geval = GrewTSEvaluator()

    model_type = "encoder"  # either "encoder" or "decoder"
    model_repo = "google-bert/bert-base-multilingual-cased"  # a Hugging Face repo
    evaluation_results = geval.evaluate_model(mp_dataset, model_repo, model_type)

    accuracy = geval.get_accuracy()
    asd = geval.get_avg_surprisal_difference()

    print("=========================")
    print("Metrics (Higher is better)")
    print(f"Accuracy: {accuracy}")
    print(f"Average Surprisal Difference: {asd}")
    print("=========================")
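
The exact definitions of the two metrics are not given on this page. The sketch below shows one plausible reading, under stated assumptions: accuracy is the share of pairs where the model gives the grammatical form a higher probability, and the surprisal difference is the surprisal of the ungrammatical form minus that of the grammatical form, in bits. The probabilities are made up, and the log base and sign convention are assumptions chosen so that higher is better.

.. code-block:: python

   import math

   def surprisal(probability):
       """Surprisal in bits (base 2 is an assumption)."""
       return -math.log2(probability)

   # Made-up (p_grammatical, p_ungrammatical) values for three minimal pairs.
   pair_probs = [(0.20, 0.05), (0.10, 0.40), (0.30, 0.03)]

   # Share of pairs where the grammatical form is more probable.
   toy_accuracy = sum(p_gram > p_ungram for p_gram, p_ungram in pair_probs) / len(pair_probs)

   # Positive differences mean the ungrammatical form is more surprising.
   differences = [surprisal(p_ungram) - surprisal(p_gram) for p_gram, p_ungram in pair_probs]
   toy_avg_difference = sum(differences) / len(differences)

   print(f"Accuracy: {toy_accuracy:.3f}")
   print(f"Average surprisal difference: {toy_avg_difference:.3f}")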


End-to-End Workflow
-------------------

Below is a minimal end-to-end example, from parsing a treebank to evaluating a model on the resulting minimal pairs.
Depending on your treebank, you may need to use different feature names and values.

.. code-block:: python

   from grewtse.pipeline import GrewTSEPipe

   gpipe = GrewTSEPipe()
   gpipe.parse_treebank("treebanks/your-treebank.conllu")

   # make sure to include the pattern { } syntax
   grew_query = "pattern { V [upos=VERB, Number=Sing]; }"
   dependency_node = "V"

   masked_df = gpipe.generate_masked_dataset(grew_query, dependency_node)

   alternative_morph_features = {"number": "Plur"}

   mp_dataset = gpipe.generate_minimal_pair_dataset(
       alternative_morph_features
   )

   # NOTE: GrewTSEvaluator must also be imported; its import path is not
   # shown on this page.
   geval = GrewTSEvaluator()

   model_type = "encoder"  # either "encoder" or "decoder"
   model_repo = "google-bert/bert-base-multilingual-cased"  # a Hugging Face repo
   evaluation_results = geval.evaluate_model(mp_dataset, model_repo, model_type)
   accuracy = geval.get_accuracy()
   asd = geval.get_avg_surprisal_difference()

   print("=========================")
   print("Metrics (Higher is better)")
   print(f"Accuracy: {accuracy}")
   print(f"Average Surprisal Difference: {asd}")
   print("=========================")