# tokenizeBert {#tokenizeBert}


## Syntax {#syntax}

`tokenizeBert(text, vocabName, [addSpecialTokens=true])`

## Arguments {#arguments}

**text** A LITERAL scalar representing the text to tokenize.

**vocabName** A STRING scalar specifying the name of the vocabulary to use, as assigned when the vocabulary was loaded with `loadVocab`.

**addSpecialTokens** \(optional\) A Boolean value indicating whether to add special tokens at the beginning and end of the input text. Currently, the only special tokens supported are `[CLS]` at the beginning and `[SEP]` at the end. The default value is true.

## Details {#details}

Tokenizes the input *text* using the specified vocabulary. The function applies the WordPiece tokenization algorithm, which is designed for use with the BERT \(Bidirectional Encoder Representations from Transformers\) model.

**Return value**: A table with the following columns:

-   tokens: The tokens produced from the input text.
-   input\_ids: The ID of each token in the vocabulary.
-   attention\_mask: The mask value passed to the model for each token. Currently always set to 1.
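
For readers unfamiliar with WordPiece, the following Python sketch illustrates greedy longest-match-first subword splitting and how it could produce the three columns described above. It is an illustration only and is not the implementation behind `tokenizeBert`. The function names, the toy vocabulary, the lowercasing step, and the whitespace-only word splitting are all assumptions made for this example; a real BERT tokenizer also separates punctuation before applying WordPiece.

```python
# Illustrative sketch only: NOT the implementation behind tokenizeBert.
# All names and the toy vocabulary below are hypothetical.

def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into subword pieces by greedy longest-match-first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize_bert_sketch(text, vocab, add_special_tokens=True):
    """Return (tokens, input_ids, attention_mask) for whitespace-split text."""
    tokens = []
    for word in text.lower().split():  # assumption: uncased vocabulary
        tokens.extend(wordpiece(word, vocab))
    if add_special_tokens:
        tokens = ["[CLS]"] + tokens + ["[SEP]"]
    input_ids = [vocab[t] for t in tokens]
    attention_mask = [1] * len(tokens)  # mirrors "always set to 1"
    return tokens, input_ids, attention_mask


# Hypothetical toy vocabulary mapping token -> ID.
toy_vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "apple": 3,
             "ab": 4, "##cd": 5, "##12": 6, "##34": 7}

tokens, input_ids, attention_mask = tokenize_bert_sketch("apple abcd1234", toy_vocab)
print(tokens)
print(input_ids)
print(attention_mask)
```

With this toy vocabulary, `abcd1234` is not a vocabulary entry, so it is split into the longest known prefix `ab` followed by continuation pieces. The actual output of `tokenizeBert` depends on the vocabulary file that was loaded.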

## Examples {#examples}

``` {#codeblock_hgp_xgp_mgc}
loadVocab("/home/data/vocab.txt", "vocab1")
tokenizeBert("apple ```\n—— abcd1234", "vocab1", true)
```

Related functions: [loadVocab](../l/loadVocab.md), [unloadVocab](../u/unloadVocab.md)

