Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most widely used tokenizers, such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It is extremely fast at both training new vocabularies and tokenizing text.
| Version: | 0.2.1 |
| Depends: | R (≥ 4.2.0) |
| Imports: | R6, cli |
| Suggests: | rmarkdown, testthat (≥ 3.0.0), hfhub (≥ 0.1.1), withr |
| Published: | 2025-09-30 |
| DOI: | 10.32614/CRAN.package.tok |
| Author: | Daniel Falbel [aut, cre], Regouby Christophe [ctb], Posit [cph] |
| Maintainer: | Daniel Falbel <daniel at posit.co> |
| BugReports: | https://github.com/mlverse/tok/issues |
| License: | MIT + file LICENSE |
| URL: | https://github.com/mlverse/tok |
| NeedsCompilation: | yes |
| SystemRequirements: | Cargo (Rust's package manager), rustc >= 1.75 |
| Materials: | README, NEWS |
| CRAN checks: | tok results |
| Reference manual: | tok.html, tok.pdf |
| Package source: | tok_0.2.1.tar.gz |
| Windows binaries: | r-devel: tok_0.2.1.zip, r-release: tok_0.2.1.zip, r-oldrel: tok_0.2.1.zip |
| macOS binaries: | r-release (arm64): tok_0.2.1.tgz, r-oldrel (arm64): tok_0.2.1.tgz, r-release (x86_64): tok_0.2.1.tgz, r-oldrel (x86_64): tok_0.2.1.tgz |
| Old sources: | tok archive |
Please use the canonical form https://CRAN.R-project.org/package=tok to link to this page.