kuromoji-es

The Japanese README is available here: README.ja.md

A Japanese morphological analyzer implemented as a pure JavaScript ES module. It is a modern port of kuromoji.js, itself a JavaScript port of the original Kuromoji, designed for simplicity and compatibility with current web standards.

Features

Usage

Import kuromoji-es and use the async createTokenizer function to get a tokenizer instance. The dictionary is loaded from a remote CDN by default.

import { kuromoji } from "https://code4fukui.github.io/kuromoji-es/kuromoji.js";

// Asynchronously load the dictionary and create a tokenizer
const tokenizer = await kuromoji.createTokenizer();

// Tokenize a sentence
const tokens = tokenizer.tokenize("すもももももももものうち");

console.log(tokens);
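
Once tokens are available, a common next step is word segmentation. The sketch below joins the surface_form of each token with a separator. The helper name segment and the sample array are hypothetical: the tokens are hand-written in the documented token shape (only the field used here is included) and are not real analyzer output.

```javascript
// Hand-written sample in the documented token shape; not real analyzer output.
const segmentSample = [
  { surface_form: "黒文字" },
  { surface_form: "を" },
  { surface_form: "読む" },
];

// segment() is a hypothetical helper, not part of kuromoji-es.
// It joins the surface form of every token with the given separator.
function segment(tokens, separator = " ") {
  return tokens.map((token) => token.surface_form).join(separator);
}

console.log(segment(segmentSample)); // 黒文字 を 読む
```

With a real tokenizer, the same helper would presumably be called as segment(tokenizer.tokenize(text)).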

API Reference

kuromoji.createTokenizer()

Asynchronously loads the dictionary files and returns a Promise that resolves with a tokenizer instance.
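
Because createTokenizer() loads dictionary files, an application may want to call it only once and share the result. The following is a minimal sketch of one way to do that by caching the returned Promise. makeTokenizerLoader and getTokenizer are hypothetical names, not part of kuromoji-es, and the factory is passed in as a parameter so that the sketch runs without the remote import. A stub stands in for the real tokenizer here.

```javascript
// makeTokenizerLoader() is a hypothetical wrapper, not part of kuromoji-es.
// It calls the given factory at most once and hands every caller the same Promise.
function makeTokenizerLoader(createTokenizer) {
  let pending = null;
  return function getTokenizer() {
    if (pending === null) {
      // The Promise executor runs synchronously, so a synchronous throw
      // from the factory also becomes a rejection.
      pending = new Promise((resolve) => resolve(createTokenizer())).catch((error) => {
        pending = null; // forget the failed attempt so a later call can retry
        throw error;
      });
    }
    return pending;
  };
}

// Demonstration with a stub factory standing in for kuromoji.createTokenizer.
const getStubTokenizer = makeTokenizerLoader(async () => ({ tokenize: () => [] }));
getStubTokenizer().then((tokenizer) => console.log(typeof tokenizer.tokenize)); // function
```

In an application, the factory would presumably be () => kuromoji.createTokenizer(). Resetting the cache on failure is a design choice that lets a transient network error be retried rather than cached permanently.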

tokenizer.tokenize(text)

Takes a string of Japanese text and returns an array of token objects, each containing detailed morphological information.
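
As an illustration of working with the returned array, the sketch below builds a katakana reading of a sentence from the reading field of each token. toReading is a hypothetical helper, and the sample tokens are hand-written rather than real output. It assumes that a token may lack a reading (or carry the "*" placeholder), in which case the surface form is used instead.

```javascript
// Hand-written sample in the documented token shape; not real analyzer output.
const readingSample = [
  { surface_form: "黒文字", reading: "クロモジ" },
  { surface_form: "を", reading: "ヲ" },
  { surface_form: "JS" }, // assumed case: a token with no reading
  { surface_form: "で", reading: "デ" },
  { surface_form: "読む", reading: "ヨム" },
];

// toReading() is a hypothetical helper, not part of kuromoji-es.
// It concatenates readings, falling back to the surface form when none is usable.
function toReading(tokens) {
  return tokens
    .map((token) => (token.reading && token.reading !== "*" ? token.reading : token.surface_form))
    .join("");
}

console.log(toReading(readingSample)); // クロモジヲJSデヨム
```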

Token Object Structure

The tokenize() method returns an array of objects with the following structure:

[
  {
    "word_id": 509800,
    "word_type": "KNOWN",
    "word_position": 1,
    "surface_form": "黒文字",
    "pos": "名詞",
    "pos_detail_1": "一般",
    "pos_detail_2": "*",
    "pos_detail_3": "*",
    "conjugated_type": "*",
    "conjugated_form": "*",
    "basic_form": "黒文字",
    "reading": "クロモジ",
    "pronunciation": "クロモジ"
  }
]

Fields:

For more details on the dictionary fields, please refer to the original kuromoji.js JSDoc page.
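
The token structure above can also be written down as a JSDoc typedef, which makes helper functions easier to document. The field descriptions below are an interpretation based on the field names, the sample object, and the IPADIC conventions used by kuromoji.js; they are not taken from this project's own documentation. lemmasByPos is a hypothetical helper, and the sample tokens are hand-written.

```javascript
/**
 * Assumed field meanings; check the kuromoji.js documentation for the authoritative list.
 * @typedef {Object} Token
 * @property {number} word_id          ID of the word in the dictionary
 * @property {string} word_type        "KNOWN" for dictionary words, "UNKNOWN" otherwise
 * @property {number} word_position    Start position of the word in the input text
 * @property {string} surface_form     The word as it appears in the input text
 * @property {string} pos              Part of speech, e.g. "名詞" (noun)
 * @property {string} pos_detail_1     Part-of-speech subcategory 1, or "*"
 * @property {string} pos_detail_2     Part-of-speech subcategory 2, or "*"
 * @property {string} pos_detail_3     Part-of-speech subcategory 3, or "*"
 * @property {string} conjugated_type  Conjugation type, or "*"
 * @property {string} conjugated_form  Conjugation form, or "*"
 * @property {string} basic_form       Dictionary (base) form, or "*"
 * @property {string} [reading]        Reading in katakana
 * @property {string} [pronunciation]  Pronunciation in katakana
 */

// Hand-written sample with only the fields used below; not real analyzer output.
const lemmaSample = [
  { surface_form: "黒文字", pos: "名詞", basic_form: "黒文字" },
  { surface_form: "を", pos: "助詞", basic_form: "を" },
  { surface_form: "読ん", pos: "動詞", basic_form: "読む" },
  { surface_form: "だ", pos: "助動詞", basic_form: "だ" },
  { surface_form: "Kuromoji", pos: "名詞", basic_form: "*" }, // assumed unknown-word case
];

/**
 * Hypothetical helper: base forms of all tokens with the given part of speech.
 * Falls back to the surface form when basic_form is missing or "*".
 * @param {Token[]} tokens
 * @param {string} pos
 * @returns {string[]}
 */
function lemmasByPos(tokens, pos) {
  return tokens
    .filter((token) => token.pos === pos)
    .map((token) => (token.basic_form && token.basic_form !== "*" ? token.basic_form : token.surface_form));
}

console.log(lemmasByPos(lemmaSample, "動詞")); // [ '読む' ]
console.log(lemmasByPos(lemmaSample, "名詞")); // [ '黒文字', 'Kuromoji' ]
```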

Acknowledgements

This project is a port of kuromoji.js by Takuya Asano, which is a JavaScript port of the original Kuromoji project by Atilika Inc.

License

This library is licensed under the Apache License, Version 2.0.