Full-text search in MySQL using the mysqlftppc mecab plugin
Test environment
# cat /etc/redhat-release
CentOS release 5.3 (Final)
mysql> select version();
+------------+
| version()  |
+------------+
| 5.1.39-log |
+------------+
1 row in set (0.00 sec)
mecab
# wget http://sourceforge.net/projects/mecab/files/mecab/0.98/mecab-0.98.tar.gz/download
# tar xzvf mecab-0.98.tar.gz
# cd mecab-0.98
# ./configure
# make
# make check
...
runtests faild in
FAIL: run-cost-train.sh
===================
1 of 3 tests failed
===================
make[2]: *** [check-TESTS] エラー 1
make[2]: ディレクトリ `/tmp/mysql/mecab-0.98/tests' から出ます
make[1]: *** [check-am] エラー 2
make[1]: ディレクトリ `/tmp/mysql/mecab-0.98/tests' から出ます
make: *** [check-recursive] エラー 1
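make check reports an error (the output comes from a Japanese locale; エラー means "error").
Apparently this failure can be ignored.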
# make install
Completed without problems.
mecab-naist-jdic
# wget http://iij.dl.sourceforge.jp/naist-jdic/40117/mecab-naist-jdic-0.6.0-20090616pre3.tar.gz
# tar xzvf mecab-naist-jdic-0.6.0-20090616pre3.tar.gz
# cd mecab-naist-jdic-0.6.0-20090616pre3
# ./configure --with-charset=utf8
# make
...
/usr/local/libexec/mecab/mecab-dict-index -d . -o . -f EUC-JP -t utf8
/usr/local/libexec/mecab/mecab-dict-index: error while loading shared libraries: libmecab.so.1: cannot open shared object file: No such file or directory
make: *** [matrix.bin] エラー 127
Another error.
The build needs the mecab shared library (libmecab.so.1), so I refresh the dynamic linker cache and run make again.
# ldconfig
# make
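This time the build succeeded. (A make install is also needed afterwards so that the dictionary ends up in /usr/local/lib/mecab/dic/naist-jdic, which the following steps rely on.)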
Note: if it still fails, creating a .conf file under /etc/ld.so.conf.d/ that adds /usr/local/lib to the library search path, then running ldconfig again, should make it work.
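As a rough sketch of that fallback: the helper below writes a one-line .conf file naming the library directory. The function name add_lib_path and the file name usr-local-lib.conf are made up for illustration (any name ending in .conf should do), and it assumes /etc/ld.so.conf includes the files under /etc/ld.so.conf.d/, which I believe is the default on CentOS.

```shell
# Hypothetical helper (function and file names are illustrative, not part of mecab).
# Writes a .conf file that tells the dynamic linker to search lib_dir.
add_lib_path() {
    conf_dir=$1   # on the real system: /etc/ld.so.conf.d
    lib_dir=$2    # on the real system: /usr/local/lib
    printf '%s\n' "$lib_dir" > "$conf_dir/usr-local-lib.conf"
}

# On the real system, as root:
#   add_lib_path /etc/ld.so.conf.d /usr/local/lib
#   ldconfig                       # rebuild the linker cache
#   ldconfig -p | grep libmecab    # libmecab.so.1 should now be listed
```

Taking the directory as an argument makes it possible to try the helper against a scratch directory before touching /etc.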
mecab
# vi /usr/local/etc/mecabrc
Change
dicdir = /usr/local/lib/mecab/dic/ipadic
to
dicdir = /usr/local/lib/mecab/dic/naist-jdic
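The same edit can be scripted instead of done in vi. This is only a sketch: set_dicdir is a made-up helper, and it assumes the file has a single uncommented line starting with dicdir, as in the mecabrc shown here.

```shell
# Hypothetical helper: point the dicdir line of a mecabrc-style file at new_dir.
set_dicdir() {
    rc_file=$1
    new_dir=$2
    # Write the edited copy to a temporary file first, then replace the original.
    sed "s|^dicdir *=.*|dicdir = $new_dir|" "$rc_file" > "$rc_file.tmp" &&
        mv "$rc_file.tmp" "$rc_file"
}

# On the real system, as root:
#   set_dicdir /usr/local/etc/mecabrc /usr/local/lib/mecab/dic/naist-jdic
#   echo 'すもももももももものうち' | mecab   # quick check that mecab still parses
```

Other lines in the file (comments, other settings) are left untouched, since the sed expression only matches lines that begin with dicdir.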
mysqlftppc-mecab
# wget http://sourceforge.net/projects/mysqlftppc/files/mysqlftppc/1.6/mysqlftppc-mecab-1.6.tar.gz/download
# tar xzvf mysqlftppc-mecab-1.6.tar.gz
# cd mysqlftppc-mecab-1.6
# ./configure
# make
# make install
mysql
mysql> install plugin mecab soname 'libftmecab.so';
mysql> show plugins;
+---------------------+----------+--------------------+---------------------+---------+
| Name                | Status   | Type               | Library             | License |
+---------------------+----------+--------------------+---------------------+---------+
| binlog              | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| partition           | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| ARCHIVE             | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| BLACKHOLE           | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| CSV                 | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| FEDERATED           | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| MEMORY              | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| MyISAM              | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| MRG_MYISAM          | ACTIVE   | STORAGE ENGINE     | NULL                | GPL     |
| ndbcluster          | DISABLED | STORAGE ENGINE     | NULL                | GPL     |
| InnoDB              | ACTIVE   | STORAGE ENGINE     | ha_innodb_plugin.so | GPL     |
| INNODB_TRX          | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_LOCKS        | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_LOCK_WAITS   | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMP          | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMP_RESET    | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMPMEM       | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| INNODB_CMPMEM_RESET | ACTIVE   | INFORMATION SCHEMA | ha_innodb_plugin.so | GPL     |
| mecab               | ACTIVE   | FTPARSER           | libftmecab.so       | BSD     |
+---------------------+----------+--------------------+---------------------+---------+
19 rows in set (0.00 sec)
mysql> show status like 'Mecab_info';
+---------------+---------------------------------------+
| Variable_name | Value                                 |
+---------------+---------------------------------------+
| Mecab_info    | with mecab 0.98, ICU 3.6(Unicode 5.0) |
+---------------+---------------------------------------+
1 row in set (0.00 sec)
my.cnf
mysql> show variables like 'mecab%';
+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| mecab_dicdir          |         |
| mecab_normalization   | OFF     |
| mecab_unicode_version | DEFAULT |
| mecab_userdic         |         |
+-----------------------+---------+
4 rows in set (0.00 sec)
I want Unicode normalization, so I change the settings.
# vi /etc/my.cnf
[mysqld]
...
# mysqlftppc-mecab
mecab_normalization=KC
mecab_unicode_version=3.2
mecab_dicdir=/usr/local/lib/mecab/dic/naist-jdic
ICU reports Unicode 5.0, so mecab_unicode_version=5.0 would probably work too, but the official site also uses 3.2, so I left it at that.
# service mysqld restart
Trying it out
mysql> create database test;
mysql> use test;
mysql> create table test(msg text, fulltext(msg) with parser mecab) engine=myisam default charset=utf8;
mysql> insert into test values('俺は人間をやめるぞ!ジョジョーッ!!');
mysql> select * from test where match(msg) against('+"やめる"' in boolean mode);
+--------------------------------------------------------+
| msg                                                    |
+--------------------------------------------------------+
| 俺は人間をやめるぞ!ジョジョーッ!! |
+--------------------------------------------------------+
1 row in set (0.00 sec)
It seems to work.