Filtering spam comments in WordPress with a deep learning model


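The previous article showed how to use TensorFlow to train a deep learning model that classifies comment content. Building on that, this article walks through the steps for filtering spam comments. When WordPress receives a comment, and before it saves it, the trained model first classifies the comment text: if the comment passes, it is saved; otherwise it is discarded.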
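The accept/discard decision itself is small. Assuming, as the server code does with `np.argmax`, that the model outputs two scores ordered as [ham, spam] that sum to 1 (for example a two-unit softmax), taking the argmax is the same as calling a comment spam when its spam probability exceeds 0.5. The stdlib-only sketch below makes that cut-off explicit; `label_by_argmax`, `label_by_threshold`, `keep_comment` and `spam_threshold` are illustrative names, not part of the original code.

```python
from typing import Sequence

INT2LABEL = {0: "ham", 1: "spam"}  # same mapping as int2label in the server


def label_by_argmax(probs: Sequence[float]) -> str:
    """What the server does: pick the highest score (ties go to index 0, like np.argmax)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return INT2LABEL[best]


def label_by_threshold(probs: Sequence[float], spam_threshold: float = 0.5) -> str:
    """Explicit cut-off on P(spam); matches argmax at 0.5 when the two scores sum to 1."""
    return "spam" if probs[1] > spam_threshold else "ham"


def keep_comment(probs: Sequence[float], spam_threshold: float = 0.5) -> bool:
    """True means WordPress should save the comment, False means discard it."""
    return label_by_threshold(probs, spam_threshold) == "ham"


print(label_by_argmax([0.3, 0.7]), label_by_threshold([0.3, 0.7]))  # spam spam
print(keep_comment([0.3, 0.7], spam_threshold=0.9))                  # True
```

Raising `spam_threshold` (say to 0.9) makes the filter more lenient: fewer legitimate comments are discarded, at the cost of letting more borderline spam through.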

For how to call a Python application from WordPress, see the article on that topic. The code for the Python side and the WordPress side follows.

On the Python side, the server loads the trained model, runs a prediction on the comment text sent by the client, and returns the result to the client.
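Before the model can score a comment, the text has to become the same fixed-length integer sequence used in training. The following is a simplified, stdlib-only imitation of what Keras's `Tokenizer` and `pad_sequences` do with their default settings (lowercasing, frequency-ranked word indices starting at 1, unknown words dropped, truncation and padding at the front). It ignores the Tokenizer's punctuation filtering and its `num_words`/`oov_token` options, and the function names are hypothetical.

```python
from typing import Dict, List


def fit_word_index(texts: List[str]) -> Dict[str, int]:
    """Rank words by frequency (most frequent = 1); ties keep first-seen order.

    Index 0 is left free because pad_sequences uses it for padding.
    """
    counts: Dict[str, int] = {}
    for text in texts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    # sorted() is stable, so equally frequent words stay in first-seen order.
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}


def texts_to_padded(texts: List[str], word_index: Dict[str, int], maxlen: int) -> List[List[int]]:
    """Words -> indices (unknown words dropped), then pre-truncate and pre-pad to maxlen."""
    padded = []
    for text in texts:
        seq = [word_index[w] for w in text.lower().split() if w in word_index]
        seq = seq[-maxlen:]                             # keep the last maxlen tokens
        padded.append([0] * (maxlen - len(seq)) + seq)  # zeros go in front
    return padded


index = fit_word_index(["buy cheap pills now", "nice post thanks", "buy now"])
print(index["buy"], index["now"], index["cheap"])            # 1 2 3
print(texts_to_padded(["Buy pills today"], index, maxlen=5))  # [[0, 0, 0, 1, 4]]
```

This also shows why the server rebuilds its tokenizer from the training data file: the indices depend on word frequencies and order in the corpus, so a tokenizer fit on different text would feed the model different numbers for the same words. That reasoning assumes the previous article fit its tokenizer on the same file in the same way.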

import os
import sys
import json

import web
import numpy as np
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sys.path.append('/home/user/work/www/webpy/app')
os.environ['PYTHON_EGG_CACHE'] = '/home/user/work/www/webpy/.python-egg'

urls = (
    '/uwsgi-test', 'index'
)
app = web.application(urls, globals())

int2label = {0: "ham", 1: "spam"}
MAX_SEQUENCE_LENGTH = 100  # must match the value used when training the model
API_KEY = '1234567'        # shared key, sent by WordPress in the x-python-key header

# Load the trained model once, when the worker process starts.
model = keras.models.load_model("/home/user/work/model/spam_model.h5")

# Rebuild the tokenizer from the training data so that word indices match
# the ones the model was trained with. Each line is "<label> <text...>".
# (Assumes the dataset file is UTF-8 encoded.)
texts, labels = [], []
with open("/home/user/work/data/my-spam-datasets", encoding="utf-8") as f:
    for line in f:
        split = line.split()
        if not split:
            continue  # skip blank lines instead of crashing on split[0]
        labels.append(split[0].strip())
        texts.append(' '.join(split[1:]).strip())
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)


class index:
    def POST(self):
        web.header('Content-Type', 'application/json')
        # Reject callers without the shared key before running the model.
        if web.ctx.env.get('HTTP_X_PYTHON_KEY') != API_KEY:
            return json.dumps({"result": "0"})
        comment = json.loads(web.data())['comment']
        # Text -> integer sequence -> fixed-length input, as during training.
        sequence = tokenizer.texts_to_sequences([comment])
        sequence = pad_sequences(sequence, maxlen=MAX_SEQUENCE_LENGTH)
        prediction = model.predict(sequence)[0]
        result = int2label[int(np.argmax(prediction))]
        # "1" = accept (ham), "0" = reject (spam)
        return json.dumps({"result": "1" if result == "ham" else "0"})


if __name__ == "__main__":
    app.run()
application = app.wsgifunc()
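The request handling in `index.POST` can be pulled out into a plain function so that it can be tested without web.py or TensorFlow. In the sketch below, `handle_comment_request` and the `classify` callback are hypothetical names; in the real server, `classify` would wrap the tokenizer and `model.predict` calls. It also swaps the plain `==` key check for `hmac.compare_digest`, a constant-time comparison from Python's standard library, and treats malformed requests as spam.

```python
import hmac
import json
from typing import Callable, Optional

EXPECTED_KEY = "1234567"  # demo key from the article; use a real secret in production
REJECT = json.dumps({"result": "0"})
ACCEPT = json.dumps({"result": "1"})


def handle_comment_request(
    body: bytes, api_key: Optional[str], classify: Callable[[str], str]
) -> str:
    """Return '{"result": "1"}' to accept a comment, '{"result": "0"}' to reject it."""
    # Check the shared key first so unauthenticated callers never reach the model.
    if api_key is None or not hmac.compare_digest(
        api_key.encode("utf-8"), EXPECTED_KEY.encode("utf-8")
    ):
        return REJECT
    try:
        comment = json.loads(body)["comment"]
    except (ValueError, KeyError, TypeError):
        return REJECT  # not JSON, or no "comment" field: fail closed
    if not isinstance(comment, str):
        return REJECT
    # classify() is expected to return "ham" or "spam", like int2label.
    return ACCEPT if classify(comment) == "ham" else REJECT
```

Inside web.py, `POST` would then reduce to something like `return handle_comment_request(web.data(), web.ctx.env.get('HTTP_X_PYTHON_KEY'), classify)`.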

On the WordPress side, the client code hooks into preprocess_comment, the filter WordPress applies to comment data before saving it.

function anti_spam($comment_data) {
    $comment = $comment_data['comment_content'];
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL => "http://127.0.0.1/uwsgi-test",
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS => 2,
        CURLOPT_TIMEOUT => 10,
        CURLOPT_ENCODING => "gzip,deflate",  // also sets the Accept-Encoding header
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => json_encode(array('comment' => $comment)),
        CURLOPT_HTTPHEADER => array(
            "Content-Type: application/json; charset=utf-8",
            "Accept: application/json",
            "x-python-host: uwsgi.xyz",
            "x-python-key: 1234567"  // shared key checked by the Python service
        ),
    ));
    $response = curl_exec($curl);
    $err = curl_error($curl);
    curl_close($curl);
    if ($err) {
        // Log the failure instead of echoing it to the visitor.
        error_log("anti_spam: " . $err);
    } else {
        $jsonObj = json_decode($response);
        // "1" means the model classified the comment as ham: keep it.
        if (isset($jsonObj->result) && $jsonObj->result === "1") {
            return $comment_data;
        }
    }
    // Spam, service error, or malformed reply: reject the comment.
    wp_die(__('Sorry, your comment was rejected.'));
}
add_filter('preprocess_comment', 'anti_spam');
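To check the Python service without going through WordPress, the same request can be sent from Python. This is a sketch using only `urllib.request`; the URL and the `x-python-host` and `x-python-key` values are copied from the PHP code above, and like `anti_spam` it fails closed, treating a network error or malformed reply as a rejection. `build_request` and `comment_is_accepted` are illustrative names.

```python
import json
import urllib.request

SERVICE_URL = "http://127.0.0.1/uwsgi-test"  # same endpoint as in anti_spam()


def build_request(comment: str, url: str = SERVICE_URL) -> urllib.request.Request:
    """POST {"comment": ...} as JSON with the same custom headers as the PHP client."""
    body = json.dumps({"comment": comment}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Accept": "application/json",
            "x-python-host": "uwsgi.xyz",
            "x-python-key": "1234567",
        },
        method="POST",
    )


def comment_is_accepted(comment: str, url: str = SERVICE_URL, timeout: float = 10.0) -> bool:
    """True only for an explicit {"result": "1"}; errors count as rejection."""
    try:
        with urllib.request.urlopen(build_request(comment, url), timeout=timeout) as resp:
            reply = json.loads(resp.read())
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError and timeouts; ValueError covers bad JSON.
        return False
    return isinstance(reply, dict) and reply.get("result") == "1"
```

For example, `comment_is_accepted("Nice post, thanks!")` returns `True` only if the service replied with `{"result": "1"}`.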
