0.
Based on TF 2.0.
1.
Export an existing model to the SavedModel (pb) format.
The INFO:tensorflow:Unsupported signature for serialization message printed during saving can apparently be ignored.
Python code:
import os
os.environ['TF_KERAS'] = '1'  # make bert4keras use tf.keras
import numpy as np
from bert4keras.backend import K as K
from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer
from bert4keras.snippets import AutoRegressiveDecoder
from tensorflow.keras.models import load_model  # use tf.keras to match TF_KERAS='1'
import tensorflow as tf
from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()
model = 'gptml.h5'
base = '/Volumes/Untitled/pb'
keras_model = load_model(model, compile=False)
# NOTE: the trailing "1" in the path is the version number. It is required;
# without it TF Serving reports that it cannot find a servable model.
keras_model.save(base + '/150k/1', save_format='tf')
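For reference, after the save the exported directory should look roughly like this (the version directory nested under the model path; file names are the standard SavedModel layout, shown here as a sketch):

```
/Volumes/Untitled/pb/150k/
└── 1/                      # version number, required by TF Serving
    ├── saved_model.pb
    ├── assets/
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index
```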
2.
Start the server with Docker:
docker run -p 8501:8501 --mount type=bind,source=/Volumes/Untitled/pb/150k/,target=/models/my_model -e MODEL_NAME=my_model -t tensorflow/serving
Check the model's metadata at:
http://localhost:8501/v1/models/my_model/metadata
The fields under "inputs" are the parameters the API expects:
"inputs": {
  "Input-Token": {
    "dtype": "DT_FLOAT",
    "tensor_shape": {
      "dim": [{"size": "-1", "name": ""}, {"size": "-1", "name": ""}],
      "unknown_rank": false
    },
    "name": "serving_default_Input-Token:0"
  }
}
Alternatively, without Docker, you can do this on Ubuntu:
install tensorflow_model_server first,
then run:
tensorflow_model_server --model_base_path="/Volumes/Untitled/pb/150k" --rest_api_port=8501 --model_name="my_model"
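As a sketch, the input names the REST API expects can also be pulled out of that metadata JSON programmatically. The nesting assumed below (metadata → signature_def → signature_def → serving_default) follows the usual layout of the /metadata response; verify it against your own server's output:

```python
# Sketch: extract the input tensor names and dtypes from the
# /v1/models/<name>/metadata response of TF Serving.

def serving_input_names(metadata_json, signature="serving_default"):
    sig = metadata_json["metadata"]["signature_def"]["signature_def"][signature]
    return {name: spec["dtype"] for name, spec in sig["inputs"].items()}

# Sample shaped like the metadata shown above:
sample = {
    "metadata": {"signature_def": {"signature_def": {"serving_default": {
        "inputs": {"Input-Token": {
            "dtype": "DT_FLOAT",
            "tensor_shape": {"dim": [{"size": "-1"}, {"size": "-1"}],
                             "unknown_rank": False},
            "name": "serving_default_Input-Token:0",
        }}
    }}}}
}
print(serving_input_names(sample))  # {'Input-Token': 'DT_FLOAT'}
```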
3.
Call it with requests.
Python code:
import requests
import json
payload = [[1921, 7471, 5682, 5023, 4170, 7433]]  # <=== tokenizer-encoded Chinese: 天青色等烟雨
d = {"signature_name": "serving_default",
     "inputs": {"Input-Token": payload}}  # <=== use the payload defined above
r = requests.post('http://127.0.0.1:8501/v1/models/my_model:predict', json=d)
print(r.json())
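The request body can be factored into a small helper, which keeps the batch dimension explicit. The helper name is made up for illustration; "Input-Token" and "serving_default" come from the metadata shown above:

```python
# Sketch: build the TF Serving REST predict request body for this model.

def build_predict_request(token_ids_batch, signature="serving_default"):
    """token_ids_batch: list of token-id lists, one row per sequence."""
    return {"signature_name": signature,
            "inputs": {"Input-Token": token_ids_batch}}

d = build_predict_request([[1921, 7471, 5682, 5023, 4170, 7433]])
print(d["signature_name"])  # serving_default
```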
4.
Take https://github.com/bojone/bert4keras/blob/master/examples/basic_language_model_gpt2_ml.py as an example
and modify this class to add a remote-call method.
It depends on requests and numpy:
import requests
import numpy as np

class ArticleCompletion(AutoRegressiveDecoder):
    """Article continuation based on random sampling."""
    @AutoRegressiveDecoder.wraps(default_rtype='probas')
    def predict(self, inputs, output_ids, states):
        token_ids = np.concatenate([inputs[0], output_ids], 1)
        # Call the TF Serving endpoint instead of a local model.predict
        return self.remote_call(token_ids)[:, -1]

    def generate(self, text, n=1, topk=5):
        token_ids, _ = tokenizer.encode(text)
        results = self.random_sample([token_ids], n, topk)  # random sampling
        return [text + tokenizer.decode(ids) for ids in results]

    def remote_call(self, token_ids):
        payload = token_ids.tolist()
        d = {"signature_name": "serving_default",
             "inputs": {"Input-Token": payload}}
        r = requests.post('http://127.0.0.1:8501/v1/models/my_model:predict', json=d)
        return np.array(r.json()['outputs'])
After that it runs.
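One practical detail when batching calls like remote_call: the "Input-Token" tensor sent to the server must be rectangular, so sequences of different lengths have to be padded first. A minimal numpy sketch (the helper name is hypothetical, and pad id 0 is an assumption; use whatever pad id your tokenizer defines):

```python
import numpy as np

def pad_batch(sequences, pad_id=0):
    """Right-pad variable-length token-id lists into one rectangular array."""
    max_len = max(len(s) for s in sequences)
    batch = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    for i, s in enumerate(sequences):
        batch[i, :len(s)] = s
    return batch

batch = pad_batch([[1921, 7471, 5682], [5023, 4170, 7433, 99]])
print(batch.shape)  # (2, 4)
```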
How did you determine the API parameters?
Check the model's metadata at:
http://localhost:8501/v1/models/my_model/metadata
The fields under "inputs" are the parameters the API expects:
"inputs": {
  "Input-Token": {
    "dtype": "DT_FLOAT",
    "tensor_shape": {
      "dim": [{"size": "-1", "name": ""}, {"size": "-1", "name": ""}],
      "unknown_rank": false
    },
    "name": "serving_default_Input-Token:0"
  }
}
The model I deployed runs extremely slow during batch inference. The resource monitor shows the container is barely loaded: CPU is around 10% and memory usage is about 1 GB. Have you run into this problem?
Can a pb model exported with TF 2.x be loaded by TF 1.x? I get this error: tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'Einsum' in binary running on hulangdeMacBook-Pro.local.
Make sure the Op and Kernel are registered in the binary running in this process.
Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.)
tf.contrib.resampler
should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
Have you encountered this, and if so, how did you solve it?
How do you handle custom layers?

class PtuningEmbedding(Embedding):
    """A new embedding layer that only optimizes some of the tokens."""
    def call(self, inputs, mode='embedding'):
        embeddings = self.embeddings
        embeddings_sg = K.stop_gradient(embeddings)
        mask = np.zeros((K.int_shape(embeddings)[0], 1))
        mask[1:9] += 1
        self.embeddings = embeddings * mask + embeddings_sg * (1 - mask)
        outputs = super(PtuningEmbedding, self).call(inputs, mode)
        self.embeddings = embeddings
        return outputs

For a layer like this, what needs to be done when exporting?
Does anyone know how to connect via gRPC?