Original link: python json.dumps Chinese encoding problem

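Problem description

A Python service communicates with a Node.js service. The parameters are sent as JSON and include an MD5 checksum.
As soon as the parameters contain Chinese characters, the MD5 check on the Node.js side fails.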

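Cause

The ensure_ascii parameter of json.dumps is responsible. By default Python escapes non-ASCII characters as \uXXXX sequences, while JSON.stringify in Node.js outputs them as-is, so the two sides hash different strings.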

https://docs.python.org/3.6/library/json.html
If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped.
If ensure_ascii is false, these characters will be output as-is.
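
To make the difference concrete, here is a minimal sketch that serializes the same dictionary both ways and compares the resulting UTF-8 bytes. The variable names are illustrative only and do not come from the original post.

```python
import json

data = {"中": "国"}

# Default behaviour: non-ASCII characters are escaped as \uXXXX sequences.
escaped = json.dumps(data, separators=(",", ":"))
# With ensure_ascii=False the characters are emitted unchanged.
literal = json.dumps(data, separators=(",", ":"), ensure_ascii=False)

escaped_bytes = escaped.encode("utf-8")
literal_bytes = literal.encode("utf-8")

print(escaped)  # {"\u4e2d":"\u56fd"}
print(literal)  # {"中":"国"}
# The byte strings differ, so any hash computed over them differs too.
print(escaped_bytes == literal_bytes)  # False
```

Both strings decode to the same object, but a checksum is computed over bytes, not over the parsed value.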

Solution

Pass ensure_ascii=False when calling json.dumps.
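
One way to apply this fix is to wrap serialization and hashing in a single helper so the sender always hashes the same bytes that JSON.stringify would produce. The sketch below is only an illustration: md5_of_json is a hypothetical name, and it assumes the receiver hashes the compact JSON.stringify output encoded as UTF-8.

```python
import hashlib
import json


def md5_of_json(payload: dict) -> str:
    """Hypothetical helper: hash the compact, unescaped JSON form of payload."""
    # separators=(",", ":") removes the spaces json.dumps adds by default;
    # ensure_ascii=False keeps non-ASCII characters unescaped.
    text = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


digest = md5_of_json({"中": "国"})
print(digest)
```

For this input, the original post reports the digest d2a4c44eea271a0e00f70e4ea6fd88aa on both the Python and the Node.js side.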

Troubleshooting process

Node.js test code
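const md5 = require('md5');  // node-md5 package
const data = {'中': '国'};
console.log(data);
// JSON.stringify emits compact JSON and leaves non-ASCII characters unescaped
const j = JSON.stringify(data);
console.log(j);
const r = md5(j);
console.log(r);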

Output

{ '中': '国' }
{"中":"国"}
d2a4c44eea271a0e00f70e4ea6fd88aa

node-md5, https://github.com/pvorb/node-md5

Python test code
➜ ~ python3
Python 3.5.2 (default, Sep 14 2016, 11:28:32)
[GCC 6.2.1 20160901 (Red Hat 6.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> import hashlib
>>> _dict = {'中': '国'}
>>> _json = json.dumps(_dict, separators=(',', ':'))
>>> _json
'{"\\u4e2d":"\\u56fd"}'
>>> _json.encode('utf-8')
b'{"\\u4e2d":"\\u56fd"}'
>>> m = hashlib.md5()
>>> m.update(_json.encode('utf-8'))
>>> m.hexdigest()
'84d64021339907e604f1dff60f505c14'
>>> _json = json.dumps(_dict, separators=(',', ':'), ensure_ascii=False)
>>> _json
'{"中":"国"}'
>>> _json.encode('utf-8')
b'{"\xe4\xb8\xad":"\xe5\x9b\xbd"}'
>>> m = hashlib.md5()
>>> m.update(_json.encode('utf-8'))
>>> m.hexdigest()
'd2a4c44eea271a0e00f70e4ea6fd88aa'

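Additional notes

If the json.dumps separators parameter is not set to separators=(',', ':'), the MD5 check on the Node.js side also fails, because the default separators insert spaces that JSON.stringify does not produce.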

The default is (', ', ': ') if indent is None and (',', ': ') otherwise.
To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.
Changed in version 3.4: Use (',', ': ') as default if indent is not None.
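
The effect of separators can be seen with a short sketch. The sample dictionary and variable names here are illustrative, not taken from the original session.

```python
import json

data = {"中": "国", "n": 1}

# Default separators (', ', ': ') add a space after each comma and colon.
default_form = json.dumps(data, ensure_ascii=False)
# Compact separators have the same spacing as JSON.stringify(data) in Node.js.
compact_form = json.dumps(data, ensure_ascii=False, separators=(",", ":"))

print(default_form)  # {"中": "国", "n": 1}
print(compact_form)  # {"中":"国","n":1}
```

Only the compact form matches what the Node.js side hashes for this data, so both ensure_ascii=False and separators=(',', ':') are needed for the checksums to agree.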