...
|
...
|
@@ -8,6 +8,42 @@ |
[http://git.yoho.cn/fe/spider-ufo](http://git.yoho.cn/fe/spider-ufo)

How the data crawling works:

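```
// spuId values in the du app start at 0 and increase sequentially, so the
// current approach is to enumerate them and request each product page, e.g.
// https://m.poizon.com/router/product/ProductDetail?spuId=249&sourceName=shareDetail

// Estimated upper bound; it has to be raised by hand as products are added
let productIds = 82000;

// Walk every spuId to fetch all du products
for (let i = 0; i < productIds; i++) {
  fetch(i);
}
```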
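
The enumeration above can be sketched in a slightly more defensive form. This is only an illustration, not the project's actual code: `buildDetailUrl`, `crawlRange`, the batch size, and the injected `fetchOne` callback are all hypothetical names introduced here. The sketch builds the detail URL from the example above and requests IDs in small batches, so one failed spuId does not abort the whole run.

```javascript
// Hypothetical helper: build the detail-page URL for one spuId,
// following the example URL shown above.
function buildDetailUrl(spuId) {
  const url = new URL('https://m.poizon.com/router/product/ProductDetail');
  url.searchParams.set('spuId', String(spuId));
  url.searchParams.set('sourceName', 'shareDetail');
  return url.toString();
}

// Hypothetical helper: crawl spuIds in [start, end) in fixed-size batches.
// fetchOne is supplied by the caller (assumption: it takes a URL and
// returns a promise); failures are recorded instead of thrown.
async function crawlRange(start, end, batchSize, fetchOne) {
  const results = [];
  for (let i = start; i < end; i += batchSize) {
    const ids = [];
    for (let id = i; id < Math.min(i + batchSize, end); id++) {
      ids.push(id);
    }
    // Wait for the whole batch before starting the next one
    const settled = await Promise.allSettled(
      ids.map(async (id) => fetchOne(buildDetailUrl(id)))
    );
    settled.forEach((r, k) => {
      results.push({ spuId: ids[k], ok: r.status === 'fulfilled' });
    });
  }
  return results;
}
```

A caller would pass its own request function, for example `crawlRange(0, 82000, 20, fetchOne)`, and could retry the entries where `ok` is `false`. The batch size of 20 is an arbitrary illustrative value.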

## Potential bug that needs a minor fix

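```
// On startup, price-task.js loads all of the data stored in du.json
const allData = await DuDataModel.findAll();

// du.json is never cleaned up, so it keeps growing.
// V8's default heap limit under Node is roughly 1.4 GB on 64-bit systems
// and roughly 0.7 GB on 32-bit systems.
// Once the file outgrows that limit, startup fails with an out-of-memory
// error, so the file has to be backed up manually.
```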
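
The manual backup step described above could be automated with a small size check before startup. The following is a minimal sketch under stated assumptions, not something the project currently does: `rotateIfTooLarge` and the 1 GB threshold are hypothetical. It also assumes that renaming the data file is safe, meaning the application can start from a missing or empty `du.json`.

```javascript
const fs = require('fs');

// Assumed threshold: 1 GB, kept below the ~1.4 GB limit mentioned above
const MAX_BYTES = 1024 * 1024 * 1024;

// Hypothetical helper: if the file is larger than maxBytes, rename it to a
// timestamped backup and return the backup path; otherwise return null.
function rotateIfTooLarge(filePath, maxBytes = MAX_BYTES) {
  if (!fs.existsSync(filePath)) {
    return null; // nothing to rotate
  }
  const { size } = fs.statSync(filePath);
  if (size <= maxBytes) {
    return null; // still small enough to load
  }
  const backupPath = `${filePath}.${Date.now()}.bak`;
  fs.renameSync(filePath, backupPath);
  return backupPath;
}
```

Calling `rotateIfTooLarge` on the `du.json` path before `DuDataModel.findAll()` runs would move an oversized file aside instead of letting startup crash. Whether old data can be dropped this way depends on how the price task uses it, so treat this as a starting point only.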

![](https://cdn.nlark.com/yuque/0/2020/png/193477/1585277751900-d85a46c4-24f8-4590-b09a-89b31a1433f3.png?x-oss-process=image/resize,w_1492)

**Deploying to the environment**

Note: bastion host access is required; if you do not have it, request it from the ops team.
|
...
|
...
|
@@ -35,6 +71,8 @@ pm2 logs: check the running status |

1. Data storage directory: /Data/node/spider-ufo/db

![](https://cdn.nlark.com/yuque/0/2020/png/193477/1584412736680-a9232c21-a194-47a0-9b05-f83d7a9948d9.png#align=left&display=inline&height=18&originHeight=18&originWidth=396&size=0&status=done&style=none&width=396) Full du product data (can be trimmed down)

![](https://cdn.nlark.com/yuque/0/2020/png/193477/1584412736579-55e20cc9-2899-4ea7-b1ea-6d8dca33d203.png#align=left&display=inline&height=17&originHeight=17&originWidth=483&size=0&status=done&style=none&width=483) Mapping between ufo products and du products (the 神箭手 (Shenjianshou) crawler has been shut down)
...
|
...
|
|