promtail + Loki + Grafana: a log dashboard platform that helps developers quickly locate problems through logs

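At our company, one project has a service running on multiple servers. Every time developers wanted to look at PHP or Go program logs, they had to pull them over FTP, checking one machine after another, and tracking down a single problem could take half a day. We recently optimized server costs and released several cloud hosts, keeping one machine with Docker, on which Loki and Grafana are installed. Logs are collected with promtail, and the promtail process is managed by supervisor.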

Docker installation

1. Installing Loki and Grafana with docker-compose

1.1 Install the docker-compose command

curl -L https://get.daocloud.io/docker/compose/releases/download/1.21.1/docker-compose-`uname -s`-`uname -m` -o /usr/bin/docker-compose
chmod +x /usr/bin/docker-compose

1.2 Loki directory layout

[root@loki ~]# mkdir loki
[root@loki ~]# tree loki
loki
├── config
│   └── loki
│       ├── config.yaml
│       └── config.yamlbak
└── docker-compose.yaml
[root@loki ~]# cd docker-compose/loki

1.3 Write the docker-compose file for Loki and Grafana

[root@loki loki]# vim docker-compose.yaml
[root@loki loki]# cat docker-compose.yaml
version: "3"

networks:
  loki:

services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
      - "9095:9095"
    command: -config.file=/etc/loki/config.yaml
    volumes:
      - ./config/loki:/etc/loki
      - /data/loki:/loki
    networks:
      - loki
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - /data/grafana:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: 123456
      GF_SERVER_HTTP_PORT: 3000
    networks:
      - loki

1.4 Create the Loki configuration file

In the current directory, create the config/loki directory.

cd docker-compose/loki
mkdir -p config/loki/
vim config/loki/config.yaml

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9095
  grpc_server_max_recv_msg_size: 1572864000  # max gRPC receive message size; default is 4 MB
  grpc_server_max_send_msg_size: 1572864000  # max gRPC send message size; default is 4 MB

ingester:
  lifecycler:
    address: 172.19.72.235
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  wal:
    dir: /loki/wal

compactor:
  working_directory: /loki/persistent      # compactor working directory, usually also used as the persistent directory
  compaction_interval: 10m                 # compaction interval
  retention_enabled: true                  # enable retention (deletion of expired data)
  retention_delete_delay: 5m               # delay before expired data is deleted
  retention_delete_worker_count: 150       # number of workers deleting expired data

schema_config:
  configs:
    - from: "2023-10-23"
      index:
        period: 24h
        prefix: loki_index_
      object_store: filesystem             # storage backend: local filesystem
      schema: v11
      store: boltdb-shipper

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-index    # index directory
    cache_location: /loki/boltdb-cache            # cache directory
  filesystem:
    directory: /loki/chunks                       # chunks directory

limits_config:
  retention_period: 240h                          # how long logs are kept before they expire (10 days)

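Create the data directories and give them 777 permissions

Without 777 permissions, startup fails with "mkdir /loki/chunks: permission denied", or with the same error for one of the other directories.

mkdir /data/loki
mkdir /data/grafana
chmod 777 /data/loki
chmod 777 /data/grafana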

Start Loki and Grafana

docker-compose up -d
docker-compose logs   # view the logs
docker-compose ps     # view the running containers
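
Once the containers are up, it can help to confirm that both services answer over HTTP before moving on. The following is a minimal sketch, assuming curl is installed on the Docker host. The function name wait_for_http_ok and its retry defaults are made up for illustration; /ready (Loki) and /api/health (Grafana) are the health endpoints those services expose on the ports mapped above.

```shell
#!/usr/bin/env bash
# Poll a URL until it answers with HTTP 200 or the retry budget runs out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
wait_for_http_ok() {
  local url="$1" retries="${2:-30}" delay="${3:-2}"
  local i code
  for ((i = 1; i <= retries; i++)); do
    # -w '%{http_code}' prints only the status code; 000 means no connection.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$code" = "200" ]; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "not ready after $retries attempts: $url" >&2
  return 1
}

# Example, run on the Docker host:
# wait_for_http_ok http://localhost:3100/ready
# wait_for_http_ok http://localhost:3000/api/health
```

Loki can take a little while after startup before /ready returns 200, which is why the sketch retries instead of checking once.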

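2. Installing promtail

The servers whose logs need to be collected do not have Docker installed, so download the release package directly and start promtail from the command line.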

wget https://github.com/grafana/loki/releases/download/v2.9.2/promtail-linux-amd64.zip   # latest release at the time of writing, same version as Loki
unzip promtail-linux-amd64.zip
vim promtail-local-config.yaml

server:
  http_listen_port: 9080
  grpc_listen_port: 0
  grpc_server_max_recv_msg_size: 1572864000
  grpc_server_max_send_msg_size: 1572864000

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://172.19.72.235:3100/loki/api/v1/push

scrape_configs:
- job_name: api
  static_configs:
  - targets:
      - 172.19.72.235
    labels:
      job: api
      __path__: /data/runtime/logs/*/*.log

- job_name: websoket
  static_configs:
  - targets:
      - 172.19.72.235
    labels:
      job: websoket
      __path__: /data/logs/*.log

Start promtail

./promtail-linux-amd64 --config.file=promtail-local-config.yaml
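
If promtail starts but nothing shows up in Loki, one way to narrow it down is to push a single line by hand from the application server to the same push URL promtail uses. This is a rough sketch rather than part of the original setup: the function names and the smoke-test job label are hypothetical, and the payload builder assumes the log line contains no double quotes or backslashes, since it does no JSON escaping.

```shell
#!/usr/bin/env bash
# Build a minimal JSON body for Loki's push API: one stream, one log line.
# Arguments: job label, log line, optional timestamp in nanoseconds.
build_push_payload() {
  local job="$1" line="$2" ts_ns="${3:-$(date +%s)000000000}"
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}' \
    "$job" "$ts_ns" "$line"
}

# POST one line to Loki and print the HTTP status code.
# Arguments: Loki base URL, job label, log line.
push_test_line() {
  local loki_url="$1" job="$2" line="$3"
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H 'Content-Type: application/json' \
    -X POST --data "$(build_push_payload "$job" "$line")" \
    "$loki_url/loki/api/v1/push"
}

# Example, run on the server where promtail is installed:
# push_test_line http://172.19.72.235:3100 smoke-test "hello from api1"
```

A 204 status means Loki accepted the line, so the network path is fine and any remaining problem is more likely in the promtail configuration, for example the __path__ globs.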

Once it starts without problems, add it to supervisor

[program:promtail-log]
directory=/usr/local/data/promtail
command=/usr/local/data/promtail/promtail-linux-amd64 -config.file=promtail-local-config.yaml
autostart=true
autorestart=true
startsecs=5
priority=1
stopsignal=INT
stopwaitsecs=11
stopasgroup=true
killasgroup=true

[root@api1 promtail]# supervisorctl update
[root@api1 promtail]# supervisorctl status
promtail-log                     RUNNING   pid 11360, uptime 1 days, 0:38:48

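3. Problems encountered

Problem 1: permissions

If the loki_loki_1 container fails to start, it is almost always because /loki/persistent, /loki/wal, /loki/chunks, /loki/boltdb-index, or /loki/boltdb-cache cannot be created. These directories are mounted from /data/loki on the local disk, so giving /data/loki 777 permissions and restarting the service fixes it.

Problem 2: send and receive errors between promtail and Loki

When the log volume is too large, promtail reports the following error, and Loki also reports an error on the receiving side: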

status: 500. message: rpc error: code = resourceexhausted desc = trying to send message larger than max (5066121 vs. 4194304)

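Add the following parameters to the server section of both the Loki and promtail configuration files, then restart the services:

  grpc_server_max_recv_msg_size: 1572864000
  grpc_server_max_send_msg_size: 1572864000

The configuration files above already include these two parameters.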
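
To put the numbers in perspective, the small helper below converts the byte counts from the error message and from the configuration into MiB. It is only an illustration; bytes_to_mib is a made-up name, and the sketch assumes awk is available.

```shell
#!/usr/bin/env bash
# Convert a byte count to MiB, printed with two decimals.
bytes_to_mib() {
  LC_ALL=C awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1048576 }'
}

bytes_to_mib 4194304      # default limit from the error message -> 4.00
bytes_to_mib 5066121      # size of the rejected message         -> 4.83
bytes_to_mib 1572864000   # limit configured above               -> 1500.00
```

So the rejected message was only slightly over the 4 MiB default, while the configured value raises the limit to 1500 MiB.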

4. Configuring Grafana

4.1 Open Grafana in a browser

4.2 Add a data source

Add the data source
Test the data source
Log filtering
Filtering the logs
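
The same kind of filter used in Grafana can also be run against Loki's HTTP API, which is handy for checking from a terminal that a job label is receiving data. The sketch below is a hedged example: the function names are hypothetical and the keyword error is only a placeholder, while the job labels api and websoket come from the promtail configuration above. With no explicit time range, query_range covers the last hour.

```shell
#!/usr/bin/env bash
# Build a LogQL query: all lines from one job that contain a keyword.
logql_filter() {
  local job="$1" keyword="$2"
  printf '{job="%s"} |= "%s"' "$job" "$keyword"
}

# Run a LogQL query against Loki's query_range endpoint and print the JSON reply.
# Arguments: Loki base URL, LogQL query, optional max number of lines (default 20).
loki_query() {
  local loki_url="$1" query="$2" limit="${3:-20}"
  curl -G -s "$loki_url/loki/api/v1/query_range" \
    --data-urlencode "query=$query" \
    --data-urlencode "limit=$limit"
}

# Example:
# loki_query http://172.19.72.235:3100 "$(logql_filter api error)"
```

The string produced by logql_filter, for example {job="api"} |= "error", can be pasted as-is into the query box of Grafana's Explore view.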

At this point, Grafana can be handed over to the developers.