云岚

Notes on Handling a CloudFlare Outage
2019/02/19


Preface

All the sites under this domain have always used CloudFlare's CDN.

Diagnosing the fault

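Around February 3, I noticed in the site's analytics dashboard that traffic had suddenly dropped a lot. I pinged the domain and got no response, and my first thought was that the IP had been blocked by the Great Firewall.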

So I went to ping.pe and pinged the two IPs CloudFlare had assigned, 104.31.242.9 and 104.31.243.9. Neither was reachable from anywhere in the world; every probe timed out.

(Screenshot: unreachable worldwide)

At this point the conclusion was clear: for some reason, these two CloudFlare IPs had failed.
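ping.pe reports ICMP results, and ICMP can be filtered even when a web server is healthy, so a direct TCP handshake on port 443 is a useful second check. This is not what I ran at the time; it is a minimal sketch using only Python's standard library, assuming port 443 and a 5-second timeout (both my own choices). It separates a timeout (no answer at all, as seen here) from a refused connection (the host answered, but nothing accepted the connection).

```python
import socket

def probe_tcp(ip: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt one TCP handshake to ip:port and classify the outcome.

    Returns "ok", "timeout" (no answer at all, e.g. packets dropped),
    "refused" (the host answered with a reset), or "error: ..." for
    anything else, such as "network unreachable".
    """
    try:
        # create_connection performs the full TCP handshake and honours the timeout;
        # the socket is closed again as soon as the handshake succeeds.
        with socket.create_connection((ip, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```

Run it against both addresses, e.g. `probe_tcp("104.31.242.9")`, from a few different networks; a result that is the same everywhere points at the remote side rather than at your own connection.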

Temporary workaround

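To keep the traffic, I changed the DNS record to an A record pointing directly at another CloudFlare IP that was still reachable, which restored normal access to the site.
During the outage I also received several "server unavailable" warnings from Google Search Console, and the site's indexed page count took a hit.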

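I used this workaround for about a week. When the two original IPs still hadn't recovered, I decided to open a ticket.
The workaround is fine for a short while, but if I kept it up for too long I was worried CloudFlare would notice and send me a warning about removing the domain.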

Attempts that didn't help

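Because the domain was connected through the CloudFlare Partner integration of 挖站否 (wzfou), I didn't have to change the NS records and could use a CNAME setup instead. So I tried deleting the domain and adding it again, but it was assigned the same two IPs (and after re-adding it I also set up the DNS record incorrectly; more on that later).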

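Next I upgraded to the Pro plan, and was still assigned the same two IPs. I was baffled. People on the LOC forum told me to either contact support or open a new CloudFlare account and add the domain there. In the end I decided to contact support and file a ticket.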

Getting CloudFlare support to fix it

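Before starting this step, I pointed the DNS line for overseas visitors back to the CNAME that CloudFlare provides, since otherwise there would be no way to get the problem fixed. For visitors in mainland China I kept the A record so the site stayed reachable.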

Creating a ticket

Go to support.cloudflare.com and click your name; a "My Activities & Requests" entry appears. Click it, then click "Submit a request".

(Screenshot: support center)
(Screenshot: submitting a ticket)

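The next page offers three options: the first checks CloudFlare's service availability, the second asks on the community forum, and the third gets more help. Choose the third.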

(Screenshot: getting more help)

Next, select the affected domain and write a short summary of the problem.

(Screenshot: problem summary)

CloudFlare then suggests some basic troubleshooting steps; just ignore them and click Next.

(Screenshot: skipping the suggestions)

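Finally, choose the department the issue belongs to (I chose Network), describe the problem in detail, fill in the email address for notifications, and click Send to submit the ticket.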

(Screenshot: sending the ticket)

At this point the ticket has been sent to CloudFlare's support team.

First reply

A little over two hours after I submitted the ticket, support replied asking for error messages and other details.

ryu, February 8, 2019, 10:58 AM
Hi ****,

Thank you for contacting Cloudflare Support. We're sorry to read that you're experiencing difficulties.

In order to better assist you with the problem you are experiencing, we will need some additional information from you.

Can you please share the following with us:

The specific error messages being returned and/or behaviours where you are seeing issues while on the website.
Specific step by step instructions on how to reproduce on our end - e.g. if this issue is only replicable behind a login, can you provide a temporary test account for us
A screenshot of the errors you are seeing.
Any relevant access logs from your web server.
A HAR file demonstrating the issue.
Please respond with that information as soon as you can so we can continue to work with you to resolve your issue.

Helpful resources
Cloudflare Error Messages (and what they mean)
How do I check my server's response directly without Cloudflare?
Reporting a bug
Thanks!

Ricky Yu | Support Engineer
Join the Cloudflare Community

This time I dutifully provided the screenshots and the requested files (please ignore that they were taken with QQ).

(Screenshot: my answer to the first reply)

Second reply

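After about a day, support replied. The gist was that they couldn't reach the site through my origin either, and only then did I realize I had set up the DNS record incorrectly after re-adding the domain.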

Alex M, February 11, 2019, 12:08 AM
Hi ****,

The host name you're CNAME'ing to is currently not resolving:

~$ curl -svo /dev/null https://www.dxmc.net/
* Could not resolve host: www.dxmc.net
* Closing connection 0
~$ curl -svo /dev/null https://www.dxmc.net/
*   Trying 104.31.242.9...
* TCP_NODELAY set
* connect to 104.31.242.9 port 443 failed: Connection refused
*   Trying 104.31.243.9...
* TCP_NODELAY set
* connect to 104.31.243.9 port 443 failed: Connection refused
* Failed to connect to www.dxmc.net port 443: Connection refused
* Closing connection 0
~$ curl -svo /dev/null https://www.dxmc.net/ --connect-to ::***.***.net
* Connecting to hostname: ***.***.net
* Could not resolve host: ***.***.net
* Closing connection 0
~$ date -u
Sun Feb 10 16:02:13 UTC 2019
Please bring this up with your hosting provider or correct the CNAME in the DNS zone file is hosted by our partner Wzfou Co.

We will mark this as solved for the time being. Please let us know if you have any further questions or issues by replying to this e-mail or ticket to have it automatically reopened.

Regards,

Alex M | Support Engineer
Join the Cloudflare Community

I quickly set the correct CNAME record and replied that the record was now correct but the site was still unreachable.

Third reply (the exasperating one)

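A while later, support replied that everything worked fine on their end, pasted their connection log, and said that if it still didn't work for me, I should run the command from the log and send them my output.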

Alex M, February 11, 2019, 01:14 AM
Hi ****,

We've run a test and notice that the site is now working correctly:

~$ curl -svo /dev/null https://www.dxmc.net/
*   Trying 2606:4700:30::681f:f209...
* TCP_NODELAY set
* Connected to www.dxmc.net (2606:4700:30::681f:f209) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
...... (the rest is too long to paste here)

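And yes, you read that right: this engineer tested over IPv6. From the very beginning I had said that the two IPv4 addresses were unreachable, yet he tested over IPv6 and told me everything was fine.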

By now I was getting a bit angry, and sent them the connection log from an IPv4-only test:

****, February 11, 2019, 07:05 AM
ipv6 is work, but ipv4 is down

# curl -4svo /dev/null https://www.dxmc.net
* About to connect() to www.dxmc.net port 443 (#0)
* Trying 104.31.243.9...
* Connection timed out
* Trying 104.31.242.9...
* Connection timed out
* Failed connect to www.dxmc.net:443; Connection timed out
* Closing connection 0
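The IPv6 test hid the problem because curl on a dual-stack machine went for the AAAA address first, which was working. Testing each address family on its own, the same idea as `curl -4` / `curl -6`, avoids that. Below is a minimal sketch with Python's standard library: it resolves A and AAAA records separately with `getaddrinfo` and tries a TCP handshake to every address. The function name, port 443 and the 5-second timeout are my own assumptions, not anything CloudFlare asked for.

```python
import socket

def probe_by_family(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Resolve host separately for IPv4 and IPv6 and try a TCP handshake to each address.

    Returns {"IPv4": {addr: outcome, ...}, "IPv6": {...}}; a family with no
    records (or one that fails to resolve) maps to an empty dict.
    """
    results = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        outcomes = {}
        try:
            # Restrict resolution to one family, like curl -4 or curl -6.
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            infos = []
        for fam, socktype, proto, _canon, sockaddr in infos:
            addr = sockaddr[0]
            try:
                with socket.socket(fam, socktype, proto) as sock:
                    sock.settimeout(timeout)
                    sock.connect(sockaddr)
                outcomes[addr] = "ok"
            except socket.timeout:
                outcomes[addr] = "timeout"  # nothing answered
            except ConnectionRefusedError:
                outcomes[addr] = "refused"  # host answered with a reset
            except OSError as exc:
                outcomes[addr] = f"error: {exc}"
        results[label] = outcomes
    return results
```

Calling it on the site's hostname returns both families side by side, so an "IPv6 ok, IPv4 timeout" split like the one in this ticket shows up in a single run.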

Fourth reply

Adam M., February 11, 2019, 08:03 AM
Hi there,

It looks like you're having issues with network connectivity to our edge. Could you provide us with the following?

1) A traceroute to www.dxmc.net
2) The output of https://cloudflare.com/cdn-cgi/trace if that is reachable from where you are.

Kind regards,

Adam M. | Support Engineer
Join the Cloudflare Community

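This time they decided that my network's connection to their CDN was the problem, and asked me to open the test page and send them its output, along with a traceroute.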
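The /cdn-cgi/trace page from the reply returns plain text, one key=value pair per line, with fields such as `ip`, `colo` (the CloudFlare data center that answered) and `loc`. Here is a small sketch that fetches and parses it using only Python's standard library. The helper names are hypothetical, and the set of fields is CloudFlare's to change, so treat any particular key as an assumption.

```python
import urllib.request

TRACE_URL = "https://cloudflare.com/cdn-cgi/trace"

def parse_trace(text: str) -> dict:
    """Turn the key=value lines of a /cdn-cgi/trace response into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            fields[key.strip()] = value.strip()
    return fields

def fetch_trace(url: str = TRACE_URL, timeout: float = 10.0) -> dict:
    """Download the trace page and parse it (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_trace(resp.read().decode("utf-8", errors="replace"))
```

`fetch_trace()` returns a dict you can paste back into the ticket; the `colo` value is the useful part, since it shows which edge location actually answered.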

So I sent them....

(Screenshot: traceroute)

This time I was really angry, and added one more line:
this two ip can't be connected at everywhere. whatever China, America, UK. if you don't believe, you can try yourself

Fifth reply (resolved)

shanshan, February 11, 2019, 09:45 AM
Hi ****,

I've checked with our Network team and we have fixed it:

ping -c 10 104.31.242.9
PING 104.31.242.9 (104.31.242.9): 56 data bytes
64 bytes from 104.31.242.9: icmp_seq=0 ttl=59 time=5.374 ms
64 bytes from 104.31.242.9: icmp_seq=1 ttl=59 time=5.086 ms
64 bytes from 104.31.242.9: icmp_seq=2 ttl=59 time=5.829 ms
64 bytes from 104.31.242.9: icmp_seq=3 ttl=59 time=4.112 ms
64 bytes from 104.31.242.9: icmp_seq=4 ttl=59 time=5.715 ms
64 bytes from 104.31.242.9: icmp_seq=5 ttl=59 time=4.667 ms
64 bytes from 104.31.242.9: icmp_seq=6 ttl=59 time=4.508 ms
64 bytes from 104.31.242.9: icmp_seq=7 ttl=59 time=21.826 ms
64 bytes from 104.31.242.9: icmp_seq=8 ttl=59 time=3.867 ms
64 bytes from 104.31.242.9: icmp_seq=9 ttl=59 time=4.547 ms

--- 104.31.242.9 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 3.867/6.553/21.826/5.128 ms
Would you please confirm this from your end as well? Thanks for your understanding and patience while we were working on it.

Best Regards,
Shanshan 
Technical Support Engineer
Join the Cloudflare Community

Support finally realized the problem was on their side and escalated it to the network team, who fixed it.

Summary

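In the end the problem was solved, which is worth celebrating.
But the support experience was genuinely frustrating. From the start I had said that the two IPs could not be reached from anywhere, yet none of these engineers noticed. They kept assuming it was my network and kept asking me for more information, and not one of them tested whether the IPs were really unreachable. Only after I stressed again that they were unreachable from everywhere did the engineer named shanshan finally fix the problem (judging by the name, I even wonder whether this engineer is Chinese).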

Please ignore my clumsy English; as long as they understood it, that was good enough.

Last modification: February 19th, 2019 at 06:29 pm
