Search engines fail to crawl your pages: what is wrong with the server?

Source: http://www.daoshangbao.cn Date: 2019/11/25

  When a search engine fails to crawl your pages, what has gone wrong on the server? The editors at a Jiyang website optimization company walk through the common causes.

  1. Mistaken blocking

  In Baidu's robots.txt tool, clicking "Detect and update" repeatedly can show an odd pattern: the update often succeeds, but just as often fails. As a result, content that robots.txt disallows and that should never be indexed gets indexed anyway and is later dropped again. What is the cause? It is not server overload. The firewall has mistakenly blacklisted some of Baiduspider's addresses.
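
  Before whitelisting an address in the firewall, it helps to confirm that it really belongs to Baiduspider. The sketch below does this with a reverse DNS lookup. It assumes, following Baidu's published guidance, that genuine crawler hostnames end in baidu.com or baidu.jp; confirm those suffixes against current Baidu documentation. The function names are illustrative only.

```python
import socket

# Assumption: hostname suffixes Baidu documents for genuine Baiduspider hosts.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_baidu_hostname(hostname: str) -> bool:
    """Return True if the hostname sits under one of the Baidu crawler domains."""
    host = hostname.rstrip(".").lower()
    return host.endswith(BAIDU_SUFFIXES)


def is_genuine_baiduspider(ip: str, reverse_lookup=socket.gethostbyaddr) -> bool:
    """Reverse-resolve an IP address and check the resulting hostname.

    reverse_lookup is injectable so the logic can be exercised offline.
    """
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        # No PTR record or the lookup failed: the address cannot be confirmed.
        return False
    return is_baidu_hostname(hostname)
```

  An address taken from the firewall's block list can be passed to is_genuine_baiduspider; if it returns True, the block is probably a mistake. A stricter check would also resolve the hostname forward again and compare it with the original address.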

  2. Server anomalies

  Ordinary servers need little comment; as everyone knows, those hosted in Beijing, Shanghai, and Guangzhou are generally fine. But there are some special servers that most webmasters are probably unaware of. The "Hong Kong and Taiwan server" from West.cn (西部数码) is an interesting example. Is it really in Hong Kong or Taiwan? If the data center itself is in mainland China, in what sense is it a Hong Kong or Taiwan server? It uses a Hong Kong or Taiwan IP address to avoid ICP filing (备案), while all the data stays on the mainland.

  What is wrong with that? You will find that the site is served through a CDN, so that even a single uploaded image comes back with a 302 status code. Access is faster, but does this help SEO?
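
  To see the status code a crawler would receive, a resource can be requested without following redirects. The following is a minimal sketch using Python's standard library; the helper names are illustrative, and the User-Agent value is only an assumption about what Baiduspider sends. A CDN may also treat requests differently by source IP, so the result is indicative rather than definitive.

```python
import http.client
from urllib.parse import urlsplit

# Assumption: a User-Agent string resembling the one Baiduspider sends.
SPIDER_UA = ("Mozilla/5.0 (compatible; Baiduspider/2.0; "
             "+http://www.baidu.com/search/spider.html)")


def fetch_status(url: str, user_agent: str = SPIDER_UA):
    """Send one HEAD request and return (status, Location header).

    http.client never follows redirects, so a 302 is reported as a 302.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status: int, location) -> str:
    """Turn a status code into a short human-readable summary."""
    if status in (301, 308):
        return f"permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"temporary redirect to {location}"
    if 200 <= status < 300:
        return "served directly"
    return f"unexpected status {status}"
```

  Calling describe(*fetch_status(url)) on an image URL from your own site shows whether it is served directly or bounced through a temporary redirect.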

  3. The real IP cannot be reached

  Larger websites generally use CDN acceleration, but some sites apply it not only to visitors' "devices" but to spiders as well. What happens in the end? If the CDN nodes are unstable, the consequences for the spider crawling the site can be fatal.

  Many large sites turn on a CDN because they are frequent attack targets, and in that situation it is hard to justify not configuring "spider back-to-origin" (蜘蛛回源). Does your site use a CDN? Log in to the Baidu Webmaster Platform and check whether the spider can fetch from the real IP address.

  4. Frequent 50x errors

  These URLs share one trait: when you open them yourself, everything looks normal. So why does the spider report an error? Simply because, at the moment the crawler made its request, the server returned a 5xx HTTP status code. Does your site run into this often? If so, put your technical staff on it right away, or notify your IDC (hosting) provider.
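
  Because these failures are transient, the server's access log is usually the only place they show up. The sketch below scans log lines for spider requests that received a 5xx response. It assumes the widely used "combined" log format and matches the crawler by the token Baiduspider in the User-Agent; both the regular expression and the function name are illustrative and may need adjusting to your server's configuration.

```python
import re
from collections import Counter

# Assumption: "combined" access-log format, i.e.
# ip - - [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def spider_5xx_report(lines, agent_token="Baiduspider"):
    """Return (share of spider requests answered with 5xx, failures per path)."""
    total = 0
    failures = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or agent_token.lower() not in match["agent"].lower():
            continue  # unparseable line or not a spider request
        total += 1
        if match["status"].startswith("5"):
            failures[match["path"]] += 1
    rate = sum(failures.values()) / total if total else 0.0
    return rate, failures
```

  Passing an open log file to spider_5xx_report gives the share of spider requests that failed, plus the paths that failed most often, which is a useful starting point when talking to the hosting provider.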

  5. Crawl error rate

  No website can run 100% problem-free, but there is a limit to everything: in our view, a crawl error rate of no more than 5% has essentially no effect on a site, and even then such errors should not appear every day. The most common crawl error is a connection timeout: "After the crawl request connection was established, the page downloaded too slowly and timed out; possible causes are server overload or insufficient bandwidth." In that case:

  A: Compress images as far as possible without hurting their quality, ideally at upload time.

  B: Reduce the number of script files such as JS, or merge them.

  C: Keep page size under control, especially for pages with high traffic and crawl volume; exceeding 2 MB is not recommended.

  D: Increase the site's bandwidth to improve download speed, or switch to a different server.

  That concludes today's notes from the Jiyang website optimization company on server-side problems that affect indexing. For more questions about building or optimizing websites, contact 济南诺商信息技术有限公司 (Jinan Nuoshang Information Technology Co., Ltd.).