但行好事  莫问前程

记一次性能优化过程中的术与道

优化前性能

最近集团要做FY19的机器采购预算,我们原来使用的A8F43机型都不在采购范围内了,新的采购机型推荐F53,需要采购多少台F53?需要有压测数据作为依据来做预算。
优化前,我们的placecard服务在F53上的压测性能结果如下:

备注:

  • 这里的线上是A8机器,32cpu
  • F53机器上压测时,压测客户端上出现connection refused错误,服务端上观察到的qps上不去。
  • 可以看到在F53(96核)和线上A832核)上的qps差不多,不符合预期。

优化过程

从问题入手

压测客户端上报了connection refused,怀疑是和tomcat配置有关,当服务端tomcat所有的线程都被占满之后,最大的排队请求数是由acceptCount设置的,当队列中的请求数达到了acceptCount上限后,随后的请求会被拒绝。
参考tomcatHttp Connector参数配置说明:
Tomcat Http Connector配置参数

查看线上及压测机当前tomcat http connector配置,acceptCount的确设置的不大:

如果是由这个问题引起的,那么如果把acceptCount配置为一个特别大的值,就应该不会再出现这个问题了,我们修改后的Connector参数配置如下:

再压测果然没有connection refused的情况了(针对压测客户端直连服务器上的其中一个tomcat的场景,我们服务器上部署了3个tomcat),而是出现了新的read timeout的情况,说明大量请求都在队列中,客户端等待响应超时,出现read timeout。顺便说一句,我们压测工具使用的是jmeter,当出现read timeout时,压测客户端由于在等待响应,实际发送的请求qps数减少了,服务器上观察到的qps也没有上去。

分析问题的深层原因

虽然在上述场景下没有了connection refused问题,但是服务端的cpu使用率还是在百分之十几,qps也并没有上去。问题还没有解决。革命尚未成功、同志仍需努力。

增大acceptCount后,压测客户端上看到的情况不一样了,connection refused消失了,变为了read timeout的错误。说明大量请求都在服务端的队列中了,那么为什么请求会堆积在队列中而不是迅速的被服务端的多线程给处理掉,从上文的贴图中可以看到,服务端的最大线程数配置为了2048(这也是配置了一个很大的值),那么实际有多少个线程,这些线程都在干嘛,是真的这些线程处理不过来吗?拿当时的服务端的jstack快照来看看。
搜一下业务处理的线程数有多少:

发现确实达到了配置的最大线程数2048,这些线程都在干活吗?搜一下runnable状态的有多少

2048个线程中,runnable状态的线程数有75个,waiting状态的有1973个,见下图:

waiting的这些线程都处于BLOCKED状态,是因为写的业务代码没有考虑并发性能问题吗?我们来从线程栈里找找线索,贴上一段jstack信息如下:

"http-bio-8080-exec-2048" #2637 daemon prio=5 os_prio=0 tid=0x00007ff46412b000 nid=0x1ed48 waiting for monitor entry [0x00007feb63ebd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
    - waiting to lock <0x00000007409b1450> (a java.util.Collections$SynchronizedMap)
    at org.dom4j.tree.QNameCache.get(QNameCache.java:117)
    at org.dom4j.DocumentFactory.createQName(DocumentFactory.java:199)
    at org.dom4j.tree.NamespaceStack.createQName(NamespaceStack.java:392)
    at org.dom4j.tree.NamespaceStack.pushQName(NamespaceStack.java:374)
    at org.dom4j.tree.NamespaceStack.getQName(NamespaceStack.java:213)
    at org.dom4j.io.SAXContentHandler.startElement(SAXContentHandler.java:234)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:509)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:374)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2784)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at org.dom4j.io.SAXReader.read(SAXReader.java:465)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
    at com.autonavi.snowman.lse.datamodel.LocalLbsResultParser.parse(LocalLbsResultParser.java:60)
    at com.autonavi.snowman.placecard.service.server.SearchServer.getQueryResult(SearchServer.java:719)
    at com.autonavi.snowman.placecard.service.server.SearchServer.getQueryIdsResult(SearchServer.java:754)
    at com.autonavi.snowman.placecard.service.server.PlaceLookupServer.buildResponse(PlaceLookupServer.java:42)
    at com.autonavi.snowman.placecard.service.server.PlaceLookupServer.queryAndBuildResponse(PlaceLookupServer.java:35)
    at com.autonavi.snowman.placecard.service.server.autocomplete.SectionsResultBuilder.findAllBusinessPlaces(SectionsResultBuilder.java:138)
    at com.autonavi.snowman.placecard.service.server.autocomplete.SectionsResultBuilder.setSection(SectionsResultBuilder.java:240)
    at com.autonavi.snowman.placecard.service.server.autocomplete.SectionsResultBuilder.add2Section(SectionsResultBuilder.java:582)
    at com.autonavi.snowman.placecard.service.server.autocomplete.SectionsResultBuilder.build(SectionsResultBuilder.java:99)
    at com.autonavi.snowman.placecard.service.server.AutocompleteServer.service(AutocompleteServer.java:146)
    at com.autonavi.snowman.placecard.service.PlacecardService.service(PlacecardService.java:51)
    at com.autonavi.snowman.placecard.service.PlacecardService.service(PlacecardService.java:26)
    at com.autonavi.snowman.framework.service.ProtobufService.doService(ProtobufService.java:39)
    at com.autonavi.snowman.framework.service.ProtobufService.doService(ProtobufService.java:15)
    at com.autonavi.snowman.framework.service.ProtobufService$$FastClassBySpringCGLIB$$255864de.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:721)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:656)
    at com.autonavi.snowman.placecard.service.PlacecardService$$EnhancerBySpringCGLIB$$fd8d6e68.doService(<generated>)
    at com.autonavi.snowman.framework.controller.AbstractController.processRequest(AbstractController.java:81)
    at com.autonavi.snowman.placecard.controller.PlaceCardController.handleRequest(PlaceCardController.java:37)
    at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:220)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:134)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:116)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:963)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:897)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.boot.web.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:117)
    at org.springframework.boot.web.support.ErrorPageFilter.access$000(ErrorPageFilter.java:61)
    at org.springframework.boot.web.support.ErrorPageFilter$1.doFilterInternal(ErrorPageFilter.java:92)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.springframework.boot.web.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:110)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
    at com.taobao.tomcat.valves.ContextLoadFilterValve$FilterChainAdapter.doFilter(ContextLoadFilterValve.java:191)
    at com.taobao.eagleeye.EagleEyeFilter.doFilter(EagleEyeFilter.java:81)
    at com.taobao.tomcat.valves.ContextLoadFilterValve.invoke(ContextLoadFilterValve.java:150)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:683)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1080)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
    - locked <0x000000075808e9b0> (a org.apache.tomcat.util.net.SocketWrapper)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:852)

"http-bio-8080-exec-2046" #2635 daemon prio=5 os_prio=0 tid=0x00007ff474120800 nid=0x1ed47 waiting for monitor entry [0x00007feb63fbe000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
    - waiting to lock <0x00000007409b1450> (a java.util.Collections$SynchronizedMap)
    at org.dom4j.tree.QNameCache.get(QNameCache.java:117)
    at org.dom4j.DocumentFactory.createQName(DocumentFactory.java:199)
    at org.dom4j.tree.NamespaceStack.createQName(NamespaceStack.java:392)
    at org.dom4j.tree.NamespaceStack.pushQName(NamespaceStack.java:374)
    at org.dom4j.tree.NamespaceStack.getQName(NamespaceStack.java:213)
    at org.dom4j.io.SAXContentHandler.startElement(SAXContentHandler.java:234)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:509)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:374)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2784)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at org.dom4j.io.SAXReader.read(SAXReader.java:465)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
    at com.autonavi.snowman.lse.datamodel.LocalLbsResultParser.parse(LocalLbsResultParser.java:60)
    at com.autonavi.snowman.placecard.service.server.SearchServer.getResult(SearchServer.java:437)
    at com.autonavi.snowman.placecard.service.server.SearchServer.getQueryResult(SearchServer.java:525)
    at com.autonavi.snowman.placecard.util.TaiwnSiriProxy.getQueryResult(TaiwnSiriProxy.java:108)
    at com.autonavi.snowman.placecard.service.server.SiriServer.service(SiriServer.java:131)
    at com.autonavi.snowman.placecard.service.PlacecardService.service(PlacecardService.java:51)
    at com.autonavi.snowman.placecard.service.PlacecardService.service(PlacecardService.java:26)
    at com.autonavi.snowman.framework.service.ProtobufService.doService(ProtobufService.java:39)
    at com.autonavi.snowman.framework.service.ProtobufService.doService(ProtobufService.java:15)
    at com.autonavi.snowman.framework.service.ProtobufService$$FastClassBySpringCGLIB$$255864de.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:721)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:656)
    at com.autonavi.snowman.placecard.service.PlacecardService$$EnhancerBySpringCGLIB$$fd8d6e68.doService(<generated>)
    at com.autonavi.snowman.framework.controller.AbstractController.processRequest(AbstractController.java:81)
    at com.autonavi.snowman.placecard.controller.PlaceCardController.handleRequest(PlaceCardController.java:37)
    at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:220)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:134)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:116)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:963)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:897)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.boot.web.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:117)
    at org.springframework.boot.web.support.ErrorPageFilter.access$000(ErrorPageFilter.java:61)
    at org.springframework.boot.web.support.ErrorPageFilter$1.doFilterInternal(ErrorPageFilter.java:92)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.springframework.boot.web.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:110)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
    at com.taobao.tomcat.valves.ContextLoadFilterValve$FilterChainAdapter.doFilter(ContextLoadFilterValve.java:191)
    at com.taobao.eagleeye.EagleEyeFilter.doFilter(EagleEyeFilter.java:81)
    at com.taobao.tomcat.valves.ContextLoadFilterValve.invoke(ContextLoadFilterValve.java:150)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:683)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1080)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
    - locked <0x0000000755391918> (a org.apache.tomcat.util.net.SocketWrapper)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:852)

"http-bio-8080-exec-2044" #2633 daemon prio=5 os_prio=0 tid=0x00007ff470130000 nid=0x1ed46 waiting for monitor entry [0x00007feb640bf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
    - waiting to lock <0x00000007409b1450> (a java.util.Collections$SynchronizedMap)
    at org.dom4j.tree.QNameCache.get(QNameCache.java:117)
    at org.dom4j.DocumentFactory.createQName(DocumentFactory.java:199)
    at org.dom4j.tree.NamespaceStack.createQName(NamespaceStack.java:392)
    at org.dom4j.tree.NamespaceStack.pushQName(NamespaceStack.java:374)
    at org.dom4j.tree.NamespaceStack.getQName(NamespaceStack.java:213)
    at org.dom4j.io.SAXContentHandler.startElement(SAXContentHandler.java:234)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:509)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:374)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2784)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at org.dom4j.io.SAXReader.read(SAXReader.java:465)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
    at com.autonavi.snowman.lse.datamodel.LocalLbsResultParser.parse(LocalLbsResultParser.java:60)

查看锁0x00000007409b1450有多少:

锁都发生在这行代码上 'org.dom4j.tree.QNameCache.get(QNameCache.java:117)‘,这是dom4j包里的代码,看来不是因为我们开发同学的代码引起的。
我们这个业务的和下游服务(地图搜索引擎)之间的通讯序列化方式是xml,上述锁发生在responsexml串解析时。按理说不同线程不同请求之间的xml解析相互之间不应该有锁。分析dom4j的代码发现QNameCacheget方法里的确是有synchronizedMap的同步锁,相关代码如下:

public class QNameCache {
    /** Cache of {@link QName}instances with no namespace */
    protected Map noNamespaceCache = Collections
            .synchronizedMap(new WeakHashMap());

    /**
     * Cache of {@link Map}instances indexed by namespace which contain caches
     * of {@link QName}for each name
     */
    protected Map namespaceCache = Collections
            .synchronizedMap(new WeakHashMap());

    /**
     * The document factory associated with new QNames instances in this cache
     * or null if no instances should be associated by default
     */
    private DocumentFactory documentFactory;

    ...
    ...

    /**
     * DOCUMENT ME!
     * 
     * @param name
     *            DOCUMENT ME!
     * @param namespace
     *            DOCUMENT ME!
     * 
     * @return the QName for the given local name and namepsace
     */
    public QName get(String name, Namespace namespace) {
        Map cache = getNamespaceCache(namespace);
        QName answer = null;

        if (name != null) {
            answer = (QName) cache.get(name);
        } else {
            name = "";
        }

        if (answer == null) {
            answer = createQName(name, namespace);
            answer.setDocumentFactory(documentFactory);
            cache.put(name, answer);
        }

        return answer;
    }

    ...
    ...

    /**
     * DOCUMENT ME!
     * 
     * @param namespace
     *            DOCUMENT ME!
     * 
     * @return the cache for the given namespace. If one does not currently
     *         exist it is created.
     */
    protected Map getNamespaceCache(Namespace namespace) {
        if (namespace == Namespace.NO_NAMESPACE) {
            return noNamespaceCache;
        }

        Map answer = null;

        if (namespace != null) {
            answer = (Map) namespaceCache.get(namespace);
        }

        if (answer == null) {
            answer = createMap();
            namespaceCache.put(namespace, answer);
        }

        return answer;
    }

QNameCache里使用的noNamespaceCache(和namespaceCache)是一个SynchronizedMapget操作的时候会阻塞,在并发量大的时候,多个解析xml的线程之间发生阻塞,大量的线程处于BLOCKED状态。

解决dom4j QNameCache的锁竞争问题

解决上述问题的办法是消除这个锁竞争,去掉SynchronizedMap,改为一个ThreadLocalWeakHashMap,因为是ThreadLocal的,线程独有,所以不存在多个线程之间的锁竞争。当然这样做也有其弊端,多个线程各自去占用nameSpace空间,JVM内存占用的空间比以前大一点,但是在有限的nameSpace情况下,这个多出来的空间基本可以忽略不计。我们的业务场景也不会有无限多的nameSpace类型。最后这个map是一个WeakHashMap,在jvm内存空间不够的时候,WeakHashMap也是可以被JVM强制回收的。
修改之后的jar包已上传集团maven仓库,版本号为:1.6.1-snowman。可能有同学会问,为什么不用最新的dom4j,因为最新版本的dom4j这个问题也依然存在,因而没法采用升级版本的方式来解决这个问题,鉴于我们原来是使用的1.6.1版本,为了避免大的改动带来的风险,我在原来的1.6.1基础之上修复了这个问题,命名版本号:1.6.1-snowman,如果大家遇到类似dom4jxml解析性能问题,可以尝试下载我们的这个版本来解决这个问题。

继续压测

解决了服务端的上述并发锁问题,我们继续压测,单个tomcat的情况下,qps由以前的1600左右上升到了2200,以为这次能够顺利的提升qps,没想到继续加压后,压测客户端仍然报connection refused问题,服务端上监测到的qps呈如下波动状:

比较奇怪的是这个qps呈波浪形,振荡周期是2分钟,而且很规律,怀疑是压测客户端发送的请求数就是这样的,怀疑是不是压测客户端有问题,这次让压测同学把客户端上报connection refused的完整堆栈信息发出来:

org.apache.http.conn.HttpHostConnectException: Connection to http://10.103.215.139:8080 refused
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at com.autonavi.snowman.util.HttpURL.returnChunkMessage(HttpURL.java:144)
at com.autonavi.snowman.util.HttpURL.executeSnowmanPbRequest(HttpURL.java:100)
at com.autonavi.snowman.placecard.PlaceCardSearch.runTest(PlaceCardSearch.java:55)
at org.apache.jmeter.protocol.java.sampler.JavaSampler.sample(JavaSampler.java:191)
at org.apache.jmeter.threads.JMeterThread.process_sampler(JMeterThread.java:434)
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:261)
at java.lang.Thread.run(Thread.java:849)&#xa;Caused by: java.net.ConnectException: Cannot assign requested address
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:643)
at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)

发现客户端报错Cannot assign requested address,这其实是客户端自己的端口不够的原因,之前在另外一篇性能调优文章里提到过,在短连接高并发场景下,服务端需要调整tcp端口超时回收时间。没注意到在短连接高并发场景下的压测客户端上同样有类似问题。在短连接高并发场景下,client端频繁建立连接,而端口释放较慢,socket端口正常关闭后,端口不是立即释放的,而是处于TIME_WAIT状态,默认等待60s之后才释放,这就解释了上图中qps的振荡周期是2分钟的原因。

解决压测客户端端口耗尽的问题

在压测客户端上配置下述几个参数可解决这个问题:
1.调低端口释放后的等待时间,默认60s,修改为15s

sysctl -w net.ipv4.tcp_fin_timeout=15

2.修改tcp/ip协议配置, 开启对于TCP时间戳的支持,若该项设置为0,则端口快速回收tcp_tw_recycle配置不起作用

sysctl -w net.ipv4.tcp_timestamps=1

3.修改tcp/ip协议配置,快速回收socket资源,默认为0,修改为1

sysctl -w net.ipv4.tcp_tw_recycle=1</blockquote>

做了上述参数修改之后,重新压测,在F53机器上,qps可以到9000了,cpu也上到百分之40多了。

继续找瓶颈

cpu只到百分之40多,但是qps到了9000就上不去了,还得继续找原因,先看看服务端的线程都在干什么,分析过程比较类似,这里就不再赘述了,直接贴一段找到问题的jstack堆栈信息,然后说结论:

"http-bio-8080-exec-6124" #6763 daemon prio=5 os_prio=0 tid=0x00007f7c0023a000 nid=0xe924 waiting for monitor entry [0x00007f72f9a5c000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.catalina.loader.WebappClassLoader.findResources(WebappClassLoader.java:1367)
    - waiting to lock <0x000000074045d298> (a [Ljava.util.jar.JarFile;)
    at java.lang.ClassLoader.getResources(ClassLoader.java:1170)
    at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:348)
    at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
    at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
    at javax.xml.parsers.FactoryFinder$1.run(FactoryFinder.java:293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:289)
    at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
    at javax.xml.parsers.SAXParserFactory.newInstance(SAXParserFactory.java:127)
    at org.dom4j.io.JAXPHelper.createXMLReader(JAXPHelper.java:46)
    at org.dom4j.io.SAXHelper.createXMLReaderViaJAXP(SAXHelper.java:125)
    at org.dom4j.io.SAXHelper.createXMLReader(SAXHelper.java:78)
    at org.dom4j.io.SAXReader.createXMLReader(SAXReader.java:894)
    at org.dom4j.io.SAXReader.getXMLReader(SAXReader.java:715)
    at org.dom4j.io.SAXReader.read(SAXReader.java:435)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
    at com.autonavi.snowman.lse.datamodel.LocalLbsResultParser.parse(LocalLbsResultParser.java:55)
    at com.autonavi.snowman.placecard.service.server.SearchServer.getQueryResult(SearchServer.java:861)
    at com.autonavi.snowman.placecard.service.server.SearchServer.getQueryIdsResult(SearchServer.java:896)
    at com.autonavi.snowman.placecard.service.server.PlaceLookupServer.buildResponse(PlaceLookupServer.java:42)
    at com.autonavi.snowman.placecard.service.server.PlaceLookupServer.queryAndBuildResponse(PlaceLookupServer.java:35)
    at com.autonavi.snowman.placecard.service.server.autocomplete.SectionsResultBuilder.findAllBusinessPlaces(SectionsResultBuilder.java:138)
    at com.autonavi.snowman.placecard.service.server.autocomplete.SectionsResultBuilder.buildInterleaveSection(SectionsResultBuilder.java:429)
    at com.autonavi.snowman.placecard.service.server.autocomplete.SectionsResultBuilder.build(SectionsResultBuilder.java:102)
    at com.autonavi.snowman.placecard.service.server.AutocompleteServer.service(AutocompleteServer.java:146)
    at com.autonavi.snowman.placecard.service.PlacecardService.service(PlacecardService.java:51)
    at com.autonavi.snowman.placecard.service.PlacecardService.service(PlacecardService.java:26)
    at com.autonavi.snowman.framework.service.ProtobufService.doService(ProtobufService.java:39)
    at com.autonavi.snowman.framework.service.ProtobufService.doService(ProtobufService.java:15)
    at com.autonavi.snowman.framework.service.ProtobufService$$FastClassBySpringCGLIB$$255864de.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:721)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:656)
    at com.autonavi.snowman.placecard.service.PlacecardService$$EnhancerBySpringCGLIB$$6fe0082e.doService(<generated>)
    at com.autonavi.snowman.framework.controller.AbstractController.processRequest(AbstractController.java:81)
    at com.autonavi.snowman.placecard.controller.PlaceCardController.handleRequest(PlaceCardController.java:37)
    at sun.reflect.GeneratedMethodAccessor102.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:220)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:134)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:116)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:963)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:897)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.boot.web.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:117)
    at org.springframework.boot.web.support.ErrorPageFilter.access$000(ErrorPageFilter.java:61)
    at org.springframework.boot.web.support.ErrorPageFilter$1.doFilterInternal(ErrorPageFilter.java:92)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.springframework.boot.web.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:110)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
    at com.taobao.tomcat.valves.ContextLoadFilterValve$FilterChainAdapter.doFilter(ContextLoadFilterValve.java:191)
    at com.taobao.eagleeye.EagleEyeFilter.doFilter(EagleEyeFilter.java:81)
    at com.taobao.tomcat.valves.ContextLoadFilterValve.invoke(ContextLoadFilterValve.java:150)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:683)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1080)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
    - locked <0x00000007608a90b0> (a org.apache.tomcat.util.net.SocketWrapper)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:852)

"http-bio-8080-exec-6122" #6762 daemon prio=5 os_prio=0 tid=0x00007f7c00238800 nid=0xe923 waiting for monitor entry [0x00007f72f9b5d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.catalina.loader.WebappClassLoader.findResources(WebappClassLoader.java:1367)
    - waiting to lock <0x000000074045d298> (a [Ljava.util.jar.JarFile;)
    at java.lang.ClassLoader.getResources(ClassLoader.java:1170)
    at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:348)
    at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
    at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
    at javax.xml.parsers.FactoryFinder$1.run(FactoryFinder.java:293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:289)
    at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
    at javax.xml.parsers.SAXParserFactory.newInstance(SAXParserFactory.java:127)
    at org.dom4j.io.JAXPHelper.createXMLReader(JAXPHelper.java:46)
    at org.dom4j.io.SAXHelper.createXMLReaderViaJAXP(SAXHelper.java:125)
    at org.dom4j.io.SAXHelper.createXMLReader(SAXHelper.java:78)
    at org.dom4j.io.SAXReader.createXMLReader(SAXReader.java:894)
    at org.dom4j.io.SAXReader.getXMLReader(SAXReader.java:715)
    at org.dom4j.io.SAXReader.read(SAXReader.java:435)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
    at com.autonavi.snowman.placecard.service.builder.engineparam.RgeoRequestBuilder.getQueryResult(RgeoRequestBuilder.java:132)

在找SAXParserFactory的实现类时,最终这里走到了ClassLoader.getResources,出现了读取资源的锁,相关代码如下:

/**
     * Finds the implementation Class object in the specified order.  Main
     * entry point.
     * @return Class object of factory, never null
     *
     * @param type                  Base class / Service interface  of the
     *                              factory to find.
     * @param fallbackClassName     Implementation class name, if nothing else
     *                              is found.  Use null to mean no fallback.
     *
     * Package private so this code can be shared.
     */
    static <T> T find(Class<T> type, String fallbackClassName)
        throws FactoryConfigurationError
    {
        final String factoryId = type.getName();
        dPrint("find factoryId =" + factoryId);

        // Use the system property first
        try {
            String systemProp = ss.getSystemProperty(factoryId);
            if (systemProp != null) {
                dPrint("found system property, value=" + systemProp);
                return newInstance(type, systemProp, null, true);
            }
        }
        catch (SecurityException se) {
            if (debug) se.printStackTrace();
        }

        // try to read from $java.home/lib/jaxp.properties
        try {
            if (firstTime) {
                synchronized (cacheProps) {
                    if (firstTime) {
                        String configFile = ss.getSystemProperty("java.home") + File.separator +
                            "lib" + File.separator + "jaxp.properties";
                        File f = new File(configFile);
                        firstTime = false;
                        if (ss.doesFileExist(f)) {
                            dPrint("Read properties file "+f);
                            cacheProps.load(ss.getFileInputStream(f));
                        }
                    }
                }
            }
            final String factoryClassName = cacheProps.getProperty(factoryId);

            if (factoryClassName != null) {
                dPrint("found in $java.home/jaxp.properties, value=" + factoryClassName);
                return newInstance(type, factoryClassName, null, true);
            }
        }
        catch (Exception ex) {
            if (debug) ex.printStackTrace();
        }

        // Try Jar Service Provider Mechanism
        T provider = findServiceProvider(type);
        if (provider != null) {
            return provider;
        }
        if (fallbackClassName == null) {
            throw new FactoryConfigurationError(
                "Provider for " + factoryId + " cannot be found");
        }

        dPrint("loaded from fallback value: " + fallbackClassName);
        return newInstance(type, fallbackClassName, null, true);
    }

根源在于parseText里面调用的一些方法,SAXParserFactory.newInstance方法里会先根据system property里配置的javax.xml.parsers.SAXParserFactory属性,去找SAXParserFactory的实现类,找不到,则如果找到了lib/jaxp.propertie,使用lib/jaxp.propertie中定义的实现类,如果还找不到,则从current thead’s contect class loadersystem class loader中去找SAXParserFactory的实现类,如果再找不到,则最后使用默认值“com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl”来实例化一个parserFactory在从classpath中找资源的时候,会对资源加锁,就是这个锁影响了系统的并发量。

解决xml解析时找SAXParserFactory实现类时出现的并发锁问题

从上述代码分析中,可以看到系统首先会去找system property里面是否配置了实现类,那么我们的解法一就是在启动脚本里指定以下两个system properties:

-Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl

这个解决办法在没上docker化之前,需要PE来修改基线),另外一个对于开发同学比较可控的解法是在代码层面来解决这个问题:代码里面直接new SAXParserFactory的实现类,直接指定其实现类。将原来调用dom4j包里DoucumentHelper.parseText方法的地方都替换为调用下述DocumentUtils.parseText方法:

public class DocumentUtils {
    private static SAXParserFactory factory = new org.apache.xerces.jaxp.SAXParserFactoryImpl();

    public static Document parseText(String text) throws DocumentException, ParserConfigurationException, SAXException {
        SAXReader reader = new SAXReader();
        String encoding = getEncoding(text);

        InputSource source = new InputSource(new StringReader(text));
        source.setEncoding(encoding);
        factory.setValidating(false);
        factory.setNamespaceAware(false);
        XMLReader myreader = factory.newSAXParser().getXMLReader();
        reader.setXMLReader(myreader);

        Document result = reader.read(source);

        // if the XML parser doesn't provide a way to retrieve the encoding,
        // specify it manually
        if (result.getXMLEncoding() == null) {
            result.setXMLEncoding(encoding);
        }

        return result;
    }


    private static String getEncoding(String text) {
        String result = null;
        String xml = text.trim();

        if (xml.startsWith("<?xml")) {
            int end = xml.indexOf("?>");
            String sub = xml.substring(0, end);
            StringTokenizer tokens = new StringTokenizer(sub, " =\"\'");

            while (tokens.hasMoreTokens()) {
                String token = tokens.nextToken();
                if ("encoding".equals(token)) {
                    if (tokens.hasMoreTokens()) {
                        result = tokens.nextToken();
                    }
                    break;
                }
            }
        }

        return result;
    }
}

注,如果使用上述示例代码中的org.apache.xerces.jaxp.SAXParserFactoryImpl实现类,需要额外引入xerces包,也可不引入新的三方库,而是直接使用jdk自带的com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl实现类,我们做过对比测试,性能方面使用xerces包(2.11.0版本)性能会稍微好一点点。

优化结果

做完上述优化后的压测结果如下:

优化后的PlacecardF53机器上的qps达到11k,并且性能瓶颈到了依赖的下游引擎服务端,API端还能承受更高的qps

总结

压测客户端的注意事项

1.短连接高并发场景下,除了要注意服务端的tcp端口回收配置以外,还需要关注压测客户端的端口回收配置,否则压测客户端上可能会出现上述的端口耗尽的情况。
2.如果是压测java应用,需要先对服务端有个预热,让JIT生效后,再来做压测的数据统计。我曾经原创了一个例子(被我们的测试同学笑好久,从此形象全无)比较形象的说明这个问题:这就好比体检时取中段尿体检结果才准~_~

使用xml序列化方式时的性能注意事项

如果应用使用xml的序列化方式,需要做如下两个优化工作以提升应用性能。
1. 如果使用官方的dom4j版本,在并发量大的情况下,会出现大量的QNameCache的并发锁,使得大量线程处于BLOCKED状态,服务的qps上不去。推荐使用1.6.1.2-snowman版本来解决这个问题,这个版本我们已经上传到集团的mvn库,可以直接引用了。
2. 默认的DocumentHelper.parseText方法,在高并发场景下找SAXParserFactory的实现时,会出现ClassLoader.getResources时的并发锁,导致大量线程BLOCKED,应用qps上不去。推荐使用上述我们重写的parseText方法来解决这个问题。

几种常用的序列化协议的性能对比

说到这里,就不得不再提一下几种常用的序列化协议的性能对比了,参考网上对比测试结果:


其中这里的xsteam是用于处理对象序列化为xmlxml反序列化为对象的,和我们本文中的dom4j的功能不是一回事,这里只是拿这个做一个参考比较。和下游搜索引擎之间的序列化协议可以采用protobuf,性能应该会有一个比较明显的提升。

———————————————我是分割线——————————————————————————

写到这里,总结也做完了,似乎可以结束了。
ATA(阿里技术协会)上很多的调优文章一样,整个过程看起来顺理成章,似乎也不麻烦。感觉就算从来没有性能调优经验,只要看过ATA上的调优相关文章,下次遇到需要调优的情况肯定就游(一)刃(脸)有(懵)余(逼)了。

其实上文一直讲的都只是“术”,并没有讲“道”。而每次需要调优的时候遇到的问题可能都不一样,用同样的术是解决不了变化的问题的,我们需要的是“道”。“道”是相对不变化的。
光有“术”、例如懂JVM,不一定能解决上述问题,光有“道”,不懂术,也解决不了上述问题,道术兼修,才能在遇到问题时庖丁解牛。

浅聊性能调优过程中的“道”

爱因斯坦说过:“如果给我1个小时解答一道决定我生死的问题,我会花55分钟弄清楚这道题到底在问什么。一旦清楚它到底在问什么,剩下的5分钟足够回答这个问题。“,性能上不去,很多时候都是不知道问题出在哪里、从哪里下手,调优的首要过程就是搞清楚问题在哪里,然后再去研究解决方案。

像大多数调优文章一样,看起来很顺理成章,一步一步的往上调,其实作者可能是有很多插(岔)曲(路)没有写。在时间允许的情况下,可以大胆做各种假设,然后去验证,假设是这些地方的问题,然后去验证这些点是否真的存在问题,有些假设是正确的,的确是这些点有问题,有些假设点被证实为没有问题,这些假设就是岔路,但是不要紧,岔路是为得到正确路径,岔路也是调优的一部分。比如我们这次调优,当qps上不去的时候,发现服务器上inout的网络流量加起来一直是100多兆,怀疑是因为服务器在千兆机房的网络环境,网络方面达到了上限,这个时候我们使用了requestresponse都更小的case去跑压测,得到的qps结果还是一样,基本就可以排除是网络瓶颈问题了。

可能在极少数情况下,调优时没有明确的线索,这个时候可以像探案一样去调优,链路上的所有的点都是可以怀疑的对象,可以采用排除法来逐一排除可疑的环节。例如本次压测时,压测客户端上报的connection refused错误,tomcatnginx、网络层面都可能报这个错,我们将压测客户端绕过nginx,直连tomcat同样也有connection refused的错误,基本上就可以排除nginx问题了。

在没有解决问题之前,你要调优的对象有点类似于一个黑盒子,你可能解决了一个个问题,但是在没有完成之前,是不知道还有多少问题的,你解决了一个问题,往往觉得就ok了,没想到继续压测的时候还是不行,你已经做了很多努力了,其实可能不是你这个问题没解决,可能是这个问题解了,但是新的问题又冒出来了,借用淘宝DBA团队的一句名言:往往在你即将绝望放弃的时候,其实离最终的终点已经非常非常的接近。调优之道还需要一点坚持的精神。

打赏
欢迎关注人生设计师的微信公众账号
公众号ID:longjiazuoA

未经允许不得转载:人生设计师 » 记一次性能优化过程中的术与道

分享到:更多 ()

人生设计师-接受不同的声音

联系我关于我