13 July 2011

SharePoint performance debugging

We released a SharePoint 2010 portal to a new client today. Part of the portal design included a report center library containing security-trimmed report folders, which are updated asynchronously via WCF services. In addition, there was an external report center application that we wanted to link into the SharePoint UI, so we simply added a Page Viewer Web Part to the page and pointed it at the external app.

The problem was that we were seeing major performance lag in rendering the external report app UI. When we pointed to the development server, the UI rendered slick and fast, but when we pointed back to the production server, we experienced lag again. And thus the trace/debug efforts began.

My standard rule when hunting bugs/performance issues is to “Assume NOTHING”. Always compare apples to apples and double, triple, quadruple check everything.
Our analysis of the differences turned up the following:
  1. The architectural difference between the production server and the development server was that the development server had its SQL source local to the server, while the production server targeted a separate SQL server. To eliminate SQL Server from the equation, we pointed the development server to the production SQL data source and tested against it from the production environment. The result was fast and responsive, which meant that SQL Server wasn’t the problem.
  2. This narrowed the problem down to the report server and the client computer. Next we compared the production and development report servers. Since both servers were virtualized, we checked the resource settings in the vSphere console. The servers turned out to be identical except that the development server had 4 CPU cores while the production server had only 2. We upped the production server to 4 CPU cores and rebooted the VM. Testing against the new configuration still showed slow, lagging performance in the UI rendering. That pretty much eliminated the report servers from the equation, unless one of the VMs was actually bad.
  3. The next step was to install Fiddler2 on the client laptop and trace the two different calls to see where the traffic was going. Fiddler did show some minor differences in the header and body sizes of the HTTP calls, but not nearly enough to justify the lag we were seeing. The development server was rendering in under a second while the production server took 5-6 seconds. What struck me as curious was that Fiddler reported a total execution time of 0.6 seconds for the HTTP call in both cases. So where exactly did the other 5 seconds go in the case of the production server? Upon closer inspection, we noticed that the development SharePoint server had its Page Viewer Web Part set to point directly to the IP address of the development report server, while the production SharePoint server’s Page Viewer was pointing to the report server using the FQDN instead. And that’s when it TeaKayGee’d me. The primary DNS server was flaky, and after repointing to the secondary DNS server, performance was back as expected!
I was amazed at how big a difference the DNS request performance could make in this case. That’s one more item I can add to my checklist when troubleshooting SharePoint performance issues.
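
For anyone who wants to check for the same kind of delay, here is a minimal Python sketch (not something we used at the time) that times name lookups from the client machine. It assumes the delay shows up in the operating system's resolver, which is what `socket.getaddrinfo` calls. The `time_lookups` and `report` helpers and the host names in the usage comment are hypothetical placeholders, not values from our environment.

```python
import socket
import time


def time_lookups(host, attempts=3, resolver=socket.getaddrinfo):
    """Return the wall-clock seconds taken by each name lookup of `host`."""
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolver(host, 80)
        except socket.gaierror:
            # A failed lookup still costs time, so record it anyway.
            pass
        timings.append(time.perf_counter() - start)
    return timings


def report(host, timings):
    """Format the fastest and slowest lookup for one host."""
    return "%s: min %.3fs, max %.3fs" % (host, min(timings), max(timings))


# Example usage (hypothetical names; substitute the report server's
# FQDN and its IP address):
#   for target in ("reports.example.com", "192.0.2.10"):
#       print(report(target, time_lookups(target)))
```

An IP literal needs no DNS query, so it serves as a baseline. If the FQDN takes seconds while the IP address returns almost instantly, name resolution is a likely suspect. Keep in mind that the OS may cache answers, so the first attempt is usually the most telling.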
