Jira Slowdown and Inaccessibility

Scenario

Received a ticket saying the application (Jira) server is down and was slow to respond. The proxy server reports a proxy error: received invalid response from the upstream server. The browser cannot open the webpage.
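
To confirm the reported symptom from a client machine, the response can also be checked with curl (jira.xyz.com is the placeholder hostname used throughout this document; a proxy error like the one in the ticket typically surfaces as an HTTP 502):

    # Reproduce the browser symptom from the command line; -v shows the status
    # line and headers, -k skips certificate verification if needed
    curl -vk https://jira.xyz.com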

Determine

  • Identify the impact of the application outage (internal team, clients, etc.)

  • Identify which level the issue is on (a consolidated triage sketch follows this list)

    • Network

      • Open a browser and access the webpage from the internet to confirm the error mentioned in the ticket.

      • Make sure the server is accessible from the internet

        ping jira.xyz.com

      • Make sure the DNS records (forward and reverse) are correct.

        nslookup x.x.x.x

        dig jira.xyz.com

    • System

      • Make sure the server is up

        • Try to access the server over SSH

        • View it from the management console

      • Check system resource usage

        • Check disk usage: df -h

        • Check memory usage: free -h (lsmem only lists memory ranges, not usage)

        • Check other monitoring tools (Nagios, Observium, etc.)

    • Application

      • Check the application service status

        • service [httpd | haproxy | nginx | etc.] status or systemctl status [httpd | haproxy | nginx]

        • ps auxf | grep [service]
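
A consolidated triage pass over the three levels above might look like the following (the hostname and service names are placeholders; substitute the ones used in the actual environment):

    # Network: reachability and DNS
    ping -c 4 jira.xyz.com
    dig +short jira.xyz.com

    # System: disk and memory usage
    df -h
    free -h

    # Application: proxy and Jira (Tomcat/java) processes
    systemctl status nginx
    ps auxf | grep -E 'nginx|java'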

Assumptions

  • Webpage is unavailable
  • Server is up and can be pinged
  • All resource usage is normal (CPU, memory, storage, etc.)
  • Application (Jira) is running
  • Proxy application (nginx/haproxy) is running

Diagnose

  • Make sure the public-facing interface is receiving packets

    tcpdump -vvvs 1024 -A -l -i [interface-name]

  • Make sure the application (Jira) is still functioning

    curl http://127.0.0.1:8080 (assuming SSL/TLS is only applied on the proxy side)

    curl -kv https://127.0.0.1:8083 (if HTTPS is enforced on the application itself)

  • Search for the application (Jira) logs; the same applies to other applications

    • Log files can be found under /opt/atlassian/jira/logs

    I usually search for the application process, since the process's command line often reveals the config file path. From there, read the config file and search for the keyword “log”. Otherwise, I use whereis to find the path. As a last resort, I google for it (see the sketch after this list).

    • ps auxf | grep [service]

    • whereis [service]

  • Check the proxy's error_log and ssl_error_log

    • tail [log_file]; the latest errors usually appear at the end

    • If there is nothing helpful or no errors, move on to the actual application log

  • Take a sneak peek at the log file or filter for a keyword.

    • tail /opt/atlassian/jira/logs/catalina_log.xxxx-xx-xx.log to check the last portion of the log

    • cat /opt/atlassian/jira/logs/catalina_log.xxxx-xx-xx.log | less to view the log file

      • Navigation works similarly to vi; I use /, G, and gg to browse the output.

        /: search

        G: go to bottom

        gg: go to top

      • Look for timestamps at or shortly before the time the ticket was created

    • grep 'keyword' [log_file]
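
As an example of the log-hunting approach described above, assuming the proxy is nginx with its configuration under /etc/nginx and logs in the distribution's default location (both assumptions; adjust to the real environment):

    # Find the running proxy process; the command line often shows the config path
    ps auxf | grep nginx

    # Search the config for log directives (error_log / access_log)
    grep -ri 'log' /etc/nginx/

    # Fall back to whereis if the config path is not obvious
    whereis nginx

    # Read the most recent entries of the error log found above
    tail -n 50 /var/log/nginx/error.log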

At this point, if the error has been found and can be resolved quickly (10-15 min), document the error message, resolve the issue, and record the steps while resolving.

However, if no useful information is found: since Jira is a ticketing application, temporarily bringing it down and restarting it will not cause too much impact to the team. And since no one can access Jira at the moment, I will restart Jira immediately and check the result.
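
A minimal restart sequence, assuming a standard Jira installation under /opt/atlassian/jira (the stop-jira.sh and start-jira.sh scripts ship with the installer; use the systemd unit instead if one is configured):

    # Stop Jira and confirm the java process is gone
    /opt/atlassian/jira/bin/stop-jira.sh
    ps auxf | grep java

    # Start Jira again and confirm it responds locally before re-testing through the proxy
    /opt/atlassian/jira/bin/start-jira.sh
    curl -I http://127.0.0.1:8080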

Document the date, the time, and the issue. If a similar problem occurs frequently, review all logs and perform an in-depth root cause analysis.

Error

Output from catalina.log:

23-Jan-2019 10:01:55.997 SEVERE [ContainerBackgroundProcessor[StandardEngine[Catalina]]] org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]]
 java.lang.OutOfMemoryError: GC overhead limit exceeded
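
Before changing anything, it is worth confirming the heap limits the Jira JVM is actually running with; the current -Xms and -Xmx values appear in the java process arguments:

    # Inspect the running Jira JVM and note its current -Xms / -Xmx flags
    ps aux | grep java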

Solution

Increase the Jira Tomcat server's memory

vi /opt/atlassian/jira/bin/setenv.sh

Set JVM_MINIMUM_MEMORY and JVM_MAXIMUM_MEMORY to the desired capacity:

JVM_MINIMUM_MEMORY="512m"
JVM_MAXIMUM_MEMORY="1024m"
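
The new values only take effect after a restart; a quick verification (assuming the same installation path as above) is to restart Jira and check that the JVM picked up the new heap settings:

    # Restart Jira so setenv.sh is re-read
    /opt/atlassian/jira/bin/stop-jira.sh
    /opt/atlassian/jira/bin/start-jira.sh

    # The java process arguments should now include -Xms512m and -Xmx1024m
    ps aux | grep java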