Jira Slowdown and Inaccessibility

Scenario

Received a ticket saying the application (Jira) server is down and was slow to respond. The proxy server reports a proxy error: received invalid response from the upstream server. The browser cannot open the webpage.
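
To confirm the reported symptom from a client machine, the response can also be checked with curl (jira.xyz.com is the placeholder hostname used throughout this document; a proxy error like the one in the ticket typically surfaces as an HTTP 502):

    # Reproduce the browser symptom from the command line; -v shows the status
    # line and headers, -k skips certificate verification if needed
    curl -vk https://jira.xyz.com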

Determine

  • Identify the impact of the application outage (internal team, clients, etc.)

  • Identify which level the issue is on (a consolidated triage sketch follows this list)

    • Network

      • Open a browser and access the webpage from the internet to confirm the error mentioned in the ticket.

      • Make sure the server is accessible from the internet

        ping jira.xyz.com

      • Make sure the DNS records (forward and reverse) are correct.

        nslookup x.x.x.x

        dig jira.xyz.com

    • System

      • Make sure the server is up

        • Try to access the server over SSH

        • View it from the management console

      • Check system resource usage

        • Check disk usage: df -h

        • Check memory usage: free -h (lsmem only lists memory ranges, not usage)

        • Check other monitoring tools (Nagios, Observium, etc.)

    • Application

      • Check the application service status

        • service [httpd | haproxy | nginx | etc.] status or systemctl status [httpd | haproxy | nginx]

        • ps auxf | grep [service]
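
A consolidated triage pass over the three levels above might look like the following (the hostname and service names are placeholders; substitute the ones used in the actual environment):

    # Network: reachability and DNS
    ping -c 4 jira.xyz.com
    dig +short jira.xyz.com

    # System: disk and memory usage
    df -h
    free -h

    # Application: proxy and Jira (Tomcat/java) processes
    systemctl status nginx
    ps auxf | grep -E 'nginx|java'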

Assumptions

  • Webpage is unavailable
  • Server is up and can be pinged
  • All resource usage is normal (CPU, memory, storage, etc.)
  • Application (Jira) is running
  • Proxy application (nginx/haproxy) is running

Diagnose

  • Make sure the public-facing interface is receiving packets

    tcpdump -vvvs 1024 -A -l -i [interface-name]

  • Make sure the application (Jira) is still functioning

    curl http://127.0.0.1:8080 (assuming SSL/TLS is only applied on the proxy side)

    curl -kv https://127.0.0.1:8083 (if HTTPS is enforced on the application itself)

  • Search for the application (Jira) logs; the same applies to other applications

    • Log files can be found under /opt/atlassian/jira/logs

    I usually search for the application process, since the process's command line often reveals the config file path. From there, read the config file and search for the keyword “log”. Otherwise, I use whereis to find the path. As a last resort, I google for it (see the sketch after this list).

    • ps auxf | grep [service]

    • whereis [service]

  • Check the proxy's error_log and ssl_error_log

    • tail [log_file]; the latest errors usually appear at the end

    • If there is nothing helpful or no errors, move on to the actual application log

  • Take a sneak peek at the log file or filter for a keyword.

    • tail /opt/atlassian/jira/logs/catalina_log.xxxx-xx-xx.log to check the last portion of the log

    • cat /opt/atlassian/jira/logs/catalina_log.xxxx-xx-xx.log | less to view the log file

      • Navigation works similarly to vi; I use /, G, and gg to browse the output.

        /: search

        G: go to bottom

        gg: go to top

      • Look for timestamps at or shortly before the time the ticket was created

    • grep 'keyword' [log_file]
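
As an example of the log-hunting approach described above, assuming the proxy is nginx with its configuration under /etc/nginx and logs in the distribution's default location (both assumptions; adjust to the real environment):

    # Find the running proxy process; the command line often shows the config path
    ps auxf | grep nginx

    # Search the config for log directives (error_log / access_log)
    grep -ri 'log' /etc/nginx/

    # Fall back to whereis if the config path is not obvious
    whereis nginx

    # Read the most recent entries of the error log found above
    tail -n 50 /var/log/nginx/error.log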

At this point, if the error has been found and can be resolved quickly (10-15 min), document the error message, resolve the issue, and record the steps while resolving.

However, if no useful information is found: since Jira is a ticketing application, temporarily bringing it down and restarting it will not cause too much impact to the team. And since no one can access Jira at the moment, I will restart Jira immediately and check the result.
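
A minimal restart sequence, assuming a standard Jira installation under /opt/atlassian/jira (the stop-jira.sh and start-jira.sh scripts ship with the installer; use the systemd unit instead if one is configured):

    # Stop Jira and confirm the java process is gone
    /opt/atlassian/jira/bin/stop-jira.sh
    ps auxf | grep java

    # Start Jira again and confirm it responds locally before re-testing through the proxy
    /opt/atlassian/jira/bin/start-jira.sh
    curl -I http://127.0.0.1:8080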

Document the date, the time, and the issue. If a similar problem occurs frequently, review all logs and perform an in-depth root cause analysis.

Error

Output from catalina.log:

23-Jan-2019 10:01:55.997 SEVERE [ContainerBackgroundProcessor[StandardEngine[Catalina]]] org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]]
 java.lang.OutOfMemoryError: GC overhead limit exceeded
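
Before changing anything, it is worth confirming the heap limits the Jira JVM is actually running with; the current -Xms and -Xmx values appear in the java process arguments:

    # Inspect the running Jira JVM and note its current -Xms / -Xmx flags
    ps aux | grep java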

Solution

Increase the Jira Tomcat server's memory

vi /opt/atlassian/jira/bin/setenv.sh

Set JVM_MINIMUM_MEMORY and JVM_MAXIMUM_MEMORY to the desired capacity:

JVM_MINIMUM_MEMORY="512m"
JVM_MAXIMUM_MEMORY="1024m"
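
The new values only take effect after a restart; a quick verification (assuming the same installation path as above) is to restart Jira and check that the JVM picked up the new heap settings:

    # Restart Jira so setenv.sh is re-read
    /opt/atlassian/jira/bin/stop-jira.sh
    /opt/atlassian/jira/bin/start-jira.sh

    # The java process arguments should now include -Xms512m and -Xmx1024m
    ps aux | grep java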