Diagnosing performance issues for Flow Production Tracking Enterprise Docker

Warning:

Local installations of Flow Production Tracking are no longer offered. This documentation is intended only for those with existing instances of Flow Production Tracking Enterprise Docker. Refer to the list of our current offerings for alternatives.

This page gives information about how to troubleshoot performance issues.

For urgent issues that require immediate support, please open a ticket on Flow Production Tracking Support.

Scope

Note:

This document is intended for Flow Production Tracking Enterprise System Administrators (local installs).

Flow Production Tracking architecture at a glance

At a very high level, Flow Production Tracking is composed of two main components:

  • Application
  • Database

Application - Containers inside which the Web Application Components are installed (Apache, Passenger, Rails, Flow Production Tracking App). Depending on your topology, you can have many application containers, specialized for serving the Web UI and API requests.

Database - The server on which the database engine (PostgreSQL) runs. It can be the same machine as the application server.

Requests are received by Apache and passed along to a Passenger process running an instance of the Flow Production Tracking application. If no process is available, the request is placed in a queue. Each Flow Production Tracking process then queries the database and returns the response once the request completes. A single request to Flow Production Tracking usually results in many queries to the database.

[Diagram: requests flow from Apache, through Passenger processes running the Flow Production Tracking application, to the database]

Understanding performance issues

The main causes of performance and connectivity issues are:

  • Database locks
  • Slow queries
  • Contention on Flow Production Tracking servers

Database locks - While rare, some operations can lock tables, preventing even read access to those tables.

Slow queries - Queries that take a long time to execute not only slow down the pages and scripts that issue them; they can also affect other pages and queries. Slow queries monopolize Passenger and database connections, and can cause the database server to swap large amounts of data between memory and disk. Swapping is a costly operation that affects every other request the database is handling.

System contention - High load can cause contention at several levels. When Flow Production Tracking has periodic performance issues, the first step is to identify where that contention occurs.

These issues can have a similar impact on Flow Production Tracking: if more requests are made than the system can process, the request queue fills up and, eventually, requests are rejected. End users see this as a 503 error, which can mislead them into thinking the site is down for maintenance.

Investigating performance issues

Configure database logs

The first step is to configure PostgreSQL logs to output useful information.
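As a sketch, the following postgresql.conf settings make lock waits and slow statements visible in the logs. The values shown are illustrative, not product defaults; tune them to your site:

```ini
# postgresql.conf - illustrative logging settings
log_min_duration_statement = 1000   # log any statement slower than 1 second
log_lock_waits = on                 # log when a session waits longer than deadlock_timeout
log_line_prefix = '%m [%p] %u@%d '  # timestamp, backend pid, user and database
```

Reload PostgreSQL after changing these settings for them to take effect.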

Note:

When an incident happens, it is good practice to persist the database logs, as they are rotated on a weekly basis.

Database locks

In situations where Flow Production Tracking becomes unresponsive, the first thing to investigate is database locks. The main blocking operation in Flow Production Tracking is adding new fields. Adding a new field requires PostgreSQL to lock the modified table, effectively blocking any subsequent requests on that table. The best way to find locks is in the database logs.

If logs are properly configured, the following searches of the database logs will help surface locks:

# Find requests that were waiting for a lock
grep "waiting for AccessShareLock" /var/lib/pgsql/15/data/pg_logs/postgres-Thu.log
# Find ALTER statements, which add the new field's column to a table
grep "ALTER" /var/lib/pgsql/15/data/pg_logs/postgres-Thu.log
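Locks can also be inspected live. The following query is a sketch (it assumes PostgreSQL 9.6 or later, where pg_blocking_pids() is available) and can be run with psql against the site database:

```sql
-- Show blocked sessions and the backends blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```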

Resolution

This behavior is a PostgreSQL limitation, not a Flow Production Tracking one. Flow Production Tracking is dynamic and allows new fields to be added to a table at any time; the drawback is that PostgreSQL locks the table for the duration of the modification.

The only way around it is to create new fields outside your studio’s active hours.

Slow queries

From a simplified point of view, Flow Production Tracking is a GUI over a database, and it allows users to craft their own queries. If these queries are not built carefully, they can be slow. When more requests arrive than the system can process, the request queue fills up and requests are eventually rejected.

Identifying slowest queries

The following script reports the most expensive queries, the most expensive pages, the users consuming the most resources, and so on. Simply feed it a Flow Production Tracking production log and it produces a simple report. See the script header for details on how to use it.

https://github.com/shotgunsoftware/enterprise-toolbox/blob/master/troubleshooting/shotgun_log_analyzer.rb

It can be run on the complete log, or on specific time ranges extracted with grep to investigate specific periods.
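As a self-contained sketch of the time-range technique, the commands below build a synthetic log (the timestamps and paths are made up) and extract one hour of it with grep; on a real site you would grep the actual production log instead:

```shell
# Build a small synthetic production log (timestamps are made up)
cat > /tmp/sample_production.log <<'EOF'
2024-05-02 13:59:01 GET /page/project_overview
2024-05-02 14:00:02 GET /page/slow_saved_filter
2024-05-02 14:30:03 POST /api/v1/entity/shots
2024-05-02 15:00:04 GET /page/media_app
EOF

# Keep only the requests logged during the 14:00 hour
grep "2024-05-02 14:" /tmp/sample_production.log > /tmp/incident_window.log
wc -l < /tmp/incident_window.log
```

The resulting extract can then be passed to shotgun_log_analyzer.rb in place of the full log.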

Optimizing slow queries

When the slowest queries have been identified, the following actions can be taken:

  • Optimize the query. See Flow Production Tracking query performance recommendations.
  • Execute that query less often. This is a fallback solution for API calls that can’t be optimized.

System contention

Most common causes of contention are:

  • Passenger contention
  • Database contention
  • High load on server

Passenger contention - Too many requests are executed and Passenger can’t process them all. Requests end up being queued, and are eventually rejected, resulting in 503 errors.

Database contention - Too many simultaneous connections to the database. This usually happens because too many requests are being made, or because requests are too slow to execute.

High load on server - The server is under heavy load. When this happens, it may also result in Passenger or Database contention.

Flow Production Tracking can scale by adding application servers or by improving the database server's specifications. Most Flow Production Tracking clusters are single-server clusters, with all components running on the same machine. As your studio's usage grows, this configuration may no longer be sufficient, but before scaling, the bottleneck must be identified.

There are key metrics that can be gathered on your Flow Production Tracking servers to help diagnose these issues. If not already in place, consider using monitoring tools to monitor your servers. Key metrics to keep an eye on are:

  • System Load
  • CPU Usage
  • Memory Usage
  • Disk I/O
  • Passenger Queue Size
  • Postgres Active Connections

While most of these are standard monitoring metrics, Passenger queue size and PostgreSQL active connections are specific to this stack.
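Passenger queue size can be read with the passenger-status command on an application container (look for the "Requests in queue" line in its output). Active database connections can be counted with a query such as the following sketch, run via psql against the site database:

```sql
-- Connection count per state; 'active' sessions are executing a query right now
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;
```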

For more information on PostgreSQL monitoring, please refer to https://wiki.postgresql.org/wiki/Monitoring.

Generic debugging tips

pg_top

pg_top is the classic UNIX program top, specialized for PostgreSQL. In addition to the information classic top provides, it allows you to:

  • View how many reads and writes are currently in progress.
  • View currently running SQL statements of a process.
  • View the query plan of a currently running SQL statement.
  • View locks held by a process.
  • View user index and table statistics.

We highly recommend installing it on your database server if it is not already present.

Find running database queries

From the application server:

psql <site_name> -c "SELECT datname, query FROM pg_stat_activity WHERE state = 'active';"
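A variation of the same query, sorted so the longest-running statements come first, can help spot the offender (a sketch; assumes PostgreSQL 9.6+):

```sql
-- Longest-running active queries first
SELECT pid,
       now() - query_start AS duration,
       left(query, 120) AS query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;
```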
