Commit 141e946c authored by Yorick Peterse's avatar Yorick Peterse

Storing of application metrics in InfluxDB

This adds the ability to write application metrics (e.g. SQL timings) to
InfluxDB. These metrics can in turn be visualized using Grafana, or
really anything else that can read from InfluxDB. These metrics can be
used to track application performance over time, between different Ruby
versions, different GitLab versions, etc.

== Transaction Metrics

Currently the following is tracked on a per transaction basis (a
transaction is a Rails request or a single Sidekiq job):

* Timings per query along with the raw (obfuscated) SQL and information
  about what file the query originated from.
* Timings per view along with the path of the view and information about
  what file triggered the rendering process.
* The duration of a request itself along with the controller/worker
  class and method name.
* The duration of any instrumented method calls (more below).

== Sampled Metrics

Certain metrics can't be directly associated with a transaction. For
example, a process' total memory usage is unrelated to any running
transactions. While a transaction can result in the memory usage going
up there's no accurate way to determine what transaction is to blame,
this becomes especially problematic in multi-threaded environments.

To solve this problem there's a separate thread that takes samples at a
fixed interval. This thread (using the class Gitlab::Metrics::Sampler)
currently tracks the following:

* The process' total memory usage.
* The number of file descriptors opened by the process.
* The amount of Ruby objects (using ObjectSpace.count_objects).
* GC statistics such as timings, heap slots, etc.

The default/current interval is 15 seconds, any smaller interval might
put too much pressure on InfluxDB (especially when running dozens of
processes).

== Method Instrumentation

While currently not yet used methods can be instrumented to track how
long they take to run. Unlike the likes of New Relic this doesn't
require modifying the source code (e.g. including modules), it all
happens from the outside. For example, to track `User.by_login` we'd add
the following code somewhere in an initializer:

    Gitlab::Metrics::Instrumentation.
      instrument_method(User, :by_login)

to instead instrument an instance method:

    Gitlab::Metrics::Instrumentation.
      instrument_instance_method(User, :save)

Instrumentation for either all public model methods or a few crucial
ones will be added in the near future, I simply haven't gotten to doing
so just yet.

== Configuration

By default metrics are disabled. This means users don't have to bother
setting anything up if they don't want to. Metrics can be enabled by
editing one's gitlab.yml configuration file (see
config/gitlab.yml.example for example settings).

== Writing Data To InfluxDB

Because InfluxDB is still a fairly young product I expect the worse.
Data loss, unexpected reboots, the database not responding, you name it.
Because of this data is _not_ written to InfluxDB directly, instead it's
queued and processed by Sidekiq. This ensures that users won't notice
anything when InfluxDB is giving trouble.

The metrics worker can be started in a standalone manner as following:

    bundle exec sidekiq -q metrics

The corresponding class is called MetricsWorker.
parent b2c593da
......@@ -208,6 +208,12 @@ gem 'select2-rails', '~> 3.5.9'
gem 'virtus', '~> 1.0.1'
gem 'net-ssh', '~> 3.0.1'
# Metrics
group :metrics do
gem 'influxdb', '~> 0.2', require: false
gem 'connection_pool', '~> 2.0', require: false
end
group :development do
gem "foreman"
gem 'brakeman', '3.0.1', require: false
......
......@@ -65,7 +65,7 @@ GEM
attr_encrypted (1.3.4)
encryptor (>= 1.3.0)
attr_required (1.0.0)
autoprefixer-rails (6.1.1)
autoprefixer-rails (6.1.2)
execjs
json
awesome_print (1.2.0)
......@@ -102,7 +102,7 @@ GEM
bundler-audit (0.4.0)
bundler (~> 1.2)
thor (~> 0.18)
byebug (8.2.0)
byebug (8.2.1)
cal-heatmap-rails (0.0.1)
capybara (2.4.4)
mime-types (>= 1.16)
......@@ -117,6 +117,7 @@ GEM
activemodel (>= 3.2.0)
activesupport (>= 3.2.0)
json (>= 1.7)
cause (0.1)
charlock_holmes (0.7.3)
chunky_png (1.3.5)
cliver (0.3.2)
......@@ -140,10 +141,10 @@ GEM
term-ansicolor (~> 1.3)
thor (~> 0.19.1)
tins (~> 1.6.0)
crack (0.4.2)
crack (0.4.3)
safe_yaml (~> 1.0.0)
creole (0.5.0)
d3_rails (3.5.6)
d3_rails (3.5.11)
railties (>= 3.1.0)
daemons (1.2.3)
database_cleaner (1.4.1)
......@@ -230,7 +231,7 @@ GEM
ipaddress (~> 0.5)
nokogiri (~> 1.5, >= 1.5.11)
opennebula
fog-brightbox (0.9.0)
fog-brightbox (0.10.1)
fog-core (~> 1.22)
fog-json
inflecto (~> 0.0.2)
......@@ -249,7 +250,7 @@ GEM
fog-core (>= 1.21.0)
fog-json
fog-xml (>= 0.0.1)
fog-sakuracloud (1.4.0)
fog-sakuracloud (1.5.0)
fog-core
fog-json
fog-softlayer (1.0.2)
......@@ -277,11 +278,11 @@ GEM
ruby-progressbar (~> 1.4)
gemnasium-gitlab-service (0.2.6)
rugged (~> 0.21)
gemojione (2.1.0)
gemojione (2.1.1)
json
get_process_mem (0.2.0)
gherkin-ruby (0.3.2)
github-linguist (4.7.2)
github-linguist (4.7.3)
charlock_holmes (~> 0.7.3)
escape_utils (~> 1.1.0)
mime-types (>= 1.19)
......@@ -298,7 +299,7 @@ GEM
posix-spawn (~> 0.3)
gitlab_emoji (0.2.0)
gemojione (~> 2.1)
gitlab_git (7.2.21)
gitlab_git (7.2.22)
activesupport (~> 4.0)
charlock_holmes (~> 0.7.3)
github-linguist (~> 4.7.0)
......@@ -370,6 +371,9 @@ GEM
i18n (0.7.0)
ice_nine (0.11.1)
inflecto (0.0.2)
influxdb (0.2.3)
cause
json
ipaddress (0.8.0)
jquery-atwho-rails (1.3.2)
jquery-rails (3.1.4)
......@@ -416,11 +420,11 @@ GEM
net-ldap (0.12.1)
net-ssh (3.0.1)
netrc (0.11.0)
newrelic-grape (2.0.0)
newrelic-grape (2.1.0)
grape
newrelic_rpm
newrelic_rpm (3.9.4.245)
nokogiri (1.6.7)
nokogiri (1.6.7.1)
mini_portile2 (~> 2.0.0.rc2)
nprogress-rails (0.1.6.7)
oauth (0.4.7)
......@@ -783,7 +787,7 @@ GEM
coercible (~> 1.0)
descendants_tracker (~> 0.0, >= 0.0.3)
equalizer (~> 0.0, >= 0.0.9)
warden (1.2.3)
warden (1.2.4)
rack (>= 1.0)
web-console (2.2.1)
activemodel (>= 4.0)
......@@ -836,6 +840,7 @@ DEPENDENCIES
charlock_holmes (~> 0.7.3)
coffee-rails (~> 4.1.0)
colorize (~> 0.7.0)
connection_pool (~> 2.0)
coveralls (~> 0.8.2)
creole (~> 0.5.0)
d3_rails (~> 3.5.5)
......@@ -873,6 +878,7 @@ DEPENDENCIES
hipchat (~> 1.5.0)
html-pipeline (~> 1.11.0)
httparty (~> 0.13.3)
influxdb (~> 0.2)
jquery-atwho-rails (~> 1.3.2)
jquery-rails (~> 3.1.3)
jquery-scrollto-rails (~> 1.4.3)
......
......@@ -3,5 +3,5 @@
# lib/support/init.d, which call scripts in bin/ .
#
web: bundle exec unicorn_rails -p ${PORT:="3000"} -E ${RAILS_ENV:="development"} -c ${UNICORN_CONFIG:="config/unicorn.rb"}
worker: bundle exec sidekiq -q post_receive -q mailers -q archive_repo -q system_hook -q project_web_hook -q gitlab_shell -q incoming_email -q runner -q common -q default
worker: bundle exec sidekiq -q post_receive -q mailers -q archive_repo -q system_hook -q project_web_hook -q gitlab_shell -q incoming_email -q runner -q common -q default -q metrics
# mail_room: bundle exec mail_room -q -c config/mail_room.yml
class MetricsWorker
include Sidekiq::Worker
sidekiq_options queue: :metrics
def perform(metrics)
prepared = prepare_metrics(metrics)
Gitlab::Metrics.pool.with do |connection|
connection.write_points(prepared)
end
end
def prepare_metrics(metrics)
metrics.map do |hash|
new_hash = hash.symbolize_keys
new_hash[:tags].each do |key, value|
new_hash[:tags][key] = escape_value(value)
end
new_hash
end
end
def escape_value(value)
value.gsub('=', '\\=')
end
end
......@@ -421,9 +421,22 @@ production: &base
#
# Ban an IP for one hour (3600s) after too many auth attempts
# bantime: 3600
metrics:
enabled: false
# The name of the InfluxDB database to store metrics in.
database: gitlab
# Credentials to use for logging in to InfluxDB.
# username:
# password:
# The amount of InfluxDB connections to open.
# pool_size: 16
# The timeout of a connection in seconds.
# timeout: 10
development:
<<: *base
metrics:
enabled: false
test:
<<: *base
......@@ -466,6 +479,10 @@ test:
user_filter: ''
group_base: 'ou=groups,dc=example,dc=com'
admin_group: ''
metrics:
enabled: false
staging:
<<: *base
metrics:
enabled: false
if Gitlab::Metrics.enabled?
require 'influxdb'
require 'socket'
require 'connection_pool'
# These are manually require'd so the classes are registered properly with
# ActiveSupport.
require 'gitlab/metrics/subscribers/action_view'
require 'gitlab/metrics/subscribers/active_record'
require 'gitlab/metrics/subscribers/method_call'
Gitlab::Application.configure do |config|
config.middleware.use(Gitlab::Metrics::RackMiddleware)
end
Sidekiq.configure_server do |config|
config.server_middleware do |chain|
chain.add Gitlab::Metrics::SidekiqMiddleware
end
end
GC::Profiler.enable
Gitlab::Metrics::Sampler.new.start
end
module Gitlab
module Metrics
def self.pool_size
Settings.metrics['pool_size'] || 16
end
def self.timeout
Settings.metrics['timeout'] || 10
end
def self.enabled?
!!Settings.metrics['enabled']
end
def self.pool
@pool
end
def self.hostname
@hostname
end
def self.last_relative_application_frame
root = Rails.root.to_s
metrics = Rails.root.join('lib', 'gitlab', 'metrics').to_s
frame = caller_locations.find do |l|
l.path.start_with?(root) && !l.path.start_with?(metrics)
end
if frame
return frame.path.gsub(/^#{Rails.root.to_s}\/?/, ''), frame.lineno
else
return nil, nil
end
end
@hostname = Socket.gethostname
# When enabled this should be set before being used as the usual pattern
# "@foo ||= bar" is _not_ thread-safe.
if enabled?
@pool = ConnectionPool.new(size: pool_size, timeout: timeout) do
db = Settings.metrics['database']
user = Settings.metrics['username']
pw = Settings.metrics['password']
InfluxDB::Client.new(db, username: user, password: pw)
end
end
end
end
module Gitlab
module Metrics
# Class for calculating the difference between two numeric values.
#
# Every call to `compared_with` updates the internal value. This makes it
# possible to use a single Delta instance to calculate the delta over time
# of an ever increasing number.
#
# Example usage:
#
# delta = Delta.new(0)
#
# delta.compared_with(10) # => 10
# delta.compared_with(15) # => 5
# delta.compared_with(20) # => 5
class Delta
def initialize(value = 0)
@value = value
end
# new_value - The value to compare with as a Numeric.
#
# Returns a new Numeric (depending on the type of `new_value`).
def compared_with(new_value)
delta = new_value - @value
@value = new_value
delta
end
end
end
end
module Gitlab
module Metrics
# Module for instrumenting methods.
#
# This module allows instrumenting of methods without having to actually
# alter the target code (e.g. by including modules).
#
# Example usage:
#
# Gitlab::Metrics::Instrumentation.instrument_method(User, :by_login)
module Instrumentation
# Instruments a class method.
#
# mod - The module to instrument as a Module/Class.
# name - The name of the method to instrument.
def self.instrument_method(mod, name)
instrument(:class, mod, name)
end
# Instruments an instance method.
#
# mod - The module to instrument as a Module/Class.
# name - The name of the method to instrument.
def self.instrument_instance_method(mod, name)
instrument(:instance, mod, name)
end
def self.instrument(type, mod, name)
return unless Metrics.enabled?
alias_name = "_original_#{name}"
target = type == :instance ? mod : mod.singleton_class
target.class_eval do
alias_method(alias_name, name)
define_method(name) do |*args, &block|
ActiveSupport::Notifications.
instrument("#{type}_method.method_call", module: mod, name: name) do
__send__(alias_name, *args, &block)
end
end
end
end
end
end
end
module Gitlab
module Metrics
# Class for storing details of a single metric (label, value, etc).
class Metric
attr_reader :series, :values, :tags, :created_at
# series - The name of the series (as a String) to store the metric in.
# values - A Hash containing the values to store.
# tags - A Hash containing extra tags to add to the metrics.
def initialize(series, values, tags = {})
@values = values
@series = series
@tags = tags
@created_at = Time.now.utc
end
# Returns a Hash in a format that can be directly written to InfluxDB.
def to_hash
{
series: @series,
tags: @tags.merge(
hostname: Metrics.hostname,
ruby_engine: RUBY_ENGINE,
ruby_version: RUBY_VERSION,
gitlab_version: Gitlab::VERSION,
process_type: Sidekiq.server? ? 'sidekiq' : 'rails'
),
values: @values,
timestamp: @created_at.to_i
}
end
end
end
end
module Gitlab
module Metrics
# Class for producing SQL queries with sensitive data stripped out.
class ObfuscatedSQL
REPLACEMENT = /
\d+(\.\d+)? # integers, floats
| '.+?' # single quoted strings
| \/.+?(?<!\\)\/ # regexps (including escaped slashes)
/x
MYSQL_REPLACEMENTS = /
".+?" # double quoted strings
/x
# Regex to replace consecutive placeholders with a single one indicating
# the length. This can be useful when a "IN" statement uses thousands of
# IDs (storing this would just be a waste of space).
CONSECUTIVE = /(\?(\s*,\s*)?){2,}/
# sql - The raw SQL query as a String.
def initialize(sql)
@sql = sql
end
# Returns a new, obfuscated SQL query.
def to_s
regex = REPLACEMENT
if Gitlab::Database.mysql?
regex = Regexp.union(regex, MYSQL_REPLACEMENTS)
end
@sql.gsub(regex, '?').gsub(CONSECUTIVE) do |match|
"#{match.count(',') + 1} values"
end
end
end
end
end
module Gitlab
module Metrics
# Rack middleware for tracking Rails requests.
class RackMiddleware
CONTROLLER_KEY = 'action_controller.instance'
def initialize(app)
@app = app
end
# env - A Hash containing Rack environment details.
def call(env)
trans = transaction_from_env(env)
retval = nil
begin
retval = trans.run { @app.call(env) }
# Even in the event of an error we want to submit any metrics we
# might've gathered up to this point.
ensure
if env[CONTROLLER_KEY]
tag_controller(trans, env)
end
trans.finish
end
retval
end
def transaction_from_env(env)
trans = Transaction.new
trans.add_tag(:request_method, env['REQUEST_METHOD'])
trans.add_tag(:request_uri, env['REQUEST_URI'])
trans
end
def tag_controller(trans, env)
controller = env[CONTROLLER_KEY]
label = "#{controller.class.name}##{controller.action_name}"
trans.add_tag(:action, label)
end
end
end
end
module Gitlab
module Metrics
# Class that sends certain metrics to InfluxDB at a specific interval.
#
# This class is used to gather statistics that can't be directly associated
# with a transaction such as system memory usage, garbage collection
# statistics, etc.
class Sampler
# interval - The sampling interval in seconds.
def initialize(interval = 15)
@interval = interval
@metrics = []
@last_minor_gc = Delta.new(GC.stat[:minor_gc_count])
@last_major_gc = Delta.new(GC.stat[:major_gc_count])
end
def start
Thread.new do
Thread.current.abort_on_exception = true
loop do
sleep(@interval)
sample
end
end
end
def sample
sample_memory_usage
sample_file_descriptors
sample_objects
sample_gc
flush
ensure
GC::Profiler.clear
@metrics.clear
end
def flush
MetricsWorker.perform_async(@metrics.map(&:to_hash))
end
def sample_memory_usage
@metrics << Metric.new('memory_usage', value: System.memory_usage)
end
def sample_file_descriptors
@metrics << Metric.
new('file_descriptors', value: System.file_descriptor_count)
end
def sample_objects
@metrics << Metric.new('object_counts', ObjectSpace.count_objects)
end
def sample_gc
time = GC::Profiler.total_time * 1000.0
stats = GC.stat.merge(total_time: time)
# We want the difference of GC runs compared to the last sample, not the
# total amount since the process started.
stats[:minor_gc_count] =
@last_minor_gc.compared_with(stats[:minor_gc_count])
stats[:major_gc_count] =
@last_major_gc.compared_with(stats[:major_gc_count])
stats[:count] = stats[:minor_gc_count] + stats[:major_gc_count]
@metrics << Metric.new('gc_statistics', stats)
end
end
end
end
module Gitlab
module Metrics
# Sidekiq middleware for tracking jobs.
#
# This middleware is intended to be used as a server-side middleware.
class SidekiqMiddleware
def call(worker, message, queue)
# We don't want to track the MetricsWorker itself as otherwise we'll end
# up in an infinite loop.
if worker.class == MetricsWorker
yield
return
end
trans = Transaction.new
begin
trans.run { yield }
ensure
tag_worker(trans, worker)
trans.finish
end
end
def tag_worker(trans, worker)
trans.add_tag(:action, "#{worker.class.name}#perform")
end
end
end
end
module Gitlab
module Metrics
module Subscribers
# Class for tracking the rendering timings of views.
class ActionView < ActiveSupport::Subscriber
attach_to :action_view
SERIES = 'views'
def render_template(event)
track(event) if current_transaction
end
alias_method :render_view, :render_template
private
def track(event)
path = relative_path(event.payload[:identifier])
values = values_for(event)
current_transaction.add_metric(SERIES, values, path: path)
end
def relative_path(path)
path.gsub(/^#{Rails.root.to_s}\/?/, '')
end
def values_for(event)
values = { duration: event.duration }
file, line = Metrics.last_relative_application_frame
if file and line
values[:file] = file
values[:line] = line
end
values
end
def current_transaction
Transaction.current
end
end
end
end
end
module Gitlab
module Metrics
module Subscribers
# Class for tracking raw SQL queries.
#
# Queries are obfuscated before being logged to ensure no private data is
# exposed via InfluxDB/Grafana.
class ActiveRecord < ActiveSupport::Subscriber
attach_to :active_record
SERIES = 'sql_queries'
def sql(event)
return unless current_transaction
sql = ObfuscatedSQL.new(event.payload[:sql]).to_s
values = values_for(event)
current_transaction.add_metric(SERIES, values, sql: sql)
end
private
def values_for(event)
values = { duration: event.duration }
file, line = Metrics.last_relative_application_frame
if file and line
values[:file] = file
values[:line] = line
end
values
end
def current_transaction
Transaction.current
end
end
end
end
end
module Gitlab
module Metrics
module Subscribers
# Class for tracking method call timings.
class MethodCall < ActiveSupport::Subscriber
attach_to :method_call
SERIES = 'method_calls'
def instance_method(event)
return unless current_transaction
label = "#{event.payload[:module].name}##{event.payload[:name]}"
add_metric(label, event.duration)
end
def class_method(event)
return unless current_transaction
label = "#{event.payload[:module].name}.#{event.payload[:name]}"
add_metric(label, event.duration)
end
private
def add_metric(label, duration)
file, line = Metrics.last_relative_application_frame
values = { duration: duration, file: file, line: line }
current_transaction.add_metric(SERIES, values, method: label)
end
def current_transaction
Transaction.current
end
end
end
end
end
module Gitlab
module Metrics