Time to finally wrap this series up. As with Day 2, the notes are mostly raw logs.

Speeding Up Rails 4.2

A talk about the speedups that went into Rails 4.2.

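What surprised me personally was the speedup to the h method in ERB::Util. The escaped result can get by without a trip through SafeBuffer, so that step was cut, and the method ended up doing essentially nothing extra. Fewer allocations, faster code. I was surprised that an almost joke-like speedup like this was still waiting to be found at this stage.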

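The talk was given in Japanese, and not merely in Japanese: he had researched what makes a Japanese audience laugh and built it in, which showed astonishing attention to detail. Preparing a Japanese talk for a conference that is trying to make English the standard is something you would only see in the Ruby community.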

"The annual Japanese proficiency test"

Takoyaki Kamen, the Takoyaki Mask (Aaron Patterson)

Wrong: レッドハト ("Reddo Hatto"). Right: RedHat.

ManageIQ: on the team building software that manages virtual machines. Ruby core, Rails core. The #1 Rails committer. Let me tell you the secret: revert commits count too! More mistakes == more points!!

Pets: Ogawa-san (a marimo), Gobu-chan (a cat, unemployed), Chu-chu (a cat, a programmer?)

Big data (a giant)

Node.js, to get closer to the metal

Speeding Up Rails 4.2: the main talk

Rack

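Recently became a maintainer. But Rack is over. Rack is not raku ("easy"). I will not announce Rack 2.0* (*I will). I don't want to give up, so I went ahead and started a new project on my own.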

the_metal: a test project. If it goes well, it will become 2.0.

1.x has problems.

1: env

def call(env)
end

env is effectively a global variable: one Hash that every layer can read and modify.
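For reference, a minimal sketch of the Rack 1.x calling convention: call receives the env Hash and returns a [status, headers, body] triple whose body responds to each. The class names HelloApp and PathDowncaser are hypothetical, and the sketch deliberately avoids loading the rack gem so it runs standalone. The middleware shows why env behaves like a global: it mutates the same Hash the caller passed in.

```ruby
# Hypothetical Rack 1.x-style app; runs without the rack gem.
class HelloApp
  # Rack hands the whole request to the app as one Hash, env.
  def call(env)
    body = "#{env["REQUEST_METHOD"]} #{env["PATH_INFO"]}"
    # Response triple: status, headers, and a body that responds to #each.
    [200, { "Content-Type" => "text/plain" }, [body]]
  end
end

# Hypothetical middleware: it receives the same env Hash and mutates it in place,
# so every layer below (and the original caller) sees the change.
class PathDowncaser
  def initialize(app)
    @app = app
  end

  def call(env)
    env["PATH_INFO"] = env["PATH_INFO"].downcase
    @app.call(env)
  end
end

env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/Hello" }
status, _headers, body = PathDowncaser.new(HelloApp.new).call(env)
p status           # prints 200
p body             # prints ["GET /hello"]
p env["PATH_INFO"] # prints "/hello": the caller's Hash was changed too
```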

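2: Streaming

Hard to control, and a pain to integrate with other systems.

To fix these:

the request holds the IO plus the request and response information

A complete working example. Looks just like node.js.

Great for HTTP/1.1, extendable for HTTP 2.0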

On to Rails 4.2.

It includes a project called Adequate Record, which doubles ActiveRecord's speed.

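Performance tools: tools for finding problems. benchmark/ips measures how many times the code under test runs in 5 seconds, i.e. iterations per second.

Set#include?: about 300,000 (check the slides later); Array#include?: 4,000. IPS: higher is better.

When you don't know how something in Rails is implemented, use this for black-box testing.

Testing a cache: even without knowing the implementation, the speed gives you hints. Constant time: Hash; linear time: Array.

Routes: does the size of the route table affect link_to's speed? Put 10, 100, 1000... entries in the routes and measure -> the speed stays the same no matter how many routes there are.

Does the URL length affect execution speed? Vary the URL length over 10, 100, 1000... -> linear. Turn it into constant time.

Object allocation: GC.stat(). Total allocations: GC.stat(:total_allocated_object). Concretely, let's measure the object allocations in views.

How to lie with graphs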
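The black-box idea can be sketched without Rails: grow the input and watch how iterations per second change. benchmark/ips itself is a gem, so to keep this runnable with the standard library only, the ips helper below is a hypothetical stand-in that just counts iterations in a fixed window (0.2 s here instead of the 5 s mentioned above). It is not the benchmark-ips API.

```ruby
require "set"

# Hypothetical stand-in for benchmark/ips: count iterations in a fixed time window.
def ips(seconds = 0.2)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  iterations = 0
  deadline = clock.call + seconds
  while clock.call < deadline
    yield
    iterations += 1
  end
  iterations / seconds
end

# Grow the collection and compare. Roughly flat IPS hints at a hash lookup;
# IPS that falls as n grows hints at a linear scan.
results = [10, 100, 1_000, 10_000].map do |n|
  array = (0...n).to_a
  set = array.to_set
  missing = -1 # worst case for Array#include?: every element is compared
  [n, ips { array.include?(missing) }.round, ips { set.include?(missing) }.round]
end

results.each do |n, array_ips, set_ips|
  puts format("n=%-6d Array: %10d  Set: %10d", n, array_ips, set_ips)
end
```

The absolute numbers depend on the machine; only the shape of each column as n grows carries the hint.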
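A minimal sketch of counting allocations around a block with GC.stat. As far as I know, Ruby 2.2 and later spell the key :total_allocated_objects, while the singular :total_allocated_object in the notes is the Ruby 2.1 spelling; the helper name count_allocations is hypothetical.

```ruby
# Count objects allocated while the block runs (Ruby 2.2+ key name).
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Each iteration allocates at least the interpolated String.
interpolated = count_allocations { 100.times { |i| "item #{i}" } }

# Reusing one frozen String allocates nothing per iteration.
FROZEN_ITEM = "item".freeze
reused = count_allocations { 100.times { FROZEN_ITEM } }

puts "interpolated: #{interpolated}, reused: #{reused}"
```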

About 14% fewer allocations than 4-0-stable.

allocation_tracer ko1/allocation_tracer

Where are the objects created? -> In a feature called tag options: ERB::Util.h()

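HTML sanitization: Rails treats plain strings as unsafe, so they have to be escaped.

ERB::Util.h uses gsub and creates 2 objects; tag options creates 1 more. String -> SafeBuffer -> String: skip the SafeBuffer in the middle.

Cuts roughly 200 allocations. Your mileage may vary.

Compiled templates: HTML strings are passed as the argument to safe_append=. It checks for nil, but the ERB compiler never passes nil, so remove the nil check. It always passes a String, so remove the to_s too. What remains only calls super, so it can go: safe_append= isn't needed.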
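A rough model of why skipping the intermediate SafeBuffer saves allocations. FakeSafeBuffer is a hypothetical stand-in for ActiveSupport::SafeBuffer (which is a String subclass), and the two escape functions only mimic the String -> SafeBuffer -> String path described above; they are not the Rails code. ERB::Util.html_escape is the real stdlib method.

```ruby
require "erb"

# Hypothetical stand-in for ActiveSupport::SafeBuffer: a String subclass marked safe.
class FakeSafeBuffer < String
  def html_safe?
    true
  end
end

def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# String -> SafeBuffer -> String: wrap the escaped result, then convert it back.
# String#to_str on a subclass returns a new plain String, one more allocation.
def escape_via_buffer(value)
  FakeSafeBuffer.new(ERB::Util.html_escape(value)).to_str
end

# String -> String: return the escaped result directly.
def escape_direct(value)
  ERB::Util.html_escape(value)
end

input = %(<a href="x">)
puts escape_direct(input) # prints &lt;a href=&quot;x&quot;&gt;
extra = count_allocations { escape_via_buffer(input) } - count_allocations { escape_direct(input) }
puts "extra allocations per call via the wrapper: #{extra}"
```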

Advanced code!

Allocations per request: down 19% from 4-0-stable and 14% from 4.1.

Summary.

Eliminate objects: the fastest code is code that doesn't exist. Limit types: fewer types = less code, less code = faster code.

The most important thing: measure, measure, measure, measure, MEASURE!

If you don't measure, you can't tell what got better.

Rails 4.2 will be the fastest ever!

@ko1
- "allocation tracer has more useful features, so please use more useful features"
- Q: What's your next target? A: Removing garbage from view generation. We can't precompile views (all views are compiled lazily), which has bad effects on copy-on-write.
- About Symbol GC: we have to support older versions of Ruby...

Practical meta-programming in Application

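Another session where I was in charge of checking the slides and briefing the interpreters. The seasoned Rubyists in the room probably use metaprogramming every day, but this talk was a good occasion to rethink how it is best used.

Metaprogramming: manipulating the elements of the language while the code runs

What are Ruby's language elements?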

class/module
object and its state, @instance_variable
method
a series of procedures (like Proc)

class Book
  def title
    @title
  end
  def title=(title)
    @title = title
  end
end

class Book
  attr_accessor :title
end

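Written this way, the getter and setter are defined automatically.

It's actually implemented in C, but you can write something like it in Ruby:

class Book
  define_method(:title) do
    instance_variable_get("@title")
  end
end

That's the kind of thing it does.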
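Extending that fragment, here is a sketch of a Ruby-level attr_accessor built from define_method, instance_variable_get, and instance_variable_set. The module MyAccessors and the method my_attr_accessor are hypothetical names; the real attr_accessor is written in C.

```ruby
# Hypothetical Ruby-level imitation of attr_accessor.
module MyAccessors
  # Define a getter and a setter for each name, as attr_accessor does.
  def my_attr_accessor(*names)
    names.each do |name|
      ivar = "@#{name}"
      define_method(name) { instance_variable_get(ivar) }
      define_method("#{name}=") { |value| instance_variable_set(ivar, value) }
    end
  end
end

class Book
  extend MyAccessors
  my_attr_accessor :title, :author
end

book = Book.new
book.title = "The Ruby Programming Language"
p book.title                        # prints "The Ruby Programming Language"
p Book.instance_methods(false).sort # prints [:author, :author=, :title, :title=]
```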

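APIs that manipulate language elements are called reflection APIs. Ruby has a rich set of them, which makes metaprogramming easy. Metaprogramming is not off-limits to beginners; Ruby and metaprogramming are simply a good fit.

Pitfalls.

Overusing the reflection APIs makes things painful.

How can you write simple code?

Extract code from the 'meta' aspect & write simple code: look at the problem domain again from a meta perspective, then write simple code.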

Slides: https://speakerdeck.com/moro/practical-metaprogramming-in-application

8bit Game Development With Ruby

No notes for this one... it was mostly demos, with the code shown on the spot alongside the 8-bit game built from it. And the music was homemade!

Scaling is hard, let's go server shopping! Or, How we scaled Travis CI, and helped Open Source at the same time.

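The message: when scaling, optimization is not always the best answer, and spending money on server specs is not a bad thing. Optimization does not necessarily attract new customers, and it can even cost you customers. Coming from an industry that is usually busy chasing performance tuning, this was an interesting viewpoint you rarely hear.

Travis does a crazy amount of OSS testing: 137,194 OSS repositories, 27,000 builds per day

1 push to GitHub = 1 build request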

79930 jobs per day

2.96 jobs per build

thoughtbot/paperclip: 15 test jobs = 5 Rubies × 3 Gemfiles

OSS testing: testing different combinations

private repo testing is different

on the .com platform

16,340 repos (in the last 12 months), 30,000 builds per day, 44,255 jobs per day, 1.47 jobs per build

they only need confidence in their production environment
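Why one OSS push fans out into many jobs: every combination in the build matrix is its own job. A tiny sketch; the Ruby versions and Gemfile paths below are hypothetical placeholders, not paperclip's actual configuration.

```ruby
# Hypothetical matrix in the shape of the paperclip example: 5 Rubies x 3 Gemfiles.
rubies   = %w[1.9.3 2.0.0 2.1.2 jruby rbx]
gemfiles = %w[gemfiles/rails3_2.gemfile gemfiles/rails4_0.gemfile gemfiles/rails4_1.gemfile]

# Every (ruby, gemfile) pair becomes a separate job.
jobs = rubies.product(gemfiles)
puts jobs.size # prints 15
```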

you need a few servers

68+ dedicated servers 1100+ vms

growth took time

4 life stages of travis

1. beginning

opensource, moustaches and shite shirts

@svenfuchs rails heroku resque websockets

contributors interests projects grew

-> need more servers! -> needed sandbox!

virtualbox snapshots, shutting down ...

mri/jruby

one project travis-ci/travis-ci

split up: travis-worker, travis-core, travis-hub, travis-ci still needed more servers and more servers

needed sponsors for servers

2. creating a team

companies, crowd funding, and onsies

6 months -> left my job, contracting, but ended, -> "why isn't this our job", we were committed!

needed a way to fund the company -> love.travis-ci.org

$134,000

crowd funded company

more features need to implement, more servers needed, more languages ...

3. Travis Pro

first 10 customers

$ -> servers, services, and people

start charging ASAP

get private repos and billing to work ASAP

people wanted to be able to test private repos

feature requests are flying at us

there was never a shortage of work

4 developers, trying to also be business people

4. in the black

where are we now?

3 yrs, self-sustaining, 68+ servers, 125000+ jobs a day, 9000 support emails

but wait!

how did it scale?

throw money at problem

features
bug reports
optimizations

new users want features

customers want stability and bugs fixed

optimizations are complicated: you want more customers, you want to keep current customers, not a bug, not a feature, until they are

workers and vms

openvz

125000 jobs per day

30 secs on average to start

growth on the platform is still increasing

optimise? -> we don't know how long it might take

time == money; developers are not a free resource; possible opportunity cost

we have been tinkering in the background

docker: 3-second boot time, runs on EC2, scales up and down, different instances

summarize

sometimes throwing money at a problem isn't a bad thing

ruby 2.1.3 / ruby 2.2.0-preview1 will be available today! (on Travis)

If your build doesn't use sudo, you can run Travis CI on Docker; mail support

Make your own synchronization mechanism.

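Another session I handled the interpreter briefing for. The material went deep enough that even programmers would find it hard to follow, and the simultaneous interpreters carried all of it. Truly amazing...!

"Synchronization mechanism" sounds like a topic to brace yourself for, but walking through it step by step made it easy to follow the whole path to the implementation you actually want. Coming away thinking "I could do this myself!" is a big deal.

Jealous of Go's popularity. Go channels are hugely popular, but Ruby's Queue is right there, ready to use.

A review of the synchronization features that come bundled with Ruby; great lessons learned by building Go-style channels.

The MINIX operating systems book I read when I was young covered many kinds of synchronization: once you can implement one, you can implement the other kinds. Building the machinery of the world!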

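My own synchronization mechanism

Ruby's synchronization mechanisms

Waiting for each other (rendezvous) by itself
Waiting plus exchanging information

A caveat about synchronization mechanisms: they are like traffic lights. If everyone follows them, things go well; if someone doesn't, accidents can happen. But no accident so far doesn't mean the code is correct.

Synchronization mechanisms that exchange information: Queue, TupleSpace, socket, pipe

Dependencies and waiting: by programming the data dependencies, you can write concurrent code. You program the idea that "we can't proceed until that piece of information arrives."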

Go channels

ch <- obj
it := <-ch

Looks like SizedQueue...

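A tip for staying motivated: don't research too much. Research too much and you notice how hard it is, and the motivation drains away.

SizedQueue itself is short.

But Go channels weren't SizedQueue. A channel of size 0 makes the sender and the receiver wait until both are present.

-> Check the spec carefully.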
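A buffered Go channel maps closely onto Ruby's built-in SizedQueue: push blocks while the queue is full and pop blocks while it is empty. A minimal producer/consumer sketch (the :done sentinel is just a convention chosen for this example):

```ruby
# Buffered, Go-style channel with capacity 2.
channel = SizedQueue.new(2)

producer = Thread.new do
  5.times { |i| channel.push(i) } # blocks whenever 2 items are already waiting
  channel.push(:done)             # sentinel to tell the consumer to stop
end

received = []
while (item = channel.pop) != :done
  received << item
end
producer.join
p received # prints [0, 1, 2, 3, 4]
```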

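ph.2 Shouldn't the thread just put itself to sleep? Special-case the behavior for size 0.

-> Sometimes the sleep happens after the wakeup it was waiting for. Bad.

Bad example: an implementation that calls sleep on its own is almost always wrong. Use the parts that come with Ruby.

ph.3 Use a counting semaphore

Should have done that from the start. Writing one properly is tedious, so use Queue.

A Queue named reading pretends to be a semaphore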
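A sketch of the ph.3 idea, reconstructed from the notes rather than from the speaker's code: a Queue named reading acts as a counting semaphore that counts receivers waiting in pop, and a sender may only hand over a value after taking one of those tokens. The class name UnbufferedChannel is hypothetical.

```ruby
# Hypothetical size-0 (rendezvous) channel built from two Queues.
class UnbufferedChannel
  def initialize
    @reading = Queue.new # counting semaphore: one token per receiver waiting in #pop
    @values  = Queue.new # values handed from senders to waiting receivers
  end

  # Blocks until a receiver is waiting, then hands the value over.
  def push(value)
    @reading.pop        # acquire a token: wait for some receiver
    @values.push(value) # that receiver is already committed to popping this
    self
  end

  # Signals that a receiver is ready, then waits for a value.
  def pop
    @reading.push(true) # release: let exactly one sender through
    @values.pop
  end
end

ch = UnbufferedChannel.new
log = Queue.new
sender = Thread.new do
  ch.push(:hello)
  log.push(:sent)
end

sleep 0.1
blocked_before_pop = log.empty? # no receiver yet, so the sender is still waiting
value = ch.pop
sender.join
sent = log.pop
p [blocked_before_pop, value, sent] # prints [true, :hello, :sent]
```

Because Queue does its own locking and wakeups, there is no window where a wakeup can arrive before the matching sleep, which was the ph.2 problem. One simplification: push returns once the value is queued for a committed receiver, slightly before that receiver actually dequeues it.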

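I thought it was finished. Queue is simple and easy to build on, so synchronization mechanisms are easy to make with it. But Go channels also have select.

select is a pain. Multi-way waiting (waiting on several Queues at once): given several channels, return one that can be read without blocking.

select is a pain: block until any one of several channels is readable, but Queue#pop itself blocks.

Merging several Queues into one Queue: you need one thread per channel, and by the time pop succeeds, the element has already been consumed. Something called select really ought to leave it unconsumed. You can write an application with the same effect, but the same interface is hard.

-> Maybe the Queue-based strategy is a dead end...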

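But wait...

ph.4 Use TupleSpace

So capable it feels like cheating. write, take, read; take and read accept a pattern, matched with === (case equality). You can wait for almost any complex state, so you can wait for a readable channel.

Strategy: hide it inside a TupleSpace. The application only sees Handle -> ChannelSpace (TupleSpace).

ph.5 Didn't build it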
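A sketch of select on top of Rinda::TupleSpace. Templates are matched element by element (nil matches anything, otherwise == or ===), so a lambda can express "any of these channels". Using read instead of take lets select report a readable channel without consuming its value. ChannelSpace and the [:chan, name, value] tuple layout are hypothetical, not the speaker's code; note that rinda moved from the default library to a bundled gem in Ruby 3.4.

```ruby
require "rinda/tuplespace"

# Hypothetical ChannelSpace: each queued value is a tuple [:chan, name, value].
class ChannelSpace
  def initialize
    @ts = Rinda::TupleSpace.new
  end

  def write(name, value)
    @ts.write([:chan, name, value])
  end

  # Removes and returns a value, blocking until one exists on that channel.
  def take(name)
    @ts.take([:chan, name, nil])[2]
  end

  # Blocks until any named channel is readable and returns its name.
  # read (not take) leaves the value in place, so nothing is consumed.
  def select(names)
    readable = ->(name) { names.include?(name) } # matched via Proc#===
    @ts.read([:chan, readable, nil])[1]
  end
end

space = ChannelSpace.new
space.write(:b, 42)
ready = space.select([:a, :b]) # :b, and its value is still in the space
value = space.take(ready)      # would block forever if select had consumed it
p [ready, value]               # prints [:b, 42]
```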

Three Ruby usages - High-level interface, Glue and Embedding - Inside Droonga

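How Ruby can be used in combination with other languages and software. Providing a more convenient interface to lower-level functionality is something you see very often, but as a real-world case of developing with embedded Ruby, hearing about measuring the overhead of switching to mruby was interesting.

"Practically the Japanese keynote"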

ClearCode -> Silver sponsor

High-level interface
Glue
Embed

High-level interface -> pure Rubyists; Glue, Embed -> Rubyists who also write C/C++

Case study

Implement distributed full-text search engine in Ruby

(DFTSE)

Why? > Because we're building Droonga.

high-level interface

Offers the features of a lower layer to upper layers through an easy-to-use, simple API

Vagrant, ActiveRecord

I won't talk about how Droonga uses it.

You'll understand once you build a library yourself.

glue

Bridges Ruby and other languages, and one feature and another

ActiveRecord
mysql2 gem = glue
access to mysql, libmysqlclient.so

vagrant = glue
VM, Provision

VM -- virtualbox, provision -- chef

why do we use glue? -> reuse external features

How do you use glue? External library: implement a binding (mys. External command: spawn the command (vag
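A minimal sketch of the "spawn the command" style of glue: Ruby passes input to an external process and parses its output. To keep it portable, the current Ruby interpreter (RbConfig.ruby) stands in for the external tool; run_tool is a hypothetical name, and a real glue library would invoke its own CLI instead.

```ruby
require "open3"
require "rbconfig"

# Hypothetical glue: feed input to an external command on stdin, read its stdout.
# RbConfig.ruby plays the external tool here so the example runs anywhere.
def run_tool(input)
  out, status = Open3.capture2(RbConfig.ruby, "-e", "puts STDIN.read.split.sort",
                               stdin_data: input)
  raise "tool failed: #{status}" unless status.success?
  out.lines(chomp: true)
end

p run_tool("pear apple banana") # prints ["apple", "banana", "pear"]
```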

Rroonga: Groonga bindings Groonga: FTSE C library

Cool.io: libev bindings libev: event loop C library

Serf: clustering tool

-> About Rroonga.

Do the heavy work in C while keeping a Ruby-like API. Reduce memory allocations: keep a buffer inside and reuse it. Multiprocessing: Groonga supports multiprocessing; if the CPU is the bottleneck, consider going multi-process.
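The "keep a buffer inside and reuse it" idea has a direct stdlib analogue: IO#read(length, outbuf) (also on StringIO) refills a caller-supplied String instead of returning a fresh one. A sketch comparing allocations; count_allocations is a hypothetical helper and the data is synthetic.

```ruby
require "stringio"

def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

data = "x" * 64_000

# Without a buffer: every read returns a new String (1,000 reads of 64 bytes).
fresh = count_allocations do
  io = StringIO.new(data)
  total = 0
  while (chunk = io.read(64))
    total += chunk.bytesize
  end
end

# With a buffer: the same String is refilled on every read.
reused = count_allocations do
  io = StringIO.new(data)
  buffer = String.new(capacity: 64)
  total = 0
  while io.read(64, buffer)
    total += buffer.bytesize
  end
end

puts "fresh: #{fresh}, reused: #{reused}"
```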

Heavy work in C

Groonga::Database.open(ARGV[0])
entries = Groonga["Entries"]

entries.find_all do |record|
  /Ruby/ =~ record.description
end

entries.select do |record|
  record.description =~ "ruby"
end

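-> It doesn't actually loop in Ruby; the work happens at the C level. An object called an expression is built and used for the search.

Searching Rurema (the Ruby reference manual) from Ruby: C 0.6ms, Ruby 0.8ms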
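The trick above, where the select block runs once to build an expression instead of once per record, can be modeled in plain Ruby. Everything here (Table, ExpressionBuilder, the [:match, column, word] form) is hypothetical and far simpler than Rroonga; in the real library the expression is evaluated by Groonga in C.

```ruby
# Records what a select block does instead of evaluating it per record.
class ExpressionBuilder
  Column = Struct.new(:name) do
    # record.description =~ "ruby" describes a match; it does not perform one.
    def =~(word)
      [:match, name, word]
    end
  end

  def method_missing(name, *args)
    args.empty? ? Column.new(name) : super
  end
end

class Table
  def initialize(records)
    @records = records
  end

  # Call the block once to get an expression, then evaluate it in a single pass
  # (the part Rroonga hands off to Groonga's C engine).
  def select
    expression = yield ExpressionBuilder.new
    op, column, word = expression
    raise ArgumentError, "unsupported: #{op}" unless op == :match
    matches = @records.select { |r| r[column].downcase.include?(word.downcase) }
    [expression, matches]
  end
end

entries = Table.new([
  { description: "Ruby is a programming language" },
  { description: "Groonga is a full-text search engine" },
])
expression, matches = entries.select { |record| record.description =~ "ruby" }
p expression   # prints [:match, :description, "ruby"]
p matches.size # prints 1
```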

embed

Embedding Ruby into an application.

internal engine - groonga's query optimizer in Ruby
interface - vim-ruby to vim

embed in droonga: droonga - rroonga - groonga - mruby

CRuby vs. mruby: CRuby is fully featured! (a signal handler isn't needed). mruby allows multiple interpreters in one process! But you may miss some features. Pick whichever fits the case.

mruby in Groonga

Query optimizer
Command Interface (plan)
plugin api(plan)

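Query optimizer -> plans how to search; a lighter operation than full-text search; depends on the data -> we don't want to write it in C, we want to write it in Ruby. Forcing algorithmically demanding parts into C is hard work.

Is embedding worth it? > Measure. Measure what? > The overhead of mruby and the effect of the optimizer.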

Ruby 2.1 in Production

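The closing keynote: how GitHub fought its way through Ruby/Rails upgrades and sped things up along the way. The theme repeated again and again on Day 3 of RubyKaigi, "if you want performance, measure first", came up here too. And the tools born from that measuring depicted the problems and pointed to the fixes so vividly! The "everyone has been there" problems were solved so cleanly that the audience sighed, and then the talk went beyond them. I could only be amazed.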

V.Zero RubyKaigi Demonstration - Ruby Powered Payments

Steven Cooper @ PayPal

Braintree - Simple, powerful payments started from only developers

Uber: taxi service

Small footprint, big ideas v.zero : a modern foundation for accepting payments

PayPal & credit card in any website any backend language that you want coinbase -> bitcoin Apple Pay

tokenized transaction everything's

developers.braintreepayments.com

12 lines of code

blueprint.paypal.com

engineer.east.vc

Ruby 2.1 in Production

Aman Gupta @tmm1: Ruby core committer for the past 2 years; performance improvements: the most difficult parts of the Ruby implementation

How I upgraded github.com from 1.8.7-p72 to 2.1.2

bit.ly/ruby21-in-production

github.com Rails app: one of the largest apps hosted on the internet

312ms: 98th percentile response time; we don't like to use average response time; ajax tends to be faster than browser requests
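A small illustration of why a high percentile tells you more than the average: a few slow requests barely move the mean but dominate the tail. The response times are made up, and nearest-rank is just one common percentile definition.

```ruby
# Nearest-rank percentile: the smallest value with at least p% of samples <= it.
def percentile(samples, p)
  sorted = samples.sort
  rank = (p * sorted.size + 99) / 100 # ceil(p * n / 100) in integer arithmetic
  sorted[rank - 1]
end

# Hypothetical response times in ms: mostly fast, with a few slow outliers.
times = [10] * 97 + [1_000] * 3

mean = times.sum.fdiv(times.size)
p98 = percentile(times, 98)
puts "mean: #{mean} ms, p98: #{p98} ms" # prints "mean: 39.7 ms, p98: 1000 ms"
```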

history

met ruby in 2002 huge php fanatic, used php 3d games for php (opengl binding)

went back to php

2005 - hearing about rails; realtime support didn't work; rails 1.0 was a crazy codebase

ramaze, sequel

ruby/eventmachine: real-time push

2008: moved to SF

Josh Susser, ICHR

great way to meet ruby developers

what took over @ sf -> GH meetup/GH drinkup every 2 weeks

2009: involved in community, ruby hero award for eventmachine conference circuit

best developers were going to github

2011: started to work at github shipped a feature on the 1st day at work

github: rails app + git stack post-receive hook that runs in ruby

"what is going on in this service" -> very first tools in the first two weeks: rbtrace get a lot of information out of it (inspired by strace)

1.8.7-p72 was highly out of date

tried to use REE

how I think about performance: measure first; only then can you make changes

measuring performance -> break it down into 2: CPU time and idle time. CPU time = system (kernel) + user (C, Ruby); idle time = disk IO, network IO (time not spent on the CPU)

system time -> strace (used every day): can attach to any process and trace every system call

strace against the CI build: strace -c showed 76s of the GitHub test suite spent in clone(). The git command calls fork(), which calls clone(); switching to the posix-spawn gem sped up the test suite by 2x!

most Linux/BSD systems have a different system call optimized for running processes via fork->exec, which doesn't need any copying at fork time -> posix_spawn!
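The CPU-time versus idle-time split can be approximated from inside Ruby: Process.times gives user and system CPU time for the process, and whatever part of the wall-clock time is left over was spent waiting. The breakdown helper is hypothetical; sleep stands in for disk or network IO.

```ruby
# Split wall-clock time into user CPU, system CPU, and idle (waiting) time.
def breakdown
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start = Process.times
  yield
  cpu_end = Process.times
  wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start
  user = cpu_end.utime - cpu_start.utime
  sys = cpu_end.stime - cpu_start.stime
  { wall: wall, user: user, system: sys, idle: wall - user - sys }
end

# Sleeping stands in for waiting on IO: nearly all idle time.
io_like = breakdown { sleep 0.2 }
# A busy loop is nearly all user CPU time.
cpu_bound = breakdown { 2_000_000.times { |i| i * i } }

[io_like, cpu_bound].each do |r|
  puts format("wall %.3fs  user %.3fs  system %.3fs  idle %.3fs",
              r[:wall], r[:user], r[:system], r[:idle])
end
```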

rebuild, set path, use ree -> ree-187-2012.02

ruby 1.9: there was already 1.9.3 library usage, syntax changes, ...

marshal compatibility was hard a lot of data was in cache(memcached), several TBs

rational class: Rational became pure C -> wrote a monkey-patch

encoding issues... added an option to the VM, ruby --encoding-compatibility: ASCII-8BIT + UTF-8 -> swallow the exception and return ASCII-8BIT
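The error that option worked around is easy to reproduce: concatenating a binary (ASCII-8BIT) string containing non-ASCII bytes with a non-ASCII UTF-8 string raises Encoding::CompatibilityError. The compat_concat helper below is a hypothetical Ruby-level approximation of "swallow the exception and return ASCII-8BIT"; GitHub's actual change lived in the VM.

```ruby
# Binary string with non-ASCII bytes, e.g. data read back from a cache.
binary = "caf\xC3\xA9".b
# UTF-8 string with non-ASCII characters (U+00E9, e with acute).
utf8 = "\u00e9t\u00e9"

# Mixing the two raises, because no common encoding can represent both.
raised = begin
  binary + utf8
  false
rescue Encoding::CompatibilityError
  true
end

# Hypothetical approximation of the patched behavior: fall back to ASCII-8BIT.
def compat_concat(a, b)
  a + b
rescue Encoding::CompatibilityError
  a.b + b.b
end

result = compat_concat(binary, utf8)
p raised                                  # prints true
p result.encoding == Encoding::ASCII_8BIT # prints true
```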

bin/app-ruby - specify(hard-code) which ruby to use

ruby-1.9.3-p231-tcs-github

script/bench-app: wrote a lot of tools to find where the time was going

rblineprof: a per-line profiler for ERB templates and other Ruby code; enjoy looking into others' Ruby code; a profiler focused on line-by-line performance; found lots of easy wins all over the place

Ruby Performance group; open group

-> requested commit access

felt a lot of power developing a lot of tools

profiling/tracing, GC APIs needed to design apis to profile with 0 allocation overhead

2.1.2-github: encoding-compatibility patch, @funnyfalcon's method cache patch, @funnyfalcon's pool allocator, Hash#[] optimization backport

ruby-tcmalloc ruby-jemalloc choose allocator per app, choose against tradeoffs

ruby 2.1 in branch deploy, almost compatible with 1.9 so pretty easy dramatic, dramatic, DRAMATIC improvement in GC

-> 2.1 in production!

take advantage of these new apis

stackprof sampling cpu profiler built on rb_profile_frames() in ruby2.1

faster checkboxes

stackprof can generate a graph. callgraph. clear sense of where the time is spent. -> 15% of time was spent in rendering!

parameterize took time inside a loop -> pull the parameterize call out of the loop!
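The fix is classic loop-invariant hoisting. A sketch with a hypothetical slugify lambda standing in for ActiveSupport's String#parameterize, plus a call counter to show the saved work; the labels are made up.

```ruby
# Hypothetical stand-in for String#parameterize, with a call counter.
calls = 0
slugify = lambda do |text|
  calls += 1
  text.downcase.gsub(/[^a-z0-9]+/, "-").delete_prefix("-").delete_suffix("-")
end

prefix = "Notification Settings"
labels = ["Email me", "Watch repo", "Show stars"] # e.g. checkbox labels

# Before: the loop-invariant prefix is re-slugified on every iteration.
before = labels.map { |label| "#{slugify.call(prefix)}_#{slugify.call(label)}" }
calls_before = calls

# After: compute the invariant once, outside the loop.
calls = 0
prefix_slug = slugify.call(prefix)
after = labels.map { |label| "#{prefix_slug}_#{slugify.call(label)}" }
calls_after = calls

p before == after             # prints true
p [calls_before, calls_after] # prints [6, 4]
```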

stackprof:flamegraphs

4KB "leak" per File.expand_path() call! -> nobu-san fixed it in minutes

jemalloc profiler

peek status bar github.com/peek/peek (OSS version)

template visualization: quickly see which templates drive parts of the page; hover over any element on the page to see which template renders it and how long it took

where in Ruby do you want to improve?

-> tool that go to a specific class and see info about specific class, specific method. currently tools for only global scope

how do you figure out random slowness?

-> make profiler to run low-cost and run in production, because that is the only way to find out where the problem is

when can i have visual profiler?

-> it's a huge monkey patch on rails, too hacky... but will ship it later

そんなことはさておいて ("Setting that aside")

I went to RubyKaigi (4): Day 3
