    Improve AutolinkFilter#text_parse performance · dd35c3dd
    Yorick Peterse authored
    By using more selective XPath queries we can significantly improve the
    performance of this method. The actual improvement depends on the
    number of links in the input, but in my tests the new implementation
    is usually around 8 times faster than the old one. This was measured
    using the following benchmark:
    
        require 'benchmark/ips'

        # Concatenate the text of 50 notes into a single large HTML fragment.
        text = '<p>' + Note.select("string_agg(note, '') AS note").limit(50).take[:note] + '</p>'
        document = Nokogiri::HTML.fragment(text)
        filter = Banzai::Filter::AutolinkFilter.new(document, autolink: true)

        puts "Input size: #{(text.bytesize.to_f / 1024 / 1024).round(2)} MB"

        # Run the Rinku-based pass once up front so that both reports below
        # operate on the same document.
        filter.rinku_parse

        Benchmark.ips(time: 15) do |bench|
          bench.report 'text_parse' do
            filter.text_parse
          end

          bench.report 'text_parse_fast' do
            filter.text_parse_fast
          end

          bench.compare!
        end
    
    Here "text_parse_fast" is the new implementation and "text_parse" is
    the old one. The input size was around 180 MB. Running this benchmark
    outputs the following:
    
        Input size: 181.16 MB
        Calculating -------------------------------------
                  text_parse     1.000  i/100ms
             text_parse_fast     9.000  i/100ms
        -------------------------------------------------
                  text_parse     13.021  (±15.4%) i/s -    188.000
             text_parse_fast    112.741  (± 3.5%) i/s -      1.692k
    
        Comparison:
             text_parse_fast:      112.7 i/s
                  text_parse:       13.0 i/s - 8.66x slower
    
    Again, production timings may (and most likely will) vary depending
    on the input being processed.