Just Enough Developed Infrastructure

Webby RSS Feed and Syntax Highlighting

When I first started blogging I used WordPress. It was nice at the time, but it felt rather cumbersome for posting technical content such as code samples. Having everything in the database also made it difficult to manage my content with my favorite text editor. So I decided to move to Webby: it lets you use the power of Ruby's ERB inside your HTML pages without the overhead of running a Rails application. Another advantage is that it generates static HTML, so the site is both fast and secure: there is no database or application code running on the server.
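Webby's internals aren't shown in this post, but the core idea can be sketched with Ruby's standard ERB library. The template is evaluated once when the site is built, and only the resulting HTML is written out and served. TEMPLATE and render_static are hypothetical names for illustration, not Webby's API.

```ruby
require 'erb'

# Hypothetical page template; in Webby the data would come from @page/@pages.
TEMPLATE = <<~HTML
  <h1><%= title %></h1>
  <ul>
    <%- posts.each do |post| -%>
    <li><%= post %></li>
    <%- end -%>
  </ul>
HTML

# Evaluate the embedded Ruby once and return plain HTML.
# trim_mode '-' enables <%- and -%>, which swallow indentation and newlines.
def render_static(title, posts)
  ERB.new(TEMPLATE, trim_mode: '-').result(binding)
end

html = render_static("JEDI", ["Webby RSS feed", "Syntax highlighting"])
puts html
```

The output contains no Ruby at all, which is why the generated site can be served by any plain web server.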

I've been running it for more than a year now and have been happy with it ever since. This blog post shares some of the scripts and enhancements I've used:

  • WordPress exporter to Webby: my first challenge was getting everything out of the WordPress database. I wrote a little script that reads WordPress's RSS-based export (WXR) and writes every published post into the Webby directory structure, downloading attachments along the way. It's not the most robust script (I was still new to Ruby at the time), but you'll get the idea.

  • RSS feed in Webby: I didn't find any good RSS feed code; most of what I found didn't pass RSS validation. So here is my take on it. As a bonus, the template rewrites root-relative image URLs into absolute ones, which some RSS feed readers need in order to display the images.

  • Highlighting with CodeRay: Webby includes a CodeRay helper for highlighting code. This works very well, but I missed a way to include a file instead of pasting the code directly into the blog post. That has several advantages: I can edit the file separately, and syntax highlighting in my favorite text editor still works. Escaping also works better for Ruby code: the file's contents are not run through ERB, so special characters cannot cause escaping problems. After a while I switched to Ultraviolet because it supports far more syntaxes. The downside is that it requires the Oniguruma library. This isn't really a problem on a Mac with MacPorts, although the gem has to be told where MacPorts installed it:

patricks-iMac:jedi-webby patrick$ sudo port install oniguruma4
patricks-iMac:jedi-webby patrick$ gem install ultraviolet -- -I/opt/local/lib
Building native extensions.  This could take a while...
ERROR:  Error installing ultraviolet:
    ERROR: Failed to build gem native extension.

/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby extconf.rb -I/opt/local/lib
checking for main() in -lonig... no
creating Makefile

make
gcc -I. -I. -I/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/universal-darwin10.0 -I. -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE   -fno-common -Wall  -c oregexp.c
oregexp.c:2:23: error: oniguruma.h: No such file or directory

# Solution: pass the MacPorts prefix so extconf.rb finds oniguruma.h and libonig
patricks-iMac:jedi-webby patrick$ gem install ultraviolet -- --with-opt-dir=/opt/local

This is all for now.
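The image fix from the RSS bullet comes down to a string rewrite on the rendered HTML. A minimal sketch, assuming images use root-relative paths (src="/..."), which is what the feed template's two gsub! calls handle. absolutize_images is a hypothetical helper, and unlike those calls it also leaves protocol-relative URLs (//host/...) alone.

```ruby
# Rewrite root-relative image sources (src="/..." or src='/...') into
# absolute URLs on the given site. The (?!\/) lookahead leaves
# protocol-relative URLs such as "//cdn.example/x.png" untouched.
def absolutize_images(html, site)
  html.gsub(/<img src=(["'])\/(?!\/)/) { "<img src=#{$1}http://#{site}/" }
end

html = %q{<p><img src="/img/a.png"> <img src='/img/b.png'> <img src="http://x.org/c.png"></p>}
puts absolutize_images(html, "www.jedi.be")
```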

Codify helper (with Ultraviolet): put this file inside your webby-root/lib directory.

# require uv
if try_require 'uv'
require 'enumerator'

Loquacious.configuration_for(:webby) {
  desc <<-__
    Options for Ultraviolet syntax highlighting. See the Ultraviolet gem's
    documentation for the available syntaxes and themes.
  __
  codify {
    desc 'The language being highlighted (given as a symbol).'
    lang :ruby

    desc 'The file you want to read instead of a string.'
    file nil

    desc 'Include line numbers (true or false).'
    line_numbers false

    desc 'The Ultraviolet theme used to render the code.'
    theme 'pastels_on_dark'
  }
}

module Webby::Helpers
module CodifyHelper

  # The +codify+ method applies syntax highlighting to source code embedded
  # in a webpage. The Ultraviolet highlighting engine is used for the HTML
  # markup of the source code. The code to be highlighted is given either
  # as a block of text to the +codify+ method or as a file via :file.
  #
  # Options are passed as a hash to the +codify+ method.
  #
  #    <% codify( :lang => "ruby", :line_numbers => true ) do -%>
  #    # Initializer for the class.
  #    def initialize( string )
  #      @str = string
  #    end
  #    <% end -%>
  #
  #    <% codify( :file => "example.rb" ) do -%><% end -%>
  #
  # (example.rb is a placeholder; the block stays empty when :file is used,
  # but it is still needed so the output can be written into the page.)
  #
  # The supported Codify options are the following:
  #
  #    :lang               : the language to highlight (ruby, c, html, ...);
  #                          guessed from the file when :file is given
  #                          and :lang is not
  #    :file               : the file to highlight, relative to the page
  #    :line_numbers       : true to include line numbers
  #    :theme              : the Ultraviolet theme (default pastels_on_dark)
  #

  def codify( *args, &block )
    opts = args.last.instance_of?(Hash) ? args.pop : {}

    text = ""
    file = opts[:file]
    if file
      # Read the code from a file next to the page instead of the ERB block
      filename = File.join(File.dirname(@page.path), file)
      begin
        text = File.open(filename, "rb") { |f| f.read }
        syntax = guess_syntax(filename)
        opts[:lang] = syntax if syntax && opts[:lang].nil?
      rescue StandardError => e
        puts "Error reading code file #{filename}: #{e.message}"
      end
    else
      text = capture_erb(&block)
    end

    return if text.empty?

    defaults = { :lang => "ruby", :line_numbers => false, :theme => "pastels_on_dark" }
    lang = opts.getopt(:lang, defaults[:lang]).to_s
    line_numbers = opts.getopt(:line_numbers, defaults[:line_numbers])
    theme = opts.getopt(:theme, defaults[:theme])

    # Map a few common aliases onto Ultraviolet syntax names
    lang = case lang
      when "shell", "sh" then "shell-unix-generic"
      when "text" then "plain_text"
      else lang
      end

    out = %Q{<div class="UltraViolet">\n}
    out << Uv.parse(text, "xhtml", lang, line_numbers, theme)
    out << %Q{\n</div>}

    # put some guards around the output (specifically for textile)
    out = _guard(out)

    concat_erb(out, block.binding)
    return
  end

  # Name of the first Ultraviolet syntax matching the file, or nil
  def guess_syntax(filename)
    match = Uv.syntax_for_file(filename).first
    match && match.first
  end

end  # module CodifyHelper

register(CodifyHelper)

end  # module Webby::Helpers
end  # try_require

# EOF
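The file-loading step of the codify helper can be tried outside Webby. A minimal sketch: read_code_file is a hypothetical function, and its page_path argument stands in for the helper's @page.path. Like the helper, it returns an empty string when the file cannot be read, so a `return if text.empty?` guard skips the output.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the helper's file-loading step: resolve +file+
# relative to the directory of the page being rendered and read it in
# binary mode. On failure it warns and returns "" so the caller can skip it.
def read_code_file(page_path, file)
  filename = File.join(File.dirname(page_path), file)
  File.open(filename, "rb") { |f| f.read }
rescue SystemCallError => e
  warn "Error reading code file #{filename}: #{e.message}"
  ""
end

# Usage: a page and its code sample living in the same directory
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "hello.rb"), "puts 'hi'\n")
  page = File.join(dir, "index.txt")
  print read_code_file(page, "hello.rb")
  p read_code_file(page, "missing.rb")
end
```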

Codify helper (with CodeRay): put this file inside your webby-root/lib directory.

if try_require 'coderay'
require 'enumerator'

Loquacious.configuration_for(:webby) {
  desc <<-__
    Options for CodeRay syntax highlighting. See the CodeRay home page
    (http://coderay.rubychan.de/) for more information about the available
    options.
  __
  codify {
    desc 'The language being highlighted (given as a symbol).'
    lang :ruby

    desc 'The file you want to read instead of a string.'
    file nil

    desc 'Include line numbers in :table, :inline, :list or nil (no line numbers).'
    line_numbers nil

    desc 'Where to start line number counting.'
    line_number_start 1

    desc 'Make every N-th number appear bold.'
    bold_every 10

    desc 'Tabs will be converted into this number of space characters.'
    tab_width 8
  }
}

module Webby::Helpers
module CodifyHelper

  # The +codify+ method applies syntax highlighting to source code embedded
  # in a webpage. The CodeRay highlighting engine is used for the HTML
  # markup of the source code. The code to be highlighted is given either
  # as a block of text to the +codify+ method or as a file via :file.
  #
  # Options can be passed to the CodeRay engine via attributes in the
  # +codify+ method.
  #
  #    <% codify( :lang => "ruby", :line_numbers => "inline" ) do -%>
  #    # Initializer for the class.
  #    def initialize( string )
  #      @str = string
  #    end
  #    <% end -%>
  #
  #    <% codify( :file => "example.rb" ) do -%><% end -%>
  #
  # (example.rb is a placeholder; the block stays empty when :file is used.)
  #
  # The supported Codify options are the following:
  #
  #    :lang               : the language to highlight (ruby, c, html, ...);
  #                          guessed from the extension when :file is given
  #                          and :lang is not
  #    :file               : the file to highlight, relative to the page
  #    :line_numbers       : include line numbers in 'table', 'inline',
  #                          or 'list'
  #    :line_number_start  : where to start with line number counting
  #    :bold_every         : make every n-th number appear bold
  #    :tab_width          : convert tab characters to n spaces
  #
  
  def codify( *args, &block )
    opts = args.last.instance_of?(Hash) ? args.pop : {}

    text = ""
    file = opts[:file]
    if file
      # Read the code from a file next to the page instead of the ERB block
      filename = File.join(File.dirname(@page.path), file)
      begin
        text = File.open(filename, "rb") { |f| f.read }
        syntax = guess_syntax(filename)
        opts[:lang] = syntax if syntax && opts[:lang].nil?
      rescue StandardError => e
        puts "Error reading code file #{filename}: #{e.message}"
      end
    else
      text = capture_erb(&block)
    end
    
    return if text.empty?

    defaults = ::Webby.site.coderay
    lang = opts.getopt(:lang, defaults.lang).to_sym

    cr_opts = {}
    %w(line_numbers       to_sym
       line_number_start  to_i
       bold_every         to_i
       tab_width          to_i).each_slice(2) do |key,convert|
      key = key.to_sym
      val = opts.getopt(key, defaults[key])
      next if val.nil?
      cr_opts[key] = val.send(convert)
    end

    #cr.swap(CodeRay.scan(text, lang).html(opts).div)
    out = %Q{<div class="CodeRay">\n<pre>}
    out << ::CodeRay.scan(text, lang).html(cr_opts)
    out << %Q{</pre>\n</div>}

    # put some guards around the output (specifically for textile)
    out = _guard(out)

    concat_erb(out, block.binding)
    return
  end

  # Map the file extension onto a CodeRay language, or nil if unknown
  def guess_syntax(filename)
    case File.extname(filename).downcase
      when ".rb" then :ruby
      when ".html", ".htm" then :html
      when ".sh" then :sh
      when ".css" then :css
      when ".js" then :java_script
      when ".diff" then :diff
      when ".yaml", ".yml" then :yaml
      when ".json" then :json
      when ".java" then :java
      when ".xml" then :xml
      when ".txt" then :plaintext
      when ".py" then :python
      when ".c" then :c
      when ".sql" then :sql
      else nil
    end
  end

end  # module CodifyHelper

register(CodifyHelper)

end  # module Webby::Helpers
end  # try_require

# EOF
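A long case statement like the one in guess_syntax makes duplicate or missing entries easy to overlook. A hypothetical table-driven variant keeps the extension-to-language map in one Hash. The language symbols follow the listing above; .htm and .yml are included as assumptions.

```ruby
# Hypothetical table-driven variant of guess_syntax: the extension-to-language
# map lives in one Hash, so duplicate or missing entries are easy to spot.
CODERAY_LANG_BY_EXT = {
  ".rb"   => :ruby,
  ".html" => :html,
  ".htm"  => :html,
  ".sh"   => :sh,
  ".css"  => :css,
  ".js"   => :java_script,
  ".diff" => :diff,
  ".yaml" => :yaml,
  ".yml"  => :yaml,
  ".json" => :json,
  ".java" => :java,
  ".xml"  => :xml,
  ".txt"  => :plaintext,
  ".py"   => :python,
  ".c"    => :c,
  ".sql"  => :sql
}.freeze

# Return the CodeRay language for +filename+, or nil when the extension is unknown
def guess_syntax(filename)
  CODERAY_LANG_BY_EXT[File.extname(filename).downcase]
end

p guess_syntax("lib/codify.RB")  # => :ruby
p guess_syntax("Makefile")       # => nil
```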

RSS feed file (an ERB template that Webby renders to XML)

---
title:     JEDI - Just Enough Developed Infrastructure
subtitle:  www.jedi.be
description: development, infrastructure and other stuff
site:      www.jedi.be
author:    Patrick Debois
email:     Patrick.Debois@jedi.be
extension: xml
layout:    nil
dirty:     true
filter:    
- erb
# <?xml-stylesheet type="text/css" media="screen" href="http://www.jedi.be/css/blueprint/screen.css"?>
---
<%- require 'rexml/document' -%>
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <atom:link href="http://www.jedi.be/blog/feed/" rel="self" type="application/rss+xml" />
        <title><%= h(@page.title) %></title>
        <description><%= h(@page.description) %></description>
        <link>http://<%= @page.site %>/blog</link>
        <pubDate><%= Time.now.gmtime.rfc822 %></pubDate>
        <generator>webby rss script</generator>
        <lastBuildDate><%= Time.now.gmtime.rfc822 %></lastBuildDate>
        <managingEditor><%= @page.email %> (<%= @page.author %>)</managingEditor>
        <webMaster><%= @page.email %> (<%= @page.author %>)</webMaster>
        <language>en</language>
     <%- @pages.find( :limit => 10,
        :in_directory => 'blog',
        :recursive => true,
        :sort_by => 'created_at',
        :reverse => true).each do |article|

        next if article.blog_post.nil?
    -%>
       <item>
            <title><%= h(article.title) %></title>
            <link>http://<%= @page.site %><%= article.url %></link>
            <% if article.guid.nil? %>
           <guid isPermaLink="false">http://<%= @page.site %><%= article.url%></guid>
            <% else %>
           <guid isPermaLink="false"><%= article.guid %></guid>
            <% end %>
           <pubDate><%= article.created_at.gmtime.rfc822 %></pubDate>
            <comments>http://<%= @page.site %><%= article.url %>#comments</comments>
            <description><%= normaltext=render(article); 
            normaltext.gsub!('<img src="/', '<img src="http://'+@page.site+'/')
            normaltext.gsub!("<img src='/", "<img src='http://"+@page.site+'/')

             h(normaltext) %></description>
        
            <%- if !article.keywords.nil? %>
               <%- article.keywords.each do |keyword| %>
                   <category><![CDATA[<%= keyword %>]]></category>
                <%- end %>
          <% else %>
                   <% if !article.tags.nil? %>
                   <%- article.tags.each do |tag| %>
                           <category><![CDATA[<%= tag %>]]></category>
                    <% end %>
               <% end %>
           <% end %>
           <content:encoded><![CDATA[<%= render(article).gsub(']]>', ']]]]><![CDATA[>') %>]]></content:encoded>
        </item>
    <% end %>
   </channel>
</rss>
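Two details matter for the feed to validate: the full post HTML goes into a CDATA section, and RSS 2.0 dates must be in RFC 822 format. A minimal sketch of both using only the Ruby standard library; cdata and rss_date are hypothetical helper names. A literal "]]>" inside a post would end the CDATA section early, so cdata splits it across two sections.

```ruby
require 'time'

# Wrap HTML in a CDATA section for <content:encoded>. A literal "]]>" in the
# text would close the section early, so it is split across two sections.
def cdata(text)
  "<![CDATA[" + text.gsub("]]>", "]]]]><![CDATA[>") + "]]>"
end

# RSS 2.0 dates follow RFC 822; Time#rfc2822 (from 'time') produces that
# format, using "-0000" as the zone for UTC times.
def rss_date(time)
  time.getutc.rfc2822
end

puts cdata("<p>1 ]]> 0</p>")
puts rss_date(Time.utc(2009, 9, 10, 20, 16, 41))  # prints: Thu, 10 Sep 2009 20:16:41 -0000
```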

WordPress to Webby conversion script

#!/usr/bin/env ruby
require 'rubygems'
require 'rfeedparser'
require 'pp'
require 'net/http'
require 'uri'
require 'fileutils' # FileUtils.mkdir_p
require 'time'      # Time.parse

def write_feed(feed)
  pp feed.title
  pp feed.subtitle
  pp feed.links[0].href

  @site_url=feed.wp_base_blog_url
  #feed.wp_base_site_url
  #feed.updated
  #feed.generator
  #feed.updated_time
  #feed.updated_parsed
  #feed.wp_tag_name
  #feed.wp_tag_slug
  #feed.wp_cat_name
  #feed.wp_category_parent
  #feed.wp_category
  #feed.wp_wxr_version
end

def write_entry(e)
  #File.open(local_filename, 'w') {|f| f.write(doc) }
  
  base=e.link.gsub(@site_url+'/','').gsub(e.wp_post_name+'/', '')
  path=@webby_dir+"/"+base+'/'+e.wp_post_name
  FileUtils.mkdir_p path
  filename="index.html"
  @counter=@counter+1
  
  #write entries
  puts "[#{@counter}]creating "+path+"/"+filename
  File.open(path+"/"+filename, 'w') { |f| 
    f.write("---\n") 
    # escape quotes and backslashes for the YAML double-quoted string
    f.write("title: \"" + e.title.gsub(/["\\]/) { |c| "\\" + c } + "\"\n")
    f.write("author: "+e.author+"\n")
    f.write("guid: "+e.guid+"\n")
    format="%Y-%m-%d %H:%M:%S.0 +00:00"
    #created_at: 2009-09-10 22:16:41.382708 +02:00
    # wp_post_date is the blog's local time; the GMT variant matches the "+00:00" suffix
    f.write("created_at: "+ Time.parse(e.wp_post_date_gmt).strftime(format)+"\n")
    f.write("blog_post: "+ "true\n")
    f.write("filter:\n")
    f.write("  - erb\n")
    f.write("  - basepath\n")
#    f.write("  - tidy\n")

    tags=Hash.new
    categories=Hash.new
    rssterms=e.tags || []
    rssterms.each { |t|
      scheme=t['scheme']
      term=t['term']
      if scheme=='tag'
        tags[term]=''   
      end
      if scheme=='category'
        categories[term]=''   
      end
    }
    f.write("categories:\n")
    categories.keys.each { |c| 
      f.write("   - #{c}\n")
    }
    f.write("tags:\n")
    tags.keys.each { |c| 
      f.write("   - #{c}\n")
    }

    #f.write("  - maruku\n")
    f.write("---\n")
    #fixing empty lines to be a hard break
    f.write(e.content[0].value.gsub(/\n\n/,"\n<br>\n")) 
    
    # TODO: fix the URLs of images and relative paths
  }
  # TODO: fix directory URLs

  # TODO: are subdirectories browsed via index.html?


  puts "---------------------------"
end

def download_attachment(e)
  url=e.wp_attachment_url
  puts url
  base=@webby_dir+"/"+url.gsub(@site_url+'/','').gsub(e.wp_post_name+'/', '')
  directory=File.dirname(base)
  filename=File.basename(base)
  puts directory
  puts filename
  FileUtils.mkdir_p directory
  
  myURI = URI.parse(url)
  # Use the port and query string of the attachment URL as well
  Net::HTTP.start(myURI.host, myURI.port) { |http|
    resp = http.get(myURI.request_uri)
    File.open(File.join(directory, filename), "wb") { |file|
      file.write(resp.body)
    }
  }
end

def write_entries(entries)
  @counter=0
  entries.each { |e|
    # only published posts become Webby pages
    if e.wp_post_type == 'post' && e.wp_status == 'publish'
      write_entry(e)
    end

    # attachments (images, files) are downloaded into the Webby tree
    if e.wp_post_type == 'attachment'
      download_attachment(e)
    end
  }
end

if ARGV.length < 2
  puts "Usage: wp2ruby <feed_url> <webby_dir>"
  exit 1
end

@feed_url=ARGV[0]
@webby_dir=ARGV[1]
@site_url=""

wpr=nil
begin
  wpr=FeedParser.parse(@feed_url)
rescue StandardError => e
  puts "Error parsing feed: #{e.message}"
  exit 1
end

write_feed(wpr.feed)
write_entries(wpr.entries)
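Building the front matter by string concatenation means every special character has to be escaped by hand. A sketch of an alternative using Ruby's YAML library: front_matter is a hypothetical helper (not part of the script above), its field names mirror those write_entry writes, and created_at is passed as a Time so it is dumped as a YAML timestamp rather than a string.

```ruby
require 'yaml'

# Hypothetical helper: build the Webby front matter with Ruby's YAML library
# instead of string concatenation, so titles containing quotes, colons or
# "#" still produce valid YAML. Field names mirror those write_entry emits.
def front_matter(title:, author:, guid:, created_at:, categories: [], tags: [])
  meta = {
    "title"      => title,
    "author"     => author,
    "guid"       => guid,
    "created_at" => created_at.getutc,   # dumped as a YAML timestamp
    "blog_post"  => true,
    "filter"     => ["erb", "basepath"],
    "categories" => categories,
    "tags"       => tags
  }
  # to_yaml starts the document with "---"; append the closing marker
  meta.to_yaml + "---\n"
end

puts front_matter(title: 'Webby: "RSS" feed', author: "Patrick Debois",
                  guid: "http://www.jedi.be/blog/?p=1",
                  created_at: Time.utc(2009, 9, 10, 20, 16, 41),
                  tags: ["webby", "ruby"])
```

The trade-off is less control over the exact layout (for instance the script's hand-formatted created_at string) in exchange for quoting that is always valid.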