Software to detect when words rhyme

Posted by Dave Bryson on August 25, 2007

The other night my daughter was telling me about some interesting facts she read in one of her school books. One fact in particular caught my interest:

Orange and Silver are the only two words in the (American) English language that do not rhyme with any other word.

My first thought was, “How the hell did they figure this out? Did someone go through the entire dictionary and test each word?” Then I thought, “You can probably automate this task with software, but how? What makes two words rhyme?” To me, two words rhyme when their endings sound the same. But maybe there is an obscure rule of grammar I can use to programmatically test a set of words to see if they rhyme?

A Google search turned up a blog post on using phonetics to detect rhyme. The post talks about using the International Phonetic Alphabet (IPA) to translate words into their phonetic equivalents and then inspecting them for a match. Check out the example in that post.

Aha! I’m getting close. Now if I could just capture the IPA in code and use it to translate words on the fly, I’d have the next killa app. However, there’s just one problem. After grabbing the IPA chart, it was obvious the translation is based on how something sounds, not how it is spelled. Even for a human it appears extremely difficult, considering accent, dialect, and other factors.
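The matching half of the problem is the easy part. As a rough sketch, suppose the words have already been transcribed. Here that is done by hand, in ARPAbet (the phoneme notation used by the CMU Pronouncing Dictionary) rather than IPA. A common rule of thumb, assumed here, is that two words rhyme when their phonemes match from the last primary-stressed vowel to the end. The hard part, producing transcriptions automatically, is exactly what this sketch leaves out. The names PRONUNCIATIONS, rhyme_part and rhymes? are made up for illustration.

```ruby
# Hand-entered ARPAbet transcriptions in CMU Pronouncing Dictionary style
# (an assumption for this sketch; producing these automatically is the hard part).
# Vowels carry a stress digit: 1 = primary, 2 = secondary, 0 = unstressed.
PRONUNCIATIONS = {
  "cat"    => %w[K AE1 T],
  "hat"    => %w[HH AE1 T],
  "silver" => %w[S IH1 L V ER0],
  "sliver" => %w[S L IH1 V ER0],
  "river"  => %w[R IH1 V ER0]
}.freeze

# Phonemes from the last primary-stressed vowel to the end of the word.
def rhyme_part(phonemes)
  idx = phonemes.rindex { |p| p.end_with?("1") } || 0
  phonemes[idx..-1]
end

# Two different, known words rhyme if their rhyme parts are identical.
def rhymes?(a, b)
  pa = PRONUNCIATIONS[a.downcase]
  pb = PRONUNCIATIONS[b.downcase]
  return false if pa.nil? || pb.nil? || a.downcase == b.downcase
  rhyme_part(pa) == rhyme_part(pb)
end

puts rhymes?("cat", "hat")        # true
puts rhymes?("sliver", "river")   # true
puts rhymes?("silver", "sliver")  # false
```

With a full pronouncing dictionary you could loop over every entry and look for words with no partner. That would still only be as good as the dictionary’s transcriptions, which is the accent and dialect problem all over again.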

So it seems (at least right now) it’d be nearly impossible to write software to detect rhyme.

Create a ShapeFile with Ruby

Posted by Dave Bryson on August 20, 2007

Here’s a quick guide to installing ruby-shapelib and creating a shapefile from data in your model.

The setup

  1. Download and install ShapeLib. Make sure to note where the install puts libshp.so and shapefil.h (you may need that information in step 3).
  2. Download ruby-shapelib
  3. Unzip ruby-shapelib and run “ruby ./extconf.rb”. Depending on where step 1 put your files, you may need to pass extra options to this program, specifically --with-shapelib-dir and --with-shapelib-include. Then build and install the extension with “make” and “make install”.
  4. Once you’ve done that, make sure everything is working right with irb:
$ irb
>> require 'shapelib'
=> true

If you get “true”, you’re good to go.

Simple example

Let’s say we have a table called markers, with the fields lat (float), lng (float), and created_at (datetime). We want to create a shapefile for the points and also collect the time (created_at) as an attribute in the shapefile.

require 'shapelib'

# Create a shapefile from an array of markers
def make_shapefile(markers)
  # Create the shapefile.
  # First argument: the name of the file to create
  # Second: the shapefile type
  # Third: an array of arrays describing each attribute (name, type, size)
  fp = ShapeLib::ShapeFile.new("test1.shp", :Point, [['date', :String, 32]])

  # Loop over the markers; x is the longitude, y is the latitude
  markers.each do |m|
    # The 'date' attribute is declared as a String, so convert the datetime explicitly
    point = ShapeLib::Point.new(m.lng, m.lat, {"date" => m.created_at.to_s})
    fp.write point
  end

  fp.close
end

# try it out...
make_shapefile( Marker.find(:all) )

If all is working right, you should end up with 3 files: test1.shp, test1.shx, and test1.dbf.
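For a quick sanity check on the output without a GIS application, you can read the fixed 100-byte header of the .shp file in plain Ruby. This sketch follows the layout in ESRI’s Shapefile Technical Description: the file code 9994 at byte 0 and the file length (in 16-bit words) at byte 24 are both big-endian, and the version and shape type at bytes 28 and 32 are little-endian. The method name parse_shp_header is made up for illustration. It is not part of ruby-shapelib.

```ruby
# Shape type codes from the ESRI Shapefile Technical Description (subset).
SHAPE_TYPES = { 0 => :Null, 1 => :Point, 3 => :PolyLine, 5 => :Polygon, 8 => :MultiPoint }.freeze

# Parse the fixed 100-byte main file header of a .shp file.
#   N    : file code, big-endian (always 9994)
#   x20  : skip five unused 32-bit integers
#   N    : file length in 16-bit words, big-endian
#   l<   : version, little-endian (1000)
#   l<   : shape type, little-endian
def parse_shp_header(data)
  raise ArgumentError, "header must be at least 100 bytes" if data.bytesize < 100
  file_code, length_words, version, type = data.unpack("N x20 N l< l<")
  raise ArgumentError, "not a shapefile (file code #{file_code})" unless file_code == 9994
  {
    file_length_bytes: length_words * 2,
    version: version,
    shape_type: SHAPE_TYPES.fetch(type, type)
  }
end

# Usage, assuming test1.shp from the example above exists:
# p parse_shp_header(File.binread("test1.shp", 100))
```

For the markers example you would expect shape_type to come back as :Point.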

If you don’t have one already, here’s a nice open-source application to tinker with your new shapefiles: QGIS.

Scrape the Wayback Machine with this little script

Posted by Dave Bryson on August 17, 2007

Here’s a little script I use to scrape archived pages from the Alexa Wayback Machine. Basically, it works like this:

  1. Query Alexa for the old URL you’re looking for and the years you’re interested in.
  2. Use Hpricot to look in the results for links to archived pages. The pattern is http://web.archive.org/web/200301../url, where the number is the timestamp and the url on the end is the old page you’re looking for. Return an array of successful matches.
  3. Loop over those results and download the pages locally using curl (you could also use wget).
  4. Save the pages with the name “archive_timestamp.html”.
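To make the URL pattern in step 2 concrete, here is a small sketch that pulls the timestamp and the original page out of an archived link with a plain regular expression. It assumes the 14-digit YYYYMMDDhhmmss timestamp form seen in the script’s comments (20030313094512) and treats it as UTC. wayback_parts is a hypothetical helper, not part of the script below.

```ruby
# Matches archived links of the form
#   http://web.archive.org/web/<YYYYMMDDhhmmss>/<original url>
# capturing the six timestamp fields and the original URL.
ARCHIVE_LINK = %r{\Ahttps?://web\.archive\.org/web/(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/(.+)\z}

# Split an archived link into [capture time, original URL], or nil if the
# link doesn't follow the pattern. The timestamp is treated as UTC.
def wayback_parts(link)
  m = ARCHIVE_LINK.match(link)
  return nil unless m
  *stamp, original = m.captures
  [Time.utc(*stamp.map(&:to_i)), original]
end

time, page = wayback_parts("http://web.archive.org/web/20030313094512/http://sample.com")
puts time  # 2003-03-13 09:45:12 UTC
puts page  # http://sample.com
```

A parsed Time like this could give friendlier file names than the raw digits used in step 4.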

Here’s the code:

require 'hpricot'
require 'open-uri'

urls = %w[http://sample.com http://sample2.com ...]
years = %w[2002 2003 2004]

# Search Alexa for the following URLs and years and
# extract the relevant links from the search result pages
def extract_links_from_search(search_urls=[],years=[])
  results = []
  search_urls.each do |u|
    years.each do |y|
      search_alexa = "http://web.archive.org/web/#{y}*/#{u}"
      doc = Hpricot(open(search_alexa))
      (doc/:a).each do |link|
        ul = link.attributes['href']
        # Search result pages have the following url, followed
        # by the timestamp (20030313094512)
        # followed by the search url
        if ul =~ %r{http://web\.archive\.org/web/\d+/http:}
          results << ul
        end
      end
    end
  end
  # The same capture can be linked more than once, so drop duplicates
  results.uniq
end

def download_and_store_pages(results=[])
  results.each do |url|
    # Create a file name based on the timestamp (the first run of digits)
    fn = "archive_#{url[/\d+/]}.html"
    puts "Saving as: #{fn}"
    # Pass the arguments separately so the shell never interprets the URL
    system("curl", url, "-o", fn)
  end
end

outp = extract_links_from_search(urls,years)
puts "Getting the data"
download_and_store_pages(outp)

This is quick and dirty and took about 10 minutes to write. It could probably be simplified, but it does the job for me.

Modify the XML output from your Model

Posted by Dave Bryson on August 16, 2007

So your app needs to generate XML. No problem: ActiveRecord gives it to you for free. Simply call mymodel.to_xml and you’re done. But what happens if you need to generate more complicated, specialized XML? There are a few options:

  1. Don’t call to_xml; generate the XML using a template (.rxml) instead
  2. Override the to_xml method, as mentioned in the docs
  3. Create a separate method for generating the XML

To keep things simple, and for reasons we’ll see later, let’s use option 3.

The example

Ok. I have a model, Car, with 3 attributes (year, make, model). Here’s what the default XML looks like:

car = Car.find(:first)
car.to_xml
=>

<car>
 <make>Nissan</make>
 <model>Pickup</model>
 <year>1995</year>
</car>

So let’s customize the XML to put the elements in a namespace and rename the root element. In the Car model we’ll create a new method called my_xml instead of overriding to_xml:

def my_xml(options={})
  options[:indent] ||= 2
  xml = options[:builder] ||= Builder::XmlMarkup.new(:indent => options[:indent])
  xml.instruct! unless options[:skip_instruct]
  xml.mycar(:Vehicle, "xmlns:mycar" => "http://crazystuff.org/car/ns") do
    xml.mycar(:make, self.make)
    xml.mycar(:model, self.model)
    xml.mycar(:year, self.year)
  end
end

The Model uses the Builder library for creating the XML. That was easy. Now when I call car.my_xml I get this:

<?xml version="1.0" encoding="UTF-8"?>
<mycar:Vehicle xmlns:mycar="http://crazystuff.org/car/ns">
  <mycar:make>Nissan</mycar:make>
  <mycar:model>Pickup</mycar:model>
  <mycar:year>1995</mycar:year>
</mycar:Vehicle>

Perfect! Now let’s try to query all Cars and see what we get:

all_cars = Car.find(:all)
all_cars.my_xml
=>
NoMethodError: undefined method 'my_xml' for #<Array:0x1379810>

What the *$%@! That’s not right. Calling Car.find(:all) returns an Array. Array doesn’t have a method my_xml.

But how does Rails do it? If “all_cars” is an Array, then Array within Rails must support the to_xml method. As it turns out, Rails adds some tricks to some of the core pieces of the Ruby language. Of interest to us right now is the module ActiveSupport::CoreExtensions::Array::Conversions. It defines a to_xml method that is mixed into the Array class.

We could open up that module and change it, or we could just create our own method and include it into Array. Let’s do the latter:

module MyConversion
  def my_xml(options = {})
    options[:indent] ||= 2
    xml = options[:builder] ||= Builder::XmlMarkup.new(:indent => options[:indent])
    xml.instruct! unless options[:skip_instruct]
    # TODO: Move the xmlns from Vehicle to here...
    xml.tag!("mycar:AllVehicles") do
      # Here we loop over each entry (model) and call its my_xml,
      # sharing the builder and skipping the per-record XML declaration
      each { |e| e.my_xml(options.merge(:skip_instruct => true)) }
    end
  end
end

# Don't forget to do this!
class Array
  include MyConversion
end

Ok. That’s it. Now when we call my_xml, it works as expected regardless of whether the receiver is an Array or a single object.
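The same trick works outside Rails. Here is a minimal plain-Ruby sketch of the pattern. A hypothetical Struct stands in for the Car model, and a hypothetical describe method stands in for my_xml. The result is one method name that works on a single object and on an Array of them, because the Array version delegates to each element.

```ruby
# Hypothetical stand-in for the ActiveRecord Car model.
Car = Struct.new(:make, :model, :year) do
  # One record renders itself as a single line.
  def describe
    "#{year} #{make} #{model}"
  end
end

# Mixin for Array: call describe on every element and join the results,
# the same shape as MyConversion#my_xml delegating to each record's my_xml.
module DescribeConversion
  def describe
    map(&:describe).join("\n")
  end
end

# Reopen Array and mix the module in, as done with MyConversion above.
class Array
  include DescribeConversion
end

cars = [Car.new("Nissan", "Pickup", 1995), Car.new("Ford", "Ranger", 1998)]
puts cars.first.describe  # 1995 Nissan Pickup
puts cars.describe
```

As with my_xml, this relies on every element responding to the method. An Array of objects without describe would raise NoMethodError.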

Have a look around in the ActiveSupport core extensions. There’s a lot to learn there.

Clean all .svn or CVS directories from your source

Posted by Dave Bryson on August 16, 2007

You need (or want) to import your code into a new svn or CVS repository, but the source code is filled with CVS/.svn folders from an old repository. Here’s a quick way to remove them (Unix):

find . -type d \( -name CVS -o -name .svn \) -prune -exec rm -rf {} +
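If you’d rather stay in Ruby (say, on a machine without find), here is a sketch of the same cleanup using the standard library’s Find and FileUtils modules. The method name clean_vcs_dirs is made up for this example. Like the shell command, it deletes without asking, so try it on a copy first.

```ruby
require "find"
require "fileutils"

# Directory names left behind by old checkouts.
VCS_DIRS = %w[CVS .svn].freeze

# Remove every CVS or .svn directory under root; returns the removed paths.
def clean_vcs_dirs(root)
  removed = []
  Find.find(root) do |path|
    next unless File.directory?(path) && VCS_DIRS.include?(File.basename(path))
    FileUtils.rm_rf(path)
    removed << path
    Find.prune # don't try to descend into the directory we just deleted
  end
  removed
end

# Usage: clean_vcs_dirs(".")
```

Find.prune plays the same role as -prune in the find command: it stops the walk from looking inside a directory that no longer exists.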