Wednesday, January 29, 2014

OMG It Worked!

So what did I do wrong? :) I'm halfway between the two points on this comic:



Here is part one of the second iteration of my word count script/mini program:

word_count = {}
File.open('test_input.txt', 'r') do |f1|
  while line = f1.gets
    words = line.split(" ")

    words.each do |word|
      word = word.downcase
      if !word_count.has_key?(word) #word not in hash?
        word_count[word] = 1        #add it and count it
      else
        word_count[word] += 1       #only increment count
      end                           #if it's there
    end
  end
end
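
For comparison, here is a shorter sketch of the same counting loop. It relies on Hash.new(0), which makes missing keys start at zero, so the has_key? check goes away. The count_words helper and the sample lines are made up for illustration and are not part of the script above; the sort at the end is just one possible way to print the results.

```ruby
# Hypothetical helper: count_words is not part of the original script.
def count_words(lines)
  counts = Hash.new(0)            # missing keys default to 0
  lines.each do |line|
    line.split.each do |word|     # split on any whitespace
      counts[word.downcase] += 1  # works for new and existing words alike
    end
  end
  counts
end

# Made-up sample input standing in for the lines of a file
sample = ["The cat saw the dog", "the dog ran"]
counts = count_words(sample)

# Print the most frequent words first, ties in alphabetical order
counts.sort_by { |word, count| [-count, word] }.each do |word, count|
  puts "#{word}: #{count}"
end
```

To run it on a real file, File.foreach('test_input.txt') could be passed in place of the sample array, since it yields one line at a time.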


Next up, I need to figure out how to feed it HTML pages. I could just feed it straight-up HTML files, but I ultimately want to word count blogs and such. On to the next!

Sometime later, we rejoin our hero:

So I added a line to allow me to alternatively feed the wc any file I like via the command line:
File.open(ARGV[0] ? ARGV[0] : 'default.txt', 'r') do |f1|
I'm using the ternary ARGV[0] ? ... : ... to check whether a file name was passed on the command line when the script was run. If one was, we use it. If not, ARGV[0] is nil and we fall back to my creatively-named 'default.txt', so that we don't blow a gasket and raise an error for not having anything to work with. I snagged some HTML from a random website and fed it to the wc program.
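
The same fallback can be written a little more compactly with ||, since ARGV[0] is nil when no argument is given. This is only a sketch: pick_input_file is a hypothetical helper name, and 'blog.html' is a made-up file name.

```ruby
# Hypothetical helper: pick_input_file exists only for illustration.
def pick_input_file(args, fallback = 'default.txt')
  args[0] || fallback   # args[0] is nil when no argument was given
end

puts pick_input_file(ARGV)           # whatever was passed on the command line
puts pick_input_file([])             # no arguments, so the fallback is used
puts pick_input_file(['blog.html'])  # first argument wins
```

The ternary and the || form behave the same here; the || form just avoids repeating ARGV[0].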

Next up: parsing out the tags so that all we're left with is the actual content of the site. After that, I need to figure out how to get the generated HTML in the first place. I've heard of screen scrapers (and usually not in a positive way), but I think that's what I need to build here. Ultimately, I would like to give this little program the URLs for two different websites and have it compare the two. I'm a long way from there, but it's nice to have a goal. :)
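
As a rough idea of what the tag-stripping step might look like, here is a naive regex-based sketch. strip_tags is a hypothetical helper, not something from the script so far, and regexes will trip over messy real-world markup; a real HTML parser such as Nokogiri would likely be more robust.

```ruby
# Hypothetical helper: strip_tags is a naive, regex-based sketch.
def strip_tags(html)
  text = html.gsub(/<script.*?<\/script>/mi, ' ')  # drop inline scripts
  text = text.gsub(/<style.*?<\/style>/mi, ' ')    # drop inline styles
  text = text.gsub(/<[^>]+>/, ' ')                 # replace remaining tags with spaces
  text.split.join(' ')                             # collapse runs of whitespace
end

# Made-up snippet standing in for a downloaded page
html = "<h1>Hello</h1><p>Hello <b>world</b></p>"
puts strip_tags(html)
```

The cleaned-up string could then be split into words and counted the same way as the lines of a text file.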
