csammisrun

A rare situation

Blast from the past

with 2 comments

Extremely nerdy post ahoy!

The new version of the .NET Framework will be coming out relatively soon, and I spied this little gem on the BCL Team Blog: ObservableCollection<T> is moving into System.dll so implementers can avoid a dependency on WPF.

That would have been nice about a year and a half ago :(

Written by Chris

October 22nd, 2009 at 4:02 pm

Posted in Development,shaim

We’ve set a date

with 2 comments

Courtney and I are getting married on October 9, 2010. Mark your calendars!

Written by Chris

October 16th, 2009 at 4:59 pm

Posted in General

Surprise, I didn’t forget about this project!

without comments

Way back in April I talked about a new programming project: extracting and analyzing text from the Codex Seraphinianus. It’s been several months since any updates. Progress has been really sporadic (life happens), but I can now present the next step in my silly pastime.

In my last post I mentioned that the pixel connection algorithm wasn’t perfectly suited to isolating words. One of the problems was that the scan quality produced a lot of “broken” text – cursive lines don’t quite line up and sometimes a single word is carved up into multiple regions. I’ve implemented morphological functions to “close” words and produce better connections, but I haven’t taken the time to sit down and tune the process so that it doesn’t accidentally turn fine loops into dense blocks of black. The other big problem was that the process didn’t group diacritics with the modified word, and it is the solution to this that I’d like to share.

First, let’s understand how the connected components algorithm outputs its information. It starts with an image that looks like this:

The algorithm starts at the upper-left and works down and right, numbering the regions as it encounters them (usually, more on this later):
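
To make that numbering order concrete, here is a minimal Python sketch of a connected-region labeler. It is an illustration only, not the project's actual code: the function name `label_regions`, the 0/1 grid representation, the flood-fill approach, and the choice of 8-connectivity are all assumptions made for the example.

```python
from collections import deque

def label_regions(grid):
    """Number connected regions of 1-pixels in scan order (0 = background)."""
    height, width = len(grid), len(grid[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 1
    # Scan top to bottom, left to right; each unlabeled dark pixel starts a new region.
    for row in range(height):
        for col in range(width):
            if grid[row][col] != 1 or labels[row][col] != 0:
                continue
            labels[row][col] = next_label
            queue = deque([(row, col)])
            # Flood-fill outward to every pixel touching this one (8-connectivity assumed).
            while queue:
                r, c = queue.popleft()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < height and 0 <= nc < width
                                and grid[nr][nc] == 1 and labels[nr][nc] == 0):
                            labels[nr][nc] = next_label
                            queue.append((nr, nc))
            next_label += 1
    return labels

# A toy "word" (the arch of 1s) with a one-pixel "diacritic" tucked inside it.
page = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
]
labels = label_regions(page)
```

In this toy grid the arch is encountered first and becomes region 1, and the isolated pixel in the middle becomes region 2.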

The challenge now is to attach the diacritic, region 2, to the larger word represented by region 1. To do this, each region gets a bounding rectangle…

…and the following algorithm is applied to all N regions:

for each region X between 1 .. N:
  for each region Y between X + 1 .. N:
    if region Y is fully contained by region X:
      region X consumes region Y

In the example image, region 2 is indeed fully contained by region 1, so it becomes part of region 1. This works well for the Codex script, since most diacritics appear close to the letter they modify and usually within the boundaries of the full word. Applying this procedure to the first page of the Codex yields some pretty good results.
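
The enclosure pass above can be sketched in Python roughly as follows. This is a hedged illustration, not the code behind these results: the `Region` class, its field names, and `absorb_enclosed` are hypothetical, and it assumes each region carries an inclusive bounding rectangle and that the list is already in region-number order. Skipping regions that have already been consumed is also an assumption; the pseudocode doesn't say either way.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    number: int
    left: int
    top: int
    right: int
    bottom: int
    absorbed: list = field(default_factory=list)  # numbers of consumed regions

    def contains(self, other):
        """True if other's bounding rectangle lies fully inside this one."""
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)

def absorb_enclosed(regions):
    """For each region X, consume every later region Y that X fully contains."""
    consumed = set()
    for i, outer in enumerate(regions):
        if outer.number in consumed:
            continue
        # Only count up: regions numbered after X are the candidates.
        for inner in regions[i + 1:]:
            if inner.number not in consumed and outer.contains(inner):
                outer.absorbed.append(inner.number)
                consumed.add(inner.number)
    return [r for r in regions if r.number not in consumed]

word = Region(1, left=0, top=0, right=40, bottom=20)
diacritic = Region(2, left=10, top=2, right=13, bottom=5)
neighbor = Region(3, left=50, top=0, right=80, bottom=20)
merged = absorb_enclosed([word, diacritic, neighbor])
print([r.number for r in merged], merged[0].absorbed)  # prints [1, 3] [2]
```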



Still not perfect, though. You can see that there are some diacritics that are clearly fully contained by the parent word, but they aren’t consumed by the word.

It turns out that there are some pathological cases in region numbering. The connected region algorithm does not always correctly number regions in a top-down and left-right manner, so the enclosure algorithm listed above doesn’t catch everything. Let’s say that the example image is one of these cases:

Now region 1 doesn’t enclose region 2, and since the algorithm only counts up (for efficiency’s sake), the diacritic isn’t grouped in with the word.

It’s not a great situation, but there’s a fix. Rather than messing around with the connection algorithm and trying to figure out the numbering sequence and where it breaks down – even for a small image, the numbers start getting pretty hard to keep track of in one’s head or on paper – I added an extra step when the bounding rectangles are calculated. Working again in a top-down/left-right fashion, I simply renumber the bounding rectangles as they’re encountered. This fixes the misnumbered cases and doesn’t add complexity for the normal cases, and the result is much better:
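
One way to picture the renumbering step, again as a sketch under stated assumptions rather than the actual implementation: sort the bounding rectangles by top edge, then left edge, and hand out new numbers in that order. The `(left, top, right, bottom)` tuple layout and the name `renumber` are made up for the example, and the two extra tie-breakers (taller, then wider rectangles first) are an addition so that an enclosing rectangle always sorts ahead of anything it contains, even when the two share a top-left corner.

```python
def renumber(rects):
    """Return {new_number: rect}, numbering rectangles top-down, then left-to-right."""
    # rect = (left, top, right, bottom); ties on (top, left) go to the larger rectangle.
    ordered = sorted(rects, key=lambda r: (r[1], r[0], -r[3], -r[2]))
    return {number: rect for number, rect in enumerate(ordered, start=1)}

# The pathological case: the diacritic was numbered before the word that encloses it.
misnumbered = [(10, 2, 13, 5), (0, 0, 40, 20)]
renumbered = renumber(misnumbered)
print(renumbered)  # prints {1: (0, 0, 40, 20), 2: (10, 2, 13, 5)}
```

After renumbering, the word is region 1 again, so an enclosure pass that only counts up will find the diacritic.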



We’re still not at 100% diacritic capture – if you look closely, there are three diacritics that aren’t grouped with their intended word (left of the first word, right of the fourth word, right of the last word). This isn’t another pathological case or anything; those marks actually do not fall completely within the bounding rectangle of their word. Oh well. It’s time to move on to other things – frankly, if I get to the point where I’m statistically identifying language features and a couple of missing diacritics make a huge difference, I can go back and tweak things then :)

Written by Chris

October 15th, 2009 at 9:44 am

Posted in Development,General

The first harvest from our tomato plant

without comments

tiny tomato

Damn squirrels, eating all the good ones before we thought to put up a net…

Written by Chris

September 13th, 2009 at 1:29 pm

Posted in General

I don’t have a girlfriend anymore

with one comment

…because I have a fiancee!

Courtney’s ring – green sapphire, 3/4 carat brilliant round, in a white gold setting.

More pictures on Flickr.

Written by Chris

August 9th, 2009 at 12:57 pm

Posted in General,Photography

Some pictures from around the house

without comments

We got new living room furniture!

Bookcase, coffee table, loveseat. The big bushy plant on the left of the bookcase is a prayer plant, the smaller one is a dwarf umbrella tree.

The loveseat and dining room table. Also pictured: French horn and spider plant.

The couch and a small palm tree (I think) on the wall facing the bookcase.

Our new vacuum (thanks Bed Bath and Beyond gift card contributors!) and one of the reasons for it.

Written by Chris

August 3rd, 2009 at 8:57 am

Posted in General,Photography

Upping the Nerdiness Ante

without comments

I was working on some database stuff today when I had the greatest (read: dorkiest) idea for an avatar / custom text combo:
 
 

Chris
A Big Nerd
Posts: too many

 
E. F. Codd
I smoke two JOINs in the morning

 

Written by Chris

July 21st, 2009 at 2:39 pm

Posted in Development,General

Observations on the Unmentionable Acts of g. bear haribo

with one comment

From the co-respondence of C. B. Sammis, this Twenty-Ninth Day of June, Two Thousand and Nine:

To my colleagues at the Ministry of Gummi Studies:

First let it be known that, though the subject of this letter is most lascivious, I do not apologize for writing it. Those who are familiar with the corpus of publications under My Own Name are by necessity also familiar with the purpose of my life’s labors: to catalog thoroughly and without exception the behaviors of g. bear so that the public, yearning to hear of the wonders of our Science, may be edified and enlightened. It is noted among learned men that I do not shy away from any topic which advances the accomplishment of this ideology! This having been said I am sure that my closest friend, the gentleman-bastard A. W. Haasbeck (who is no doubt already in a drunkerous stupor even at this early time of postal delivery), will derive much puerile humor from the contents of my upcoming publication. Gentlemen, I present early notice of my findings so that Mr. Haasbeck’s head may be covered with a sack for some time following the release of this letter. It would not serve the interests of this Ministry to have him capering about and hooting lewdly, a situation which the following discovery is sure to provoke.

Reproductive mechanisms of g. bear haribo
I recently accompanied the lovely C. A. Hansen on an expedition by motorcar to her own Ministry, located in a neighboring state. The purpose of the expedition was to increase and augment Ms. Hansen’s already vast knowledge of her Science, and I found myself wandering the paved streets for hours while this admirable goal was underway.

Immersed in a work of literature, I found myself desirous of a coffeed beverage and so made my way to a store dedicated to the sale of foreign delicacies. The vendor of same, gibbering out of his mind with liberal notions on the nature of payment for services rendered, caused me to redouble my attentions toward the printed word. So vigorous was my reading that I scarcely noticed a collection of cylindrical casings enclosed within a glass container until I had almost passed them. It is indeed fortunate that the zeal of my research into the genera gummi has given me what the layman might call a “sixth sense,” for at the last moment I turned to examine the vessel with all the attention to detail known to exist in my publications.

I could not ascertain at once what lay before me; they appeared as shapeless discs no larger than the eye of a seeing-hound. A swift investigation revealed that the composition of these discs was the same as that of a g. bear species, specifically haribo! I abandoned my reading, paid the vendor for his wares – here I must request that the Ministry promptly investigate the man for Contra-band Sale of gummi, Disreputably Obtained – and sequestered myself and the curios in a private cubicle nearby.


A photo-graph of the unfamiliar discs.

It was not long before I began to consider that the discs represented a prototypical member of the g. bear species. Flustered though I was by the indecent path that my increasingly strong hunches were following and though my writing arm pains me still, I produced from memory a rough sketch of a haribo.


The sketch of haribo, included here for posterity.

Examine the attached sketch, gentlemen, or retrieve a detailed lithograph from the archivists, and see here what I have seen – and do not look away or scrub at your eyes! – see that there are no re-productive ducts as exist in a red blooded beast. As the evidence of our own surveys proves that there are increasingly many members of the g. bear species, I am forced therefore to conclude and report that the mysterious discs are in fact eggs!

Rest assured that I stowed the vulgar drawing and disposed of the eggs before Ms. Hansen returned from her studies, lest her thoughts be made less than pure by the wanton depiction I was obliged to create. Let me forestall the objections of R. Y. Ginns, the government’s Chief Examiner of Gummi Propriety: yes, I was indeed obliged to create the enclosed sketch in pursuit of the truth! The subject of the research may not be the most chaste, but I maintain nothing less than utmost professionalism and exacting precision regarding the studies conducted under My Own Name.

And here’s my marque to prove it. Gentlemen, I remain:

C. B. Sammis

Written by Chris

July 9th, 2009 at 7:13 pm

Posted in General

csammisrun – now in stunning HD!

with 3 comments

Written by Chris

June 21st, 2009 at 2:56 pm

Posted in General

Ba-dum-bum!

with 3 comments

Ever heard of the “Anal Game”? Yes, it sounds like a terrible pickup line, but also it’s a keen new driving game a la License Plate Bingo (not really). Courtney’s friend told her about this: You take the model of a car that you see and put the word “anal” in front of it. Nine times out of ten, this results in lollin’. Take for example the Nissan Pathfinder – “Anal Pathfinder” yields some reasonable guffaws.

Tonight Courtney and I were driving around Champaign and marveling at how great Isuzu seems to be for this game (Amigo, Rodeo, Trooper, Ascender, and Hombre being the ones we listed). I said that Courtney’s car, a Honda Accord, seemed rather regal-sounding.

Courtney: “There’s a Honda Prelude, too. That one’s kinda funny.”
Chris: “Heh, yup.”
Courtney: “It doesn’t make much sense though, because I don’t know what it’s a prelude to.”
Chris: “Yeah, anal seems like more of an endgame.”

It’s a good thing we were pulling into a parking lot, because we were close to tears with laughter.

This post brought to you by the Illinois Commission for Chris Blogging More Often!

Written by Chris

June 14th, 2009 at 7:36 pm

Posted in General