Thursday, July 29, 2010

I PRAY2 license plate spotted in Delaware

I spotted this I PRAY2 license plate in Delaware. The I and the P are separated by one of those donate-to-the-police stickers. Do these things really get you out of tickets?

I suppose I PRAY ALSO was too long to fit.

Tuesday, July 27, 2010

20th Anniversary Caddyshack quiz from Mental Floss

Back in the early 90's, I once had the honor of playing golf at Rolling Hills Country Club and Golf Course in Ft. Lauderdale where Caddyshack was filmed. That experience didn't give me any extra information I might have used to do better on the Caddyshack 20th anniversary quiz from Mental Floss.

I got 67%, 6 out of 9. How well can you do?

Good Luck.

Sussex county tax scofflaws plotted and extended to $12 million

Sussex county in Delaware published its top 100 tax delinquents. The .pdf is available from the News Journal. PDFs are hard to use, so I put the data into a Google Doc so that anyone can run calculations on it. Kudos to the News Journal and Sussex county for actually providing the data to the public.

First the graphs. The plot below shows amount owed by the tax delinquent plotted as a function of the rank for the top 100 tax delinquents in Sussex county according to their records.

The decay of the amount owed vs. the rank suggests plotting the data on a log-log scale (click for larger).

The plot of the data at higher ranks is a straight line. This dependence is characteristic of a power law function for amount owed vs. rank. This is typical for this kind of data. This plot is similar to a Zipf plot of ln(frequency) vs. ln(rank order), where we have used the amount owed instead of frequency.

Notice that the first few items have higher values than a simple power law would give, from the $80,000 of rank 1 down to even the $30,000 of rank 7. Sometimes the tails of these types of distributions are called the "long tail". Most of the total can be in this tail, yet it is spread among many categories. I attempted three fits of the data. The first, in blue, includes the early parts of the data that are not really a power law, so this fit is not very good. The next two fits are for the data above rank 10, in magenta, and then above rank 62, in green, where the data is better suited to a power law fit.
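For anyone who wants to try this kind of fit, here is a minimal Python sketch of a power law fit done as a straight-line least-squares fit to ln(amount) vs. ln(rank). The numbers are made up for illustration and are not the Sussex county data. The function name `fit_power_law` and its `min_rank` cutoff are assumptions for this sketch, and it is not necessarily how the fits in the charts were produced.

```python
import math

def fit_power_law(ranks, amounts, min_rank=1):
    """Fit amount = c * rank**slope by least squares on the log-log data.

    Only points with rank >= min_rank are used, so the early ranks that
    sit above the power law line can be left out of the fit.
    """
    pts = [(math.log(r), math.log(a))
           for r, a in zip(ranks, amounts) if r >= min_rank]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    c = math.exp(mean_y - slope * mean_x)
    return c, slope

# Hypothetical data: an exact power law, with rank 1 pushed above the line.
ranks = list(range(1, 101))
amounts = [50000.0 * r ** -0.5 for r in ranks]
amounts[0] = 80000.0

c_all, slope_all = fit_power_law(ranks, amounts)                 # all ranks
c_tail, slope_tail = fit_power_law(ranks, amounts, min_rank=11)  # above rank 10
print(slope_all, slope_tail)
```

Leaving out the early ranks recovers the slope of the tail, while the fit over all ranks is pulled steeper by the high value at rank 1.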

The article reveals the total amount of tax that is delinquent for the top 100 and for the total for Sussex county:
"The list unveiled Tuesday includes developers, estates, homeowners and farmers who collectively owe about $1.6 million. The county collects taxes on its behalf and for all Sussex school districts. About $12 million is owed in total."
We can use the information from the article to figure out how many more tax delinquents there are in Sussex county by using the fit to the power law portion of the curve and extending it out to higher ranks until the total amount reaches the $12 million total quoted above. The chart below shows two fits, fit 1 for the portion of the curve greater than rank 10 and fit 2 for the portion of the curve greater than rank 62, which are then extended to higher ranks.

The two curves are very similar in the tail portion and fit one yields 1829 tax delinquents and fit two yields 1717 tax delinquents to add up to $12 million of overdue taxes, with the lower ranks owing about $4000. This number is based on the assumptions above and is dependent on the curve fit and the extrapolation of the power law fit to these higher values. It is really best done with more orders of magnitude than with the data we have here. If Sussex county publishes their top 500 delinquents we might be able to check this work.
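The extrapolation itself can be sketched in a few lines of Python: start from the known total for the top 100, keep adding the fitted amount for each higher rank, and stop when the running total reaches the county-wide figure. The fit parameters `c` and `slope` below are hypothetical placeholders, not the values from fit 1 or fit 2, and the function name and `max_rank` guard are assumptions for this sketch.

```python
def count_delinquents(known_total, last_rank, c, slope, target_total,
                      max_rank=1_000_000):
    """Extend amount = c * rank**slope past last_rank until the running
    total reaches target_total. Returns (final rank, amount at that rank)."""
    total = known_total
    rank = last_rank
    amount = 0.0
    while total < target_total:
        rank += 1
        if rank > max_rank:
            raise ValueError("target not reached; check the fit parameters")
        amount = c * rank ** slope
        total += amount
    return rank, amount

# Totals quoted in the article: $1.6 million for the top 100, $12 million overall.
# c and slope are HYPOTHETICAL illustration values, not the fitted ones.
n_delinquents, smallest = count_delinquents(
    known_total=1.6e6, last_rank=100, c=30000.0, slope=-0.27,
    target_total=12e6)
print(n_delinquents, round(smallest))
```

The answer is sensitive to the slope, which is why the two fits give somewhat different counts.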

Friday, July 23, 2010

Fascinating archeological finds left by the flooding Shellpot Creek

I finally got around to cleaning up some of the junk (archaeological finds) floated into the backyard by the rising and then receding waters of the Shellpot Creek. There was a ton of styrofoam, of which I wanted to get what I could, and an assortment of other interesting things.

A sampling of the strangest things I collected is shown in the art project photo above.
  • Enough styrofoam to create my own garbage pile in the Pacific gyre. Please carefully dispose of styrofoam in the trash or recycle in the entire Shellpot Creek watershed (.pdf link) as a favor to me.
  • Plastic soda and water bottles, a scourge on the landscape if ever there was one.
  • Several sizes of containers which once carried alcohol.
  • A piece of a pool noodle
  • Foam padding for a helmet, I can only imagine when the helmet will float by.
  • An eroded sole of a flip flop of some sort.
  • A Fisher-Price bat. I am keeping that one for Linus.
  • A light bulb
  • A scoured clean tennis ball
  • A Christmas ornament

The only reason the Christmas ball survived is that it is plastic, and in fact the side away from the sun remained blue while the exposed side faded to silver.

There was larger stuff in the creek.

The piece of siding and the bucket lids were actually down in the creek, but they were so obvious I had to go get them and throw them in the trash, before they were carried away in the next storm.

Shellpot Creek Watershed.

Tuesday, July 20, 2010

Homegrown yellow pear and grape tomato and onion salad

I finally got those yellow pear tomatoes and grape tomatoes into a salad with onions and white balsamic vinegar and olive oil. The tomatoes and possibly the onion are from our own garden. Homegrown delicious.

Monday, July 19, 2010


Tons of yellow pear tomatoes

and grape tomatoes from the tomato plants this season thus far.

I am not sure how many more yellow pear tomatoes I will get as that plant looks wilted and dying, but I am not sure from what. We have had a long heat wave followed by a lot of rain, and while we kept watering the garden at least during the heat, this one tomato plant out of the four doesn't seem happy at all.

Tonight, a yellow pear and grape tomato and onion salad with white balsamic vinegar and olive oil.

Who do you write like? Submit your writing sample.

Violins and Starships points to a website that will take your text, perform some machinations on it and determine which author your writing is the most like. I tried some samples and the author it suggests does seem to depend on which text I use. Most of the comments I read seem to be people who have posted work from their short stories, novels and space operas that they are writing. Unfortunately all my writing is here so the blog is what gets analyzed.

This post on my chances of winning playoff fantasy football and my first post ever, as well as others from that month, and a lot of other posts, yield Cory Doctorow as the author that my writing most resembles.

These mundane posts on how to drill glass tile and where to recycle styrofoam packaging yield

This blue hen round up post as well as this one yield

So I am mostly like Cory Doctorow with disturbing input from Chuck Palahniuk and H.P. Lovecraft. The input from David Foster Wallace is not so scary.

(via Violins and Starships, via twitter, via Byzantium's Shores)

Saturday, July 17, 2010

Number unemployed vs. unemployment rate

While collecting the unemployment data I also ran some reports on the time that people are unemployed. The data is available showing up to 5 weeks, 5 to 14 weeks, 15 to 26 weeks and longer than 26 weeks. One of the signs of the depth of this recession is not just the number of people unemployed but also the length of time they have been unemployed. The charts below show the number of people unemployed for a given number of weeks in a bar chart which builds the categories on top of each other. I have overlaid the rate of unemployment which corresponds to the right axis. (click for larger)

The chart below just shows the fraction in the various unemployment categories adding up to 100%, instead of the actual number. (click for larger)

If I have time, I want to overlay dates of unemployment extensions and see if there is any correlation. I still don't think that would be completely relevant because it is obvious that if an extension was passed then the time people receive unemployment insurance will be longer. Will I be able to see whether this in fact means they stayed unemployed longer or do extensions occur when Congress sees that people are already unemployed longer and the extension is to address that issue?

(Once again the data is available in this spreadsheet on Google Docs. - How Long unemployed vs rate unemployed )

Friday, July 16, 2010

Recreating Laffer's unemployment benefits payout vs. unemployment rate chart but not his ridiculous conclusions

Howard points to an excellent refutation of the ridiculous claims by Laffer that unemployment benefits drive unemployment. The chart from the opinion page clearly shows a lag of 10 months to a year of the peak in benefits payouts vs. unemployment rate peaks. Perhaps it is unemployment which drives unemployment benefits.

One thing that drives me nuts on these charts is when the presenter doesn't cite the data. I would make it a requirement that for any charts like these that you must publish your work and the data you used, but as a minimum the presenter should cite the references. Laffer cites as his source Laffer associates, but I guarantee you they didn't generate the data. References for the data are given below. I have recreated the chart below (click for large size).

The unemployment rate is the straight data from the Bureau of Labor Statistics, seasonally adjusted, while the benefits paid is the average weeks on unemployment multiplied by the average weekly benefits per person and is from the Dept of Labor source below, but the key is that it needed to be adjusted to 2010 dollars. Perhaps Mr. Laffer could have included that in his description.
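The benefits calculation is simple enough to write down as a short Python sketch. It assumes the inflation conversion for each year is expressed as a multiplier that turns that year's dollars into 2007 dollars (if a table gives a divisor instead, invert it), followed by the 1.05 factor from 2007 to 2010 mentioned in the data sources. The function name, parameter names and the example numbers are hypothetical.

```python
def benefits_paid_2010_dollars(avg_weeks, avg_weekly_benefit,
                               factor_to_2007, factor_2007_to_2010=1.05):
    """Average benefits paid per person, adjusted to 2010 dollars.

    avg_weeks           average number of weeks on unemployment
    avg_weekly_benefit  average weekly benefit in that year's dollars
    factor_to_2007      ASSUMED multiplier from that year's dollars to 2007 dollars
    factor_2007_to_2010 multiplier from 2007 to 2010 dollars (1.05 in the post)
    """
    nominal = avg_weeks * avg_weekly_benefit
    return nominal * factor_to_2007 * factor_2007_to_2010

# Hypothetical example values, not taken from the Dept of Labor data.
example = benefits_paid_2010_dollars(avg_weeks=10, avg_weekly_benefit=100.0,
                                     factor_to_2007=2.0)
print(round(example, 2))
```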

Once I recreated the chart and data, it is simple to see that the peaks in the benefits paid occur from 9 to 13 months after the peak in unemployment rate. Even the most egregious examples of correlation being mistaken for causation have the cause preceding the effect, unlike here. I am sure someone even more mathematically inclined can run some sort of time series correlation with the data, now that I have provided it.
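As a starting point for that kind of time series check, here is a minimal Python sketch of a lagged correlation: shift the benefits series back by 0, 1, 2, ... months and see which shift lines up best with the unemployment rate. The data here is synthetic (a made-up cycle with a 10 month delay built in), not the spreadsheet data, and the function names are assumptions for this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_lag(leader, follower, max_lag):
    """Correlate leader[t] with follower[t + lag] for lag = 0..max_lag
    and return the lag with the highest correlation plus all the scores."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = leader[:len(leader) - lag]
        ys = follower[lag:]
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get), scores

# Synthetic monthly series: benefits copy the rate 10 months later.
months = range(240)
rate = [6.0 + 2.0 * math.sin(2 * math.pi * m / 60) for m in months]
benefits = [rate[max(m - 10, 0)] for m in months]

lag, scores = best_lag(rate, benefits, max_lag=24)
print(lag)  # recovers the 10 month delay built into the synthetic data
```

A positive best lag means the second series trails the first, which is the pattern described above for benefits paid trailing the unemployment rate.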

Data sources and links follow:

Unemployment statistics from the Bureau of Labor Statistics. Use this form to enter your choices and you can get an Excel file of the data. I only changed the Labor force status to Unemployed or to Unemployed rate and checked the monthly data and seasonally adjusted boxes to generate my files for actual number unemployed and rate of unemployment. You may need to adjust the years on the next page. Not used, but you can get the duration of unemployment benefits using this page and checking the appropriate boxes.

Unemployment claim amounts paid can be found on the program statistics page, which leads to this form from the United States Department of Labor. You can use this calculator to get the yearly adjustments for inflation. I actually used the tables from here for CF adjusted to 2007, and the calculator to adjust 2007 to 2010 (1.05).

Data to create the charts above is in this google docs spreadsheet - Unemployment statistics 1971 to 2010.

Wednesday, July 14, 2010

Overflowing Shellpot Creek video and photos from today's monsoon

For the first time since we have lived in this house, Shellpot Creek leapt its banks and flooded portions of the backyard.

The waterfall is an almost flat raging, rushing torrent.

A view of the flooded backyard from an upstairs window.

Just as dramatic was that it couldn't drain fast enough in front of the house.

The creek flows from the cul-de-sac...

...down the street.

Technically, for a short period this afternoon, our house was on an island in the middle of Shellpot creek as the creek flowed down Stoney Creek Lane in front of the house and joined the main branch further upstream, even as it flooded from its usual location in the back of the house.

Here is a video showing the creek first beginning to leave its banks.

Here is a video showing the height of the flooding of the backyard.

There was really no damage except some lost mulch that had just been put down, perhaps some plants were swept away and the regular creek junk left in our yard instead of high in the creek bed. You can see the creek is so high that the waves from the creek just crash across the yard. I also watched as large logs and debris from the creek floated through my yard and under the fence to the neighbors (and hopefully back to the creek so they don't have to deal with it).

It was a dramatic flooding event, but the house suffered no damage whatsoever since it sits high on the lot.

UPDATE: Charts from the USGS stream gage on Shellpot Creek. This gage is downstream from my house, but since the water at my house, plus some more, goes through the gage, it is a useful, official measurement of the creek flow.

The discharge in cubic feet per second. You can see July 13th's morning rain, July 14th's morning rain, and the high peak of the flood recorded in this post. Note that the scale is a log scale, so the peak at 3100 cuft/s at 2:35pm on 7-14-2010 is about 46% more than the 2130 cuft/s at 6:45am on 7-13-2010.

The gage height. Gage height is a linear scale.

At least now I know that ~3000 cu ft/ second and a gage height of ~7.5 ft means water in the backyard.

Highest Shellpot Creek has ever been.


Technically, with this flooding, my house is on an island in the middle of Shellpot Creek because the creek has overflowed its banks and is going down the road in front of the house as well as the turbulent mess behind the house.

This video is of the waterfall.

Phylogenetic Trees and Language

Recent searches led me to several interesting articles by Mark Pagel and co-authors on the evolution of languages. These links may be behind Science or Nature paywalls or subscriptions, so I apologize if you are unable to read them.

The basic point is that:

1.) Researchers are using phylogenetic trees to classify the relatedness of languages and their divergence from earlier languages. This is an old idea. The link above is to a review of recent work that justifies using these tools by the genetic-like properties of language.

2.) The proposal is that languages evolve in punctuated bursts, not gradually. Perhaps because populations became separated physically, perhaps because groups wanted to differentiate themselves culturally.

Languages Evolve in Punctuational Bursts by Quentin D. Atkinson, Andrew Meade, Chris Venditti, Simon J. Greenhill, Mark Pagel; Science, 1 February 2008: Vol. 319. no. 5863, p. 588, DOI: 10.1126/science.1149683

Human language as a culturally transmitted replicator by Mark Pagel, Nature Reviews Genetics 10, 405-415 (June 2009) | doi:10.1038/nrg2560

Tuesday, July 13, 2010

More video of the raging Shellpot Creek


Watch the large branch go under just above the waterfall at the beginning of the video. There were many large logs going by. Just as long as they keep going by and don't get stuck on my part of the creek.

Video of Raging Shellpot Creek


The creek is roaring today.

Shellpot Creek is a raging torrent this morning

Shellpot Creek is pretty full this morning after a long downpour with thunder and lightning. It has been dry so long, and even the few rain breaks hadn't really filled the creek to the brim yet this summer. It is interesting to see it be wild like this. Video to come.

Monday, July 12, 2010

I missed a solar eclipse on Easter Island

Apparently there was a total eclipse Sunday that was only visible across large swaths of the uninhabited South Pacific (yahoo coverage). It was visible from Chile, Tahiti and Easter Island. I have been to Easter Island! Given how tiny that island is, it is a surprise that an eclipse would catch it, though I suppose almost everywhere on the globe has them sometime.

With the baby and the expense and so many other places to travel to we wouldn't have gone back to see it, but a solar eclipse and Moai in the same shot would have been awesome.

I am holding out for the United States eclipses of August 21st 2017, and April 8th 2024. They will be close together and there is a spot in Southern Illinois near Kentucky including Carbondale, Cape Girardeau, Missouri, and Paducah, Kentucky which will experience both solar eclipses, two in seven years.

Monster Truck on I95 in Delaware

Yesterday we were driving back home from Norristown down I476 to I95 when we spotted an unusually tall pickup truck in front of us.

It was a monster truck driving with us on the road.

It was driving next to us on I95 from Norristown to at least the Marsh road exit just south of the Delaware state line where we got off. This vehicle was just a little pickup truck on huge tires and a suspension to go with them. The tires and modifications made the vehicle literally twice as tall as our Prius.

We got a closeup of one of the tires.

Besides the silliness of seeing a monster truck on the road, there was the advertising on the side for Estep Electrical Services. You know the "owner" of the truck "uses" it for work and the expense of those crazy modifications was written off of their taxes.

Thursday, July 08, 2010

Psychological studies skewed by WEIRDness

At Carnegie Mellon I remember that a prerequisite for my psychology classes was to participate in a certain number of psychology experiments. We were the free subjects for the Carnegie Mellon Psychology department. Our department was not unique in this practice, and Joseph Henrich, Steven Heine, and Ara Norenzayan from the University of British Columbia have a review paper in Behavioral and Brain Sciences that says the overwhelming majority of psychology studies in the literature use undergraduates from the United States and Europe as subjects. The problem is that these undergraduates are unusual in the world, typically being Western, educated, industrialized, rich and democratic, and these WEIRDos are not representative of humanity as a whole.

The typical example of a test that yields startlingly different results for the US and Europe vs. other countries of the world is the classic Müller-Lyer illusion, where the test subject must decide which of the lines is longer. Americans and Europeans typically think that the line with the arrows pointing out is shorter, but this is not a typical answer around the world. The chart below shows the increased length of the "short" line required to make the lines seem similar. People in industrialized nations (to the right on the chart below) show the most effect.

The review paper discusses a wide range of psychology experiments designed to explore the free-rider problem and punishment from economics, through language development and family based reasoning (clumped by function - notebook-pencil) vs. rule based reasoning (notebook-magazine), to other social psychology problems.

The authors' conclusions include that generalizing experimental results from WEIRD test subjects to the world population as a whole may yield very incorrect generalizations.

In the open peer commentary to the paper (found at the end of the paper at this .pdf link), other psychology researchers propose other reasons and commentary for and on the presumed outlier nature of WEIRD subjects in psychology experiments vs. the rest of the world. Some highlights are briefly listed below.

  • One group proposes that the experiments are weird, not the subjects.

  • Another suggests the variations within populations are not taken into account correctly.

  • Some argue that brain scientists are just as guilty as psychologists in using WEIRD subjects, and that generalizing results from a narrow sample of WEIRD brain subjects to the entire species is also unwise.

  • Many provide suggestions for reaching a wider sample pool. Although the one suggestion to use the Internet will then select for a worldwide population with access to the Internet, and that may also be an unusual population. The authors address this in their response to the commentary.

  • Others caution against generalization in animal studies and coin their own acronymical description of chimpanzees in animal experiments: Barren, Institutional, Zoo, And other Rare Rearing Environments (BIZARRE) chimpanzees. This is only one of several acronyms coined in the commentary (WRONG - When Researchers Overlook uNderlying Genotypes, ODD - Observation- and Description-Deprived psychological research)

  • Some point out that it is not just the subjects, but there are also too many WEIRD researchers.

  • One points out that the WEIRD subjects are a harbinger of what the world is becoming and wants more research on all of the variations in human experience before we become culturally homogenized. One of the more hopeful comments I thought, though it implies a trade-off between prosperity and diversity.
I suggest that you read the review and the back and forth commentary if this piques your interest. The take home lesson appears to be that you need to be aware of the selection of your sample subjects in social, psychological and cultural experiments, use caution when reasoning from a narrow sample to the larger population (or species), and not assume that everyone (in the world, in your culture, in the same room as you) is the same, except when they are.

Monday, July 05, 2010

Modeling soccer (and perhaps your work team) as a social network

Frequent readers will be aware that an occasional hobby of mine is to numerically model sports and other interesting activities, yet I lack much of the time or tools to do so. I have taken on NFL playoff football, both the entire playoffs and detailed modeling of the Superbowl down to the player level. I have also attempted, unsuccessfully, to model March Madness, the NCAA basketball playoffs, performing mostly analysis as opposed to building a model that helps me win a March Madness pool. Thus, I love to read interesting modeling papers, especially those which model sports or games as models for other real world activities.

Modeling the contributions of individual players in sports like football, baseball and basketball is helped by the large amount of statistics collected and available for these sports. Thus the contribution of individual team members to the team's success is easier to model. Science online has a report of some work done by Jordi Duch and other researchers, at Northwestern and in Spain, that attempts to model the contributions of soccer players to the success of their team.

They point out that soccer is a very fluid game compared to baseball or football, and that this, combined with the very low scores, makes statistics like goals and assists insufficient to model the contribution of players to the performance of the team. They hypothesize that the passes and flow of the game leading up to the rare goals are important for determining the outcome of the game, and they use networks to model this flow. Players are nodes in the network and the lines between the nodes, called arcs, represent passes. Much as Facebook or Twitter can be modeled as a network with friendship and interactions or follower/following being the connections, soccer is a "social" sport.

They also include nodes for the goal and for shots wide of the goal. To each of the arcs they attach statistics and probabilities from the 2008 European football championship on player pass accuracy and goal accuracy. One could then follow the ball through this "ball flow" network to a goal, a miss or to the other team. Combined with more calculations, the group attempts to predict the outcome of soccer games.
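To make the "ball flow" idea concrete, here is a minimal Python sketch of such a network: players are nodes, each arc carries a probability, and the ball is followed until it ends in a goal, a shot wide, or a lost ball. This is only an illustration of the network idea, not the performance measure the authors actually compute, and the players, probabilities and function name are all made up.

```python
def goal_probability(network, start, end_nodes=("goal", "wide", "lost"),
                     iterations=200):
    """Probability that the ball, starting at `start`, ends in "goal".

    network maps each player to {target: probability}, where a target is
    another player or one of the end nodes. Each player's probabilities
    should sum to 1. Solved by repeatedly updating each player's chance.
    """
    prob = {node: 0.0 for node in network}
    prob.update({node: 0.0 for node in end_nodes})
    prob["goal"] = 1.0  # a ball in the goal counts as a goal with certainty
    for _ in range(iterations):
        for node, arcs in network.items():
            prob[node] = sum(p * prob[target] for target, p in arcs.items())
    return prob[start]

# Hypothetical two-player team: A passes to B or loses the ball,
# B passes back to A, scores, or shoots wide.
team = {
    "A": {"B": 0.5, "lost": 0.5},
    "B": {"A": 0.5, "goal": 0.25, "wide": 0.25},
}
p_from_a = goal_probability(team, "A")
p_from_b = goal_probability(team, "B")
print(round(p_from_a, 4), round(p_from_b, 4))
```

With real pass and shot accuracies on the arcs, the same kind of calculation shows how much a possession is worth depending on which player has the ball.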

Even more interestingly, the authors apply this concept to a work team that is writing a paper with several co-authors. Instead of the nodes being soccer players, in the paper network a node represents a co-author of the manuscript, and the lines between the nodes represent communications directed from one co-author to the others. The e-mails represent communications between co-authors, and the effectiveness of the authors is measured by completion of tasks like performing a calculation or scheduling a meeting. In the diagram below, author A2 (I think A3 in the second chart is a typo) seems to be an important and strong contributor.

One of the authors comments on how the scheme can be used to assess the contribution of individual team members.
"One of the issues with any kind of teamwork is assigning the right credit," says Amaral. "The wild, loud people get more credit, but with this analysis you can get a picture of how much an individual really contributes to an outcome."
As work continues to evolve to be more team driven and highly networked, perhaps a scheme like this can not only point out strong contributors to a team, but also help an entire team work at a higher level. Imagine it applied to the work of developing open source software or Wikipedia articles.

(via Science online, the paper at Public Library of Science, PLoS; figures above are from the paper, which can be found at this .pdf link)

Thursday, July 01, 2010

Fireworks from Greenville Country Club

Good friends of ours live behind Greenville Country Club in Greenville, Delaware. Each year they invite us to watch the country club's fireworks from the comfort of their backyard meadow. This year's fireworks were on July 1st, tonight. As always I tried to get some pictures.