Carpooling to Summer Camp, part IV

July 4th, 2008

Let’s see what we can get from the Geolocation services of Google and Yahoo. This script should give us a good overview.

<?php
 
function get_clean_positive_integer_string($string) { // FORCE IT ALL NUMERIC
  $new  = trim(ereg_replace("[^0-9$]", "", $string));
return ( $new );
}
 
function compute_distance($from_lat, $from_lon, $to_lat, $to_lon) {
  $from_lat  = floatval($from_lat);
  $from_lon  = floatval($from_lon);
  $to_lat    = floatval($to_lat);
  $to_lon    = floatval($to_lon);
 
  $dist  = acos(sin(deg2rad($from_lat))
              * sin(deg2rad($to_lat))
              + cos(deg2rad($from_lat))
              * cos(deg2rad($to_lat))
              * cos(deg2rad($from_lon - $to_lon)));
 
  $dist  = rad2deg($dist);
  $miles  = (float) $dist * 69;
return round($miles,1);
// To get kilometers, multiply miles by 1.61
// $km    = (float) $miles * 1.61;
// return round($km,1);
}
 
require_once('../root/db_link.php');
 
// CHECK FOR A VALID 5-DIGIT INTEGER STRING LIKE A ZIP CODE
$zip = substr(get_clean_positive_integer_string($_GET["zip"]),0,5);
if (strlen($zip) != 5) { ?>
<html><head><title><?=$zip?> Geocoders Update</title></head>
<body>
<form method="get">
Zip: <input type="text" name="zip" autocomplete="off" />
</form>
</body>
</html>
<?php die(); }
 
// WITH A VALID-LOOKING ZIP, CALL THE GEOCODE SERVICES
echo "<br />";
echo "<a target=\"geo_Z\" href=\"http://maps.google.com/maps?q=zip+code+" . $zip . "&ie=UTF8&t=h&z=14\">$zip</a> ";
 
// GOOGLE CALL
unset ($g_lat, $g_lon, $g_buf);
$g_AOK = FALSE;
$google_url  = "http://maps.google.com/maps/geo?q=" . $zip . "&output=csv&key=$google_API";
if ($fp = fopen($google_url, 'r')) {
  while (!feof($fp)) { $g_buf .= fgets($fp); }
  fclose($fp);
}
if ($g_buf != '') { // NOT EMPTY, WE GOT DATA
  $x = explode(',', $g_buf);
  if ($x[0] == 200) { // RETURN CODE OK
    if ($x[1] >= 5) { // ACCURACY VALUE AT LEAST 5 (ZIP CODE PROXIMITY)
      $g_lat = $x[2];
      $g_lon = $x[3];
 
      $usql  = "UPDATE googleRecodes SET latitude = $g_lat, longitude = $g_lon WHERE zip = \"$zip\" LIMIT 1";
      if (!$u  = mysql_query($usql)) { die(mysql_error()); }
      $g_AOK = TRUE;
    }
  }
}
if ($g_AOK) { // LINK TO GOOGLE MAP FOR THE GOOGLE - CODED ZIP
  echo "<a target=\"geo_G\" href=\"http://www.google.com/maps?f=q&hl=en&geocode=&q=" . $g_lat .','. $g_lon . "&ie=UTF8&z=14&iwloc=addr\">G</a> $g_lat $g_lon \n";
}
 
// YAHOO CALL
unset ($y_lat, $y_lon, $y_buf);
$y_AOK = FALSE;
$yahoo_url  = "http://local.yahooapis.com/MapsService/V1/geocode?zip=" . $zip . "&appid=$yahoo_API;";
if ($fp = @fopen($yahoo_url, 'r')) {
  while (!feof($fp)) { $y_buf .= fgets($fp); }
  fclose($fp);
}
if ($y_buf != '') { // NOT EMPTY, WE GOT DATA
  $ydata  = new SimpleXMLElement($y_buf);
  $y_lat  = $ydata->Result->Latitude;
  $y_lon  = $ydata->Result->Longitude;
  $y_zip  = $ydata->Result->Zip;
  if ($y_zip == $zip) {
 
    $usql  = "UPDATE yahooRecodes SET latitude = $y_lat, longitude = $y_lon WHERE zip = \"$zip\" LIMIT 1";
    if (!$u  = mysql_query($usql)) { die(mysql_error()); }
    $y_AOK = TRUE;
  }
}
if ($y_AOK) { // LINK TO GOOGLE MAP FOR THE YAHOO! - CODED ZIP
  echo "<a target=\"geo_Y\" href=\"http://www.google.com/maps?f=q&hl=en&geocode=&q=" . $y_lat .','. $y_lon . "&ie=UTF8&z=14&iwloc=addr\">Y</a> $y_lat $y_lon \n";
}
 
// RESULTS SUMMARY
if ( ($g_lat != '') && ($y_lat != '') ) {
  $miles_apart = compute_distance($y_lat, $y_lon, $g_lat, $g_lon);
  echo "Miles Apart: $miles_apart \n";
} else {
  echo "ERROR ";
  if ($g_AOK == FALSE) { echo "G"; }
  if ($y_AOK == FALSE) { echo "Y"; }
}
?>

Here is the result for Georgetown, Washington, DC 20007:

20007 G 38.914063 -77.077395 Y 38.913980 -77.074144 Miles Apart: 0.2

As Yogi Berra said, “You can see a lot just by looking.” Left to right you’re looking at these fields:
20007 is the zip code and is a link to the Google map for that zip code
G is a link to the Google map for the Google geolocation of that zip code
the Google latitude/longitude pair
Y is a link to the Google map for the Yahoo! geolocation of that zip code
the Yahoo! latitude/longitude pair
the distance between the Google and Yahoo locations.

If you try the script with bogus zip codes you get some pretty interesting results. Yahoo does not have anything for zip code 99990, but Google maps this zip code to some place in Scandinavia. Then when you look for that location via a non-API method, Google says it can’t find the zip code.

Carpooling to Summer Camp, part III

July 4th, 2008

Taking out my electronic magnifying glass and examining the zip code geolocation data reveals a panoply of sin and iniquity!

Many of the zips have duplicate lat/lon pairs. Out of the 40,000+ records, more than 9,600 zip codes have lat/lon pairs that exactly match a different zip code. This would be expected in cases where more than one zip code is handled by the same post office. But 9,600 seems (intuitively) way too high.

A bunch of records have no lat/lon data at all - just empty fields or zeros.

I probably don’t care about APO/FPO zip codes — you can’t drive to camp from there, but I want to bring as much accuracy to the underlying data as possible. And I might as well eliminate the zip codes that don’t represent physical locations - no point in querying data that we know is useless!

I deleted the zipcodes table and reloaded it from the original text file, then set out to scrub the data.

The first thing I noticed was the prevalence of latitudes with plus signs in front of them. Not necessary and a potential source of trouble if I ever processed these fields as character strings instead of decimal values. This script got rid of that issue.

$sql  = "SELECT * FROM zipcodes ORDER BY zip ";
if (!$s  = mysql_query($sql)) { die(mysql_error()); }
 
$old_zip = "ZIP";
$kount = 0;
 
while ($row = mysql_fetch_assoc($s)) {
  $kount++;
  extract($row);
  $usql = "UPDATE zipcodes SET latitude = $latitude, longitude = $longitude WHERE zip = \"$zip\" LIMIT 1";
  if (!$u  = mysql_query($usql)) { die(mysql_error()); }
  echo "<br />$zip $latitude $longitude $kount \n";
}

Next, I decided to start with the zipclass field and see what I could learn. This query will give me some information to get started.

$sql  = "SELECT * FROM zipcodes ORDER BY zipclass ASC, zip ASC";
if (!$s  = mysql_query($sql)) { die(mysql_error()); }
 
$old_zipclass = "FOO";
$kount = 0;
 
while ($row = mysql_fetch_assoc($s)) {
  extract($row);
  if ($old_zipclass != $zipclass) {
    $kount++;
    echo "<br />$kount $zipclass $zip\n";
  }
  $old_zipclass = $zipclass;
}

1 MILITARY 09007
2 PO BOX ONLY 00611
3 STANDARD 00601
4 UNIQUE 00501

Wonder how many of each…

if (!$s  = mysql_query("SELECT * FROM zipcodes WHERE zipclass = \"MILITARY\" ")) { die(mysql_error()); }
echo "<br />Military: " . mysql_num_rows($s);
/* [etc...] */

Military: 539
PO Box Only: 5089
Standard: 34602
Unique: 2510

Let’s scan some of the data visually…

if (!$s  = mysql_query("SELECT * FROM zipcodes WHERE zipclass = \"MILITARY\" ")) { die(mysql_error()); }
echo "<br />Military: " . number_format(mysql_num_rows($s));
 
while ($row = mysql_fetch_assoc($s)) {
  echo "<br />\n";
  print_r($row);

Some of the MILITARY zip codes represent physical places (34090) but others do not (96687) and none of these zip codes have people driving to summer camp. I used a military solution to handle them.

if (!$s  = mysql_query("DELETE FROM zipcodes WHERE zipclass = \"MILITARY\" ")) { die(mysql_error()); }

I also deleted the records for non-states, such as Palua and Guam, bringing the total number of records down to 41,976.

Next I looked at the PO BOX ONLY records. A visual scan of the file showed that they were USA locations - they had lat/lon pairs and city, state information. But many had duplicate lat/lon pairs. For example, 11930 and 11970 were both listed at [latitude] => 40.92232600 [longitude] => -72.63707800 — and neither of those zip codes are located in the rough of the Indian Island Golf Course! Looks like this is a segment of the data base that needs to be scrubbed.

Thence to the UNIQUE records where I found the same sort of stuff: 22037 and 22047 are both allegedly located at [latitude] => 38.83181300 [longitude] => -77.28875500. Uhh, nope. That location is in zip code 22032.

So it’s time to start cleaning up the remaining, hopefully useful, zip code location records. First, I ran this query to identify any records that were missing a lat/lon pair:
$sql  = "SELECT * FROM zipcodes WHERE (latitude = 0.0 OR longitude = 0.0) ORDER BY zip ASC";
if (!$r  = mysql_query($sql)) { die(mysql_error()); }
while ($row  = mysql_fetch_assoc($r)) {
  extract($row);
  echo "<br />$zip $latitude $longitude $state $city \n";
}

and I got only one errant record:

89494 0.00000000 0.00000000 NV SPARKS

I happen to know that Sparks, NV is near Reno. That zip code is probably wrong; it might be 89434. Google geolocated 89434 somewhere in the mountains. I use phpMyAdmin and put Sparks, NV 89434 in its rightful place.

Next, I decided to identify all the records that had duplicate lat/lon pairs matched to another zip code. These replicants were, unfortunately, more than a little plentiful. I ran this script, then used phpMyAdmin to make two copies of the resulting table.

// CREATE A NEW TABLE
$csql = "
CREATE TABLE IF NOT EXISTS `recodes` (
  `zip` varchar(5) NOT NULL,
  `latitude` decimal(16,8),
  `longitude` decimal(16,8),
  `city` varchar(40) NOT NULL default '',
  `state` char(2) NOT NULL default '',
  `county` varchar(40) NOT NULL default '',
  `id` mediumint(6) NOT NULL auto_increment,
  PRIMARY KEY  (`id`),
  KEY `city` (`city`,`state`),
  KEY `state` (`state`,`city`)
) ENGINE=MyISAM DEFAULT CHARSET=ascii ";
if (!$c = mysql_query($csql)) { die(mysql_error()); }
 
// CREATE A HOLDING TANK
$csql = "CREATE TEMPORARY TABLE goofy (_key mediumint(6) NOT NULL AUTO_INCREMENT,
  zip varchar(5),
  latitude decimal (16,8),
  longitude decimal(16,8),
  city varchar(40),
  state char(2),
  county varchar(40),
  lat_lon varchar(36),
  PRIMARY KEY (`_key`) ) ENGINE=MEMORY ";
if (!$c = mysql_query($csql)) { die(mysql_error()); }
 
// SELECT THE RECORDS
if (!$s  = mysql_query("SELECT * FROM zipcodes ORDER BY latitude, longitude")) { die(mysql_error()); }
 
// FIND THE RECORDS WITH DUPLICATES
$old_lat_lon = 'Latitude Longitude';
while ($srow = mysql_fetch_assoc($s)) {
  extract ($srow);
  $lat_lon = "$latitude" . ' ' . "$longitude";
  if ($lat_lon == $old_lat_lon) { // AT LEAST ONE MORE lat/lon PAIR MATCHES THIS - SAVE IT
    $xlat_lon = mysql_real_escape_string($lat_lon);
    $isql = "INSERT INTO goofy     (    zip,    latitude,  longitude,    city,      state,      county,       lat_lon   )
                VALUES ( \"$zip\", $latitude, $longitude, \"$city\", \"$state\", \"$county\", \"$xlat_lon\" ) ";
    if (!$i = mysql_query($isql)) { die(mysql_error()); }
    $dup_count++;
  }
  $old_lat_lon = $lat_lon;
}
 
// FIND THE ORIGINAL MATCHES FOR THE DISTINCT lat/lon PAIRS
$gsql = "SELECT * FROM zipcodes WHERE latitude IN ( SELECT DISTINCT(lat_lon) FROM goofy )";
if (!$g = mysql_query($gsql)) { die(mysql_error()); }
 
while ($grow = mysql_fetch_assoc($g)) {
  extract ($grow);
  $isql  = "INSERT INTO recodes (    zip,    latitude,  longitude,    city,      state,      county    )
              VALUES ( \"$zip\", $latitude, $longitude, \"$city\", \"$state\", \"$county\"  ) ";
  if (!$i = mysql_query($isql)) { die(mysql_error()); }
}

I decided to look toward Google and Yahoo to get the right geolocation for each zip code. Google Maps has a service that will give you back XML (actually KML) but that may be more than I need. They also have a service available over HTTP that can give you back a succinct CSV. You need to get a Google Maps API Key to use either service. Once you have your key, it’s as easy as this to get the lat/lon pair for a zip code:

http://maps.google.com/maps/geo?output=csv&q=[Zip]&key=[API-key]

You get back four fields. A ‘200′ return code means OK, the ‘5′ is a precision indicator (higher is better) and the next two fields are latitude and longitude:

200,5,38.935827,-77.162140

A five digit zip code does the trick. Google can accept a zip-plus-4 code in the format ZZZZZ-SSSS but it does not improve the accuracy of the geolocation data.

You can also get geocoding services via the Yahoo API. There is a moderately detailed registration requirement. Yahoo’s RESTful service works like this:

http://local.yahooapis.com/MapsService/V1/geocode?zip=[Zip]&appid=[API-key]

This is what you get back:

<?xml version=”1.0″?>
<ResultSet xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”
    xmlns=”urn:yahoo:maps”
    xsi:schemaLocation=”urn:yahoo:maps
    http://api.local.yahoo.com/MapsService/V1/GeocodeResponse.xsd”>
    <Result precision=”zip”>
       <Latitude>38.933290</Latitude>
       <Longitude>-77.168839</Longitude>
       <Address></Address>
       <City>McLean</City>
       <State>VA</State>
       <Zip>22101</Zip>
       <Country>US</Country>
    </Result>
</ResultSet>
<!– ws03.search.re2.yahoo.com compressed/chunked Fri Jul 4 12:59:43 PDT 2008 –>

Like Google, Yahoo will accept a zip+4 code but its response is the same as for the five digit zip. Unlike Google, who permit 15,000 queries a day, Yahoo limits you to 5,000 queries per day. Not so great for my effort here — I have 9,663 questionable zip codes I want to geolocate!

Interestingly, Google and Yahoo disagree about the exact location of a zip code. For my zip code, they said: Distance from [38.935827 / -77.162140] to [38.933290 / -77.168839] is 0.4 miles. As I move through the process of geolocating the 9,663 zip codes I will compare Google and Yahoo lat/lon pairs to see how closely they line up.

Carpooling to Summer Camp, part II

July 3rd, 2008

Back to the code… I think I should test the performance of this script, just to be sure I am not going to tank the server when I do a lot of queries.

I’ll put this at the top:

list ($x, $y) = explode(" ", microtime());
$script_start_time = $x + $y;

And this at the bottom:

list ($x, $y) = explode(" ", microtime());
$script_stop_time = $x + $y;
$script_duration = number_format( ($script_stop_time - $script_start_time) ,3);
echo "<br />Script Duration $script_duration Seconds \n";

When I run it with no input the reported time is 2 milliseconds, +/- 2ms.

When I run it with a zip code input the reported time varies greatly depending on how many zip codes come back from the query. Some took as long as 0.9 seconds and others finished in 0.1 seconds. I added ENGINE=MEMORY to the temporary table and the large queries are now completing in one fourth of the original time. So far, so good. It looks like I can consistently locate and measure the distance between 400 zip codes in about a fourth of a second.

But this testing process reveals an unpleasant surprise. I find that the distance between several zip codes is wrong in my computations. Some zips have the same lat/lon pairs. For example, when I calculate the distance between zip codes 20914 to 20827, I get zero miles. Clearly an error. Google thinks the driving distance is 11.3 miles and Google is right. But my distance comparison comes out looking like this:

Distance from 20914 to 20827 is 0 miles

FROM
Array
(
[id] => 8715
[zip] => 20914
[latitude] => +39.14397900
[longitude] => -077.20761700
[city] => SILVER SPRING
[state] => MD
[fullstate] => MARYLAND
[county] => MONTGOMERY
[zipclass] => STANDARD
)

TO
Array
(
[id] => 8655
[zip] => 20827
[latitude] => +39.14397900
[longitude] => -077.20761700
[city] => BETHESDA
[state] => MD
[fullstate] => MARYLAND
[county] => MONTGOMERY
[zipclass] => STANDARD
)

Looks like I’m going to have to scrub this data, or purchase a license for a scrubbed data set. Let’s see what geocoder.us can tell us about these zip codes.

39.143979, -77.207617, Silver Spring, MD, 20914
39.143979, -77.207617, Bethesda, MD, 20827

Nope, their data is wrong, as well. That dog won’t hunt. At least they got this one right!

I’ll look at this some more tomorrow.

Carpooling to Summer Camp

June 30th, 2008

Undoubtedly I am not the only one who has noticed that driving your kid to sleep-away camp can cost $100 in gasoline alone, so the stimulus to carpool is notable. But how do you find other folks from your home town who are going to the same camp? I decided this was a good web application.

But wait, you say, “It’s been done. There are hundreds of thousands of listings for “Virginia ride share” alone.” And you are right. But those sites are limited in their ability and reach when it comes to this sort of application. Here is what makes carpooling to summer camp unique.

  • A pre-constituted community of campers
  • A common destination and arrival time
  • A common departure and return time
  • A one-time opportunity to avoid a substantial expense

In other words, everyday carpooling is different from carpooling to camp.

First, I needed some basic geo-location data. I figured that a Zipcode-to-Latitude/Longitude pair would be a good way to get started. This led to a LOT of Google searching. The US Geological Service or the Census Bureau or the Postal Service would seem to be the logical place to look. Wrong.

For my first try at getting useful data, I went to http://www.populardata.com but I found a few missing lat/lon pairs. Next I went to NerdVittles and grabbed two zip files that, taken together, contain a MySQL dump of the zip code data base. Nerdvittles apparently took Novak’s file and made it into MySQL statements. There is a license document, available here: http://nerdvittles.com/trixbox123/ZIP_CODES_license.pdf. The zipcodes download is available here: http://nerdvittles.com/trixbox123/zipcodes.zip

The zipcodes files were on an interesting page. Gotta go back there and read about that phone-weather-zipcode technology! http://nerdvittles.com/index.php?p=160

Moving right along, the next thing I needed to do was load the data into a data base. This came along in the zip code files:

CREATE TABLE IF NOT EXISTS `zipcodes` (
`id` mediumint(6) NOT NULL auto_increment,
`zip` varchar(5) NOT NULL default '',
`latitude` varchar(11) NOT NULL default '',
`longitude` varchar(11) NOT NULL default '',
`city` varchar(40) NOT NULL default '',
`state` char(2) NOT NULL default '',
`fullstate` varchar(30) NOT NULL default '',
`county` varchar(40) NOT NULL default '',
`zipclass` varchar(20) NOT NULL default '',
PRIMARY KEY  (`id`),
KEY `zip` (`zip`),
KEY `city` (`city`,`state`),
KEY `state` (`state`,`city`)
) ENGINE=MyISAM DEFAULT CHARSET=ascii AUTO_INCREMENT=42742 ;

I copied that into a PHP script, but because I wanted to do math with the lat/lon pairs, I changed their definition.

  `latitude` decimal(16,8),
  `longitude` decimal(16,8),

Then I attached all the necessary DB-connect stuff, named it db_create.php, stored it on the server and ran it to create the zipcodes table.

Here is what the payload information looks like:

INSERT INTO `zipcodes` VALUES (1, '00501', '+40.922326', '-072.637078', 'HOLTSVILLE', 'NY', 'NEW YORK', 'SUFFOLK', 'UNIQUE');
INSERT INTO `zipcodes` VALUES (8153, '20016', '+38.938260', '-077.091149', 'WASHINGTON', 'DC', 'D.C.', 'DISTRICT OF COLUMBIA', 'STANDARD');
INSERT INTO `zipcodes` VALUES (42741, '99950', '+55.542007', '-131.432682', 'KETCHIKAN', 'AK', 'ALASKA', 'KETCHIKAN GATEWAY', 'STANDARD');

By actual count, we have 42,740 unique zip codes. There is one key missing at #22001. Doesn’t seem like a big deal for my needs. I put this text file on the server and named it db_data.txt.

Next I ran this script to load the data base.

<?php
require_once('../root/db_link.php');
$file = fopen('db_data.txt', 'r');
while (!feof($file)) { // READ THE INSERT STATEMENTS
  $sql = fgets($file);
  if (!$result = mysql_query($sql)) { die(mysql_error()); }
}
echo "Done!";
?>

There was a blank line at the end of the data, so I got “Query was empty” instead of “Done!” I used phpMyAdmin to check the cardinality of the data base. It was 42,740 — close enough to the right number of US zip code records.

I wanted to test a quick query of the data base, so I wrote up db_query.php, thus:

<?php
function get_clean_positive_integer_string($string) { // FORCE IT ALL NUMERIC
  $new  = trim(ereg_replace("[^0-9]", "", $string));
return ( $new );
}
 
require_once('../root/db_link.php');
 
if (!empty($_GET["_submit"])) {
  $zip = get_clean_positive_integer_string($_GET["geo_zip"]);
  $sql = "SELECT * FROM zipcodes WHERE zip=\"$zip\" LIMIT 1";
  if (!$result = mysql_query($sql)) { die(mysql_error()); }
  $row = mysql_fetch_assoc($result);
  foreach ($row as $k => $v) { echo "<br />$k => $v \n"; }
}
?>
<html><head><title>Zip Test</title></head>
<body>
<form method="GET">
Zip:
<input type="text" name="geo_zip" autocomplete="off" />
<input type="submit" value="go" name="_submit" />
</form>
</body>
</html>
 

This seemed to work fine. I put in 22101 and out came this:

id => 9151
zip => 22101
latitude => +38.93520200
longitude => -077.18672000
city => MC LEAN
state => VA
fullstate => VIRGINIA
county => FAIRFAX
zipclass => STANDARD

Now I needed a way to figure out the distance between zip codes using their lat/lon pairs. This seemed to be helpful: http://en.wikipedia.org/wiki/Great-circle_distance and I found the code to execute those formulae on the Experts Exchange (see the compute_distance() function in the code below).


<?php
function get_clean_positive_integer_string($string) { // FORCE IT ALL NUMERIC
  $new  = trim(ereg_replace("[^0-9]", "", $string));
return ( $new );
}
 
function compute_distance($from_lat, $from_lon, $to_lat, $to_lon) {
  $from_lat  = floatval($from_lat);
  $from_lon  = floatval($from_lon);
  $to_lat    = floatval($to_lat);
  $to_lon    = floatval($to_lon);
 
  $dist  = acos(sin(deg2rad($from_lat))
              * sin(deg2rad($to_lat))
              + cos(deg2rad($from_lat))
              * cos(deg2rad($to_lat))
              * cos(deg2rad($from_lon - $to_lon)));
 
  $dist  = rad2deg($dist);
  $miles  = (float) $dist * 69;
return round($miles,1);
// To get kilometers, multiply miles by 1.61
// $km    = (float) $miles * 1.61;
// return round($km,1);
}
 
require_once('../root/db_link.php');
 
if (!empty($_GET["_submit"])) {
  $zip_from  = get_clean_positive_integer_string($_GET["geo_zip_from"]);
  $zip_to    = get_clean_positive_integer_string($_GET["geo_zip_to"]);
 
  $sql  = "SELECT * FROM zipcodes WHERE zip=\"$zip_from\" LIMIT 1";
  if (!$r  = mysql_query($sql)) { die(mysql_error()); }
  $from  = mysql_fetch_assoc($r);
 
  $sql  = "SELECT * FROM zipcodes WHERE zip=\"$zip_to\" LIMIT 1";
  if (!$r  = mysql_query($sql)) { die(mysql_error()); }
  $to    = mysql_fetch_assoc($r);
 
  if ( empty($from) ) {
    echo "<br />ERROR in From Zip: $zip_from \n";
  }
 
  if ( empty($to) ) {
    echo "<br />ERROR in To Zip: $zip_to \n";
  }
 
  if (!empty($from) && !empty($to)) {
    $miles = compute_distance($from["latitude"], $from["longitude"], $to["latitude"], $to["longitude"]);
    echo "<br />Distance from $zip_from to $zip_to is $miles miles \n";
    echo "<pre> FROM\n";
    print_r($from);
    echo "\n\n TO\n";
    print_r($to);
    echo "</pre>\n";
  }
}
?>
<html><head><title>Zip Distance Test</title></head>
<body>
<form method="get">
From Zip:
<input type="text" name="geo_zip_from" autocomplete="off" />
 
To Zip:
<input type="text" name="geo_zip_to" autocomplete="off" />
<input type="submit" value="go" name="_submit" />
</form>
</body>
</html>

Now we were getting somewhere! I checked the zip codes for the National Presbyterian Church in Washington, DC and First Presbyterian Church in Pittsburgh, PA. The output looked like this:

Distance from 20016 to 15222 is 186.9 miles

FROM
Array
(
[id] => 8153
[zip] => 20016
[latitude] => +38.93826000
[longitude] => -077.09114900
[city] => WASHINGTON
[state] => DC
[fullstate] => D.C.
[county] => DISTRICT OF COLUMBIA
[zipclass] => STANDARD
)

TO
Array
(
[id] => 5945
[zip] => 15222
[latitude] => +40.43443600
[longitude] => -080.02481700
[city] => PITTSBURGH
[state] => PA
[fullstate] => PENNSYLVANIA
[county] => ALLEGHENY
[zipclass] => STANDARD
)

For a quick reality check, I headed over to a Google Maps Distance Calculator on daftlogic.com and got 187.157 miles. Pretty close.

But if you ask the same question of Google Maps when you are looking for driving directions, you get 237 miles, a difference of about 25%. The point-to-point distance calculation will invariably understate the driving distance. There are a number of reasons for anomalies in the output.

For one thing, the distance calculations compute minimum distances between places that are roughly the central points of the zip code area, “as the crow flies,” if you will. But you and I don’t drive on the great circle route. We drive on roads, and if we ever turn the steering wheel, our odometer mileage is certain to be greater than the distance reported by the compute_distance() function. If we have to travel a lot of tortuous routes, we will find that the program may have underestimated the mileage by a lot. I have tinkered with the parameters a little, but have not come to the conclusion that I can just add 25% to the computed mileage. It’s not that simple. In the Northeast where roads are wiggly the underestimation is greater. In the Midwest where roads are straight the estimates tend to be more accurate. In the mountains where the roads are all switchbacks, the estimates are pretty much wrong. It’s not a show stopper; just something to keep in mind.

For another thing, a zip code is not the same as the place you are going — it is a collection of streets, postal routes if you will. Even the US Postal Service may have more than one office in the same zip code. And in geographically large zip codes there will be some increased chance that the computed distances will not reflect your personal travel mileage. This phenomenon may be magnified if the zip codes are close to each other. Your point of origin could be one street away from a destination that just happened to be over the line into a different zip code, and the program would overestimate your mileage. Or you could be traveling between opposite ends of the same zip code. The program would tell you that your mileage is zero. But again, these are not show-stoppers, just bumps in the road. And our goal is to organize one-time carpools for long distances, so we are still going to be able to create a useful tool.

I decided these error phenomena are within the acceptable locus of confusion, and development can continue. But at the same time, I am keeping the compute_distance() function isolated, so I can modify it if it looks like it’s going to produce significant incorrout. One thought might be to compare driving mileage to computed distance for each zip code pair, but I currently don’t have an automated way to get the driving mileage. Full of optimism, I posted a question on EE about Google driving distances.

Next, I decided I would like to be able to understand the lat/lon pairs, so I coded a script that ignored the data base and just took the lat/lon pairs from an input form.


<?php
function get_clean_numeric_string($string) { // FORCE IT ALL FLOATING POINT NUMERIC
  $new  = trim(ereg_replace("[^0-9\.\-]", "", $string));
return ( $new );
}
 
function compute_distance($from_lat, $from_lon, $to_lat, $to_lon) {
  $from_lat  = floatval($from_lat);
  $from_lon  = floatval($from_lon);
  $to_lat    = floatval($to_lat);
  $to_lon    = floatval($to_lon);
 
  $dist  = acos(sin(deg2rad($from_lat))
              * sin(deg2rad($to_lat))
              + cos(deg2rad($from_lat))
              * cos(deg2rad($to_lat))
              * cos(deg2rad($from_lon - $to_lon)));
 
  $dist  = rad2deg($dist);
  $miles  = (float) $dist * 69;
return round($miles,1);
// To get kilometers, multiply miles by 1.61
// $km    = (float) $miles * 1.61;
// return round($km,1);
}
 
// NOT NEEDED HERE require_once('../root/db_link.php');
 
if (!empty($_GET["_submit"])) {
  $lat_from  = get_clean_numeric_string($_GET["geo_lat_from"]);
  $lat_to    = get_clean_numeric_string($_GET["geo_lat_to"]);
  $lon_from  = get_clean_numeric_string($_GET["geo_lon_from"]);
  $lon_to    = get_clean_numeric_string($_GET["geo_lon_to"]);
 
  $miles = compute_distance($lat_from, $lon_from, $lat_to, $lon_to);
  echo "<br />Distance from [$lat_from / $lon_from] to [$lat_to / $lon_to] is $miles miles \n";
}
?>
<html><head><title>LAT/LON Distance Test</title></head>
<body>
<form method="get">
From<br />
Lat:
<input type="text" name="geo_lat_from" autocomplete="off" />
Lon:
<input type="text" name="geo_lon_from" autocomplete="off" />
<br /><br />
To <br />
Lat:
<input type="text" name="geo_lat_to" autocomplete="off" />
Lon:
<input type="text" name="geo_lon_to" autocomplete="off" />
<br /><br />
<input type="submit" value="go" name="_submit" />
</form>
</body>
</html>

Using that, I was able to figure out that in most of North America you can get about a 30 mile radius if you go +/- 0.2 on the latitude and longitude. So my locale can now be extracted with a couple of SQL queries. Using these, I should be able to find all the nearby zip codes and the distance to each. Here’s the first cut at the code…


<?php
function get_clean_positive_integer_string($string) { // FORCE IT ALL NUMERIC
  $new  = trim(ereg_replace("[^0-9$]", "", $string));
return ( $new );
}
 
function compute_distance($from_lat, $from_lon, $to_lat, $to_lon) {
  $from_lat  = floatval($from_lat);
  $from_lon  = floatval($from_lon);
  $to_lat    = floatval($to_lat);
  $to_lon    = floatval($to_lon);
 
  $dist  = acos(sin(deg2rad($from_lat))
              * sin(deg2rad($to_lat))
              + cos(deg2rad($from_lat))
              * cos(deg2rad($to_lat))
              * cos(deg2rad($from_lon - $to_lon)));
 
  $dist  = rad2deg($dist);
  $miles  = (float) $dist * 69;
return round($miles,2);
// To get kilometers, multiply miles by 1.61
// $km    = (float) $miles * 1.61;
// return round($km,2);
}
 
// CONSTANTS FOR SOME ROOM AROUND OUR ZIP CODE
$latitude_offset  = 0.20000;
$longitude_offset  = 0.20000;
 
require_once('../root/db_link.php');
 
if (!empty($_GET["_submit"])) {
  $zip_from  = get_clean_positive_integer_string($_GET["geo_zip_from"]);
 
  $sql  = "SELECT * FROM zipcodes WHERE zip=\"$zip_from\" LIMIT 1";
  if (!$r  = mysql_query($sql)) { die(mysql_error()); }
  $from  = mysql_fetch_assoc($r);
 
  if ( empty($from) ) {
    echo "<br />ERROR in From Zip: $zip_from \n";
  }
  extract($from);
  if ($latitude == 0.0) { echo "<br />ERROR: ZIP $zip_from HAS NO PHYSICAL LOCATION<br /><br />\n"; }
 
// COMPUTE THE MIN AND MAX LAT/LON PAIRS
  $minlat = $latitude - $latitude_offset;
  $maxlat = $latitude + $latitude_offset;
  $minlon  = $longitude - $longitude_offset;
  $maxlon  = $longitude + $longitude_offset;
 
// SAVE MY STARTING POINT LAT/LON PAIR
  $my_lat  = $latitude;
  $my_lon  = $longitude;
 
// FIND NEARBY ZIP CODES
  $sql  = "CREATE TEMPORARY TABLE local_zips ( ";
  $sql  .= "distance  decimal(6,2) ) SELECT * FROM zipcodes WHERE ";
  $sql  .= "latitude >= $minlat AND latitude <= $maxlat AND longitude >= $minlon AND longitude <= $maxlon ";
  $sql  .= "AND latitude <> 0.0 AND longitude <> 0.0 ";
  if (!$s  = mysql_query($sql)) { die(mysql_error()); }
 
// COMPUTE THE DISTANCE FOR EACH LOCAL ZIP CODE
  $sql  = "SELECT * FROM local_zips";
  if (!$s  = mysql_query($sql)) { die(mysql_error()); }
  while ($row = mysql_fetch_assoc($s)) {
    extract ($row);
    $distance = compute_distance( $my_lat, $my_lon, $latitude, $longitude );
    $usql  = "UPDATE local_zips SET distance = $distance WHERE id = $id LIMIT 1";
    if (!$u = mysql_query($usql)) { die(mysql_error()); }
  }
 
// PRINT OUT THE NEARBY ZIP CODES AND THE DISTANCES
  $sql  = "SELECT * FROM local_zips ORDER BY distance ASC";
  if (!$s  = mysql_query($sql)) { die(mysql_error()); }
  while ($row = mysql_fetch_assoc($s)) {
    extract ($row);
    echo "<br /><a target=\"_blank\" href=\"http://maps.google.com/maps?f=q&hl=en&geocode=&q=" . $zip . "&ie=UTF8&z=13&iwloc=addr\">$zip</a> is $distance miles away \n";
  }
}
?>
<html><head><title>Zip Locality Test</title></head>
<body>
<form method="get">
Zip:
<input type="text" name="geo_zip_from" autocomplete="off" />
<input type="submit" value="go" name="_submit" />
</form>
</body>
</html>

At this point, I decided I would take a break from coding and look at the quality of the data. I ran a query to pick out all the places that did not have a lat/lon pair. APO and FPO zip codes don’t count. And I wasn’t going to worry about Pahrump, Nevada. But there were 153 omissions. Some were PO BOX ONLY, but many were STANDARD. (Curiously, zip code 20500, was among the missing, but I guess you don’t have to carpool to Camp David). Still, with only 153 holes in over 40,000 records the data quality was not too bad. Maybe I will come back to the quality issue and see if I can fix up some of the omissions. But for now, I’ll go ahead with the appdev.

Inserting a String into a Template

June 29th, 2008

This was another question from the EE. The OP was modifying his site navigation menu. He had two files, one with the static portions of the menu and another with the dynamic portions. He had configured the static file as a template, with start and stop markers for the insertion of the dynamic portions.

<?php
$inserted = file_get_contents("from_mysql_files_menu2to4_it.php");
$template = file_get_contents("menu2_functions.php");
$place1 = "//# ooPLACEHOLDERoo1 #";
$place2 = "//# ooPLACEHOLDERoo2 #";
 
// GRAB THE LEFT PART OF THE STRING PLUS THE PLACEMARKER
$x = explode($place1,$template);
$template_part1 = $x[0] . $place1;
 
// GRAB THE RIGHT PART OF THE STRING PLUS THE PLACEMARKER
$x = explode($place2, $template);
$template_part2 = $place2 . $x[1];
 
// REASSEMBLE THE FILE WITH THE NEWLY INSERTED STRING
$out_file = $template_part1 . $inserted . $template_part2;

Then you might echo the new menu to the screen or save it back to disk

if (!file_put_contents("menu2_functions.php", $out_file)) { /* error processing */ }

How cool is strtotime() ?

June 22nd, 2008

Find all the Fridays in a given month. Imagine this task without the strtotime function!

<?php
function find_fridays($date) {
  $fridays = array();
  $timestamp  = strtotime(date('Y-m-0', strtotime($date)));
  $fridays[0] = date('Y-m-d D', strtotime('first friday', $timestamp));
  $fridays[1] = date('Y-m-d D', strtotime('second friday', $timestamp));
  $fridays[2] = date('Y-m-d D', strtotime('third friday', $timestamp));
  $fridays[3] = date('Y-m-d D', strtotime('fourth friday', $timestamp));
  $fridays[4] = date('Y-m-d D', strtotime('fifth friday', $timestamp));
  if ( substr($fridays[4],5,2) != substr($fridays[0],5,2) ) { unset($fridays[4]); }
return $fridays;
}
 
function last_friday($date) {
  $fridays  = find_fridays($date);
  $num_fridays  = count($fridays) - 1;
return $fridays[$num_fridays];
}
 
echo "<pre>\n";
 
$my_fridays = find_fridays('today');
print_r($my_fridays);
var_dump(last_friday('today'));
 
$my_fridays = find_fridays('February 2008');
print_r($my_fridays);
var_dump(last_friday('Feb 2008'));
?>

A PHP Script that Exposes Itself

June 20th, 2008

When you are teaching PHP it’s useful to be able to illustrate the behavior of a script and also reveal the code behind the behavior. Here is a script that exposes itself on demand.

<?php // download myself
if ($_GET["q"] == "d") {
  $script  = $_SERVER["SCRIPT_FILENAME"];
  $string = file_get_contents($script);
  header("Content-type: text/plain");
  header("Content-disposition: attachment; filename=\"" . $script . "\";" );
  echo $string;
  flush();
} else {
  echo "<h2>This is a header</h2>\n";
  echo "<a href=\"$PHP_SELF?q=d\">To download this script, click here</a>\n";
}
die();
?>

Checkboxes - with a wave of your mouse

June 20th, 2008

This was not just an intellectual exercise to modify the predictable behavior of the checkbox; I really wanted to do this for a client who had a page with a huge collection of checkboxes. Instead of manually clicking each box, this code lets them run the mouse over the checkbox and it changes the status of the checkbox. Here are two versions. One of them is a simple inverter for on/off. The other uses control keys to determine on/off.

<?php
if (!empty($_POST["_submit"])) {
echo "<pre>"; var_dump($_POST); echo "</pre> \n";
}
?>
 
<html>
<body>
 
<form action="<?=$PHP_SELF?>" method="POST">
 
Group 1 - mouse over to check/uncheck<br/>
<input name=box1 onmouseover="if(this.checked == true){this.checked = false;} else {this.checked = true;}" type="checkbox"><br/>
<input name=box2 onmouseover="if(this.checked == true){this.checked = false;} else {this.checked = true;}" type="checkbox"><br/>
<input name=box3 onmouseover="if(this.checked == true){this.checked = false;} else {this.checked = true;}" type="checkbox"><br/>
<input name=box4 onmouseover="if(this.checked == true){this.checked = false;} else {this.checked = true;}" type="checkbox"><br/>
<input name=box5 onmouseover="if(this.checked == true){this.checked = false;} else {this.checked = true;}" type="checkbox"><br/>
 
<br/>
Group 2 - Hold Shift and mouse over to check, Hold Ctrl and mouse over to uncheck<br/>
<input name=cbox1 onmouseover="if(event.shiftKey){this.checked = true;} else if(event.ctrlKey){this.checked = false;};" type="checkbox"><br/>
<input name=cbox2 onmouseover="if(event.shiftKey){this.checked = true;} else if(event.ctrlKey){this.checked = false;};" type="checkbox"><br/>
<input name=cbox3 onmouseover="if(event.shiftKey){this.checked = true;} else if(event.ctrlKey){this.checked = false;};" type="checkbox"><br/>
<input name=cbox4 onmouseover="if(event.shiftKey){this.checked = true;} else if(event.ctrlKey){this.checked = false;};" type="checkbox"><br/>
<input name=cbox5 onmouseover="if(event.shiftKey){this.checked = true;} else if(event.ctrlKey){this.checked = false;};" type="checkbox"><br/>
 
<br />
<input type="submit" name="_submit"  value="go" />
 
</form>
</body>
</html>

Grab all the URLs that are linked from a web page

June 20th, 2008

Another EE query…

<?php
if (empty($_GET["u"])) {
  ?>
  <form action="<?=$PHP_SELF?>" method="get">
  URL (like http://www.google.com):
  <input type="text" name="u" value="http://" />
  <input type="submit" name="_submit" value="go" />
  </form>
  <?php
} else {
  $url  = $_GET["u"];
  if (!$html = file_get_contents($url)) {
    echo "<br /><a href=\"$PHP_SELF\">Try again</a>\n";
    die();
  }
  $html  = strip_tags($html, "<a><img>");
  preg_match_all("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);
  foreach ($matches as $value) {
    $link = htmlentities($value[0]);
    echo "<br />$link \n";
  }
}
?>

Find all the Saturdays in the last year

June 20th, 2008

Somebody on Experts Exchange asked how to find all the Saturdays in the last year. This was my first cut (quick and dirty mode) and it worked well enough. But it was inefficient and not well generalized.

<?php
$old_ts = strtotime("last year");
$new_ts = strtotime("today");
$saturdays = array();
while ($old_ts < $new_ts) {
  if (date("D", $old_ts) == "Sat" ) {
    $saturdays[] = date('Y-m-d', $old_ts); // ISO 8601 DATE
  }
  $old_ts = $old_ts + (60*60*24); // ONE DAY IN SECONDS
}
var_dump($saturdays);
?>

I took another swipe at it with this function.

  • Can be used for any day of the week
  • Can be used for any past period of time
  • More efficient - one-time computations out of loop

<?php
function pastdays($past, $days='Saturday') {
  $old_ts = strtotime("$past");
  $new_ts = strtotime("$days");
  if (date('l') != "$days") { $new_ts = strtotime("last $days"); }
  $daylist = array();
  $week = 60*60*24*7; // SEVEN DAYS IN SECONDS
  while ($old_ts <= $new_ts) {
    $daylist[] = date('Y-m-d D', strtotime("$days", $old_ts)); // ISO 8601 FORMAT DATE + DAY OF WEEK ABBR.
    $old_ts = $old_ts + $week;
  }
return($daylist);
} // !pastdays();
 
$fridays = pastdays("last year", "Friday");
echo "<pre>";
print_r($fridays);
?>