Undoubtedly I am not the only one who has noticed that driving your kid to sleep-away camp can cost $100 in gasoline alone, so the stimulus to carpool is notable. But how do you find other folks from your home town who are going to the same camp? I decided this was a good web application.
But wait, you say, “It’s been done. There are hundreds of thousands of listings for “Virginia ride share” alone.” And you are right. But those sites are limited in their ability and reach when it comes to this sort of application. Here is what makes carpooling to summer camp unique.
- A pre-constituted community of campers
- A common destination and arrival time
- A common departure and return time
- A one-time opportunity to avoid a substantial expense
In other words, everyday carpooling is different from carpooling to camp.
First, I needed some basic geo-location data. I figured that a Zipcode-to-Latitude/Longitude pair would be a good way to get started. This led to a LOT of Google searching. The US Geological Service or the Census Bureau or the Postal Service would seem to be the logical place to look. Wrong.
For my first try at getting useful data, I went to http://www.populardata.com but I found a few missing lat/lon pairs. Next I went to NerdVittles and grabbed two zip files that, taken together, contain a MySQL dump of the zip code data base. Nerdvittles apparently took Novak’s file and made it into MySQL statements. There is a license document, available here: http://nerdvittles.com/trixbox123/ZIP_CODES_license.pdf. The zipcodes download is available here: http://nerdvittles.com/trixbox123/zipcodes.zip
The zipcodes files were on an interesting page. Gotta go back there and read about that phone-weather-zipcode technology! http://nerdvittles.com/index.php?p=160
Moving right along, the next thing I needed to do was load the data into a data base. This came along in the zip code files:
CREATE TABLE IF NOT EXISTS `zipcodes` (
`id` mediumint(6) NOT NULL auto_increment,
`zip` varchar(5) NOT NULL default '',
`latitude` varchar(11) NOT NULL default '',
`longitude` varchar(11) NOT NULL default '',
`city` varchar(40) NOT NULL default '',
`state` char(2) NOT NULL default '',
`fullstate` varchar(30) NOT NULL default '',
`county` varchar(40) NOT NULL default '',
`zipclass` varchar(20) NOT NULL default '',
PRIMARY KEY (`id`),
KEY `zip` (`zip`),
KEY `city` (`city`,`state`),
KEY `state` (`state`,`city`)
) ENGINE=MyISAM DEFAULT CHARSET=ascii AUTO_INCREMENT=42742 ;
I copied that into a PHP script, but because I wanted to do math with the lat/lon pairs, I changed their definition.
`latitude` decimal(16,8),
`longitude` decimal(16,8),
Then I attached all the necessary DB-connect stuff, named it db_create.php, stored it on the server and ran it to create the zipcodes table.
Here is what the payload information looks like:
INSERT INTO `zipcodes` VALUES (1, '00501', '+40.922326', '-072.637078', 'HOLTSVILLE', 'NY', 'NEW YORK', 'SUFFOLK', 'UNIQUE');
INSERT INTO `zipcodes` VALUES (8153, '20016', '+38.938260', '-077.091149', 'WASHINGTON', 'DC', 'D.C.', 'DISTRICT OF COLUMBIA', 'STANDARD');
INSERT INTO `zipcodes` VALUES (42741, '99950', '+55.542007', '-131.432682', 'KETCHIKAN', 'AK', 'ALASKA', 'KETCHIKAN GATEWAY', 'STANDARD');
By actual count, we have 42,740 unique zip codes. There is one key missing at #22001. Doesn’t seem like a big deal for my needs. I put this text file on the server and named it db_data.txt.
Next I ran this script to load the data base.
<?php
require_once('../root/db_link.php');
$file = fopen('db_data.txt', 'r');
while (!feof($file)) { // READ THE INSERT STATEMENTS
$sql = fgets($file);
if (!$result = mysql_query($sql)) { die(mysql_error()); }
}
echo "Done!";
?>
There was a blank line at the end of the data, so I got “Query was empty” instead of “Done!” I used phpMyAdmin to check the cardinality of the data base. It was 42,740 — close enough to the right number of US zip code records.
I wanted to test a quick query of the data base, so I wrote up db_query.php, thus:
<?php
function get_clean_positive_integer_string($string) { // FORCE IT ALL NUMERIC
$new = trim(ereg_replace("[^0-9]", "", $string));
return ( $new );
}
require_once('../root/db_link.php');
if (!empty($_GET["_submit"])) {
$zip = get_clean_positive_integer_string($_GET["geo_zip"]);
$sql = "SELECT * FROM zipcodes WHERE zip=\"$zip\" LIMIT 1";
if (!$result = mysql_query($sql)) { die(mysql_error()); }
$row = mysql_fetch_assoc($result);
foreach ($row as $k => $v) { echo "<br />$k => $v \n"; }
}
?>
<html><head><title>Zip Test</title></head>
<body>
<form method="GET">
Zip:
<input type="text" name="geo_zip" autocomplete="off" />
<input type="submit" value="go" name="_submit" />
</form>
</body>
</html>
This seemed to work fine. I put in 22101 and out came this:
id => 9151
zip => 22101
latitude => +38.93520200
longitude => -077.18672000
city => MC LEAN
state => VA
fullstate => VIRGINIA
county => FAIRFAX
zipclass => STANDARD
Now I needed a way to figure out the distance between zip codes using their lat/lon pairs. This seemed to be helpful: http://en.wikipedia.org/wiki/Great-circle_distance and I found the code to execute those formulae on the Experts Exchange (see the compute_distance() function in the code below).
<?php
function get_clean_positive_integer_string($string) { // FORCE IT ALL NUMERIC
$new = trim(ereg_replace("[^0-9]", "", $string));
return ( $new );
}
function compute_distance($from_lat, $from_lon, $to_lat, $to_lon) {
$from_lat = floatval($from_lat);
$from_lon = floatval($from_lon);
$to_lat = floatval($to_lat);
$to_lon = floatval($to_lon);
$dist = acos(sin(deg2rad($from_lat))
* sin(deg2rad($to_lat))
+ cos(deg2rad($from_lat))
* cos(deg2rad($to_lat))
* cos(deg2rad($from_lon - $to_lon)));
$dist = rad2deg($dist);
$miles = (float) $dist * 69;
return round($miles,1);
// To get kilometers, multiply miles by 1.61
// $km = (float) $miles * 1.61;
// return round($km,1);
}
require_once('../root/db_link.php');
if (!empty($_GET["_submit"])) {
$zip_from = get_clean_positive_integer_string($_GET["geo_zip_from"]);
$zip_to = get_clean_positive_integer_string($_GET["geo_zip_to"]);
$sql = "SELECT * FROM zipcodes WHERE zip=\"$zip_from\" LIMIT 1";
if (!$r = mysql_query($sql)) { die(mysql_error()); }
$from = mysql_fetch_assoc($r);
$sql = "SELECT * FROM zipcodes WHERE zip=\"$zip_to\" LIMIT 1";
if (!$r = mysql_query($sql)) { die(mysql_error()); }
$to = mysql_fetch_assoc($r);
if ( empty($from) ) {
echo "<br />ERROR in From Zip: $zip_from \n";
}
if ( empty($to) ) {
echo "<br />ERROR in To Zip: $zip_to \n";
}
if (!empty($from) && !empty($to)) {
$miles = compute_distance($from["latitude"], $from["longitude"], $to["latitude"], $to["longitude"]);
echo "<br />Distance from $zip_from to $zip_to is $miles miles \n";
echo "<pre> FROM\n";
print_r($from);
echo "\n\n TO\n";
print_r($to);
echo "</pre>\n";
}
}
?>
<html><head><title>Zip Distance Test</title></head>
<body>
<form method="get">
From Zip:
<input type="text" name="geo_zip_from" autocomplete="off" />
To Zip:
<input type="text" name="geo_zip_to" autocomplete="off" />
<input type="submit" value="go" name="_submit" />
</form>
</body>
</html>
Now we were getting somewhere! I checked the zip codes for the National Presbyterian Church in Washington, DC and First Presbyterian Church in Pittsburgh, PA. The output looked like this:
Distance from 20016 to 15222 is 186.9 miles
FROM
Array
(
[id] => 8153
[zip] => 20016
[latitude] => +38.93826000
[longitude] => -077.09114900
[city] => WASHINGTON
[state] => DC
[fullstate] => D.C.
[county] => DISTRICT OF COLUMBIA
[zipclass] => STANDARD
)
TO
Array
(
[id] => 5945
[zip] => 15222
[latitude] => +40.43443600
[longitude] => -080.02481700
[city] => PITTSBURGH
[state] => PA
[fullstate] => PENNSYLVANIA
[county] => ALLEGHENY
[zipclass] => STANDARD
)
For a quick reality check, I headed over to a Google Maps Distance Calculator on daftlogic.com and got 187.157 miles. Pretty close.
But if you ask the same question of Google Maps when you are looking for driving directions, you get 237 miles, a difference of about 25%. The point-to-point distance calculation will invariably understate the driving distance. There are a number of reasons for anomalies in the output.
For one thing, the distance calculations compute minimum distances between places that are roughly the central points of the zip code area, “as the crow flies,” if you will. But you and I don’t drive on the great circle route. We drive on roads, and if we ever turn the steering wheel, our odometer mileage is certain to be greater than the distance reported by the compute_distance() function. If we have to travel a lot of tortuous routes, we will find that the program may have underestimated the mileage by a lot. I have tinkered with the parameters a little, but have not come to the conclusion that I can just add 25% to the computed mileage. It’s not that simple. In the Northeast where roads are wiggly the underestimation is greater. In the Midwest where roads are straight the estimates tend to be more accurate. In the mountains where the roads are all switchbacks, the estimates are pretty much wrong. It’s not a show stopper; just something to keep in mind.
For another thing, a zip code is not the same as the place you are going — it is a collection of streets, postal routes if you will. Even the US Postal Service may have more than one office in the same zip code. And in geographically large zip codes there will be some increased chance that the computed distances will not reflect your personal travel mileage. This phenomenon may be magnified if the zip codes are close to each other. Your point of origin could be one street away from a destination that just happened to be over the line into a different zip code, and the program would overestimate your mileage. Or you could be traveling between opposite ends of the same zip code. The program would tell you that your mileage is zero. But again, these are not show-stoppers, just bumps in the road. And our goal is to organize one-time carpools for long distances, so we are still going to be able to create a useful tool.
I decided these error phenomena are within the acceptable locus of confusion, and development can continue. But at the same time, I am keeping the compute_distance() function isolated, so I can modify it if it looks like it’s going to produce significant incorrout. One thought might be to compare driving mileage to computed distance for each zip code pair, but I currently don’t have an automated way to get the driving mileage. Full of optimism, I posted a question on EE about Google driving distances.
Next, I decided I would like to be able to understand the lat/lon pairs, so I coded a script that ignored the data base and just took the lat/lon pairs from an input form.
<?php
function get_clean_numeric_string($string) { // FORCE IT ALL FLOATING POINT NUMERIC
$new = trim(ereg_replace("[^0-9\.\-]", "", $string));
return ( $new );
}
function compute_distance($from_lat, $from_lon, $to_lat, $to_lon) {
$from_lat = floatval($from_lat);
$from_lon = floatval($from_lon);
$to_lat = floatval($to_lat);
$to_lon = floatval($to_lon);
$dist = acos(sin(deg2rad($from_lat))
* sin(deg2rad($to_lat))
+ cos(deg2rad($from_lat))
* cos(deg2rad($to_lat))
* cos(deg2rad($from_lon - $to_lon)));
$dist = rad2deg($dist);
$miles = (float) $dist * 69;
return round($miles,1);
// To get kilometers, multiply miles by 1.61
// $km = (float) $miles * 1.61;
// return round($km,1);
}
// NOT NEEDED HERE require_once('../root/db_link.php');
if (!empty($_GET["_submit"])) {
$lat_from = get_clean_numeric_string($_GET["geo_lat_from"]);
$lat_to = get_clean_numeric_string($_GET["geo_lat_to"]);
$lon_from = get_clean_numeric_string($_GET["geo_lon_from"]);
$lon_to = get_clean_numeric_string($_GET["geo_lon_to"]);
$miles = compute_distance($lat_from, $lon_from, $lat_to, $lon_to);
echo "<br />Distance from [$lat_from / $lon_from] to [$lat_to / $lon_to] is $miles miles \n";
}
?>
<html><head><title>LAT/LON Distance Test</title></head>
<body>
<form method="get">
From<br />
Lat:
<input type="text" name="geo_lat_from" autocomplete="off" />
Lon:
<input type="text" name="geo_lon_from" autocomplete="off" />
<br /><br />
To <br />
Lat:
<input type="text" name="geo_lat_to" autocomplete="off" />
Lon:
<input type="text" name="geo_lon_to" autocomplete="off" />
<br /><br />
<input type="submit" value="go" name="_submit" />
</form>
</body>
</html>
Using that, I was able to figure out that in most of North America you can get about a 30 mile radius if you go +/- 0.2 on the latitude and longitude. So my locale can now be extracted with a couple of SQL queries. Using these, I should be able to find all the nearby zip codes and the distance to each. Here’s the first cut at the code…
<?php
function get_clean_positive_integer_string($string) { // FORCE IT ALL NUMERIC
$new = trim(ereg_replace("[^0-9$]", "", $string));
return ( $new );
}
function compute_distance($from_lat, $from_lon, $to_lat, $to_lon) {
$from_lat = floatval($from_lat);
$from_lon = floatval($from_lon);
$to_lat = floatval($to_lat);
$to_lon = floatval($to_lon);
$dist = acos(sin(deg2rad($from_lat))
* sin(deg2rad($to_lat))
+ cos(deg2rad($from_lat))
* cos(deg2rad($to_lat))
* cos(deg2rad($from_lon - $to_lon)));
$dist = rad2deg($dist);
$miles = (float) $dist * 69;
return round($miles,2);
// To get kilometers, multiply miles by 1.61
// $km = (float) $miles * 1.61;
// return round($km,2);
}
// CONSTANTS FOR SOME ROOM AROUND OUR ZIP CODE
$latitude_offset = 0.20000;
$longitude_offset = 0.20000;
require_once('../root/db_link.php');
if (!empty($_GET["_submit"])) {
$zip_from = get_clean_positive_integer_string($_GET["geo_zip_from"]);
$sql = "SELECT * FROM zipcodes WHERE zip=\"$zip_from\" LIMIT 1";
if (!$r = mysql_query($sql)) { die(mysql_error()); }
$from = mysql_fetch_assoc($r);
if ( empty($from) ) {
echo "<br />ERROR in From Zip: $zip_from \n";
}
extract($from);
if ($latitude == 0.0) { echo "<br />ERROR: ZIP $zip_from HAS NO PHYSICAL LOCATION<br /><br />\n"; }
// COMPUTE THE MIN AND MAX LAT/LON PAIRS
$minlat = $latitude - $latitude_offset;
$maxlat = $latitude + $latitude_offset;
$minlon = $longitude - $longitude_offset;
$maxlon = $longitude + $longitude_offset;
// SAVE MY STARTING POINT LAT/LON PAIR
$my_lat = $latitude;
$my_lon = $longitude;
// FIND NEARBY ZIP CODES
$sql = "CREATE TEMPORARY TABLE local_zips ( ";
$sql .= "distance decimal(6,2) ) SELECT * FROM zipcodes WHERE ";
$sql .= "latitude >= $minlat AND latitude <= $maxlat AND longitude >= $minlon AND longitude <= $maxlon ";
$sql .= "AND latitude <> 0.0 AND longitude <> 0.0 ";
if (!$s = mysql_query($sql)) { die(mysql_error()); }
// COMPUTE THE DISTANCE FOR EACH LOCAL ZIP CODE
$sql = "SELECT * FROM local_zips";
if (!$s = mysql_query($sql)) { die(mysql_error()); }
while ($row = mysql_fetch_assoc($s)) {
extract ($row);
$distance = compute_distance( $my_lat, $my_lon, $latitude, $longitude );
$usql = "UPDATE local_zips SET distance = $distance WHERE id = $id LIMIT 1";
if (!$u = mysql_query($usql)) { die(mysql_error()); }
}
// PRINT OUT THE NEARBY ZIP CODES AND THE DISTANCES
$sql = "SELECT * FROM local_zips ORDER BY distance ASC";
if (!$s = mysql_query($sql)) { die(mysql_error()); }
while ($row = mysql_fetch_assoc($s)) {
extract ($row);
echo "<br /><a target=\"_blank\" href=\"http://maps.google.com/maps?f=q&hl=en&geocode=&q=" . $zip . "&ie=UTF8&z=13&iwloc=addr\">$zip</a> is $distance miles away \n";
}
}
?>
<html><head><title>Zip Locality Test</title></head>
<body>
<form method="get">
Zip:
<input type="text" name="geo_zip_from" autocomplete="off" />
<input type="submit" value="go" name="_submit" />
</form>
</body>
</html>
At this point, I decided I would take a break from coding and look at the quality of the data. I ran a query to pick out all the places that did not have a lat/lon pair. APO and FPO zip codes don’t count. And I wasn’t going to worry about Pahrump, Nevada. But there were 153 omissions. Some were PO BOX ONLY, but many were STANDARD. (Curiously, zip code 20500, was among the missing, but I guess you don’t have to carpool to Camp David). Still, with only 153 holes in over 40,000 records the data quality was not too bad. Maybe I will come back to the quality issue and see if I can fix up some of the omissions. But for now, I’ll go ahead with the appdev.