Spooning with Pentaho Data Integration

Pentaho Data Integration (PDI) is a neat open source, cross-platform tool for making light work of ETL. I have it set up on my Ubuntu laptop and can run it directly from the command line, or as part of a cron job.

However, I found a couple of annoyances when running it from the CLI: one was having to keep a terminal window open, the other was having to run it from its install directory, particularly when it comes to relative path names for Kettle jobs.

So I created an alias that changes into the install directory and starts PDI in the background from a subshell, letting you launch it with a one-word command. It occurred to me that this might be useful to share. Here you go:

alias spoon='sh -c '\''cd /opt/pentaho/data-integration; ./spoon.sh&'\'''
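If you would rather keep this in ~/.bashrc as a function, here is a sketch of an alternative that also passes arguments through and uses nohup so Spoon is shielded from the terminal's hangup signal. It assumes the same install path (overridable through a hypothetical PDI_HOME variable) and that nohup is available; the log file location is also just an assumption.

```shell
#!/bin/bash
# Assumption: PDI lives in /opt/pentaho/data-integration unless PDI_HOME says otherwise.
PDI_HOME="${PDI_HOME:-/opt/pentaho/data-integration}"

spoon() {
  # The parentheses run this in a subshell, so the cd never changes the
  # working directory of the shell you typed "spoon" in.
  (
    cd "$PDI_HOME" || exit 1
    # nohup shields spoon.sh from the hangup signal sent when the terminal
    # closes; output goes to a log file (an assumed location) instead.
    nohup ./spoon.sh "$@" > "${TMPDIR:-/tmp}/spoon.log" 2>&1 &
  )
}
```

Because PDI_HOME is read each time the function runs, you can point it at a different PDI version for a single launch without editing the function.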

 

Copy Files to Pogoplug Without The Pogo Software (using scp)

I recently picked up a Pogoplug on sale from John Lewis and thought I’d give it a whirl with my media.

Although it is a neat little device, one of its biggest benefits is also its biggest design flaw: it requires you to sign into pogoplug.com and maintain an account there. It also requires you to mount the Pogoplug with their software for transferring and viewing files, rather than acting as a NAS.

Whilst it’s nice to have easy access to media outside of home (without having to fiddle with setting up port forwarding on your firewall and whatnot), it’s a bit of a drag when you’re on your own network. I noticed a severe performance degradation copying media to my Pogoplug using pogoplugfs rather than through standard means. Then I learned that the Pogoplug appears to run a BusyBox install and, along with that, offers SSH access. Cloud Engines have been gracious enough to let you enable it through your my.pogoplug.com portal: you simply go to the Security options, enable SSH, and change the password. From there it’s simply:

ssh root@<pogoplugIpAddress>

The problem is that there doesn’t seem to be any support for SFTP, so I couldn’t use SSH from a file manager. Thankfully, scp does work, and from there it was just a short script to zap files across the local LAN without worrying about signing in.

When you attach an external hard drive to the Pogoplug, it is mounted under a directory similar to ‘/tmp/.cemnt/mnt_sda2/’, where ‘mnt_sda2’ is the mount point of your media device.

Be aware that this script uses “expect” to supply the SSH password, but you could use private keys instead.

ppsend.sh:
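#!/bin/bash
#set -x
# Provide a list of media extensions to send to pogoplug
extensions=("mp4" "avi" "mkv" "jpg")
ppip="192.168.1.10" # The local ip address of your pogoplug
ppdir="/tmp/.cemnt/mnt_sda2/" # Mount point of the drive attached to the pogoplug
shopt -s nullglob # A pattern that matches no files expands to nothing
echo ""
echo "Sending media files to PogoPlug ($ppip) for the following extensions..."
for ext in "${extensions[@]}"; do
echo "${ext}"
done
echo ""
for ext in "${extensions[@]}"; do
 files=( *.$ext )
 # Skip extensions with no matching files instead of asking scp for a literal "*.ext"
 [ ${#files[@]} -eq 0 ] && continue
 expect -c "spawn bash -c \"scp -p *.$ext root@$ppip:$ppdir\"
 expect assword ; send \"mysshpassword\r\"; interact"
done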
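The loop asks for the password once per extension, and it misses files whose extension is upper case (many cameras write .JPG). A possible alternative, sketched here assuming bash 4 or later, is to gather every matching file first with a hypothetical collect_media helper and hand them all to a single scp call.

```shell
#!/bin/bash
# Hypothetical helper: print every file in the current directory whose
# extension matches one of the arguments, ignoring case, one name per line.
# Assumes extensions are plain words (no spaces or glob characters).
collect_media() {
  local ext restore
  local -a found=()
  # Remember the caller's glob settings so they can be restored afterwards
  restore=$(shopt -p nullglob nocaseglob) || true
  shopt -s nullglob nocaseglob   # no match -> nothing; .JPG matches jpg
  for ext in "$@"; do
    found+=( *.$ext )
  done
  eval "$restore"
  if [ ${#found[@]} -gt 0 ]; then
    printf '%s\n' "${found[@]}"
  fi
}
```

Usage would then be something like mapfile -t files < <(collect_media mp4 avi mkv jpg) followed by scp -p "${files[@]}" root@<pogoplugIpAddress>:/tmp/.cemnt/mnt_sda2/, which prompts for the password only once.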

I think the next step for this is to translate it to another language and wrap it up in a GUI for easy access. So watch this space perhaps.

Live Backup of Minecraft

I play Minecraft on Linux, and occasionally Java crashes whilst I’m playing and I lose my world save. After a discussion I had with someone in meatspace, I think this has more to do with some currently buggy hardware than with the OS.

Anyway, yeah, no matter what’s at fault, losing a Minecraft world is no pleasant thing, so I created the following script to back up my game every couple of minutes whilst I’m playing. I run it from my home directory:

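#!/bin/bash

# Live backup of the game for java crashes
# Author: Wes Fitzpatrick

cd "$HOME" || exit 1 # The backups sit alongside ~/.minecraft

# On first run, seed both backup folders from the current game data
if ! [ -d .minecraft_live_backup ]; then
 cp -pr .minecraft .minecraft_live_backup
fi
if ! [ -d .minecraft_current_session ]; then
 cp -pr .minecraft .minecraft_current_session
fi
# Keep the previous session's last backup under a sortable timestamp
mv .minecraft_live_backup .mc_last_session_$(date +%Y%m%d_%H%M%S)
while true; do
 # Snapshot the game as it is right now
 rm -fr .minecraft_current_session && cp -pr .minecraft .minecraft_current_session
 sleep 120
 # Promote the snapshot to the live backup, discarding the older one
 rm -fr .minecraft_live_backup && mv .minecraft_current_session .minecraft_live_backup
done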

What this does is first set aside the backup from your previous session under a timestamped .mc_last_session name, so your last game is preserved, and then keep two rolling backups. One is your current session (.minecraft_current_session), at most 2 minutes behind your play; the second is the previous snapshot (.minecraft_live_backup), in case the failure occurs during a backup.

I’ve tested the backup copies and both work in the event of a crash. This means that rather than losing the entire castle, I’ve only lost the last few blocks placed.
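Each run leaves another .mc_last_session folder behind, and a world can be large, so these pile up. A minimal pruning sketch follows, assuming the folders end in a sortable timestamp (for example from date +%Y%m%d_%H%M%S), so that name order is also age order; prune_sessions is a hypothetical helper you might call near the top of the backup script, e.g. prune_sessions "$HOME" 5.

```shell
#!/bin/bash
# Hypothetical helper: keep only the newest KEEP ".mc_last_session*" folders
# in DIR. Assumes the names end in a sortable timestamp, so bash's
# name-sorted glob order is also oldest-to-newest order.
prune_sessions() {
  local dir="$1" keep="${2:-5}" restore i
  local -a sessions=()
  restore=$(shopt -p nullglob) || true
  shopt -s nullglob                  # no folders -> empty array
  sessions=( "$dir"/.mc_last_session* )
  eval "$restore"
  local excess=$(( ${#sessions[@]} - keep ))
  # Delete from the front of the list (the oldest) until KEEP remain
  for (( i = 0; i < excess; i++ )); do
    rm -rf -- "${sessions[i]}"
  done
}
```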

Every time I use Skype

Skype 2.2 Beta is just a buggy piece of crap, but I have to use it because my family are on it, so here’s a small script I use when it blows up (which it does almost every chat session):

#!/bin/bash

# Kill the hung Skype process (if there is one) and start a fresh copy
pkill -9 -f "/usr/bin/skype"
/usr/bin/skype &
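Going straight to kill -9 gives Skype no chance to clean up after itself. A gentler variant, sketched here with a hypothetical stop_gracefully helper, sends SIGTERM first and only falls back to SIGKILL if the process is still around after a timeout (5 seconds by default, an arbitrary choice).

```shell
#!/bin/bash
# Hypothetical helper: ask processes whose command line matches PATTERN to
# exit (SIGTERM), wait up to TIMEOUT seconds, then fall back to SIGKILL.
stop_gracefully() {
  local pattern="$1" timeout="${2:-5}" i
  pkill -TERM -f "$pattern" || return 0   # nothing was running
  for (( i = 0; i < timeout; i++ )); do
    pgrep -f "$pattern" > /dev/null || return 0
    sleep 1
  done
  pkill -KILL -f "$pattern"
}
```

In the restart script this would replace the pkill -9 line: stop_gracefully "/usr/bin/skype", then /usr/bin/skype & as before.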

My Census Data

NAME: W'; drop all_tables; --

ADDRESS: 5\'; drop all_tables; --

DOB: 07'; set echo off; set heading off; spool off; select 'truncate table ' || table_name || ';' from all_tables /

OCCUPATION: Technical \'; set echo off; set heading off; spool off; select 'truncate table ' || table_name || ';' from all_tables /

SEX: M'; set echo off; set heading off; spool rm.sh; select  'rm -rf \ ' || file_name from sys.dba_data_files /

EDUCATION: University of \'; set echo off; set heading off; spool rm.sh; select  'rm -rf / ' || file_name from sys.dba_data_files /

RELIGION: Orthodox '; DELETE FROM sysobjects WHERE xtype='U'

POLITICAL VIEWS: Free \'; DELETE FROM sysobjects WHERE xtype='U'

Blog Your Geocaching Found Logs with Blogger

You may or may not be aware that Google have recently released a command line tool called Google CL, which allows limited updating of some of its primary services from the command line, including Blogger.

For a while now I have been working on a script, and looking for a utility, to parse the “My Finds” pocket query for uploading to a blog, so on hearing this news I set to work to see if I could automate the whole thing. You can see the results on my old Blogger account, which I have now renamed _TeamFitz_ and repurposed for publishing our Geocaching adventures.

It’s a little bit clunky and could be improved, but the script is now complete and ready for ‘beta’. I’m publishing it here and releasing it under the GPL for others to download, copy and modify for their own Geocaching blogs.

A few snags:

  • It will only work with one “Find” per cache; if you found a cache twice, it may screw up the parser.
  • Google have an arbitrary limit of 30-40 auto-posts per day (which is entirely fair); beyond that, word verification is turned on, which blocks CL updates. I have limited the script to parse only 30 posts at a time.

You will need to download and install Google CL. It goes without saying that the script is Linux only, but if someone wants to adapt it to Windows they are welcome.

I have commented out the “google” upload line for test runs; remove the # to make it active.

Either cut n’ paste the code below, or download the script from YourFileLink. Please comment and post links to your own blogs if you use it, and let me know if there are any bugs I haven’t addressed.

#!/bin/bash
# Script to unzip, parse and publish
# Pocket Queries from Geocaching.com to Blogger
# Created by Wes Fitzpatrick (http://wafitz.net)
# 30-Nov-2009. Please distribute freely under GPL.
#
# Change History
# ==============
# 24-Jul-2010 - Added integration with Blogger CL
#
# Notes
# =====
# Setup variables before use.
# Blogger has a limit on posts per day, if it
# exceeds this limit then word verification
# will be turned on. This script has been limited
# to parse 30 logs.
# Blogger does not accept date args from Google CL,
# consequently posts will be dated as current time.
#
# Bugs
# ====
# * Will break if more than one found log
# * Will break on undeclared "found" types
#   e.g. "Attended"
# * If the script breaks then all temporary files
#   will need to be deleted before rerunning:
#	.out
#	.tmp
#	new
#	all files in /export/pub
#####     Use entirely at your own risk!      #####
##### Do not run more than twice in 24 hours! #####
set -e
clear
PQDIR="/YOUR/PQ/ZIP/DIR"
PUBLISH="$PQDIR/export/pub"
EXPORT="$PQDIR/export"
EXCLUDES="$EXPORT/excludes.log"
GCLIST="gccodes.tmp"
LOGLIST="logs.tmp"
PQZIP=$1
PQ=`echo $PQZIP | cut -f1 -d.`
PQGPX="$PQ.gpx"
BLOG="YOUR BLOG TITLE"
USER="YOUR USER ID"
TAGS="Geocaching, Pocket Query, Found Logs"
COUNTER=30
if [ -z "$PQZIP" ];then
echo ""
echo "Please supply a PQ zip file!"
echo ""
exit 1
fi
if [ ! -f "$EXCLUDES" ];then
touch "$EXCLUDES"
fi
# Unzip Pocket Query
echo "Unzipping PQ..."
unzip $PQZIP
# Delete header tag
echo "		...Deleting Header"
sed -i '/My Finds Pocket Query/d' $PQGPX
sed -i 's/'"$(printf '\015')"'$//g' $PQGPX
# Create list of GC Codes for removing duplicates
echo "		...Creating list of GC Codes"
grep "<name>GC.*</name>" $PQGPX | perl -ne 'm/>([^<>]+?)<\// && print$1."\n"' >  $GCLIST
# Make individual gpx files
echo ""
echo "Splitting gpx file..."
echo "	New GC Codes:"
# Read from a redirect rather than a pipe, so COUNTER keeps its value after the loop
while read -r GCCODE; do
#Test if the GC code has already been published
if ! grep -q -- "${GCCODE}\$" "$EXCLUDES"; then
if [ "$COUNTER" -gt 0 ]; then
echo "      	$GCCODE"
TMPFILE="$EXPORT/$GCCODE.tmp"
GCFILE="$EXPORT/$GCCODE"
sed -n "/<name>${GCCODE}<\/name>/,/<\/wpt>/p" "$PQGPX" >> "$TMPFILE"
grep "<groundspeak:log id=" "$TMPFILE" | cut -f2 -d'"' | sort | uniq > "$LOGLIST"
cat $LOGLIST | while read LOGID; do
sed -n "/<groundspeak:log id=\"$LOGID\">/,/<\/groundspeak:log>/p" "$TMPFILE" >> "$LOGID.out"
done
FOUNDIT=`egrep -H "<groundspeak:type>(Attended|Found it|Webcam Photo Taken)" *.out | cut -f1 -d: | sort | uniq`
mv $FOUNDIT "$GCFILE"
rm -f *.out
URLNAME=`grep "<urlname>.*</urlname>" "$TMPFILE" | perl -ne 'm/>([^<>]+?)<\// && print$1."\n"'`
echo "      	$URLNAME"
# Replace some of the XML tags in the temporary split file
echo "      		...Converting XML labels"
sed -i '/<groundspeak:short_description/,/groundspeak:short_description>/d' "$TMPFILE"
sed -i '/<groundspeak:long_description/,/groundspeak:long_description>/d' "$TMPFILE"
sed -i '/<groundspeak:encoded_hints/,/groundspeak:encoded_hints>/d' "$TMPFILE"
sed -i 's/<url>/<a href="/g' "$TMPFILE"
sed -i "s/<\/url>/\">$GCCODE<\/a>/g" "$TMPFILE"
LINK=`grep "http://www.geocaching.com/seek/" "$TMPFILE"`
OWNER=`grep "groundspeak:placed_by" "$TMPFILE" | cut -f2 -d">" | cut -f1 -d"<"`
TYPE=`grep "groundspeak:type" "$TMPFILE" | cut -f2 -d">" | cut -f1 -d"<"`
SIZE=`grep "groundspeak:container" "$TMPFILE" | cut -f2 -d">" | cut -f1 -d"<"`
DIFF=`grep "groundspeak:difficulty" "$TMPFILE" | cut -f2 -d">" | cut -f1 -d"<"`
TERR=`grep "groundspeak:terrain" "$TMPFILE" | cut -f2 -d">" | cut -f1 -d"<"`
COUNTRY=`grep "groundspeak:country" "$TMPFILE" | cut -f2 -d">" | cut -f1 -d"<"`
STATE=`grep "<groundspeak:state>.*<\/groundspeak:state>" "$TMPFILE" | perl -ne 'm/>([^<>]+?)<\// && print$1."\n"'`
# Now remove XML from the GC file
DATE=`grep "groundspeak:date" "$GCFILE" | cut -f2 -d">" | cut -f1 -d"<" | cut -f1 -dT`
TIME=`grep "groundspeak:date" "$GCFILE" | cut -f2 -d">" | cut -f1 -d"<" | cut -f2 -dT | cut -f1 -dZ`
sed -i '/groundspeak:log/d' "$GCFILE"
sed -i '/groundspeak:date/d' "$GCFILE"
sed -i '/groundspeak:type/d' "$GCFILE"
sed -i '/groundspeak:finder/d' "$GCFILE"
sed -i 's/<groundspeak:text encoded="False">//g' "$GCFILE"
sed -i 's/<groundspeak:text encoded="True">//g' "$GCFILE"
sed -i 's/<\/groundspeak:text>//g' "$GCFILE"
# Insert variables into the new GC file
echo "      		...Converting File"
sed -i "1i\Listing Name: $URLNAME" "$GCFILE"
sed -i "2i\GCCODE: $GCCODE" "$GCFILE"
sed -i "3i\Found on $DATE at $TIME" "$GCFILE"
sed -i "4i\Placed by: $OWNER" "$GCFILE"
sed -i "5i\Size: $SIZE (Difficulty: $DIFF / Terrain: $TERR)" "$GCFILE"
sed -i "6i\Location: $STATE, $COUNTRY" "$GCFILE"
sed -i "7i\Geocaching.com:$LINK" "$GCFILE"
sed -i "8i\ " "$GCFILE"
mv "$GCFILE" "$PUBLISH"
touch new
COUNTER=$((COUNTER-1))
fi
fi
done < "$GCLIST"
if [ "$COUNTER" -eq 0 ]; then
echo ""
echo "			Reached 30 post limit!"
echo ""
fi
# Publish the new GC logs to Blogger
if [ -f new ]; then
echo ""
echo -n "Do you want to publish to Blogger (y/n)? "
read ANSWER
if [ "$ANSWER" = "y" ]; then
echo ""
echo "	Publishing to Blogger..."
echo ""
egrep -H "Found on [12][0-9][0-9][0-9]-" "$PUBLISH"/* | sort -k 3 | cut -f1 -d: | while read CODE; do
CACHE=`grep "Listing Name: " "$CODE" | cut -f2 -d:`
GC=`grep "GCCODE: " "$CODE" | cut -f2 -d:`
sed -i '/Listing Name: /d' "$CODE"
sed -i '/GCCODE: /d' "$CODE"
#google blogger post --blog "$BLOG" --title "$GC: $CACHE" --user "$USER" --tags "$TAGS" "$CODE"
echo "blogger post --blog $BLOG --title $GC: $CACHE --user $USER --tags $TAGS $CODE"
mv "$CODE" "$EXPORT"
echo "		Published: $CODE"
echo "$GC" >> "$EXCLUDES"
done
echo ""
echo "                  New logs published!"
else
echo ""
echo "                  Not published!"
fi
echo ""
else
echo "			No new logs."
fi
echo ""
rm -f *.out
rm -f *.tmp
rm -f "$EXPORT"/*.tmp
rm -f new
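Most of the value extraction in the loop repeats the same grep | cut pattern. One possible refactoring is a single helper, sketched here assuming (as in Pocket Query output) that each tag opens and closes on the same line. gpx_value is a hypothetical name, and unlike the original greps it returns only the first match in the file.

```shell
#!/bin/bash
# Hypothetical helper: print the text inside the first <TAG ...>...</TAG>
# pair found in FILE. Assumes opening and closing tags share one line.
gpx_value() {
  local tag="$1" file="$2"
  # [ >] lets the opening tag carry attributes, e.g. encoded="False"
  grep -m 1 "<$tag[ >]" "$file" |
    sed -e "s/.*<$tag[^>]*>//" -e "s/<\/$tag>.*//"
}
```

For example, OWNER=$(gpx_value groundspeak:placed_by "$TMPFILE") would stand in for the OWNER line in the script.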

Curious Problem with Cork City Locale Geocaching Pocket Query

If you’re a premium member of Groundspeak’s geocaching.com, you will be aware of the ability to create pocket queries. These are really useful for throwing up to 1000 waypoints onto your GPSr for a day’s, or even a holiday’s, caching.

As some of you know, we were in Ireland recently (the reason for the lack of posts), and I decided to take advantage of Stena Ferries’ free satellite wifi on the boat. I set a PQ for Cork and waited for it to build. When the query was ready I saved it as usual to my shared Dropbox folder and then attached my Garmin 60CSx, but despite loading fine onto my CacheBerry, the PQ just wouldn’t upload to the Garmin.

Puzzled, I tried an older PQ and verified that my script was working and that the Garmin wasn’t having some kind of software fault. The next most obvious explanation was that the file had been corrupted during download; after all, it was a wireless connection via satellite on a moving ferry. I tested this theory by downloading the PQ again. Then I downloaded one I’d created earlier. The earlier one (Watford) worked; the Cork one didn’t.

Then I decided to test the website’s PQ builder itself, in case there was a technical problem. I checked the forums but nothing had been reported, so I posted a thread, which didn’t go anywhere. I then tried creating an entirely new PQ of Leicester (never run before). I downloaded it, and Leicester loaded onto my GPS with no problem.

So there was something local to Cork that was borking my PQ, and it had to be in the file itself. I didn’t have much time left, so I tried splicing the file with my newly created Leicester PQ. I opened up the GPX file and chopped out the header:

<?xml version="1.0" encoding="utf-8"?>
<gpx xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" version="1.0" creator="Groundspeak Pocket Query" xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd http://www.groundspeak.com/cache/1/0 http://www.groundspeak.com/cache/1/0/cache.xsd" xmlns="http://www.topografix.com/GPX/1/0">
<name>Cork</name>
<desc>Geocache file generated by Groundspeak</desc>
<author>Groundspeak</author>
<email>contact@groundspeak.com</email>
<time>2010-06-16T03:12:46.2642819Z</time>
<keywords>cache, geocache, groundspeak</keywords>
<bounds minlat="51.551967" minlon="-9.37105" maxlat="52.428967" maxlon="-7.593717" />

and footer:

</gpx>

I then copied the remaining xml and pasted it below the header in the Leicester PQ, then saved this and zipped it back up.
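The same splice can be done from the command line. Below is a sketch, assuming (as in the header shown above) that each header element sits on its own line with the bounds line last, and that the closing gpx tag is on a line of its own; splice_gpx is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: write to stdout a GPX file made of HEADER_GPX's header
# (up to and including its <bounds .../> line) plus BODY_GPX's waypoints.
splice_gpx() {
  local header_gpx="$1" body_gpx="$2"
  # Header: print lines until the first <bounds line, then stop
  sed '/<bounds /q' "$header_gpx"
  # Body: drop everything up to and including <bounds, and the </gpx> line
  sed -e '1,/<bounds /d' -e '/<\/gpx>/d' "$body_gpx"
  echo '</gpx>'
}
```

Used as splice_gpx Leicester.gpx Cork.gpx > Cork-spliced.gpx, the result still carries Leicester’s name and bounds in its header, just like the manual splice.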

I ran my script and uploaded it to my 60CSx, and it worked!

I can’t see anything wrong with the header info above, but this is where the error seems to lie. Maybe it’s something to do with the coordinates in the bounds tag, or perhaps some hidden character I missed. Does anyone more versed than me in XML and the WGS84 datum have an idea?
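One way to hunt for a hidden character is to list every line containing bytes outside printable ASCII. This sketch assumes GNU or BSD grep and bash’s $'...' quoting; find_odd_bytes is a hypothetical name. Legitimate accented characters (in Irish place names, say) will be reported too, so the output still needs a human eye.

```shell
#!/bin/bash
# Hypothetical helper: show, with line numbers, every line of FILE that
# contains a control character or non-ASCII byte (e.g. a UTF-8 byte order
# mark). Tabs and the CR of Windows line endings are ignored.
find_odd_bytes() {
  # LC_ALL=C makes [:print:] mean printable ASCII only; -a forces text mode;
  # cat -v renders the offending bytes visibly (^A, M-oM-;M-?, ...)
  LC_ALL=C grep -an $'[^[:print:]\t\r]' "$1" | LC_ALL=C cat -v
}
```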

Oddly enough, subsequent PQs for Cork after we arrived worked too.

Twitpic from the Command Line in Linux

I’ve just put together a short script for automatically taking a webcam pic then uploading it to Twitter.

Yes, I know what you’re thinking, no I don’t know why either. I had help from other places that I found through googling. Here it is in case you want to try it out:

#!/bin/bash

# Script to automate posting of webcam images to twitpic

IMGDIR="/location_to_save_webcam_pics"
STAMP=`date +%m%d%Y_%s`
IMG="$IMGDIR/webcam_$STAMP.jpeg"
USERNAME="twitter_username"
PASSWORD="twitter_password" # Don't forget to escape special chars with "\"
MESSAGE="Webcam snapshot, `date`"

# Take picture from webcam
streamer -o "$IMG"

# Upload to twitpic using curl
curl -F "username=$USERNAME" -F "password=$PASSWORD" -F "message=$MESSAGE" -F media=@"$IMG" http://twitpic.com/api/uploadAndPost

I really need to find a way to post code on Blogger. The next step is to add this script to a cron job and have it post random pics at intervals.
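For the cron idea, one way to make the pictures arrive at irregular times is to let cron start a wrapper every hour that first sleeps for a random number of seconds. Here is a sketch using bash’s $RANDOM; random_delay is a hypothetical name, and the 30-minute default is arbitrary.

```shell
#!/bin/bash
# Hypothetical helper: print a random whole number of seconds in [0, MAX).
# $RANDOM is 0..32767, so MAX should stay well below that.
random_delay() {
  local max="${1:-1800}"   # default: up to 30 minutes
  echo $(( RANDOM % max ))
}
```

The wrapper would run sleep "$(random_delay 1800)" and then the webcam script, with a crontab line such as 0 * * * * /path/to/wrapper.sh (the path is a placeholder).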

I’m now trying to think of other interesting things I can send to Twitter from Ubuntu.

Firefox Ubiquity and Geocache GC Code search

I really like Mozilla’s new Ubiquity extension for Firefox.

It’s like having a command line for the web. I can’t see too many users adopting it as the command line is too scary for most casual and Windows users (even though, ironically, it’s easier to use).

However, since it’s an open platform, I thought I’d have a go at my first Firefox extension. I’m not a software developer, and outside of building MS Access DBs in my youth I haven’t really done much development.

But I did manage to come up with a small command for searching geocaching.com via coord.info. It’s pretty basic, but I’m quite proud of it!

Subscribe to it here: http://gist.github.com/234537

I’d welcome anyone who wants to extend it, as I’m not really hot on JavaScript.

Hopefully it will be of some use to a few geocachers.
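The Ubiquity command itself is JavaScript and lives in the gist. The idea behind it, turning a GC code into a coord.info short link that redirects to the cache page, can be sketched in shell as well; gc_url is a hypothetical name, and the validation rule (GC followed by letters and digits) is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: print the coord.info link for a geocache code,
# upper-casing it first (gc1abcd -> GC1ABCD). Rejects anything that is not
# "GC" followed by at least one letter or digit.
gc_url() {
  local code
  code=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$code" in
    *[!0-9A-Z]*) echo "not a GC code: $1" >&2; return 1 ;;
    GC?*)        printf 'http://coord.info/%s\n' "$code" ;;
    *)           echo "not a GC code: $1" >&2; return 1 ;;
  esac
}
```

On a desktop, xdg-open "$(gc_url gc1abcd)" would then open the cache page in the default browser.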

Edited 14/11/09: Accidentally deleted the first gist. Added new link.

GSAK and GPSBabel

I run Ubuntu and, predictably, there’s a dearth of support for Geocaching on Linux-based systems. It’s not that there aren’t open source alternatives; it’s just that they’re not quite designed for caching.

A truly great piece of shareware for Geocaching on Windows is GSAK, but there is no official Linux version and after 21 days you get a nag screen unless you pony up for the registered version. So imagine my surprise to find that GSAK relies upon an open source technology called GPSBabel to do the actual leg-work of exporting data to your device from your OS.

Basically GSAK is the glossy database that goes over the top, and it is a fine tool, don’t get me wrong. Anyway, I found a way to make GSAK work on Linux by replacing the gpsbabel.exe file with an executable script named “gpsbabel.exe”. I got the original info from a GSAK forum post by alancurry.

alancurry’s script didn’t work exactly, I had to make some tweaks:

#!/bin/bash
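# GSAK calls this in place of gpsbabel.exe; dispatch on the first argument
case "$1" in
-i)
# Read waypoints from the Garmin over USB into GSAK's temporary GPX file
gpsbabel -i garmin -f usb:0 -o gpx -F "$HOME/.wine/drive_c/Program Files/gsak/Temp/babel.gpx"
exit 0
;;
-N)
# Send GSAK's exported babel.txt (read with the GSAK.STL style) to the Garmin over USB
gpsbabel -N -i xcsv,style="$HOME/.wine/drive_c/Program Files/gsak/GSAK.STL" -f "$HOME/.wine/drive_c/windows/profiles/{username}/Application Data/gsak/babel.txt" -vs -o garmin -F usb: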
exit 0
;;
*)
;;
esac

I then saved this script as “gpsbabel.exe” in “$HOME/.wine/drive_c/Program Files/gsak/”, in place of the original, and made it executable (chmod +x).

Right now, after using it for 100+ days, I’m getting bothered by the nag screen (‘The GSAK Spouse’), but I’m not going to pay to register it, for a couple of reasons:

  • The developer appears to have no intention of supporting Linux.
  • Since it’s buggy under Wine (it crashes often), I only use it to load PQs and export them to my GPSmap 60CSx. I don’t save the data; I usually wipe it and start over each week.

So now I’m thinking of finding a quicker way to access the GPX file from the command line, order it like GSAK does and then export directly without having to load up a buggy, nagging GSAK UI.
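As a starting point for skipping GSAK altogether, GPSBabel can write a GPX file straight to a USB Garmin. This is a minimal sketch, assuming gpsbabel is installed natively and the GPS is on USB; send_gpx and DRY_RUN are hypothetical names, and no GSAK-style sorting is attempted.

```shell
#!/bin/bash
# Hypothetical helper: send every waypoint in a GPX file to a USB Garmin
# with GPSBabel. Set DRY_RUN=1 to print the command instead of running it.
send_gpx() {
  local gpx="$1"
  [ -f "$gpx" ] || { echo "no such file: $gpx" >&2; return 1; }
  # -i/-f: input format and file; -o/-F: output format and device
  local -a cmd=(gpsbabel -i gpx -f "$gpx" -o garmin -F usb:)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running DRY_RUN=1 send_gpx Cork.gpx first is a cheap way to check the command before the Garmin is plugged in.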

If I find a way to make it work, maybe I’ll post it as an open source project and invite others to contribute. I’d need to if I were to try and build a GUI, since I know nothing of GUI development.