Top 8 Unix Commands for the Developer

As a developer, there are certain UNIX commands you find yourself typing repeatedly. Whether I’m debugging a production issue or just modifying some files, these commands have helped me do my job time and time again. Here’s my top 8:

  1. grep – Prints the lines that match the pattern provided in the files specified
    • Usage: grep <options> <pattern> <files>
    • Example: grep -n Exception production.log
      • Prints all the lines (showing line numbers) in the file production.log that contain the string ‘Exception’
  2. tail – Only interested in the last couple of lines in a file? tail allows you to quickly view the end of the file
    • Usage: tail <options> <file>
    • Example: tail -fn100 production.log
      • Shows the last 100 lines of the log and waits to display any new text appended to the file
  3. ssh – Log into remote servers
    • Usage: ssh -p<port> <username>@<hostname>
    • Example: ssh -p1234 <username>@production
      • Logs into the server named production on port 1234
  4. scp – Copies files to/from remote servers
    • Usage: scp -P<port> <source> <target>
    • Example: scp -P1234 /home/theo/myfile.txt production:/home/jsmith
      • Copies myfile.txt from /home/theo to the server named production under /home/jsmith
  5. rm – Deletes stuff!
    • Usage: rm <options> <file>
    • Example: rm -rf mydir
      • Removes the entire directory and files with no prompt for confirmation (Use with caution!)
  6. ps – Shows process status
    • Usage: ps <options>
    • Example: ps aux
      • Displays the status of processes for all users, including those that are not controlled by a terminal (such as daemons), along with each process’s CPU and memory usage
  7. top – Similar to ps but it periodically updates the information such as CPU and memory usage
    • Usage: top
    • Example: top (duh!)
  8. kill – terminates a process
    • Usage: kill <option> <pid>
    • Example: kill -9 12345
      • Terminates the process with id of 12345 using a non-catchable, non-ignorable signal (that just means you REALLY mean to kill it)

I use lots of these commands in combination. For example, if tomcat seems to hang and won’t properly shut down I would do the following:

  >> ps aux | grep tomcat

I would then take the pid of tomcat and run:

  >> kill -9 <tomcat-pid>

Now you may be wondering why the “Top 8” and not the “Top 10”. Well, because 8 is the new 10 and those are all the UNIX commands I know :).

What are some of the commands that you use to get through the day?

Open Source and Caching Algorithms

I wanted to go through the exercise of contributing to open source with a project of my own. After thinking about it for probably 15 minutes, I decided I wanted to try to build my own caching system in Java. Too bad I knew next to nothing about caching. I went off and did some research.

There are certain well-known algorithms that have become popular when implementing caches. Given that caches have a finite size (eventually you run out of disk space or memory), cache algorithms are used to manage the cache. These algorithms determine things like how long an item remains in the cache and what gets booted out when the cache reaches its maximum size. According to Wikipedia, the most efficient caching algorithm “would be to always discard the information that will not be needed for the longest time in the future”. Since you can’t see the future, real algorithms can only approximate this, so you need to take a look at the data you want to cache before deciding on a caching strategy. Do you need to support random access (the access to the data is uniformly distributed) or sequential access (you’re interested in large chunks of data at a time)? Is certain data accessed more often than other pieces of data?

Here are a few common algorithms:

  • Least Recently Used (LRU) – the items that haven’t been accessed the longest get the boot first. This is implemented by keeping a timestamp for all items in the cache. Check out this simple LRU implementation.
  • Least Frequently Used (LFU) – the items that are sitting in the cache but have been accessed the least are booted out first. This is implemented by a counter to see how often an item is accessed.
  • First In First Out (FIFO) – the item that first entered the cache is the first to go when it gets full. This can be easily implemented by a queue.
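To make the LRU idea concrete, here is a minimal Java sketch. Instead of the per-item timestamps described above, it leans on the JDK’s LinkedHashMap in access-order mode, which keeps entries ordered from least to most recently accessed. The class names LruCache and LruCacheDemo are made up for illustration, and this is a sketch rather than a production cache (it is not thread-safe, for one).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name: a bounded map that evicts the least recently used entry.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // LinkedHashMap calls this after each insertion; returning true drops the eldest entry.
        return size() > maxEntries;
    }
}

class LruCacheDemo {
    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" is now the most recently used entry
        cache.put("c", 3);  // over capacity, so "b" (least recently used) is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Passing false as the third constructor argument would keep entries in insertion order instead, which turns the same class into a FIFO cache.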

Of course, there are projects like EHCache and OSCache out there that have addressed this issue.

OSCache provides a FIFO and an LRU implementation of a cache.

In addition to FIFO and LRU, EHCache provides an LFU implementation of a cache.
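The counter-based bookkeeping behind LFU can be sketched in a few lines of Java. This is only an illustration of the idea, not how EHCache does it: the LfuCache class and its methods are hypothetical, eviction scans every counter (so it is O(n)), and ties between equally unpopular items are broken arbitrarily.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name: evicts the key with the smallest access count.
class LfuCache<K, V> {
    private final int maxEntries;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> hits = new HashMap<>();

    LfuCache(int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be at least 1");
        }
        this.maxEntries = maxEntries;
    }

    V get(K key) {
        if (!values.containsKey(key)) {
            return null; // cache miss
        }
        hits.merge(key, 1, Integer::sum); // count this access
        return values.get(key);
    }

    void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= maxEntries) {
            evictLeastFrequentlyUsed();
        }
        values.put(key, value);
        hits.merge(key, 1, Integer::sum); // in this sketch a put counts as an access
    }

    boolean containsKey(K key) {
        return values.containsKey(key);
    }

    private void evictLeastFrequentlyUsed() {
        K victim = null;
        int fewest = Integer.MAX_VALUE;
        // Linear scan over the counters to find the least frequently used key.
        for (Map.Entry<K, Integer> entry : hits.entrySet()) {
            if (entry.getValue() < fewest) {
                fewest = entry.getValue();
                victim = entry.getKey();
            }
        }
        values.remove(victim);
        hits.remove(victim);
    }
}

class LfuCacheDemo {
    public static void main(String[] args) {
        LfuCache<String, Integer> cache = new LfuCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");
        cache.get("a");     // "a" now has more accesses than "b"
        cache.put("c", 3);  // cache is full, so "b" is evicted
        System.out.println(cache.containsKey("a") + " " + cache.containsKey("b")); // prints true false
    }
}
```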

Thinking about how these algorithms work, it is easy to see that there are certain cases where using one over the other provides a great advantage. Take LRU, which seems to be the most widely accepted and used caching algorithm. It works great when the majority of the requests go to a small, concentrated group of items, so most requests, if not all, are served from the cache. However, if there is a large scan of all the data, once the cache reaches its max size LRU will evict an item on every miss. If the cache can hold a max of 50 items and you repeatedly iterate over 100 records in order, the cache will throw out the first 50 records to make room for the second 50, and then throw those out again on the next pass, resulting in lots of adding to and removing from the cache and 0 cache hits. Algorithms that prevent this from happening, like LFU, are known as scan-resistant.
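The 50-item cache and 100-record scan can be simulated with a short Java sketch. It assumes the records are read in the same order on every pass and uses an access-ordered LinkedHashMap as a stand-in LRU cache; the LruScanDemo class, its method names, and the "record-N" values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruScanDemo {
    // Builds an access-ordered LinkedHashMap that evicts the least recently
    // used entry once it grows past the given capacity.
    static Map<Integer, String> newLruCache(final int capacity) {
        return new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity;
            }
        };
    }

    // Reads records 0..recordCount-1 in order, `passes` times,
    // and returns the number of cache hits.
    static int scan(Map<Integer, String> cache, int recordCount, int passes) {
        int hits = 0;
        for (int pass = 0; pass < passes; pass++) {
            for (int id = 0; id < recordCount; id++) {
                if (cache.get(id) != null) {
                    hits++;
                } else {
                    cache.put(id, "record-" + id); // miss: stand-in for loading the record
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 100 records through a 50-item cache: every read is a miss.
        System.out.println(scan(newLruCache(50), 100, 2)); // prints 0
        // 40 records fit in the cache, so the whole second pass hits.
        System.out.println(scan(newLruCache(50), 40, 2));  // prints 40
    }
}
```

By the time the scan wraps around to record 0, that record has already been pushed out by records 50 through 99, so the cache never gets a chance to pay off.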

I was interested in finding out whether there was some middle ground that gave me the best of both worlds, LRU and LFU. It turns out there is.

The algorithm is known as Adaptive Replacement Cache (ARC). It gives you the benefits of LRU while doing a balancing act to prevent data scans from polluting the cache. It does this by keeping track of two lists, one for recently referenced items and another for frequently referenced items. If you read about it, it’s a pretty cool algorithm.

I was excited when I came across this algorithm because I thought it would make such a fine addition as an open source project. And then I discovered it was patented. Apparently, PostgreSQL already went through this exercise and deemed it safer to not use it.

So, now I’m thinking I need a new idea for a project.

A Case for Agile: Benefits for a Programmer’s Career

After graduating college as a computer science major and preparing to enter the work force, I found that there is a big gap between what you learn in college and the practice of being a software engineer.

In college, you learn the basics of data structures, OOP, and algorithms. But being a software engineer is so much more than that. At my first job out of college, I was lucky enough to be a part of a great team of engineers. They understood the value of best practices, introducing me to design patterns, the power of open source, and how to approach hard problems.

However, it was very one dimensional. While I improved my technical experience, I really didn’t get a chance to be exposed to other perspectives. What I mean is it was very much like college again. My work was my own work. I studied, researched, and provided answers. I didn’t really collaborate with my coworkers. I was given an assignment and an estimated deadline and I would do it kind of like homework.

At my second and current job, I’ve been introduced to and been practicing agile software development, more specifically Scrum, on a daily basis. I’ve seen the HUGE personal benefits that a software engineer, like myself, can take away from developing software this way.

I’ll break it down through what is known as the four values of Agile Software Development.

Individuals and interactions over processes and tools
While being exposed to version control, bug tracking, and continuous integration systems is great for a resume, working with other human beings is much more rewarding and fun. You develop strong relationships and you are able to learn so much from other people’s experiences and perspectives. Pair programming has helped me develop a better understanding of what I don’t know and an even stronger understanding of what I already know. And it’s not only other developers you develop these relationships with. As requirements change, you interact with people from marketing, product, legal, QA, project managers, designers, and the list goes on. This is something you would never get as a programmer at an IT shop that practiced something like the Waterfall method for development, especially early in your career. You will be pigeon-holed into just being a code monkey by the time the requirements get to you. Being able to interact with different types of people allows you to grow your network and it exposes you to different aspects of a business.

Working software over comprehensive documentation
As developers, we just like to get things done. We have a tendency to measure productivity with written code. With agile, we can embrace this 100%. I get to concentrate on writing bug fixes, adding new features, refactoring code, and improving the build process rather than documenting some API that no one will ever need since the code is self-documenting. I take great joy in actually doing what I love doing and that’s coding. However, it is a fallacy to interpret this value to mean I don’t have to document anything. But whenever there is no documentation, don’t blame agile. Just rely on the first value and ask someone what that method does.

Customer collaboration over contract negotiation
Developers don’t usually have face-to-face time with actual paying customers. Usually the “customer” is some other department in the company asking for a new feature. The main point to be made here is that you want to engage the stakeholders. Not only do you develop relationships, you gather in-depth knowledge of what problems other people are facing on a daily basis. Understanding requirements is often the most difficult part of developing software. As requirements change, with constant engagement from the customer you have a better chance at meeting those requirements. The other piece of this value has to do with deadlines. Contracts are hard deadlines. In software development, with all its moving pieces, it’s pretty silly to commit to arbitrary dates six months from today. But if you engage stakeholders in your development process, they have clear visibility into the progress of the project. You don’t have to make excuses for why the project is late. They already know what you’re working on and where you are. How does this benefit you? Well, you get to deliver software that actually meets the customer’s needs. Who’s the rock star now?

Responding to change over following a plan
Nothing is ever set in stone. Being able to respond to the evolution of a project is critical to a project’s success. If you are able to learn to design programs that have the flexibility to adapt to your business, that is one awesome accomplishment. It’s a great feeling to be able to say to a product manager that his requested change is already supported by the system, it’s just a configuration change. The language you are using now is most likely not the one you will be using in the future, if you choose to remain a developer. Being able to respond to this change in sought-after skills will serve you well.

Through agile development, I’ve been able to deliver working software time and time again. I’ve been exposed to all different aspects of the business. I’ve learned what I like and don’t like to do. I’ve learned what pieces of the business I’m interested in and the pieces I don’t care much for. I’ve developed some really good working relationships. I’ve tackled some hard problems. I’ve learned to respond and adapt to the change and turmoil of a startup.

Most importantly, I still feel I’m growing as a developer. I honestly believe the best thing a developer can do in their career is to always be learning. Everything else will follow.

Installing Java SDK/Tomcat to Ubuntu Feisty (Part 2)

In the previous post, I described how to get your Java SDK installed on your machine. In this piece, I’ll lay out how to get your tomcat instance up and running.

Let’s set up our JAVA_HOME. While you can set this in many places, I decided to stick it in bash.bashrc.

My Java installation from apt-get put Java under /usr/lib/jvm/java-6-sun.

sudo vi /etc/bash.bashrc

Enter in:

# set JAVA_HOME environment variable
export JAVA_HOME=/usr/lib/jvm/java-6-sun

Save and quit vi. Log out and log back into your console and you should be able to do

 >>echo $JAVA_HOME
/usr/lib/jvm/java-6-sun

My initial attempt was to install tomcat from apt-get. This resulted in a successful download of all the files, but I wasn’t able to get tomcat to start up. Not only that, it installed tomcat and created symbolic links everywhere and there were permission issues galore. Thanks to the beauty of slicehost, I just rebuilt my machine from my previous snapshot (luckily I made a snapshot after the previous post).

So, I decided to just download and install tomcat manually. I decided on apache-tomcat-5.5.25.

To get started, download the tarball and extract it somewhere.

wget http://apache.cs.utah.edu/tomcat/tomcat-5/v5.5.25/bin/apache-tomcat-5.5.25.tar.gz
tar xzvf apache-tomcat-5.5.25.tar.gz
cp -R apache-tomcat-5.5.25 <PATH_TO_TOMCAT_INSTALL>

You will need to go to the bin directory and set some file permissions so you can execute some of the files.

cd <PATH_TO_TOMCAT_INSTALL>/bin
sudo chmod a+x *.sh

It’s probably a good time to see if things are looking good. So try:

 >> ./version.sh
Using CATALINA_BASE:   /usr/share/tomcat
Using CATALINA_HOME:   /usr/share/tomcat
Using CATALINA_TMPDIR: /usr/share/tomcat/temp
Using JRE_HOME:       /usr/lib/jvm/java-6-sun
Server version: Apache Tomcat/5.5.25
Server built:   Aug 24 2007 05:33:50
Server number:  5.5.25.0
OS Name:        Linux
OS Version:     2.6.16.29-xen
Architecture:   amd64
JVM Version:    1.6.0-b105
JVM Vendor:     Sun Microsystems Inc.

Sexy. Let’s go ahead and try to start up tomcat.

./startup.sh

By default, tomcat should be running on port 8080. (If you tried to install from apt-get, it’s set up on port 8180.)


tomcat default page

Setting up the tomcat manager

sudo vi conf/tomcat-users.xml

We need to set up a manager role to see the tomcat status page:

<tomcat-users>
  ...
  <role rolename="manager"/>
  ...
  <user username="admin" password="mysecretpassword" roles="tomcat,manager"/>
</tomcat-users>

Restart tomcat and confirm you can log into the Tomcat status page and we’re done!

TODOS:

  • I need to set up the system to run tomcat on startup
  • I need to write an init.d script
  • I need to get tomcat to proxy through apache via mod_jk

Installing Java SDK/Tomcat to Ubuntu Feisty (Part 1)

When I first got my slice, I had it set up with java and tomcat. Apparently, I updated or rebuilt my slice and forgot to install those two again. Too bad I didn’t take good notes because I totally forgot how to do it.

So this time, I’m going to document it via this post.

I am currently running Ubuntu Feisty (v. 7.04). You can find out what version you’re running by running the following command:

cat /etc/issue

The usual apt-get command allows me to install sun-j2re1.4. That’s not very helpful if I want to compile java code off of my slice or set it up to be a continuous build server. I remember that I had to add a specific repository in order to be able to install the Java SDK via apt-get. After some digging around, I came up with the following:

sudo vi /etc/apt/sources.list

Paste the following into the file:

deb http://archive.ubuntu.com/ubuntu feisty universe multiverse
deb-src http://archive.ubuntu.com/ubuntu feisty universe multiverse

The Ubuntu docs recommend enabling the universe/multiverse security/updates repositories to prevent version mismatches during installs or upgrades, but I didn’t run into any issues proceeding without them.

Save and exit out of vi.

Next, run

sudo apt-get update

apt-get should be pulling down an updated list of installable packages. So if you type

sudo apt-get install sun-j

and then hit TAB, you should get a list of installable sun packages.

I decided to install Java 6.0 SDK.

sudo apt-get install sun-java6-jdk

It will begin to download and install the SDK. You will have to accept a license. Just hit TAB to select “OK” and hit enter.

Once it completes, you should be able to confirm the JDK was installed successfully.

  >> javac -version
javac 1.6.0

The next post will lay out steps to install Apache Tomcat.

Dynamically targeted links

In the previous post, I quickly created a widgetized blog feed. However, the dynamically generated HTML included links that would open in the same window that the link was in. Under normal circumstances, this is the best user experience. However, since Clearspring renders HTML-based feeds in an iframe, clicking on any of the links in the widget caused the page to be rendered in the small, confined space of 300×236 pixels.

I know I could have used document.getElementsByTagName('a') but I was wondering if there was a nicer way to grab all the links on a page. Googling around pointed me to this post. I can get all the links on a page by using document.links, which returns an array-like collection of the links on a page as DOM objects. I haven’t tested this on all browsers, but that’s pretty sweet.

Here’s the code in its entirety to make all links on a page open up in a new window:

<script type="text/javascript">
var links = document.links;
for( var i = 0 ; i < links.length ; i++ ) {
   links[i].setAttribute('target','_blank');
}
</script>

Fun with widgets

I came across Clearspring a few months ago when they were TechCrunch’d and, more recently, I came across Wigetized.com, made by a fellow slicehoster.

I wanted to try out both and I thought it would be interesting to combine the powers of both services.

I created my widgetized blog feed from wigitized.com, created a widget cname, updated my apache2 config, created a new directory, and pasted in the generated wigetized html into a new file. Oh, and I bounced apache2.

And TA-DA! A widgetized feed!

The next step was to create a new Clearspring widget from this. Clearspring has done a good job of making this ridiculously easy. You have a couple of options for how to create a widget (flash, HTML, js, image). I went with HTML since I already had the HTML; all I had to do was provide the URL to the HTML and a name for the widget.

Once I had the widget added to the Clearspring platform, all I really had to do was publish it and now I have a Clearspring tracked widget.


Next, I’ll be interested in seeing how I can leverage the platform in different ways to make this widget “viral” and what sort of analytics come with Clearspring to help me track its viralocity (did I make that word up?).
