As a developer, there are certain UNIX commands you find yourself typing repeatedly. Whether it’s debugging a production issue or just modifying some files, these commands have helped me do my job time and time again. Here are my top 8:

  1. grep – Prints the lines that match the pattern provided in the files specified
    • Usage: grep <options> <pattern> <files>
    • Example: grep -n Exception production.log
      • Prints all the lines (showing line numbers) in the file production.log that contain the string ‘Exception’
  2. tail – Only interested in the last couple of lines in a file? tail allows you to quickly view the end of the file
    • Usage: tail <options> <file>
    • Example: tail -fn100 production.log
      • Shows the last 100 lines of the log and waits to display any new text appended to the file
  3. ssh – Log into remote servers
    • Usage: ssh -p<port> <username>@<hostname>
    • Example: ssh -p1234 theo@production
      • Logs into the server named production on port 1234
  4. scp – Copies files to/from remote servers
    • Usage: scp -P<port> <source> <target>
    • Example: scp -P1234 /home/theo/myfile.txt production:/home/jsmith
      • Copies myfile.txt from /home/theo to the server named production under /home/jsmith
  5. rm – Deletes stuff!
    • Usage: rm <options> <file>
    • Example: rm -rf mydir
      • Removes the entire directory and files with no prompt for confirmation (Use with caution!)
  6. ps – Shows process status
    • Usage: ps <options>
    • Example: ps aux
      • Displays the status of processes for all users, including those that are not controlled by a terminal (such as system daemons). On some systems (BSD-style ps) the output is sorted by CPU usage
  7. top – Similar to ps but it periodically updates the information such as CPU and memory usage
    • Usage: top
    • Example: top (duh!)
  8. kill – Terminates a process
    • Usage: kill <option> <pid>
    • Example: kill -9 12345
      • Terminates the process with id of 12345 using a non-catchable, non-ignorable signal (that just means you REALLY mean to kill it)

I use lots of these commands in combination. For example, if tomcat seems to hang and won’t properly shut down I would do the following:

  >> ps aux | grep tomcat

I would then take the pid of tomcat and run:

  >> kill -9 <tomcat-pid>

Now you may be wondering why the “Top 8”, why not “Top 10”. Well, because 8 is the new 10 and those are all the UNIX commands I know :) .

What are some of the commands that you use to get through the day?

Posted in Uncategorized at January 31st, 2008.

I wanted to go through the exercise of contributing to open source with a project of my own. After thinking about it for probably 15 minutes, I decided I wanted to try to build my own caching system in Java. Too bad I knew next to nothing about caching. I went off and did some research.

There are certain well-known algorithms that have become popular for implementing caches. Given that caches have a finite size (you run out of either disk space or memory), cache algorithms are used to manage the cache. These algorithms determine things like how long an item remains in the cache and what gets booted out of the cache when it reaches its maximum size. According to Wikipedia, the most efficient caching algorithm “would be to always discard the information that will not be needed for the longest time in the future”. You need to take a look at the data you want to cache before deciding on a caching strategy. Do you need to support random access (the access to the data is uniformly distributed) or sequential access (you’re interested in large chunks of data at a time)? Is certain data accessed more often than other pieces of data?

Here are a few common algorithms:

  • Least Recently Used (LRU) – the items that haven’t been accessed for the longest time get the boot first. One way to implement this is to keep a last-access timestamp for every item in the cache. Check out this simple LRU implementation.
  • Least Frequently Used (LFU) – the items that are sitting in the cache but have been accessed the least are booted out first. This is implemented by a counter to see how often an item is accessed.
  • First In First Out (FIFO) – the item that first entered the cache is the first to go when it gets full. This can be easily implemented by a queue.
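As a rough illustration of the LRU and FIFO policies above, here is a minimal sketch built on java.util.LinkedHashMap. Note that it swaps in a different technique from the timestamp approach described in the list: LinkedHashMap keeps its entries in a linked list ordered by access (or by insertion), so no explicit timestamps are needed. The class name BoundedCache and its factory methods are made up for this example, and the sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class for illustration: a size-bounded cache on top of LinkedHashMap.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    // accessOrder = true  -> eldest entry is the least recently accessed one (LRU)
    // accessOrder = false -> eldest entry is the first one inserted (FIFO)
    private BoundedCache(int maxEntries, boolean accessOrder) {
        super(16, 0.75f, accessOrder);
        this.maxEntries = maxEntries;
    }

    static <K, V> BoundedCache<K, V> lru(int maxEntries) {
        return new BoundedCache<>(maxEntries, true);
    }

    static <K, V> BoundedCache<K, V> fifo(int maxEntries) {
        return new BoundedCache<>(maxEntries, false);
    }

    // Called by LinkedHashMap after every insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}

class BoundedCacheDemo {
    public static void main(String[] args) {
        BoundedCache<String, Integer> lru = BoundedCache.lru(2);
        lru.put("a", 1);
        lru.put("b", 2);
        lru.get("a");      // "a" becomes the most recently used entry
        lru.put("c", 3);   // cache is full, so the least recently used entry "b" is evicted
        System.out.println(lru.keySet());  // [a, c]

        BoundedCache<String, Integer> fifo = BoundedCache.fifo(2);
        fifo.put("a", 1);
        fifo.put("b", 2);
        fifo.get("a");     // reads do not change insertion order
        fifo.put("c", 3);  // the first entry in, "a", is the first one out
        System.out.println(fifo.keySet()); // [b, c]
    }
}
```

The only difference between the two policies here is the accessOrder flag passed to the LinkedHashMap constructor, which makes it easy to compare them against the same workload.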

Of course, there are projects like EHCache and OSCache out there that have addressed this issue.

OSCache provides a FIFO and a LRU implementation of a cache.

In addition to FIFO and LRU, EHCache provides a LFU implementation of a cache.
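To make the counter idea behind LFU concrete, here is a minimal sketch of my own. It is not how EHCache implements it; the class name LfuCache is hypothetical, eviction is a simple linear scan over the counters, ties are broken arbitrarily, and there is no thread safety.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration: evicts the entry with the lowest access count.
class LfuCache<K, V> {
    private final int maxEntries;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> accessCounts = new HashMap<>();

    LfuCache(int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be at least 1");
        }
        this.maxEntries = maxEntries;
    }

    V get(K key) {
        if (!values.containsKey(key)) {
            return null; // cache miss
        }
        accessCounts.merge(key, 1, Integer::sum); // count the hit
        return values.get(key);
    }

    void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= maxEntries) {
            evictLeastFrequentlyUsed();
        }
        values.put(key, value);
        accessCounts.merge(key, 1, Integer::sum); // a put counts as an access
    }

    boolean containsKey(K key) {
        return values.containsKey(key);
    }

    // Linear scan for the smallest counter; fine for a sketch, slow for a big cache.
    private void evictLeastFrequentlyUsed() {
        K victim = null;
        int fewest = Integer.MAX_VALUE;
        for (Map.Entry<K, Integer> entry : accessCounts.entrySet()) {
            if (entry.getValue() < fewest) {
                fewest = entry.getValue();
                victim = entry.getKey();
            }
        }
        values.remove(victim);
        accessCounts.remove(victim);
    }
}

class LfuCacheDemo {
    public static void main(String[] args) {
        LfuCache<String, Integer> cache = new LfuCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");
        cache.get("a");     // "a" now has 3 accesses, "b" has 1
        cache.put("c", 3);  // cache is full, so "b" is evicted
        System.out.println(cache.containsKey("a")); // true
        System.out.println(cache.containsKey("b")); // false
        System.out.println(cache.containsKey("c")); // true
    }
}
```

One weakness this sketch makes visible: a brand new item starts with the lowest count, so it is the most likely candidate for the next eviction unless it gets accessed again quickly.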

Thinking about how these algorithms work, it is easy to see that there are certain cases where using one over the other provides a great advantage. Take LRU, which seems to be the most widely accepted and used caching algorithm: it works great when the majority of the hits go to a very concentrated group of items. This way, most hits, if not all, are served from the cache. However, if there is a large scan of all the data, once the cache reaches its max size LRU will evict an item on every access. If the cache can hold a max of 50 items and you have 100 records, then as you iterate over the 100 records the cache will throw out the first 50 records to make room for the second half. Repeat the scan and the same thing happens again, resulting in lots of adding to and removing from the cache and 0 cache hits. Algorithms that prevent this from happening, like LFU, are known as scan-resistant.
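The 50-item cache and 100-record scan can be simulated in a few lines. This is a sketch under my own assumptions: the LRU cache is an access-ordered java.util.LinkedHashMap, the class and method names (LruScanDemo, countHits) are invented for the example, and the records are just placeholder strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruScanDemo {
    // Reads keys 0..records-1 in order, `passes` times, through an LRU cache
    // of the given capacity, and returns how many reads were cache hits.
    static int countHits(int capacity, int records, int passes) {
        Map<Integer, String> cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
        int hits = 0;
        for (int pass = 0; pass < passes; pass++) {
            for (int key = 0; key < records; key++) {
                if (cache.get(key) != null) {
                    hits++;
                } else {
                    cache.put(key, "record-" + key); // miss: load the record into the cache
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 100 records through a 50-item cache: every read is a miss, on every pass.
        System.out.println(countHits(50, 100, 3)); // 0
        // 40 records fit in the cache: only the first pass misses.
        System.out.println(countHits(50, 40, 3));  // 80
    }
}
```

By the time the scan wraps around to record 0, that record was evicted long ago to make room for records 50 through 99, so the cache never gets a chance to pay off.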

I was interested in finding out if there was some middle ground that gave me the best of both worlds, LRU and LFU. It turns out there is.

The algorithm is known as Adaptive Replacement Cache (ARC). It gives you the benefits of LRU while also doing a balancing act to prevent data scans from polluting the cache. It does this by keeping track of two lists, one for recently referenced items and another for frequently referenced items. If you read about it, it’s a pretty cool algorithm.

I was excited when I came across this algorithm because I thought it would make such a fine addition as an open source project. And then I discovered it was patented. Apparently, PostgreSQL already went through this exercise and deemed it safer to not use it.

So, now I’m thinking I need a new idea for a project.

Posted in geekery at January 28th, 2008.

Thanks to the WordPress plugin found here, I’ve been able to import my old blogger posts to WordPress. So far, it’s looking good!

Posted in Uncategorized at January 27th, 2008.

After graduating college as a computer science major and preparing to enter the work force, I found that there is a big gap between what you learn in college and the practice of being a software engineer.

In college, you learn the basics of data structures, OOP, and algorithms. But being a software engineer is so much more than that. At my first job out of college, I was lucky enough to be a part of a great team of engineers. They understood the value of best practices, introducing me to design patterns, the power of open source, and how to approach hard problems.

However, it was very one dimensional. While I improved my technical experience, I really didn’t get a chance to be exposed to other perspectives. What I mean is it was very much like college again. My work was my own work. I studied, researched, and provided answers. I didn’t really collaborate with my coworkers. I was given an assignment and an estimated deadline and I would do it kind of like homework.

At my second and current job, I’ve been introduced to and been practicing agile software development, more specifically Scrum, on a daily basis. I’ve seen the HUGE personal benefits that a software engineer, like myself, can take away from developing software this way.

I’ll break it down through what is known as the four values of Agile Software Development.

Individuals and interactions over processes and tools
While being exposed to version control, bug tracking, and continuous integration systems is great for a resume, working with other human beings is much more rewarding and fun. You develop strong relationships and you are able to learn so much from other people’s experiences and perspectives. Pair programming has helped me develop a better understanding of what I don’t know and an even stronger understanding of what I already know. And it’s not only other developers you develop these relationships with. As requirements change, you interact with people from marketing, product, legal, QA, project managers, designers, and the list goes on. This is something you would never get as a programmer at an IT shop that practiced something like the Waterfall method for development, especially early in your career. You will be pigeon-holed into just being a code monkey by the time the requirements get to you. Being able to interact with different types of people allows you to grow your network and it exposes you to different aspects of a business.

Working software over comprehensive documentation
As developers, we just like to get things done. We have a tendency to measure productivity with written code. With agile, we can embrace this 100%. I get to concentrate on writing bug fixes, adding new features, refactoring code, and improving the build process rather than documenting some API that no one will ever need since the code is self-documenting. I take great joy in actually doing what I love doing and that’s coding. However, it is a fallacy to interpret this value to mean I don’t have to document anything. But whenever there is no documentation, don’t blame agile. Just rely on the first value and ask someone what that method does.

Customer collaboration over contract negotiation
Developers don’t usually have face-to-face time with actual paying customers. Usually the “customer” is some other department in the company asking for a new development feature. The main point to be made here is that you want to engage the stakeholders. Not only do you develop relationships, you gather in-depth knowledge of what problems other people are facing on a daily basis. Understanding requirements is often the most difficult part of developing software. As requirements change, with constant engagement from the customer you have a better chance at meeting those requirements. The other piece of this value has to do with deadlines. Contracts are hard deadlines. In software development, with all its moving pieces, it’s pretty silly to commit to arbitrary dates six months from today. But if you engage stakeholders in your development process, they have clear visibility into the progress of the project. You don’t have to make excuses for why the project is late. They already know what you’re working on and where you are. How does this benefit you? Well, you get to deliver software that actually meets the customer’s needs. Who’s the rock star now?

Responding to change over following a plan
Nothing is ever set in stone. Being able to respond to the evolution of a project is critical to a project’s success. If you are able to learn to design programs that have the flexibility to adapt to your business, that is one awesome accomplishment. It’s a great feeling to be able to say to a product manager that his requested change is already supported by the system and is just a configuration change. The language you are using now is most likely not the one you will be using in the future, if you choose to remain a developer. Being able to respond to this change in sought-after skills will serve you well.

Through agile development, I’ve been able to deliver working software time and time again. I’ve been exposed to all different aspects of the business. I’ve learned what I like and don’t like to do. I’ve learned which pieces of the business I’m interested in and which pieces I don’t care much for. I’ve developed some really good working relationships. I’ve tackled some hard problems. I’ve learned to respond and adapt to the change and turmoil of a startup.

Most importantly, I still feel I’m growing as a developer. I honestly believe the best thing a developer can do in their career is to always be learning. Everything else will follow.

Posted in Uncategorized at January 24th, 2008.


Wiki wiki wiki. Neat-o!

Posted in Uncategorized at January 21st, 2008.