Saturday, March 21, 2015

7 Shell Commands You Must Know When Digging Through Log Files

We've all been there - staring at tons of log lines, trying to find a needle in a haystack and make some sense of it all. I've recently noticed that a lot of people I work with are familiar with only a basic set of commands, usually just 'tail' and 'grep' (and even then, only very basic use of those commands), which makes it hard for them to dig through log files and find what they're really looking for.

In this post, I'll share some of the command-line tools I love to use when digging through log files. I'll only go over the basics; the more advanced stuff can be found in the man pages.
Hope you'll find it useful. Feel free to add your own in the comments section.

Clicking on any of the examples will lead you to the awesome tool explainshell, which explains what Linux commands actually do.


0. cat - Show me what you've got


Well, the use of cat is usually so simple that I considered not putting it here at all. But I decided, for the record, to add it anyway. We'll use cat to print a file's content to the screen. Yes, it has more options, but that's by far the most common use. So I think that'll do for now :)
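
For example:

~ $ cat mylog.log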


1. tail - Wait for it...

tail man page

The first one (aside from cat, which doesn't count) is the famous tail command that most people use. I actually use 'tail' very rarely; I would much rather use less, as you'll see below. The tail command, in its naive use, allows us to print the last X lines of a file to the standard output, as follows:
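
~ $ tail -n 10 mylog.log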


This will print out the last 10 lines of the file mylog.log.
But what most people use it for is to stream log files (or other streams of messages) to the standard output using:
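
~ $ tail -f mylog.log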



That's usually very helpful. But I would usually rather use less (with Shift+F) instead, as you'll see next. When I do use tail, it's usually when I want to pipe it to grep and see only specific lines:
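
~ $ tail -f mylog.log | grep "some_string"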


This will stream only the log lines that contain some_string, which is not as convenient to do with less.


2. less - Less is more


less lets you navigate through a large input file. Unlike text editors (vi, for example), less does not read the entire file before starting up, which makes it faster to load. less has loads of features, but here are a few basic ones that will get you up and running quickly:
  1. Shift + F: Will throw you to the end of the file and let you watch the stream of lines as more lines are added to the file/stream. So if you have a log file that is being filled with log lines, you'll see those lines appear, just like with "tail -f". The big advantage over tail, in my opinion, is the option to press Ctrl+C to stop the stream and perform searches on it. tail just prints stuff to the terminal, so you can't use search commands the way you can with less - and that's why I more commonly use less with Shift+F than tail.
  2. / - Typing / allows you to type a search term. less will then let you browse through all the places in which this search term appears - type n to find the next appearance of the term, or Shift+n to find the previous one.
  3. ? - Typing ? does the exact opposite of /. It searches for a term backwards (this also means that n will search for the term in the lines above the line you're at, and Shift+n in the lines after it).
  4. Shift + G - Will take you to the last line of the file. It goes well in combination with ? - if you want to find the latest place a phrase appeared, just jump to the end of the file with Shift+G and then type ? and the phrase.
  5. Line number + g - Typing a line number and then g will bring you to that specific line. I usually use it to go to the first line (typing 1g) and then use / to find the first place a phrase appears.
There are, of course, a lot more options to less but that pretty much covers the basics.

3. grep - Is it here?


grep is a very useful command for finding only the relevant lines. The easiest and probably most common pattern would be this one:
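
~ $ grep "phrase" mylog.log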


This pattern finds all the lines that contain the word 'phrase' in the file mylog.log. Of course, you can use a file pattern such as mylog.log* to search in all the mylog.log files. Again, grep has tons of options; here are the ones I find most useful to start with:
  1. -v - Finds all the lines that don't contain the phrase, instead of the ones that do.
  2. -c - Counts the number of matching (or non-matching, if you use -v) lines.
  3. -e - Allows you to provide a regexp instead of a plain phrase.
  4. -i - Ignores case.
  5. -r - Searches recursively inside the directory tree.
  6. -n - Prints the line number in the file in which the phrase was found.
  7. -A 3 - Prints the 3 lines right after the match (of course, you can use other numbers and not only 3 :) ).
  8. -B 3 - Prints the 3 lines right before the match.
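
For example, combining a few of these - the following prints case-insensitive matches with their line numbers, plus 3 lines of context before and after each match:

~ $ grep -i -n -A 3 -B 3 "phrase" mylog.log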


4. awk - Make it look better


awk could, by itself, fill a full blog post. But this is a post about the basics only, so I'll usually use awk by piping it with something like grep or tail. awk basically allows you to manipulate the input. The most basic thing I use awk for is to pretty-print the relevant data I grep-ed from the log lines. I believe an example will be easiest to understand here. Let's say we have a file mylog.log with the following content:


User Avi has id of 49240924
Some unneeded data
Some unneeded data
Some unneeded data
User George has id of 895042
Some unneeded data
Some unneeded data
User Elaine has id of 90348235
Some unneeded data
Some unneeded data
User Jerry has id of 9235239
Some unneeded data
Some unneeded data
User Kramer has id of 94023920
Some unneeded data
Some unneeded data
Some unneeded data

I want to extract only those lines that contain a user name and the user's id. Using 'grep' would make it easy to extract the lines:


~ $ grep "has id of" mylog.log
User Avi has id of 49240924
User George has id of 895042
User Elaine has id of 90348235
User Jerry has id of 9235239
User Kramer has id of 94023920

But I still won't have a clear user-to-id mapping. I won't be able, for example, to easily copy the result to a CSV file, nor will I be able to further manipulate it with some of the commands we'll learn below. Luckily, awk comes to the rescue:


~ $ grep "has id of" mylog.log | awk {'print $2": "$6'}
Avi: 49240924
George: 895042
Elaine: 90348235
Jerry: 9235239
Kramer: 94023920

In this case, I used awk to re-print the results from the grep using the print command. $2 and $6 mark the 2nd and 6th tokens, respectively, of each line resulting from the grep, assuming tokens are separated by spaces.

In order to change the delimiter from a space to something else, you can use the -F option and provide some other separator.
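
For example, to print only the first field (the username) of the colon-separated /etc/passwd file:

~ $ awk -F: {'print $1'} /etc/passwd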

As I mentioned, awk has tons of uses besides what I showed here - printing lower/upper-case, substring, split and more - but those are beyond the scope of this post. It is important, however, to know that those options exist, in case you ever find yourself in need of them.


5. uniq - The only one


uniq allows you to play with repeating lines. So, for example, if we take the example file from the 'awk' section, we can print it while collapsing the consecutive repetitions of the line "Some unneeded data":

~ $ cat mylog.log | uniq
User Avi has id of 49240924
Some unneeded data
User George has id of 895042
Some unneeded data
User Elaine has id of 90348235
Some unneeded data
User Jerry has id of 9235239
Some unneeded data
User Kramer has id of 94023920
Some unneeded data

As you can see, the uniq-ness is reset whenever a non-identical line is encountered. Here are the main extra features of uniq:
  1. -c - Adding the -c switch will print the number of repetitions before each line.
  2. -d - Will print only the lines that repeat more than once.
  3. -u - Will print only the lines that appear exactly once.
  4. -i - Ignore case.
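
For example, running uniq with -c on the same file shows how many times each line repeats consecutively:

~ $ uniq -c mylog.log
      1 User Avi has id of 49240924
      3 Some unneeded data
      1 User George has id of 895042
      2 Some unneeded data
...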

6. sort - From A to Z


Well, I guess it's easy to figure out what this command does. sort will sort its input and print it to the standard output. Of course, it pipes well with the previous commands - for example, it's a great tool after using awk to extract a specific detail from the log lines. If we again take the example file from the awk section and add sort to the result, we'll get the user ids sorted by name:

~ $ grep "has id of" mylog.log | awk {'print $2": "$6'} | sort
Avi: 49240924
Elaine: 90348235
George: 895042
Jerry: 9235239
Kramer: 94023920

Another useful way of using sort is to combine it with "uniq -c" if you want to sort by the number of line repetitions.
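
For example, this prints each distinct line of our sample file with its repetition count, most frequent first:

~ $ sort mylog.log | uniq -c | sort -rn
     12 Some unneeded data
...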
Most useful options of sort:
  1. -f - Ignore case (because -i is such a cliche already).
  2. -r - Reverse order.
  3. -n - Sort according to numeric value.
Again, there are more options to sort, which you can find in the man page.
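
For example, sorting the extracted user ids numerically instead of lexicographically:

~ $ grep "has id of" mylog.log | awk {'print $6'} | sort -n
895042
9235239
49240924
90348235
94023920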

7. xargs - Now, let's do something else


And what if all the data you extract is just needed as input for another command? xargs is meant for building and executing commands from the standard input. How is it done? Usually by piping. Again, an example will make it much clearer. Let's take a different example file - mylog2.log:

User Avi has id of 49240924
User George has id of 895042
User Elaine has id of 90348235
User Jerry has id of 9235239
User Kramer has id of 94023920
User 49240924 presses the green button
User 49240924 presses the yellow button
User 895042 presses the red button
User 9235239 presses the green button
User 49240924 presses the green button
User 94023920 presses the red button

Now, we want to find every action that 'Avi' took. The problem is - we know the user's name, Avi, but the log lines are written with the user id, not the name. Let's combine everything we've learned so far to get the desired output.

We first want to get the user's id:


~ $ grep "User Avi" mylog2.log
User Avi has id of 49240924

We already know how to print only the id without the prefix:
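
~ $ grep "User Avi" mylog2.log | awk {'print $6'}
49240924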



Now, we want to grep all the log lines that contain this id, but we don't want to type the id in manually, so instead we'll use xargs as follows:


~ $ grep "User Avi" mylog2.log | awk {'print $6'} | xargs -I user-id grep user-id mylog2.log
User Avi has id of 49240924
User 49240924 presses the green button
User 49240924 presses the yellow button
User 49240924 presses the green button

The -I switch defines a placeholder named user-id which is later used in the grep command and replaced with the value piped from the awk command - in this case, the user id. If you want to take it one step further and get rid of the line "User Avi has id of 49240924", you can use "grep -v" as follows:


~ $ grep "User Avi" mylog2.log | awk {'print $6'} | xargs -I user-id grep user-id mylog2.log | grep -v "has id"
User 49240924 presses the green button
User 49240924 presses the yellow button
User 49240924 presses the green button


And now we get only the actions that the user Avi made.

That's it!

So, that's what I usually use when digging through log files. There are tons more options and possibilities for these tools, and there are other tools I left out because I wanted to focus on what I consider the absolute must. I hope you enjoyed this post and learned that the world has more to offer than just tail and grep.

Feel I left something important out? Want to add something or correct me? Please feel free to leave a comment!



Find me on Twitter: @AviEtzioni



Monday, November 3, 2014

IoT Hackathon @ Journey 2014

A few weeks ago, my friend Alon Herbst told me about the first Israeli IoT (Internet of Things) hackathon, organized and sponsored by Texas Instruments, IBM and Pitango VC. 3 days of developing end-to-end IoT applications - or, as I heard it when he told me about it, "3 days of vacation in order to work even harder than on a normal day". Sounds very tempting indeed...

But after some discussion I agreed to go under one condition - "we're going there to win!".

We discussed some ideas and agreed that we needed something that is both technological and can put on a "show" on demo day. After some ideas that revolved around babies and dogs (those things that steal the show), we decided to go with a sports training app - an app that will automatically sense your workout, figure out which exercise you're doing and how well, count repetitions, track progress and engage through gamification.
For the "show" factor, we agreed on bringing a winners' podium and medals to the demo and holding a little contest.


The Hackathon itself

We arrived at Afeka College, where the hackathon was held, and started to work day and night.

While Alon was working on a motion-analyzer algorithm (you won't believe how difficult it is to figure out a simple exercise repetition from a motion sensor and make it work in 3 days!), I worked on creating a fully real-time app that connects to the motion-detection sensors and demonstrates a live competition between a few competitors. A third friend joined us and worked on the hardware side of configuring TI's CC3200 board to talk with IBM's cloud. For the geeky readers, here's our architecture (taken from our presentation):


And here's a screenshot of the app:


Amazingly, after 3 days this worked!! We had 2 contestants who wore the sensors, and the push-up counters counted live.
Our little show with the medals and the winners' podium helped set the mood, and we indeed won the hackathon - together with 3 more groups that built smart lighting poles, health monitoring for the elderly and WiFi->BLE->IR connectors - and went up to the final stage. That's when the real hard work began.

JournEY conference 2014

Well, the final of the hackathon was part of the prestigious annual JournEY conference held by E&Y (Ernst & Young). This was a much more formal event, and as such we had to meet a presentation coach, shoot a demo video in case things went wrong on stage, do a lot of rehearsals and basically align ourselves with the major league.

Here's the demo video for the case where things go wrong (unfortunately, we had to use it on the real stage):




After tons of work, the big day arrived. Instead of the plain old presentation everyone is already used to, we decided to take the "show" one step further and performed a skit on stage, in which I was the lazy guy who can't finish his workout, and Alon joins and motivates me to build this app to help me perform my workout and increase my engagement.




Everything went well. Well, almost everything. When we did the push-up contest, the WiFi broke down and the push-ups were not counted. Although it was kind of embarrassing, at the end of the contest the numbers did show up on the screen (in my mind I reminded myself that Bill Gates got a blue screen when presenting Windows 98, and he lived. This is nothing...). And even though we were afraid it would cost us the victory, we did win in the end. Well, we had to win - that was my agreement with Alon for going ;)



This hackathon was just an amazing experience, and I thank TI, IBM & Pitango VC for organizing it, as well as our hosts at Afeka College and, of course, E&Y, which gave us a very respectable stage to show off on. Our names were even mentioned in Globes.

I can't wait for the next hackathon ;)


Find me on Twitter: @AviEtzioni



Saturday, September 13, 2014

Clean Code With Builders

To anyone who says that software engineering is not an art I say "Have you heard about design patterns?". Design patterns are like poetry to a software engineer. I never meant to write a post about them, because tons of posts and books have already been published on the subject. But following a talk I gave this week, I understood that people are not familiar with the ways the "Builder" pattern can help them write cleaner code.




What Is The Builder Pattern?

You can skip this part if you're familiar with the Builder pattern for creating immutable objects.

The "Builder" pattern helps us, not surprisingly, to build objects. It is often used for building immutable objects. For example, let's say we have this class:


public class Student {
    private final String givenName;
    private final String lastName;
    private final int averageGrade;
    private final int age;

    public Student (final String givenName, final String lastName, 
                    int averageGrade, int age) {
        this.givenName = givenName;
        this.lastName = lastName;
        this.averageGrade = averageGrade;
        this.age = age;
     }

     // Rest of the class is only getters with no setters here
}        

Then, if we want to split the assembly of the fields from the actual construction, we can use a builder in the following manner:

public class Student {
    private final String givenName;
    private final String lastName;
    private final int averageGrade;
    private final int age;

    public Student (final String givenName, final String lastName, 
                    int averageGrade, int age) {
        this.givenName = givenName;
        this.lastName = lastName;
        this.averageGrade = averageGrade;
        this.age = age;
     }

     // Rest of the class is only getters with no setters here

   public static class StudentBuilder {
 
     private String givenName;
     private String lastName;
     private int averageGrade;
     private int age;
        
     public StudentBuilder() { }

     public StudentBuilder withGivenName(final String givenName) {
       this.givenName = givenName;
       return this;
     }

     // The rest of the 'with' setters look the same...

     public Student build() {
       return new Student(givenName, lastName, averageGrade, age);
     }
  }
}        


This way we have a mutable inner class, StudentBuilder, which gathers the fields we need for the immutable class, and when the time comes to create the immutable object we just call build() and get it fully constructed.
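
For example, constructing a Student with this builder might look something like this (a quick sketch - assuming the remaining 'with' setters follow the same pattern as withGivenName):

Student student = new Student.StudentBuilder()
        .withGivenName("Jerry")
        .withLastName("Seinfeld")
        .withAverageGrade(88)
        .withAge(25)
        .build();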

So what does that have to do with clean code?


Express Yourself With Builders

I'm sure you have seen a pattern similar to that in the past:
StudentGrades grades = new StudentGrades();
grades.setStudentId(97); // The student id in the DB
grades.setMath(84);
grades.setEnglish(92);
grades.setChemistry(75);
grades.setLiterature(88);
grades.setGymnastic(55);
grades.setBiology(76);
grades.setHistory(81);

What we'll usually try to do next is something that will wrap these lines into a single line like this:
private StudentGrades createStudentGrades (int id, int math, int english,
                                           int chemistry, int literature,
                                           int gymnastic, int biology, int history) {
  StudentGrades grades = new StudentGrades();
  grades.setStudentId(id); // The student id in the DB
  grades.setMath(math);
  grades.setEnglish(english);
  grades.setChemistry(chemistry);
  grades.setLiterature(literature);
  grades.setGymnastic(gymnastic);
  grades.setBiology(biology);
  grades.setHistory(history);
  return grades;
}
And indeed this will make the whole clutter of code into a one-liner:
StudentGrades grades = createStudentGrades(97, 84, 92, 75, 88, 55, 76, 81);
Well, this is great. Much less verbose. But without looking back at the createStudentGrades method, will you be able to tell what 75 stands for? Or 88?

This pattern is very hard to read. You have to scroll back and forth, or at least open the method's tooltip, to understand what each number means.

Using the builder pattern can help us create something that is less verbose than the original version of the code (the one with the oh-so-many lines), yet more descriptive than the one-liner. We'll create a builder for StudentGrades as follows:
// This doesn't have to be an inner class now. It's not the immutability we need the builder for - it's the readability
public class StudentGradesBuilder { 
   int id;
   int math;
   int english;
   int chemistry;
   int literature;
   int gymnastic;
   int biology;
   int history;

   private StudentGradesBuilder() { 
     // Making the constructor private in order to enforce construction
     // with the readable static construction method.
   }

   public static StudentGradesBuilder studentGrades() {
      return new StudentGradesBuilder();
   }

   public StudentGradesBuilder forUser(int id) {
      this.id = id;
      return this;
   }

   public StudentGradesBuilder withHistory(int history) {
     this.history = history;
     return this;
   }

   // Rest of setters

   public StudentGrades build() {
     StudentGrades grades = new StudentGrades();
     grades.setStudentId(id); 
     grades.setMath(math);
     grades.setEnglish(english);
     grades.setChemistry(chemistry);
     grades.setLiterature(literature);
     grades.setGymnastic(gymnastic);
     grades.setBiology(biology);
     grades.setHistory(history);
     return grades;
   }
}

Now we can construct StudentGrades in a verbal, yet concise manner:
// Static importing of StudentGradesBuilder.studentGrades() method
// allows to call it directly which makes it readable like an English sentence
StudentGrades studentGrades = studentGrades().forUser(97).withMath(84)
                                             .withEnglish(92).withChemistry(75)
                                             .withLiterature(88).withGymnastic(55)
                                             .withBiology(76).withHistory(81)
                                             .build();

This can be read almost as an English sentence "studentGrades for user 97 with math 84...."

Don't know about you - but I really like it this way. It makes me feel like I'm reading a sentence, rather than trying to decrypt the meaning of a random group of numbers.


Find me on Twitter: @AviEtzioni



Friday, September 5, 2014

Elephant Carpaccio - Use Case

In my previous post I talked about a technique called "Elephant Carpaccio" for splitting large projects and epics into smaller, measurable and valuable stories and tasks.

Now I'd like to share an example of a real-life scenario and show how to apply the carpaccio technique to it.


Our example case

Let's say we're working on an enterprise product. We started with one language in our UI - English. Now the company's salespeople say we're missing out on great deals due to our lack of support for a multilingual interface, and your PO asks your team to implement this new feature.

Let's try to think about this feature request. Where do we start? What is most important? And one of the most important questions of all - how long will it take?!

It's obvious we can't just run off and implement such a feature, because it probably requires some infrastructure to support the generic addition of new languages, and multi-language support will most probably require changes in a lot of places.

Ask questions

A good starting point I've found useful is to first ask questions. Many questions. It is very beneficial to include a few people (at least the team and a product representative) in the discussions and let everybody ask questions. Here are some questions you will probably want answered before running off and implementing the feature:

  1. Will the users of the product be able to change the language whenever they want? Or is it set at the system level for all users?
  2. Where in the UI will the user change the language?
  3. Do we also need to support things like error messages in our translations? Or is it ok to leave them in English?
  4. Should we support any RTL (Right-To-Left) languages?
  5. If we support RTL languages - should the entire UI be displayed in an RTL direction?
  6. Should the language selection be persisted, or is it ok to always start a user session in English and let the user change it?

Make assumptions

Asking these questions will help both product and R&D understand where the value of this feature lies and what can be postponed to a later version/sprint.

After the value is clear, you can prioritize and create user stories. Each user story must deliver some value to the user and be estimated according to the effort and risk assumed to go into it:

Story #1 - Support Spanish in a specific UI menu:

    1. DoD (Definition of Done): The user will have a language selection list. When the user selects Spanish, a specific menu will be changed to Spanish
    2. Assumptions:
      1. The language is not persisted; the next time the user logs in, the system will be in English again.
      2. Only one menu should be translated when choosing Spanish
    3. Story points - 8: This is a hard story - we need to create the infrastructure.
    4. Tasks:
      1. Infrastructure design. (1 day - net, after design review and discussions)
      2. UI addition (0.5 day - just adding the language list widget is easy. It can be done while waiting for feedback on the design)
      3. Implementing infrastructure - storage of language codes, maybe failover (if a message doesn't exist in Spanish we'd like to fall back to English), a generic API for converting message codes to locale strings, etc... (4 days)
      4. Change the UI menu to use the converter API (0.5 day)
    5. Value: After this story there's a robust infrastructure and an already-working Spanish menu. The added value is huge, both in the user/product experience and in the ease of now changing other places.

Story #2 - Support Spanish in whole of the UI (No error support yet)

  1. DoD: All UI texts will be in Spanish
  2. Assumptions:
    1. No support for errors (exceptions)
  3. Story points - 5: This is not hard, but it requires a lot of tedious work replacing every string with a translation API call. It also requires a lot of QA time to make sure everything is replaced correctly.
  4. Tasks: Just replacing strings with the translated text (2 days + 3-4 QA days)
  5. Value: After this story we can see a UI that's in a different language and get the main multi-language experience.

Defining other languages

After the previous stories are done we can, in fact, add languages at a very low development (and even QA) cost. If the infrastructure is built correctly, it will require no more than adding a file or DB records with the translated strings. We can now choose between 2 methods for splitting the stories:
  • All other languages at once - Due to the ease of adding a new language, splitting each language into its own story would be an overhead, so all the languages can be combined into a single 2-3 story-point story. This is the better approach.
  • Adding languages one at a time - In case the other languages are not given in advance, splitting into 1 story-point stories is also fine.

Future features

After we've finished the main feature and are no longer losing deals, it's much easier to prioritize the remaining features like translating error codes, supporting RTL and so on. We've lowered the pressure from the business side and provided a huge amount of value.
If the features that haven't made the cut are important enough, they will be waiting at the head of the backlog stack. If not, they will be pushed down the backlog, which probably means they were not that valuable to begin with.



This example is, of course, very specific and was born out of former projects I worked on. I hope I managed to give you an idea of how to approach such tasks. Mastering carpaccio takes time, but it will benefit you a lot.

Find me on Twitter: @AviEtzioni



Sunday, July 13, 2014

Cooking an elephant carpaccio

Disclaimer: No elephants or other animals were harmed during the writing of this post.


Make me an elephant

What if I asked you to make me an elephant? Can you tell me how long it would take you? Can you guarantee that within the time estimation you gave me, you will be able to create a great elephant, with all the capabilities and characteristics an elephant possesses?



Tackle this huge story

Ok, so we software engineers are not that good at creating elephants. We'll leave cooking new elephants to the lady elephants. But we do aim to be good at creating software.
Tackling a large story can be very puzzling - both from our (the developers') perspective and through the product manager's eyes.


How many times have you started working on such an enormous story and found out that there were a lot of holes in your complete mega-design? How many times have you been mistaken in your estimations? How many times have you found out that a large portion of your code is not in use, or is not what the customer/product described?

It's hard for us to plan every little detail of the system ahead of time. There are a lot of unknowns down the road, which it would be careless of us to presume we know, and which make it difficult for us to give an honest estimation. It's not easy for product, either, to define every little detail given all these unknowns.


Elephant carpaccio

Elephant carpaccio is a technique that helps us tackle such big problems. Instead of trying to create an entire elephant, we break the manufacturing process into small stories. Each such story must stand on its own - meaning, provide some (some == more than 0) value to our users and be independently testable (meaning - no need to wait for other parts of the story to finish in order to test it).

In an elephant carpaccio we'll try to follow a simple pattern:
  1. Create a very thin flow
  2. Thicken this flow by each time adding another layer or sub-feature to it.

Creating the initial flow

In order to find this main flow, we'd better ask ourselves what the main problem we're trying to solve is. We can then map the main use-case - the one that solves a large portion of the problem - and decide that this will be the flow.

It's important to understand that it's ok to state that this flow will not stand on its own for release to the market. If we were to design an ATM, our main flow would probably be - withdraw money. We can't ship it to production without making sure the flow is transactional, secured, audited in the bank's books and so on. But creating the really basic flow of withdrawing money, in a very naive way (maybe even without checking if the customer has sufficient funds in their account), already gives enormous value and a great starting point.

Thickening the flow

After we understand the initial flow, we're in a much better position to split the story. At this point we need to ask ourselves some questions to understand the smaller details by which we can split the story. I'll give an example of how to do this in my next post.

Focus, Build, Increase Trust

Engineers working with product often encounter trust issues (if you speak Hebrew, take 5 minutes to hear about this from a former colleague of mine in this ignite talk, or read her blog post, in English, instead). Product managers often feel they don't know exactly where engineering is in the process and whether things are going as they wanted. The engineers are not always aligned with the vision the product managers are leading toward. This sometimes makes them scatter toward the less important areas of the feature, the nice-to-haves (for example - having an ultra-beautiful button is important in an ATM, but less important than having the functionality to withdraw money).

By working in small portions, we allow product to define the priorities of the work while keeping engineering focused and aligned with what's really important. The engineers, when the carpaccio is done right, know exactly what they should do now. They're (usually) not distracted, because they work in very small units of work. It also allows product to change requirements along the way, because the engineering process starts with the certainties and the most important things; we can change the rest of the plan as we progress.
When combining such story-splitting techniques with other agile methodologies like sprints, dailies and so on, we can create better communication between product and engineering, improve trust and improve the productivity of the team as a whole.

My Rules of Thumb for Carpaccioing

  1. Split an epic into stories in such a way that each story provides value. Any value.
    How do you define value, you ask? Value is something that you can show to a user who would actually care and be able to give you feedback about it.
    (A new button that doesn't do anything yet is value; a new table in the DB is not.)
  2. Repeat bullet #1 for each story you created and try to split it some more, until you're absolutely certain you can't, or that further splitting would increase the implementation overhead too much.
  3. In your sprint planning, split each story into very short tasks. These tasks do not need to provide user value. They can and should be very small, but not too small (less than 0.5 day is probably too small), in order not to increase overhead.

For those of you who would like to deepen your understanding of the subject, I suggest Lars Thorup's presentation on it, which I found very interesting and concise.
Also, in my next post I'll give a more detailed example of the carpaccio method. Stay tuned...


Find me on Twitter: @AviEtzioni



Friday, May 23, 2014

So Long Spring XMLs... (@Configuration class quick start guide)

This post is based on a tech-talk I gave in Outbrain

This time I decided to dedicate the post to something a bit more technical than my usual posts. In this post I'll try to show those of you who use XMLs to define your Java Spring application context how to use a method I find much more convenient for most cases - the Spring @Configuration class.

When Spring first started, the only way to configure the wiring of an application was to use XMLs that defined the dependencies between different beans. As Spring continued to develop, 2 more methods were added for configuring dependencies - the annotation method and the @Configuration method.


What is this @Configuration class?

You can think of a @Configuration class just like XML definitions, only defined in code. Using code instead of XMLs provides several advantages that made me switch to this method:

  1. No typos - You can't have a typo in code. The code just won't compile.
  2. Compile-time checks (fail fast) - With XMLs it's possible to add an argument to a bean's constructor but forget to inject this argument when defining the bean in the XML. Again, this can't happen with code. The code just won't compile.
  3. IDE features come for free - Using code allows you to find usages of the bean's constructor and easily discover the contexts that use it; it allows you to jump back and forth between bean definitions; and basically everything you can do with code, you get for free.
  4. Feature flags - In Outbrain we use feature flags a lot. Due to the continuous-deployment culture of the company, code that is pushed to trunk can find itself in production in a matter of minutes. Sometimes, when developing features, we use feature flags to enable/disable certain features. This is pretty easy to do by defining 2 different implementations of the same interface and deciding which one to load according to the flag. With XMLs we had to use the alias feature, which is not intuitive enough for creating feature flags. With @Configuration, we can use a simple if clause to choose the right implementation.

Our example case

So, let's start with a simple example of a Spring XML, and migrate it to Spring @Configuration class:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd">
    
  <import resource="another-application-context.xml"/>

  <bean id="someBean" class="avi.etzioni.spring.configuration.SomeClassImpl">
    <constructor-arg value="${some.interesting.property}" />
  </bean>
  
  <bean id="anotherBean" class="avi.etzioni.spring.configuration.AnotherClassImpl">
    <constructor-arg ref="someBean"/>
    <constructor-arg ref="beanFromSomewhereElse"/>
  </bean>
</beans>


Step 1: Migrate <beans> to @Configuration

In XMLs, the highest tag in the hierarchy is <beans>. This tag will be replaced with a class annotated with @Configuration:

@Configuration
public class ByeXmlApplicationContext {

}


Step 2: Create a method for each Bean

Each <bean> tag in the XML will be replaced with a method annotated with the @Bean annotation. It's usually better practice for the method to return an interface type, as follows:

@Configuration
public class ByeXmlApplicationContext {

  @Bean(name = "someBean")
  public SomeClass getSomeClass() {
      return new SomeClassImpl(someInterestingProperty);
  }

  @Bean(name = "anotherBean")
  public AnotherClass getAnotherClass() {
     return new AnotherClassImpl(getSomeClass(), beanFromSomewhereElse);
  }
}
A few things to notice:
  • Each method is defined to return an interface type. In the method body we create the concrete class.
  • The name defined in the @Bean annotation is the same as the id defined for the bean in the XML.
  • The bean anotherBean is injected with someBean in the XML. Here, we just call the getSomeClass() method. This doesn't create another bean; it just uses the bean someBean (the same as it was in the XML).
We notice that we're still missing the property someInterestingProperty and the bean beanFromSomewhereElse.

Step 3: Import other XMLs or other @Configuration classes

The bean beanFromSomewhereElse comes from a different XML file named another-application-context.xml, which was imported in the original XML. In order to use it, we need to import that XML here as well. To do so, we annotate the class with @ImportResource as follows:

@ImportResource("another-application-context.xml")
@Configuration
public class ByeXmlApplicationContext {
  . . .
}


That's in fact equivalent to the <import resource=""/> tag in the XML format.
If the bean resides in another @Configuration class, you can use a different annotation, @Import, to import it:


@Import(OtherConfiguration.class)
@Configuration
public class ByeXmlApplicationContext {
 ...
}


In order to complete the picture, here's how you can import a @Configuration class from an XML configuration file:

<context:annotation-config/>
<bean class="some.package.ByeXmlApplicationContext"/>

The <context:annotation-config/> tag needs to be defined once in the context in order to make Spring aware of @Configuration classes.


Step 4: Import beans from other XMLs (or @Configuration class, or @Component etc... classes)

In order to use beans that were not defined in this @Configuration class, we can either declare a private member annotated with @Autowired and @Qualifier as follows:

  @Autowired
  @Qualifier(value = "beanFromSomewhereElse")
  private StrangeBean beanFromSomewhereElse;

This member can now be used to construct the bean anotherBean.
Another option is to declare a method argument to getAnotherClass() as follows:


  @Bean(name = "anotherBean")
  public AnotherClass getAnotherClass(@Qualifier (value = "beanFromSomewhereElse")
    final StrangeBean beanFromSomewhereElse) {
     return new AnotherClassImpl(getSomeClass(), beanFromSomewhereElse);
  }
I usually prefer the first method, as it is less verbose. But of course, that's just a matter of taste.
Just remember - the beans you import must be loaded into the application context, either by @Import or @ImportResource from this class, or using any other method from anywhere else (XML, @Configuration or annotations).

Step 5: Import properties

So, we still need to import the property someInterestingProperty, which was defined in the XML using ${some.interesting.property}. This is very similar to autowiring a bean, but instead of the @Qualifier annotation, we'll use the @Value annotation:

@Value("${some.interesting.property}")
private String someInterestingProperty;


You can also use SpEL (Spring Expression Language) expressions with @Value.
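
For example (just an illustration - defaultRegion is a made-up field name), a SpEL expression can read a value from the system properties:

@Value("#{systemProperties['user.region']}")
private String defaultRegion;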


Step 6: Import @Configuration from web.xml

At the final step, we would like to be able to import an entire application-context without using any XMLs. If we have a web app, this can be done by declaring the class in the web.xml as follows:

<servlet>
    <servlet-name>my-dispatcher</servlet-name>
    <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
    <init-param>
      <param-name>contextClass</param-name>
      <param-value>
        org.springframework.web.context.support.AnnotationConfigWebApplicationContext
      </param-value>
    </init-param>
    <init-param>
      <param-name>contextConfigLocation</param-name>
      <param-value>some.package.ByeXmlApplicationContext</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
</servlet>



Summary

As you can see, Spring @Configuration classes can be a powerful tool for defining your application context. But with great power comes great responsibility. Code is much easier to abuse than XMLs, and it's easy to create overly complex @Configuration classes. Try to think of @Configuration classes as more flexible XMLs, and treat them as if they were XMLs:
  • Split your beans across different @Configuration classes; don't put all of them in one class
  • Give meaningful names, and even decide on a naming convention
  • Avoid any logic inside the @Configuration classes, aside maybe from things like feature flags

Of course, I've only covered the basics here. The internet is full of resources about using @Configuration classes. And of course, you are more than welcome to contact me for any further help. I'll do my best to assist.



Find me on Twitter: @AviEtzioni


