Isthmus Blog Live

Pura Vida, Amigos!
We at Isthmus are pleased to present our Architecture blog. The idea is to provide more value to our clients by sharing technical information that can be useful for your projects, current or future. We will share our experiences with the latest technologies, including the good, the bad and the ugly, while of course keeping the confidentiality owed to each project and each client.
We invite you to challenge us with your thoughts, comments and questions, so the knowledge base grows and we all gain.
Let’s create synergy!
Thank you,

Adolfo Cruz
Delivery Director

Friday, June 27, 2008

Grid Computing in the Enterprise World

Highly scalable, maintainable and reliable applications are the holy grail for any enterprise architect; however, hardware availability and heavy processing demands constrain what a design can deliver. Among the questions that come to mind when outlining the architecture for an enterprise system are:

  • How transactional is the data?
  • How many transactions per unit of time will the system have to handle?
  • Are there any data- or time-intensive processes?
  • How many concurrent users will operate the system?
  • Will users work in a disconnected environment, forcing the system to deal with outdated data?

Note that in the first paragraph I used "holy grail" to describe the pursuit of the three principles of enterprise architecture. Sometimes the answers to the previous questions lead to a solution that works against one or more of the principles of scalability, maintainability or reliability; the pursuit then becomes a holy quest that many would consider impossible to complete. Take, for example, a highly transactional web application. By definition, highly transactional means data is updated following the ACID principles (Atomicity, Consistency, Isolation, Durability), and by being a web application it has to work in a disconnected environment; thus, if two users are modifying the same data at the same time, one of the two is guaranteed to end up with outdated data.

Many of the problems related to data consistency have to be addressed during architecture design, or can be addressed by re-architecting a system if for some reason they were not taken into consideration the first time. Once the factors within the scope of data management and processing are addressed, the scalability of the system comes into play. Think of scalability as your main line of defense against user demand for processing: if your system depends on a single player (a single piece of hardware), you will get a bigger player (better hardware) in an attempt to stop the users' progress (performance degradation), as shown below:

As you can see, a team of one will be outperformed once the user offensive grows beyond its strength. Changing the defense by adding some help for our star player and organizing it in a pyramidal approach, as shown below, still depends on the correct organization and will eventually be overwhelmed by the offensive if one of the defense players fails, even more so if the one failing is our star player.

For those of us looking to win the Super Bowl of scalability, linear scalability should be our secret play. We should not depend on one star player but on a team that can grow in number to withstand the user offensive; all players should be equal, standing side by side with their peers, and if one fails the next ones can close the gap left by their teammate. Take a look at the following diagram of the linear-scalability defense play. If we know our system as well as we should, we know how many user hits each defense player can handle; therefore, given a projected growth of the offensive, we know a priori how many new defense players we have to put into play.

Systems looking to get linear scalability into their game strategy should be architected in a way that allows linearity in their processing. In terms of hardware, the linearity can be handled in-house by adding more hardware to your own data center, or it can be handled by acquiring services from a hardware-grid provider; we will talk about those later. In terms of processing, there are several solutions on the market for gridifying processing, and next we will look at an open-source one for Java in a quick demo. The framework presented here is GridGain (http://www.gridgain.com/), a solution that requires no configuration: the code gets deployed automatically to the nodes of the grid and executed transparently, as if it were running in a single JVM.

1. Get data from the database. We retrieve the list of customers from a Northwind database on Microsoft SQL Server 2005 using JPA.

2. We create a task to process the data in the grid; we want to change the Region of the customers living in Germany. To do so we import the GridGain libraries into our NetBeans Java SE project and create a task that inherits from GridTaskSplitAdapter<List<Customer>, List<Customer>>. This adapter lets us grid-enable the task in the common split-aggregate way: first splitting the collection into multiple jobs and later aggregating the results back. The two generic types determine the argument given to the split method and the return value of the reduce method, respectively. (A sketch of the task is shown after this list.)

3. In the split method we create a job for each customer in the task, override the execute method of the job, and add our processing logic inside it.

4. In the reduce method we take all the results, aggregate the data back into a new List object and return it. For the sake of simplicity we added some prints to see how the work is distributed across the different nodes of the grid.

5. To verify that the tasks are performed by different nodes in the grid, two more node instances were created in addition to the one started when the application runs. Our grid topology therefore shows three nodes sharing the same CPUs and network interface.

6. When we run this demo we can see jobs being executed simultaneously on different nodes, and finally, on the main node, the aggregated data with the changes made to the customer regions. Looking at the customer IDs, it is easy to see that each node processed a different set of customers.
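The code from the original screenshots is not reproduced here. The following is a minimal sketch of such a split-aggregate task, assuming GridGain's 2.x API (GridTaskSplitAdapter, GridJobAdapter, GridJobResult) roughly as described in the steps above and a hypothetical Customer JPA entity; exact method signatures may vary between GridGain versions.

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import org.gridgain.grid.GridException;
    import org.gridgain.grid.GridJob;
    import org.gridgain.grid.GridJobAdapter;
    import org.gridgain.grid.GridJobResult;
    import org.gridgain.grid.GridTaskSplitAdapter;

    // Split-aggregate task: one job per customer, results merged back into a single list.
    public class ChangeRegionTask extends GridTaskSplitAdapter<List<Customer>, List<Customer>> {

        @Override
        protected Collection<? extends GridJob> split(int gridSize, List<Customer> customers)
            throws GridException {
            Collection<GridJob> jobs = new ArrayList<GridJob>(customers.size());
            for (final Customer customer : customers) {
                jobs.add(new GridJobAdapter() {
                    public Object execute() {
                        // Processing logic: change the Region of customers living in Germany.
                        if ("Germany".equals(customer.getCountry())) {
                            customer.setRegion("Central Europe"); // hypothetical new value
                        }
                        System.out.println("Processed customer " + customer.getCustomerId());
                        return customer;
                    }
                });
            }
            return jobs;
        }

        @Override
        public List<Customer> reduce(List<GridJobResult> results) throws GridException {
            // Aggregate the processed customers back into a new list.
            List<Customer> processed = new ArrayList<Customer>(results.size());
            for (GridJobResult res : results) {
                processed.add((Customer) res.getData());
            }
            return processed;
        }
    }

Executing the task is then just a matter of starting a grid node in the application and submitting the task with the customer list; the other nodes in the topology pick up their share of the jobs automatically.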

All nodes in our topology should get roughly the same number of jobs, depending on their load; that is linear scalability. The load is evenly distributed across the grid, and if we need more processing it is just a matter of adding new nodes. However, more nodes will eventually require more hardware, and adding that extra hardware can prove difficult due to company policies and procurement delays; moreover, hardware gets outdated quickly and the investment in your data center fades away.

Hardware has become a simple commodity; what really matters is processing, bandwidth and storage. Many companies have decided to outsource their entire data center, but there is a new trend in managing data centers that goes beyond moving your hardware to someone else's facility: virtualizing the entire IT infrastructure. This could be bad news for your infrastructure team: no more wiring, no more A/C problems, no more midnight maintenance, no more corrupted backups, and a lot of other "no mores".

In a virtual world everybody is happy: all the hassle of server setup and maintenance is taken away, and in a matter of minutes you can have your own infrastructure. Systems can grow beyond the projected demand at the click of a button and can be downsized when the waters return to their channel. We get an infrastructure that can grow with minimum impact at minimum cost; even better, we can return the processing and storage capacity when it is no longer needed. The best part is that we get charged only for what we use, meaning we pay for the CPU minutes, the terabytes we store and the bandwidth we consume.

Having someone as big as the virtualization company negotiating better prices for servers, storage and bandwidth translates into better and better prices for our systems. Think outside the box: even the staging and QA environments for your development teams can be virtualized; whenever a new project requires staging and QA, we can add two new servers, once again, at the click of a button.

There are two big companies in the market offering this kind of service, with different approaches:

- Google App Engine (http://code.google.com/appengine/) lets you run your applications on Google's infrastructure. It has several limitations, the most notable being that Python is the only supported language, and on top of that it is still in beta. It has some benefits if you want to interact with other Google services, but it is not very helpful if you already have applications in production.

- Amazon Web Services (http://aws.amazon.com/) gives access to Amazon's infrastructure. It is a virtual environment for deploying a wide range of applications; it supports different operating systems (with different amounts of effort in each case) and, on top of that, applications written in any language.

Finally, I came across a third company that offers yet another approach to scalable systems; I like this one more as an option for those who already have systems in production. GoGrid (http://www.gogrid.com/) provides virtualization of Windows and Linux servers in a single infrastructure, very much like any enterprise environment we already have. They provide a web interface to manage the entire infrastructure, allowing you to add servers, load balancers, storage and private networks, pretty much as if you had physical access to the hardware.

Some recommend the Amazon and Google services for new business ideas coming from the Web 2.0 world; on the other hand, GoGrid would be a good alternative for those with existing systems that can scale easily, limited only by the availability of hardware.

Teams willing to take advantage of this new trend in data centers should architect their systems so that they can scale easily, either by using processing-grid solutions such as the one presented above for Java, or by some alternative "in house" architecture design that allows fast and easy deployment in a grid-enabled environment.

Monday, March 31, 2008

Moving to C# 3.0

In order to accommodate new programming options and frameworks (such as LINQ) in .NET Framework 3.5, Microsoft added new capabilities and extensions to the C# language, resulting in C# 3.0. In this post we will take an overview of the most important changes and how they can be used in our projects.

Lambda Expressions

Let's imagine a scenario where we need a method signature to be implemented in different ways, for example a method that filters an array of chars, where the specific filtering implementation can vary depending on what needs to be filtered. In order to do this we will have a delegate that indicates the work to be done:
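The original code listing was an image and is not reproduced here; a minimal sketch of such a delegate and filtering method, with hypothetical names, might be:

    using System.Collections.Generic;

    // Delegate that decides whether a given char passes the filter.
    public delegate bool CharFilter(char c);

    public static class CharUtil
    {
        // Returns only the chars for which the supplied filter returns true.
        public static char[] Filter(char[] input, CharFilter filter)
        {
            List<char> result = new List<char>();
            foreach (char c in input)
            {
                if (filter(c))
                    result.Add(c);
            }
            return result.ToArray();
        }
    }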

In the days of C# 1.0, the solution was to create a named method that performed our filtering operation and then pass that method as the delegate:
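Continuing the sketch above (CharFilter and CharUtil.Filter are the hypothetical names introduced earlier), the C# 1.0 approach would be:

    using System;

    public class Program
    {
        // Named method matching the CharFilter delegate signature.
        private static bool IsVowel(char c)
        {
            return "aeiou".IndexOf(char.ToLower(c)) >= 0;
        }

        public static void Main()
        {
            char[] vowels = CharUtil.Filter("hello world".ToCharArray(), new CharFilter(IsVowel));
            Console.WriteLine(vowels); // prints "eoo"
        }
    }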

To avoid having to explicitly create the method, the C# 2.0 solution was to use anonymous methods, but it turned out to be verbose and hard to read:
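With C# 2.0 anonymous methods, the same call (using the hypothetical helper from above) looks like this:

    // C# 2.0: no separate named method, but still fairly verbose.
    char[] vowels = CharUtil.Filter("hello world".ToCharArray(),
        delegate(char c) { return "aeiou".IndexOf(char.ToLower(c)) >= 0; });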

With C# 3.0 we have the option of using lambda expressions, which provide a nicer and more readable way of doing the same work. Lambda expressions take the following form:

(param1, param2, ...) => { expr }

So to code the same filtering example using this new C# feature, we will need to do the following:
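Using the same hypothetical helper as before, the lambda version becomes:

    // C# 3.0: the lambda expression keeps the filtering logic short and readable.
    char[] vowels = CharUtil.Filter("hello world".ToCharArray(),
        c => "aeiou".IndexOf(char.ToLower(c)) >= 0);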


Using lambda expressions helps us write less code and can make our code easier to understand, and hence easier to maintain.

Anonymous Types

It is a common scenario to have to create business entity classes to move information from the data layer to the presentation layer. But this can get very tiresome and code-heavy if we need to create a huge number of business entity classes because each method in the business layer returns different information. It gets worse if each business layer method is called only once, by one specific presentation layer action, because that forces us to maintain several business entity classes with little use.

Anonymous types can help us deal with situations such as the one described above. They allow us to have methods returning unnamed business entities, without the need to explicitly create a business entity class. Let's look at the following example:
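The original example was an image; a small sketch along the lines the next paragraph describes (all names are made up) could be:

    using System;

    public class Customer
    {
        public string Name { get; set; }
        public string City { get; set; }
    }

    public class AnonymousTypesDemo
    {
        public static void Main()
        {
            // Object initializer: no dedicated constructor needed to set the properties.
            Customer customer = new Customer { Name = "Ana", City = "San Jose" };

            // var + anonymous type: the compiler generates an unnamed class
            // with FirstName and LastName properties.
            var person = new { FirstName = "Ada", LastName = "Lovelace" };

            Console.WriteLine("{0} from {1}", customer.Name, customer.City);
            Console.WriteLine("{0} {1}", person.FirstName, person.LastName);
        }
    }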

This example shows three things. First, the use of the var keyword, which is new in C# 3.0 and, among other things, allows us to declare variables of anonymous or unknown types. Second, the use of the new object initializers in C# 3.0, which spare us from having to create constructors for each class we want to use and let us set the attributes right when we create the variable. Third, the use of anonymous types: in the example we are creating an unnamed type with the attributes FirstName and LastName.

Visual Studio 2008 provides IntelliSense for all var datatypes, which helps the developer know what the variable contains once it has been initialized. There is also compile-time validation on var datatypes, which prevents programmers from assigning a different datatype to a var variable that has already been initialized.

Extension methods

There may be cases where we would like to add functionality to a particular class, but we do not have access to the source code, or maybe it is just not feasible to modify it. Those scenarios are exactly where extension methods can help us in our development.

Let's have a look at the following code:
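The listing referenced here did not survive; a sketch of an Invert extension method, using the names from the description below, could look like this:

    using System;

    namespace Extensions
    {
        public static class StringExtensions
        {
            // The "this" modifier on the first parameter makes Invert an
            // extension method on System.String.
            public static string Invert(this string value)
            {
                char[] chars = value.ToCharArray();
                Array.Reverse(chars);
                return new string(chars);
            }
        }
    }

With the namespace imported, calling "hello".Invert() returns "olleh".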

As you may notice, we are creating a method called Invert and extending the .NET Framework string class by adding it as an extension method. Extension methods are easy to recognize, since their first parameter is marked with the this keyword.

Having extended the string class with the Invert method, we can now call it on any string instance in our code.

Partial methods

If there is a need for methods that may or may not be called, depending on the specific needs of the developer, partial methods are the right choice.

The idea behind partial methods is to let the developer choose whether he wants to execute a method or not. If he provides the method body, it gets executed; if he does not, it is as if the method didn't exist. This helps a lot when we need pre- and post-methods around some particular action. Let's take a look at the following code:
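The original listing is missing; a sketch of the first part of such a Log class, matching the description that follows, might be:

    using System;

    // First part of the Log class: declares the partial methods and calls them
    // around the actual logging work.
    public partial class Log
    {
        partial void PreLog();
        partial void PostLog();

        public void DoLog()
        {
            PreLog();
            Console.WriteLine("Doing my logging");
            PostLog();
        }
    }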

Here we have created a Log partial class that writes a log entry whenever the DoLog() method is executed. This class declares PreLog and PostLog partial methods, which for the moment have no implementation. The code compiles fine, and if we call DoLog(), the string "Doing my logging" will be the only thing printed to the console.

The story changes if we add a second partial class to the mix, one that implements the PreLog and PostLog methods declared in the first part (a sketch follows the output below). Now that both methods have bodies, they will be executed, which means that whenever DoLog() is called the following is printed to the console:

Doing pre-logging

Doing my logging

Doing post-logging
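A sketch of the second partial class that would produce this output (again reconstructing the missing listing):

    using System;

    // Second part of the Log class: providing bodies for the partial methods
    // is what makes them execute.
    public partial class Log
    {
        partial void PreLog()
        {
            Console.WriteLine("Doing pre-logging");
        }

        partial void PostLog()
        {
            Console.WriteLine("Doing post-logging");
        }
    }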


Query expressions

There are two ways in which we may query IEnumerable data using LINQ: dot notation or query expressions. Let's look at the following example:
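The example itself was an image; a small sketch with a hypothetical array of names shows both styles:

    using System;
    using System.Linq;

    public class QueryDemo
    {
        public static void Main()
        {
            string[] names = { "Adolfo", "Ana", "Carlos", "Andrea" };

            // Dot notation: the query is a chain of regular method calls.
            var dotNotation = names.Where(n => n.StartsWith("A")).OrderBy(n => n);

            // Query expression: SQL-like syntax that the compiler translates
            // into the dot notation above.
            var queryExpression = from n in names
                                  where n.StartsWith("A")
                                  orderby n
                                  select n;

            Console.WriteLine(string.Join(", ", dotNotation.ToArray()));
            Console.WriteLine(string.Join(", ", queryExpression.ToArray()));
        }
    }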



This code shows the different ways in which we may query the array of strings. As you can see, dot notation treats queries as regular method calls, while query expressions use a syntax very similar to SQL (but inverted). In the end they both perform the same, since query expressions are compiled into dot-notation expressions.

Conclusion

We have taken a quick look at the most significant changes made to C# for the .NET Framework 3.5. It is important to keep in mind that most of these changes were required for the Microsoft team to put in place what they had in mind for LINQ, so some of them may look a little awkward or unnatural. Whatever the reason, these new additions are here, and we certainly have to study them, understand them, and identify the scenarios where these new features will give a real advantage to the code we write for our applications.

Wednesday, March 26, 2008

Java Profiling

Since NetBeans 6.0, the profiler is no longer a plug-in; it is bundled as a basic component of the platform. What implications does this have for NetBeans users? In practical terms: none. However, regardless of the IDE we use, we should accept that writing Java code should go beyond simply typing it as it comes from a memory dump of our head; this is even truer when it comes to complex business logic. On the other hand, the source code of a CRUD application using JPA leaves little to optimize; most of the control has been taken from our hands and given to the compiler and the JPA framework.

In most cases there will be no need for optimization in common database-access applications; eventually, though, we may face a challenge that goes beyond annotating POJO classes. It will depend on the nature of the application and the complexity of the business logic at hand, and you will probably have to locate the piece that has to be optimized first; that is where the profiling facilities come in handy, so let's take a look at NetBeans profiling.

For this demo we will take one of our latest internal applications, a Java Platform, Enterprise Edition (Java EE) application called TestOnline, currently in QA. The application architecture, detailed in the following diagram, is basically the Isthmus standard architecture for Java EE applications.

Since we have the source code in a NetBeans project, we can profile the application easily; however, any Java application can be profiled with NetBeans' integrated profiler. First we show how to profile the current project, then how to enable profiling for a previously compiled application.

From the Profile menu we select the Profile Main Project option, as shown in the following image:

After selecting this option, the profiler asks for confirmation since it has to modify the build script to enable profiling, so we click OK in the dialog shown in the following image:

NetBeans then modifies the build script. Let's see what changes it made to our build file; this is simple to check since the file is versioned in an SVN repository. The change to the NetBeans build.xml is very small: it includes a newly created file called profiler-build-impl.xml, which simply adds the information needed to run the application server with the extra JVM parameter the profiler requires to gather information; this is the same parameter we will add manually later to profile any precompiled application.

Now almost everything is set to start profiling our application, but before we start, the information recorded by the profiler can be customized for later analysis. The options window has three sections. The first, the Monitor section shown in the following image, allows us to enable thread monitoring:

Second, the CPU section lets us customize a few more options; for this demo we select the Entire Application option and basically leave the other options at their default values.

Finally, the last section allows us to modify the memory-profiling parameters; for this test we change the defaults and select the option to record both object creation and garbage collection.

With everything set, we click the Run button, and after waiting for a while we see the following image in the NetBeans status bar. Each of the sections configured above makes the profiler store different information, and each helps analyze the application in a different way; for this demo we will profile CPU (time spent in methods) and memory (object creation and GC).

The previous screenshot shows that the application server, and therefore our application, is starting. The profiler adds some overhead to normal VM operation, so almost any application will take longer to start. Another factor to take into consideration is that the application server that works best with the NetBeans profiler integration is the bundled GlassFish; any Java application should be able to provide profiling information, including JBoss AS (as long as it runs on a JVM that supports profiling), but the integration may not work out of the box and has to be set up manually. For example, the JBoss AS run.bat/run.sh script has to be modified so the JVM waits for profiler connections. The parameter is agentpath, and its default value for a NetBeans installation in the default location is shown below. One way to obtain the parameter is through the NetBeans profiler itself: first select External Application in the "Attach to" combo box of the Attach Profiler wizard (from the Attach Profiler option in the Profile menu), as seen in the following image.

Then click the "change" link (if this is the first time you choose to attach to an external application, the wizard opens right away without having to click the link), as seen in the following image,

to change the attach mode. Follow the wizard by first selecting Application as the Target Type,

then click Next to review the attach settings, click Next again, and in the Manual Integration step the parameter to add will show up, as seen in the following image.

Just select and copy the parameter to add when starting the Java (SE 5) application; it should read something similar to this:

-agentpath:"C:\Program Files\NetBeans 6.0\profiler2\lib\deployed\jdk15\windows\profilerinterface.dll=\"C:\Program Files\NetBeans 6.0\profiler2\lib\"",5140

After adding this parameter, the Java application waits until the profiler connects to it before starting; afterwards the profiler starts collecting profiling information from the application.

To show the profiler at work we added some silly code, as seen below; this code generates enough overhead both in GC and in execution time.
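The original listing is not reproduced here; a minimal sketch of the kind of naive code the post describes (three nested loops creating lots of short-lived String objects; the method name is hypothetical) could look like this:

    // Deliberately wasteful code: string concatenation in nested loops creates
    // plenty of garbage and CPU overhead for the profiler to pick up.
    public void generateSillyOverhead() {
        String junk = "";
        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 100; j++) {
                for (int k = 0; k < 100; k++) {
                    junk = junk + (i * j * k); // each concatenation allocates a new String
                    if (junk.length() > 10000) {
                        junk = "";
                    }
                }
            }
        }
    }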

Then we started a profiling session to track memory only; this mode lets us look at object creation and garbage collection during the profiled session. Even though this particular example only has the silly object-creation section, we tried the profiler, and the results are shown in the next screenshot.

In the results of the memory-profiling session we can see that the largest number of allocated objects are Strings, which seems logical since we create lots of them in the silly object-creation section. Later we tested the same application again, but changed the profiler to record execution-time information.

This screenshot shows the information after the profiling session has ended. There are several tabs; the one shown is the Call Tree view, which summarizes the methods using most of the time. First it shows the one we modified with the silly code; this naive code creates lots of objects inside three nested for loops, generating enough overhead to become the most time-consuming method in the application even though it shows up as being called only four times; imagine how big this time might become if the system had 100 simultaneous clients. The second most time-consuming method is findEnabled, which shows up in a class called $Proxy89; we know the code well enough to point out that this is a method on an EJB, which is why it appears in a $Proxy class. The method has no complex business logic; all it does is query the database to get a small set of data using JPA.

Now that we have located the candidates for optimization, the work switches from profiling to coding. First we have to identify which of the methods will give the best results if optimized (reducing execution time) with the least effort. After analyzing the code we found the naive code in the generateExam method; since it is pretty simple to fix, we start the optimization work there. Then we can work on the next identified method, findEnabled. For this one it appears that the database query contributes most of the processing time; therefore, we can bring in a database expert to tune the database, or work on the JPA side by adding caching and other optimizations, or perhaps write an alternative query to the one automatically generated by JPA and run it through JDBC. But who knows, maybe there is not much we can do to optimize this method, or the gain does not justify the effort.

Outstanding Points

NetBeans' profiler can be of great help in identifying potential tuning candidates, but good coding practices will save the day, even though modern compilers and JVMs can work around some common coding performance killers.

Graphical profilers such as NetBeans' ease the work of identifying possible performance problems; however, most JVMs also come with command-line options that enable profiling.

The profiler features shown here are just a small part of what the NetBeans profiler can do for us; graphs, drill-down and profiling points are also available to ease the analysis of our applications. For a great demo of these features at work, check out http://www.netbeans.org/kb/60/java/profiler-screencast.html.


Tuesday, March 4, 2008

Previewing the ASP.NET MVC Framework

Along with the release of Visual Studio 2008 some months ago, a lot of complementary tools have been developed by teams at Microsoft. One of these is the ASP.NET 3.5 Extensions package, which adds new functionality not only to ASP.NET 3.5 but also to ADO.NET.

Some of the features included in this pack are, for instance, new Silverlight controls, ADO.NET Data Services, the ADO.NET Entity Framework, ASP.NET AJAX back-button support, ASP.NET Dynamic Data and, last but not least, the ASP.NET MVC Framework.

Before we move forward, it's important to mention that this new functionality is in a "preview" state and therefore is not yet officially supported by Microsoft.

Now, let's concentrate on the ASP.NET MVC Framework.
MVC is a framework methodology that divides the implementation of a given application into three component roles: models, views and controllers.

"Models" are the components of the application that are in charge for maintaining the state of the aplication. It could be persisting the state in a database or in memory.

"Views" are the components in charge for displaying the application's user interface. Almost always the UI is a representation or reflection of what the model data does.

"Controllers" are the components in charge for handling the user interaction, manipulating the model and lastly choosing a view to render.

Main features of the MVC Framework:

  • It doesn't use postbacks or viewstate. In other words, this model is not attached to the traditional ASP.NET postback model and page lifecycle for interactions with the server. All the user interactions are routed to a controller class.


  • It supports all the existing ASP.NET features such as output and data caching, membership and roles, Forms authentication, Windows authentication, URL authorization, session state management and other areas of ASP.NET.


  • It supports using existing ASP.NET markup pages (.aspx files), user controls (.ascx files), and master pages (.master files) as view templates.


  • It contains a URL-mapping component that enables you to build applications with clean URLs. The URL routing feature explicitly breaks the connection between physical files on disk and the URL used to access a given bit of functionality, which also helps search engines. For instance, rather than accessing http://localhost/Products/ProductDetail.aspx?item=2 you now use http://localhost/Products/LaysFrieds.


  • Everything in the MVC framework is designed to be extensible. You can create your own view engine or URL routing policy, just to mention a couple of options.


  • Separation of application concerns such as UI logic, input logic and business logic, as well as testability and test-driven development (TDD). Thanks to the loosely coupled model, running unit tests is quite easy.


Creating a simple ASP.NET MVC Application

We'll create a basic application that displays a list of products based on the category and subcategory chosen by the user. This sample will hopefully clear things up and provide a starting point for digesting this new ASP.NET feature.

So, let's get started.
Using Visual Studio 2008 let's create a project of type ASP.NET MVC Web Application.



After a few seconds you'll get a working project template that contains a skeleton with a default page, plus an Index and an About page.

The default project template will look like this:



From now on you're all set to start working on this simple template and modifying it to your needs. Next we'll create all the "Model" logic associated with our sample. All this logic should be placed in the Models folder defined by the template. For this sample I'm using the ADO.NET Entity Framework, shipped in the same package as the ASP.NET MVC Framework; this speeds things up and lets us have a data access component up and running very quickly.
I'm using the AdventureWorks database shipped with SQL Server 2005, and from there only three tables: ProductCategory, ProductSubCategory and Product.

The model created by the Entity Framework looks like this:



One important thing to keep in mind is the routing model being used. Since this is a simple application, I'm fine with the one proposed by default by the project template, which looks like the image below.



What this tells us is that rather than going with http://localhost/Products/SubCategories.aspx?id=2 as the URL for accessing a page, I'll respond to http://localhost/Products/SubCategories/2.

Now that we're done with the model, it's time to start coding the controller that will act as the interpreter and handler between the model and the view. We could say this is the heart of the framework. Application and data-retrieval logic should only be written inside controller classes, and controller classes then choose which view to render.

Let's then add a new item of type MVC Controller Class underneath the folder named Controllers.



From there we start defining all the methods that will interact with our view.
A way to define these methods is as follows:
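The original screenshot is missing; a minimal sketch of such a controller, assuming the preview-era ControllerAction attribute and RenderView helper mentioned below, plus a hypothetical AdventureWorksEntities object context generated by the Entity Framework designer, could look roughly like this:

    using System.Collections.Generic;
    using System.Linq;
    using System.Web.Mvc;

    public class ProductsController : Controller
    {
        // Marks the method as an action reachable through the routing system.
        [ControllerAction]
        public void Categories()
        {
            List<ProductCategory> categories;
            using (AdventureWorksEntities db = new AdventureWorksEntities()) // hypothetical context name
            {
                categories = db.ProductCategory.ToList();
            }

            // Hands the data to the view named "Categories".
            RenderView("Categories", categories);
        }
    }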



To keep this post short, I'll only explain one of the three methods needed to have this application running as expected. Hopefully you'll get the idea and have no problem implementing the others.

What we do is decorate with an attribute (ControllerAction) all the methods that will act as controller actions. Then we implement the logic, and after that we call the RenderView method, which passes the data along and renders the view.

Finally, let's get to the view. No rocket science here. The first thing we do is add a new item of type MVC View Content Page. This type of view will ask us for a master page. If you haven't noticed it yet, there's a folder named Shared inside the Views folder; in this folder we put all the views that are common to the application and could eventually be reused.



After adding the view, notice that the page inherits from the System.Web.Mvc.ViewPage base class. This class provides some helper methods and properties; one of these properties is named ViewData, and it provides access to the view-specific data objects that the controller passed as arguments to the RenderView() method.

In order to access the data in ViewData with full type information, we make the page inherit from ViewPage<T>, where T is a strong type, in this case a list of product categories.
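The corresponding screenshot is missing; assuming a code-behind file for the Categories view and the ProductCategory entity generated earlier, the change could look roughly like this:

    using System.Collections.Generic;
    using System.Web.Mvc;

    // Code-behind for the Categories view: ViewData is now a strongly typed
    // list of product categories instead of a plain object.
    public partial class Categories : ViewPage<List<ProductCategory>>
    {
    }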



This guarantees full type safety, IntelliSense and compile-time checking within the view code.

In the HTML part of the view we need to iterate over the ViewData object and use the ActionLink method of the Html object provided by the ViewPage class. The ActionLink method is a helper that dynamically generates HTML hyperlinks linking back to action methods on controllers.

Let's write a foreach loop that generates a bulleted HTML category list like the image below.
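Since the screenshot did not survive, here is a rough sketch of such a loop, assuming the strongly typed ViewData from the previous step and the preview-era Html.ActionLink helper (the exact overload may differ):

    <ul>
        <% foreach (var category in ViewData) { %>
            <li>
                <%-- Html.ActionLink generates a link back to the SubCategories
                     action of the controller for the chosen category. --%>
                <%= Html.ActionLink(category.Name,
                        new { action = "SubCategories", id = category.ProductCategoryID }) %>
            </li>
        <% } %>
    </ul>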



Conclusions

First of all, the MVC Framework doesn't come as a replacement for WebForms and its Page Controller model. It is an alternative for those looking to implement the MVC approach. The fact that it helps you develop cleaner application code, as well as a loosely coupled application, with the advantages that brings in terms of testability and test-driven development, is amazing.

Now, one thing that, in my opinion, needs more improvement is the way data is displayed. Having to iterate using in-line code is not one of my favorite things to do; it reminds me of the spaghetti code that came along with the classic ASP programming model.

Definitely, there needs to be a more polished version of the ASP.NET MVC Framework, hopefully with something that works like the bindable, easy-to-use server controls available in the ASP.NET WebForms model.

As of now, without a doubt, it's a promising framework that we'll certainly keep an eye on.