Monday, July 9, 2007

Some notions in Terracotta

Shared Roots
A shared root is a class data member that is transparently mirrored across all like-configured DSO applications running against the same DSO server. The term root refers to the fact that the field may represent the top of an arbitrarily large object graph. For instance, a java.util.ArrayList is a collection of java.lang.Objects, each of which may also be composed of further sub-objects. A root can also be a simple primitive such as an int or boolean.
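
For illustration, here is a minimal sketch of what a shared root might look like. The class, package, and field names are invented, and the tc-config.xml fragment is from memory of DSO's configuration format, so check it against the Terracotta documentation:

    package com.example;

    import java.util.HashMap;
    import java.util.Map;

    public class CacheHolder {
        // Once declared as a root, this field (and the entire object graph
        // reachable from it) is transparently mirrored across every JVM
        // running against the same DSO server.
        public static final Map<String, String> sharedCache =
                new HashMap<String, String>();
    }

The corresponding root declaration in the dso section of tc-config.xml names the fully qualified field:

    <roots>
      <root>
        <field-name>com.example.CacheHolder.sharedCache</field-name>
      </root>
    </roots>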

Distributed Locks

A distributed lock is a form of distributed semaphore. It protects areas of code that modify members of a shared root graph from simultaneous access, and it provides the hook DSO uses to synchronize data values to the server for subsequent broadcast to other connected clients. In this distributed programming paradigm, acquiring a lock is analogous to entering a synchronized method whose scope spans every participating JVM.

There are two categories of locks: named and automatic.

Named Locks

A named lock is meant to be used with code not specifically written for multi-thread safety. A method can be associated with a particular lock name, and all methods sharing that name use the same lock. In other words, if two methods share the same lock name and one method invokes the other, only a single actual lock is taken out for the duration of the outer invocation.
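
As a rough illustration, a named lock is declared in tc-config.xml by binding a lock name to a method expression. The class and lock names below are made up, and the element names are from memory of the DSO configuration guide, so verify them before use:

    <locks>
      <named-lock>
        <lock-name>orderBookLock</lock-name>
        <method-expression>* com.example.OrderBook.*(..)</method-expression>
        <lock-level>write</lock-level>
      </named-lock>
    </locks>

Every method matching the expression shares the single cluster-wide lock named orderBookLock, so nested calls among those methods take out only one lock.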

Automatic Locks

An auto lock is meant for the case where the code is written with multiple threads in mind, using the synchronized keyword to demarcate protected areas. A synchronized block or method denotes a section of control flow that is serialized with respect to thread access: only a single thread can enter the block at a time.
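
For example (a sketch; the class and package names are invented), code that is already synchronized needs no changes. An autolock stanza in tc-config.xml promotes the existing JVM monitor to a cluster-wide lock:

    public class Counter {
        private int count; // assumed to be part of a shared root graph

        public synchronized void increment() {
            count++; // under DSO, this monitor becomes a distributed lock
        }
    }

The matching configuration fragment (again, from memory of the DSO schema) would be:

    <locks>
      <autolock>
        <method-expression>* com.example.Counter.increment(..)</method-expression>
        <lock-level>write</lock-level>
      </autolock>
    </locks>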

Saturday, May 26, 2007

Project Cajo

As the community manager of java.net, I am always trying to get better data about what projects on java.net are really active. One of my interns recently implemented a great report that shows the top 50 most active projects on java.net for a given month. "Most active" means with the largest number of actions (writes of some sort) that we can detect to the code or mailing lists within each project. (We also included forums, but that measure is broken at the moment.)

I notified the owners of the top 50 projects for February, and one of them, John Catherino, project owner of the Cajo project, asked why I don't make the data public. I said, "Yes, obviously we should and we are working on how to do that, but also I want to interview the owners of some of these most active projects," and would he like to be my first victim? He said yes.

A quick look at Cajo proves not to be so very quick because it is so interesting: a small, free library that simplifies the use of Remote Method Invocations; is easy to use; is in use all over the world; and is a "uniquely 'drop-in' distributed computing framework." And how could I not click on a link to a blog entry titled "World Wide Virtual Machine"?

What's Cajo?

Q: How would you describe your project, in a nutshell?

A: The Cajo project allows people to distribute their applications across multiple Java Virtual Machines, without source impact. This makes it exceptionally valuable for scaling large, existing applications. It also allows graphical user interfaces to be transparently remoted, automatically, via both WebStart and applets.

Q: Who is using Cajo and what are they doing with it?

A: As the Cajo project is quite fundamental in its purpose--i.e., distributing objects between Java Virtual Machines transparently--it is being used by groups of all sizes: university students, companies small and large, and even national governments. Its primary applications are scaling applications, distributing user interfaces, and remote control.

Q: Can you give me an example?

A: mBooster. From their page:

mBooster is the world-leading optimization suite for J2ME applications for professional J2ME game and application developers. mBooster automatically reduces the size and increases the performance of your midlet JAR files.

And you can find the acknowledgment of Cajo as one of the building blocks on their license page.

Q: What does it mean to have an official TCP and UDP port assignment, and an official multicast address, assigned by the Internet Assigned Numbers Authority (IANA)?

A: It means that TCP and UDP port 1198, and UDP address 224.0.23.162, are registered to the Cajo project for worldwide operation, on the internet. To me it is an astonishing recognition of our project's maturity, usefulness, and popularity. We are now part of the internet infrastructure.

Working on Cajo

Q: What made you decide to start this project?

A: I have been developing network distributed applications for many years; in that time I have seen too many people struggle, and decry all of the complexity. Worse yet, some would just give up on their great ideas. For a long time I tried to individually help people, but there was a limit to how much I could accomplish this way. I felt the Java Runtime Environment offered such tremendous capability, I knew there had to be a way to make Java object distribution over the network truly transparent. This would be a revolutionary simplification. Then in late 1998, when Sun's J2SE 1.2 introduced reflection and powerful enhancements to RMI, I knew I had the foundation I needed. The Cajo project was ready to be born.

Q: Is working on Cajo part of your day job?

A: Not exactly. I currently work in distributed grid-based software development, yet building this project has allowed me to bring a lot of insight to the table. Perhaps one day, the company may base a project on Cajo; if a customer had a requirement for an open source implementation, for example.

Q: A popular open source project is a lot of work; what keeps you going?

A: When development gets very busy, it is usually because one or more individuals, project development teams, or companies are newly and actively adopting Cajo. I get such tremendous feedback and satisfaction from helping people achieve (and quite often, wildly exceed) their goals. That makes it worth all the work to me. The most common response: "I can't believe it's this easy!"

Q: What would you like to see added to the java.net project section?

A: More project tracking statistics, visible to everyone: page hits, downloads, both community and site ranking. It is this type of information that enables surprisingly emergent development to occur.

The Cajo Community

Q: Your project home page has some handy links and stats on the right side, like 168 members, and the percentage breakdown by continent. How many of them are developers who contribute to Cajo?

A: I guess it depends on what is considered contribution. A little over 10 percent suggest actual code improvements. However, well over half are significantly improving the site through questions and comments. Personally, I consider every post to any Cajo project forum a contribution. There are even some who have made very significant contributions, and aren't even members.

Q: Why did you create a cajo.Contributor role?

A: Currently, any java.net role other than Observer is granted repository commit privileges. This implies a significant amount of responsibility, given the large user base of the Cajo project. The cajo.Contributor role allows members to post to the "Documents & files" section of the site. I grant this role to anyone with a plan to extend or add to the project. I find it so useful, I would recommend it as an official java.net role.

Q: How do you manage community member contributions?

A: I use two approaches. In the more common case, developers make suggestions for new features, and we discuss them in the project forum. Then I create a proposed implementation for ratification. Once the version is agreed upon, it is incorporated into the codebase. In the less common case, members want to make a significant code contribution to the project in an area outside of my expertise. They are then welcome to create an add-in release, and are free to develop and document it as they wish on the project site. The Cajo extension for JEE is an example of this second case.

Q: Why did you choose the GNU LGPL?

A: For two fundamental reasons. First, I wanted the Cajo library to be usable between both open and proprietary applications. To me this requirement seemed to allow all but the GNU GPL. Second, I wanted to prohibit distribution of proprietary versions of this library. In light of the first requirement, this seemed to exclude all but the GNU LGPL. While the selection has caused several conversations with both developers and companies, so far I have never had anyone who felt unable to live up to the terms.

Challenges and Surprises

Q: What has been your most difficult challenge?

A: Documentation. Most developers, and I am no exception, do not like to work on it. However, an open source project without detailed documentation will most likely be ignored. The Cajo project source code is nearly half Javadoc! Then I had to generate all of the web pages. Most challenging is creating good examples. This is something on which I continue to work. Sometimes I even get great examples from developers.

Q: What has been your biggest surprise?

A: To be honest, at the start I worried people would flame what I thought was the best code of my career. Even worse, I worried nobody would care. On the whole, open source software developers are a very accepting and enthusiastic community; that was a very happy surprise. Yet I'd have to say, when I recently discovered that people had created Wikipedia articles about the Cajo project, in both English and German, I went from pleasantly surprised to completely overwhelmed with appreciation.

Editor's note: We'd like to thank John for discussing the Cajo project in this interview. You can look for more interviews with prominent java.net project leaders in the coming weeks and months.


Marla Parker is the Community Manager of java.net.

Thursday, May 24, 2007

How parameters are passed in RMI

To introduce how parameters are passed in Java RMI, let's first review how they are passed within a single JVM.
In a single JVM, a primitive data type is passed by value. The mechanics of passing an object as a parameter are more complex: a copy of the reference to the object is passed to the method, and the method's code uses that copy of the reference to access the object.

Now we look at the differences in RMI.

Primitive Parameters
For primitive parameters, the mechanics are much the same: RMI makes a copy of the data and sends it to the remote method.

Object Parameters
The mechanics here are quite different from those in a single JVM. RMI sends the object itself rather than a reference to it, and the same applies when the return value is an object.
RMI uses a technology called object serialization to transform an object into a linear format that can then be sent over the network.
One caveat deserves attention: if the object is complex, such as a graph-like structure, serialization can consume a lot of CPU time and network bandwidth.
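
A small sketch (the interface and value class are hypothetical) of how an object parameter travels by copy:

    import java.io.Serializable;
    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // Value object: must implement Serializable so RMI can marshal it.
    // The remote JVM receives a copy; mutating the copy there does not
    // affect the caller's original instance.
    class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    interface Plotter extends Remote {
        // Both the Point argument and the Point result are serialized
        // and copied across the network on every call.
        Point translate(Point p, int dx, int dy) throws RemoteException;
    }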

Remote Object Parameters

When passing an exported remote object as a parameter or return value in a remote method call, the stub for that remote object is passed instead. Remote objects that are not exported will not be replaced with a stub instance. A remote object passed as a parameter can only implement remote interfaces.
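
To make that concrete, here is a minimal sketch (names invented): because LoggerImpl is exported through the UnicastRemoteObject constructor, passing an instance of it in a remote call sends its stub, and invocations on that stub travel back to the exporting JVM:

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.server.UnicastRemoteObject;

    interface Logger extends Remote {
        void log(String msg) throws RemoteException;
    }

    class LoggerImpl extends UnicastRemoteObject implements Logger {
        LoggerImpl() throws RemoteException {
            super(); // exports this object so a stub can stand in for it
        }
        public void log(String msg) throws RemoteException {
            System.out.println(msg); // always runs in the exporting JVM
        }
    }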

Saturday, May 12, 2007

Steps in the Development Process


Developing a simple JavaServer Faces application usually requires these tasks:

  • Mapping the FacesServlet instance (a sample mapping appears after this list).
  • Creating the pages using the UI component and core tags.
  • Defining page navigation in the application configuration resource file.
  • Developing the backing beans.
  • Adding managed bean declarations to the application configuration resource file.
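
For the first task, a typical web.xml fragment looks like the following (standard for JSF 1.x; the /faces/* pattern is a common choice, not a requirement):

    <servlet>
        <servlet-name>Faces Servlet</servlet-name>
        <servlet-class>javax.faces.webapp.FacesServlet</servlet-class>
        <load-on-startup>1</load-on-startup>
    </servlet>
    <servlet-mapping>
        <servlet-name>Faces Servlet</servlet-name>
        <url-pattern>/faces/*</url-pattern>
    </servlet-mapping>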

Saturday, May 5, 2007

Model-View-Controller (MVC)

The MVC architecture is a widely used architectural approach for interactive applications that distributes functionality among application objects so as to minimize the degree of coupling between the objects. To achieve this, it divides applications into three layers: model, view, and controller. Each layer handles specific tasks and has responsibilities to the other layers:

  • The model represents business data, along with business logic or operations that govern access and modification of this business data. The model notifies views when it changes and lets the view query the model about its state. It also lets the controller access application functionality encapsulated by the model.
  • The view renders the contents of a model. It gets data from the model and specifies how that data should be presented. It updates data presentation when the model changes. A view also forwards user input to a controller.
  • The controller defines application behavior. It dispatches user requests and selects views for presentation. It interprets user inputs and maps them into actions to be performed by the model. In a web application, user inputs are HTTP GET and POST requests. A controller selects the next view to display based on the user interactions and the outcome of the model operations.
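
A minimal, non-web sketch of the three roles (all names are illustrative):

    import java.util.ArrayList;
    import java.util.List;

    // Model: owns the business data and notifies views when it changes.
    class CounterModel {
        private int value;
        private final List<CounterView> views = new ArrayList<CounterView>();

        void addView(CounterView view) { views.add(view); }
        int getValue() { return value; }

        void increment() {
            value++;
            for (CounterView view : views) {
                view.modelChanged(this); // model notifies its views
            }
        }
    }

    // View: renders the model's state; queries the model when notified.
    interface CounterView {
        void modelChanged(CounterModel model);
    }

    // Controller: interprets user input and maps it onto model operations.
    class CounterController {
        private final CounterModel model;
        CounterController(CounterModel model) { this.model = model; }

        void onIncrementClicked() { model.increment(); }
    }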

Friday, May 4, 2007

What Is a Servlet?

A servlet is a Java programming language class used to extend the capabilities of servers that host applications accessed via a request-response programming model. Although servlets can respond to any type of request, they are commonly used to extend the applications hosted by web servers. For such applications, Java Servlet technology defines HTTP-specific servlet classes.



Servlet Life Cycle

The life cycle of a servlet is controlled by the container in which the servlet has been deployed. When a request is mapped to a servlet, the container performs the following steps.

  1. If an instance of the servlet does not exist, the web container
    1. Loads the servlet class.
    2. Creates an instance of the servlet class.
    3. Initializes the servlet instance by calling the init method. Initialization is covered in Initializing a Servlet.
  2. Invokes the service method, passing request and response objects. Service methods are discussed in Writing Service Methods.

If the container needs to remove the servlet, it finalizes the servlet by calling the servlet's destroy method. Finalization is discussed in Finalizing a Servlet.
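
A minimal servlet that makes the life cycle visible (the class name is made up; init, doGet via service, and destroy correspond to the steps above):

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class LifeCycleServlet extends HttpServlet {

        // Step 1c: called once by the container after instantiation.
        public void init() throws ServletException {
            log("init: servlet loaded, instantiated, and initialized");
        }

        // Step 2: the container's service method dispatches GETs here.
        protected void doGet(HttpServletRequest request,
                             HttpServletResponse response)
                throws ServletException, IOException {
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            out.println("<html><body>Hello from the service phase</body></html>");
        }

        // Called once, just before the container removes the servlet.
        public void destroy() {
            log("destroy: releasing resources");
        }
    }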

Thursday, May 3, 2007

Java web application


Figure 2-1 Java Web Application Request Handling


Servlets are best suited for service-oriented applications (web service endpoints are implemented as servlets) and the control functions of a presentation-oriented application, such as dispatching requests and handling nontextual data.

JSP pages are more appropriate for generating text-based markup such as HTML, Scalable Vector Graphics (SVG), Wireless Markup Language (WML), and XML.

The web application deployment descriptor (DD)

The process for creating, deploying, and executing a web application can be summarized as follows:

  1. Develop the web component code.
  2. Develop the web application deployment descriptor (a bare-bones example appears after this list).
  3. Compile the web application components and helper classes referenced by the components.
  4. Optionally package the application into a deployable unit.
  5. Deploy the application into a web container.
  6. Access a URL that references the web application.
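
For step 2, a bare-bones deployment descriptor might look like this (the servlet and class names are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
        <servlet>
            <servlet-name>HelloServlet</servlet-name>
            <servlet-class>com.example.HelloServlet</servlet-class>
        </servlet>
        <servlet-mapping>
            <servlet-name>HelloServlet</servlet-name>
            <url-pattern>/hello</url-pattern>
        </servlet-mapping>
    </web-app>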

Monday, April 30, 2007

Terracotta open sources JVM clustering (from InfoQ)


Posted by Floyd Marinescu on Dec 04, 2006 06:15 PM

One of the most innovative products to come out of the Java community in recent years is Terracotta, a JVM clustering solution that can turn single-node, multi-threaded apps into distributed, multi-node apps with no code changes. Terracotta today has released their entire product as open source (MPL-based license) via the new OpenTerracotta project. Also open sourced are Terracotta for Spring and Terracotta Sessions (for HTTP Session clustering). With this announcement, the VC-backed Terracotta is switching from a commercial license business model to a professional open source business model.

OpenTerracotta's DSO (distributed shared objects) allows the transparent clustering of user-defined classes, Java primitives, and even core Java class libraries, including its concurrency abstractions. DSO is a hub-and-spoke architecture, with a central server replicating data across JVMs as needed. Objects retain identity across a cluster, and the central data store (which itself can be replicated) ensures smooth failover. Terracotta replicates changes from one heap to heaps in other VMs. Terracotta also coordinates synchronization between threads, and can migrate threads across server instances. Bytecode instrumentation of classes declared clusterable is also used, resulting in fine-grained replication of field-level changes across the cluster, without the need to declare objects Serializable.

Terracotta for Spring is a customization of DSO which allows the transparent clustering of Spring Application Contexts, Singleton bean types, Application context events, as well as Spring’s JMX support.

Terracotta Sessions is also a customization of DSO that can cluster HttpSession objects, supporting Apache Tomcat, BEA WebLogic, and WebSphere CE. Being based on DSO, Terracotta Sessions has proven faster than other open source session clustering alternatives due to its ability to distribute only field-level changes and to push changes only to the servers that need them.

Terracotta can also be used as a Grid solution, implementing the Master/Worker pattern defined in the CommonJ spec from IBM and BEA.

InfoQ spoke to Terracotta founder and CTO Ari Zilka to learn more about how and why this VC-backed startup went open source.

InfoQ asked how Ari was able to convince the investors to make such a drastic change to Terracotta's business model. Ari explained that Terracotta had board members supporting both sides, but that two data points easily aligned the board. First, 80% of Terracotta installs were in environments that were mixed with open source, usually Spring, Struts, and Tomcat. Second, many companies told Terracotta that the process of going through a proof of concept and evaluation is too time consuming from both a practical and a legal standpoint, but that if Terracotta were already out as open source they "would have already been using you, running in production and we would have been calling you for support."

In terms of licensing, Terracotta created the Terracotta Public License, which extends the Mozilla Public License with an attribution clause requiring companies shipping with Terracotta to include the equivalent of a 'powered by' logo. Ari explained:
We wanted MPL; we needed a license that was a household name. We liked GPL from a business safety perspective but the community seems against it because of its viral nature... we wanted people to be able to adopt the software, both ISV and end user, minimize forking and get improvements back into the product, and attribution ensures that direct competitors have to give us credit, which would give us more protection from forking.
Comparing license contract revenues with support contract revenues, Ari explained that all the legalese associated with buying commercial licenses, indemnities, and escrow issues is daunting, whereas with a subscription support model:
You pay a certain amount per year, and that amount is much less than a license so you don't have customers asking for a 90% discount... Subscription means that you can hold to your prices as long as they are commensurate with your costs... No one asks for source code escrow, for example, because it's open source. We don't need to define trigger provisions that give you rights to Terracotta's source code and I don't need to protect it because it's already open... Our support offering would make us more money after about 5 years than a license fee upfront, but we have to earn the customers' subscriptions year on year and provide value.
On how he would position Terracotta to other open source clustering solutions, Ari explained that for small to medium companies, Terracotta gives you the developer benefits, the operator (datacenter) benefits and lower cost of ownership:
The core value proposition for Terracotta includes performance, ease of development, and ease of operations. Object identity works for clustering at the heap level, which means you're basically building to a stateful model, which nobody else can say. For example, if you were using serialization you would have to do things differently as a result of clustering than you would have done in a prototype of the same application. Thus for ease of development I think we win. For ease of operations we are as good as anything else out there. We have a central server, which can be managed from one console, but peer-to-peer technologies can do that with JMX broadcast and such. And then our server clusters, so there is no single point of failure, so we're similar to other solutions. I would position it to an operator that you get the benefits of stateless while developers on your team get the benefits of stateful, and we are the only solution that has both... And then from a scalability perspective, I think we are one of the youngest companies out there and there are more scalable solutions than ours, but we are scalable enough for small to medium companies.
Moving onto more technical topics, Ari explained how Terracotta is typically being used:
Developers use Terracotta to share objects across JVMs (notice, we didn’t say "data" because sharing objects requires cluster-wide object identity and Terracotta’s customers rely on that). Specifically, developers write applications that contain more state in-Heap. If any one node fails, all other JVMs have access to the same data so nothing is lost; sort of a stateful model in a stateless runtime environment. Also, clustered apps scale well with our fine-grained replication (field-level) and fine-grained locking (object-level).
Asked if people are actually clustering Java primitives and using standard Java concurrency features in clustered fashion, Ari gave some examples:
Clustered apps that were previously leveraging JMS and RMI to do things like scatter-gather / MapReduce / Master-worker and the like, are moving to a pure POJO model with LinkedBlockingQueues and wait() / notify() that clusters as part of the runtime. They switch because their current architecture is too intrusive or hard to maintain or not scaling as a function of nodes/size of clustered data.
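
As a rough, single-JVM illustration of the POJO model Ari describes (under DSO, the queue would additionally be declared a shared root so that masters and workers in different JVMs drain the same queue; all names here are invented):

    import java.util.concurrent.LinkedBlockingQueue;

    public class WorkQueueDemo {
        // With Terracotta, this field could be configured as a root,
        // making the queue visible to every JVM in the cluster.
        private static final LinkedBlockingQueue<String> work =
                new LinkedBlockingQueue<String>();

        public static void main(String[] args) throws InterruptedException {
            // Worker: blocks on take() until a task arrives; when the queue
            // is clustered, the blocking semantics span JVMs.
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            String task = work.take();
                            System.out.println("processing " + task);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();

            // Master: enqueue tasks as plain objects, no JMS or RMI plumbing.
            work.put("task-1");
            work.put("task-2");
            Thread.sleep(500); // give the worker time to drain the queue
        }
    }
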
While Ari was careful to position Terracotta as a no-brainer answer for small to medium companies, he also explained how one customer was using Terracotta in a large-scale usecase:
Large can be defined as a large number of JVMs, large amounts of clustered data, or a significant rate of change to objects. Under one scenario, we encountered all three with thousands of JVMs and upwards of 80,000 transactions per second. The customer helped us design a solution where their application split objects across Terracotta servers by hashing the objects into separate hashmaps, each of which was accessible by all JVMs. In each map the app stored one-nth of the objects in the system. Then, we tuned Terracotta to not prefetch / batch objects and, furthermore, not to cache in the local JVM. There was no load balancer required because Locality of Reference was set to the Terracotta Servers. And, there were 10 of them. Each one was responsible for 8,000 transactions per second. This worked very well for the customer and outran many other options they had tried. While they had to write code to get the app to scale, the objects were still POJO and the customer was satisfied that they could continue to scale as their application grew.
On potential misuses of Terracotta:
Terracotta is location-transparent meaning locality of reference is not maintained. State can be faulted-in from the Terracotta server as needed. However crossing JVMs implies that the calling thread must incur network overhead and latency costs. So it is recommended that applications maintain Locality of Reference as much as possible.

In persistence mode, Terracotta allows state to be persisted across JVM restarts. However, it is not a system-of-record and lacks database features such as recovery, backup etc. So use it as a persistent data-store carefully only in situations where occasional data-loss in edge-cases might be allowable.
On how Terracotta ensures fault tolerance considering its central coordinating hub, Ari explained:
In our operational past with several large Internet sites, hub-and-spoke lent itself well to infrastructure services; operators know how to make machines fault tolerant and services highly available. Terracotta's Server leverages this now-commonplace Active/Hot-Standby architecture. Your application's JVMs transparently connect to the standby Terracotta Server upon failure of the active Terracotta server. The design avoids split-brain / network partitioning issues. Honestly, we will have to start clustering Terracotta servers in a peer-to-peer fashion once a use case outstrips the capacity of a single active Terracotta instance. For now though, the fact that we push only as much data as changes in your objects means that even the largest installations have not outgrown a single server instance.
Finally, on how Terracotta supports data partitioning architectures (co-locating data access with data residency to avoid the latency of inter-node data-sharing), Ari explained:
Terracotta has implemented the Master/Worker pattern. It is intended to help with Locality of Reference by moving the app to the data instead of the data to the app, but it requires the use of POJOs to function. Our customers call this many things, from "grid" to "Master/Worker" to MapReduce to scatter/gather. Conceptually, work gets input to a queue and is processed by multiple, collaborating worker JVMs, each of which owns some subset of the data. The pattern provides all the required plumbing to route data to workers and keep it there (off of other workers and off the network).
An Enterprise edition will also be available for purchase, but it is identical to the open source version, except that it comes with a contract with indemnification and a perpetual license to a particular version of TC's product.

In terms of community integration with Terracotta, Ari told InfoQ that Tomcat support company Covalent will be bundling Terracotta in their Tomcat distribution. The Geronimo team is considering bundling Terracotta in WebSphere CE as a clustering option. Terracotta will be linked to as a clustering option in the Tomcat clustering FAQ and by GlassFish. Apache Tomcat and Tribes committer Filip Hanik is also a committer on OpenTerracotta, as is Jeff Genender, a Geronimo committer.