The development of large-scale web sites is a complex process involving the interactions of multiple web servers and back-end application servers. There are few technologies that are ready for developers to put together systems for a million or more users. You could use systems like Oracle’s application server – but the requirements of a large web farm are likely to be such that no off-the-shelf package is suitable. You’ll need to develop your own components and tools. It’s here that Java has come into its own, thanks to two key technologies: Java Servlets and JavaServer Pages (more commonly known as JSP).
The first appearances of Java on the web were not too popular. Large animations and applets didn’t inspire users to access sites with significant Java content. However, the language has evolved considerably, and has found an important role at the heart of enterprise systems. Part of this comes from the development of a series of well-defined APIs – including Java servlets.
Java servlets are, at heart, applets that run on a server. They have one big advantage over CGI techniques: you only need one instance of a servlet to handle multiple requests, because servlets take advantage of Java’s multi-threaded nature.
In order to use servlets in your applications you’ll need a servlet engine. This acts as a bridge between your web server and your servlets, translating information flowing to and from the web. Whilst there is a servlet engine in Sun’s Java Servlet Development Kit, ServletRunner, you’re more likely to work with engines from third-party suppliers. IBM includes one in its WebSphere web application server, whilst Allaire has recently bought the well regarded JRun combined servlet and JSP engine. An alternative approach is the open source Apache JServ engine, which is widely available and can be compiled to run on most operating systems.
The role of the servlet engine is to act as a container for your servlets. The Servlet API defines how it will interact with the servlets it contains, by creating instances, managing threading, and by destroying the instances of the servlet when no longer required. As the container is a Java application in its own right, with its own interfaces, there’s no reason why it needs to be on the same machine as the web server – and it’s usually best to keep it separate unless you need the speed of local communications.
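The lifecycle described above can be modelled in a few lines of plain Java. This is only a sketch of what a container does, not the Servlet API itself: MiniServlet, LoggingServlet and MiniContainer are hypothetical names, and a real engine passes configuration, request and response objects rather than strings, and manages the threading for you.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the servlet interface: the container calls
// init() once, service() for every request, and destroy() once at the end.
interface MiniServlet {
    void init();
    String service(String request);
    void destroy();
}

// A servlet that records its lifecycle events so the order is visible.
class LoggingServlet implements MiniServlet {
    final List<String> events = new ArrayList<String>();

    public void init() {
        events.add("init");
    }

    public String service(String request) {
        events.add("service:" + request);
        return "Hello, " + request;
    }

    public void destroy() {
        events.add("destroy");
    }
}

// Hypothetical container: initialises the servlet on first use, reuses the
// same instance for every request, and destroys it on shutdown.
class MiniContainer {
    private final MiniServlet servlet;
    private boolean initialised = false;

    MiniContainer(MiniServlet servlet) {
        this.servlet = servlet;
    }

    String handle(String request) {
        if (!initialised) {
            servlet.init();
            initialised = true;
        }
        return servlet.service(request);
    }

    void shutdown() {
        if (initialised) {
            servlet.destroy();
            initialised = false;
        }
    }
}
```

Handling two requests and then shutting down leaves the events list holding one init, two service entries and one destroy, in that order.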
The Servlet API starts with a generic servlet class, so you can use servlet technologies in your Java applications. If you want to use servlets to interact directly with web servers and browsers, you’ll need to extend HttpServlet. This class extends the generic servlet, and adds HTTP-specific request and response interfaces. You can use the request interface to retrieve the information submitted by an HTML form. The response interface handles the information returned, via the web server, to the browser.
Servlets are able to work with both POST and GET; however, you do need to write some code to handle the form submission method you’ve chosen to use. This is actually quite easy, as the HTTP servlet class provides both doGet() and doPost() methods. It’s best to just implement one submission method in your servlets.
Whilst you’re most likely to implement servlets as stand-alone processes, you can also use servlets as server-side includes. SSI support for servlets isn’t in all web servers, and the syntax used for it can vary from server to server. One approach is the use of in-line servlet tags, which are very similar to the familiar applet tags:
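To make the request side concrete, here is a rough sketch of the decoding that a call such as request.getParameter() saves you from writing. It is plain Java rather than the Servlet API: FormData and its parse() method are hypothetical names, and unlike a real engine this sketch keeps only the last value when a name is repeated.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: decodes "name=value&name2=value2" form data, the
// format a browser uses for GET query strings and default POST bodies.
class FormData {
    static Map<String, String> parse(String encoded) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (encoded == null || encoded.isEmpty()) {
            return params;
        }
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) {
                continue; // ignore stray separators such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // In this sketch a repeated name simply overwrites the earlier value.
            params.put(decode(name), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            // URLDecoder turns '+' into a space and %XX into the character it encodes.
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

For example, FormData.parse("value=Hello+World") gives a map in which "value" is "Hello World".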
<SERVLET CODE=ServletName CODEBASE=http://server:port/dir initParam1 = initValue1 initParam2 = initValue2>
<PARAM NAME=param1 VALUE=value1>
<PARAM NAME=param2 VALUE=value2>
Text to indicate that server does not support servlet tag
</SERVLET>
You can include multiple server-side include servlets in a single page of HTML. It’s also possible to construct chains of servlets, which pass information from one to the other. With a servlet chain, you can create chains of business logic that can be changed quickly, by just rewriting one of the servlets in a chain. A typical servlet chain will have an input servlet, a processing servlet and a display servlet.
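The idea of a chain can be sketched without a servlet engine at all. In the plain Java below, ChainStage and ServletChain are hypothetical names, and each stage simply receives the previous stage’s output as a string; how a real engine wires servlets together is configured in the engine itself and varies between products.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stage in a chain: receives the previous stage's output.
interface ChainStage {
    String process(String input);
}

class ServletChain {
    private final List<ChainStage> stages = new ArrayList<ChainStage>();

    ServletChain add(ChainStage stage) {
        stages.add(stage);
        return this;
    }

    // Passes the input through every stage in the order they were added.
    String run(String input) {
        String current = input;
        for (ChainStage stage : stages) {
            current = stage.process(current);
        }
        return current;
    }

    // Input stage tidies the submission, processing stage applies the
    // business rule, display stage wraps the result in HTML.
    static ServletChain example() {
        return new ServletChain()
            .add(s -> s.trim())
            .add(s -> s.toUpperCase(Locale.ROOT))
            .add(s -> "<P>" + s + "</P>");
    }
}
```

Changing the business rule means swapping only the middle stage; the input and display stages are untouched.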
As a servlet is a threaded application, with a single instance typically serving many requests in different threads, you will need to be careful with any state shared between requests, such as instance variables, and synchronise access to it. Of course this isn’t just a servlet issue, but one common to all threaded application development – however the fact that an external container threads your servlets can conceal the concurrent nature of the servlet architecture. Alternatively you can define a servlet to be single threaded, and have the container hold a pool of waiting servlet instances ready to handle incoming requests. You may find this necessary if you need to quickly service requests in a high-load environment, for example when managing and pooling database connections.
One important role for servlets in a large-scale web application is the management of session state. Servlets are able to handle this by using any of the common session tracking techniques. One option is to use hidden form fields; another is cookies, as the servlet API gives you access to the javax.servlet.http.Cookie class, which contains all the tools you need to work with cookies. A constructor is used to create cookies, which can then be delivered to a browser with the response’s addCookie() method and retrieved with the request’s getCookies() method. However, Java’s Session Tracking API makes things simpler. New sessions can be created, and session objects tracked, with just a few basic calls. Sessions are handled on the client by using cookies or URL rewriting – it’s even possible to use SSL identifiers.
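A small simulation shows why shared state needs care when one object is called from many threads. This is a sketch in plain Java, not servlet code: HitCounter is a hypothetical stand-in for a servlet with an instance variable, and the thread pool plays the part of the container.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a servlet: one instance, called by many threads.
class HitCounter {
    // Shared by every request thread, so it has to be thread safe.
    // A plain "int hits" with "hits++" could lose updates under load.
    private final AtomicInteger hits = new AtomicInteger();

    int service() {
        return hits.incrementAndGet();
    }

    int total() {
        return hits.get();
    }

    // Simulates a container calling the same instance from several threads.
    static int simulate(int threads, int requestsPerThread) {
        HitCounter servlet = new HitCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    servlet.service();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("requests did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return servlet.total();
    }
}
```

With the atomic counter, the total always equals threads multiplied by requestsPerThread. Replacing it with an unsynchronised int is an easy way to see updates go missing.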
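The two client-side mechanisms, cookies and URL rewriting, come down to carrying a session ID in one of two places. The sketch below shows both in plain Java; SessionTracking is a hypothetical helper, and the names JSESSIONID and jsessionid follow the servlet specification’s convention, so treat them as assumptions if your engine uses something different.

```java
// Hypothetical helper showing where a session ID travels on the client side.
class SessionTracking {
    private static final String COOKIE_NAME = "JSESSIONID";   // assumed cookie name
    private static final String URL_MARKER = ";jsessionid=";  // assumed URL parameter

    // Extracts the session ID from a request "Cookie" header value,
    // for example "theme=dark; JSESSIONID=abc123".
    static String fromCookieHeader(String header) {
        if (header == null) {
            return null;
        }
        for (String part : header.split(";")) {
            String cookie = part.trim();
            int eq = cookie.indexOf('=');
            if (eq > 0 && cookie.substring(0, eq).equals(COOKIE_NAME)) {
                return cookie.substring(eq + 1);
            }
        }
        return null;
    }

    // URL rewriting: the ID is added to the path, before any query or fragment.
    static String rewriteUrl(String url, String sessionId) {
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        return url.substring(0, cut) + URL_MARKER + sessionId + url.substring(cut);
    }

    // Reads the ID back out of a rewritten URL.
    static String fromUrl(String url) {
        int start = url.indexOf(URL_MARKER);
        if (start < 0) {
            return null;
        }
        start += URL_MARKER.length();
        int end = start;
        while (end < url.length() && "?#;".indexOf(url.charAt(end)) < 0) {
            end++;
        }
        return url.substring(start, end);
    }

    // Prefer the cookie; fall back to the URL when cookies are switched off.
    static String resolve(String cookieHeader, String url) {
        String id = fromCookieHeader(cookieHeader);
        return id != null ? id : fromUrl(url);
    }
}
```

In a real servlet you would not write any of this yourself. The session API does the equivalent work when you ask the request for its session and encode the URLs you write into the page.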
Servlets give you the ability to develop web application components in one software architecture that will run on any servlet engine, with any web server. Using this you can have an internal NT-based development team working with IIS, deploying onto public facing Unix servers running Apache. With large-scale web farms this also means that you can use a single server as a test and staging server, which can then deploy components to all the elements of your farm. Forte’s SynerJ distributed Java development environment is designed to handle just this situation, and has to be one of the main reasons why they were bought by Sun last year!
Of course servlets don’t only need to work with HTTP as a CGI replacement. It’s easy enough to see them as part of a message-queue architecture, or providing workflow functionality in an email groupware system. Once you’ve got a generic servlet engine, the world is your oyster.
If you’ve used technologies like Microsoft’s Active Server Pages or Netscape’s LiveWire, you’ll find that JSP is easy enough to understand. Like ASP, a JSP page mixes HTML content with scripting code. However, instead of VBScript calling COM objects, JSP uses inline Java code to call JavaBeans. When a JSP page is called for the first time, it is compiled into a servlet for future use – making JSP pages faster than the equivalent interpreted technologies.
JSP is a more acceptable method of including servlet functionality in your web pages than the SSI servlet method shown earlier. Instead of complex SERVLET tag calls, JSP uses the same shorthand as Microsoft’s ASP to indicate a section of code, using <% … %> blocks to surround your inline Java. There are four predefined variables that can be used by a JSP page that make working with JSP a lot easier than traditional servlets.
These are the request, response, out and in objects, and they are used to create the actions of the servlet produced when the JSP page is first run. JSP also supports directives, which control the background servlet. It’s possible to use directives to describe the basic method of the background servlet, or the imported classes. JSP uses a separate code delimiter to show directives: <%@ …%>
You can also evaluate Java expressions directly in a JSP page, rather than having to build output statements. Anything enclosed in a <%= …%> block is evaluated and delivered as ASCII text to the output device. So, the following line of code will display the contents of a section of a form submission in the browser:
<%= request.getParameter("value") %>
A JSP page can also contain JavaBeans. Once you’ve started including JavaBeans in a JSP page, you can use them to handle complex business logic, and then display the results in your pages. You can use JavaBeans to handle database connectivity via EJB, or to implement specific services that you need in your web site – such as credit card verification. One of the more interesting features that JSP’s JavaBean interface has is the ability to define the scope of any bean used in your pages. A JavaBean can be either a “request” or a “session” bean. Request beans are used only for a specific request, and are then destroyed, whilst a session bean is available for the lifespan of a user’s session.
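The difference between the two scopes is easiest to see side by side. The following is a plain Java model, not JSP engine code: VisitBean and BeanScopes are hypothetical names, and the map stands in for whatever the engine uses to keep session beans alive between requests.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal JavaBean: a public no-argument constructor plus get/set methods.
class VisitBean {
    private int count;

    public VisitBean() {
    }

    public int getCount() {
        return count;
    }

    public void setCount(int count) {
        this.count = count;
    }
}

// Hypothetical model of how an engine treats the two scopes.
class BeanScopes {
    // Session beans are kept between requests, keyed by session ID.
    private final Map<String, VisitBean> sessionBeans = new HashMap<String, VisitBean>();

    // Handles one request and returns { request bean count, session bean count }.
    int[] handleRequest(String sessionId) {
        // Request scope: created for this request and discarded afterwards.
        VisitBean requestBean = new VisitBean();
        requestBean.setCount(requestBean.getCount() + 1);

        // Session scope: created on the first request, then reused.
        VisitBean sessionBean = sessionBeans.get(sessionId);
        if (sessionBean == null) {
            sessionBean = new VisitBean();
            sessionBeans.put(sessionId, sessionBean);
        }
        sessionBean.setCount(sessionBean.getCount() + 1);

        return new int[] { requestBean.getCount(), sessionBean.getCount() };
    }
}
```

Two requests from the same session return counts of 1 and 1, then 1 and 2: the request bean always starts afresh, while the session bean remembers the earlier visit.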
By using JavaBeans in your JSP pages you can separate business logic from display functions very clearly, allowing designers to make a page look good to the users, whilst you get the background code correct. A simple JSP page involving a bean that takes an input value and displays it would look like this:
<%@ import = "DisplayBean" %>
<BEAN NAME="display" TYPE="DisplayBean" INTROSPECT="yes" CREATE="yes" SCOPE="request"></BEAN>
<HTML>
<HEAD></HEAD>
<BODY>
<H1>Welcome to a BeanTest, <%= display.beanMethod() %></H1>
</BODY>
</HTML>
JSP is still a very new technology, and the tools for working with JSP pages are still few and far between. One company developing JSP tools is Macromedia, and the recently released Dreamweaver 3.0 HTML editor will allow your design team to design JSP pages. Their Drumbeat 2000 web application development tool has recently added a JSP version, designed to work with the JSP engine provided in IBM’s latest release of WebSphere. It’s very likely that JSP will become the favoured output method for Java-based application servers, thanks to its separation of code and design.
Running a large-scale web farm is a complex procedure. Whilst it certainly gives you improved reliability and scalability, you need to consider the target environment throughout the development of your web application.
The key issue for the web farm development team is session management. No matter which of the many load-balancing techniques you use to front-end your web farm, you will always be stuck with the problem of how to manage state – especially in the case of fail over between web servers.
Most of the tools used to handle load balancing do have some awareness of state. Whether you’re using software, an OS-level tool, or (my preferred solution) a network appliance like Cisco’s LocalDirector or F5’s BigIP, session information will be handled and users always directed to the same web server. These systems handle sessions by monitoring three things: session cookies, IP addresses and SSL connections. A good load-balancing tool will direct an incoming session to the least loaded server, and then maintain the connection throughout the lifetime of the session. However, if the server fails, then the session will need to be handed over to another server – and your users will expect there to be minimal loss of state in this case.
You don’t want them to be lost in the middle of an e-commerce transaction, as they’re likely to go somewhere else, and you certainly don’t want to throw someone back to the beginning of a complex banking transaction, or to be unable to send that email they’ve just been composing…
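The routing behaviour described here can be reduced to a small model. The sketch below is plain Java with hypothetical names (StickyBalancer, route(), fail()), and it counts sessions as its only measure of load; it is not how any particular product works, just the least-loaded-then-sticky rule with failover.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a session-aware load balancer.
class StickyBalancer {
    private final Map<String, Integer> load = new LinkedHashMap<String, Integer>(); // server -> sessions
    private final Map<String, String> affinity = new HashMap<String, String>();     // session -> server

    StickyBalancer(String... servers) {
        for (String server : servers) {
            load.put(server, 0);
        }
    }

    // New sessions go to the least loaded server; known sessions stay put
    // unless their server has failed, in which case they are reassigned.
    String route(String sessionId) {
        String server = affinity.get(sessionId);
        if (server == null || !load.containsKey(server)) {
            server = leastLoaded();
            affinity.put(sessionId, server);
            load.put(server, load.get(server) + 1);
        }
        return server;
    }

    // Marks a server as failed so that no further requests are sent to it.
    void fail(String server) {
        load.remove(server);
    }

    private String leastLoaded() {
        String best = null;
        for (Map.Entry<String, Integer> entry : load.entrySet()) {
            if (best == null || entry.getValue() < load.get(best)) {
                best = entry.getKey();
            }
        }
        if (best == null) {
            throw new IllegalStateException("no servers available");
        }
        return best;
    }
}
```

Note what the model does not do: when a session is moved to a new server, nothing of its state moves with it. That gap is the problem the rest of this section deals with.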
The solution is to be more active about managing state, and to make state a web application level tool rather than a web server level convenience. One solution is to implement a “session registry”. This can be a servlet running on a server (or a cluster of servers, if reliability is a key concern) that is accessible by all the web servers in your web farm. Every new connection is registered in the session registry, and the resulting session object is used to store key information about the state of a user’s interactions with your web application. Whilst at the simplest level it may just hold the ID of the page that was last accessed, it could also store more complex information, such as shopping carts or the contents of the last form submitted.
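As a starting point, the core of such a registry is little more than a thread-safe map of session IDs to state. The sketch below is plain Java under several assumptions: SessionRegistry and its methods are hypothetical names, state is held as simple string pairs in memory, and the networking, expiry and clustering that a production registry would need are left out.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory core of a session registry.
class SessionRegistry {
    // session ID -> (key -> value), safe for use from many request threads.
    private final Map<String, Map<String, String>> sessions =
        new ConcurrentHashMap<String, Map<String, String>>();

    // Registers a new connection and returns its session ID.
    String register() {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new ConcurrentHashMap<String, String>());
        return id;
    }

    // Stores one item of state; returns false if the session is unknown.
    boolean put(String sessionId, String key, String value) {
        Map<String, String> state = lookup(sessionId);
        if (state == null) {
            return false;
        }
        state.put(key, value);
        return true;
    }

    // Returns the stored value, or null if the session or key is unknown.
    String get(String sessionId, String key) {
        Map<String, String> state = lookup(sessionId);
        return state == null ? null : state.get(key);
    }

    // Removes a session, for example on logout or timeout.
    void invalidate(String sessionId) {
        if (sessionId != null) {
            sessions.remove(sessionId);
        }
    }

    private Map<String, String> lookup(String sessionId) {
        return sessionId == null ? null : sessions.get(sessionId);
    }
}
```

A web server would call register() for each new visitor, then put(id, "lastPage", ...) as the user moves through the site, so that any other server can later ask for the same key.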
If a user is redirected to another server whilst still holding a valid session key in a cookie or extended URL, the JSP pages used in the site can then pull back the relevant information from the session registry, populating pages and allowing the user to continue with minimal disruption. All that would be needed would be a means of storing local session IDs, which can be compared with the user’s session ID – and using JSP and servlets, that can be implemented as a single JavaBean that can be reused throughout your web application.
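The bean described above might look something like the following sketch. It is plain Java with hypothetical names (SessionRecoveryBean, recordPage(), resume()), and it uses an ordinary Map shared between two objects as a stand-in for the remote session registry, holding only the last page visited.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bean used by each web server in the farm.
class SessionRecoveryBean {
    // Session IDs this server has already seen.
    private final Set<String> localSessionIds = new HashSet<String>();
    // Stand-in for the shared session registry: session ID -> last page.
    private final Map<String, String> registry;

    SessionRecoveryBean(Map<String, String> sharedRegistry) {
        this.registry = sharedRegistry;
    }

    // Called on every page view to keep the registry up to date.
    void recordPage(String sessionId, String page) {
        localSessionIds.add(sessionId);
        registry.put(sessionId, page);
    }

    boolean isLocal(String sessionId) {
        return localSessionIds.contains(sessionId);
    }

    // Returns the page to resume at, or null if the registry does not know
    // the session. A session first seen on another server is adopted here.
    String resume(String sessionId) {
        if (sessionId == null) {
            return null;
        }
        String lastPage = registry.get(sessionId);
        if (lastPage != null) {
            localSessionIds.add(sessionId);
        }
        return lastPage;
    }
}
```

If one server records a page for a session and a second server then receives that session’s ID, resume() on the second server returns the recorded page and marks the session as local from then on.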
As web applications become more complex, and the numbers of users for them grow into the millions, web farms will become more and more common. A single web server solution will not be enough – and certainly won’t provide you with the reliability and scalability that your users will demand. Web applications will have to become distributed applications, working against central clusters of application servers and information repositories. Tools like Java servlets and JSP are going to become more and more important to web developers, as they have the features you need to help you develop this type of application quickly and easily.
If you’re thinking of using Java servlets or JSP in your web applications, you’ll probably find these two books very useful. The first is part of O’Reilly’s Java Series, Java Servlet Programming by Jason Hunter (ISBN 1-56592-391-X), and the second is one of Wrox’s “Programmer-to-Programmer” books, Professional Java Server Programming (ISBN 1-861002-77-7). Both are full of information, with the O’Reilly offering a primer in servlet development, and the Wrox looking at the deeper issues of developing and deploying servlet and JSP applications on the web.
Java is becoming a popular tool in web applications, thanks to servlets and JSP
