Taking your data to the web

Introduction

If all the information you had about web applications came from newspapers or the TV, you’d probably think that you’d need to throw away all your existing hardware and software in order to set up a web presence. Whilst the .COMs may have the advantage of a green-field IT infrastructure, there’s actually more opportunity for existing organisations to use the web to make better use of their existing data.

From the big iron of mainframes to the small desktop PC, there’s a lot of data sitting in the average enterprise. It’s also data that is often useful, either to customers and partners, or internally. By giving your data web front ends, you’ll be able to maximise its usage, and to expand your organisation’s knowledge base.

Web-scraping

Big iron users will be familiar with screen-scraping: taking the contents of a green screen terminal’s display, and displaying them in a window on a desktop PC. Whilst some tools are just terminal emulators, others allow you to take advantage of the tools provided by a modern windowing operating system.

It wasn’t long before these techniques began to be applied to the web, allowing existing applications to be deployed on company intranets and out on corporate web sites. It was only a small step to web-scraping tools, which apply the screen-scraping model to the browser. Sections of green screen layout are translated into HTML, and then delivered to browsers. A server-side application is used to translate the resulting form responses, and to deliver them to the mainframe application.
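The two translation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field map, the screen contents and the /host/submit address are all hypothetical, and a real web-scraping product would read the layout from the host session rather than from a hard-coded list.

```python
from html import escape

# Hypothetical field map for a green-screen layout:
# (name, row, column, length, editable)
FIELDS = [
    ("policy_no", 0, 11, 8, False),
    ("surname", 1, 11, 20, True),
]

# Hypothetical screen contents, one string per terminal row.
SCREEN = [
    "POLICY NO: 00123456",
    "SURNAME  : SMITH",
]

def screen_to_html(screen_rows, fields, action="/host/submit"):
    """Translate fixed-position screen fields into an HTML form."""
    parts = [f'<form method="post" action="{escape(action)}">']
    for name, row, col, length, editable in fields:
        # Cut the field out of the screen buffer by position.
        value = screen_rows[row][col:col + length].rstrip()
        if editable:
            parts.append(
                f'<label>{escape(name)} <input name="{escape(name)}" '
                f'maxlength="{length}" value="{escape(value)}"></label>'
            )
        else:
            parts.append(f"<p>{escape(name)}: {escape(value)}</p>")
    parts.append('<input type="submit" value="Send"></form>')
    return "\n".join(parts)

def form_to_host(form, fields):
    """Turn a submitted form back into padded field values for the host."""
    return {
        name: form.get(name, "")[:length].ljust(length)
        for name, _row, _col, length, editable in fields
        if editable
    }
```

Calling screen_to_html(SCREEN, FIELDS) gives the HTML to send to the browser, and form_to_host() pads each returned value to the fixed width the host screen expects.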

This process allows organisations to provide a consistent look and feel to their applications, and can provide a focus for the unification of business processes. One option is to use a set of similar web applications to offer single-point access to a series of applications hosted on several servers. This type of function is ideal for call centre use, where a common interface keeps the cost of training to a minimum. If you were running the call centre for an insurance company, you could put a web interface on the many separate quotation and management systems, allowing a single operator access to diverse life and general insurance products through a common user interface.

Another option is to take advantage of host access tools, provided by companies like Attachmate. Using these, you can embed access to mainframe applications and data into your applications. Attachmate’s products include a set of COM objects that can be included in ASP-based web applications. Using these tools, it is possible to include components from several applications in a single page of a web application. Similarly, Microsoft provides components for access to legacy systems using SNA or CICS – so you can include these in your server components. Similar components are available in Java – either as beans or classes – giving you the opportunity to connect your legacy systems directly to Java application servers. You'll find some tools on IBM’s Alphaworks web site (www.alphaworks.ibm.com), as well as through most of the major application server vendors.

Of course, the option is there to avoid web-scraping completely, as web servers are available for most of the mid-range and mainframe systems currently installed. The market leader here is probably IBM, as its main operating systems and platforms all come with web servers, allowing you to deploy web applications on everything from desktop PCs to OS/390 mainframes.

A further advantage of using a built-in web server is that you have access to the scripting and development environment of the server. Instead of having to develop completely new skills, developers can produce web applications using the scripting languages they’re already using, including REXX. You may also find that alternative operating systems with Internet-centric features have been ported to your platform, allowing you to access legacy data and hardware directly.

ODBC, JDBC and friends…

The client-server revolution left us with a new problem, in the shape of many different data stores, running on different platforms and all with different proprietary access methods. The proliferation of data in the enterprise has led to increasing complexity in the applications that need to collate and access this information. Where once a single mainframe held all an organisation’s data, it is now spread throughout the business, making it both essential and difficult to access.

You can access these data stores through their proprietary interfaces, thanks to the work done by many volunteer programmers. A visit to CPAN (the Comprehensive Perl Archive Network) will give you access to a collection of Perl modules. These include tools that give you access to the native interfaces of most major databases, which you can then include in your CGI applications. Whilst you could create web applications that communicate directly with the Oracle SQL*Net interfaces, they are cumbersome and difficult to debug without significant Oracle skills.

It’s a lot easier to use generic tools to handle data access, especially if they can be cross-platform tools with a consistent API. One of the big advantages that modern databases have over legacy systems is the SQL query language, and you should look for tools that support this.

Probably the most common tool is ODBC, Open DataBase Connectivity. Developed by Microsoft and based on the SQL Access Group’s call-level interface, ODBC has rapidly become a common database access standard. With the goal of allowing access to any data from any application, ODBC acts as a universal middle tier, placing a database driver between your application and the database. The driver then translates SQL queries into the database’s native commands. If you’re using Microsoft’s Visual Studio tools, then you’ll find that Microsoft bundle ODBC drivers for most of the common databases, including Oracle and Sybase, with the ADO database access objects.
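The benefit of a generic access layer is that application code only ever speaks SQL to a connection object and leaves the native commands to the driver. The Python sketch below illustrates the idea. It is not ODBC itself: it assumes the standard library’s sqlite3 module as a stand-in driver, and the customers table and its contents are hypothetical sample data.

```python
import sqlite3

def fetch_customers(conn, min_balance):
    """Run a plain SQL query through whatever driver sits behind `conn`."""
    cur = conn.cursor()
    # The "?" placeholder is sqlite3's parameter style; other drivers may differ.
    cur.execute(
        "SELECT name, balance FROM customers WHERE balance >= ? ORDER BY name",
        (min_balance,),
    )
    return cur.fetchall()

def demo_connection():
    """Build a throwaway in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, balance REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("Acme", 120.0), ("Bolt", 15.5), ("Crane", 300.0)],
    )
    conn.commit()
    return conn
```

Because fetch_customers() never mentions the database product, swapping the stand-in for another driver with the same style of connection object should leave the function largely unchanged.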

Unfortunately, ODBC drivers aren’t available for all platforms, and are probably best supported under the Microsoft operating systems. Although Microsoft Windows was the first to provide an ODBC product, versions also exist for Unix and OS/2, as well as Apple’s MacOS. If you’re working with Unix, you’ll probably find drivers for the leading databases, but you may well need to pay for them.

If there’s a JVM for your system, and you’re happy to work in Java, then you could try JDBC, the Java Database Connectivity toolkit. Like ODBC, JDBC is a set of APIs for linking applications (in this case Java applications) to external databases. As with ODBC, you can use SQL statements to extract data from database tables, using JDBC as a translation layer. As JDBC is similar to ODBC, you can use a bridge to link JDBC to ODBC drivers, so that your Java application can take advantage of the available drivers. Using the JDBC classes, you can then include database access in your Java applications. These classes will map the most common SQL data types to Java, and also give you access to transactional queries. As a result, you can commit or roll back transactions from the JDBC interface.
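The commit and roll back pattern looks much the same whichever access layer you use. JDBC is a Java API, so the following is only an analogous sketch in Python, assuming the standard library’s sqlite3 module as the database and a hypothetical accounts table; it shows two updates that either both succeed or are both undone.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as a single transaction."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.commit()      # both updates become permanent together
        return True
    except Exception:
        conn.rollback()    # neither update is kept
        return False

def demo_accounts():
    """Build a throwaway in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn
```

A transfer that would overdraw the source account leaves both balances exactly as they were, which is the behaviour you want from any transactional interface.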

One key feature of JDBC is its addressing scheme for data sources. As JDBC is an Internet-ready tool, it’s not surprising that it uses a URL-like addressing scheme. A data source can be accessed using this form of address:

jdbc:odbc://www.somecompany.com:400/databasefile
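A JDBC address has the general shape jdbc:subprotocol:subname, where the subprotocol names the driver and the subname is interpreted by that driver. As a rough illustration, the Python sketch below splits an address into those parts and, where the subname happens to look like a network location, pulls out the host, port and database name. The function name parse_jdbc_url is hypothetical, and real drivers are free to define their own subname syntax.

```python
from urllib.parse import urlsplit

def parse_jdbc_url(url):
    """Split a JDBC-style address into subprotocol, subname and network parts."""
    scheme, subprotocol, subname = url.split(":", 2)
    if scheme != "jdbc":
        raise ValueError("not a JDBC URL")
    result = {"subprotocol": subprotocol, "subname": subname}
    if subname.startswith("//"):
        # Network-style subname: borrow the URL parser for host, port and path.
        parts = urlsplit(subname)
        result.update(
            host=parts.hostname,
            port=parts.port,
            database=parts.path.lstrip("/"),
        )
    return result
```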

One of JDBC’s big advantages is its integration with the JavaBeans component model, as well as with Java servlets and dynamically generated JSP web pages. For larger scale applications you can also use it in conjunction with Enterprise JavaBeans.

Database Publishing

Live access to data isn’t always necessary, especially if you want to sanitise your data before publishing it, or don’t need to use dynamic content. Most of the common databases in use already come with built-in database publishing tools, which will run queries on your data, and then create HTML pages.
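The run-a-query-then-write-HTML cycle is simple enough to sketch by hand. The Python below is a minimal illustration, not a description of any particular product’s publishing tool: it assumes the standard library’s sqlite3 module as the database, and the products table and page title are hypothetical.

```python
import sqlite3
from html import escape

def publish_table(conn, query, title):
    """Run `query` and return a complete static HTML page as a string."""
    cur = conn.execute(query)
    headers = [col[0] for col in cur.description]   # column names from the query
    rows = cur.fetchall()
    lines = [
        f"<html><head><title>{escape(title)}</title></head><body>",
        f"<h1>{escape(title)}</h1>",
        "<table>",
        "<tr>" + "".join(f"<th>{escape(h)}</th>" for h in headers) + "</tr>",
    ]
    for row in rows:
        # Escape every value so database content cannot break the markup.
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in row)
        lines.append(f"<tr>{cells}</tr>")
    lines += ["</table>", "</body></html>"]
    return "\n".join(lines)

def demo_catalogue():
    """Build a throwaway in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("Widget", 9.99), ("Nuts & bolts", 2.5)],
    )
    return conn
```

Writing the returned string to a file on the web server completes the export; running the same function on a schedule gives you a basic static publication model.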

Microsoft Office lets you create both dynamic and static pages from the contents of an Access database. However, we wouldn’t recommend using Access for more than small sites, as it isn’t designed to scale. Larger sites that want to use Access data will need to either move their data to a larger database, or use a static publication model.

Static pages are both fast and convenient, and are a lot easier to manage than a dynamic site. However, they do limit you to only displaying the data that you’ve published – if there are changes during a day then you’ll need to manually republish your site. It’s also important to be aware that you may end up with orphaned historical data on your site, in the form of pages that you don’t want anyone to see any more – but are still on your server and have been indexed by a search engine… There have been legal cases lost as a result of an orphan page containing out of date information.
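One way to guard against orphaned pages is to compare what is on the server with what the latest export actually wrote. The Python sketch below assumes a hypothetical layout in which the whole site lives under one directory and the export step can supply a list of the files it produced; anything left over is a candidate for removal.

```python
from pathlib import Path

def find_orphans(site_dir, published):
    """Return HTML files under `site_dir` that the latest export did not write."""
    root = Path(site_dir)
    # Normalise both sides to forward-slash relative paths before comparing.
    current = {Path(name).as_posix() for name in published}
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*.html")}
    return sorted(on_disk - current)
```

The function only reports the leftovers. Whether to delete them, archive them or redirect them is a decision best left to whoever owns the content.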

You’ll need to work with an HTML designer in order to get the best look and feel for your pages. Some tools will allow you to use templates as a publishing mechanism, whilst others will let you create and then edit the HTML. If you’re planning regular data exports, then you may wish to go down the template route. One database that may not be normally considered as a publishing tool is FileMaker Inc.’s FileMaker Pro. The latest version, 5.0, will not only publish static and live databases, but will also use ODBC to import data from external databases for use on the web. If you’re after a quick database publishing solution, there aren’t many tools that are this simple and easy to use.

Another approach is to use a web design tool that has database-publishing features. NetObjects Fusion is an intranet and small public site design tool that offers users ODBC links to external databases. These are used to generate what Fusion calls “stacked pages”: pages containing content generated from database records. It’s possible to create a single stacked page, along with a master navigation page, very quickly. Fusion will then populate your design and create the navigation view automatically. Fusion’s stacked pages are an excellent tool for creating small catalogues, or for running a site that uses regularly updated database content. If you don’t need to rely on dynamic content, then this approach allows you to offer data on the web quickly and easily.
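The general idea of one templated page per record plus a master navigation page can be sketched in Python. This is not how Fusion itself is implemented; the template text, the item1.html style file names and the record fields are all hypothetical, and the records would normally come from a database query rather than a literal list.

```python
from html import escape
from string import Template

# Hypothetical page template; $-placeholders are filled from each record.
RECORD_PAGE = Template(
    "<html><body><h1>$name</h1><p>$description</p>"
    '<p><a href="index.html">Back to index</a></p></body></html>'
)

def build_stacked_pages(records):
    """Return {filename: html} with one page per record plus a navigation page."""
    pages, links = {}, []
    for number, record in enumerate(records, start=1):
        filename = f"item{number}.html"
        # Escape each value before it is dropped into the template.
        safe = {key: escape(str(value)) for key, value in record.items()}
        pages[filename] = RECORD_PAGE.substitute(safe)
        links.append(f'<li><a href="{filename}">{safe["name"]}</a></li>')
    pages["index.html"] = (
        "<html><body><ul>\n" + "\n".join(links) + "\n</ul></body></html>"
    )
    return pages
```

Keeping the layout in a template means an HTML designer can change the look of every record page without touching the export code.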

Application servers

Large-scale applications require a more complex approach to web application design and development. Application servers let you develop large-scale, transactional, object-oriented distributed systems. With the application server hosting your web application’s business logic, you’re going to need to give it access to your data sources.

Most of the common application server environments come with data access components of their own, usually based on ODBC or JDBC. However, you may find you need access to proprietary back-end systems, like PeopleSoft or SAP. You may need to find third-party tools for these, though tools like HAHTsite and NetDynamics do provide support for ERP systems as well as databases.

Of course, you could just work with a CORBA or COM transactional system, developing your own data access components and tools. As large web systems are likely to require their own servlet architectures, this approach is becoming more and more common. You may wish to use a servlet development tool, like IBM’s WebSphere or Forte’s SynerJ (recently bought by Sun, and likely to become part of the Sun application server environment), as these will help you manage data connections, as well as providing classes to help you implement data-driven applications.

One of the more interesting solutions is provided by Apple’s WebObjects 4.0. This uses its own Java database classes to access databases, and provides persistence by adding information in its own database tables. This is similar to the features supported by EJB, but is able to work with a wide range of databases.

Some of the larger databases come with their own application server environments. At the forefront of this move is Oracle, who have recently launched Oracle 8i. This can be used with Oracle Application Server, as well as Oracle WebDB. WebDB is a browser based content publishing and development solution that gives users and developers easy access to data, as well as the tools required to publish it on the web.

With an application server, you will be able to handle more than just simple database queries. You can build more complex applications that work across multiple tables, as well as across multiple databases. You’ll also be able to use stored procedures, triggers and other internal database actions more effectively. By integrating and processing data, you can use an application server as the basis of a complex ecommerce application or a large-scale web system.
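Working across multiple databases usually means the joining is done in your business logic rather than in SQL. The Python sketch below shows the shape of that logic under stated assumptions: two in-memory sqlite3 databases stand in for separate back-end systems, and the customers and orders tables are hypothetical sample data.

```python
import sqlite3

def orders_with_customer_names(crm, sales):
    """Join order totals from one database with customer names from another."""
    # Load the lookup side into memory: {customer id: name}.
    names = dict(crm.execute("SELECT id, name FROM customers"))
    report = []
    for customer_id, total in sales.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    ):
        report.append((names.get(customer_id, "unknown"), total))
    return report

def demo_databases():
    """Build two throwaway in-memory databases (hypothetical sample data)."""
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Bolt")])
    sales = sqlite3.connect(":memory:")
    sales.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
    sales.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 5), (1, 7), (3, 1)]
    )
    return crm, sales
```

An order whose customer is missing from the first database is reported as "unknown" rather than dropped, which is the kind of integration decision that belongs in the application tier.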

Cleaning up with SOAP

Things may well get a lot easier in the future, thanks to SOAP, the Simple Object Access Protocol. By providing an architecture-neutral way of reaching applications and remote objects, SOAP could form the basis of any future database access protocols. Instead of complex interfaces, data can be exposed through XML over HTTP. A development of the ideas behind XML-RPC, SOAP is set for wide industry acceptance, and cross-platform, cross-technology take-up.

By taking advantage of low impact technologies like XML and HTTP, SOAP gives you the opportunity to use lowest common denominator tools to handle data access and transfer. The SOAP specification defines the use of XML and HTTP as a method of accessing services, objects, and servers in a platform-independent manner. Instead of having to worry about ODBC, JDBC, COM and CORBA, all you need to do is build a SOAP interface to your applications.

You can think of SOAP as a glue technology, linking diverse software components. As XML has been touted as the basis for Enterprise Application Integration, the implementation of SOAP interfaces in applications and servers will allow applications to communicate at an object and component level. You can even use a full-blown application, or even a web site, as a SOAP component, as long as it outputs data in an XML-RPC format. A SOAP-aware database would be able to respond to queries embedded in XML documents, returning data sets as XML, with a defined schema.
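To make the idea of a query embedded in an XML document more concrete, here is a minimal Python sketch of a request and a database-backed reply. It rests on several assumptions: the envelope uses the SOAP 1.1 namespace, the GetStock call, its urn:example:stock namespace and the stock table are all hypothetical, and the HTTP transport is left out, so the XML is simply passed between two functions.

```python
import sqlite3
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:stock"                           # hypothetical application namespace

def build_request(product):
    """Wrap a GetStock call for `product` in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetStock")
    ET.SubElement(call, f"{{{APP_NS}}}product").text = product
    return ET.tostring(envelope, encoding="unicode")

def handle_request(conn, request_xml):
    """Answer a GetStock request from the database, again as a SOAP envelope."""
    root = ET.fromstring(request_xml)
    product = root.findtext(
        f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetStock/{{{APP_NS}}}product"
    )
    row = conn.execute(
        "SELECT quantity FROM stock WHERE product = ?", (product,)
    ).fetchone()
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    reply = ET.SubElement(body, f"{{{APP_NS}}}GetStockResponse")
    # Unknown products are reported as zero stock in this sketch.
    ET.SubElement(reply, f"{{{APP_NS}}}quantity").text = str(row[0] if row else 0)
    return ET.tostring(envelope, encoding="unicode")

def demo_stock():
    """Build a throwaway in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (product TEXT, quantity INTEGER)")
    conn.execute("INSERT INTO stock VALUES ('widget', 42)")
    return conn
```

In a deployed system the string returned by build_request() would travel as the body of an HTTP POST, and the caller would parse the reply envelope in the same way handle_request() parses the request.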

SOAP implementations are being developed for most operating systems and application technologies, and the protocol has wide cross-industry support. SOAP received a boost when Microsoft recently announced that SOAP would replace DCOM as the default inter-object communication protocol in the next release of its Visual Studio development tool suite. You can find out more about SOAP from the msdn.microsoft.com web site, or www.xml-rpc.com.

Conclusion

Data on the web is a vitally important tool. Once it’s made available, it can be offered to as many people as possible, and can be interacted with using an extremely wide range of tools. There are as many ways of putting data on the web as there are of storing it in the first place.

From web-scraping legacy systems to XML-RPC components, you can take your pick of technologies. Access to data is a vital part of the modern web, and as we move to a disconnected, loosely coupled any-to-any Internet, it’s going to become more and more important.

 
