URL

A remote HTML file contains important information. With Java's URI and URL classes we can download it and use its contents in a String.

With openStream, we obtain a stream of the file contents. We read that stream into a byte array buffer, one chunk at a time, and a StringBuilder collects the chunks into the final string.

First program

This example implements a getPage method. It takes a file from a remote address and places it into a new String. getPage works in several steps.

Start We first create a URI object from the address argument (a String). Its toURL method then gives us a URL object.

Next We invoke openStream on our URL instance to get a readable stream of the file contents.

Then We use a while-loop to read the InputStream into a byte array, up to 1024 bytes at a time. We append each chunk to a StringBuilder, which holds the total file when the loop ends.

Result On the example.com domain, getPage fetches the expected HTML document. The document is more than 1024 bytes, so the loop must run more than once.

import java.io.IOException;
import java.io.InputStream;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class Program {

    public static String getPage(String address) throws IOException, URISyntaxException {
        // Get URI and URL objects.
        URI uri = new URI(address);
        URL url = uri.toURL();

        // Store results in StringBuilder.
        StringBuilder builder = new StringBuilder();
        byte[] data = new byte[1024];

        // Get stream of the response.
        // ... try-with-resources closes the stream when we are done.
        try (InputStream in = url.openStream()) {
            // Read in the response into the buffer.
            // ... Read up to 1024 bytes each iteration; c is the count actually read.
            int c;
            while ((c = in.read(data, 0, data.length)) != -1) {
                // Decode as UTF-8. Fine for an ASCII page, but a multi-byte
                // character split across two chunks would be garbled.
                builder.append(new String(data, 0, c, StandardCharsets.UTF_8));
            }
        }

        // Return String.
        return builder.toString();
    }

    public static void main(String[] args) {

        try {
            String page = getPage("http://www.example.com/");
            System.out.println(page);
        } catch (Exception ex) {
            System.out.println("ERROR: " + ex);
        }
    }
}

<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
...
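The getPage loop decodes each chunk with its own String constructor call. For an ASCII page that works, but in UTF-8 a character such as € takes three bytes, and a chunk boundary can fall in the middle of it. The sketch below (Java 11 or later; the class and method names are illustrative, and UTF-8 is an assumption about the page's encoding) compares that approach with letting an InputStreamReader do the decoding. A deliberately tiny 4-byte buffer forces a split.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ChunkedRead {

    // Decodes each byte chunk separately, as getPage does.
    // A multi-byte UTF-8 character split across two chunks becomes U+FFFD.
    public static String readByteChunks(InputStream in, int bufferSize) throws IOException {
        StringBuilder builder = new StringBuilder();
        byte[] data = new byte[bufferSize];
        int c;
        while ((c = in.read(data, 0, data.length)) != -1) {
            builder.append(new String(data, 0, c, StandardCharsets.UTF_8));
        }
        return builder.toString();
    }

    // Lets an InputStreamReader decode the bytes; it carries partial
    // characters over between reads, so nothing is split.
    public static String readCharChunks(InputStream in, int bufferSize) throws IOException {
        StringBuilder builder = new StringBuilder();
        char[] data = new char[bufferSize];
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read(data, 0, data.length)) != -1) {
                builder.append(data, 0, c);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // "Noël €20": the 3-byte euro sign straddles a 4-byte chunk boundary.
        byte[] bytes = "No\u00EBl \u20AC20".getBytes(StandardCharsets.UTF_8);
        System.out.println(readByteChunks(new ByteArrayInputStream(bytes), 4)); // garbled near the euro sign
        System.out.println(readCharChunks(new ByteArrayInputStream(bytes), 4)); // Noël €20
    }
}
```

The Reader version keeps the same loop-and-StringBuilder shape; only the buffer type changes from byte[] to char[].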

Short example

I developed this program when learning to use URI and URL objects. It creates a BufferedInputStream from the InputStream.

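However For a single bulk read like this one, the BufferedInputStream probably adds little over using the InputStream directly. Buffering pays off mainly when a program makes many small reads, such as one byte at a time.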
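One way to check what buffering buys is to count how many read calls reach the underlying stream. This sketch (CountingStream is a hypothetical helper, and an in-memory ByteArrayInputStream stands in for a network stream) reads 1000 bytes one at a time, with and without a BufferedInputStream in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferDemo {

    // Hypothetical helper: passes reads through and counts how many reach the source.
    static class CountingStream extends FilterInputStream {
        int calls = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads 1000 bytes one at a time, optionally through a BufferedInputStream,
    // and returns how many read calls reached the underlying stream.
    public static int countCalls(boolean buffered) throws IOException {
        CountingStream source = new CountingStream(new ByteArrayInputStream(new byte[1000]));
        InputStream in = buffered ? new BufferedInputStream(source) : source;
        while (in.read() != -1) {
            // Discard each byte; only the call count matters.
        }
        in.close();
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Direct reads: " + countCalls(false)); // 1001
        System.out.println("Buffered reads: " + countCalls(true));
    }
}
```

Without the buffer, every one-byte read goes to the source stream, plus one final call that returns -1. With it, the source is asked for large blocks instead. A single 200-byte bulk read gives the buffer little to save.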

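Also Once we have a byte array, we can convert it into a String with the String constructor. It is safest to pass the number of bytes actually read, so unused zero bytes at the end of the array are not decoded.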

So With this method, we can quickly read just the first bytes of a document. This is helpful if we only need a small piece of it, such as the doctype or title.

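import java.io.BufferedInputStream;
import java.net.URL;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class Program {
    public static void main(String[] args) throws Exception {

        // Create URI and URL objects.
        // ... Use https: HttpURLConnection does not follow an http-to-https redirect.
        URI uri = new URI("https://en.wikipedia.org/wiki/Main_Page");
        URL url = uri.toURL();

        // Use a BufferedInputStream; try-with-resources closes it and the stream it wraps.
        try (BufferedInputStream reader = new BufferedInputStream(url.openStream())) {

            // Read in the first 200 bytes from the website.
            // ... readNBytes keeps reading until 200 bytes arrive or the stream ends;
            // a single read call may return fewer.
            byte[] data = new byte[200];
            int count = reader.readNBytes(data, 0, data.length);

            // Convert only the bytes actually read to a String.
            String result = new String(data, 0, count, StandardCharsets.UTF_8);
            System.out.println(result);
        }
    }
}

<!DOCTYPE html>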
<html lang="en" dir="ltr" class="client-nojs">
<head>
<meta charset="UTF-8" />
<title>Wikipedia, the free encyclopedia</title>
...
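A single read call on a network stream returns whatever bytes have arrived so far, which may be fewer than requested. This sketch (Java 9 or later; TrickleStream is a hypothetical stand-in that hands out at most 3 bytes per call) shows the difference between one read call and InputStream.readNBytes, which keeps reading until it has the requested count or the stream ends. It also converts only the bytes actually read.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class FirstBytes {

    // Returns up to maxBytes from the start of the stream, decoded as UTF-8.
    public static String firstBytes(InputStream in, int maxBytes) throws IOException {
        byte[] data = new byte[maxBytes];
        int count = in.readNBytes(data, 0, maxBytes);  // loops internally until full or end
        return new String(data, 0, count, StandardCharsets.UTF_8);
    }

    // Hypothetical stand-in for a slow network stream: at most 3 bytes per read call.
    static class TrickleStream extends FilterInputStream {
        TrickleStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 3));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<!DOCTYPE html><html>".getBytes(StandardCharsets.UTF_8);

        // One read call only gets what the stream delivers in one go.
        byte[] data = new byte[9];
        int count = new TrickleStream(new ByteArrayInputStream(page)).read(data, 0, 9);
        System.out.println(count); // 3

        // readNBytes collects all 9 requested bytes.
        System.out.println(firstBytes(new TrickleStream(new ByteArrayInputStream(page)), 9)); // <!DOCTYPE
    }
}
```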

To download web pages, we combine many classes. We use URI and URL objects to start, and an InputStream to get the data. A byte array is a suitable buffer.

And A StringBuilder collects the chunks as they are read. This is how the getPage method above fetches an entire web page as a String.
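URL.openStream is documented as shorthand for openConnection().getInputStream(). Calling the two steps ourselves lets us set connect and read timeouts before any data is requested; a timeout of zero means waiting indefinitely. This sketch (Java 11 or later; FetchWithTimeouts is an illustrative name, UTF-8 is assumed, and a local file: URL stands in for a web address so it runs offline) also uses readAllBytes in place of the manual loop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FetchWithTimeouts {

    // Like getPage, but configures the connection before reading from it.
    // new URI(...) throws URISyntaxException for malformed text, just as in getPage.
    public static String fetch(String address, int timeoutMillis)
            throws IOException, URISyntaxException {
        URLConnection connection = new URI(address).toURL().openConnection();
        connection.setConnectTimeout(timeoutMillis);
        connection.setReadTimeout(timeoutMillis);
        try (InputStream in = connection.getInputStream()) {
            // Decode once, after all bytes have arrived, so no character is split.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a temporary local file plays the role of the web page.
        Path temp = Files.createTempFile("page", ".html");
        try {
            Files.writeString(temp, "<title>Local page</title>");
            System.out.println(fetch(temp.toUri().toString(), 5000)); // <title>Local page</title>
        } finally {
            Files.delete(temp);
        }
    }
}
```

With an address such as http://www.example.com/ in place of the file URL, the same method reads the page over the network.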

Some notes

If only the first bytes of a web page are needed, it is best to avoid looping to get the entire file. Reading less saves time and memory, and it may also prevent errors, such as running out of memory, with unusually long web pages.
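When the whole page is needed but its size is not known in advance, one option is to keep looping but stop once a limit is passed. In this sketch (Java 10 or later for ByteArrayOutputStream.toString(Charset); the limit and the UTF-8 encoding are assumptions chosen by the caller, and readAtMost is a hypothetical name) an oversized response raises an IOException instead of growing memory without bound.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class CappedRead {

    // Reads the whole stream, but gives up once more than maxBytes have arrived.
    public static String readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] data = new byte[1024];
        int c;
        while ((c = in.read(data, 0, data.length)) != -1) {
            if (out.size() + c > maxBytes) {
                throw new IOException("Response larger than " + maxBytes + " bytes");
            }
            out.write(data, 0, c);
        }
        // Decode once at the end so multi-byte characters are never split.
        return out.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] small = "<html>small page</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAtMost(new ByteArrayInputStream(small), 1000)); // <html>small page</html>

        try {
            readAtMost(new ByteArrayInputStream(new byte[5000]), 1000);
        } catch (IOException ex) {
            System.out.println(ex.getMessage()); // Response larger than 1000 bytes
        }
    }
}
```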
