URL Class Example, Download Web Page
This page was last reviewed on Dec 26, 2023.
Dot Net Perls
URL. A program often needs the contents of a remote HTML file. With Java's URI and URL classes we can download the file and store its contents in a String.
With openStream, we obtain a stream of the file contents. With a buffer array, we can create a string from the data we download. A StringBuilder here is helpful.
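First program. This example implements a getPage method. It takes a file from a remote address and places it into a new String. The method has some complexities: it can throw two checked exceptions (IOException and URISyntaxException), and it must read the response in chunks.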
Start We first create a URI object from the address argument (a String). This is used to create a new URL object.
Next We invoke openStream on our URL instance to get a readable stream of the file contents.
Then We use a while-loop to read the InputStream into a byte array. We then append to a StringBuilder to get the total file.
Result For the "Example" domain, getPage fetched the expected HTML document. The document is more than 1024 bytes, so the while-loop runs more than once.
import java.io.IOException;
import java.io.InputStream;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class Program {
    public static String getPage(String address)
            throws IOException, URISyntaxException {
        // Get URI and URL objects.
        URI uri = new URI(address);
        URL url = uri.toURL();

        // Store results in StringBuilder.
        StringBuilder builder = new StringBuilder();
        byte[] data = new byte[1024];

        // Get stream of the response.
        // ... The try-with-resources statement closes it when done.
        try (InputStream in = url.openStream()) {
            // Read in the response into the buffer.
            // ... Read many bytes each iteration.
            int c;
            while ((c = in.read(data, 0, data.length)) != -1) {
                // Decode with an explicit charset, not the platform default.
                builder.append(new String(data, 0, c, StandardCharsets.UTF_8));
            }
        }
        // Return String.
        return builder.toString();
    }

    public static void main(String[] args) {
        try {
            String page = getPage("http://www.example.com/");
            System.out.println(page);
        } catch (Exception ex) {
            System.out.println("ERROR");
        }
    }
}
<!doctype html>
<html>
<head>
    <title>Example Domain</title>
    <meta charset="utf-8" />
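A note on decoding. The getPage loop above converts each chunk of bytes to a String separately. If a multi-byte character happens to straddle a chunk boundary, that character may be decoded incorrectly. One possible alternative, shown here only as a sketch, collects all bytes first and decodes them once at the end. The class name PageReader and the method name readAll are hypothetical, and UTF-8 is assumed because the page above declares that charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class (not part of the original example).
class PageReader {
    // Read the entire stream into memory, then decode the bytes once.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] data = new byte[1024];
        int c;
        while ((c = in.read(data, 0, data.length)) != -1) {
            // Append only the bytes that were actually read.
            bytes.write(data, 0, c);
        }
        // Assumption: the content is UTF-8 encoded.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Same idea as getPage, but decoding happens after the download.
    static String getPage(String address)
            throws IOException, URISyntaxException {
        try (InputStream in = new URI(address).toURL().openStream()) {
            return readAll(in);
        }
    }
}
```

Because readAll accepts any InputStream, it can be tried without a network connection by passing a ByteArrayInputStream.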
Short example. I developed this program when learning to use URI and URL objects. It creates a BufferedInputStream from the InputStream.
However A BufferedInputStream mainly helps when a program makes many small reads. For a single bulk read like this one, it probably has little advantage over using the InputStream directly.
Also When we have a byte array, we can convert it into a String with the String constructor.
So With this method, we can quickly download the first bytes of a document. This is helpful if we only need a small piece of a document.
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class Program {
    public static void main(String[] args) throws Exception {
        // Create URI and URL objects.
        URI uri = new URI("http://en.wikipedia.org/wiki/Main_Page");
        URL url = uri.toURL();

        // Use a BufferedInputStream.
        // ... The try-with-resources statement closes both streams.
        try (InputStream in = url.openStream();
             BufferedInputStream reader = new BufferedInputStream(in)) {
            // Read in up to the first 200 bytes from the website.
            byte[] data = new byte[200];
            int count = reader.read(data, 0, data.length);

            // Convert only the bytes actually read to a String.
            String result = (count > 0)
                ? new String(data, 0, count, StandardCharsets.UTF_8)
                : "";
            System.out.println(result);
        }
    }
}
<!DOCTYPE html>
<html lang="en" dir="ltr" class="client-nojs">
<head>
<meta charset="UTF-8" />
<title>Wikipedia, the free encyclopedia</title>
...
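Partial reads. A single call to read may return fewer bytes than requested, even when more data will arrive later. As a sketch only, the readNBytes method (available on InputStream since Java 9) keeps reading until the requested count is reached or the stream ends. The class name FirstBytes and the method name readPrefix are hypothetical, and UTF-8 is an assumption about the page encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class (not part of the original example).
class FirstBytes {
    // Return up to 'limit' bytes from the start of the stream as a String.
    static String readPrefix(InputStream in, int limit) throws IOException {
        byte[] data = new byte[limit];
        // readNBytes returns the number of bytes actually stored in the
        // array, which is smaller than 'limit' only at the end of the stream.
        int count = in.readNBytes(data, 0, limit);
        // Decode only the bytes that were read. A multi-byte character
        // cut off at the limit may appear as a replacement character.
        return new String(data, 0, count, StandardCharsets.UTF_8);
    }
}
```

To use this with a web page, the stream from url.openStream() could be passed as the first argument, for example readPrefix(in, 200).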
To download web pages, we combine many classes. We use URI and URL objects to start, and an InputStream to get the data. A byte array is a suitable buffer.
And A StringBuilder may also be used. In the getPage method above, we fetch an entire web page as a String.
Some notes. If only the first bytes of a web page are needed, it is probably best to avoid looping to get the entire file. This may also avoid problems, such as excessive memory use, with unusually long web pages.
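Size limit. When most of a page is wanted but an unusually long one should not be read in full, the read loop can stop at a fixed limit. This is a minimal sketch of that idea, not a definitive implementation. The class name CappedReader, the method name readAtMost and the maxBytes parameter are all hypothetical, and UTF-8 is assumed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class (not part of the original example).
class CappedReader {
    // Read at most maxBytes bytes from the stream and decode them.
    static String readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] data = new byte[1024];
        int remaining = maxBytes;
        while (remaining > 0) {
            // Never request more than the remaining allowance.
            int c = in.read(data, 0, Math.min(data.length, remaining));
            if (c == -1) {
                break; // End of stream reached before the limit.
            }
            bytes.write(data, 0, c);
            remaining -= c;
        }
        // Assumption: the content is UTF-8 encoded.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A call such as readAtMost(in, 100000) would stop after about 100 KB, however long the remote document is.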
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.
© 2007-2024 Sam Allen.