Saturday, 24 August 2013

Parsing HTML with Jsoup issue in Android Application


I am new to Android development. I am using Jsoup to parse a URL to get
the file location.
Below is the code I have for parsing the URL. It works for most of the URLs
I tried. For example, for www.baidu.com/ or www.nba.com/, the title logged
is exactly the same as shown in the page source.
However,
1) For http://music.baidu.com/ the title displayed in the Eclipse log is
different from the page source. Eclipse shows 百度音乐, while the page source
shows 百度音乐-中国第一音乐门户.
2) (This is the most important one I want to solve.) For
http://music.baidu.com/search?key=%E5%86%8D%E8%A7%81%E7%8E%8B%E5%AD%90+%E6%A3%89%E8%8A%B1%E7%B3%96
Eclipse again shows 百度音乐, while the page source shows 搜索含有"再见王子
棉花糖"的音乐_百度音乐-中国第一音乐门户.
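For reference, the key parameter in the second URL is percent-encoded UTF-8. Below is a minimal sketch of decoding it with java.net.URLDecoder, to see which search words the page title should contain. The class name SearchKeyDecoder and the method decodeKey are hypothetical and not part of my app.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class SearchKeyDecoder {

    // Hypothetical helper: decodes a percent-encoded query value as UTF-8.
    // URLDecoder also turns '+' into a space, as in form-encoded queries.
    static String decodeKey(String encodedKey) {
        try {
            return URLDecoder.decode(encodedKey, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform,
            // so this branch should not be reached in practice.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The value of the key parameter from the search URL in case 2.
        String key = "%E5%86%8D%E8%A7%81%E7%8E%8B%E5%AD%90+%E6%A3%89%E8%8A%B1%E7%B3%96";
        System.out.println(decodeKey(key));
    }
}
```

The decoded string is the text I would expect to see quoted inside the search page's title.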
Also, for those two webpages, Elements links is empty, so
Log.d("text", link.text()); never logs anything.
I notice that the 2 webpages source does not have in HTML like other HTML
has.
package com.example.htmlparser;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import android.app.Activity;
import android.os.Bundle;
import android.util.Log;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        // set layout view
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        // Network access is not allowed on the main thread, so fetch in a worker thread.
        Thread downloadThread = new Thread() {
            @Override
            public void run() {
                Document doc;
                try {
                    String url = "";
                    doc = Jsoup.connect(url).get();
                    // Alternative I tried, forcing the charset:
                    // doc = Jsoup.parse(new URL(url).openStream(), "UTF-8", url);

                    // Log the page title.
                    String title = doc.title();
                    Log.d("title", title);

                    // Log the text of every anchor that has an href attribute.
                    Elements links = doc.select("a[href]");
                    for (Element link : links) {
                        // Log.d("link", link.attr("href"));
                        Log.d("text", link.text());
                    }
                } catch (IOException e) {
                    Log.d("exception", e.toString());
                }
            }
        };
        downloadThread.start();
    }
}
Can someone help me to solve this problem?
