6 Steps to Easily Parse Data from a Trusted Source

by Gary Roberts | Updated: 03/02/2016 | Comments: 1

Would it be helpful to include data from a reputable source with your own data? If you have permission to use another source’s data for free or by agreement, how can you easily extract the specific data you want to use without doing a lot of coding?

In this article, I’ll show you how you can use an instruction in the CRBasic programming language to reap the benefit of a trusted source’s data while saving yourself a lot of time and effort. For example, you might want to pull data from a known, good source. The data may be stored on a government server or another source at no cost, and be offered in several different formats, including the eXtensible Markup Language (XML).

An Example of How Parsing Data Works

To highlight the six parsing steps, I’ll walk you through an example. In this example, we need the temperature in Fahrenheit from a NOAA weather station at an airport (KLGU). We want to use this temperature data with the data from a weather station that is just a couple of miles away.

The airport’s NOAA weather station data is hosted on NOAA web servers and is available for public use. By doing a quick search of their website, we found the airport’s weather station data here: http://w1.weather.gov/xml/current_obs/KLGU.xml. The data, including temperature, is posted hourly in XML format similar to this:

[Figure: KLGU Airport weather data as displayed in a web browser]

To see the actual XML code, we have to right-click the web page and select View page source from the menu. What we see looks similar to this:

[Figure: XML code from KLGU Airport weather data]

There is a lot of data in the XML code. If we extracted the temperature with ordinary string-handling code, it would take us quite some time to write. Fortunately, CRBasic has an XMLParse() instruction that we can use to save us a couple of hours of keyboarding time.

#1 - Declare the variables

To get started with the XMLParse() instruction, we need to declare a few constants. These constants name the values that XMLParse() returns, so our program can tell where the parser is in the XML file, whether an error occurred, and whether parsing is finished.


'Return values for XMLParse.  We use these to keep track of where we are and if we have errors.
Const XML_TOO_MANY_NAMESPACES = -3		' Too many name space declarations encountered while parsing an element
Const XML_NESTED_TOO_DEEP = -2			' Too many nested XML elements
Const XML_SYNTAX_ERROR_OR_FAILED = -1		' XML syntax error or XMLParse failed
Const XML_UNRECOGNIZED_ERROR_CONDITION = 0	' Unrecognized error condition
Const XML_START = 1				' Start of XML element
Const XML_ATTRIBUTE_READ = 2			' XML attribute read.
Const XML_END_OF_ELEMENT = 3			' End of XML element
Const XML_END_OF_DOCUMENT = 4			' END of XML document encountered.

'XMLParse max settings so we don't use all of the datalogger's memory
Const XML_MAX_DEPTH = 10
Const XML_MAX_NAMESPACES = 3

#2 - Use a variable to store the results

We also need a variable in which to store the results from the XMLParse() instruction:


Public noaa_air_temperature_f

#3 - Add variables for parsing

In addition, we need a few more variables for the XMLParse() instruction to use while it is parsing the XML file. We could use Dim variable declarations, but let’s use Public variables to aid in our troubleshooting.


Public xml_attribute_name As String
Public xml_attribute_namespace As String * 100
Public xml_data As String * 3000
Public xml_element_name As String * 50
Public xml_element_namespace As String * 30
Public xml_response_code
Public xml_state
Public xml_value As String * 50

#4 - Add variables for file retrieval

To retrieve the XML file from the server, we are going to use the HTTPGet() instruction. For this instruction, we need to add a couple of variables:


Public xml_http_header As String * 300
Public xml_http_socket As Long

#5 - Add code to load the file

To get the XML file from the server and load it into the xml_data variable, we need to add the following code somewhere in a slow sequence scan:


xml_http_header = ""
xml_http_socket = HTTPGet("http://w1.weather.gov/xml/current_obs/KLGU.xml", xml_data, xml_http_header) 
TCPClose(xml_http_socket) 'Close our connection to the web server.
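
As a rough illustration of where this code might sit, here is a minimal sketch of a program layout with a slow sequence. The scan intervals, the batt_volt variable, and the Battery() measurement are assumptions added only so that the main scan has something to do; they are not part of the original program.

```crbasic
'Sketch only: one possible program layout. Intervals and batt_volt are assumptions.
Public xml_data As String * 3000
Public xml_http_header As String * 300
Public xml_http_socket As Long
Public batt_volt   'Hypothetical placeholder measurement for the main scan

BeginProg
	'Main scan: our own weather station measurements would go here.
	Scan(5, Sec, 0, 0)
		Battery(batt_volt)
	NextScan

	'Slow sequence: fetch the NOAA file once an hour without delaying the main scan.
	SlowSequence
	Scan(60, Min, 0, 0)
		xml_http_header = ""
		xml_http_socket = HTTPGet("http://w1.weather.gov/xml/current_obs/KLGU.xml", xml_data, xml_http_header)
		TCPClose(xml_http_socket) 'Close our connection to the web server.
		'The XMLParse() while loop from step 6 goes here.
	NextScan
EndProg
```

Because the NOAA data is posted hourly, requesting the file more often than that is unlikely to return anything new.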

#6 - Add a while loop

To let the XMLParse() instruction do its work, we add a while loop (using the While/Wend instruction) and set the initial xml_response_code.


xml_response_code = XML_START 'Set a starting value so the While condition is true on the first pass.
While ((xml_response_code > XML_UNRECOGNIZED_ERROR_CONDITION) AND (xml_response_code <> XML_END_OF_DOCUMENT))
	xml_response_code = XMLParse(xml_data, xml_value, xml_attribute_name, xml_attribute_namespace, _
		xml_element_name, xml_element_namespace, XML_MAX_DEPTH, XML_MAX_NAMESPACES) 

	If xml_response_code = XML_END_OF_ELEMENT AND xml_element_name = "temp_f" Then 
		noaa_air_temperature_f = xml_value
	EndIf
Wend

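Each pass through the while loop calls the XMLParse() instruction, which returns the next item it finds in the XML file. Our If statement watches for the end of the element named temp_f. When XMLParse() reaches it, the value positioned between <temp_f> and </temp_f> is assigned to noaa_air_temperature_f. The loop stops when XMLParse() reports an error or reaches the end of the document.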
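
If the download or the parse fails, the loop never reaches temp_f, and noaa_air_temperature_f keeps whatever value it had before, so an old reading could be mistaken for a new one. One way to guard against that, shown here as a minimal sketch, is to track whether temp_f was found and store NAN when it was not. The noaa_temp_found variable is a hypothetical helper that is not in the original program, and the fragment assumes the constants and variables from steps 1 through 3 are already declared.

```crbasic
'Hypothetical helper flag: declare it with the other Public variables, before BeginProg.
Public noaa_temp_found As Boolean

'The rest goes in the slow sequence scan, after the HTTPGet() code from step 5.
noaa_temp_found = False
xml_response_code = XML_START
While ((xml_response_code > XML_UNRECOGNIZED_ERROR_CONDITION) AND (xml_response_code <> XML_END_OF_DOCUMENT))
	xml_response_code = XMLParse(xml_data, xml_value, xml_attribute_name, xml_attribute_namespace, _
		xml_element_name, xml_element_namespace, XML_MAX_DEPTH, XML_MAX_NAMESPACES)

	If xml_response_code = XML_END_OF_ELEMENT AND xml_element_name = "temp_f" Then
		noaa_air_temperature_f = xml_value
		noaa_temp_found = True 'Remember that this pass produced a fresh value.
	EndIf
Wend

'If temp_f was never found, store NAN instead of keeping an old reading.
If NOT noaa_temp_found Then
	noaa_air_temperature_f = NAN
EndIf
```

Leaving xml_response_code as a Public variable also lets us see which return value ended the loop when we troubleshoot.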

We are now getting the value we wanted (temperature in Fahrenheit) from the NOAA station, and we can include it with our own weather station data.
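
To illustrate what including the NOAA value with our own data might look like, here is a minimal sketch of a data table that stores both readings side by side. The table name Hourly, the air_temp_f variable, and the intervals are assumptions, not part of the original program; the retrieval and parsing code from steps 4 through 6 is left out to keep the sketch short.

```crbasic
'Sketch only: Hourly, air_temp_f, and the intervals are assumptions.
Public noaa_air_temperature_f   'Filled in by the XMLParse() loop (step 6)
Public air_temp_f               'Hypothetical: our own station's air temperature

DataTable(Hourly, True, -1)
	DataInterval(0, 60, Min, 10)
	Average(1, air_temp_f, FP2, False)       'Hourly average of our own measurement
	Sample(1, noaa_air_temperature_f, FP2)   'Most recent NOAA value at the top of the hour
EndTable

BeginProg
	Scan(5, Sec, 0, 0)
		'Our own temperature measurement would set air_temp_f here.
		CallTable Hourly
	NextScan
EndProg
```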

More Information

If you have a CR1000, CR3000, CR800, CR850, or CR6 datalogger with an Ethernet interface, you can download and run a working copy of this program.

Recommended for You: For an explanation of the different parts of an XML file, review the “XML Tree” section offered by w3schools.com. This web developer site has basic tutorials that detail the different elements, namespaces, and attributes that can be used in XML.

I hope this information was helpful to you. If you have any questions, please post them below.


About the Author

Gary Roberts is the Product Manager over communications and software products at Campbell Scientific, Inc. He spends his days researching new technology, turning solutions to problems into stellar products, doing second-tier support, or geeking out on Campbell Scientific gear. Gary's education and background are in Information Technology and Computer Science. When he's not at work, he is out enjoying the great outdoors with his Scouts, fighting fire/EMS, working amateur radio, or programming computers.

Comments

OnAMission | 03/29/2016 at 10:09 AM

Thanks for the great blog post! These advanced guides are very useful, so please keep them coming!
