require 'rubygems'
require 'nokogiri'
puts "nokogiri: #{Nokogiri::VERSION}"
puts "libxml: #{Nokogiri::LIBXML_VERSION}"
count = 0
File.open(ARGV[0], "r") do |input|
  reader = Nokogiri::XML::Reader(input) { |cfg|
    cfg.dtdload.dtdattr
  }
  reader.each do |node|
    if reader.node_type == 1 && reader.name == "Product"
      # usually I'd be grabbing the entire node and doing something useful
      # with it..
      foo = reader.outer_xml
      count += 1
    end
  end
end
puts "found #{count} Product nodes"
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ONIXMessage SYSTEM "http://www.editeur.org/onix/2.1/02/reference/onix-international.dtd">
<ONIXMessage>
  <Header>
    <FromCompany>HarperCollins Publishers</FromCompany>
    <ToCompany>Australian Booksellers Association</ToCompany>
    <SentDate>20081106</SentDate>
  </Header>
  <Product>
    <RecordReference>9780732287573</RecordReference>
    <NotificationType>03</NotificationType>
    <ProductIdentifier>
      <ProductIDType>02</ProductIDType>
      <IDValue>073228757X</IDValue>
    </ProductIdentifier>
    <ProductIdentifier>
      <ProductIDType>03</ProductIDType>
      <IDValue>9780732287573</IDValue>
    </ProductIdentifier>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780732287573</IDValue>
    </ProductIdentifier>
    <ProductForm>BC</ProductForm>
    <ProductFormDetail>B106</ProductFormDetail>
    <Title>
      <TitleType>01</TitleType>
      <TitleText>High Noon&ndash;in Nimbin</TitleText>
    </Title>
    <Website>
      <WebsiteLink>http://www.harpercollins.com.au/global_scripts/product_catalog/book_xml.asp?isbn=073228757X</WebsiteLink>
    </Website>
    <Contributor>
      <ContributorRole>A01</ContributorRole>
      <PersonNameInverted>Barrett, Robert G</PersonNameInverted>
      <KeyNames>Barrett</KeyNames>
    </Contributor>
    <BICMainSubject>FA</BICMainSubject>
    <OtherText>
      <TextTypeCode>02</TextTypeCode>
      <Text textformat="02">'He looks kind of happyn, the Ken Done of local literature' Courier-Mail </Text>
    </OtherText>
    <Imprint>
      <ImprintName>HarperCollins</ImprintName>
    </Imprint>
    <Publisher>
      <PublishingRole>01</PublishingRole>
      <PublisherName>HarperCollins Publishers</PublisherName>
    </Publisher>
    <PublishingStatus>02</PublishingStatus>
    <PublicationDate>20090301</PublicationDate>
    <Measure><MeasureTypeCode>01</MeasureTypeCode>
      <Measurement>234</Measurement>
      <MeasureUnitCode>mm</MeasureUnitCode>
    </Measure>
    <Measure><MeasureTypeCode>02</MeasureTypeCode>
      <Measurement>153</Measurement>
      <MeasureUnitCode>mm</MeasureUnitCode>
    </Measure>
    <SupplyDetail>
      <SupplierName>Harper Entertainment Distribution Services</SupplierName>
      <Price>
        <PriceTypeCode>02</PriceTypeCode>
        <PriceAmount>29.99</PriceAmount>
      </Price>
    </SupplyDetail>
    <MarketRepresentation>
      <AgentName>HarperCollins Publishers</AgentName>
      <AgentRole>07</AgentRole>
      <MarketCountry>AU</MarketCountry>
      <MarketPublishingStatus>02</MarketPublishingStatus>
      <MarketDate>
        <MarketDateRole>01</MarketDateRole>
        <Date>20090301</Date>
      </MarketDate>
    </MarketRepresentation>
  </Product>
</ONIXMessage>
[jh@gaz onix.git (master)]$ ruby entities.rb
nokogiri: 1.4.0
libxml: 2.7.6
/var/lib/gems/1.8/gems/nokogiri-1.4.0/lib/nokogiri/xml/reader.rb:68:in `read': Entity 'ndash' not defined (Nokogiri::XML::SyntaxError)
        from /var/lib/gems/1.8/gems/nokogiri-1.4.0/lib/nokogiri/xml/reader.rb:68:in `each'
        from entities.rb:12
        from entities.rb:10:in `open'
        from entities.rb:10
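
The trace above shows the reader stopping at an `&ndash;` reference that the external ONIX DTD would normally define. One possible workaround, sketched here on the assumption that the DTD is not being applied at parse time, is to rewrite known named entities to numeric character references before handing the text to the reader. `NAMED_ENTITIES` and `numeric_entities` are hypothetical names for illustration, not part of Nokogiri, and the table lists only a few entries rather than the full set the ONIX DTD declares.

```ruby
# Hypothetical lookup table: a handful of named entities and their code points.
NAMED_ENTITIES = {
  "ndash" => 8211, # U+2013 EN DASH
  "mdash" => 8212, # U+2014 EM DASH
  "nbsp"  => 160   # U+00A0 NO-BREAK SPACE
}.freeze

# Replace named entities found in the table with numeric character
# references. Entities not in the table (including the XML built-ins such
# as &amp; and &lt;) are left untouched.
def numeric_entities(xml)
  xml.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |match|
    code = NAMED_ENTITIES[Regexp.last_match(1)]
    code ? format("&#%d;", code) : match
  end
end

puts numeric_entities("<TitleText>High Noon&ndash;in Nimbin</TitleText>")
# prints: <TitleText>High Noon&#8211;in Nimbin</TitleText>
```

This only sidesteps the error for entities listed in the table and reads the whole input as a string, so it trades away some of the streaming benefit of the reader; it does not address why the DTD's entity declarations are not picked up.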