Created
June 5, 2014 17:23
US Code thread
usc25.xml:
- The xml for /us/usc/t25/s450l (Contract or grant specifications) contains a model agreement with its own internal
Thom Neale <[email protected]>
Apr 13
to Katherine
Hi Katherine, please forgive the earlier email fragment; I accidentally sent it before I was done typing.
Thom Neale <[email protected]>
Apr 13
to Katherine
Hi Katherine,
Are the USLM identifiers (described on page 40 of the schema user manual http://uscode.house.gov/download/resources/USLM-User-Guide.pdf) designed to be unique? I found a number of them that aren't (in the attached file). I researched two of these to understand why they're happening:
The first was /us/usc/t25/s450l (Contract or grant specifications). This element contains an embedded model agreement with its own internal divisions. Those internal divisions have inaccurate identifiers; all of them begin with /us/usc/t25/s1, which is clearly a bug, since the model agreement is a part of /us/usc/t25/s450l.
The second was Title 21, section 812. This section lays out several schedules of controlled substances in subsection (c). In this case, the schedules also have their own internal paragraph numbering schemes, which are unrelated to the structure of section 812. Because each schedule's paragraph numbering starts anew at (a), paragraph (a) of every schedule gets the same identifier. Moreover, the identifier for schedule 1, paragraph (a) is /us/usc/t21/s812/a, which suggests paragraph (a) is a child of section 812, when in reality it's a child of 812(c).
Last question--is there a bug tracker where it would be more convenient for me to report issues like this? Thank you for your time,
Thom Neale
Attachment: identifiers.txt
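A list of non-unique identifiers like the one attached could be gathered with a short script. The following is only a minimal sketch, not the method actually used for identifiers.txt: it assumes the `identifier` attribute is unqualified (so namespaces on the elements do not matter), and the sample XML is hand-written to mimic the section 812 case rather than copied from a real release point.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_identifiers(xml_text):
    """Return {identifier: count} for identifier values used more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("identifier")
        for el in root.iter()
        if el.get("identifier") is not None
    )
    return {ident: n for ident, n in counts.items() if n > 1}


# Hand-written sample (hypothetical, not real USLM output): two schedule
# paragraphs (a) under subsection (c) that both received the same identifier.
SAMPLE = """<section identifier="/us/usc/t21/s812">
  <subsection identifier="/us/usc/t21/s812/c">
    <paragraph identifier="/us/usc/t21/s812/a"/>
    <paragraph identifier="/us/usc/t21/s812/a"/>
  </subsection>
</section>"""

print(duplicate_identifiers(SAMPLE))
```

For a whole title file, the same counting could be run on `ET.parse(path).getroot()` instead of a string.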
Lane, Katherine <[email protected]>
Apr 16
to me
Mr. Neale:
Thank you for your question on the USLM identifiers. We are looking into the issue and will get back to you soon with a detailed response. Our apologies for any inconvenience this may have caused you.
We appreciate your emails. They keep us on our toes and help us identify and resolve problems with our website.
Thanks, again.
Katherine Lane
Assistant Counsel
Office of the Law Revision Counsel
U.S. House of Representatives
(202) 226-9053
From: Thom Neale [mailto:[email protected]]
Sent: Sunday, April 13, 2014 9:46 PM
To: Lane, Katherine
Subject: Re: US Code release point 113-88 issues
Thom Neale <[email protected]>
Apr 16
to Katherine
It's my pleasure to help test out this otherwise uniquely high-quality dataset; finding one or two unresolved edge cases is a small inconvenience, if any. Thank you for your reply,
Thom
Lane, Katherine <[email protected]>
May 2
to me
Mr. Neale:
The USLM identifier issue is taking longer than expected. We are working with our contractor and will provide a detailed answer to your question as soon as we can. It may be several more weeks. If your question is urgent, please let us know and we will see if we can move it up on the priority project list.
For now, emails either to me or to [email protected] are the best way to contact us with questions or comments about the website or the U.S. Code.
Thank you for your patience.
Katherine Lane
From: Thom Neale [mailto:[email protected]]
Sent: Wednesday, April 16, 2014 1:46 PM
Lane, Katherine
Jun 3 (2 days ago)
to me
Thom:
The identifier attribute is designed to be unique, but uniqueness cannot be guaranteed at this time. The identifier attribute is meant to reflect the numbering that exists in the text. Most of the time the identifier attribute is unique, but there are some duplicates. This generally occurs where the section text structure is non-traditional. For instance, if two subsections (a) have been enacted in a section 1234, they will both get the identifier /us/usc/tXX/s1234/a. When a user or program asks for /us/usc/tXX/s1234/a, the user or program will get two results.
In your examples, the identifiers should not have been duplicates. The reason for these duplicates is that the conversion is not yet handling non-traditional structures within section text. Your examples contain an "insertion" of "external content" into the text of the section, making the structure non-traditional. As you point out, the model agreement embedded in 25 USC 450l(c) and the schedules of controlled substances inserted in 21 USC 812(c) have their own structures independent of the section text.
The id attribute, on the other hand, is unique and the schema enforces the uniqueness.
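The practical difference between the two attributes can be illustrated with a small lookup sketch. This is an assumption-laden example, not part of any official tooling: `build_indexes` is a hypothetical helper, the `id` values `s1`, `a1`, and `a2` are made up, and the sample XML is hand-written around the /us/usc/tXX/s1234/a example above. A lookup by identifier has to return a list, while a lookup by id can return a single element.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(xml_text):
    """Index elements by identifier (may repeat) and by id (assumed unique)."""
    root = ET.fromstring(xml_text)
    by_identifier = defaultdict(list)  # identifier -> all matching elements
    by_id = {}                         # id -> the single matching element
    for el in root.iter():
        ident = el.get("identifier")
        if ident is not None:
            by_identifier[ident].append(el)
        el_id = el.get("id")
        if el_id is not None:
            by_id[el_id] = el
    return by_identifier, by_id


# Hypothetical sample: two enacted subsections (a) in the same section.
SAMPLE = """<section id="s1" identifier="/us/usc/tXX/s1234">
  <subsection id="a1" identifier="/us/usc/tXX/s1234/a"/>
  <subsection id="a2" identifier="/us/usc/tXX/s1234/a"/>
</section>"""

by_identifier, by_id = build_indexes(SAMPLE)
matches = by_identifier["/us/usc/tXX/s1234/a"]
print([el.get("id") for el in matches])   # both subsections, in document order
print(by_id["a2"].get("identifier"))      # exactly one element per id
```

A program that needs a stable handle on one specific element would therefore key on id, and treat identifier as a human-readable citation path that may match more than once.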
The US Code in XML is being created by converting free-form text with GPO photocomposition codes (locators) into XML, not by an XML editing environment. Because of this, there are some limitations with regard to its content. We are working on creating a native US Code in XML editing environment.
We do not currently have a publicly available bug-tracker. Please continue to email us with any questions or comments. We value your insight.
Thank you for your patience. I hope this answers your questions.
Katherine Lane
Assistant Counsel
Office of the Law Revision Counsel
U.S. House of Representatives