How was the repository built? (cont.)
Custom Visual Basic (VB) parser for each collection
- UIUC records already in Dublin Core (DC) XML, so loaded into a DOM and inserted into the DB
- D-Lib Magazine already in XML, so loaded into a DOM, translated to DC, and inserted into the DB
- NetLib used a custom flat-file format, so wrote a web crawler with a custom parser, translated to DC, and inserted into the DB
- Berkeley used RFC 1807 records or flat-file ASCII DB dumps, so wrote a custom parser, translated to DC, and inserted into the DB
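All four loaders follow the same shape: parse the source format, map fields to Dublin Core, insert rows into the database. The original parsers were written in VB. The sketch below uses Python purely for illustration and is not the original code. It shows two of the paths: DC already in XML, read through a DOM (as for UIUC), and a simplified RFC 1807 "FIELD:: value" parser (as for Berkeley). The `dc_record` table, the function names, the RFC 1807-to-DC field mapping, and the sample records are all hypothetical. The NetLib web crawler is not shown.

```python
import sqlite3
import xml.dom.minidom

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from RFC 1807 field names to DC elements (an assumption).
RFC1807_TO_DC = {
    "ID": "identifier",
    "TITLE": "title",
    "AUTHOR": "creator",
    "ORGANIZATION": "publisher",
    "DATE": "date",
    "ABSTRACT": "description",
}


def dc_from_xml(xml_text):
    """XML path: records are already DC in XML, so load them into a DOM
    and collect the text of every element in the DC namespace."""
    dom = xml.dom.minidom.parseString(xml_text)
    fields = []
    for node in dom.getElementsByTagNameNS(DC_NS, "*"):
        text = "".join(c.data for c in node.childNodes
                       if c.nodeType == c.TEXT_NODE).strip()
        if text:
            fields.append((node.localName, text))
    return fields


def dc_from_rfc1807(record_text):
    """Flat-file path: parse simplified RFC 1807 'FIELD:: value' lines,
    folding continuation lines into the previous field, then map to DC."""
    parsed = []
    for line in record_text.splitlines():
        if "::" in line:
            name, _, value = line.partition("::")
            parsed.append([name.strip().upper(), value.strip()])
        elif parsed and line.strip():
            parsed[-1][1] += " " + line.strip()
    # Fields with no DC equivalent (BIB-VERSION, ENTRY, END, ...) are dropped.
    return [(RFC1807_TO_DC[name], value) for name, value in parsed
            if name in RFC1807_TO_DC and value]


def insert_record(conn, record_id, fields):
    """Store one record as (record_id, element, value) rows."""
    conn.executemany("INSERT INTO dc_record VALUES (?, ?, ?)",
                     [(record_id, element, value) for element, value in fields])


# Usage with made-up sample records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dc_record (record_id TEXT, element TEXT, value TEXT)")

uiuc_xml = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample Report</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
</record>"""

berkeley_rfc1807 = """BIB-VERSION:: CS-TR-v2.1
ID:: UCB//CSD-94-001
TITLE:: Sample Technical Report
AUTHOR:: Roe, Richard
ABSTRACT:: A short abstract
  that spans two lines.
END:: UCB//CSD-94-001"""

insert_record(conn, "uiuc-1", dc_from_xml(uiuc_xml))
insert_record(conn, "ucb-1", dc_from_rfc1807(berkeley_rfc1807))
```

Storing one row per DC element keeps the schema independent of which elements a given collection supplies, which suits sources as uneven as these; a real repository might instead use a fixed column per DC element.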