Researcher records should be bundled into larger files...
JSON and XML formats are perfectly suited to holding numerous instances of similar objects. Yet your public data files contain millions of records, each in its own file, usually smaller than a typical filesystem's block size. This is VERY bad practice and a nightmare for anyone interested in the data.
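The bundling being requested can be sketched in a few lines. The snippet below is a hypothetical illustration, not part of any ORCID tooling: it concatenates per-record JSON files into a single JSON Lines file, so one large file replaces millions of tiny ones while each record can still be read back one line at a time. The directory layout and the `orcid` field are assumptions for the demo.

```python
import json
import tempfile
from pathlib import Path

def bundle_records(src_dir: Path, out_path: Path) -> int:
    """Concatenate per-record JSON files into one JSON Lines file.

    Each input file is assumed to hold a single JSON object; the output
    holds one object per line, so consumers can stream it record by record.
    Returns the number of records bundled.
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for record_file in sorted(src_dir.glob("*.json")):
            record = json.loads(record_file.read_text(encoding="utf-8"))
            out.write(json.dumps(record) + "\n")
            count += 1
    return count

# Demo with a few synthetic "records" in a temporary directory.
tmp = Path(tempfile.mkdtemp())
for i in range(3):
    (tmp / f"record_{i}.json").write_text(
        json.dumps({"orcid": f"0000-0000-0000-000{i}"}), encoding="utf-8"
    )
n = bundle_records(tmp, tmp / "bundle.jsonl")
print(n)  # 3
```

A tar archive would achieve the same goal of avoiding per-file block-size waste, but JSON Lines additionally makes the result trivially streamable.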
Thank you for your suggestion.
We have marked this idea as declined, since we are not planning to implement this suggestion for the public data files.
The primary purpose of the data dump is to archive exact representations of what the API would return. If we presented the data dump in tabular form, some rows would be 1 GB or larger, and representing deeply nested data structures in tabular form is difficult in any case. This may be less than ideal for your purpose.
However, we are always striving to make improvements. Over the next couple of months, expect to see some data-dump-to-MongoDB recipes published (you can already Google "orcid data dump mongodb"). These recipes can be easily modified for streaming to other database models. Hopefully that will ease the process.
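The MongoDB recipes themselves are not reproduced here. As a rough illustration of the streaming pattern they describe, the sketch below loads records one at a time into SQLite (chosen only because it ships with Python and needs no server); swapping the inserts for a MongoDB or other database client follows the same shape. The `orcid` field and the JSON Lines input are assumptions for the demo, not the actual dump layout.

```python
import json
import sqlite3

def stream_into_db(jsonl_lines, conn):
    """Stream JSON Lines records into a database one at a time,
    so the whole dump never has to fit in memory."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (orcid TEXT PRIMARY KEY, doc TEXT)"
    )
    with conn:  # one transaction for the batch
        for line in jsonl_lines:
            rec = json.loads(line)
            # Store the raw document alongside its identifier (assumed field).
            conn.execute(
                "INSERT OR REPLACE INTO records VALUES (?, ?)",
                (rec["orcid"], json.dumps(rec)),
            )

# Demo with two synthetic records.
records = [
    json.dumps({"orcid": f"0000-0000-0000-000{i}", "name": f"Author {i}"})
    for i in range(2)
]
conn = sqlite3.connect(":memory:")
stream_into_db(records, conn)
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)  # 2
```

Because the loop consumes an iterable of lines, the same function works unchanged whether the input is an in-memory list, an open file, or a decompressing stream over the dump archive.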
Side note for those with a Premium Membership: there are also AWS S3 syncing options and webhooks.
ORCID Community Team