Data is often published in different formats or serialised from one data source in XML and from another in JSON. I presented a paper at the Balisage conference recently on the significance (or not!) of element order in XML, “Element order is always important in XML, except when it isn’t”. One question that arose was how to compare one with the other, i.e. there is data that is in XML and also in JSON and it is necessary to check that the information is the same or to find out where it differs.

Related to this is a question we sometimes get in our support channel: how do I compare the same data in XML and in CSV (Comma Separated Values)? Unfortunately, comparing the different formats directly is not possible, so some pre-processing is needed. If some of your data is in CSV then it is probably best to convert this to JSON or XML and then compare. The reason is simply that unless your data is small and you can look at it all easily, you will likely need to do some post-processing to get the changes into a form that you can easily review, and this is easier if the changes are represented in JSON or XML. One benefit of DeltaXML tools is that the changes are represented directly in these formats, so they are easy to process and you do not need to load large files into an editor to review changes.

Is it easier to convert XML to JSON or vice versa? This depends on the tools you are familiar with and also on how you would prefer to see the changes, whether in JSON or XML. I would suggest converting from XML into JSON: JSON is a simpler representation, so the data is easier to compare. If you have access to XSLT then this is a good way to convert from XML to JSON. The Saxon XSLT processor will write JSON out from an XML representation, though you may need to do some transformation to get it into the correct XML first.

Another reason for using JSON is that it understands numbers, whereas XML content is handled as text. If you have a schema you can use this to normalise the data, but that is an additional step in the process.

Very often data that is in both formats will have some differences, and one that is pertinent to the paper mentioned above is the order of the data. Multiple similar data items would probably be represented in XML as child elements within a single parent, and this might be converted to a JSON array. If the JSON and XML have come from different sources (which were meant to be the same), or from the same source but by different processes, then the order of the data items may differ. Although the JSON standard deems a change to the order of items in an array to be a change, in this case it is not one, so the array needs to be compared without worrying about any change to the order.

Orderless comparison sounds quite easy, but in practice it is simpler to compare items if they are ordered. This is why most comparison tools, including the ubiquitous diff utility, compare ordered data and cannot handle orderless or unordered comparison.

You can of course sort the data first and then compare it, but this is not always as easy as it sounds if you have complex data items, because it is difficult to define a good sorting order. By ‘good’ I mean one that will put similar items adjacent to each other, so that you can more easily see if something has changed. If the sort order is not ‘good’ then a small change may result in items that should correspond with one another being a long way apart, and so you cannot determine whether an item is missing or has simply changed slightly. So a different sort will produce different results if there is data that is not equal, and this is probably not what you want.

Let us say we have two JSON arrays, one array of items from a data source A and one array of items from data source B. Is it possible to have a comparison that ignores the order, so there is no need to sort first? If an item in A is exactly the same as one in B then these two can be aligned and are not a candidate for a difference. So exactly equal items can be aligned with each other fairly easily.

And now the fun starts: we are left with those that are not exact matches. We have recently done a lot of work in this area to get the best possible results. We use statistical methods to align as many items as possible, starting of course with those that are equal and then moving to those that are nearly equal and so on. There is always the hope that the two data sets are almost the same, i.e. most of the items in A have an exactly equal item in B. Often though this is not the case, for example all the A items have an additional field in them or a systematic difference, e.g. ‘10’ in one file is represented as ‘10.0’ in the other one and if the comparison is text-based (which most are) it will not match these. Sometimes each item will have some key value and this can help the alignment of the items.
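To make the idea of orderless alignment concrete, here is a minimal Python sketch of the two passes described in this post: pair off exactly equal items first, then pair the remainder by similarity. It is not the DeltaXML algorithm. The function names `canonical` and `align`, the 0.6 threshold and the sample data are all assumptions made for illustration, and a simple string-similarity score from the standard library stands in for the statistical methods mentioned above.

```python
import json
from difflib import SequenceMatcher


def canonical(item):
    """Serialise an item so that equal items give equal strings.

    Keys are sorted and numbers are converted to float, so the JSON
    numbers 10 and 10.0 compare equal. (Assumption: values are small
    enough that converting integers to float loses no precision.)
    """
    def norm(value):
        if isinstance(value, bool):          # bool is a subclass of int
            return value
        if isinstance(value, (int, float)):
            return float(value)
        if isinstance(value, dict):
            return {k: norm(v) for k, v in value.items()}
        if isinstance(value, list):
            return [norm(v) for v in value]  # nested arrays stay ordered
        return value
    return json.dumps(norm(item), sort_keys=True)


def align(a_items, b_items, threshold=0.6):
    """Align two arrays ignoring order.

    Returns (pairs, only_a, only_b), where pairs holds
    (index_in_a, index_in_b, similarity) tuples.
    """
    a_keys = [canonical(x) for x in a_items]
    b_keys = [canonical(x) for x in b_items]
    unmatched_b = list(range(len(b_items)))
    pairs, leftover_a = [], []

    # Pass 1: exactly equal items are aligned and need no further work.
    for i, key in enumerate(a_keys):
        j = next((j for j in unmatched_b if b_keys[j] == key), None)
        if j is None:
            leftover_a.append(i)
        else:
            unmatched_b.remove(j)
            pairs.append((i, j, 1.0))

    # Pass 2: greedily pair the most similar remaining items.
    candidates = sorted(
        ((SequenceMatcher(None, a_keys[i], b_keys[j]).ratio(), i, j)
         for i in leftover_a for j in unmatched_b),
        reverse=True,
    )
    used_a, used_b = set(), set()
    for score, i, j in candidates:
        if score < threshold:
            break                            # the rest are too different
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        pairs.append((i, j, score))

    only_a = [i for i in leftover_a if i not in used_a]
    only_b = [j for j in unmatched_b if j not in used_b]
    return pairs, only_a, only_b


# Hypothetical sample data: same items, different order and number format.
a = json.loads('[{"id": 1, "price": 10}, {"id": 2, "name": "widget"},'
               ' {"id": 3, "name": "gone"}]')
b = json.loads('[{"id": 2, "name": "widgets"}, {"id": 1, "price": 10.0}]')

pairs, only_a, only_b = align(a, b)
for i, j, score in pairs:
    print(f"A[{i}] <-> B[{j}] similarity {score:.2f}")
print("only in A:", only_a, "only in B:", only_b)
```

With this sample data the first item of A aligns exactly with the second item of B despite the `10` versus `10.0` difference, the "widget" items are paired as a near match, and the third item of A is reported as missing from B. The number handling only helps when the values really are JSON numbers: the strings "10" and "10.0", as you would get from XML text content, still differ. If every item carries a reliable key value, matching on that key first would be simpler than scoring similarity.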