While using Amazon’s Elastic MapReduce to import data into DynamoDB, my jobs kept failing with the following message:
java.lang.RuntimeException: Error while reading from task log url
Apparently I need to learn more about Hive error logging, because the problem had nothing to do with the task log. The real cause was the CSV file I was importing from. That file didn’t use any special marker for empty fields; they just showed up as consecutive commas (field1,,field3). Reading from the file caused no errors (it was mapped as an external table, and select * from table worked fine). The failure only appeared when I ran the import operation.
Anyway, the solution was to use the string literal \N to indicate empty fields, which Hive will read in as NULL. When writing to DynamoDB, columns with a NULL value just won’t have attributes created for them, since DynamoDB doesn’t allow empty attribute values.
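If you can’t change whatever produces the CSV, one way to apply this fix is a small preprocessing pass before pointing the external table at the file. The sketch below is in Python and assumes a plain comma-delimited file with no quoted fields or embedded commas. The names `HIVE_NULL`, `mark_empty` and `convert` are hypothetical helpers, not part of Hive or EMR.

```python
import csv

# The literal two-character marker \N, which Hive reads as NULL
# in plain delimited text files.
HIVE_NULL = "\\N"


def mark_empty(row):
    """Replace every empty field in a parsed CSV row with \\N."""
    return [field if field != "" else HIVE_NULL for field in row]


def convert(src, dst):
    """Copy CSV rows from file object src to dst, marking empty fields.

    Assumes a simple comma-delimited file; fields containing commas
    would be quoted by csv.writer, which a plain Hive text table
    would not unquote.
    """
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(mark_empty(row))
```

To use it, open both files with `newline=""` (as the csv module documentation recommends) and call `convert(src, dst)`. A line like `field1,,field3` comes out as `field1,\N,field3`, which Hive then reads as a NULL in the second column.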