Wednesday, May 13, 2015

Managed vs. External Tables

-A Hive table that's not external is called a managed table.

-One of the main differences between an external and a managed table in Hive is that dropping an external table
 leaves the associated data in place; only the metadata (number of columns, column types, field terminators,
 etc.) is removed from the Hive metastore. Dropping a managed table deletes both the metadata and the data.
 So far I have always preferred external tables because, if the schema of my Hive table changes,
 I can just drop the external table and create a new one over the same HDFS data with the new schema.
 However, most (if not all) schema changes can now be made through ALTER TABLE or similar commands, so my
 preference for external tables over managed ones might be more of a legacy concern than a contemporary one.
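
To make the two approaches concrete, here is a minimal HiveQL sketch. The table name (page_views), the columns, the tab delimiter, and the HDFS path are all hypothetical and chosen only for illustration; adjust them to your own data.

```sql
-- Hypothetical table, columns, and HDFS location (illustration only).
CREATE EXTERNAL TABLE page_views (
  user_id BIGINT,
  url     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/page_views';

-- Approach 1: drop and re-create.
-- Because the table is EXTERNAL, this removes only the metastore entry;
-- the files under /data/page_views are left in place.
DROP TABLE page_views;

-- Re-create the table over the same files with the new schema.
CREATE EXTERNAL TABLE page_views (
  user_id  BIGINT,
  url      STRING,
  referrer STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/page_views';

-- Approach 2 (an alternative to Approach 1, not a follow-up step):
-- change the schema in place, which works for managed and external tables alike.
-- ALTER TABLE page_views ADD COLUMNS (referrer STRING);
```

With Approach 1 on a managed table, the DROP TABLE would have deleted the data as well, which is why the drop-and-re-create habit only makes sense for external tables.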
