In Kafka, the responsibility of what has been consumed is the responsibility of the consumer and this is also one of the main reasons why Kafka has such great horizontal scalability.
Using the high level consumer API will automatically do this for you by committing consumed offsets in Zookeeper (or a more recent configuration option is using by a special Kafka topic to keep track of consumed messages).
The simple consumer API make you deal with how and where to keep track of consumed messages yourself.
Purging of messages in Kafka is done automatically by either specifying a retention time for a topic or by defining a disk quota for it so for your case of one 5GB file, this file will be deleted after the retention period you define has passed, regardless of if it has been consumed or not.