Interface for stateful and stateless BASE64 encoding. (Maintaining the encoding state is needed when encoding chunked or stream input.)
#ifndef BOOST_NETWORK_UTILS_BASE64_ENCODE_HPP
#define BOOST_NETWORK_UTILS_BASE64_ENCODE_HPP

#include <boost/range/begin.hpp>
#include <boost/range/end.hpp>
#include <boost/range/const_iterator.hpp>
#include <algorithm>
#include <iterator>
#include <ostream>
#include <string>

namespace boost {
namespace network {
namespace utils {

// Implements a BASE64 converter working on an iterator range.
namespace base64 {

// Stores the state after processing the last chunk by the encoder. If the
// chunk byte-length is not divisible by three, the last two or four bits
// will be stored and processed in front of the next chunk.
template <typename Value>
struct state {
    bool empty() const;
    void clear();
};

// Encodes an input sequence to BASE64 writing it to the output iterator
// and stopping if the last input three-octet quantum is not complete, in
// which case it stores the state for the later continuation, when another
// input chunk is ready for the encoding. The encoding must be finished
// by calling encode_rest after processing the last chunk.
//
//     std::vector<unsigned char> buffer = ...;
//     std::string result;
//     std::back_insert_iterator<std::string> appender(result);
//     base64::state<unsigned char> rest;
//     base64::encode(buffer.begin(), buffer.end(), appender, rest);
//     ...
//     base64::encode_rest(appender, rest);
template <
    typename InputIterator,
    typename OutputIterator,
    typename State
>
OutputIterator encode(InputIterator begin,
                      InputIterator end,
                      OutputIterator output,
                      State & rest);

// Finishes encoding of the previously processed chunks. If their total
// byte-length is divisible by three, nothing is needed; if not, the last
// quantum will be encoded as if padded with zeroes, which will be indicated
// by appending '=' characters to the output. This function must always be
// called at the end of encoding if the previous chunks were encoded by the
// overload using the encoding state.
//
//     std::vector<unsigned char> buffer = ...;
//     std::string result;
//     std::back_insert_iterator<std::string> appender(result);
//     base64::state<unsigned char> rest;
//     base64::encode(buffer.begin(), buffer.end(), appender, rest);
//     ...
//     base64::encode_rest(appender, rest);
template <
    typename OutputIterator,
    typename State
>
OutputIterator encode_rest(OutputIterator output,
                           State & rest);

// Encodes an entire input sequence to BASE64, which either supports begin()
// and end() methods returning boundaries of the sequence or the boundaries
// can be computed by Boost.Range, writing it to the output iterator
// and stopping if the last input three-octet quantum is not complete, in
// which case it stores the state for the later continuation, when another
// input chunk is ready for the encoding. The encoding must be finished
// by calling encode_rest after processing the last chunk.
//
// Warning: Buffers identified by C-pointers are processed including their
// termination character, if they have any. This is unexpected at least
// for the string literals, which have a specialization here to avoid it.
//
//     std::vector<unsigned char> buffer = ...;
//     std::string result;
//     std::back_insert_iterator<std::string> appender(result);
//     base64::state<unsigned char> rest;
//     base64::encode(buffer, appender, rest);
//     ...
//     base64::encode_rest(appender, rest);
template <
    typename InputRange,
    typename OutputIterator,
    typename State
>
OutputIterator encode(InputRange const & input,
                      OutputIterator output,
                      State & rest);

// Encodes an entire string literal to BASE64, writing it to the output
// iterator and stopping if the last input three-octet quantum is not
// complete, in which case it stores the state for the later continuation,
// when another input chunk is ready for the encoding. The encoding must
// be finished by calling encode_rest after processing the last chunk.
//
// The string literal is encoded without processing its terminating zero
// character, which is the usual expectation.
//
//     std::string result;
//     std::back_insert_iterator<std::string> appender(result);
//     base64::state<char> rest;
//     base64::encode("ab", appender, rest);
//     ...
//     base64::encode_rest(appender, rest);
template <typename OutputIterator>
OutputIterator encode(char const * value,
                      OutputIterator output,
                      state<char> & rest);

// Encodes a part of an input sequence specified by the pair of begin and
// end iterators to BASE64 writing it to the output iterator. If its total
// byte-length is not divisible by three, the output will be padded by the
// '=' characters. If you encode an input consisting of multiple chunks,
// use the overload maintaining the encoding state.
//
//     std::vector<unsigned char> buffer = ...;
//     std::string result;
//     base64::encode(buffer.begin(), buffer.end(), std::back_inserter(result));
template <
    typename InputIterator,
    typename OutputIterator
>
OutputIterator encode(InputIterator begin,
                      InputIterator end,
                      OutputIterator output);

// Encodes an entire input sequence to BASE64 writing it to the output
// iterator. The sequence either supports begin() and end() methods returning
// its boundaries or the boundaries can be computed by Boost.Range. If its
// total byte-length is not divisible by three, the output will be padded by
// the '=' characters. If you encode an input consisting of multiple chunks,
// use the overload maintaining the encoding state.
//
// Warning: Buffers identified by C-pointers are processed including their
// termination character, if they have any. This is unexpected at least
// for the string literals, which have a specialization here to avoid it.
//
//     std::vector<unsigned char> buffer = ...;
//     std::string result;
//     base64::encode(buffer, std::back_inserter(result));
template <
    typename InputRange,
    typename OutputIterator
>
OutputIterator encode(InputRange const & value,
                      OutputIterator output);

// Encodes an entire string literal to BASE64 writing it to the output
// iterator. If its total length (without the trailing zero) is not
// divisible by three, the output will be padded by the '=' characters.
// If you encode an input consisting of multiple chunks, use the overload
// maintaining the encoding state.
//
// The string literal is encoded without processing its terminating zero
// character, which is the usual expectation.
//
//     std::string result;
//     base64::encode("ab", std::back_inserter(result));
template <typename OutputIterator>
OutputIterator encode(char const * value,
                      OutputIterator output);

// Encodes an entire input sequence to BASE64 returning the result as a
// string. The sequence either supports begin() and end() methods returning
// its boundaries or the boundaries can be computed by Boost.Range. If its
// total byte-length is not divisible by three, the output will be padded by
// the '=' characters. If you encode an input consisting of multiple chunks,
// use another overload, which maintains the encoding state and writes to an
// output iterator.
//
// Warning: Buffers identified by C-pointers are processed including their
// termination character, if they have any. This is unexpected at least
// for the string literals, which have a specialization here to avoid it.
//
//     std::vector<unsigned char> buffer = ...;
//     std::string result = base64::encode<char>(buffer);
template <
    typename Char,
    typename InputRange
>
std::basic_string<Char> encode(InputRange const & value);

// Encodes an entire string literal to BASE64 returning the result as a
// string. If its total byte-length is not divisible by three, the
// output will be padded by the '=' characters. If you encode an
// input consisting of multiple chunks, use another overload, which
// maintains the encoding state and writes to an output iterator.
//
// The string literal is encoded without processing its terminating zero
// character, which is the usual expectation.
//
//     std::string result = base64::encode<char>("ab");
template <typename Char>
std::basic_string<Char> encode(char const * value);

// Offers an interface to the BASE64 converter based on stream manipulators
// to be friendly to the usage with output streams combining heterogeneous
// output by using the output operators. The encoding state is serialized to
// long and maintained in the extensible internal array of the output stream.
namespace io {

// Encoding ostream manipulator for sequences specified by the pair of begin
// and end iterators.
//
//     std::vector<unsigned char> buffer = ...;
//     std::basic_ostream<Char> & output = ...;
//     output << base64::io::encode(buffer.begin(), buffer.end()) << ... <<
//               base64::io::encode_rest<unsigned char>;
template <typename InputIterator>
detail::input_wrapper<InputIterator> encode(InputIterator begin,
                                            InputIterator end);

// Encoding ostream manipulator processing whole sequences which either
// support begin() and end() methods returning boundaries of the sequence
// or the boundaries can be computed by Boost.Range.
//
// Warning: Buffers identified by C-pointers are processed including their
// termination character, if they have any. This is unexpected at least
// for the string literals, which have a specialization here to avoid it.
//
//     std::vector<unsigned char> buffer = ...;
//     std::basic_ostream<Char> & output = ...;
//     output << base64::io::encode(buffer) << ... <<
//               base64::io::encode_rest<unsigned char>;
template <typename InputRange>
detail::input_wrapper<
    typename boost::range_const_iterator<InputRange>::type
>
encode(InputRange const & value);

// Encoding ostream manipulator processing string literals; the usual
// expectation from their encoding is processing only the string content
// without the terminating zero character.
//
//     std::basic_ostream<Char> & output = ...;
//     output << base64::io::encode("ab") << ... <<
//               base64::io::encode_rest<char>;
inline detail::input_wrapper<char const *> encode(char const * value);

// Encoding ostream manipulator which finishes encoding of the previously
// processed chunks. If their total byte-length is divisible by three,
// nothing is needed; if not, the last quantum will be encoded as if padded
// with zeroes, which will be indicated by appending '=' characters to the
// output. This manipulator must always be used at the end of encoding,
// after previous usages of the encode manipulator.
//
//     std::basic_ostream<Char> & output = ...;
//     output << base64::io::encode("ab") << ... <<
//               base64::io::encode_rest<char>;
template <
    typename Value,
    typename Char
>
std::basic_ostream<Char> & encode_rest(std::basic_ostream<Char> & output);

// Clears the encoding state in the internal array of the output stream.
// Use it only to re-use a state object that is in an unknown state; encoding
// of the last chunk must still be followed by encode_rest, which clears
// the state itself when finished.
//
//     std::basic_ostream<Char> & output = ...;
//     output << base64::io::encode("ab") << ...;
//     ...
//     output << base64::io::clear_state<char>;
template <
    typename Value,
    typename Char
>
std::basic_ostream<Char> & clear_state(std::basic_ostream<Char> & output);

// Checks if the encoding state in the internal array of the output stream
// is empty.
//
//     std::basic_ostream<Char> & output = ...;
//     output << base64::io::encode("ab") << ...;
//     bool is_complete = base64::io::empty_state<char>(output);
template <
    typename Value,
    typename Char
>
bool empty_state(std::basic_ostream<Char> & output);

} // namespace io
} // namespace base64
} // namespace utils
} // namespace network
} // namespace boost

#endif // BOOST_NETWORK_UTILS_BASE64_ENCODE_HPP
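To make the chunked contract concrete, here is a minimal, self-contained sketch of stateful encoding. It is not the implementation behind this header: the `chunk_sketch` namespace, its non-template `state`, the `digit` helper and `encode_in_two_chunks` are hypothetical names, and the sketch buffers up to two whole pending bytes instead of the leftover two or four bits that the header comment describes.

```cpp
#include <cassert>
#include <iterator>
#include <string>

namespace chunk_sketch {

// Hypothetical carry-over state: holds the bytes of an incomplete quantum.
struct state {
    unsigned char pending[2];
    int count;
    state() : count(0) {}
    bool empty() const { return count == 0; }
    void clear() { count = 0; }
};

// Maps the low six bits of index to a character of the BASE64 alphabet.
inline char digit(unsigned index) {
    static char const alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    return alphabet[index & 0x3f];
}

// Writes only complete three-octet quanta; leftover bytes stay in rest.
template <typename InputIterator, typename OutputIterator>
OutputIterator encode(InputIterator begin, InputIterator end,
                      OutputIterator output, state & rest) {
    assert(rest.count >= 0 && rest.count <= 2);
    for (; begin != end; ++begin) {
        unsigned char const byte = static_cast<unsigned char>(*begin);
        if (rest.count < 2) {
            rest.pending[rest.count++] = byte;
            continue;
        }
        unsigned const a = rest.pending[0], b = rest.pending[1], c = byte;
        *output++ = digit(a >> 2);
        *output++ = digit((a << 4) | (b >> 4));
        *output++ = digit((b << 2) | (c >> 6));
        *output++ = digit(c);
        rest.clear();
    }
    return output;
}

// Flushes the incomplete quantum, if any, padding the output with '='.
template <typename OutputIterator>
OutputIterator encode_rest(OutputIterator output, state & rest) {
    if (rest.count == 1) {
        unsigned const a = rest.pending[0];
        *output++ = digit(a >> 2);
        *output++ = digit(a << 4);
        *output++ = '=';
        *output++ = '=';
    } else if (rest.count == 2) {
        unsigned const a = rest.pending[0], b = rest.pending[1];
        *output++ = digit(a >> 2);
        *output++ = digit((a << 4) | (b >> 4));
        *output++ = digit(b << 2);
        *output++ = '=';
    }
    rest.clear();
    return output;
}

// Usage helper: encodes two chunks with one shared state, then finishes.
inline std::string encode_in_two_chunks(std::string const & first,
                                        std::string const & second) {
    std::string result;
    std::back_insert_iterator<std::string> appender(result);
    state rest;
    encode(first.begin(), first.end(), appender, rest);
    encode(second.begin(), second.end(), appender, rest);
    encode_rest(appender, rest);
    return result;
}

} // namespace chunk_sketch
```

Splitting an input across chunks gives the same output as encoding it in one piece, as long as the same state object is passed to every call and encode_rest is called once at the end.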
I'm continuing with the encoder after coming back from vacation. I updated this gist with the interface I implemented in the initial version. I'm not satisfied with the stream manipulator interface; it works for iterators/ranges only and not . I'd like to try how a transforming streambuf, a wrapper stream or a codecvt facet would feel in user code. The interface resembles boost::network::uri::encode; I didn't create a class with static methods. I'm not sure whether the usage stays this easy once encoding options are introduced. I'd better update the original issue and the e-mail thread.
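For comparison, a transforming streambuf could look roughly like the sketch below. It only illustrates the idea and is not code from the gist or from cpp-netlib: `streambuf_sketch`, `base64_streambuf`, `finish` and `encode_via_stream` are hypothetical names, and the sketch assumes a plain char stream with unbuffered, byte-by-byte forwarding through `overflow`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

namespace streambuf_sketch {

// Hypothetical filter: BASE64-encodes every byte written to it and
// forwards the encoded characters to the target stream buffer.
class base64_streambuf : public std::streambuf {
public:
    explicit base64_streambuf(std::streambuf * target)
        : target_(target), count_(0) {
        assert(target != 0);
    }

    // Emits the incomplete quantum with '=' padding; call once at the end.
    bool finish() { return count_ == 0 || write_quantum(); }

protected:
    // No put area is set, so every written character arrives here.
    virtual int_type overflow(int_type ch) {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        pending_[count_++] =
            static_cast<unsigned char>(traits_type::to_char_type(ch));
        if (count_ == 3 && !write_quantum())
            return traits_type::eof();
        return ch;
    }

private:
    static char digit(unsigned index) {
        static char const alphabet[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        return alphabet[index & 0x3f];
    }

    // Encodes the one to three pending bytes as four output characters.
    bool write_quantum() {
        unsigned const a = pending_[0];
        unsigned const b = count_ > 1 ? pending_[1] : 0;
        unsigned const c = count_ > 2 ? pending_[2] : 0;
        char out[4];
        out[0] = digit(a >> 2);
        out[1] = digit((a << 4) | (b >> 4));
        out[2] = count_ > 1 ? digit((b << 2) | (c >> 6)) : '=';
        out[3] = count_ > 2 ? digit(c) : '=';
        count_ = 0;
        return target_->sputn(out, 4) == 4;
    }

    std::streambuf * target_;
    unsigned char pending_[3];
    int count_;
};

// Usage helper: heterogeneous output goes through ordinary operator<<.
inline std::string encode_via_stream(std::string const & text, int number) {
    std::ostringstream sink;
    base64_streambuf filter(sink.rdbuf());
    std::ostream output(&filter);
    output << text << number;
    filter.finish();
    return sink.str();
}

} // namespace streambuf_sketch
```

Because the filter sits below the stream, any output operator works unchanged; the price is an explicit finish() call (or a destructor doing the same) to emit the padding.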
Thanks for the review!
Fixed the return type typo. (I'll have to find the right way to work with gists. Pasting code from other project and renaming the identifiers is annoying...)
I'll have to think about the iterator type requirement. Bidirectional would be OK for chunked encoding, but not for stream encoding. (I'm not sure if supporting iterators reading directly from a socket isn't pushing this too much.) The two approaches you suggest would enable forward iterators and still allow using the boost serialization iterator adaptors.
Interestingly, both approaches would allow keeping the carry-over from encoding a previous chunk without modifying the transform_width - which I like. The original transform_width is otherwise too eager and processes incomplete triplets too. I created transform_width_with_carry and iterator_adaptor_with_carry to get over it, but you showed me another way, probably with less code.
Alternatively, I was thinking about an iostream interface:
In such a case I wouldn't need to compute the iterator difference and could use the stream position since the last padding in base64_end: