@andiac
Last active January 30, 2020 16:01
Merge PDF
```python
import glob

# PdfFileMerger was renamed PdfMerger in later PyPDF2 releases, and
# PyPDF2 3.x rejects the old name.
from PyPDF2 import PdfMerger


def merger(output_path, input_paths):
    """Concatenate the PDFs in input_paths, in order, into output_path."""
    pdf_merger = PdfMerger()
    for path in input_paths:
        pdf_merger.append(path)
    with open(output_path, "wb") as f:
        pdf_merger.write(f)
    pdf_merger.close()


def my_order(s):
    """Sort key for names like 'w3b_regression_gradients.pdf'.

    Returns (week number, lecture letter, rest of the name).
    """
    assert s[1].isdigit()
    if s[2].isdigit():
        # Two-digit week, e.g. 'w10a_...'
        return (int(s[1]) * 10 + int(s[2]), ord(s[3]), s[4:])
    else:
        # One-digit week, e.g. 'w3b_...'
        return (int(s[1]), ord(s[2]), s[3:])


'''
Why do we need my_order? Because a plain sort() compares the names as
strings, so the w10* files land between w0* and w1*:
['w0a_admin.pdf', 'w0b_books.pdf', 'w0c_background_selftest.pdf', 'w0c_background_selftest_answers.pdf', 'w0d_maths.pdf', 'w0e_programming.pdf', 'w0f_expectations.pdf', 'w10a_sparsity_and_L1.pdf', 'w10b_more_optimization.pdf', 'w10c_ensembles_and_model_combination.pdf', 'w1a_intro.pdf', 'w1b_linear_regression.pdf', 'w1c_linear_regression_regularization.pdf', 'w2a_train_test_val.pdf', 'w2b_univariate_gaussian.pdf', 'w2b_univariate_gaussian_answers.pdf', 'w2c_central_limit.pdf', 'w2c_central_limit_answers.pdf', 'w2d_error_bars.pdf', 'w2e_multivariate_gaussian.pdf', 'w3a_intro_classification.pdf', 'w3b_regression_gradients.pdf', 'w3c_logistic_regression.pdf', 'w4a_softmax_and_robust_regressions.pdf', 'w4b_neural_net_intro.pdf', 'w4c_neural_net_more_fit.pdf', 'w5a_backprop.pdf', 'w5b_autoencoders_pca.pdf', 'w6a_netflix_prize.pdf', 'w6b_bayesian_regression.pdf', 'w6c_bayesian_inference_prediction.pdf', 'w7a_bayesian_complexity_control.pdf', 'w7b_gaussian_processes.pdf', 'w8a_gaussian_process_kernels.pdf', 'w8b_bayes_logistic_regression_laplace.pdf', 'w8c_logistic_regression_prediction.pdf', 'w9a_variational_kl.pdf', 'w9b_variational_details.pdf', 'w9c_mixture_models.pdf']
'''

if __name__ == "__main__":
    paths = glob.glob("w*.pdf")
    paths.sort(key=my_order)
    print(paths)
    merger("merged.pdf", paths)
```
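The fixed-position indexing in my_order only handles one- or two-digit week numbers. Below is a minimal sketch of the same ordering done with a regular expression. It assumes every file name follows the pattern w<week><letter><rest>. The name lecture_key is hypothetical and not part of the original gist.

```python
import re

# Hypothetical helper: assumes names look like 'w<digits><one letter><rest>'.
_LECTURE_RE = re.compile(r"w(\d+)([a-z])(.*)")


def lecture_key(name):
    """Return (week, letter, rest) for names like 'w10a_sparsity_and_L1.pdf'."""
    m = _LECTURE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    week, letter, rest = m.groups()
    # int(week) makes week 10 sort after week 9, whatever the digit count.
    return (int(week), letter, rest)


names = ["w10a_sparsity_and_L1.pdf", "w1b_linear_regression.pdf",
         "w0a_admin.pdf", "w2a_train_test_val.pdf", "w1a_intro.pdf"]
print(sorted(names, key=lecture_key))
```

Comparing the single-character letter directly gives the same order as my_order's ord() call. An unexpected name raises ValueError instead of failing an assert.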
andiac (author) commented May 13, 2019

A plain sort() causes the same problem with numbered file names, because it compares the names as strings:

before sort:
['1.pdf', '2.pdf', '3.pdf', '4.pdf', '5.pdf', '6.pdf', '7.pdf', '8.pdf', '9.pdf', '10.pdf', '11.pdf', '12.pdf', '13.pdf', '14.pdf']
after sort:
['1.pdf', '10.pdf', '11.pdf', '12.pdf', '13.pdf', '14.pdf', '2.pdf', '3.pdf', '4.pdf', '5.pdf', '6.pdf', '7.pdf', '8.pdf', '9.pdf']
