Reading a txt file and regrouping items into JSON
Starting from the following txt file:
https://gist.github.com/rg3915/641e7245908a191f445f54137c9948d2#file-rna-txt
I am trying to generate a JSON with the format below:
[
{
'id': 'GCVF01004444.1.2369',
'reino': 'Bacteria',
'rna': 'CGUGCACGGUGGAUGCCUUGGCAGCCAGAGGCGAUGAAGGACGUUGUAGCCUGCGAUAAGCUCCGGUUAGGUGGCAAACA
ACCGUUUGACCCGGAGAUCUCCGAAUGGGGCAACCCACCCGUUGUAAGGCGGGUAUCACCGACUGAAUCCAUAGGUCGGU
GAGGCGAACGCGGGGAACUGAAACAUCUAAGUACCCGUAGGAACAGAAAUCAAUUGAGAUUCCCUGAGUAGCGGCGAGCG
AACGGGGAUUAGCCCUUAAGCUGAUGACUGAUUAGGAGAACGGUCUGGGAAGGCCGACCAUAGUGGGUGAUAGUCCCGUA
UCCGAAAAUCUGAUUCAGUGAAAACGAGUAGGUCGGGGCACGUGUAACCUUGACUGAACAUGGGGGGACCAUCCUCCAAG
GCUAAAUACUCCUGGCUGACCGAUAGUGAACCAGUACCGUGAGGGAAAGGCGAAAAGAACCCCGGAGAGGGGAGUGAAAU
AGAUCCUGAAACCGUGCACGUACAAGCAGUCGGAGCCCGCUUUGUUGGGUGACGGCGUACCUUUUGUAUAAUGGGUCAGC
GACUUAUUCUCAGUAGCGAGGUUAACCAUCUAGGGGAGCCGUAGGGAAACCGAGUCUGAAUAGGGCGUUGAGUUGCUGGG
CCGUUUGACCCGGAGAUCUCCGAAUGGGGCAACCCACCCGUUGUAAGGCGGGUAUCACCGACUGAAUCCAUAGGUCGGU
GAGGCGAACGCGGGGAACUGAAACAUCUAAGUACCCGUAGGAACAGAAAUCAAUUGAGAUUCCCUGAGUAGCGGCGAGCG
AACGGGGAUUAGCCCUUAAGCUGAUGACUGAUUAGGAGAACGGUCUGGGAAGGCCGACCAUAGUGGGUGAUAGUCCCGUA
UCCGAAAAUCUGAUUCAGUGAAAACGAGUAGGUCGGGGCACGUGUAACCUUGACUGAACAUGGGGGGACCAUCCUCCAAG
GCUAAAUACUCCUGGCUGACCGAUAGUGAACCAGUACCGUGAGGGAAAGGCGAAAAGAACCCCGGAGAGGGGAGUGAAAU
AGAUCCUGAAACCGUGCACGUACAAGCAGUCGGAGCCCGCUUUGUUGGGUGACGGCGUACCUUUUGUAUAAUGGGUCAGC
GACUUAUUCUCAGUAGCGAGGUUAACCAUCUAGGGGAGCCGUAGGGAAACCGAGUCUGAAUAGGGCGUUGAGUUGCUGGG'
},
{
'id': 'GCVF77777777.1.1963',
'reino': 'Bacteria',
'rna': 'GCGCAAACGGUGGAUGCCUAGGCAGUAAGAGGCGAUGAAGGACGUGGAAUCCUGCGAAAAGCUAUGGUGAGCUGGAAACA
AGCGCUGAGCCGUAGAUGUCCGAAUGGGGAAACCCGGCCAUAUGCAGAUAUGGUCACUCAUAAGUGAAUACAUAGGUUAU
GAGGGCGAACUCGGGGAACUGAAACAUCUAAGUACCCGAAGGAAAAGAAAUCAAACGAGAUUCCCUAAGUAGCGGCGAGC
GAACGGGGAGGAGCCUGGUGUGAUAUAGGUAAGAACUAAGUGGAAGCAACUGGAAAGUUGAGACAUAGAGGGUGAUAUCC
CCGUACACGAAGAGACUGCUGGAACUAAGCACACGAACAAGUAGGUCGGAACACGAGAAAUUCUGAUUGAAUAUGGGUGG
ACCAUCAUCCAAGGCUAAAUACUCCUUACUGACCGAUAGUGAACCAGUACCGUGAGGGAAAGGUGAAAAGAACCCCGGAG
AGGGGAGUGAAAUAGAUCCUGAAACCGUUUGCGUACAAGCAGUGGGAGCAUGGGCUUAGGCUUCGUGUGACUGCGUACCU
UUUGUAUAAUGGGUCAGCGAGUUACUUUCAGUGGCGAGGUUAACAAAGAAGGAAGCCGUAGAGAAAUCGAGUCUUAAAAG
GGCGCGAGUCGCUGGGAGUAGACCCGAAACCGGGCGAUCUAGCCAUGUCCAGGAUGAAGGUUGGGUAACACCAAGUGGAG
GUCCGAACCGGGUAAUGUUGAAAAAUUAUCGGAUGAGGUGUGGCUAGGAGUGAAAGGCUAAUCAAGCCCGGAGAUAGCUG
CCGUUUGACCCGGAGAUCUCCGAAUGGGGCAACCCACCCGUUGUAAGGCGGGUAUCACCGACUGAAUCCAUAGGUCGGU
GAGGCGAACGCGGGGAACUGAAACAUCUAAGUACCCGUAGGAACAGAAAUCAAUUGAGAUUCCCUGAGUAGCGGCGAGCG
AACGGGGAUUAGCCCUUAAGCUGAUGACUGAUUAGGAGAACGGUCUGGGAAGGCCGACCAUAGUGGGUGAUAGUCCCGUA
UCCGAAAAUCUGAUUCAGUGAAAACGAGUAGGUCGGGGCACGUGUAACCUUGACUGAACAUGGGGGGACCAUCCUCCAAG
GCUAAAUACUCCUGGCUGACCGAUAGUGAACCAGUACCGUGAGGGAAAGGCGAAAAGAACCCCGGAGAGGGGAGUGAAAU
AGAUCCUGAAACCGUGCACGUACAAGCAGUCGGAGCCCGCUUUGUUGGGUGACGGCGUACCUUUUGUAUAAUGGGUCAGC
GACUUAUUCUCAGUAGCGAGGUUAACCAUCUAGGGGAGCCGUAGGGAAACCGAGUCUGAAUAGGGCGUUGAGUUGCUGGG'
},
]
Here is the code I have tried so far:
https://gist.github.com/rg3915/641e7245908a191f445f54137c9948d2#file-rna-ipynb
But I got stuck at this point, with the flat list below, and could not work out how to continue:
[{'id': 'GCVF02004444.1.2369', 'reino': 'Bacteria'},
'CGUGCACGGUGGAUGCCUUGGCAGCCAGAGGCGAUGAAGGACGUUGUAGCCUGCGAUAAGCUCCGGUUAGGUGGCAAACA',
'ACCGUUUGACCCGGAGAUCUCCGAAUGGGGCAACCCACCCGUUGUAAGGCGGGUAUCACCGACUGAAUCCAUAGGUCGGU',
'GAGGCGAACGCGGGGAACUGAAACAUCUAAGUACCCGUAGGAACAGAAAUCAAUUGAGAUUCCCUGAGUAGCGGCGAGCG',
'AACGGGGAUUAGCCCUUAAGCUGAUGACUGAUUAGGAGAACGGUCUGGGAAGGCCGACCAUAGUGGGUGAUAGUCCCGUA',
'UCCGAAAAUCUGAUUCAGUGAAAACGAGUAGGUCGGGGCACGUGUAACCUUGACUGAACAUGGGGGGACCAUCCUCCAAG',
'GCUAAAUACUCCUGGCUGACCGAUAGUGAACCAGUACCGUGAGGGAAAGGCGAAAAGAACCCCGGAGAGGGGAGUGAAAU',
'AGAUCCUGAAACCGUGCACGUACAAGCAGUCGGAGCCCGCUUUGUUGGGUGACGGCGUACCUUUUGUAUAAUGGGUCAGC',
'GACUUAUUCUCAGUAGCGAGGUUAACCAUCUAGGGGAGCCGUAGGGAAACCGAGUCUGAAUAGGGCGUUGAGUUGCUGGG',
'CCGUUUGACCCGGAGAUCUCCGAAUGGGGCAACCCACCCGUUGUAAGGCGGGUAUCACCGACUGAAUCCAUAGGUCGGU',
'GAGGCGAACGCGGGGAACUGAAACAUCUAAGUACCCGUAGGAACAGAAAUCAAUUGAGAUUCCCUGAGUAGCGGCGAGCG',
'AACGGGGAUUAGCCCUUAAGCUGAUGACUGAUUAGGAGAACGGUCUGGGAAGGCCGACCAUAGUGGGUGAUAGUCCCGUA',
'UCCGAAAAUCUGAUUCAGUGAAAACGAGUAGGUCGGGGCACGUGUAACCUUGACUGAACAUGGGGGGACCAUCCUCCAAG',
'GCUAAAUACUCCUGGCUGACCGAUAGUGAACCAGUACCGUGAGGGAAAGGCGAAAAGAACCCCGGAGAGGGGAGUGAAAU',
'AGAUCCUGAAACCGUGCACGUACAAGCAGUCGGAGCCCGCUUUGUUGGGUGACGGCGUACCUUUUGUAUAAUGGGUCAGC',
'GACUUAUUCUCAGUAGCGAGGUUAACCAUCUAGGGGAGCCGUAGGGAAACCGAGUCUGAAUAGGGCGUUGAGUUGCUGGG',
{'id': 'GCVF02004444.1.2369', 'reino': 'Bacteria'},
'GCGCAAACGGUGGAUGCCUAGGCAGUAAGAGGCGAUGAAGGACGUGGAAUCCUGCGAAAAGCUAUGGUGAGCUGGAAACA',
'AGCGCUGAGCCGUAGAUGUCCGAAUGGGGAAACCCGGCCAUAUGCAGAUAUGGUCACUCAUAAGUGAAUACAUAGGUUAU',
'GAGGGCGAACUCGGGGAACUGAAACAUCUAAGUACCCGAAGGAAAAGAAAUCAAACGAGAUUCCCUAAGUAGCGGCGAGC',
'GAACGGGGAGGAGCCUGGUGUGAUAUAGGUAAGAACUAAGUGGAAGCAACUGGAAAGUUGAGACAUAGAGGGUGAUAUCC',
'CCGUACACGAAGAGACUGCUGGAACUAAGCACACGAACAAGUAGGUCGGAACACGAGAAAUUCUGAUUGAAUAUGGGUGG',
'ACCAUCAUCCAAGGCUAAAUACUCCUUACUGACCGAUAGUGAACCAGUACCGUGAGGGAAAGGUGAAAAGAACCCCGGAG',
'AGGGGAGUGAAAUAGAUCCUGAAACCGUUUGCGUACAAGCAGUGGGAGCAUGGGCUUAGGCUUCGUGUGACUGCGUACCU',
'UUUGUAUAAUGGGUCAGCGAGUUACUUUCAGUGGCGAGGUUAACAAAGAAGGAAGCCGUAGAGAAAUCGAGUCUUAAAAG',
'GGCGCGAGUCGCUGGGAGUAGACCCGAAACCGGGCGAUCUAGCCAUGUCCAGGAUGAAGGUUGGGUAACACCAAGUGGAG',
'GUCCGAACCGGGUAAUGUUGAAAAAUUAUCGGAUGAGGUGUGGCUAGGAGUGAAAGGCUAAUCAAGCCCGGAGAUAGCUG',
'CCGUUUGACCCGGAGAUCUCCGAAUGGGGCAACCCACCCGUUGUAAGGCGGGUAUCACCGACUGAAUCCAUAGGUCGGU',
'GAGGCGAACGCGGGGAACUGAAACAUCUAAGUACCCGUAGGAACAGAAAUCAAUUGAGAUUCCCUGAGUAGCGGCGAGCG',
'AACGGGGAUUAGCCCUUAAGCUGAUGACUGAUUAGGAGAACGGUCUGGGAAGGCCGACCAUAGUGGGUGAUAGUCCCGUA',
'UCCGAAAAUCUGAUUCAGUGAAAACGAGUAGGUCGGGGCACGUGUAACCUUGACUGAACAUGGGGGGACCAUCCUCCAAG',
'GCUAAAUACUCCUGGCUGACCGAUAGUGAACCAGUACCGUGAGGGAAAGGCGAAAAGAACCCCGGAGAGGGGAGUGAAAU',
'AGAUCCUGAAACCGUGCACGUACAAGCAGUCGGAGCCCGCUUUGUUGGGUGACGGCGUACCUUUUGUAUAAUGGGUCAGC',
'GACUUAUUCUCAGUAGCGAGGUUAACCAUCUAGGGGAGCCGUAGGGAAACCGAGUCUGAAUAGGGCGUUGAGUUGCUGGG',
{'id': 'GCVF02004444.1.2369', 'reino': 'Bacteria'},]
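One possible way to get from a flat list like the one above to the desired structure, as a minimal sketch only: walk the list, open a new record whenever a dict appears, and append every following string to that record's 'rna' field. The function name `regroup` and the shortened sample data are hypothetical. The sketch also makes two assumptions: the sequence lines are concatenated without a separator, and a header that has no sequence lines after it (like the last item above) is dropped.

```python
import json


def regroup(items):
    """Merge a flat list of header dicts and sequence strings into records."""
    records = []
    current = None
    for item in items:
        if isinstance(item, dict):
            # A dict marks the start of a new record.
            current = {**item, 'rna': ''}
            records.append(current)
        elif current is not None:
            # A string is one line of the sequence; append it to the open record.
            current['rna'] += item
    # Assumption: headers with no sequence lines are discarded.
    return [record for record in records if record['rna']]


# Shortened, hypothetical sample with the same shape as the list above.
flat = [
    {'id': 'A.1', 'reino': 'Bacteria'},
    'CGUG',
    'ACCG',
    {'id': 'B.2', 'reino': 'Bacteria'},
    'GCGC',
    {'id': 'C.3', 'reino': 'Bacteria'},
]

records = regroup(flat)
# json.dumps writes double-quoted strings, which is what valid JSON requires.
print(json.dumps(records, indent=2))
```

If the line breaks inside each sequence need to be preserved, the strings could be collected in a list and joined with '\n' instead of being concatenated directly.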