@JEEN
Created September 29, 2014 08:38
Pick Something
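use strict;
use warnings;
use Lingua::Sentence;
use Data::Printer;

# Usage: perl pick_something.pl <filing.txt> <keyword>
open my $fh, "<", $ARGV[0] or die "Cannot open input file: $!";
my $body = do { local $/; <$fh> };    # slurp the whole file at once
my $keyword = $ARGV[1];
die "No keyword given" unless $keyword;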
#$body =~ s/ / /g;
#$body =~ s/\n+/\n/g;
#my ($date_of_name_change) = $body =~ /DATE OF NAME CHANGE:[\t\s]+([0-9-]+)/gmsi;
#print $accession_number."\n";
#print $date_of_name_change."\n";
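my $splitter = Lingua::Sentence->new("en");
my @arr = $splitter->split_array($body);
my $idx = 0;
my ($accession_number) = $body =~ /ACCESSION NUMBER:[\t\s]+([0-9-]+)/i;

# Collect each keyword sentence together with its neighbours, provided at
# least one of them contains a dollar amount.
my @sentences = ();
for my $row (@arr) {
    if ($row =~ /$keyword/) {
        # Clamp the window so it neither wraps around to the end of @arr
        # (index -1) nor runs past the last sentence.
        my $first = $idx > 0     ? $idx - 1 : 0;
        my $last  = $idx < $#arr ? $idx + 1 : $#arr;
        my @target_arr = @arr[$first .. $last];
        if (grep { m{\$[0-9,\.]+ ?(?:million|billion|trillion)?} } @target_arr) {
            push @sentences, \@target_arr;
        }
    }
    ++$idx;
}

# Print one tab-separated row per extracted amount. Neighbouring sentences
# are printed whether or not they contain the keyword, and without /g only
# the first amount in each sentence is captured.
my $count = 0;
for my $r (@sentences) {
    for my $sentence (@{ $r }) {
        my (@amounts) = $sentence =~ /(\$[0-9,\.]+ ?(?:million|billion|trillion)?)/;
        for my $amount (@amounts) {
            print join("\t", $accession_number, ++$count, $amount, $sentence)."\n";
        }
    }
}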
@JEEN

JEEN commented Sep 29, 2014

perl pick_something.pl 0000796343-09-000007.txt approximately

0000796343-09-000007    1   $6.4 million    The backlog of orders from customers, as of January 16, 2009 and January 18, 2008, was approximately $6.4 million and $13.8 million, respectively.
0000796343-09-000007    2   $100.0 million  As previously disclosed, we plan to invest $100.0 million directly in venture capital, of which, approximately $33.5 million has already been invested.
0000796343-09-000007    3   $86.3 million   The annual base rent expense (including operating expenses, property taxes and assessments, as applicable) for all facilities is currently approximately $86.3 million and is subject to annual adjustments as well as changes in interest rates.
0000796343-09-000007    4   $69.3 million   Revenue in EMEA measured in U.S. dollars was favorably impacted by approximately $69.3 million during fiscal 2008 as compared to fiscal 2007 primarily due to the strength of the Euro against the U.S. dollar.
0000796343-09-000007    5   $13.2 million   $13.2 million.
0000796343-09-000007    6   $39.6 million   Revenue in Asia was favorably impacted by approximately $39.6 million during fiscal 2008 as compared to fiscal 2007 primarily due to the strength of the Yen against the U.S. dollar.
0000796343-09-000007    7   $65.9 million   Revenue in EMEA increased during fiscal 2007 as compared to fiscal 2006 due to the release of localized versions of our CS3 family of products and increases in revenue from the Acrobat Pro products.  Additionally, revenue in EMEA increased approximately $65.9 million due to the strength of the Euro against the U.S. dollar.
0000796343-09-000007    8   $56.4 million   Of this cost, an estimated $56.4 million and $44.8 million during fiscal 2008 and fiscal 2007, respectively, was related to future licensing rights and has been capitalized and will be amortized on a straight-line basis over the estimated useful lives up to fifteen years.
0000796343-09-000007    9   $27.2 million   Of the remaining costs, we estimated that approximately $27.2 million and $15.2 million was related to historical use of licensing rights which was expensed as cost of sales and the residual of $16.8 million for fiscal 2008 was expensed as general and administrative costs.  In connection with these licensing arrangements, we have the ability to acquire additional rights to use technology in the future.
0000796343-09-000007    10  $56.4 million   Of this cost, an estimated $56.4 million and $44.8 million during fiscal 2008 and fiscal 2007, respectively, was related to future licensing rights and has been capitalized and will be amortized on a straight-line basis over the estimated useful lives up to fifteen years.
0000796343-09-000007    11  $27.2 million   Of the remaining costs, we estimated that approximately $27.2 million and $15.2 million during fiscal 2008 and fiscal 2007, respectively, was related to historical use of licensing rights which was expensed as cost of sales and the residual of $16.8 million for fiscal 2008 was expensed as general and administrative costs.
0000796343-09-000007    12  $29.2 million   In connection with this restructuring plan, we recorded restructuring charges totaling $29.2 million related to termination benefits for the elimination of approximately 460 of these full-time positions globally.
0000796343-09-000007    13  $29.2 million   In connection with this restructuring plan, we recorded restructuring charges totaling $29.2 million related to termination benefits for the elimination of approximately 460 of these full-time positions globally.
0000796343-09-000007    14  $0.4 million    As of November 28, 2008, $0.4 million was paid.
0000796343-09-000007    15  $10.0 million   In fiscal 2009, we expect to record approximately $10.0 million to $13.0 million primarily related to the consolidation of leased facilities and approximately $6.0 million to $7.0 million related to employee severance arrangements for the elimination of approximately 100 of the remaining full-time positions globally.
0000796343-09-000007    16  $1.3 million    with a write-down for an other-than-temporary impairment totaling approximately $1.3 million during the second quarter of fiscal 2008.
0000796343-09-000007    17  $100.0 million  As previously disclosed, we plan to invest $100.0 million directly in venture capital, of which, approximately $33.5 million has already been invested.
0000796343-09-000007    18  $29.2 million   In connection with this restructuring plan, we recorded restructuring charges totaling $29.2 million related to termination benefits for the elimination of approximately 460 of these full-time positions globally.
0000796343-09-000007    19  $29.2 million   In connection with this restructuring plan, we recorded restructuring charges totaling $29.2 million related to termination benefits for the elimination of approximately 460 of these full-time positions globally.
0000796343-09-000007    20  $0.4 million    As of November 28, 2008, $0.4 million was paid.
0000796343-09-000007    21  $10.0 million   In fiscal 2009, we expect to record approximately $10.0 million to $13.0 million primarily related to the consolidation of leased facilities and approximately $6.0 million to $7.0 million related to employee severance arrangements for the elimination of approximately 100 of the remaining full-time positions globally.
0000796343-09-000007    22  $134.7  As of November 28, 2008 and November 30, 2007, approximately $134.7 million and $422.6 million, respectively, of up-front payments remained under the agreements.
0000796343-09-000007    23  $37.15. During fiscal 2008, we repurchased 31.9 million shares under these structured agreements at an average price of $37.15.
0000796343-09-000007    24  $40.50  During fiscal 2007, we repurchased 17.7 million shares under these structured agreements at an average price of $40.50 and approximately $133.7 million of up-front payments remained under these agreements as of November 30, 2007.
0000796343-09-000007    25  $4.3 million    During fiscal 2008, we completed one business combination for cash consideration of approximately $4.3 million.  This acquisition was not material to our consolidated balance sheet and results of operations.
0000796343-09-000007    26  $186.9 million  Thus, approximately $186.9 million, included in the net tangible assets, was established as a deferred tax liability for the future amortization of the intangible assets.
0000796343-09-000007    27  $237.8 million  In accordance with SFAS No. 109, “Accounting for Income Taxes,” the valuation allowance on Macromedia’s financial statements as of December 3, 2005 was reduced by $237.8 million to $16.1 million, to the extent the deferred tax assets are more likely than not realizable.
0000796343-09-000007    28  $63.0 million   In addition to the acquisition of Macromedia, during fiscal 2006, we completed three business combinations and five asset acquisitions for cash consideration of approximately $63.0 million.
0000796343-09-000007    29  $56.4 million   Of this cost, an estimated $56.4 million and $44.8 million during fiscal 2008 and fiscal 2007, respectively, was related to future licensing rights and has been capitalized and will be amortized on a straight-line basis over the estimated useful lives up to fifteen years.
0000796343-09-000007    30  $27.2 million   Of the remaining costs, we estimated that approximately $27.2 million and $15.2 million
0000796343-09-000007    31  $1.1 billion    paid on these earnings.  As of November 28, 2008, the cumulative amount of earnings upon which U.S. income taxes have not been provided is approximately $1.1 billion.
0000796343-09-000007    32  $342.0 million  The unrecognized deferred tax liability for these earnings is approximately $342.0 million.
0000796343-09-000007    33  $1.1 billion    paid on these earnings.  As of November 28, 2008, the cumulative amount of earnings upon which U.S. income taxes have not been provided is approximately $1.1 billion.
0000796343-09-000007    34  $342.0 million  The unrecognized deferred tax liability for these earnings is approximately $342.0 million.
0000796343-09-000007    35  $19.0 million   As of November 28, 2008, we have net operating loss carryforward assets of approximately $19.0 million for federal, $7.7 million for state and $1.3 million related to foreign net operating losses.
0000796343-09-000007    36  $8.6 million    We also have federal, state and foreign tax credit carryforwards of approximately $8.6 million, $10.9 million and $3.5 million, respectively.  The net operating loss carryforward assets, federal tax credits and foreign tax credits will expire in various years from fiscal 2014 through 2029.
0000796343-09-000007    37  $19.0 million   As of November 28, 2008, we have net operating loss carryforward assets of approximately $19.0 million for federal, $7.7 million for state and $1.3 million related to foreign net operating losses.
0000796343-09-000007    38  $8.6 million    We also have federal, state and foreign tax credit carryforwards of approximately $8.6 million, $10.9 million and $3.5 million, respectively.  The net operating loss carryforward assets, federal tax credits and foreign tax credits will expire in various years from fiscal 2014 through 2029.
0000796343-09-000007    39  $42.8   As of December 1, 2007, the combined amount of accrued interest and penalties related to tax positions taken on our tax returns and included in non-current income taxes payable was approximately $42.8 million.
0000796343-09-000007    40  $15.3       As of November 28, 2008, the combined amount of accrued interest and penalties related to tax positions taken on our tax returns and included in non-current income taxes payable was approximately $15.3 million.
0000796343-09-000007    41  $29.2 million   In connection with this restructuring plan, we recorded restructuring charges totaling $29.2 million related to termination benefits for the elimination of approximately 460 of the 560 full-time positions globally.
0000796343-09-000007    42  $29.2 million   In connection with this restructuring plan, we recorded restructuring charges totaling $29.2 million related to termination benefits for the elimination of approximately 460 of the 560 full-time positions globally.
0000796343-09-000007    43  $0.4 million    As of November 28, 2008, $0.4 million was paid.
0000796343-09-000007    44  $134.7  As of November 28, 2008 and November 30, 2007, approximately $134.7 million and $422.6 million, respectively, of up-front payments remained under the agreements.
0000796343-09-000007    45  $850.0 million  During fiscal 2007, we provided prepayments of $850.0 million under structured share repurchase agreements to large financial institutions.
0000796343-09-000007    46  $40.50  During fiscal 2007, we repurchased 17.7 million shares under these structured agreements at an average price of $40.50 and approximately $133.7 million of up-front payments remained under these agreements as of November 30, 2007.
0000796343-09-000007    47  $37.07,     For fiscal 2008, 2007 and 2006, options to purchase approximately 16.5 million, 10.4 million and 17.7 million shares, respectively, of common stock with exercise prices greater than the annual average fair market value of our stock of $37.07, $41.77 and $35.32, respectively, were not included in the calculation because the effect would have been anti-dilutive.
0000796343-09-000007    48  $143.2 million  Under the agreement for the East and West Towers and the agreement for the Almaden Tower, we have the option to purchase the buildings at anytime during the lease term for approximately $143.2 million and $103.6 million, respectively.
0000796343-09-000007    49  $126.8 million  The residual value guarantees under the East and West Towers and the Almaden Tower obligations are $126.8 million and $89.4 million, respectively.
0000796343-09-000007    50  $47.8 million   Royalty expense, which was recorded under our cost of products revenue on our consolidated statements of income, was approximately $47.8 million, $37.4 million and $19.1 million in fiscal 2008, 2007 and 2006, respectively.

@wyim-pgl

Rows like 1, 2, 4, 7, 8, 9, and 10 are extracted even though they don't contain the word "consolidated". Do you know why that happens?

perl html_convert.pl 0000796343-09-000007.txt consolidated
0000796343-09-000007 1 $5.2 million As such, we recognized $5.2 million and $3.0 million in liabilities, related to the extended East and West Towers and Almaden Tower leases, respectively.
0000796343-09-000007 2 $35.0 million Specifically, there was a reclassification totaling $35.0 million from purchased intangibles to long-term and short-term other assets.  See Notes 5 and 6 for additional information regarding this reclassification.
0000796343-09-000007 3 $4.3 million During fiscal 2008, we completed one business combination for cash consideration of approximately $4.3 million.  This acquisition was not material to our consolidated balance sheet and results of operations.
0000796343-09-000007 4 $77.0 million During fiscal 2007, we completed two business combinations and one asset acquisition for cash consideration of $77.0 million.
0000796343-09-000007 5 $1.5 million Related to the acquisition that occurred during the second quarter of fiscal 2007, $1.5 million of in-process research and development was included in our amortization of purchased intangibles on our consolidated statements of income.
0000796343-09-000007 6 $1.5 million Related to the acquisition that occurred during the second quarter of fiscal 2007, $1.5 million of in-process research and development was included in our amortization of purchased intangibles on our consolidated statements of income.
0000796343-09-000007 7 $63.0 million In addition to the acquisition of Macromedia, during fiscal 2006, we completed three business combinations and five asset acquisitions for cash consideration of approximately $63.0 million.
0000796343-09-000007 8 $55.5 million Specifically, we reclassified $55.5 million of cost and $20.5 million of accumulated amortization ($35.0 million, net) from purchased intangibles to long-term and short-term other assets associated with certain technology license arrangements.
0000796343-09-000007 9 $35.0 million Specifically, there was a reclassification associated with certain technology licensing arrangements totaling $35.0 million, net from purchased intangibles of which $28.7 million and $4.7 million were reclassified to acquired rights to use technology and long-term prepaid royalties, respectively.
0000796343-09-000007 10 $5.2 million As such, we recognized $5.2 million and $3.0 million in liabilities, related to the extended East and West Towers and Almaden Tower leases, respectively.
0000796343-09-000007 11 $3.9 The adoption of FIN 48 resulted in an increase of $3.9 million to both assets and unrecognized tax benefits in our consolidated balance sheet as of the beginning of fiscal 2008.
0000796343-09-000007 12 $218.4 Upon adoption, the gross liability for unrecognized tax benefits at December 1, 2007 was $218.4 million, exclusive of interest and penalties.
0000796343-09-000007 13 $3.9 Thus, we recognized additional deferred income tax assets of $3.9 million to present the unrecognized tax benefits as gross amounts on our consolidated balance sheet.
0000796343-09-000007 14 $32.1 million In fiscal 2008, we recorded restructuring charges totaling $32.1 million of which $29.2 related to fiscal 2008 restructuring charges and $2.9 million related to changes in estimates associated with pre-existing facilities accruals for the Macromedia acquisition.
0000796343-09-000007 15 $13.1 million Accrued restructuring charges of $13.1 million at November 28, 2008 includes $6.9 million recorded in accrued restructuring, current and $6.2 million, related to long-term facilities obligations, recorded in accrued restructuring, non-current in the accompanying consolidated balance sheets.
0000796343-09-000007 16 $17.7 million Accrued restructuring charges of $17.7 million at November 30, 2007 included $3.7 million recorded in accrued restructuring, current and $14.0 million, related to long-term facilities obligations, recorded in accrued restructuring, non-current in the accompanying consolidated balance sheets.
0000796343-09-000007 17 $31.4 million At December 1, 2006, accrued restructuring charges of $31.4 million included $9.8 million recorded in accrued restructuring, current and $21.6 million, related to long-term facilities obligations, recorded in accrued restructuring, non-current in the accompanying consolidated balance sheets.
0000796343-09-000007 18 $0.3 million Accrued restructuring charges as of December 1, 2006 included $0.3 million recorded in accrued restructuring, current and $0.3 million, related to long-term facilities obligations recorded in accrued restructuring, non-current in the accompanying consolidated balance sheets.
0000796343-09-000007 19 $126.8 million As part of the lease extensions, we purchased the lease receivable from the lessor of the East and West Towers for $126.8 million and a portion of the lease receivable from the lessor of the Almaden Tower for $80.4 million, both of which are recorded as investments in lease receivables on our consolidated balance sheet.
0000796343-09-000007 20 $5.2 million As such, we recognized $5.2 million and $3.0 million in liabilities, related to the extended East and West Towers and Almaden Tower leases, respectively.
0000796343-09-000007 21 $47.8 million Royalty expense, which was recorded under our cost of products revenue on our consolidated statements of income, was approximately $47.8 million, $37.4 million and $19.1 million in fiscal 2008, 2007 and 2006, respectively.
0000796343-09-000007 22 $350.0 As of November 28, 2008 and November 30, 2007, the amount outstanding under this credit facility was $350.0 million and zero, respectively, which is included in long-term liabilities on our consolidated balance sheet.

@JEEN

JEEN commented Dec 4, 2014

Huh... when did this get posted...

@JEEN

JEEN commented Dec 4, 2014

Looks like line 43 needs one more if ($row =~ /$keyword/) { check.
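
A minimal sketch of what that extra check might look like, assuming the intent is to print only the sentences in each window that contain the keyword themselves. The helper name `pick_amounts` and the three sample sentences are made up for illustration and are not part of the gist. The amount pattern is the same one the script uses, so it still captures only the first amount per sentence.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the gist): given a keyword and a window of
# sentences, return [amount, sentence] pairs only for sentences that
# contain the keyword themselves.
sub pick_amounts {
    my ($keyword, @window) = @_;
    my @rows;
    for my $sentence (@window) {
        # The extra check: skip neighbours that lack the keyword.
        next unless $sentence =~ /$keyword/;
        # Same amount pattern as the script; first amount only.
        if ($sentence =~ /(\$[0-9,\.]+ ?(?:million|billion|trillion)?)/) {
            push @rows, [$1, $sentence];
        }
    }
    return @rows;
}

# Illustrative window: only the last sentence has both the keyword and an amount.
my @window = (
    'As such, we recognized $5.2 million and $3.0 million in liabilities.',
    'This acquisition was not material to our consolidated balance sheet.',
    'Royalty expense on our consolidated statements of income was $47.8 million.',
);

print join("\t", @$_), "\n" for pick_amounts("consolidated", @window);
```

With a filter like this, a neighbouring sentence no longer produces a row just because it sits next to a keyword match, which appears to be what caused the rows asked about above.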
